Qwen3.5-4B模型部署

菜地里的小菜鸟

297人浏览 · 2026-06-24 11:07:08

菜地里的小菜鸟 · 2026-06-24 11:07:08 发布

1.系统环境

NVIDIA T4 * 2 /16G * 2 Driver Version: 535.154.05 CUDA Version: 12.2

2.打开modelscope选择对应的模型，我选择Qwen3.5-4B

https://www.modelscope.cn/models

3.vllm镜像下载，使用vllm容器加载模型

docker pull vllm/vllm-openai:latest

4.使用vllm镜像下载Qwen3.5-4B模型文件

docker run --rm -it \
    --gpus all \
    --entrypoint /bin/bash \
    --pids-limit -1 \
    --security-opt seccomp=unconfined \
    -v /root/lipengcheng/qwen35_4b:/models \
    -e OMP_NUM_THREADS=8 \
    vllm/vllm-openai:latest \
    -c "pip install modelscope && python3 -c \"from modelscope import snapshot_download; snapshot_download('Qwen/Qwen3.5-4B', cache_dir='/models')\""

5.使用vllm加载Qwen3.5-4B模型

version: '3.8'

services:
  vllm-qwen:
    image: vllm/vllm-openai:latest
    container_name: vllm-qwen35-4b
    privileged: true
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    volumes:
      - /root/lipengcheng/qwen35_4b/Qwen:/models
    ports:
      - "23333:8000"
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    command: >
      --model /models/Qwen3___5-4B
      --host 0.0.0.0
      --port 8000
      --dtype half
      --served-model-name qwen3.5-4b
      --max-model-len 8192
      --tensor-parallel-size 2
      --default-chat-template-kwargs '{"enable_thinking": false}'
    restart: unless-stopped

6.测试

curl http://localhost:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-4b",
    "messages": [
      {"role": "user", "content": "你好"}
    ],
    "temperature": 0.7
  }'

亚马逊云科技技术品牌专区

更多推荐

Kiro Editor 开发实战：使用 Cargo 构建、测试与性能优化指南

欢迎来到这篇终极指南，我们将深入探索如何使用Rust构建高性能的终端文本编辑器Kiro Editor。无论你是Rust新手还是经验丰富的开发者，这篇完整教程将带你了解如何利用Cargo工具链进行高效的开发、测试和性能优化，打造一款快速、轻量且功能强大的UTF-8文本编辑器。## 什么是Kiro Editor？Kiro Editor是一款使用Rust编写的极简终端文本编辑器，它最初是著名编辑