Deploying the Qwen2.5-Coder-32B-Instruct-AWQ Model

菜地里的小菜鸟

385 views · Published 2026-06-28 15:52:29

1. System environment

GPU: 2 × NVIDIA T4 (16 GB each), Driver Version: 535.154.05, CUDA Version: 12.2
Model: Qwen/Qwen2.5-Coder-32B-Instruct-AWQ
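
As a rough sanity check on whether this model fits on this hardware, the quantized weight size can be estimated from the parameter count. The sketch below is only an approximation: the parameter count of about 32.5 billion and the 4 bits per weight are assumptions taken from the public model description, and the estimate ignores AWQ scales, zero-points, and any layers kept in higher precision.

```python
# Back-of-the-envelope weight footprint for a 4-bit AWQ checkpoint.
# All constants are assumptions, not measured values.
GIB = 1024 ** 3

def awq_weight_gib(n_params: float, bits_per_weight: float = 4.0) -> float:
    """Approximate quantized weight size in GiB (scales/zero-points ignored)."""
    return n_params * bits_per_weight / 8 / GIB

n_params = 32.5e9      # assumed parameter count for the 32B model
n_gpus = 2             # two T4 cards, as listed above
gib_per_gpu = 16       # nominal capacity from the environment line

total = awq_weight_gib(n_params)
per_gpu = total / n_gpus   # assumes the weights split evenly across both cards

print(f"weights: ~{total:.1f} GiB total, ~{per_gpu:.1f} GiB per GPU "
      f"out of {gib_per_gpu} GiB each")
```

Under these assumptions the weights take roughly 15 GiB in total, so they cannot fit on a single 16 GB T4 with room for anything else, which is why both cards are used.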

2. Pull the vLLM image (vLLM is used to load the model)

docker pull vllm/vllm-openai:latest

3. Download the model

The model is hosted on Alibaba's ModelScope community:

https://www.modelscope.cn/models

Download it using the vLLM container:

docker run --rm -it \
    --gpus all \
    --entrypoint /bin/bash \
    --pids-limit -1 \
    --security-opt seccomp=unconfined \
    -v /root/lipengcheng/qwen2532ia:/models \
    -e OMP_NUM_THREADS=8 \
    vllm/vllm-openai:latest \
    -c "pip install modelscope && python3 -c \"from modelscope import snapshot_download; snapshot_download('Qwen/Qwen2.5-Coder-32B-Instruct-AWQ', cache_dir='/models')\""
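
The inline `python3 -c` call above can also be written as a small script, which is easier to rerun and extend. This is a minimal sketch: `modelscope_local_dir` and `download` are hypothetical helper names, and the rule that dots in the model name become `___` on disk is an assumption inferred from the mount path used in the next step, not something taken from the ModelScope documentation.

```python
import posixpath

def modelscope_local_dir(cache_dir: str, model_id: str) -> str:
    """Directory where the snapshot is expected to land inside the container.

    Assumption: ModelScope stores '.' in the model name as '___'.
    """
    owner, name = model_id.split("/", 1)
    return posixpath.join(cache_dir, owner, name.replace(".", "___"))

def download(model_id: str, cache_dir: str) -> str:
    """Download the snapshot and return the directory reported by ModelScope."""
    # Imported lazily so the path helper works without modelscope installed.
    from modelscope import snapshot_download
    return snapshot_download(model_id, cache_dir=cache_dir)

MODEL_ID = "Qwen/Qwen2.5-Coder-32B-Instruct-AWQ"
print(modelscope_local_dir("/models", MODEL_ID))
```

Calling `download(MODEL_ID, "/models")` inside the container performs the same download as the one-liner. Comparing its return value with the printed path is a quick way to confirm which host directory to mount in the next step.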

4. Load the Qwen2.5-Coder-32B-Instruct-AWQ model

docker run --gpus all -d -p 8000:8000 --name qwen2.5-coder32 \
    --ipc=host \
    --pids-limit -1 \
    --security-opt seccomp=unconfined \
    -v /root/lipengcheng/qwen2532ia/Qwen/Qwen2___5-Coder-32B-Instruct-AWQ:/model \
    -e HF_DATASETS_OFFLINE=1 \
    -e TRANSFORMERS_OFFLINE=1 \
    -e OMP_NUM_THREADS=16 \
    vllm/vllm-openai:latest \
    --model /model \
    --tensor-parallel-size 2 \
    --max-model-len 16384 \
    --gpu-memory-utilization 0.9 \
    --trust-remote-code
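
To get a feel for what `--max-model-len 16384` and `--gpu-memory-utilization 0.9` imply, the KV-cache cost of one full-length sequence can be estimated. The sketch below is illustrative only. The layer count, number of KV heads, and head dimension are assumptions based on the published Qwen2.5-32B architecture and should be checked against `config.json` in the model directory. It also assumes an FP16 KV cache.

```python
# Rough KV-cache budget for the launch flags above. Architecture values are
# assumptions; verify them against config.json in the model directory.
GIB = 1024 ** 3

def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Bytes of K and V stored per token across all layers."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128   # assumed Qwen2.5-32B values
MAX_MODEL_LEN = 16384                     # --max-model-len
UTILIZATION = 0.9                         # --gpu-memory-utilization
N_GPUS, GIB_PER_GPU = 2, 16               # 2 x T4, nominal capacity

per_token = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM)
one_sequence_gib = per_token * MAX_MODEL_LEN / GIB
budget_gib = N_GPUS * GIB_PER_GPU * UTILIZATION

print(f"KV cache: {per_token // 1024} KiB per token, "
      f"{one_sequence_gib:.1f} GiB for one {MAX_MODEL_LEN}-token sequence")
print(f"memory vLLM may use across both GPUs: ~{budget_gib:.1f} GiB")
```

With these numbers, one sequence at the maximum length needs about 4 GiB of KV cache in total, to be paid out of the memory budget that remains after the weights and activation workspace are loaded. If startup fails with a message about insufficient KV-cache space, lowering `--max-model-len` is the usual first adjustment.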

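When the startup log appears in the container output (docker logs qwen2.5-coder32), the model has loaded successfully.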
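
Loading a model of this size can take several minutes, so instead of watching the log it is possible to poll the server until it answers. The following is a minimal sketch that assumes the server's `/health` endpoint on the port mapped above. `http_ok` and `wait_until_ready` are hypothetical helper names, and the retry count and delay are arbitrary.

```python
import time
import urllib.request

def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False

def wait_until_ready(url: str = "http://localhost:8000/health",
                     attempts: int = 60, delay: float = 10.0,
                     probe=http_ok, sleep=time.sleep) -> bool:
    """Poll url until the probe succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay)
    return False
```

Calling `wait_until_ready()` after `docker run` returns `True` once the server responds, or `False` after roughly ten minutes with the defaults shown.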

5. Test the model

Test command

curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "/model",
  "messages": [{"role": "user", "content": "你好"}]
}'

Response

{"id":"chatcmpl-bf4f4555eeceea94","object":"chat.completion","created":1778649567,"model":"/model","choices":[{"index":0,"message":{"role":"assistant","content":"你好！有什么我可以帮忙的吗？","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":30,"total_tokens":39,"completion_tokens":9,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
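
The same request can be sent from Python using only the standard library, which makes it easier to pull the reply text out of the JSON. This is a sketch rather than a full client: `build_payload`, `extract_reply`, and `chat` are hypothetical helper names, and the URL and model name simply mirror the curl command above.

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "/model") -> dict:
    """Request body matching the curl example."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def extract_reply(response: dict) -> str:
    """Text of the first choice in a chat completion response."""
    return response["choices"][0]["message"]["content"]

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send one user message to the server and return the assistant's reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Supplying data makes this a POST request.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_reply(json.load(resp))
```

With the container running, `print(chat("你好"))` should print a greeting similar to the `content` field in the response above.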
