vLLM Large Model TPS Testing in Three Steps
Installation
pip install vllm
Download the model yourself
- For example: https://modelscope.cn/models/jackle/Qwen2.5-Coder-32B-GPTQ-Int4/
Deployment and testing
export VLLM_MODEL=Qwen2.5-Coder-32B-GPTQ-Int4
# Start the server
python3 -m vllm.entrypoints.openai.api_server --model $VLLM_MODEL --device=auto --enforce-eager --tensor-parallel-size=1 --max-model-len=4096 --dtype=float16 --block-size=32 --trust-remote-code --port=9000
# Test
curl -X POST "http://127.0.0.1:9000/v1/chat/completions" \
-H "Authorization: Bearer xxxx" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$VLLM_MODEL"'",
"messages": [
{"role": "user", "content": "What are some fun things to do in New York?"}
],
"max_tokens": 2048,
"temperature": 0.0,
"stream": false
}'
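The curl request above confirms that the server responds, but it does not report a TPS figure by itself. One way to get a rough number is to time a single non-streaming request and divide the `completion_tokens` value from the response's `usage` field by the elapsed wall-clock time. The following is a minimal sketch under that assumption, not a full benchmark: the helper names `tokens_per_second` and `measure_once` are hypothetical, the URL, model name, and token mirror the commands above, and the result includes prompt processing and HTTP overhead.

```python
import json
import os
import time
import urllib.request

# Assumed values that mirror the commands above; adjust them to your setup.
BASE_URL = "http://127.0.0.1:9000/v1/chat/completions"
MODEL = os.environ.get("VLLM_MODEL", "Qwen2.5-Coder-32B-GPTQ-Int4")


def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Generated tokens divided by wall-clock time for one request."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completion_tokens / elapsed_seconds


def measure_once(prompt: str, max_tokens: int = 2048) -> float:
    """Send one non-streaming chat request and return its tokens per second.

    Hypothetical helper: the timing covers the whole request, so prefill
    and network latency are counted together with decoding.
    """
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
        "stream": False,
    }
    request = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer xxxx",
        },
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    elapsed = time.perf_counter() - start
    # The OpenAI-compatible response reports token counts under "usage".
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)
```

With the server running, `measure_once("What are some fun things to do in New York?")` returns the single-request TPS. Averaging several runs, or sending requests concurrently, would give a steadier picture of throughput under load.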
Result
