显存需求较高, 本地4G显存0.5B都无法部署
支持多机多卡部署
支持GPU、CPU混合运行
支持运行格式pt,safetensors,npcache,dummy,gguf,bitsandbytes,layered

环境信息

机器01
操作系统:Debain 12.9/Ubuntu 24.04
CPU:i7-10750H
内存:32G
显卡:GTX 1650(4G)
硬盘:SSD(1T)
IP:192.168.3.17

基础组件安装

基础组件安装

创建python虚拟环境

python3 -m venv ~/sglang
source ~/sglang/bin/activate

安装python模块

# 使用清华大学python源,https://pypi.tuna.tsinghua.edu.cn/simple
pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install sgl-kernel --force-reinstall --no-deps -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install "sglang[all]>=0.4.3.post2" -i https://mirrors.aliyun.com/pypi/simple/
# If you encounter ImportError; cannot import name 'is_valid_list_of_images' from 'transformers.models.llama.image_processing_llama', try to use the specified version of transformers in pyproject.toml. Currently, just running
pip install modelscope unsloth unsloth_zoo bitsandbytes transformers==4.48.3 -i https://mirrors.aliyun.com/pypi/simple/

下载模型

modelscope download --model 'unsloth/DeepSeek-R1-Distill-Qwen-1.5B' --local_dir 'unsloth/DeepSeek-R1-Distill-Qwen-1.5B'

部署模型

python -m sglang.launch_server --model-path ~/ollama/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf --quantization gguf --cpu-offload-gb 4 --dtype float16 --context-length 16380 --api-key sg-5bgrMOCJ5OSBKQV5XbHz --trust-remote-code --host 0.0.0.0 --port 14144

本地资源有限,暂无效果图

Logo

更多推荐