vLLM cpu版可以支持哪些流行的大模型
本文记录了在CPU环境下编译安装vLLM并调试文心ERNIE-4.5系列模型的过程。测试发现ERNIE-4.5-0.3B小模型可以正常运行,但28B大模型调试失败,出现多种错误:包括需添加trust_remote_code参数、内存不足、AVX指令集缺失等问题。最终通过设置dtype=float和max_model_len参数降低内存需求后,仍因虚拟机内存溢出而终止测试。结论表明CPU仅适合运行小
vLLM cpu版的编译安装参考:https://skywalk.blog.csdn.net/article/details/154336915
经过测试文心ernie4.5-0.3b模型可以使用
28b模型没调通
同步启动
export VLLM_USE_ASYNC_ENGINE=False
dtype设为float ,max_model_len 设为默认的一半
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking --trust_remote_code --max_model_len 37440 --dtype float
但是竟然SCNet的虚拟机被整崩了,道心破碎了,不去弄它了。
deepseek 14b模型正在调试
等以后心情恢复了再测试吧。
结论
cpu跑小模型还是可以的,比如ernie4.5-0.3b
但是,老的cpu无法跑,新的cpu,说实话,一般都配置gpu、dcu等加速卡,也轮不到cpu跑啊!
调试
28b模型报错
(APIServer pid=9014) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=9014) s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=9014) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=9014) Value error, The repository baidu/ERNIE-4.5-VL-28B-A3B-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/baidu/ERNIE-4.5-VL-28B-A3B-Thinking .
(APIServer pid=9014) You can inspect the repository content at https://hf.co/baidu/ERNIE-4.5-VL-28B-A3B-Thinking.
(APIServer pid=9014) Please pass the argument `trust_remote_code=True` to allow custom code to be run. [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]
安装提示,加入trust_remote_code=True
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking --trust_remote_code
启动模型报错 Run `pip install decord`
(APIServer pid=9729) ImportError: This modeling file requires the following packages that were not found in your environment: decord. Run `pip install decord`
按照提示安装
启动模型报错Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
(EngineCore_DP0 pid=10197) raise ValueError(
(EngineCore_DP0 pid=10197) ValueError: To serve at least one request with the models's max seq len (131072), (7.00 GiB KV cache is needed, which is larger than the available KV cache memory (4.00 GiB). Based on the available memory, the estimated maximum model length is 74880. Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
按照提示,将max_model_len
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking --trust_remote_code --max_model_len 37440
报错Please set dtype to torch.float or set weights_prepack to False.
(EngineCore_DP0 pid=10587) ERROR 11-12 11:19:28 [core.py:855] AssertionError: BF16 weight prepack needs the cpu support avx_ne_convert or avx512bw, avx512vl and avx512dq, but the desired instruction sets are not available. Please set dtype to torch.float or set weights_prepack to False.
禁用权重预打包试试
export VLLM_WEIGHTS_PREPACK=False
不行
试试dtype设为float
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking --trust_remote_code --max_model_len 37440 --dtype float
这个可以,但是出现新的问题
报错(APIServer pid=11605) with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=11605) Traceback (most recent call last):
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/bin/vllm", line 33, in <module>
(APIServer pid=11605) sys.exit(load_entry_point('vllm==0.11.1rc7.dev21+gd381eb967.cpu', 'console_scripts', 'vllm')())
(APIServer pid=11605) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=11605) args.dispatch_function(args)
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 59, in cmd
(APIServer pid=11605) uvloop.run(run_server(args))
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=11605) return __asyncio.run(
(APIServer pid=11605) ^^^^^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=11605) return runner.run(main)
(APIServer pid=11605) ^^^^^^^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=11605) return self._loop.run_until_complete(task)
(APIServer pid=11605) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=11605) return await main
(APIServer pid=11605) ^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1944, in run_server
(APIServer pid=11605) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1963, in run_server_worker
(APIServer pid=11605) async with build_async_engine_client(
(APIServer pid=11605) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=11605) return await anext(self.gen)
(APIServer pid=11605) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 192, in build_async_engine_client
(APIServer pid=11605) async with build_async_engine_client_from_engine_args(
(APIServer pid=11605) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=11605) return await anext(self.gen)
(APIServer pid=11605) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 233, in build_async_engine_client_from_engine_args
(APIServer pid=11605) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=11605) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/utils/func_utils.py", line 116, in inner
(APIServer pid=11605) return fn(*args, **kwargs)
(APIServer pid=11605) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 202, in from_vllm_config
(APIServer pid=11605) return cls(
(APIServer pid=11605) ^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 132, in __init__
(APIServer pid=11605) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=11605) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
(APIServer pid=11605) return AsyncMPClient(*client_args)
(APIServer pid=11605) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 808, in __init__
(APIServer pid=11605) super().__init__(
(APIServer pid=11605) File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 469, in __init__
(APIServer pid=11605) with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=11605) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1160
使用同步启动试试
export VLLM_USE_ASYNC_ENGINE=False
结果:报错容器内存溢出退出
这是SCNet的服务器内存溢出,一个调试环境直接退出了,这没法搞了,只能到此为止了。
更多推荐




所有评论(0)