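For building and installing the CPU version of vLLM, see: https://skywalk.blog.csdn.net/article/details/154336915

In my tests, the ERNIE 4.5 0.3B model (ernie4.5-0.3b) runs fine.

I could not get the 28B model to work.

Synchronous startup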

export VLLM_USE_ASYNC_ENGINE=False

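Set dtype to float and max_model_len to 37440, which is half of the maximum length (74880) that vLLM estimated from the available KV cache memory: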

vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking --trust_remote_code --max_model_len 37440 --dtype float

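But this actually crashed the SCNet virtual machine. That broke my spirit, so I am not going to keep fighting with it.

The DeepSeek 14B model is still being debugged.

I will test again once my mood recovers.

Conclusion

Running small models on CPU is feasible, for example ernie4.5-0.3b.

However, old CPUs cannot run them, and machines with new CPUs are, frankly, usually equipped with accelerators such as GPUs or DCUs, so the job never falls to the CPU anyway!

Debugging

Errors with the 28B model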

(APIServer pid=9014)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=9014)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=9014) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=9014)   Value error, The repository baidu/ERNIE-4.5-VL-28B-A3B-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/baidu/ERNIE-4.5-VL-28B-A3B-Thinking .
(APIServer pid=9014)  You can inspect the repository content at https://hf.co/baidu/ERNIE-4.5-VL-28B-A3B-Thinking.
(APIServer pid=9014) Please pass the argument `trust_remote_code=True` to allow custom code to be run. [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]

As the message suggests, add trust_remote_code=True, which on the command line is the --trust_remote_code flag:

vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking --trust_remote_code 

Starting the model fails with: Run `pip install decord`

(APIServer pid=9729) ImportError: This modeling file requires the following packages that were not found in your environment: decord. Run `pip install decord`

Install it as prompted (pip install decord).
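
To avoid finding missing packages one restart at a time, you can check up front whether they are importable. This is only a minimal sketch: `missing_packages` is a hypothetical helper name, and the list holds only `decord` because that is the one package named in the ImportError above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "decord" is the package named in the ImportError above.
missing = missing_packages(["decord"])
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All listed packages are importable.")
```

If the model's custom code needs further packages, add their names to the list.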

Starting the model fails with: Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.

(EngineCore_DP0 pid=10197)     raise ValueError(
(EngineCore_DP0 pid=10197) ValueError: To serve at least one request with the models's max seq len (131072), (7.00 GiB KV cache is needed, which is larger than the available KV cache memory (4.00 GiB). Based on the available memory, the estimated maximum model length is 74880. Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.

As suggested, reduce max_model_len. Here it is set to 37440, half of the estimated maximum of 74880:

vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking --trust_remote_code --max_model_len 37440
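
The numbers in the error message can be roughly reproduced by hand, which helps when picking a max_model_len. The sketch below assumes the KV cache grows linearly with sequence length and that the result is rounded down to a multiple of 128 tokens. Both are my assumptions, chosen because they match the logged figure, and not a confirmed description of how vLLM computes it. The function name is hypothetical.

```python
GIB = 1024 ** 3


def estimate_max_model_len(needed_gib, full_seq_len, available_gib, block=128):
    """Scale the full sequence length by available / needed KV cache memory.

    Assumes linear KV cache growth per token and rounding down to a
    multiple of `block` tokens (an assumption, not vLLM's documented logic).
    """
    bytes_per_token = needed_gib * GIB / full_seq_len
    tokens = int(available_gib * GIB / bytes_per_token)
    return tokens - tokens % block


# Values taken from the error message: 7.00 GiB needed for 131072 tokens,
# 4.00 GiB available.
est = estimate_max_model_len(needed_gib=7.0, full_seq_len=131072, available_gib=4.0)
print(est)       # 74880, the estimate shown in the log
print(est // 2)  # 37440, the value passed to --max_model_len
```

As far as I know, the CPU backend sizes the KV cache through the VLLM_CPU_KVCACHE_SPACE environment variable (in GiB) and not through gpu_memory_utilization. I did not try raising it in this run.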

Error: Please set dtype to torch.float or set weights_prepack to False.

(EngineCore_DP0 pid=10587) ERROR 11-12 11:19:28 [core.py:855] AssertionError: BF16 weight prepack needs the cpu support avx_ne_convert or avx512bw, avx512vl and avx512dq, but the desired instruction sets are not available. Please set dtype to torch.float or set weights_prepack to False.

Try disabling weight prepacking:

export VLLM_WEIGHTS_PREPACK=False

That did not work.

Try setting dtype to float:

vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking --trust_remote_code --max_model_len 37440 --dtype float
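
The BF16 prepack assertion names the instruction sets it needs, so you can check whether a CPU has them before starting the server. The sketch below reads the flags from /proc/cpuinfo on Linux and applies the condition as worded in the error message. The helper names are hypothetical, and I am assuming the kernel reports these features under the same names the message uses.

```python
import os


def cpu_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def bf16_prepack_supported(flags):
    """Condition from the assertion message: avx_ne_convert,
    or all of avx512bw, avx512vl and avx512dq."""
    avx512 = {"avx512bw", "avx512vl", "avx512dq"}
    return "avx_ne_convert" in flags or avx512 <= flags


if os.path.exists("/proc/cpuinfo"):  # Linux only
    with open("/proc/cpuinfo") as f:
        print("BF16 prepack supported:", bf16_prepack_supported(cpu_flags(f.read())))
```

If this prints False, falling back to --dtype float, as done here, is the workaround the error message itself recommends.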

With dtype set to float the previous error goes away, but a new problem appears.

The error is raised at (APIServer pid=11605)     with launch_core_engines(vllm_config, executor_class, log_stats) as (

(APIServer pid=11605) Traceback (most recent call last):
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/bin/vllm", line 33, in <module>
(APIServer pid=11605)     sys.exit(load_entry_point('vllm==0.11.1rc7.dev21+gd381eb967.cpu', 'console_scripts', 'vllm')())
(APIServer pid=11605)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=11605)     args.dispatch_function(args)
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 59, in cmd
(APIServer pid=11605)     uvloop.run(run_server(args))
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=11605)     return __asyncio.run(
(APIServer pid=11605)            ^^^^^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=11605)     return runner.run(main)
(APIServer pid=11605)            ^^^^^^^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=11605)     return self._loop.run_until_complete(task)
(APIServer pid=11605)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=11605)     return await main
(APIServer pid=11605)            ^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1944, in run_server
(APIServer pid=11605)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1963, in run_server_worker
(APIServer pid=11605)     async with build_async_engine_client(
(APIServer pid=11605)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=11605)     return await anext(self.gen)
(APIServer pid=11605)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 192, in build_async_engine_client
(APIServer pid=11605)     async with build_async_engine_client_from_engine_args(
(APIServer pid=11605)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=11605)     return await anext(self.gen)
(APIServer pid=11605)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 233, in build_async_engine_client_from_engine_args
(APIServer pid=11605)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=11605)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/utils/func_utils.py", line 116, in inner
(APIServer pid=11605)     return fn(*args, **kwargs)
(APIServer pid=11605)            ^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 202, in from_vllm_config
(APIServer pid=11605)     return cls(
(APIServer pid=11605)            ^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 132, in __init__
(APIServer pid=11605)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=11605)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
(APIServer pid=11605)     return AsyncMPClient(*client_args)
(APIServer pid=11605)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 808, in __init__
(APIServer pid=11605)     super().__init__(
(APIServer pid=11605)   File "/root/.pyenv/versions/3.12.12/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 469, in __init__
(APIServer pid=11605)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=11605)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1160

Try synchronous startup:

export VLLM_USE_ASYNC_ENGINE=False

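Result: the container ran out of memory and exited.

The SCNet server ran out of memory and the debugging environment was killed outright. There is nothing more I can do here, so this is where it ends.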
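
A back-of-the-envelope estimate suggests why running out of memory is not surprising. The sketch below counts only the memory for the weights and assumes that "28B" in the model name means roughly 28 billion parameters. The real footprint also includes the KV cache and activations. The memory limit of the SCNet container is not recorded in my notes, so this is only a plausibility check.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / GIB


params = 28e9  # assumption: "28B" means about 28 billion parameters
fp32 = weight_memory_gib(params, 4)  # --dtype float stores 4 bytes per weight
bf16 = weight_memory_gib(params, 2)  # bfloat16 stores 2 bytes per weight
print(f"float32 weights:  ~{fp32:.0f} GiB")
print(f"bfloat16 weights: ~{bf16:.0f} GiB")
```

Switching to --dtype float to get around the BF16 prepack check therefore roughly doubles the weight memory, to about 104 GiB under these assumptions.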
