有来有去9527: Personal Homepage

@bmfire

有来有去9527

Joined DevPress on 2023-08-14 17:33:37

[Ascend Inference Optimization] Deployment Guide for mooncake on Ascend NPUs

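This article resolves an environment mismatch in the official vllm-ascend image and documents the complete process of configuring mooncake with vllm-ascend. It first sets up the mooncake environment (installing dependencies, compiling, and starting the service), then resolves the version conflict between vllm and torch by downgrading both to compatible versions, and finally validates the result with the lmcache benchmark. The test results show: 1) without mooncake, the inflection point where TTFT starts to increase matches the NPU memory capacity; 2) with moon…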

#language-model

[Model Quantization] Evaluating Quantization Quality for Large Models: Qwen2.5-72B

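This article evaluates the effect of quantization on the Qwen2.5-72B-Instruct model, focusing on accuracy loss and inference performance. Two quantization schemes, w8a8 and w4a16, were applied with the msit/msmodelslim tool and tested with the evalscope tool. The results show that the maximum accuracy loss is only 0.012 for w8a8 and 0.0261 for w4a16. In terms of performance, w8a8 improves throughput by 1.46x in an 8-card deployment, while w4a16 offers limited gains and suits scenarios with low concurrency requirements. The test data show that w…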

#artificial-intelligence #language-model

PyTorch-to-ONNX export error: Failed to export an ONNX attribute 'onnx::Gather', since it's not constant

Exporting to ONNX from Python fails with: Failed to export an ONNX attribute 'onnx::Gather', since it's not constant, please try to make things (e.g., kernel size) static if possible

#pytorch #python #deep-learning

Triton Model Registration and Access Verification (Common Pitfalls in Bold)

A summary of common pitfalls in model registration and inference requests after the Triton service has started

#python #programming-language

Successfully Compiling TensorRT-LLM

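For the run steps, refer to the readme in the /root/autodl-tmp/files/TensorRT-LLM/examples/gpt directory. Because the system's cudnn was installed from a deb package, download the deb package from Nvidia and install it; this overwrites the old version directly. I therefore decided to request resources on a public cloud and compile there by configuring the TRT-LLM build dependencies. Start the downloaded docker image and check the versions of the main dependencies needed to compile TRT-LLM. The model is saved in /roo…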

#artificial-intelligence #deep-learning

Setting Up a vscode_cuda Debugging Environment

Setting up a CUDA debugging environment in vscode

#vscode #ide #editor
