LM Studio使用MTP的qwen3.6-27B-以7840hs的780M为例

虎鲸不是鱼

1567人浏览 · 2026-05-25 12:20:16

虎鲸不是鱼 · 2026-05-25 12:20:16 发布

LM Studio使用MTP的qwen3.6-27B-以7840hs的780M为例

前言

体验了GPU卸载数拉高的好处：https://lizhiyong.blog.csdn.net/article/details/161386515

还有个更加惊喜的功能，那就是最近llama.cpp的MTP新特性。

更新版本

老版本肯定是不行的，需要更新LM Studio到新版本，7840hs的780M是基于Vulkan运行的，还需要把llama.cpp的依赖版本更新：

在这里插入图片描述

实测0.4.14(Build 4)的LM Studio和2.16.0的llama.cpp可运行MTP模型。

启用MTP配置

在这里插入图片描述

设置中的开发者模式必开！！！

在这里插入图片描述

模型要下载一个有MTP标签的，不然没有MTP开关配置！！！选使用人数更多的unsloth的版本就行。

在这里插入图片描述

带MTP的模型就可以启用开关，当然GPU卸载还是要拉满！！！96GB大内存跑30B级别的小模型不存在OOM。重启后配置会生效。

Chat

虎鲸是鱼嘛？

在这里插入图片描述

灰常好，GPU负载很高，CPU负载很低，且内存占用>30GB，显然核显开始发力了！！！

Think

在这里插入图片描述

思考过程只用了1min32s。

Here's a thinking process:

Analyze User Input:
Question: 虎鲸是鱼嘛？ (Is an orca/killer whale a fish?)
Key Entity: 虎鲸 (Orca/Killer Whale)
Core Question: Is it a fish? (Classification question)
Identify Key Biological Facts:
Orca (Orcinus orca) is a marine mammal.
It belongs to the family Delphinidae (oceanic dolphins).
Mammals vs. Fish:
Mammals breathe air with lungs, give birth to live young, nurse them with milk, are warm-blooded.
Fish breathe water with gills, typically lay eggs, are cold-blooded (mostly), have scales/fins.
Therefore, orcas are definitely NOT fish.
Formulate Response (Mental Refinement in Chinese):
Direct answer: 不是，虎鲸不是鱼。
Explain classification: 虎鲸是哺乳动物，属于鲸目海豚科。
Key differences from fish:
用肺呼吸（需要浮出水面换气）
胎生、哺乳
恒温动物
没有鳃和鳞片
Add interesting/contextual info (optional but helpful): 虽然生活在海里，外形像鱼，但生物学分类上属于哺乳纲。常被误认为是鱼是因为“鲸”字带“鱼”旁（汉字构字特点），但实际与鱼类完全不同。
Keep it concise and accurate.
Draft Response (Chinese): 不是，虎鲸不是鱼。
虎鲸是哺乳动物，属于鲸目海豚科。虽然它生活在海洋中、外形流线型像鱼，但和鱼类有本质区别：
🐋 用肺呼吸（必须定期浮出水面换气）
👶 胎生并用乳汁哺育幼崽
🌡️ 恒温动物（体温稳定）
❌ 没有鳃、鳞片或鱼鳍结构
中文里“鲸”字带“鱼”旁是汉字造字习惯，并不代表它是鱼类。生物学上，虎鲸和海豚、蓝鲸一样，都属于海洋哺乳动物。
Self-Correction/Verification:
Accuracy: Correct. Orcas are cetaceans (mammals), not fish.
Clarity: Clear distinction between mammals and fish.
Language: Natural Chinese, appropriate tone.
Structure: Direct answer first, then bullet points for key differences, then etymology note.
No overcomplication, stays focused on the question.
All facts verified against standard biological knowledge.
Ready. Output matches the draft.✅

Token生成速度

在这里插入图片描述

生成速度达到了6.71t/s。

对比情况

Mode	无MTP	启用MTP	提升比例
Think	1min49s	1min32s	15.6%
Answer	3.5t/s	6.71t/s	91.71%

虽然Think阶段提升不是很大，但Answer阶段接近Double，在GPU卸载最大的情况下，7840hs的780M核显已经枯木逢春！！！本地私有化部署及内网开发从勉强能跑，到基本能用的伟大跃迁。

可以暂时不用急着换AI Max 395和128GB大内存了。

转载请注明出处：https://lizhiyong.blog.csdn.net/article/details/161387982

在这里插入图片描述

加入AMD AI开发者计划！

免费领 200 小时云算力，进群参与显卡、AI PC 幸运抽奖

更多推荐

VLLMService Operator 开发第六篇：给模型服务增加 Service 自动创建能力

做到这里，VLLMService Operator 已经不只是创建 Deployment 了，它开始具备完整服务编排的雏形。-> Pod-> Pod-> Service这一步非常关键，因为 Service 是后续接入流量入口的基础。没有 Service，HTTPRoute 就没有稳定的后端目标；-> Pod-> Service-> Gateway也就是说，Service 是从“模型 Pod 能跑起

AMD开发者中国社区

VLLMService Operator 开发第五篇：部署 Operator 并验证模型服务

存储准备完成后，就可以创建 VLLMService 资源了。metadata:spec:labels:port: 8000resources:requests:cpu: "2"limits:cpu: "4"storage:这里需要明确一点：VLLMService 本身只是用户声明的期望状态，它并不会直接运行模型。真正运行模型的是 Operator 根据这个 VLLMService 自动创建出来的

AMD开发者中国社区

Higress v2.2.3 发布：AI Gateway 能力增强，Gateway API 及其推理扩展持续打磨

Higress v2.2.3 版本发布，带来多项核心更新：新增 AI Gateway 的 vLLM 协议透传和上下文限制插件，增强 AI 安全防护与负载均衡；优化 Gateway API 隔离与推理路由配置；改进 Ingress 迁移体验；加固安全认证组件与 TLS 校验；修复多项 Console 问题。社区层面，Higress 正式入驻 CNCF Sandbox。