LM Studio的GPU卸载对Token生成速度的影响-以7840hs的780M核显为例

背景

1年前已经验证了核显跑32b的可行性:https://lizhiyong.blog.csdn.net/article/details/147237895

据说最近新出了更聪明的小模型,正好也验证下如何提升下老爷机的Token生成速度。

其中有个关键的参数就是GPU卸载。

默认情况

LM Studio中默认的GPU卸载=4:

在这里插入图片描述

chat

虎鲸是鱼嘛?

在这里插入图片描述

此时GPU核显的负载并不高,但是CPU很高,显然负载大部分给到了CPU,导致Token生成速度并不是非常理想:

Think

Here's a thinking process:

Analyze User Input:
Question: "虎鲸是鱼嘛?" (Are orcas/fish?)
Language: Chinese
Core question: Classification of 虎鲸 (orca/killer whale) - is it a fish?
Identify Key Facts:
虎鲸 (Orca/Killer Whale) is a marine mammal.
It belongs to the order Cetacea, family Delphinidae.
Characteristics: breathes air with lungs, gives birth to live young, nurses them with milk, maintains constant body temperature, has hair (at least at birth).
Fish are cold-blooded, breathe through gills, lay eggs (mostly), have scales.
Therefore, 虎鲸 is NOT a fish.
Formulate Response (Internal Refinement - Chinese):
Direct answer: 不是。
Explanation: 虎鲸是哺乳动物,属于鲸目海豚科。它用肺呼吸、胎生、哺乳、恒温,不具备鱼类特征(如鳃呼吸、卵生、变温等)。虽然生活在海洋里且外形似鱼,但分类学上明确属于哺乳类。
Keep it concise as per system prompt.
Check against Constraints:
Plain, complete, no moralizing/disclaimers/hedging.
Follow exact format if requested (none here).
Concise by default.
Language matches user (Chinese).
Draft: 不是。虎鲸是哺乳动物,属于鲸目海豚科。它用肺呼吸、胎生、哺乳、恒温,不具备鱼类的鳃呼吸和卵生等特征。虽然生活在海洋且外形似鱼,但分类学上明确属于哺乳类。

Final Output Generation: (matches draft)Direct, accurate, concise, matches all constraints.No extra fluff.Ready.
Output matches the refined draft.
Proceeds.
[Done]
Self-Correction/Verification during thought:
Is it a fish? No.
What is it? Marine mammal (cetacean).
Key differentiators: lungs, live birth, milk, warm-blooded vs gills, eggs, cold-blooded.
All correct. Output matches.Output generation.
“不是。虎鲸是哺乳动物,属于鲸目海豚科。它用肺呼吸、胎生、哺乳、恒温,不具备鱼类用鳃呼吸和卵生等特征。虽生活在海洋且外形似鱼,但分类学上明确属于哺乳类。” (Slightly polished for flow) -> matches exactly.Done.
Output matches response.
[Proceeds]

在这里插入图片描述

显然思考过程是3min58s。

Token生成速度

在这里插入图片描述

不是。虎鲸是哺乳动物,属于鲸目海豚科。它用肺呼吸、胎生、哺乳、恒温,不具备鱼类用鳃呼吸和卵生等特征。虽生活在海洋且外形似鱼,但分类学上明确属于哺乳类。

Token生成速度是2.64t/s。

提高GPU卸载数

由于96GB内存,有48GB给核显,跑个Q4的30b级别小模型还是够的:

在这里插入图片描述

直接把GPU卸载拉满!!!改成64!!!改完要重启才会生效。

chat

在这里插入图片描述

可以看到此时CPU负载明显下降,GPU负载终于是拉起来了!!!而且内存占用也升高了!!!

Think

Here's a thinking process:

Analyze User Input:
Question: "虎鲸是鱼嘛?" (Is an orca a fish?)
Language: Chinese
Core concept: Biological classification of orcas (killer whales)
Identify Key Facts:
Orca = 虎鲸 (hǔ jīng) in Chinese
Scientific name: Orcinus orca
Taxonomy: Animalia > Chordata > Mammalia > Cetacea > Odontoceti > Delphinidae
Key characteristic: It's a marine mammal, not a fish.
Distinguishing features from fish: Breaths air with lungs (not gills), gives birth to live young (viviparous), nurses them with milk, has hair (though minimal), is warm-blooded, etc.
Formulate Response (Internal Refinement - Chinese):
Direct answer: 不是。
Explanation: 虎鲸是哺乳动物,属于鲸目齿鲸亚目海豚科。
Key differences from fish: 用肺呼吸、胎生、哺乳、恒温等。
Keep it concise as requested.
Draft Response (Chinese):
不是。虎鲸是哺乳动物,属于鲸类(海豚科)。它们用肺呼吸、胎生、哺乳、恒温,与鱼类在分类和生理结构上完全不同。

Check against constraints:

Plain, complete, no moralizing/disclaimers
Follow exact format if requested (none here)
Concise by default
Matches language (Chinese)
Accurate scientifically
All good. Ready to output.

在这里插入图片描述

同一个问题,Think耗时缩短到1min49s。

Token生成速度

在这里插入图片描述

明显打到了3.5t/s。

对比情况

相同的模型,同一个问题,结果差异不大,性能提升还是比较明显的:

Mode 默认情况的Time 提高GPU卸载后的Time 提升比例
Think 3min58s 1min49s 54.2%
Answer 2.64t/s 3.5t/s 32.58%

提升还是蛮可观的。

转载请注明出处:https://lizhiyong.blog.csdn.net/article/details/161386515

在这里插入图片描述

Logo

免费领 100 小时云算力,进群参与显卡、AI PC 幸运抽奖

更多推荐