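According to Beelink's official test data, the SER9 Pro series (the HX370 and AI 365 models) supports local deployment of DeepSeek-R1 models (the distilled variants listed below), with inference accelerated by the AMD Radeon integrated GPU.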
Detailed performance data
Model | RAM | VRAM | GPU |
---|---|---|---|
SER9 Pro 370 | 64G | 48G | Radeon 890M |
LLM | VRAM usage | 你好 ("Hello") | 你是谁 ("Who are you?") | 写一个贪吃蛇的代码 ("Write code for a Snake game") |
---|---|---|---|---|
DeepSeek-R1-Distill-Qwen-1.5B-Q2 | 1.1G | 67.65 tok/sec, 37 tokens, 0.02s to first token | 60.61 tok/sec, 58 tokens, 0.03s to first token | 57.40 tok/sec, 1358 tokens, 0.17s to first token |
DeepSeek-R1-Distill-Qwen-1.5B-Q4 | 1.2G | 69.80 tok/sec, 17 tokens, 0.03s to first token | 66.70 tok/sec, 40 tokens, 0.16s to first token | 56.30 tok/sec, 1816 tokens, 0.11s to first token |
DeepSeek-R1-Distill-Qwen-1.5B-Q8 | 1.9G | 50.60 tok/sec, 31 tokens, 0.21s to first token | 50.21 tok/sec, 40 tokens, 0.17s to first token | 44.15 tok/sec, 1609 tokens, 0.11s to first token |
DeepSeek-R1-Distill-Qwen-7B-Q2 | 3.2G | 24.02 tok/sec, 16 tokens, 0.10s to first token | 22.08 tok/sec, 171 tokens, 0.06s to first token | 20.27 tok/sec, 1829 tokens, 0.63s to first token |
DeepSeek-R1-Distill-Qwen-7B-Q4 | 4.6G | 19.52 tok/sec, 32 tokens, 0.08s to first token | 18.68 tok/sec, 128 tokens, 0.06s to first token | 16.72 tok/sec, 1614 tokens, 0.38s to first token |
DeepSeek-R1-Distill-Qwen-7B-Q8 | 7.5G | 12.28 tok/sec, 37 tokens, 0.10s to first token | 12.26 tok/sec, 222 tokens, 0.09s to first token | 11.52 tok/sec, 1684 tokens, 0.37s to first token |
DeepSeek-R1-Distill-Llama-8B-Q2 | 3.6G | 22.42 tok/sec, 25 tokens, 0.38s to first token | 20.77 tok/sec, 294 tokens, 0.10s to first token | 19.40 tok/sec, 1060 tokens, 0.38s to first token |
DeepSeek-R1-Distill-Llama-8B-Q4 | 5.1G | 19.36 tok/sec, 23 tokens, 0.38s to first token | 18.29 tok/sec, 203 tokens, 0.08s to first token | 16.58 tok/sec, 1145 tokens, 0.37s to first token |
DeepSeek-R1-Distill-Llama-8B-Q8 | 8.3G | 11.66 tok/sec, 40 tokens, 0.33s to first token | 11.32 tok/sec, 331 tokens, 0.10s to first token | 9.43 tok/sec, 3123 tokens, 0.67s to first token |
DeepSeek-R1-Distill-Qwen-14B-Q2 | 6.5G | 11.49 tok/sec, 31 tokens, 0.18s to first token | 10.82 tok/sec, 197 tokens, 0.13s to first token | 9.79 tok/sec, 1534 tokens, 1.53s to first token |
DeepSeek-R1-Distill-Qwen-14B-Q4 | 9G | 10.66 tok/sec, 31 tokens, 0.16s to first token | 10.03 tok/sec, 239 tokens, 0.11s to first token | 9.27 tok/sec, 1351 tokens, 1.37s to first token |
DeepSeek-R1-Distill-Qwen-14B-Q8 | 14.2G | 6.71 tok/sec, 17 tokens, 0.19s to first token | 6.30 tok/sec, 224 tokens, 0.17s to first token | 5.94 tok/sec, 1206 tokens, 0.67s to first token |
Model | RAM | VRAM | GPU |
---|---|---|---|
SER9 Pro 365 | 32G | 24G | Radeon 880M |
LLM | VRAM usage | 你好 ("Hello") | 你是谁 ("Who are you?") | 写一个贪吃蛇的代码 ("Write code for a Snake game") |
---|---|---|---|---|
DeepSeek-R1-Distill-Qwen-1.5B-Q2 | 1.3G | 60.97 tok/sec, 41 tokens, 0.32s to first token | 64.73 tok/sec, 181 tokens, 0.32s to first token | 49.53 tok/sec, 8834 tokens, 0.17s to first token |
DeepSeek-R1-Distill-Qwen-1.5B-Q4 | 1.3G | 67.41 tok/sec, 31 tokens, 0.17s to first token | 63.32 tok/sec, 192 tokens, 0.22s to first token | 53.61 tok/sec, 2919 tokens, 0.28s to first token |
DeepSeek-R1-Distill-Qwen-1.5B-Q8 | 2G | 52.14 tok/sec, 17 tokens, 0.24s to first token | 50.12 tok/sec, 40 tokens, 0.20s to first token | 44.54 tok/sec, 602 tokens, 0.26s to first token |
DeepSeek-R1-Distill-Qwen-7B-Q2 | 3.2G | 23.65 tok/sec, 17 tokens, 0.10s to first token | 22.13 tok/sec, 176 tokens, 0.09s to first token | 20.79 tok/sec, 921 tokens, 0.39s to first token |
DeepSeek-R1-Distill-Qwen-7B-Q4 | 4.8G | 19.90 tok/sec, 31 tokens, 0.10s to first token | 18.84 tok/sec, 222 tokens, 0.07s to first token | 17.24 tok/sec, 1758 tokens, 0.78s to first token |
DeepSeek-R1-Distill-Qwen-7B-Q8 | 7.8G | 12.94 tok/sec, 32 tokens, 0.12s to first token | 12.47 tok/sec, 128 tokens, 0.11s to first token | 11.55 tok/sec, 1797 tokens, 0.39s to first token |
DeepSeek-R1-Distill-Llama-8B-Q2 | 3.6G | 22.86 tok/sec, 34 tokens, 0.42s to first token | 21.64 tok/sec, 180 tokens, 0.10s to first token | 19.67 tok/sec, 828 tokens, 0.43s to first token |
DeepSeek-R1-Distill-Llama-8B-Q4 | 5.3G | 18.97 tok/sec, 40 tokens, 0.49s to first token | 18.06 tok/sec, 522 tokens, 0.10s to first token | 15.87 tok/sec, 1874 tokens, 0.81s to first token |
DeepSeek-R1-Distill-Llama-8B-Q8 | 8.6G | 11.92 tok/sec, 40 tokens, 0.65s to first token | 11.56 tok/sec, 179 tokens, 0.13s to first token | 10.65 tok/sec, 1648 tokens, 0.45s to first token |
DeepSeek-R1-Distill-Qwen-14B-Q2 | 6.4G | 13.21 tok/sec, 17 tokens, 0.18s to first token | 12.24 tok/sec, 171 tokens, 0.17s to first token | 10.73 tok/sec, 1323 tokens, 0.93s to first token |
DeepSeek-R1-Distill-Qwen-14B-Q4 | 9.2G | 10.81 tok/sec, 31 tokens, 0.19s to first token | 10.25 tok/sec, 249 tokens, 0.13s to first token | 9.07 tok/sec, 1541 tokens, 1.58s to first token |
DeepSeek-R1-Distill-Qwen-14B-Q8 | 15G | 6.80 tok/sec, 17 tokens, 0.23s to first token | 6.44 tok/sec, 199 tokens, 0.21s to first token | 6.03 tok/sec, 1231 tokens, 1.44s to first token |
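Notes:
- VRAM usage: grows with parameter count and with quantization bit width, from 1.1G (1.5B-Q2) to 14.2G (14B-Q8) on the SER9 Pro 370.
- Generation speed (tok/sec): the 1.5B models are fastest on both machines. On the SER9 Pro 370, DeepSeek-R1-Distill-Qwen-1.5B-Q4 leads on the 你好 ("Hello") and 你是谁 ("Who are you?") prompts (69.80 and 66.70 tok/sec), and DeepSeek-R1-Distill-Qwen-1.5B-Q2 leads on the Snake prompt (57.40 tok/sec). On the SER9 Pro 365, 1.5B-Q4 leads on 你好 (67.41 tok/sec) and on the Snake prompt (53.61 tok/sec), while 1.5B-Q2 leads on 你是谁 (64.73 tok/sec).
- Time to first token (seconds): the lowest value is 0.02s (1.5B-Q2 on 你好, SER9 Pro 370); on the SER9 Pro 365 the lowest is 0.07s (7B-Q4 on 你是谁). Latency stays at or below 0.65s on the two short prompts and rises to as much as 1.58s for the 14B models on the Snake prompt.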
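Each result cell packs three numbers into one string. As a rough illustration of how they relate, the sketch below parses a cell and estimates the wall-clock time of a full answer as time to first token plus token count divided by generation speed. The function names are hypothetical, and the formula is an assumption about how the benchmark tool counts tokens, not something the test data states.

```python
import re

# Matches one result cell, e.g. "5.94 tok/sec, 1206 tokens, 0.67s to first token".
CELL_RE = re.compile(
    r"(?P<rate>[\d.]+) tok/sec, (?P<tokens>\d+) tokens, (?P<ttft>[\d.]+)s to first token"
)


def parse_cell(cell):
    """Split a result cell into speed, token count, and first-token latency."""
    match = CELL_RE.search(cell)
    if match is None:
        raise ValueError(f"unrecognized cell format: {cell!r}")
    return {
        "tok_per_sec": float(match["rate"]),
        "tokens": int(match["tokens"]),
        "ttft_s": float(match["ttft"]),
    }


def estimated_total_seconds(cell):
    """Assumed model: total time = first-token latency + tokens / speed."""
    result = parse_cell(cell)
    return result["ttft_s"] + result["tokens"] / result["tok_per_sec"]


if __name__ == "__main__":
    # 14B-Q8 on the Snake prompt, SER9 Pro 370 (figures from the table above).
    cell = "5.94 tok/sec, 1206 tokens, 0.67s to first token"
    print(f"{estimated_total_seconds(cell):.1f} s")
```

Under this assumption the 14B-Q8 Snake answer takes a little over three minutes, which shows that generation speed, not first-token latency, dominates the wait on long answers.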
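The following compares the performance of the Beelink SER9 Pro HX370 (AI370) and SER9 Pro 365 (AI365) when running DeepSeek-R1 models locally:
Hardware configuration comparison
Model | CPU | iGPU | RAM | VRAM |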
---|---|---|---|---|
SER9 Pro 370 | AMD Ryzen AI 9 HX 370 | Radeon 890M | 64GB | 48GB |
SER9 Pro 365 | AMD Ryzen AI 9 365 | Radeon 880M | 32GB | 24GB |
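Performance comparison summary
1. VRAM usage and model compatibility
- HX370 (48G VRAM): the largest model tested, 14B at Q8 (the highest-precision quantization in the tests), occupies 14.2G and leaves roughly 34G free; the spare VRAM may allow larger models or several models loaded at once.
- AI365 (24G VRAM): the same 14B-Q8 model occupies 15G and leaves about 9G free. Every tested model fits, but the smaller headroom may rule out larger models.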
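2. Inference speed (tok/sec)
- Small models (1.5B):
  - At Q2/Q4, the HX370 is slightly ahead on the 你好 prompt (Qwen-1.5B-Q4: 69.80 vs 67.41 tok/sec), but the gap is small;
  - At Q8, the two machines are nearly level: Qwen-1.5B-Q8 ranges from 50.60 down to 44.15 tok/sec on the HX370 and from 52.14 down to 44.54 tok/sec on the AI365 across the three prompts.
- Larger models (7B/8B/14B):
  - At the same quantization level the two machines are mostly within a few percent of each other, and the AI365 is often marginally ahead (e.g. 7B-Q4 on 你好: 19.52 vs 19.90 tok/sec), so these figures do not show a clear throughput advantage for the Radeon 890M.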
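3. Time to first token
- HX370: lower first-token latency for the 14B Q4 and Q8 models on the Snake prompt (14B-Q8: 0.67s vs 1.44s; 14B-Q4: 1.37s vs 1.58s), possibly thanks to more VRAM and higher bandwidth. The 14B-Q2 run is the exception (1.53s vs 0.93s).
- AI365: first-token time fluctuates more, and latency rises noticeably under heavier load (e.g. 14B-Q4: 1.58s).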
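4. Workload suitability
- Short prompts (你好, 你是谁): differences are small, and the AI365 is slightly ahead in some cases (e.g. 7B-Q4 on 你好: 19.90 vs 19.52 tok/sec).
- Long prompt (Snake code): throughput is again close; with 14B-Q2 the HX370 reaches 9.79 tok/sec against 10.73 tok/sec on the AI365. The token counts (1534 vs 1323) are the lengths of the generated answers, not a measure of processing capacity.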
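The comparisons above are easier to check when expressed as percentages. The sketch below does so for a few 你好 results and for the 14B-Q8 VRAM headroom, using figures copied from the tables; the helper names are hypothetical, and the sample is a small subset chosen for illustration rather than the full data set.

```python
# Throughput on the 你好 prompt in tok/sec, copied from the result tables.
HX370_TOK_PER_SEC = {"1.5B-Q4": 69.80, "7B-Q4": 19.52, "14B-Q2": 11.49, "14B-Q8": 6.71}
AI365_TOK_PER_SEC = {"1.5B-Q4": 67.41, "7B-Q4": 19.90, "14B-Q2": 13.21, "14B-Q8": 6.80}


def relative_gain(a, b):
    """Percent by which a exceeds b (negative when a is lower)."""
    return (a - b) / b * 100.0


def vram_headroom(total_gb, used_gb):
    """VRAM left after loading a model, in the same unit as the inputs."""
    return total_gb - used_gb


if __name__ == "__main__":
    for model, hx370_rate in HX370_TOK_PER_SEC.items():
        gain = relative_gain(hx370_rate, AI365_TOK_PER_SEC[model])
        print(f"{model}: HX370 vs AI365 = {gain:+.1f}%")
    # 14B-Q8 occupies 14.2G of 48G on the HX370 and 15G of 24G on the AI365.
    print(f"HX370 headroom: {vram_headroom(48, 14.2):.1f}G")
    print(f"AI365 headroom: {vram_headroom(24, 15):.1f}G")
```

A positive percentage means the HX370 is faster for that model. Differences of one or two percent are likely within run-to-run variation, which the source data does not report.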
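Conclusion
- HX370 (Ryzen AI 9 HX 370):
Suited to large models at high-precision quantization (e.g. 14B-Q8): VRAM is ample and first-token latency on the 14B Q4/Q8 models is lower. Generation throughput in these tests is on par with the AI365.
- AI365 (Ryzen AI 9 365):
Better value; suited to small and mid-sized models (1.5B-7B) and short prompts, though VRAM capacity may become a bottleneck with larger models or heavier workloads.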