(Repost) HFT programmers' hardware optimization skills are top-notch

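STEM board: covers mathematics, physics, chemistry, science, engineering, and mechanical engineering. Excludes biology, medicine-related topics, and computer-related content.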

Moderators: verdelite, TheMatrix

huangchong (净坛使者) (OP)
Forum Veteran
Outstanding Moderator 2023-24
Post interactions: 4114
Posts: 61059
Joined: Jul 22, 2022 01:22

#1 (Repost) HFT programmers' hardware optimization skills are top-notch

Post by huangchong (净坛使者) (OP) »

Reposted from Caravel's post on the 军事天地 (Military) board: HFT programmers' hardware optimization skills are top-notch

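MITBBS used to have a "Teacher Wei" who also knew hardware inside out. He bet goodegg that the railway ticketing system could be run on just a few servers, and he won.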

For example, when training V3 on NVIDIA H800 GPUs, DeepSeek repurposed some of the GPU's core compute units, the SMs (Streaming Multiprocessors), to suit its needs: of the 132 SMs, it dedicated 20 exclusively to server-to-server communication instead of computation.
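To make "dedicating SMs to communication" concrete, here is a minimal CUDA sketch of block specialization. It assumes one thread block per SM, which is only a common approximation: CUDA does not let a kernel choose its SM or guarantee the block-to-SM mapping. The names `is_comm_block` and `specialized_kernel` and the 20-block split are hypothetical illustrations, not DeepSeek's code, and the "communication" branch merely copies data into a staging buffer where a real kernel would send it to other GPUs.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                    cudaGetErrorString(err_), __FILE__, __LINE__); \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical role split: blocks [0, num_comm) act as "communication" workers.
__host__ __device__ inline bool is_comm_block(int block_id, int num_comm) {
    return block_id < num_comm;
}

__global__ void specialized_kernel(const float* in, float* staging, float* out,
                                   int n, int num_comm) {
    const int b = blockIdx.x;
    if (is_comm_block(b, num_comm)) {
        // Stand-in for communication: stage the input. A real kernel would push
        // this data to peer GPUs (e.g. over NVLink/InfiniBand) instead.
        for (int i = b * blockDim.x + threadIdx.x; i < n; i += num_comm * blockDim.x)
            staging[i] = in[i];
    } else {
        // Compute blocks: grid-stride loop over their own sub-grid.
        const int cb = b - num_comm;
        const int num_compute = gridDim.x - num_comm;
        for (int i = cb * blockDim.x + threadIdx.x; i < n; i += num_compute * blockDim.x)
            out[i] = 2.0f * in[i];
    }
}

int main() {
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, 0));
    const int num_sms = prop.multiProcessorCount;    // 132 on an H800, per the post
    const int num_comm = std::min(20, num_sms - 1);  // keep at least one compute block
    const int n = 1 << 20, threads = 256;

    std::vector<float> h_in(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 1000);

    float *d_in = nullptr, *d_staging = nullptr, *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&d_staging, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    // Launch one block per SM (an approximation; the scheduler decides placement).
    specialized_kernel<<<num_sms, threads>>>(d_in, d_staging, d_out, n, num_comm);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    std::vector<float> h_out(n), h_staging(n);
    CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaMemcpy(h_staging.data(), d_staging, n * sizeof(float), cudaMemcpyDeviceToHost));
    for (int i = 0; i < n; ++i) {
        assert(h_out[i] == 2.0f * h_in[i]);
        if (num_comm > 0) assert(h_staging[i] == h_in[i]);
    }
    printf("%d SMs: %d communication blocks, %d compute blocks\n",
           num_sms, num_comm, num_sms - num_comm);

    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_staging));
    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

Build with `nvcc -o split split.cu`. The post describes a fixed budget of 20 SMs for communication; this sketch only shows the general pattern of branching on `blockIdx.x` so that different blocks take on different roles within one launch.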

This was done at the level of PTX (Parallel Thread Execution), NVIDIA's low-level virtual instruction set for its GPUs. PTX sits close to assembly language (the ptxas assembler compiles it into the GPU's native SASS machine code), which allows fine-grained optimizations such as controlling register usage and making thread- and warp-level adjustments. Such detailed control, however, is highly complex and hard to maintain. That is why higher-level languages like CUDA are normally used: for most parallel programming tasks they deliver sufficient performance without requiring lower-level modifications.
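As a small illustration of what dropping down to PTX looks like from CUDA C++, the sketch below uses inline PTX (`asm volatile`) to read the `%smid` special register, which holds the ID of the SM executing the current thread, and then counts how many distinct SMs a kernel's blocks ran on. The helper names `read_smid`, `record_smid`, and `count_distinct` are hypothetical. NVIDIA's PTX documentation notes that `%smid` values are not guaranteed to be contiguous and may change during execution (for example after preemption), so this is suited to diagnostics rather than to scheduling decisions.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <set>
#include <vector>
#include <cuda_runtime.h>

// Abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                    cudaGetErrorString(err_), __FILE__, __LINE__); \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Inline PTX: move the special register %smid into a C++ variable.
// "%%" escapes the literal percent sign inside an asm() string.
__device__ __forceinline__ unsigned int read_smid() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// One thread per block records which SM the block ran on.
__global__ void record_smid(unsigned int* sm_of_block) {
    if (threadIdx.x == 0) sm_of_block[blockIdx.x] = read_smid();
}

// Host helper: number of distinct SM ids observed.
std::size_t count_distinct(const std::vector<unsigned int>& ids) {
    return std::set<unsigned int>(ids.begin(), ids.end()).size();
}

int main() {
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, 0));
    const int blocks = 4 * prop.multiProcessorCount;

    unsigned int* d_sm = nullptr;
    CUDA_CHECK(cudaMalloc(&d_sm, blocks * sizeof(unsigned int)));
    record_smid<<<blocks, 128>>>(d_sm);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    std::vector<unsigned int> h_sm(blocks);
    CUDA_CHECK(cudaMemcpy(h_sm.data(), d_sm, blocks * sizeof(unsigned int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_sm));

    printf("%d blocks ran on %zu distinct SMs (device reports %d SMs)\n",
           blocks, count_distinct(h_sm), prop.multiProcessorCount);
    return 0;
}
```

Compile with `nvcc`. The printed count can come out lower than the device's SM count if the scheduler happened to leave some SMs without a block.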
(ヅ)
Forum Pillar
Post interactions: 549
Posts: 11819
Joined: Aug 21, 2022 14:20

#2 Re: (Repost) HFT programmers' hardware optimization skills are top-notch

Post by (ヅ) »

These are the guys who lay their own fiber and build microwave relay towers.
赖美豪中 (my pronouns: ha/ha)
Forum Veteran
Outstanding Moderator 2023
Post interactions: 4485
Posts: 46395
Joined: Sep 6, 2022 12:50

#3 Re: (Repost) HFT programmers' hardware optimization skills are top-notch

Post by 赖美豪中 (my pronouns: ha/ha) »

Is this a joke? This is basic stuff for any CUDA programmer.
huangchong wrote: Jan 26, 2025 13:35 Reposted from Caravel's post on the 军事天地 (Military) board: HFT programmers' hardware optimization skills are top-notch
If printing money would end poverty, printing diplomas would end stupidity.