(Repost) High-frequency trading programmers are extremely good at hardware optimization

huangchong

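This post is reposted from Caravel's post on the Military (军事天地) board: High-frequency trading programmers are extremely good at hardware optimization

Back on mitbbs (买买提) there was a Teacher Wei who was also very proficient with hardware. He made a bet with goodegg about railway ticket booking and won it using just a few servers.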

For example, when training V3 with NVIDIA’s H800 GPUs, DeepSeek customized parts of the GPU’s core computational units, called SMs (Streaming Multiprocessors), to suit their needs. Out of 132 SMs, they allocated 20 exclusively for server-to-server communication tasks instead of computational tasks.

This customization was carried out at the PTX (Parallel Thread Execution) level, a low-level instruction set for NVIDIA GPUs. PTX operates at a level close to assembly language, allowing for fine-grained optimizations such as register allocation and thread/warp-level adjustments. However, such detailed control is highly complex and difficult to maintain. This is why higher-level programming languages like CUDA are typically used, as they generally provide sufficient performance optimization for most parallel programming tasks without requiring lower-level modifications.
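To put the numbers from the quoted passage in perspective, here is a minimal Python sketch of the SM budget. Only the two figures (132 SMs, 20 reserved for communication) come from the quote. The function names `split_sms` and `sm_role`, and the policy of giving the highest-numbered SMs the communication role, are hypothetical and only illustrate the split. Nothing here is DeepSeek's actual PTX-level implementation.

```python
TOTAL_SMS = 132  # SM count on the H800, as quoted in the post
COMM_SMS = 20    # SMs the post says were reserved for communication


def split_sms(total: int, comm: int) -> dict:
    """Return how many SMs remain for compute and what share goes to communication."""
    if not 0 <= comm <= total:
        raise ValueError("comm must be between 0 and total")
    return {
        "compute": total - comm,
        "comm": comm,
        "comm_fraction": comm / total,
    }


def sm_role(sm_id: int, total: int = TOTAL_SMS, comm: int = COMM_SMS) -> str:
    """Hypothetical policy: the last `comm` SM ids are the communication SMs."""
    if not 0 <= sm_id < total:
        raise ValueError("sm_id out of range")
    return "comm" if sm_id >= total - comm else "compute"


budget = split_sms(TOTAL_SMS, COMM_SMS)
print(budget["compute"], "SMs left for compute")
print(f"{budget['comm_fraction']:.1%} of SMs reserved for communication")
```

So, going by the quoted figures, 112 of the 132 SMs are left for computation, and roughly 15% of the chip is given over to communication.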

（ヅ） · Post by **（ヅ）** » Jan 27, 2025 00:14

These are the people who lay their own fiber and build microwave base stations.

赖美豪中

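Is this a joke? This is basic stuff for any CUDA code monkey.

huangchong wrote: Jan 26, 2025 13:35 This post is reposted from Caravel's post on the Military (军事天地) board: High-frequency trading programmers are extremely good at hardware optimization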


新未名空间
