High-frequency trading programmers are very good at hardware optimization


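#1 High-frequency trading programmers are very good at hardware optimization

Post by Caravel (original poster) »

Back on the old MITBBS there was a Teacher Wei who was also very good with hardware. He won a bet with goodegg about handling railway ticket sales with just a few servers.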

For example, when training V3 with NVIDIA’s H800 GPUs, DeepSeek customized parts of the GPU’s core computational units, called SMs (Streaming Multiprocessors), to suit their needs. Out of 132 SMs, they allocated 20 exclusively for server-to-server communication tasks instead of computational tasks.

This customization was carried out at the PTX (Parallel Thread Execution) level, a low-level instruction set for NVIDIA GPUs. PTX operates at a level close to assembly language, allowing for fine-grained optimizations such as register allocation and thread/warp-level adjustments. However, such detailed control is highly complex and difficult to maintain. This is why higher-level programming languages like CUDA are typically used, as they generally provide sufficient performance optimization for most parallel programming tasks without requiring lower-level modifications.
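
To make the quoted idea concrete, here is a minimal CUDA sketch of the general pattern: some thread blocks take a "communication" role, the rest compute, and a single inline PTX instruction reads the `%smid` special register. This is not DeepSeek's code. The names `kTotalBlocks`, `kCommBlocks`, `current_sm_id`, and `specialized_kernel` are made up for illustration, and the 132/20 split simply reuses the figures from the quote. It also assumes nothing about which SM a block lands on, since the CUDA scheduler does not guarantee a one-block-per-SM mapping.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical numbers chosen to mirror the 20-of-132 split quoted above.
constexpr int kTotalBlocks = 132;  // one block per SM on a 132-SM part (assumption)
constexpr int kCommBlocks  = 20;   // blocks given the "communication" role

// Read the %smid special register with inline PTX. PTX documents this value
// as volatile, so it is used here for reporting only, not for correctness.
__device__ unsigned int current_sm_id() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Each block picks a role from its index. A real kernel would run a
// send/receive loop in one branch and math in the other.
__global__ void specialized_kernel(int* is_comm, unsigned int* sm_of_block) {
    if (threadIdx.x != 0) return;  // one thread per block records the result
    is_comm[blockIdx.x] = (blockIdx.x < kCommBlocks) ? 1 : 0;
    sm_of_block[blockIdx.x] = current_sm_id();
}

int main() {
    int* is_comm = nullptr;
    unsigned int* sm_of_block = nullptr;
    if (cudaMallocManaged(&is_comm, kTotalBlocks * sizeof(int)) != cudaSuccess ||
        cudaMallocManaged(&sm_of_block, kTotalBlocks * sizeof(unsigned int)) != cudaSuccess) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }

    specialized_kernel<<<kTotalBlocks, 32>>>(is_comm, sm_of_block);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Count how many blocks took each role and report where block 0 ran.
    int comm = 0;
    for (int i = 0; i < kTotalBlocks; ++i) comm += is_comm[i];
    std::printf("communication blocks: %d, compute blocks: %d\n",
                comm, kTotalBlocks - comm);
    std::printf("block 0 ran on SM %u\n", sm_of_block[0]);

    cudaFree(is_comm);
    cudaFree(sm_of_block);
    return 0;
}
```

Build with `nvcc` and run on any CUDA-capable GPU. The role split is keyed on `blockIdx.x` rather than on the SM id, because `%smid` can only be observed, not chosen, from inside a kernel.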

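#2 Re: High-frequency trading programmers are very good at hardware optimization

Post by 弃婴千枝 »

It's just a parallel sparse-matrix computation problem.

If you're familiar with parallel programming on the early supercomputers, it's easy to understand.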

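#3 Re: High-frequency trading programmers are very good at hardware optimization

Post by rtscts »

That latency-optimization junk in high-frequency trading? Huawei's people could walk in and flatten it on day one.

For people who spend all day on backbone core routers and switches, doing some damn CPU offloading is child's play.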

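#4 Re: High-frequency trading programmers are very good at hardware optimization

Post by helpme »

Teacher Wei spent more than a decade building his smart home system, and I don't know whether he ever sold a single unit. He probably doesn't need the money, but that stretch of his life really does seem a bit wasted.

Caravel wrote: January 26, 2025, 13:27 Back on the old MITBBS there was a Teacher Wei who was also very good with hardware. He won a bet with goodegg about handling railway ticket sales with just a few servers.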

For example, when training V3 with NVIDIA’s H800 GPUs, DeepSeek customized parts of the GPU’s core computational units, called SMs (Streaming Multiprocessors), to suit their needs. Out of 132 SMs, they allocated 20 exclusively for server-to-server communication tasks instead of computational tasks.

This customization was carried out at the PTX (Parallel Thread Execution) level, a low-level instruction set for NVIDIA GPUs. PTX operates at a level close to assembly language, allowing for fine-grained optimizations such as register allocation and thread/warp-level adjustments. However, such detailed control is highly complex and difficult to maintain. This is why higher-level programming languages like CUDA are typically used, as they generally provide sufficient performance optimization for most parallel programming tasks without requiring lower-level modifications.

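#5 Re: High-frequency trading programmers are very good at hardware optimization

Post by Caravel (original poster) »

弃婴千枝 wrote: January 26, 2025, 13:36 It's just a parallel sparse-matrix computation problem.

If you're familiar with parallel programming on the early supercomputers, it's easy to understand.
This is all driver-level stuff; most people would never try to optimize at that level.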