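There used to be a Teacher Wei on MITBBS who was also extremely good with hardware. He won a bet with goodegg over handling railway ticket sales with just a few servers.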
For example, when training V3 with NVIDIA’s H800 GPUs, DeepSeek customized parts of the GPU’s core computational units, called SMs (Streaming Multiprocessors), to suit their needs. Out of 132 SMs, they allocated 20 exclusively for server-to-server communication tasks instead of computational tasks.
This customization was carried out at the PTX (Parallel Thread Execution) level, a low-level instruction set for NVIDIA GPUs. PTX operates at a level close to assembly language, allowing for fine-grained optimizations such as register allocation and thread/warp-level adjustments. However, such detailed control is highly complex and difficult to maintain. This is why higher-level programming languages like CUDA are typically used, as they generally provide sufficient performance optimization for most parallel programming tasks without requiring lower-level modifications.
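The split described above is easy to put into numbers. The following is only a rough sketch of the arithmetic, not DeepSeek's code: the 132 and 20 figures come from the post, while the function names and the choice to reserve the lowest-numbered SMs for communication are assumptions made purely for illustration.

```python
# Toy model of the SM split described in the post (not actual GPU code).
TOTAL_SMS = 132  # SM count quoted in the post for the H800
COMM_SMS = 20    # SMs the post says were reserved for communication


def sm_role(sm_id, total=TOTAL_SMS, comm=COMM_SMS):
    """Return the role of one SM. Assumption: the lowest-numbered SMs
    are the ones reserved for communication (the post does not say)."""
    if not 0 <= sm_id < total:
        raise ValueError(f"sm_id must be in [0, {total})")
    return "communication" if sm_id < comm else "compute"


def partition_summary(total=TOTAL_SMS, comm=COMM_SMS):
    """Summarize how many SMs remain for computation after the reservation."""
    compute = total - comm
    return {
        "compute_sms": compute,
        "comm_sms": comm,
        "comm_fraction": comm / total,
    }


summary = partition_summary()
print(summary["compute_sms"], "SMs left for compute")
print(f"{summary['comm_fraction']:.1%} of SMs reserved for communication")
```

Under these numbers, roughly 15% of the GPU's SMs are given up for communication and 112 remain for computation.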
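HFT programmers are extremely good at hardware optimization
Moderator: Softfist
#3 Re: HFT programmers are extremely good at hardware optimization
That HFT latency-optimization stuff is nothing special. Huawei's people could walk in and crush it outright.
For people who work on backbone core routers and switches all day, a bit of damn CPU offloading is child's play.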
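#4 Re: HFT programmers are extremely good at hardware optimization
Teacher Wei spent more than ten years building a smart home system, and I don't know whether he ever sold a single unit. He probably isn't short of money, but that stretch of his life really does seem a bit wasted.
Caravel wrote: January 26, 2025, 13:27 There used to be a Teacher Wei on MITBBS who was also extremely good with hardware. He won a bet with goodegg over handling railway ticket sales with just a few servers.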
For example, when training V3 with NVIDIA’s H800 GPUs, DeepSeek customized parts of the GPU’s core computational units, called SMs (Streaming Multiprocessors), to suit their needs. Out of 132 SMs, they allocated 20 exclusively for server-to-server communication tasks instead of computational tasks.
This customization was carried out at the PTX (Parallel Thread Execution) level, a low-level instruction set for NVIDIA GPUs. PTX operates at a level close to assembly language, allowing for fine-grained optimizations such as register allocation and thread/warp-level adjustments. However, such detailed control is highly complex and difficult to maintain. This is why higher-level programming languages like CUDA are typically used, as they generally provide sufficient performance optimization for most parallel programming tasks without requiring lower-level modifications.
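Failing to read people's expressions, failing to read the situation, failing to put up with human nature: any one of the three is a road to disaster.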