According to a recent analysis, DeepSeek's "R1" model scored around 15% on the ARC-AGI benchmark, and its "R1-Zero" variant scored slightly lower, at approximately 14%. So while DeepSeek's models show promising reasoning abilities, they still fall well short of top-performing systems such as OpenAI's "o3", which scored around 87.5% on the same test in its high-compute configuration.
DeepSeek R1 is far behind the newly released o3. Is it time to figure out how to distill the model all over again?
DeepSeek R1 vs OpenAI o3 on ARC-AGI tests
#2 Re: DeepSeek R1 vs OpenAI o3 on ARC-AGI tests, by TheMatrix
Nobody has actually seen o3 so far; everything we have is what OpenAI says about it.
(In reply to itspid's post of Feb 5, 2025, 19:06.)
#4 Re: DeepSeek R1 vs OpenAI o3 on ARC-AGI tests, by Caravel
No need to worry: within two hours of a release, it'll be on par. I've tested this myself on questions where DeepSeek got it wrong and GPT got it right, and within two hours to a day DeepSeek's answer almost always came to match GPT's. My guess is that someone tests and corrects it by hand afterwards.
(In reply to itspid's post of Feb 5, 2025, 19:06.)