DeepSeek R1 vs OpenAI o3 on ARC-AGI tests

itspid · Posted by **itspid (OP)** » Feb 5, 2025 19:06

According to recent analysis, DeepSeek's "R1" model achieved a score of around 15% on the ARC-AGI benchmark, with its "R1-Zero" variant scoring slightly lower at approximately 14%. This means that while DeepSeek models show promising reasoning abilities, they still fall significantly below the scores achieved by top-performing AI systems like OpenAI's "o3" which scored around 87.5% on the same test.

DeepSeek R1 is far behind the newly released o3. Do they need to find a way to distill the model once more?

TheMatrix · Posted by **TheMatrix** » Feb 5, 2025 19:42

itspid wrote (Feb 5, 2025 19:06): According to recent analysis, DeepSeek's "R1" model achieved a score of around 15% on the ARC-AGI benchmark, with its "R1-Zero" variant scoring slightly lower at approximately 14%. This means that while DeepSeek models show promising reasoning abilities, they still fall significantly below the scores achieved by top-performing AI systems like OpenAI's "o3" which scored around 87.5% on the same test.

DeepSeek R1 is far behind the newly released o3. Do they need to find a way to distill the model once more?

So far nobody has actually seen o3; the numbers are all self-reported.

Caravel · Posted by **Caravel** » Feb 5, 2025 23:19

TheMatrix wrote (Feb 5, 2025 19:42): So far nobody has actually seen o3; the numbers are all self-reported.

For these new benchmarks, someone should first clarify whether OpenAI has invested in them.

赖美豪中

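No need to worry: within 2 hours of a release, DeepSeek can match it. I have tested cases where DeepSeek got an answer wrong and GPT got it right, and within roughly 2 hours to a day DeepSeek's answer ends up matching GPT's. My guess is that someone manually tests and corrects the answers afterward.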


新未名空间
