#1 DeepSeek R1 vs OpenAI o3 on ARC-AGI tests

Posted: Feb 5, 2025 19:06
itspid
According to recent analysis, DeepSeek's "R1" model achieved a score of around 15% on the ARC-AGI benchmark, with its "R1-Zero" variant scoring slightly lower at approximately 14%. This means that while DeepSeek models show promising reasoning abilities, they still fall significantly below the scores achieved by top-performing AI systems like OpenAI's "o3", which scored around 87.5% on the same test.

Deepseek r1 falls far behind the newly released o3. Maybe they need to find a way to distill the model all over again.

#2 Re: DeepSeek R1 vs OpenAI o3 on ARC-AGI tests

Posted: Feb 5, 2025 19:42
TheMatrix
itspid wrote (Feb 5, 2025 19:06):
According to recent analysis, DeepSeek's "R1" model achieved a score of around 15% on the ARC-AGI benchmark, with its "R1-Zero" variant scoring slightly lower at approximately 14%. This means that while DeepSeek models show promising reasoning abilities, they still fall significantly below the scores achieved by top-performing AI systems like OpenAI's "o3", which scored around 87.5% on the same test.

Deepseek r1 falls far behind the newly released o3. Maybe they need to find a way to distill the model all over again.

Nobody outside OpenAI has actually seen o3 yet; so far the numbers are all self-reported.

#3 Re: DeepSeek R1 vs OpenAI o3 on ARC-AGI tests

Posted: Feb 5, 2025 23:19
Caravel
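TheMatrix wrote (Feb 5, 2025 19:42):
Nobody outside OpenAI has actually seen o3 yet; so far the numbers are all self-reported.

Before trusting these new benchmarks, someone should first clarify whether OAI has invested in them.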

#4 Re: DeepSeek R1 vs OpenAI o3 on ARC-AGI tests

Posted: Feb 6, 2025 11:35
赖美豪中
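No need to worry: within two hours of a release they can catch up. I've tested this myself. On questions where DS got it wrong and GPT got it right, DS's answers generally ended up matching GPT's within two hours to a day. My guess is that someone manually tests and corrects them afterward.

itspid wrote (Feb 5, 2025 19:06):
According to recent analysis, DeepSeek's "R1" model achieved a score of around 15% on the ARC-AGI benchmark, with its "R1-Zero" variant scoring slightly lower at approximately 14%. This means that while DeepSeek models show promising reasoning abilities, they still fall significantly below the scores achieved by top-performing AI systems like OpenAI's "o3", which scored around 87.5% on the same test.

Deepseek r1 falls far behind the newly released o3. Maybe they need to find a way to distill the model all over again.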