ChatGPT 4o vs o3

TheMatrix · 帖子由 **TheMatrix楼主** » 2025年 1月 1日 10:33

时间线有误，o3 比 4o 新。

TheMatrix · 帖子由 **TheMatrix楼主** » 2025年 1月 1日 10:47

TheMatrix · 帖子由 **TheMatrix楼主** » 2025年 1月 1日 11:06

TheMatrix · 帖子由 **TheMatrix楼主** » 2025年 1月 1日 11:09

CoT - chain of thought 就是prompt engineering。原来是人来做的，现在希望AI自己做。

TheMatrix · 帖子由 **TheMatrix楼主** » 2025年 1月 1日 11:19

TheMatrix 写了： 2025年 1月 1日 11:09 CoT - chain of thought 就是prompt engineering。原来是人来做的，现在希望AI自己做。

这个路线和ChatGPT是合拍的。任何人用过ChatGPT一段时间就会发现，ChatGPT需要ask the right question。一小步一小步的问，还需要refine questions，这样才能得到更好的答案。问题分解里有很多机械的步骤，把它自动化起来，再加点AI assisted自动分解，对于现有的AI是一个小跳跃，但是效果很丰厚。

有下一步可走，这是好事。说明OpenAI还在正确的道路上。

wass · 帖子由 **wass** » 2025年 1月 1日 11:30

好像就是reasoning

O1、O3就是干这个的

cot公开更好，至少openai不愿意

TheMatrix · 帖子由 **TheMatrix楼主** » 2025年 1月 1日 11:39

wass 写了： 2025年 1月 1日 11:30 好像就是reasoning

O1、O3就是干这个的

cot公开更好，至少openai不愿意

cot的idea不存在秘密，本身就是公开的。但是需要ChatGPT，没有ChatGPT，CoT也没有效果。

wass · 帖子由 **wass** » 2025年 1月 1日 11:46

TheMatrix 写了： 2025年 1月 1日 11:39 cot的idea不存在秘密，本身就是公开的。但是需要ChatGPT，没有ChatGPT，CoT也没有效果。

说的是reasoning steps不愿意公开

TheMatrix · 帖子由 **TheMatrix楼主** » 2025年 1月 1日 16:22

TheMatrix 写了： 2025年 1月 1日 11:09 CoT - chain of thought 就是prompt engineering。原来是人来做的，现在希望AI自己做。

CoT应该还是为了挖掘LLM已有的潜力，也就是CoT能得到的最好的回答，用人工prompt engineering也能得到。

我说的是目前的CoT。应该不能推理。

wildthing · 帖子由 **wildthing** » 2025年 1月 1日 16:26

TheMatrix 写了： 2025年 1月 1日 16:22 CoT应该还是为了挖掘LLM已有的潜力，也就是CoT能得到的最好的回答，用人工prompt engineering也能得到。

我说的是目前的CoT。应该不能推理。

reinforcement learning model is difficult to train.

FoxMe · 帖子由 **FoxMe（令狐）** » 2025年 1月 1日 17:06

token是啥？

TheMatrix 写了： 2025年 1月 1日 10:33

时间线有误，o3 比 4o 新。

TheMatrix · 帖子由 **TheMatrix楼主** » 2025年 1月 1日 17:06

wildthing 写了： 2025年 1月 1日 16:26 reinforcement learning model is difficult to train.

reinforcement learning没有labeled数据，也就是没有正确答案。要用rewards and punishments来优化。它的难点在哪？

TheMatrix · 帖子由 **TheMatrix楼主** » 2025年 1月 1日 17:07

FoxMe 写了： 2025年 1月 1日 17:06token是啥？

token可以理解为单词。

ChatGPT不是一个单词一个单词往外蹦吗？这就是token。

wildthing · 帖子由 **wildthing** » 2025年 1月 1日 17:11

TheMatrix 写了： 2025年 1月 1日 17:06 reinforcement learning没有labeled数据，也就是没有正确答案。要用rewards and punishments来优化。它的难点在哪？

no local extrema. It is a saddle point and very unstable. You can't determine the stopping criterion using training losses.

新未名空间

ChatGPT 4o vs o3

#1 ChatGPT 4o vs o3

#2 Re: ChatGPT 4o vs o3

#1 CoT - chain of thought

#2 Re: CoT - chain of thought

#3 Re: CoT - chain of thought

#4 Re: CoT - chain of thought

#5 Re: CoT - chain of thought

#6 Re: CoT - chain of thought

#3 Re: CoT - chain of thought

#4 Re: CoT - chain of thought

#5 Re: ChatGPT 4o vs o3

#6 Re: CoT - chain of thought

#7 Re: ChatGPT 4o vs o3

#8 Re: CoT - chain of thought