This topic is important - multimodal next-token prediction

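TheMatrix · Post by **TheMatrix (OP)** » Jan 14, 2025 13:12

I haven't read it yet; parking it here for now.

What I care about is data preparation and data unification: how data from different sources gets processed into one unified form.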

https://zhuanlan.zhihu.com/p/17728210584

https://arxiv.org/pdf/2412.18619

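Caravel · Post by **Caravel** » Jan 14, 2025 16:07

I saw a talk by LeCun a while back saying that

the next-token approach doesn't work well for images,

because image pixels are continuous variables, unlike tokens, which are discrete.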

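TheMatrix · Post by **TheMatrix (OP)** » Jan 14, 2025 16:29

Caravel wrote: Jan 14, 2025 16:07 I saw a talk by LeCun a while back saying that

the next-token approach doesn't work well for images,

because image pixels are continuous variables, unlike tokens, which are discrete.

You definitely can't use raw pixels as tokens.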

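赖美豪中

You can say it with more confidence: it doesn't work for complex semantics either.

Caravel wrote: Jan 14, 2025 16:07 I saw a talk by LeCun a while back saying that

the next-token approach doesn't work well for images,

because image pixels are continuous variables, unlike tokens, which are discrete.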

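TheMatrix · Post by **TheMatrix (OP)** » Jan 15, 2025 14:17

TheMatrix wrote: Jan 14, 2025 13:12 I haven't read it yet; parking it here for now.

What I care about is data preparation and data unification: how data from different sources gets processed into one unified form.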

https://zhuanlan.zhihu.com/p/17728210584

https://arxiv.org/pdf/2412.18619

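I skimmed it. It's a review, and nothing in it was particularly enlightening.

What I mainly care about is tokenization, specifically image or video tokenization on the input side.

It covers two kinds, discrete and continuous tokenization, and in neither did I find what I was hoping to see: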

viewtopic.php?p=4812354#p4812354
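For anyone following along, here is a rough sketch of the discrete vs. continuous distinction the review draws. It is a toy illustration only, not the method from the survey or from any particular model. The helper names (`patchify`, `quantize`), the 8x8 grayscale image, the 4x4 patch size, and the random 32-entry codebook are all assumptions made up for this example. In a real system the continuous tokens would typically come from a learned encoder and the codebook would be learned as well.

```python
import random
from typing import List

Vector = List[float]


def patchify(image: List[List[float]], patch: int) -> List[Vector]:
    """Split an H x W grayscale image into flattened patch x patch vectors (row-major)."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches: List[Vector] = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + dy][left + dx]
                            for dy in range(patch)
                            for dx in range(patch)])
    return patches


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(patches: List[Vector], codebook: List[Vector]) -> List[int]:
    """Discrete tokenization: replace each patch by the index of its nearest codebook entry."""
    return [min(range(len(codebook)),
                key=lambda k: squared_distance(p, codebook[k]))
            for p in patches]


rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]        # toy 8x8 image
codebook = [[rng.random() for _ in range(16)] for _ in range(32)]   # toy codebook, NOT learned

# Continuous tokens: each token is a real-valued vector (here, the raw flattened patch).
continuous_tokens = patchify(image, patch=4)

# Discrete tokens: each token is an integer ID, like a word ID in a text vocabulary.
discrete_tokens = quantize(continuous_tokens, codebook)
```

The discrete route gives the model a finite vocabulary to predict over, the same way text next-token prediction works, at the cost of quantization error. The continuous route keeps the vectors as they are, so the prediction target is no longer a choice among a finite set of IDs.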

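Caravel · Post by **Caravel** » Jan 15, 2025 14:59

TheMatrix wrote: Jan 15, 2025 14:17 I skimmed it. It's a review, and nothing in it was particularly enlightening.

What I mainly care about is tokenization, specifically image or video tokenization on the input side.

It covers two kinds, discrete and continuous tokenization, and in neither did I find what I was hoping to see: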

viewtopic.php?p=4812354#p4812354

The current multimodal models built with the LLM approach don't cut it; the world model has to be built bottom-up, starting from robots.

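TheMatrix · Post by **TheMatrix (OP)** » Jan 19, 2025 09:32

TheMatrix wrote: Jan 15, 2025 14:17 I skimmed it. It's a review, and nothing in it was particularly enlightening.

What I mainly care about is tokenization, specifically image or video tokenization on the input side.

It covers two kinds, discrete and continuous tokenization, and in neither did I find what I was hoping to see: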

viewtopic.php?p=4812354#p4812354
