Toward Personal Language Models

Posted by **Tlexander** (original poster) » 25 Jan 2023, 22:13

https://ai2healthcare.github.io/news/20 ... Dr.Yi_Wang

Language models provide the joint probability distribution of a symbolic sequence. A language model can generate novel sequences, which enables article writing and dialogue. It can also estimate the likelihood of a given sequence, which enables blank filling, multiple-choice answering, and judging propositions. Language models are therefore key to future artificial general intelligence (AGI). Currently, huge language models dominate the field. They consume enormous amounts of computation, emit tons of CO2, require expensive GPU servers to deploy, and shut out small labs and individual researchers. In this study, I explored various technologies toward a personal language model that is small, elegant, cheap, fast, and affordable to everyone. These technologies include: (1) a simple, bare CUDA/C++ implementation of every operator from scratch; (2) several novel candidate architectures; (3) a novel entropy-based sampling method for text generation, called Top-E sampling; (4) elegant design choices, such as byte-level modeling, an extremely deep and narrow design, and single-head batch computation; (5) quantization with VNNI instructions. I open-sourced the June version with two pretrained models: a PubMed English model and a WuDao Chinese model. A more recent Traditional Chinese Medicine model, based on a state-of-the-art model with only 3 million parameters, is also available on WeChat.
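To make two ideas from the abstract concrete (byte-level modeling, and using sequence likelihood to pick among candidate answers), here is a minimal Python sketch. It is not the model described in the post: `ByteBigramModel`, its `START` context, and the add-one smoothing are hypothetical stand-ins chosen only to keep the example tiny. What it does share with the post's description is the symbol set (raw UTF-8 bytes, so a vocabulary of 256) and the chain-rule factorization of the joint probability.

```python
import math
from collections import defaultdict


def to_bytes(text: str) -> list[int]:
    """Byte-level 'tokenization': each UTF-8 byte is one symbol in 0..255."""
    return list(text.encode("utf-8"))


class ByteBigramModel:
    """Toy stand-in for a language model: P(next byte | previous byte)."""

    VOCAB = 256  # one symbol per possible byte value
    START = 256  # hypothetical start-of-sequence context, never emitted

    def __init__(self) -> None:
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def train(self, text: str) -> None:
        """Count how often each byte follows each context byte."""
        prev = self.START
        for b in to_bytes(text):
            self.counts[prev][b] += 1
            self.totals[prev] += 1
            prev = b

    def log_prob(self, text: str) -> float:
        """Chain rule: log P(x_1..x_n) = sum over t of log P(x_t | x_{t-1})."""
        prev, total = self.START, 0.0
        for b in to_bytes(text):
            # Add-one smoothing keeps unseen byte pairs at non-zero probability.
            p = (self.counts[prev][b] + 1) / (self.totals[prev] + self.VOCAB)
            total += math.log(p)
            prev = b
        return total


# Usage: score two candidate "answers" and keep the more likely one.
model = ByteBigramModel()
model.train("aspirin reduces fever. aspirin reduces pain. ")
candidates = ["aspirin reduces fever", "aspirin fever reduces"]
best = max(candidates, key=model.log_prob)
print(best)
```

Because the model works on bytes, no tokenizer or vocabulary file is needed, and any language (including Chinese) maps onto the same 256 symbols. A real model would replace the bigram counts with a neural network, but the scoring loop would keep the same shape.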
