May 5, 2023
Google document leaked on Discord
A document reportedly written by a researcher within Google was recently leaked on a public Discord server. Discord is a proprietary chat platform. Many other groups also use it, but it was designed primarily for gaming communities and supports voice, video, and text chat. The document's authenticity remains disputed, but what interests people most is its analysis of LLMs (large language models).

Open-Source Models Surpassing Commercial Counterparts

The document argues that while Google and OpenAI compete for the title of the most capable language model, the open-source community is quickly outpacing them both. It claims that open-source models are faster, more customizable, more private, and pound-for-pound more capable than their commercial counterparts.

Innovative Developments in Open-Source Community

One of the document's most striking claims is that open-source developers are doing things with $100 and 13B parameters that commercial labs struggle with at $10M and 540B parameters, and that this progress is measured in weeks rather than months. The chart in the Vicuna-13B announcement illustrates how quickly Alpaca and Vicuna followed LLaMA. There has been a tremendous outpouring of innovation, with only days between significant developments. Many of these new ideas come from individuals rather than large research labs, thanks to the lowered barrier to entry for training and experimentation.
The document argues that this shouldn’t surprise anyone, as it comes right after a renaissance in image generation. The similarities between the two communities have not gone unnoticed, with many calling this the “Stable Diffusion moment” for LLMs.

LoRA Fine-Tuning Technique

Perhaps the most exciting part of the document is its "What We Missed" section. The author is very bullish on LoRA (low-rank adaptation), a technique that allows models to be fine-tuned in a few hours on consumer hardware, producing improvements that can then be stacked on top of each other. As new and better datasets and tasks become available, the model can be kept up to date cheaply, without ever paying the cost of a full training run.
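To make the LoRA idea more concrete, here is a minimal pure-Python sketch of the standard LoRA formulation; it is an illustration, not code from the leaked document. The pretrained weight matrix W stays frozen, and each fine-tune learns only two small factors, B (d x r) and A (r x k), whose product scaled by alpha / r is added to W. The matrix values, the rank r, the scaling factor alpha, and the helper names (`lora_delta`, `merge_adapters`, `param_counts`) are all hypothetical, and the actual gradient-based training of A and B is omitted.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product of x (n x m) and y (m x p)."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_delta(b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Low-rank update (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has r rows, so its row count is the rank
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(b, a)]


def merge_adapters(w: Matrix, deltas: List[Matrix]) -> Matrix:
    """Add independently trained low-rank updates onto a frozen W.

    Each update is just an additive matrix, so adapters can be stacked
    without modifying (or copying over) the original weights.
    """
    merged = [row[:] for row in w]  # copy so the frozen W is left untouched
    for delta in deltas:
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                merged[i][j] += v
    return merged


def param_counts(d: int, k: int, r: int) -> Tuple[int, int]:
    """Trainable parameters: full fine-tune of a d x k W vs. one rank-r adapter."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    # Hypothetical frozen 3x3 "pretrained" weight.
    w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Two hypothetical rank-1 adapters, e.g. from two different datasets.
    b1, a1 = [[1.0], [0.0], [0.0]], [[0.0, 2.0, 0.0]]
    b2, a2 = [[0.0], [0.0], [1.0]], [[1.0, 0.0, 0.0]]
    stacked = merge_adapters(w, [lora_delta(b1, a1, alpha=1.0),
                                 lora_delta(b2, a2, alpha=1.0)])
    print(stacked)
    print(param_counts(4096, 4096, 8))
```

In this toy run the stacked weight is `[[1.0, 2.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]`. For a single 4096 x 4096 layer, a rank-8 adapter has 65,536 trainable parameters instead of 16,777,216 (256 times fewer), which is what makes frequent, cheap updates plausible. Adding adapters together is always valid arithmetic, but whether separately trained adapters combine well in practice depends on the tasks involved.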

The Future of Language Model Development

The leaked Google document suggests that the open-source community may be taking the lead in developing the most capable LLMs. While some question the document's authenticity, there is no denying that the open-source community has been making significant strides in language models.

Our Say

As the world increasingly relies on natural language processing technology, it will be interesting to see how the tech giants respond to this open-source challenge. Will they keep pouring resources into their own models, or embrace the community's innovations to stay ahead? Only time will tell.

