Google Afraid of Open-Source Community Outpacing Tech Giants in Language Model Race
A researcher within Google recently leaked an internal document on a public Discord server. Discord is a communication platform built around voice, video, and text chat; it was originally designed for gaming communities, though many other groups now use it. There is much controversy surrounding the document's authenticity, but what interests people most is its analysis of large language models (LLMs).
Learn More: An Introduction to Large Language Models (LLMs)
Open-Source Models Surpassing Commercial Counterparts
The document states that the work happening in the open-source community is quickly outpacing the efforts of Google and OpenAI in the race for the most capable language model. It claims that open-source models are faster, more customizable, more private, and pound-for-pound more capable than their commercial counterparts.
Innovative Developments in Open-Source Community
One of the most significant findings of the document is that many open-source models are doing things with $100 and 13B params that commercial models struggle with at $10M and 540B params. This is happening at an astonishing pace of weeks rather than months. The chart in the Vicuna-13B announcement illustrates how quickly Alpaca and Vicuna followed LLaMA. There has been a tremendous outpouring of innovation, with just days between significant developments. Many of these new ideas come from ordinary people, thanks to the lowered barrier to entry for training and experimentation.
The document argues that this shouldn’t surprise anyone, as it comes right after a renaissance in image generation. The similarities between the two communities have not gone unnoticed, with many calling this the “Stable Diffusion moment” for LLMs.
LoRA Fine-Tuning Technique
Perhaps the most exciting part of the document is its discussion of "What We Missed." The author is very bullish on LoRA, a technique that allows models to be fine-tuned in just a few hours on consumer hardware, producing improvements that can then be stacked on top of each other. As new and better datasets and tasks become available, the model can be cheaply kept up to date without ever paying the cost of a full training run.
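The core idea behind LoRA is simple: instead of updating an entire pretrained weight matrix, training adjusts only two small low-rank matrices whose product is added to the frozen weight. The sketch below illustrates this with NumPy; the layer sizes and rank are hypothetical, chosen only to show the parameter savings, and this is not any real model's code.

```python
import numpy as np

# Minimal sketch of the LoRA idea: the pretrained weight W stays frozen;
# only the two small matrices A and B would be trained, so the effective
# weight is W + B @ A. Shapes here are illustrative, not from a real model.
d_in, d_out, rank = 512, 512, 8  # rank << d_in keeps the update cheap

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, init 0

def lora_forward(x):
    # Frozen layer output plus the low-rank correction.
    return W @ x + B @ (A @ x)

# Because B starts at zero, the adapted layer initially matches the
# original exactly, and the trainable parameter count is
# rank * (d_in + d_out) instead of d_in * d_out.
x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)
print(rank * (d_in + d_out), "vs", d_in * d_out)  # 8192 vs 262144
```

Because each fine-tune produces only a small `B @ A` delta, several such updates can be merged into the frozen weight or swapped in and out cheaply, which is what the document means by improvements that stack.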
The Future of Language Model Development
With the leak of this Google document on Discord, the open-source community appears to have taken the lead in developing the most capable LLMs. Even though many may question the document's authenticity, one cannot deny that the open-source community has been making significant strides in language models.
As the world increasingly relies on natural language processing technology, it will be interesting to see how the tech giants respond to this open-source challenge. Will they continue to pour more resources into developing their models or embrace the community’s innovations to stay ahead? Only time will tell.