Self-Hosting, Minimalism, Linux, .NET, Philosophy, Psychology, Privacy, Security.



LLMs solved the search problem

Google is dead!

Google has emerged as the predominant source of information, offering answers to a wide range of queries, from the trivial to the scholarly. Its vast index of billions of web pages, articles, books, images, and videos has replaced libraries as the go-to place for knowledge and culture. Google's success can be attributed to its convenience: users can access information 24/7 from anywhere in the world. Its personalization features, which tailor results to users' preferences, behavior, and other relevant factors, have also played a crucial role in its widespread adoption.

Google's commercial nature means that it collects and analyzes users' data to improve its algorithms and provide relevant results, making users the product, not the customer. Other companies collect data too, but Google's dominance and value rest on its data collection. On top of that, Google's algorithms are not neutral or objective: they can be influenced by censorship laws or political agendas. The quality of its results is also inconsistent, depending on the sources it indexes and how it ranks them.

Google's dominance may be coming to an end. It has not released a significant new product in years, and its acquisitions of other companies have not resulted in significant innovation. Its size now limits its ability to move quickly in a fast-changing landscape, and it risks being left behind.

The Search Problem

The search problem refers to the challenge of efficiently and accurately retrieving relevant information from vast amounts of data. This problem is particularly relevant in the context of search engines like Google and Bing, which index and search through billions of web pages.

In order to solve the search problem, search engines like Google and Bing use complex algorithms to analyze the content of web pages and determine their relevance to specific search queries. These algorithms take into account a wide range of factors, such as the frequency and location of keywords on a web page, the popularity of the page, and the credibility of the source.

This works in two stages. First, Google and Bing systematically download (crawl) web pages, analyze them, and index their content into a searchable database. Second, at query time, ranking determines the relevance and importance of each indexed page in relation to the specific search query.
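To make the indexing and ranking steps concrete, here is a minimal sketch in Python. It assumes a toy in-memory corpus and scores pages purely by how often the query terms occur; the page URLs and the function names (`tokenize`, `build_index`, `search`) are made up for illustration and are not how Google or Bing actually work.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; real engines do far more.
    return text.lower().split()


def build_index(pages):
    # Inverted index: term -> {url: how often the term occurs on that page}.
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][url] = count
    return index


def search(index, query):
    # Rank pages by the summed frequency of all query terms.
    scores = Counter()
    for term in tokenize(query):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    return [url for url, _ in scores.most_common()]


# Hypothetical pages standing in for crawled content.
pages = {
    "a.example/llama": "llama is a language model a large language model",
    "b.example/search": "a search engine ranks web pages",
    "c.example/zoo": "the llama lives in the zoo",
}

index = build_index(pages)
results = search(index, "language model")
print(results)
```

Even in this toy version, the index stores every distinct term of every page, which hints at why real full-text indexes grow so large. Signals like page popularity and source credibility would be additional terms in the score.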

The Solution: LLMs

Storing massive amounts of text in a searchable format requires an index, similar to Google's. However, even small-scale, self-hosted full-text search solutions like Elasticsearch require insane amounts of RAM, often in the hundreds of gigabytes, for comparatively modest datasets. Google, at the other end of the scale, requires entire data centers to store everything.

That means literally tens of thousands of servers spread across multiple data centers. To give you an idea of the scale: according to a report from Statista, as of 2021 Google's search index contained over 130 trillion web pages, and the company processes over 3.5 billion searches per day. Operating at this level requires an enormous amount of hardware.

In contrast, a Large Language Model (LLM) like LLaMA's 65-billion-parameter model, trained on over 1 trillion tokens, needs roughly 130GB of disk/RAM for its weights at 16-bit precision and about 33GB once quantized to 4 bits, while the smaller LLaMA variants run at interactive speeds on a single consumer GPU. This is an incredible level of compression and performance for searching vast amounts of information.
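A back-of-the-envelope way to estimate how much disk/RAM a model's weights need, and how that compares to the raw training text, can be sketched as follows. It counts only the weights (no activations or runtime overhead), treats 1 GB as 10^9 bytes, and assumes roughly 4 bytes of text per token, which is a rule of thumb and not a figure from the LLaMA authors. The function names are made up for illustration.

```python
def weights_size_gb(num_params, bits_per_weight):
    # bytes = parameters * bits / 8; 1 GB is taken as 10**9 bytes.
    return num_params * bits_per_weight / 8 / 1e9


def corpus_size_gb(num_tokens, bytes_per_token=4):
    # ASSUMPTION: about 4 bytes of raw text per token (rule of thumb).
    return num_tokens * bytes_per_token / 1e9


PARAMS = 65e9   # LLaMA's largest model
TOKENS = 1e12   # "over 1 trillion tokens", used as a lower bound

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weights_size_gb(PARAMS, bits):.1f} GB")

corpus = corpus_size_gb(TOKENS)
ratio = corpus / weights_size_gb(PARAMS, 4)
print(f"raw training text: about {corpus:.0f} GB")
print(f"text-to-weights ratio at 4-bit: about {ratio:.0f}x")
```

Under these assumptions the weights come to 130 GB, 65 GB, and 32.5 GB at 16, 8, and 4 bits respectively, against roughly 4,000 GB of training text.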

GPT-4 by OpenAI is presumably trained on far more tokens than LLaMA. OpenAI has not published its training set size or parameter count, but if its disk/RAM requirements are in the same ballpark, then a model that pretty much knows everything fits into a single server.

With LLaMA, the parameter count can reportedly be trimmed (pruned) without losing much accuracy: you can shave off around half the size and keep nearly the same quality for a given language, like English. Furthermore, LLMs are currently unoptimized for space/resource usage, and techniques such as pruning and quantization are improving every month. Quantization algorithms can currently take a 16-bit model down to 4 bits, saving three-quarters of the RAM/disk requirements while still delivering almost identical results when queried. The potential for further optimization in this area is truly mind-blowing.
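The basic idea behind quantization can be sketched in a few lines of Python. This is plain symmetric round-to-nearest quantization with a single scale for the whole group of weights, chosen here for simplicity; it is not the specific algorithm used by any particular LLaMA tooling, and the example weights are invented.

```python
def quantize_4bit(weights):
    # Symmetric round-to-nearest: map floats to integer codes in [-7, 7].
    # One shared scale per group; fall back to 1.0 if all weights are zero.
    scale = max(abs(w) for w in weights) / 7 or 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    # Recover approximate float weights from the 4-bit codes.
    return [c * scale for c in codes]


# Hypothetical weights standing in for one small group of a model's parameters.
weights = [0.12, -0.70, 0.33, 0.05, -0.41, 0.68]

codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Going from 16 bits to 4 bits per weight keeps a quarter of the storage.
saving = 1 - 4 / 16

print("codes:", codes)
print("largest rounding error:", max_error)
print("storage saved:", saving)
```

Each weight is off by at most half a quantization step, which is why results stay close to the original while storage drops by three-quarters (ignoring the small cost of storing the scale). Real quantizers use many small groups and smarter rounding to push the error down further.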
