LLMs like GPT and Llama have completely transformed how we tackle language tasks, from creating intelligent chatbots to generating complex pieces of code. Cloud platforms like Hugging Face simplify using these models, but there are times when running an LLM locally on your own computer is the smarter choice. Why? Because it offers greater privacy, allows for customizations tailored to your specific needs, and can significantly reduce costs. Running LLMs locally gives you full control, letting you leverage their power on your own terms.
Let me show you how to run an LLM on your system in just a few simple steps using Ollama and Hugging Face!
Step 1: Download Ollama
First, go to the official Ollama website (ollama.com), download the installer for your operating system, and install it.
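Once it's installed, you can confirm the CLI works from a terminal. This is just a quick sanity check; the version number you see will differ:

```bash
# Verify the Ollama CLI is installed and on your PATH
ollama --version

# List locally downloaded models (empty on a fresh install)
ollama list
```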
Step 2: Find the Best Open-Source LLMs
Next, search for the Hugging Face Open LLM Leaderboard, which ranks the top open-source language models.
Step 3: Filter the Models for Your Device
Once you see the list, apply filters, such as parameter count, so you only see models your hardware can actually run. For example, click a top-ranked model such as Qwen/Qwen2.5-32B, then click “Use this model” in the top-right corner of the screen. However, you won’t find Ollama listed there as an option.
That’s because Ollama uses a specialized file format called GGUF, which stores the model in a quantized form that is smaller and faster to run locally.
(Note: Quantization slightly reduces quality but makes it more efficient for local use.)
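As a rough back-of-the-envelope check (the exact figures vary by quantization scheme): a 32B-parameter model at 16-bit precision needs about 32 billion × 2 bytes ≈ 64 GB of memory, while a common 4-bit quantization such as Q4_K_M shrinks that to roughly 18–20 GB, small enough for a well-equipped desktop.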
Step 4: Find a GGUF Version of the Model
To get the model in GGUF format:
Back on Hugging Face, look for GGUF conversions of your chosen model, repositories with “GGUF” in their name. Those published by community uploaders such as bartowski are a good choice.
Step 5: Download and Start Using the Model
On the GGUF repository’s page, click “Use this model” again; this time Ollama appears as an option. Copy the command it provides, paste it into your terminal, hit “Enter,” and wait for the download to complete.
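The command will look something like the one below. The repository name and quantization tag (Q4_K_M) are just illustrative examples of a typical choice; substitute whatever your model page gives you:

```bash
# Download and run a GGUF model directly from Hugging Face
# (repository and quant tag are examples; use the ones from your model page)
ollama run hf.co/bartowski/Qwen2.5-32B-Instruct-GGUF:Q4_K_M

# When the interactive prompt appears, type a message to chat;
# enter /bye to end the session.
```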
Once it’s downloaded, you can start chatting with the model just like you would with any other LLM. Simple and fun!
And there you go! You’re now running a powerful LLM locally on your device. Let me know if these steps worked for you in the comment section below.