Here are 15+ Small LLMs that You can Run on Local Devices

Pankaj Singh 19 Apr, 2024 • 7 min read


Imagine harnessing the power of advanced language models right on your personal computer or mobile device without relying on cloud services or powerful servers. Sounds incredible, doesn’t it? Well, these tiny language models make this dream a reality. In NLP, we’ve seen the rise of enormous language models that understand and generate text much like a human. While the results are often remarkable, the computational requirements are equally large, which makes these models difficult to run outside of a data center. But that’s quickly changing! Researchers and engineers have worked hard to produce LLMs that are small enough to run on your local devices yet powerful enough for many useful tasks.

In this article, we’ll explore the smallest and mightiest language models you can run locally from the comfort of your own device. These compact marvels strike a perfect balance between performance and resource efficiency, opening up a world of possibilities for developers, researchers, and enthusiasts alike.


What are the Benefits of Small LLMs?

Here are some key benefits of using small LLMs (Large Language Models) compared to their larger counterparts:

  1. Lower Hardware Requirements: Small LLMs have significantly fewer parameters and require less computational power, making them ideal for running on devices with limited hardware resources, such as laptops, smartphones, and embedded systems. This makes them more accessible and opens up LLMs to a broader range of users and applications.
  2. Faster Inference: With fewer parameters and smaller model sizes, small LLMs can perform faster inference, which means quicker response times and lower latency. This is particularly important for real-time applications like conversational AI, where responsiveness is crucial.
  3. Lower Energy Consumption: Smaller models require less energy to run, making them more energy-efficient and environmentally friendly. This is especially beneficial for battery-powered devices, where energy efficiency is critical.
  4. Easier Deployment and Portability: Small LLMs are easier to deploy and distribute due to their compact size. They can be integrated into various applications and systems without specialized hardware or large-scale infrastructure. This portability allows for broader adoption and enables the development of more decentralized and edge-based applications.
  5. Privacy and Data Sovereignty: By running small LLMs locally, users can maintain greater control over their data and reduce the need to send sensitive information to remote servers or cloud platforms. This can help address privacy concerns and comply with data protection regulations.
  6. Cost-effectiveness: Smaller models generally require fewer computational resources, which can translate into lower operational costs, especially when running on cloud platforms or rented hardware. This cost-effectiveness can make LLM technology more accessible to smaller organizations and individual developers.
  7. Specialized Applications: While smaller models may not achieve the same level of performance as larger models on general tasks, they can be fine-tuned and optimized for specific applications or domains, potentially outperforming larger models in those specialized areas.

It’s important to note that the benefits of small LLMs come with trade-offs in performance and capabilities compared to their larger counterparts. However, small LLMs’ advantages in resource efficiency, portability, and cost-effectiveness can make them a compelling choice for many applications where high-end performance is not a critical requirement.
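To make the hardware point concrete, here is a rough back-of-the-envelope sketch of how much memory a model's weights need at different numeric precisions. It assumes memory is simply parameter count times bytes per parameter; real usage is higher because of activations, caches, and runtime overhead. The function name and the precision table are illustrative, not taken from any library.

```python
# Approximate bytes needed to store one parameter at common precisions.
# Assumption: weights only; activations and runtime overhead are ignored.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Estimate memory for model weights only, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Parameter counts at the scales covered in this article.
    models = {
        "66M model": 66e6,
        "1B model": 1e9,
        "7B model": 7e9,
        "13B model": 13e9,
    }
    for name, params in models.items():
        row = ", ".join(
            f"{p}: {weight_memory_gb(params, p):.2f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{name:10s} {row}")
```

Under these assumptions, a 7-billion-parameter model needs about 14 GB for its weights at 16-bit precision, which is why the 7B models below call for high-end hardware unless they are quantized.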

Smallest LLMs You Can Run on Local Devices


DistilBERT

  • Model Size: The base version has around 66M parameters, significantly smaller than BERT’s 110M parameters.
  • Description: DistilBERT is a distilled version of the BERT model, designed to be smaller and faster while retaining most of BERT’s performance. It uses knowledge distillation techniques to compress the large BERT model into a smaller version, making it more efficient and easier to deploy on local devices.
  • Hardware Requirements: DistilBERT’s compact size allows it to run on various local devices, including laptops, desktops, and even high-end mobile devices.

Hugging Face Link: DistilBERT
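As a rough illustration of the knowledge distillation idea mentioned above, the sketch below computes a soft-target loss between a teacher's and a student's output logits. It is a minimal, standard-library-only example of the general technique, not DistilBERT's actual training code (which combines several loss terms); the function names and the temperature value are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    print(distillation_loss([3.5, 1.2, 0.4], teacher))  # close match, small loss
    print(distillation_loss([0.1, 3.0, 2.0], teacher))  # poor match, larger loss
```

The student is trained to drive this loss toward zero, so it learns to reproduce the teacher's full output distribution rather than only the correct label.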


TinyBERT

  • Model Size: TinyBERT-4 has around 14M parameters, while TinyBERT-6 has around 67M.
  • Description: TinyBERT is an even more compact version of BERT, developed by researchers at Huawei Noah’s Ark Lab and Huazhong University of Science and Technology. It uses advanced techniques like layer-wise and attention distillation to achieve significant model compression while maintaining competitive performance on various NLP tasks.
  • Hardware Requirements: TinyBERT’s extremely small size allows it to run on a wide range of local devices, including low-end laptops, embedded systems, and mobile devices.

Hugging Face Link: TinyBERT


MobileBERT

  • Model Size: MobileBERT has around 25M parameters, significantly smaller than the original BERT base.
  • Description: MobileBERT is a compact and efficient BERT model for mobile and edge devices. It uses techniques like knowledge distillation and quantization to reduce the model size while maintaining high performance on a wide range of NLP tasks.
  • Hardware Requirements: As the name suggests, MobileBERT is optimized for running on mobile devices and other resource-constrained environments.

Hugging Face Link: MobileBERT
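Quantization, mentioned in the MobileBERT description, stores weights as small integers plus a scale factor instead of 32-bit floats. The sketch below shows a simple symmetric per-tensor int8 scheme using only the standard library. It illustrates the general idea under that assumption and is not the specific scheme used by MobileBERT or any particular runtime; the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in quantized]


if __name__ == "__main__":
    weights = [0.42, -1.0, 0.07, 0.0, 0.9]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    for w, r in zip(weights, restored):
        print(f"original={w:+.4f} restored={r:+.4f} error={abs(w - r):.5f}")
```

Each weight now takes one byte instead of four, at the cost of a rounding error of at most half the scale per weight.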


ALBERT

  • Model Size: It varies depending on the configuration; one of the smallest is ALBERT base, with 12 layers, 12 attention heads, and about 12M parameters.
  • Description: ALBERT (A Lite BERT) is designed for efficient memory usage and faster inference. It features a cross-layer parameter-sharing mechanism and reduced embedding size. It’s effective for various NLP tasks while lighter than the original BERT.
  • Hardware Requirements: ALBERT’s efficient design allows it to run on various local devices with moderate processing power.

Hugging Face Link: ALBERT
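The two ALBERT ideas described above, a reduced embedding size and cross-layer parameter sharing, can be illustrated with simple parameter arithmetic. The sketch below assumes a vocabulary of 30,000 tokens, a hidden size of 768, an embedding size of 128, and a feed-forward size of 3,072; these values and the helper names are illustrative, and the count covers only the main weight matrices, biases, and layer norms of a standard Transformer encoder layer.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Token embedding parameters, optionally factorized as V*E + E*H."""
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_layer_params(hidden_size, intermediate_size):
    """Weights and biases of one standard Transformer encoder layer."""
    attention = 4 * (hidden_size * hidden_size + hidden_size)  # Q, K, V, output
    feed_forward = (hidden_size * intermediate_size + intermediate_size) + (
        intermediate_size * hidden_size + hidden_size
    )
    layer_norms = 2 * (2 * hidden_size)  # two layer norms, scale and bias each
    return attention + feed_forward + layer_norms


def encoder_params(per_layer, num_layers, share_across_layers):
    """With sharing, every layer reuses the same single set of weights."""
    return per_layer if share_across_layers else per_layer * num_layers


if __name__ == "__main__":
    V, H, E, FF, LAYERS = 30_000, 768, 128, 3_072, 12
    layer = encoder_layer_params(H, FF)
    print("embeddings, unfactorized:", embedding_params(V, H))
    print("embeddings, factorized:  ", embedding_params(V, H, E))
    print("encoder, no sharing:     ", encoder_params(layer, LAYERS, False))
    print("encoder, shared layers:  ", encoder_params(layer, LAYERS, True))
```

Under these assumptions, sharing one layer's weights across all 12 layers cuts the encoder's parameter count by a factor of 12, and factorizing the embeddings shrinks them by roughly a factor of 6.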

GPT-2 Small

  • Model Size: GPT-2 Small has around 124M parameters (originally reported as 117M), significantly smaller than the larger GPT-2 models.
  • Description: GPT-2 Small is a smaller version of the popular GPT-2 (Generative Pre-trained Transformer 2) model developed by OpenAI. While not as compact as some of the other models, GPT-2 Small is still relatively lightweight and can be used for tasks like text generation, summarization, and language modeling.
  • Hardware Requirements: GPT-2 Small can be run on personal computers with moderate hardware specifications, such as mid-range laptops or desktops.

Hugging Face Link: GPT-2 Small


DeciCoder-1B

  • Model Size: 1 billion parameters
  • Description: DeciCoder-1B is a language model focused on code generation and understanding. It can assist with coding tasks like code completion, translation between programming languages, and explaining code. It is trained on a large corpus of source code and natural language descriptions.
  • Hardware Requirements: With its relatively small 1 billion parameter size, DeciCoder-1B can run on various local devices like laptops, desktops, and potentially high-end mobile devices or single-board computers.

Hugging Face Link: DeciCoder – 1B


Phi-1.5

  • Model Size: 1.3 billion parameters
  • Description: Phi-1.5 is a general-purpose language model capable of generating text, answering questions, understanding natural language, and handling other NLP tasks. It is designed to adapt to different domains and tasks through fine-tuning or prompting.
  • Hardware Requirements: Phi-1.5’s compact 1.3 billion parameter size allows it to be deployed on local devices with moderate computing resources, such as laptops, desktops, and potentially higher-end mobile or single-board computing devices.

Hugging Face Link: Phi-1.5


Dolly-v2-3b

  • Model Size: 2.8 billion parameters (commonly rounded to 3 billion)
  • Description: Dolly-v2-3b is an instruction-following language model that excels at understanding and executing detailed, multi-step prompts and instructions across various tasks.
  • Hardware Requirements: With 3 billion parameters, Dolly-v2-3b requires local devices with moderate to high computing power, like high-end laptops, desktops, or workstations.

Hugging Face Link: Dolly-v2-3b


StableLM-Zephyr-3B

  • Model Size: 3 billion parameters
  • Description: StableLM-Zephyr-3B is a language model trained to provide reliable and truthful responses. It is designed to be a stable and trustworthy model for various natural language processing tasks.
  • Hardware Requirements: Like Dolly-v2-3b, the 3-billion-parameter StableLM-Zephyr-3B can run on local devices with moderate to high computing capabilities, such as high-end laptops, desktops, or workstations.

Hugging Face Link: StableLM-Zephyr-3B


DeciLM-7B

  • Model Size: 7 billion parameters
  • Description: DeciLM-7B is a general-purpose language model for various natural language processing tasks. Its larger 7 billion parameter size offers improved performance over smaller models while still being compact enough for local deployment.
  • Hardware Requirements: To run DeciLM-7B locally, users will need access to systems with more powerful hardware, such as high-end desktops or workstations with capable GPUs or TPUs.

Hugging Face Link: DeciLM-7B


Mistral-7B-Instruct-v0.2

  • Model Size: 7 billion parameters
  • Description: Mistral-7B-Instruct-v0.2 is an instruction-following language model that can effectively handle complex multi-step instructions and tasks.
  • Hardware Requirements: Similar to DeciLM-7B, Mistral-7B-Instruct-v0.2 requires high-end local hardware, such as powerful desktops or workstations, to run its 7 billion parameters.

Hugging Face Link: Mistral-7B-Instruct-v0.2


Orca-2-7B

  • Model Size: 7 billion parameters
  • Description: Orca-2-7B is an open-source language model that provides safe, truthful, and human-aligned responses. It aims to generate outputs aligned with human values and ethics.
  • Hardware Requirements: The 7 billion parameter Orca-2-7B necessitates powerful local hardware like high-performance desktops or workstations to operate effectively.

Hugging Face Link: Orca-2-7B


Amber

  • Model Size: 7 billion parameters
  • Description: Amber is a multi-task language model designed to handle various natural language processing tasks with high performance across domains and applications.
  • Hardware Requirements: Running Amber’s 7 billion parameters locally requires access to high-end hardware, such as powerful desktops or workstations with capable GPUs or TPUs.

Hugging Face Link: Amber


OpenHathi-7B-Hi-v0.1-Base

  • Model Size: 7 billion parameters
  • Description: OpenHathi-7B-Hi-v0.1-Base is a large Hindi language model, one of the biggest openly available models for the Hindi language. It can understand and generate Hindi text.
  • Hardware Requirements: Like other 7B models, OpenHathi-7B-Hi-v0.1-Base requires high-performance local hardware, such as powerful desktops or workstations, to run effectively.

Hugging Face Link: OpenHathi-7B-Hi-v0.1-Base


SOLAR-10.7B-v1.0

  • Model Size: 10.7 billion parameters
  • Description: SOLAR-10.7B-v1.0 is a large general language model pushing the limits of what can run locally on consumer hardware. It offers enhanced performance for various NLP tasks.
  • Hardware Requirements: To deploy SOLAR-10.7B-v1.0 locally, users will need access to high-end consumer hardware with powerful GPUs or multi-GPU setups.

Hugging Face Link: SOLAR-10.7B-v1.0


NexusRaven-V2-13B

  • Model Size: 13 billion parameters
  • Description: NexusRaven-V2-13B is a large language model from Nexusflow focused on function calling: translating natural-language requests into calls to tools and APIs.
  • Hardware Requirements: At 13 billion parameters, NexusRaven-V2-13B requires very powerful hardware, such as high-end workstations or multi-GPU setups, to run locally.

Hugging Face Link: NexusRaven-V2-13B

While these compact LLMs offer significant portability and resource efficiency advantages, it’s important to note that they may not achieve the same level of performance as their larger counterparts on certain complex NLP tasks. However, for many applications that don’t require state-of-the-art performance, these smaller models can be a practical and accessible solution, especially when running on local devices with limited computational resources.


Conclusion

The availability of small language models that can run locally on your devices marks a significant step forward in AI and NLP. These models offer a practical blend of power, efficiency, and accessibility, allowing you to perform advanced natural language processing tasks without relying on cloud services or powerful data centers. As you experiment with these compact LLMs, you open up new avenues for innovation and creativity in your projects, whether you’re a seasoned developer, a researcher, or a hobbyist. The future of AI is no longer limited to massive models; instead, it’s about maximizing the potential of the hardware you already have. Discover what these small yet mighty models can achieve for you!


