Getting Started with LlaMA 2: A Beginner’s Guide

Ajay Last Updated : 08 Oct, 2024

7 min read

Introduction

With the release of GPT from OpenAI, many companies entered the race to create robust Generative Large Language Models of their own. Creating a Generative AI from scratch can involve a pretty cumbersome process, as it requires conducting thorough research in the field of Generative AI and performing numerous trials and errors. It also entails carefully curating a high-quality dataset, as the effectiveness of Large Language Models heavily depends on the data they are trained on. And lastly, it requires enormous computation power to train these models, which many companies cannot access. So as of now, only a few companies can create these LLMs, including OpenAI and Google, and now finally, Meta has joined this race with the introduction of LlaMA.

Learning Objectives

Get to know about the new version of LlaMA
Understanding the model’s versions, parameters, and model benchmarks
Getting access to the Llama 2 family of models
Trying LlaMA 2 with different prompts and observing the outputs

This article was published as a part of the Data Science Blogathon.

Introduction
What is LlaMA?
What is LlaMA 2?
How to Access to LlaMA 2?
Using LlaMA 2 with Hugging Face and Colab
- Hugging Face API Key
- Prompt Template
Conclusion
Frequently Asked Questions

What is LlaMA?

LlaMA (Large Language Model Meta AI) is a Generative AI model, specifically a group of foundational Large Language Models developed by Meta AI, a company owned by Meta(Formerly Facebook). Meta announced Llama in Feb of 2023. Meta released Llama in different sizes(based on parameters), i.e., 7,13,33, and 65 billion parameters with a context length of 2k tokens. The model is with the intent to help researchers advance their knowledge in the field of AI. The small 7B models allow researchers with low computation power to study these models.

With the introduction of LlaMa, Meta has entered the LLM space and is now competing with OpenAI’s GPT and Google’s PaLM models. Meta believes that retraining or fine-tuning small models with limited computation resources can achieve results on par with state-of-the-art models in their respective fields. Meta AI’s LlaMa differs from OpenAI and Google’s LLM because the LlaMA model family is completely Open Source and free for anyone to use, and it even released the LlaMA weights for researchers for non-commercial uses.

What is LlaMA 2?

LlaMA 2 surpasses the previous version, LlaMA version 1, which Meta released in July of 2023. It came out in three sizes: 7B, 13B, and 70B parameter models. Upon its release, LlaMA 2 achieved the highest score on Hugging Face. Even across all segments (7B, 13B, and 70B), the top-performing model on Hugging Face originates from LlaMA 2, having been fine-tuned or retrained.

Llama 2 was trained on 2 Trillion Pretraining Tokens. The context length for all the Llama 2 models is 4k(2x the context length of Llama 1). Llama 2 outperformed state-of-the-art open-source models such as Falcon and MPT in various benchmarks, including MMLU, TriviaQA, Natural Question, HumanEval, and others (You can find the comprehensive benchmark scores on Meta AI’s website). Furthermore, Llama 2 underwent fine-tuning for chat-related use cases, involving training with over 1 million human annotations. Developers can integrate the Llama 2 API into their applications, making it easier to deploy and leverage the model for real-time language generation tasks. These chat models are readily available to use on the Hugging Face website.

How to Access to LlaMA 2?

The source code for Llama 2 is available on GitHub. If you want to work with the original weights, these are also available, but for this, you need to provide your name and email to the Meta AIs website. So go to the Meta AI by clicking here, then enter your name, email address, and organization(student if you are not working). Then scroll down and click on accept and continue. Now you will get a mail stating that you can download the model weights. The form will look like the one below.

Now there are two ways to work with your model. One is to directly download the model through the instructions and link provided in the email(the hard way, and only good if you have a decent GPU), and the other is to use Hugging Face and Google Colab. In this article, I will go through the easy way, which anyone can try. Before going to Google Colab, we need to set up a Hugging Face account and create an Inference API. Then we need to go to the llama 2 model in Hugging Face(which you can do by clicking here), and then provide the email you provided to the Meta AI website. Then you will be authenticated and will be shown something similar to the below.

Now, we can download any Llama 2 model through Hugging Face and start working with it.

Using LlaMA 2 with Hugging Face and Colab

In the last section, we have seen the prerequisites before testing the Llama 2 model. We will start with importing necessary libraries in the Google Colab, which we can do with the pip command.

!pip install -q transformers einops accelerate langchain bitsandbytes

We need to install these necessary packages to start working with Llama 2. Also, the transformers library from hugging face to download the model. The einops function performs easy matrix multiplications within the model(it uses Einstein Operations/Summation notation), accelerates bits and bytes to speedup the inference, and langchain integrates our llama.

Next, to login into the Hugging Face through colab through the Hugging Face API Key, we can download the llama model; for this, we do the following.

!huggingface-cli login

Now we provide the Hugging Face Inference API key we created earlier. Then if it prompts Add token as git credential? (Y/n), Then you can reply with n. Now we are logged into Hugging Face API Key and are ready to download the model.

Hugging Face API Key

Now to download our model, we will write the following.

from langchain import HuggingFacePipeline
from transformers import AutoTokenizer
import transformers
import torch

model = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model)

pipeline = transformers.pipeline(
    "text-generation", 
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_length=1000,
    eos_token_id=tokenizer.eos_token_id
)

Here we are specifying the path to the Llama 2 7B version in Hugging Face to the model variable, which runs perfectly with Google Colab’s free-tier GPU. Anything above that will require additional VRAM, which is impossible with Colab’s free tier.
Then we download the tokenizer for the Llama 2 7B model by specifying the model name to the AutoTokenizer.from_pretrained() function.
Then we use the transformer pipeline function and pass all the parameters to it, like the model we will work with. The device_map = auto tokenizer will allow the model to use the GPU in colab if present.
We even specify the max output tokens as 1000 and set the torch data type to float16. Finally, we pass the eos_token_id, which the model will use to know when to stop while writing the answer.
After running this, the model will be downloaded to Colab, which will take some time as it is around 10GB. Now we will create a HuggingFacePipeline out of it through the below code.


llm = HuggingFacePipeline(pipeline = pipeline, model_kwargs = {'temperature':0})

Here we set the model’s temperature and pass the pipeline we created to the pipeline variable. This HuggingFacePipeline will now allow us to use the model that we have downloaded.

Prompt Template

We shall create a Prompt Template for our model and then test it.

from langchain import PromptTemplate,  LLMChain

template = """
              You are an intelligent chatbot that gives out useful information to humans.
              You return the responses in sentences with arrows at the start of each sentence
              {query}
           """

prompt = PromptTemplate(template=template, input_variables=["query"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

Here, the template is simple. We want the Llama model to answer the user’s query and return it as points with numbering.
Then we pass this template to the PrompTemplate function and assign the template and the input_variable parameters.
Finally, we chain our Llama LLM and the Prompt to start inferencing the model. Let’s ask a question about our model now.

print(llm_chain.run('What are the 3 causes of glacier meltdowns?'))

So we asked the model to list the three possible causes of glacier meltdowns, and the model returned the following:

We see that the model has done exceptionally well. The best part is that it used emoji numbering to represent the points and has exactly returned 3 points to the output. It even used the water tide emoji to represent the glaciers. This way, you can start working with the Llama 2 from Hugging Face and Colab.

Conclusion

In this article, we have briefly examined the LlaMA(Large Language Model Meta AI)models created and released by Meta AI. We have learned about the different model sizes of its and seen how version 2, i.e., Llama 2, clearly defeats the state-of-the-art Open Source LLMs at different benchmarks. Finally, we have gone through the process of getting access to the Llama 2 model trained weights. Finally, we walked through the Llama-2 7B chat version in the Google Colab through the Hugging Face and LangChain libraries.

Key Takeaways

Some of the key takeaways from this article include:

Meta develops llama models to help researchers understand more about AI.
Llama models, especially the smaller 7B version, can be trained efficiently and perform exceptionally well.
Through different benchmarks, it was proven that Llama 2 was ahead of the competition when compared to other state-of-the-art Open LLMs.
The main thing that makes Meta’s Llama 2 different from OpenAI’s GPT and Google’s PaLM is that it is Open Source, and anyone can use it for commercial applications.

Frequently Asked Questions

Q1. What is Llama / Llama 2?

A. LlaMA is a group of foundational LLMs developed by Meta AI, owned by Meta(Formerly Facebook); this was announced to the public in February 2023.

Q2. In how many sizes does Llama 2 come?

A. Llama 2 comes in 3 different sizes, they are 7B, 13B, and the 70B parameter model. All three of them work exceptionally well and can be fine-tuned easily.

Q3. Can we run Llama 2 on the local machine?

A. Yeah. It is possible to run the 7B model of Llama 2 on the local machine, which requires you to have at least 10GB of GPU VRAM for the model to work properly. Though quantized versions of Llama 2 7B are available, they require even less VRAM, and some can run only with the CPU.

Q4. Is Llama Open Source?

A. Meta AI has announced that Llama and Llama 2 will be open-sourced. They even provide the model weights if requested through a form on their website. Within hours after releasing Llama 2, many alternative Llama 2 models have sprung up in the Hugging Face.

Q5. What application can Llama be used?

A. With Llama, we can create applications like conversation chatbots, sentiment classification systems, summarization tools, and many more. In the future, developers will create even smaller versions that can work to develop Generative AI-enabled mobile applications.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

AI blogathon colab Generative AI google hugging face llama 2 Models OpenAI

Ajay

I work as a Developer in the field of Data Science. I constantly spend time learning new things be it related to AI, DataSceine, and CyberSecurity. Deep learning and machine learning are two topics that I find particularly fascinating, and Python is my preferred language for programming. Cyber Security is another field that I'm touching upon recently. I have experience with large-scale data analysis, and I have a solid grasp of a variety of deep learning and machine learning approaches, including neural networks, regression models, and natural language processing. I'm eager to take on new challenges and make a meaningful contribution to the industry, so I'm constantly seeking for ways to enlarge and deepen my knowledge and skills in the subject.

Beginner Generative AI LLMs

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Introduction to Generative AI

Introduction to Generative AI applications

No-code Generative AI app development

Code-focused Generative AI App Development

Introduction to Responsible AI

LLMS

Prompt Engineering

Finetuning LLMs

Training LLMs from Scratch

Langchain

RAG

LlamaIndex

Stable Diffusion

Getting Started with LlaMA 2: A Beginner’s Guide

Introduction

Learning Objectives

Table of contents

What is LlaMA?

What is LlaMA 2?

How to Access to LlaMA 2?

Using LlaMA 2 with Hugging Face and Colab

Hugging Face API Key

Prompt Template

Conclusion

Key Takeaways

Frequently Asked Questions

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Congratulations, You Did It!

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us

WebKlipperAuth

LinkedIn (16)

ln_or

JSESSIONID

li_rm

AnalyticsSyncHistory

lms_analytics

liap

visit

li_at

s_plt

lang

s_tp

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

s_pltp

s_tslv

li_theme

li_theme_set