10 Exciting Projects on Large Language Models (LLMs)

Aayush Tyagi Last Updated : 30 Jul, 2024

Hey job seekers! Want to get noticed? Share your work with potential employers, especially if you’re in software development or data science. A portfolio of your projects, blog posts, and open-source contributions can set you apart from other candidates. You can demonstrate your skills by building smaller projects from start to finish. With advanced large language models (LLMs), even developers with limited experience can create impressive projects. So go ahead, build cool things, and show off your skills in new and exciting ways!

This article shares 10 side project ideas that use LLMs for downstream tasks. These LLM projects will help you demonstrate your capabilities and creativity.

So what are you waiting for? Start building that portfolio and let your skills and passion shine!

List of Top 10 Projects on Large Language Models (LLMs)

Here is the list of the top 10 projects on large language models (LLMs):

Cover Letter Generator
Customized ChatBot
YouTube or Podcast Summarizer
Information Extraction
Web Scraper
Question Answering over Documents
Clustering and Classification of Documents into Topics
Plagiarism Checker
Fake News Detector
Personalized News Aggregator
Speech Recognition

The sections below describe each of these projects and, where available, the rough steps to build it.

Cover Letter Generator

Large language models (LLMs) can generate coherent text, which is useful for a variety of purposes, such as copywriting, programming, and writing cover letters. While some people express concern that LLMs could facilitate the creation of fake news or enable cheating on schoolwork, others are actively leveraging LLMs to enhance productivity and foster creativity.

If you are looking for a new job, you might want to consider creating a cover letter generator using an LLM. You could technically write each cover letter by manually engineering the perfect prompt and filling it with the relevant information about each job, but this would be time-consuming and repetitive.

An LLM-powered cover letter generator could save you a lot of time and effort, and it could help you to create more effective cover letters.
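A minimal sketch of that idea is a reusable prompt template that you fill in per job. The template wording, the field names, and the `llm` callable (a stand-in for whichever LLM client you use) are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical prompt template; the wording is an assumption, not a recommended prompt.
COVER_LETTER_TEMPLATE = """You are helping a job applicant write a cover letter.

Job title: {job_title}
Company: {company}
Job description:
{job_description}

Applicant background:
{background}

Write a concise, professional cover letter tailored to this job."""


def build_cover_letter_prompt(job_title: str, company: str,
                              job_description: str, background: str) -> str:
    """Fill the template so the same prompt can be reused for every application."""
    return COVER_LETTER_TEMPLATE.format(
        job_title=job_title.strip(),
        company=company.strip(),
        job_description=job_description.strip(),
        background=background.strip(),
    )


def generate_cover_letter(llm: Callable[[str], str], **fields: str) -> str:
    """Send the filled prompt to the LLM client passed in as `llm`."""
    return llm(build_cover_letter_prompt(**fields))
```

Passing the model in as a plain function keeps the sketch independent of any particular provider, so you can try it offline with a dummy function before wiring in a real API.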

Customized ChatBot

You’ve heard of ChatGPT. I don’t need to go into detail here. Its conversational capabilities are pretty impressive. But it lacks personality and has limited information. What if you could give it access to specific knowledge or even a full personality?

One real-world example is not only cute and whimsical but also serves a therapeutic purpose: Michelle Huang built a chatbot based on her diaries to chat with her childhood self.

A second example comes from fiction. In a “Black Mirror” episode called “Be Right Back” from 2013, the grieving protagonist reconnects with her late boyfriend after learning about a service that lets people stay in touch with the deceased.

Ten years later, you could technically build this on your laptop as a weekend project…

Although this example is a bit morbid, who’s to say we won’t see this technology help us grieve in the future?

Here are the rough steps you would follow to realize a project like this:

Collect data from your old diaries or chat history and load it into documents
Feed an LLM the contextual information in the prompt
Add conversational memory
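The steps above can be sketched as a small class that puts the documents and the recent conversation into every prompt. The class name `DiaryChatBot`, the prompt wording, and the `llm` callable are hypothetical; a real project would plug in an actual LLM client and would likely retrieve only the most relevant diary entries.

```python
from typing import Callable, List, Tuple


class DiaryChatBot:
    """Chatbot that sees the supplied documents and remembers recent exchanges."""

    def __init__(self, llm: Callable[[str], str], documents: List[str], max_turns: int = 5):
        self.llm = llm
        self.documents = documents
        self.max_turns = max_turns  # number of past exchanges kept in the prompt
        self.history: List[Tuple[str, str]] = []

    def build_prompt(self, user_message: str) -> str:
        # Contextual information: every document is pasted into the prompt.
        context = "\n".join(f"- {doc}" for doc in self.documents)
        # Conversational memory: only the most recent exchanges are included.
        recent = self.history[-self.max_turns:] if self.max_turns > 0 else []
        memory = "\n".join(f"User: {u}\nBot: {b}" for u, b in recent)
        return (
            "Answer in the voice of the person who wrote these diary entries.\n"
            f"Diary entries:\n{context}\n\n"
            f"Conversation so far:\n{memory}\n\n"
            f"User: {user_message}\nBot:"
        )

    def chat(self, user_message: str) -> str:
        reply = self.llm(self.build_prompt(user_message))
        self.history.append((user_message, reply))
        return reply
```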

YouTube or Podcast Summarizer

LLMs are useful for summarizing the vast amount of content available today, especially across different mediums like text, audio (e.g., podcasts), and video.

Podcasts often refer back to older episodes that we may have missed, so it is convenient to be able to search for the relevant episodes and get their key points.

For instance, you could summarize YouTube videos, or make a creator’s episodes searchable so that you can ask questions about specific topics. To achieve this, you would download the transcript, split it into manageable chunks, summarize the text using an LLM, and optionally create a user-friendly interface.

Here are the rough steps you would follow to realize this project:

Download the video or podcast transcript and load it into documents
Split long documents into chunks
Summarize the transcript with an LLM
Optional: Wrap it all in a user-friendly command line interface or even a web application
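As a rough illustration of the split-and-summarize steps, the sketch below chunks a transcript by word count, summarizes each chunk, and then summarizes the summaries. The function names, the prompt wording, the 200-word chunk size, and the `llm` callable are assumptions; downloading the transcript is left out.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Split a long transcript into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_transcript(llm: Callable[[str], str], transcript: str,
                         max_words: int = 200) -> str:
    """Summarize each chunk, then combine the partial summaries into one."""
    chunks = split_into_chunks(transcript, max_words)
    if not chunks:
        return ""
    partial = [llm(f"Summarize this part of a transcript:\n{chunk}") for chunk in chunks]
    if len(partial) == 1:
        return partial[0]
    combined = "\n".join(partial)
    return llm(f"Combine these partial summaries into one summary:\n{combined}")
```

Summarizing chunk by chunk keeps each prompt within the model's context limit, at the cost of one extra call to merge the results.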

Information Extraction

LLMs can be utilized for information extraction by providing them with examples containing text and the desired information to extract. By adding a component to extract relevant information from job postings directly, the cover letter generator can be further enhanced.

To achieve this, you would load the job description into a document and write a few-shot prompt, that is, a prompt containing worked examples, so that the LLM can extract the relevant information.

Here are the rough steps you would follow to realize this project:

Load job description from job posting into a document
Extract the relevant information with the LLM, using a prompt that includes examples
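Here is one possible sketch of such a prompt with examples. The example posting, the field names (`title`, `company`, `location`, `skills`), and the `llm` callable are made up for illustration; the code also assumes the model replies with plain JSON, which real models do not always do.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical worked example shown to the model; replace with real postings.
EXAMPLES: List[Tuple[str, Dict]] = [
    ("We are hiring a Data Analyst at Acme in Berlin. SQL required.",
     {"title": "Data Analyst", "company": "Acme", "location": "Berlin", "skills": ["SQL"]}),
]


def build_extraction_prompt(job_posting: str,
                            examples: List[Tuple[str, Dict]] = EXAMPLES) -> str:
    """Build a prompt that shows example postings with their expected JSON output."""
    parts = ["Extract the job details as JSON with keys title, company, location, skills."]
    for text, fields in examples:
        parts.append(f"Posting: {text}\nJSON: {json.dumps(fields)}")
    parts.append(f"Posting: {job_posting}\nJSON:")
    return "\n\n".join(parts)


def extract_job_info(llm: Callable[[str], str], job_posting: str) -> Dict:
    """Ask the LLM for JSON and parse it; return an empty dict if parsing fails."""
    raw = llm(build_extraction_prompt(job_posting))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}
```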

Web Scraper

LLMs are highly proficient in transforming texts to suit various needs such as changing the writing style to match that of a particular publication like “The Economist” or “New Yorker.”

They can also adjust the reading level for easy comprehension, reformat information across different formats, correct spelling and grammar, and translate text from one language to another. It is common practice to use LLMs for converting text from one form to another.

An innovative way to utilize the rewriting potential of LLMs is through web scraping. Writing a web scraper can be tedious, but with LLMs, you could develop a more versatile solution for extracting data from unstructured websites.

Here are the rough steps you would follow to realize this project:

Scrape the website’s source code and load it into a document
Split long documents into chunks
Extract the relevant data from the source code using the LLM (see Information Extraction)
Reformat the extracted data into the desired format with the LLM, using a prompt that includes examples
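Before handing a page to the LLM, it usually helps to strip the markup so the prompt contains only visible text. The sketch below does that with Python's standard `html.parser` module and then builds an extraction prompt; the class and function names and the prompt wording are assumptions, and fetching the page is left out.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML page, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_scrape_prompt(page_text: str, fields: List[str]) -> str:
    """Ask the LLM to pull the named fields out of the page text as JSON."""
    return (
        f"From the page text below, extract these fields as JSON: {', '.join(fields)}.\n\n"
        f"Page text:\n{page_text}"
    )
```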

Question Answering over Documents

The process of question-answering can be seen as a fusion of search and summarization techniques. It has the potential to facilitate a more user-friendly approach to dealing with any type of document.

If you wish to undertake a similar project, then consider following these basic steps:

Load the source content into documents.
Divide lengthy documents into smaller segments.
Create embeddings using an embedding model and save them for each document.
Query the index of embeddings to retrieve the most relevant context, then prompt the LLM to generate an answer based on it.
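The retrieve-then-answer flow can be sketched in a few lines. To keep the example runnable without any model, `embed` below is a toy stand-in that uses word counts instead of a real embedding model, and `llm` is a placeholder callable; all names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class DocumentIndex:
    """Stores one embedding per chunk and returns the chunks closest to a query."""

    def __init__(self, chunks: List[str]) -> None:
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query: str, k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def answer(llm: Callable[[str], str], index: DocumentIndex, question: str, k: int = 2) -> str:
    """Retrieve the top-k chunks and ask the LLM to answer from them."""
    context = "\n".join(index.search(question, k))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```

In a real project you would swap `embed` for an actual embedding model and typically store the vectors in a vector database rather than a Python list.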

Clustering and Classification of Documents into Topics

In addition to retrieving information from documents, embeddings can be used to group documents into topics with unsupervised clustering techniques.

If you are interested in undertaking a similar project, here’s a basic outline of the steps involved:

Transform content into documents.
Segment lengthy documents into smaller parts.
Use an embeddings model to create embeddings from the documents and save them.
Apply a clustering algorithm that takes embeddings as input to cluster those documents.
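For the clustering step, one common choice is k-means. The sketch below is a small pure-Python version that takes embeddings as lists of floats; in practice you would more likely use a library implementation. The function names, the iteration count, and the fixed seed are assumptions.

```python
import random
from typing import List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(embeddings: List[Vector], k: int, iterations: int = 20, seed: int = 0) -> List[int]:
    """Return a cluster label (0..k-1) for each embedding."""
    rng = random.Random(seed)
    centroids = rng.sample(embeddings, k)  # start from k distinct data points
    labels = [0] * len(embeddings)
    for _ in range(iterations):
        # Assignment step: attach each embedding to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in embeddings
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(embeddings, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels
```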

Classifying Inquiries

Whereas clustering groups documents without labels, classification assigns documents to predefined categories in a supervised manner, using labeled examples.

If you want to create a similar project, here’s a brief guide on the key steps:

Transform emails into documents.
Create embeddings using an embedding model and save them for each document.
Utilize the embeddings to train a classifier that can categorize the documents based on certain criteria.
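As a minimal sketch of training a classifier on embeddings, the example below uses a nearest-centroid classifier: it averages the embeddings of each category and assigns a new email to the closest average. This is a deliberately simple stand-in for whichever classifier you prefer, and the function names and category labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Vector = List[float]


def train_centroids(embeddings: List[Vector], labels: List[str]) -> Dict[str, Vector]:
    """Average the embeddings of each label to get one centroid per category."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        grouped[label].append(vector)
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(centroids: Dict[str, Vector], embedding: Vector) -> str:
    """Return the label whose centroid is closest to the given embedding."""
    return min(
        centroids,
        key=lambda label: sum((x - y) ** 2 for x, y in zip(embedding, centroids[label])),
    )
```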

Plagiarism Checker

Plagiarism is common both online and in academic settings, and copied content can be hard to spot by hand. Bloggers, educators, and news organizations may all need to check written works for plagiarism.

News Projects

Fake News Detector

With the rise of fake news online, there is a growing need for tools to detect false information. LLMs can be used to identify inconsistencies and inaccuracies in news articles.
To undertake this project, you would need to train the model on a dataset of real and fake news articles, test the accuracy of the model on new articles, and present the results in a user-friendly manner.
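Whatever model you train, you will want to hold out some labeled articles and measure accuracy on them. The sketch below shows one simple way to do that; it assumes the examples are `(text, label)` pairs that have already been shuffled, and `classifier` is a placeholder for your trained model.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (article text, label such as "real" or "fake")


def train_test_split(examples: List[Example],
                     test_ratio: float = 0.25) -> Tuple[List[Example], List[Example]]:
    """Split already-shuffled examples into a training part and a held-out test part."""
    cut = int(len(examples) * (1 - test_ratio))
    return examples[:cut], examples[cut:]


def accuracy(classifier: Callable[[str], str], test_set: List[Example]) -> float:
    """Fraction of held-out articles whose predicted label matches the true label."""
    if not test_set:
        return 0.0
    correct = sum(1 for text, label in test_set if classifier(text) == label)
    return correct / len(test_set)
```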

Personalized News Aggregator

News aggregators can personalize content for users by using LLMs to analyze their reading history and present articles that align with their interests.
To undertake this project, you would need to collect data on the user’s reading habits, use an LLM to analyze the text of news articles, and present the results in a user-friendly manner. This could involve creating a mobile app or browser extension.

Speech Recognition

LLMs can also be used for speech recognition, which involves transcribing spoken words into text. This technology has practical applications in areas such as virtual assistants and transcription services.
To undertake this project, you need to train the model on a dataset of audio files and their corresponding transcripts, test its accuracy on new audio files, and create a user interface for users to input audio files to be transcribed.
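A common way to test transcription accuracy is the word error rate (WER): the word-level edit distance between the reference transcript and the model's output, divided by the number of reference words. The article does not prescribe a metric, so treat this choice and the function name as assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # a reference word is missing from the hypothesis
                curr[j - 1] + 1,     # the hypothesis has an extra word
                prev[j - 1] + cost,  # words match or one is substituted
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 0 means the transcript matches the reference exactly; lower is better, and values above 1 are possible when the output contains many extra words.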

Conclusion

Creating a portfolio of your projects, blog posts, and open-source contributions is an excellent way to showcase your skills and set yourself apart from other job candidates, especially in software development or data science. With the help of advanced large language models (LLMs), even developers with limited experience can create impressive projects. This article has shared 10 side project ideas that use LLMs for downstream tasks such as cover letter generation, web scraping, speech recognition, question answering over documents, and more. By building smaller projects from start to finish with LLMs, you can demonstrate your creativity, productivity, and problem-solving skills. So don’t wait any longer: start building your portfolio today and let your skills and passion shine with these exciting LLM projects!

How can I showcase my skills using projects with large language models (LLMs)?

Build a portfolio with projects like cover letter generators, customized chatbots, and web scrapers. These demonstrate your creativity, productivity, and problem-solving skills.

What are some practical project ideas involving LLMs to build my portfolio?

1. Cover Letter Generator
2. Customized ChatBot
3. YouTube or Podcast Summarizer
4. Information Extraction Tool
5. Web Scraper

Aayush Tyagi

Data Analyst with over 2 years of experience in leveraging data insights to drive informed decisions. Passionate about solving complex problems and exploring new trends in analytics. When not diving deep into data, I enjoy playing chess, singing, and writing shayari.
