Debugging And Testing LLMs in LangSmith

Sahitya Arya Last Updated : 13 Jun, 2024

11 min read

Introduction

With the advancements in Artificial Intelligence, developing and deploying large language model (LLM) applications has become increasingly complex and demanding. To address these challenges, let’s explore LangSmith. LangSmith is a new cutting-edge DevOps platform designed to develop, collaborate, test, deploy, and monitor LLM applications. This article will explore how to debug and test LLMs in LangSmith.

Overview

Learn about LangSmith to simplify the development, testing, deployment, and monitoring of large language model (LLM) applications.
Gain an understanding of why LangSmith is essential in managing the complexities of LLMs.
Discover the comprehensive suite of features LangSmith offers.
Learn how LangSmith integrates with LangChain to streamline the transition from prototyping to production.
Understand the core components of LangSmith’s user interface to manage and refine LLM applications effectively.

What is LangSmith?
Why is there a Need for LangSmith?
Why Should One Choose LangSmith?
LangChain Integration
How LangSmith Comes Handy in LLM Application Development?
Other Services LangSmith Offers for LLM Application Deployment
Core Components of LangSmith UI
How to Create a New Project in LangSmith?
Experimenting with System and Human Messages in LangSmith
Frequently Asked Questions

What is LangSmith?

LangSmith is a comprehensive platform that streamlines the entire lifecycle of LLM application development, from ideation to production. It is a robust solution tailored to the unique requirements of working with LLMs, which are inherently massive and computationally intensive. When these LLM applications are deployed into production or specific use cases, they require a robust platform to evaluate their performance, enhance their speed, and trace their operational metrics.

Why is there a Need for LangSmith?

As the adoption of LLMs soars, the need for a dedicated platform to manage their complexities has become clear. Large Language Models are computationally intensive and require continuous monitoring, optimization, and collaboration for real-world effectiveness and reliability. LangSmith addresses these needs by providing a comprehensive suite of features, including the productionization of LLM applications, ensuring seamless deployment, efficient monitoring, and collaborative development.

Why Should One Choose LangSmith?

LangSmith offers a comprehensive suite of features for bringing LLMs into real-world production. Let’s explore these features:

Ease of Setup: LangSmith is user-friendly and allows rapid experiment initiation. Even a single programmer can efficiently manage and prototype AI applications with this framework.
Performance Monitoring and Visualization: Continuous monitoring and visualization are crucial for evaluating any deep learning model or application. LangSmith provides an excellent architecture for ongoing evaluation, ensuring optimal performance and reliability.
Collaborative Development: LangSmith facilitates seamless collaboration among developers, enabling efficient teamwork and streamlined project management.
Testing and Debugging: The platform simplifies the debugging process for new chains, agents, or sets of tools, ensuring quick issue resolution.
Dataset Management: LangSmith supports the creation and management of datasets for fine-tuning, few-shot prompting, and evaluation, ensuring models are trained with high-quality data.
Production Analytics: LangSmith captures detailed production analytics, providing valuable insights for continuous improvement and informed decision-making.

LangChain Integration

LangChain, a popular framework for building applications with large language models, simplifies the prototyping of LLM applications and agents. However, transitioning these applications to production can be unexpectedly challenging. Iterating on prompts, chains, and other components is essential for creating a high-quality product, and LangSmith streamlines this process by offering dedicated tools and features.

How LangSmith Comes Handy in LLM Application Development?

LangSmith addresses the critical needs of developing, deploying, and maintaining high-quality LLM applications in a production environment. With LangSmith, you can:

Quickly debug a new chain, agent, or set of tools, saving valuable time and resources.
Create and manage datasets for fine-tuning, few-shot prompting, and evaluation, ensuring your models are trained on high-quality data.
Run regression tests to advance your application confidently, minimizing the risk of introducing bugs or regressions.
Capture production analytics for product insights and continuous improvements, enabling data-driven decision-making.

Other Services LangSmith Offers for LLM Application Deployment

In addition to its core features, LangSmith offers several powerful services specifically tailored for LLM application development and deployment:

Traces: Traces provide insights into how language model calls are made using LCEL (LangChain Expression Language). You can trace the details of LLM calls to help with debugging, identify prompts that took a long time to execute, or detect failed executions. By analyzing these traces, you can improve the overall performance.
Hub: The Hub is a collaborative space for crafting, versioning, and commenting on prompts. As a team, you can create an initial version of a prompt, share it, and compare it with other versions to understand differences and improvements.
Annotation Queues: Annotation queues allow for adding human labels and feedback to traces, enhancing the accuracy and effectiveness of the LLM calls.

With its comprehensive suite of features and services, LangSmith is poised to revolutionize the way LLM applications are developed, deployed, and maintained. By addressing the unique challenges of working with these powerful models, LangSmith empowers developers and organizations to unlock the full potential of LLMs, paving the way for a future where AI-driven applications become an integral part of our daily lives.

Core Components of LangSmith UI

Core components of LangSmith's UI | debugging and testing LLMs | LLM development

LangSmith UI comprises four core components:

Projects: The Projects component is the foundation for building new LLM applications. It seamlessly integrates multiple LLM models from leading providers such as OpenAI and other organizations. This versatile component allows developers to leverage the capabilities of various LLMs, enabling them to create innovative and powerful applications tailored to their specific needs.
Datasets & Testing: Ensuring the quality and reliability of LLM applications is crucial, and LangSmith’s Datasets & Testing feature plays a pivotal role in this regard. It empowers developers to create and upload datasets designed for evaluation and training. These datasets can be used for benchmarking, establishing ground truth for evaluation, or fine-tuning the LLMs to enhance their performance and accuracy.
Annotation Queues: LangSmith recognizes the importance of human feedback in improving LLM applications. The Annotation Queues component lets users add valuable human annotations and feedback directly to their LLM projects. This feature facilitates the incorporation of human insights, helping to refine the models and enhance their effectiveness in real-world scenarios.
Prompts: The Prompts section is a centralized hub for managing and interacting with prompts essential for guiding LLM applications. Here, developers can create, modify, and experiment with prompts, tweaking them to achieve the desired results. This component streamlines the prompt development process and enables iterative improvements, ensuring that LLM applications deliver accurate and relevant responses.

With its comprehensive features and robust architecture, LangSmith empowers developers to efficiently build, test, and refine LLM applications throughout their entire lifecycle. From leveraging the latest LLM models to incorporating human feedback and managing datasets, LangSmith provides a seamless and streamlined experience, enabling developers to unlock the full potential of these powerful AI technologies.

How to Create a New Project in LangSmith?

Step 1: Explore the Default Project

Upon signing up for LangSmith, you’ll find that a default project is already enabled and ready to explore. However, as you delve deeper into LLM application development, you’ll likely want to create custom projects tailored to your needs.

Step 2: Create a New Project

To embark on this journey, simply navigate to the “Create New Project” section within the LangSmith platform. Here, you’ll be prompted to provide a name for your project, which should be descriptive and representative of the project’s purpose or domain.

Step 3: Add a Project Description

Additionally, LangSmith offers the option to include a detailed description of your project. This description can serve as a comprehensive overview, outlining the project’s objectives, intended use cases, or any other relevant information that will help you and your team members effectively collaborate and stay aligned throughout the development process.

Step 4: Incorporate Datasets

One of LangSmith’s key features is its ability to incorporate datasets for evaluation and training purposes. When creating a new project, you’ll notice a dropdown menu labeled “Choose Default.” Initially, this menu may not display any available datasets. However, LangSmith provides a seamless way to add your custom datasets.

By clicking on the “Add Dataset” button, you can upload or import the dataset you wish to use for your project. This could be a collection of text files, structured data, or any other relevant data source that will be the foundation for evaluating and fine-tuning your LLM models.

Step 5: Include Project Metadata

Furthermore, LangSmith allows you to include metadata with your project. Metadata can encompass a wide range of information, such as project tags, categories, or any other relevant details that will help you organize and manage your projects more effectively.

Step 6: Submit Your Project

Once you’ve provided the necessary project details, including the name, description (if applicable), dataset, and metadata, you can submit your new project for creation. With just a few clicks, LangSmith will set up a dedicated workspace for your LLM application development with the tools and resources you need to bring your ideas to life.

How to Create a New Project in LangSmith?

Step 7: Access and Manage Your Project

After creating your new project in LangSmith, easily access it by navigating to the “Projects” icon and sorting the list alphabetically by name.

Your newly created project will be visible. Simply click on its name or details to open the dedicated workspace tailored for LLM application development. Within this workspace, you’ll find all the necessary tools and resources to develop, test, and refine your LLM application.

How to Create a New Project in LangSmith? | debugging and testing LLMs | LLM development

Step 8: Explore the “Test-1-Demo” Section

Access the “Test-1-Demo” Section

As you delve into your new project within LangSmith, you’ll notice the “Test-1-Demo” section. This area provides a comprehensive overview of your project’s performance, including detailed information about prompt testing, LLM calls, input/output data, and latency metrics.

Understand Initial Empty Sections

Initially, since you haven’t yet tested any prompts using the Prompt Playground or executed any Root Runs or LLM Calls, the sections for “All Runs,” “Input,” “Output,” and “All About Latency” may appear empty. However, this is where LangSmith’s analysis and filtering capabilities truly shine.

Step 8.3: Utilize “Stats Total Tokens”

On the right-hand side, you’ll find the “Stats Total Tokens” section, which offers various filtering options to help you gain insights into your project’s performance. For instance, you can apply filters to identify whether there were any interruptions during the execution or to analyze the time taken to generate the output.

Let’s explore LangSmith’s default project to understand these filtering capabilities better. By navigating to the default project and accessing the “Test-1-Demo” section, you can observe real-world examples of how these filters can be applied and the insights they can provide.

Apply Filtering Options

The filtering options within LangSmith allow you to slice and dice the performance data. Moreover, they enable you to identify bottlenecks, optimize prompts, and fine-tune your LLM models for optimal efficiency and accuracy. Whether you’re interested in analyzing latency, token counts, or any other relevant metrics, LangSmith’s powerful filtering tools empower you to comprehensively understand your project’s performance, paving the way for continuous improvement and refinement.

Explore Additional Filters

You’ll find various options and filters to explore under the “Default” project in the “Test-1-Demo” section. One option lets you view data from the “Last 2 Days,” providing insights into recent performance metrics. Additionally, you can access the “LLM Calls” option. This option offers detailed information about the interactions between your application and the LLMs employed. Therefore, enabling you to optimize performance and resource utilization.

Step 9: Create and Test Prompts

To analyze your project’s performance, you’ll need to begin by creating a prompt. Navigate to the left-hand icons and select the “Prompts” option, the last icon in the list. Here, you can create a new prompt by providing a descriptive name. Once you’ve created the prompt, proceed to the “Prompt Playground” section. In this area, you can input your prompt, execute it, and observe various factors such as latency, outputs, and other performance metrics. By leveraging the “Prompt Playground,” you can gain valuable insights into your project’s behavior, enabling you to optimize root runs, LLM calls, and overall efficiency.

To explore LangSmith’s capabilities, start by navigating to the “Prompts” section, represented by the last icon on the left-hand side of the interface. Here, you can create a new prompt by providing a descriptive name. Once you’ve named your prompt, proceed to the “Prompt Playground” area. This dedicated space allows you to input and execute your prompt, enabling you to analyze its performance and observe various metrics, such as latency and outputs.

Step 11: Integrate API Keys and Models

Next, click on the “+prompt” button. You will find fields for a System Message and a Human Message. Furthermore, you can also provide your OpenAI API key to use models like ChatGPT 3.5 or enter their respective API keys to use other available models. You can test several free models.

Experimenting with System and Human Messages in LangSmith

Here’s a sample System Message and Human Message to experiment with and analyze using LangSmith:

System Message

You are a counselor who answers students’ general questions to help them with their career options. You need to extract information from the user’s message, including the student’s name, level of studies, current grades, and preferable career options.

Human Message

Good morning. I am Shruti, and I am very confused about what subjects to take in high school next semester. In class 10, I took mathematics majors and biology. I am also interested in arts as I am very good at fine arts. However, my grades in maths and biology were not very good. They went down by 0.7 CGPA from a 4 CGPA in class 9. The response should be formatted like this: {student name: “”, current level of studies: “”, current grades: “”, career: “”}

When you submit it by selecting the model, you can adjust parameters like temperature to fine-tune, tweak, and improve its performance. After receiving the output, you can monitor the results for further performance enhancement.

Experimenting with System and Human Messages in LangSmith | debugging and testing LLMs | LLM development

Return to the project icon to see an update regarding the prompt experimentation. Click on it to review and analyze the results.

When you select the prompt versions you have tested, you can review their detailed characteristics to refine and enhance the output responses.

You will see information such as the number of tokens used, latency, and associated costs. Additionally, you can apply filters on the right-side panel to identify failed prompts or those that took more than 10 seconds to generate. This allows you to experiment, conduct further analysis, and improve performance.

Using the WebUI provided by LangSmith, you can trace, evaluate, and monitor your prompt versions. You can create prompts and choose to keep them public for sharing or private. Additionally, you can experiment with annotations and datasets for benchmarking purposes.

Conclusion

In conclusion, you can create a Retrieval-Augmented Generation (RAG) application with a vector database and integrate it seamlessly with LangChain and LangSmith. This integration allows for automated updates within LangSmith, enhancing the efficiency and effectiveness of your LLM development and its application. Stay tuned for the next article to delve deeper into this process. Additionally, we will explore additional advanced features and techniques to optimize your LLM workflows further.

Frequently Asked Questions

Q1. What is the difference between LangSmith and LangChain?

A. LangSmith is a DevOps platform designed for developing, testing, deploying, and monitoring large language model (LLM) applications. It offers tools for performance monitoring, dataset management, and collaborative development. LangChain, on the other hand, is a framework for building applications using LLMs, focusing on creating and managing prompts and chains. While LangChain aids in prototyping LLM applications, LangSmith supports their productionization and operational monitoring.

Q2. Is LangSmith free to use?

A. LangSmith offers a free tier that provides access to its core features, allowing users to start developing, testing, and deploying LLM applications without initial cost. However, for advanced features, larger datasets, and more extensive usage, LangSmith may require a subscription plan or pay-as-you-go model.

Q3. Can I use LangSmith without LangChain?

A. Yes, LangSmith can be used independently of LangChain.

Q4. Can I use LangSmith locally?

A. Currently, LangSmith is primarily a cloud-based platform, providing a comprehensive suite of tools and services for LLM application development and deployment. While local usage is limited, LangSmith offers robust API and integration capabilities, allowing developers to manage aspects of their LLM applications locally while leveraging cloud resources for more intensive tasks such as monitoring and dataset management.

Sahitya Arya

I'm Sahitya Arya, a seasoned Deep Learning Engineer with one year of hands-on experience in both Deep Learning and Machine Learning. Throughout my career, I've authored more than three research papers and have gained a profound understanding of Deep Learning techniques. Additionally, I possess expertise in Large Language Models (LLMs), contributing to my comprehensive skill set in cutting-edge technologies for artificial intelligence.

Beginner Langchain Large Language Models LLMs

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Introduction to Generative AI

Introduction to Generative AI applications

No-code Generative AI app development

Code-focused Generative AI App Development

Introduction to Responsible AI

LLMS

Prompt Engineering

Finetuning LLMs

Training LLMs from Scratch

Langchain

RAG

LlamaIndex

Stable Diffusion

Debugging And Testing LLMs in LangSmith

Introduction

Overview

Table of contents

What is LangSmith?

Why is there a Need for LangSmith?

Why Should One Choose LangSmith?

LangChain Integration

How LangSmith Comes Handy in LLM Application Development?

Other Services LangSmith Offers for LLM Application Deployment

Core Components of LangSmith UI

How to Create a New Project in LangSmith?

Step 1: Explore the Default Project

Step 2: Create a New Project

Step 3: Add a Project Description

Step 4: Incorporate Datasets

Step 5: Include Project Metadata

Step 6: Submit Your Project

Step 7: Access and Manage Your Project

Step 8: Explore the “Test-1-Demo” Section

Access the “Test-1-Demo” Section

Understand Initial Empty Sections

Step 8.3: Utilize “Stats Total Tokens”

Apply Filtering Options

Explore Additional Filters

Step 9: Create and Test Prompts

Step 11: Integrate API Keys and Models

Experimenting with System and Human Messages in LangSmith

System Message

Human Message

Conclusion

Frequently Asked Questions

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us