Debugging And Testing LLMs in LangSmith

Sahitya Arya 13 Jun, 2024
11 min read


With the advancements in Artificial Intelligence, developing and deploying large language model (LLM) applications has become increasingly complex and demanding. To address these challenges, let’s explore LangSmith. LangSmith is a new cutting-edge DevOps platform designed to develop, collaborate, test, deploy, and monitor LLM applications. This article will explore how to debug and test LLMs in LangSmith.


  • Learn about LangSmith to simplify the development, testing, deployment, and monitoring of large language model (LLM) applications.
  • Gain an understanding of why LangSmith is essential in managing the complexities of LLMs.
  • Discover the comprehensive suite of features LangSmith offers.
  • Learn how LangSmith integrates with LangChain to streamline the transition from prototyping to production.
  • Understand the core components of LangSmith’s user interface to manage and refine LLM applications effectively.

What is LangSmith?

LangSmith is a comprehensive platform that streamlines the entire lifecycle of LLM application development, from ideation to production. It is a robust solution tailored to the unique requirements of working with LLMs, which are inherently massive and computationally intensive. When these LLM applications are deployed into production or specific use cases, they require a robust platform to evaluate their performance, enhance their speed, and trace their operational metrics.

Why is there a Need for LangSmith?

As the adoption of LLMs soars, the need for a dedicated platform to manage their complexities has become clear. Large Language Models are computationally intensive and require continuous monitoring, optimization, and collaboration for real-world effectiveness and reliability. LangSmith addresses these needs by providing a comprehensive suite of features, including the productionization of LLM applications, ensuring seamless deployment, efficient monitoring, and collaborative development.

Why Should One Choose LangSmith?

LangSmith offers a comprehensive suite of features for bringing LLMs into real-world production. Let’s explore these features:

  • Ease of Setup: LangSmith is user-friendly and allows rapid experiment initiation. Even a single programmer can efficiently manage and prototype AI applications with this framework.
  • Performance Monitoring and Visualization: Continuous monitoring and visualization are crucial for evaluating any deep learning model or application. LangSmith provides an excellent architecture for ongoing evaluation, ensuring optimal performance and reliability.
  • Collaborative Development: LangSmith facilitates seamless collaboration among developers, enabling efficient teamwork and streamlined project management.
  • Testing and Debugging: The platform simplifies the debugging process for new chains, agents, or sets of tools, ensuring quick issue resolution.
  • Dataset Management: LangSmith supports the creation and management of datasets for fine-tuning, few-shot prompting, and evaluation, ensuring models are trained with high-quality data.
  • Production Analytics: LangSmith captures detailed production analytics, providing valuable insights for continuous improvement and informed decision-making.

LangChain Integration

LangChain, a popular framework for building applications with large language models, simplifies the prototyping of LLM applications and agents. However, transitioning these applications to production can be unexpectedly challenging. Iterating on prompts, chains, and other components is essential for creating a high-quality product, and LangSmith streamlines this process by offering dedicated tools and features.

How LangSmith Comes Handy in LLM Application Development?

LangSmith addresses the critical needs of developing, deploying, and maintaining high-quality LLM applications in a production environment. With LangSmith, you can:

  • Quickly debug a new chain, agent, or set of tools, saving valuable time and resources.
  • Create and manage datasets for fine-tuning, few-shot prompting, and evaluation, ensuring your models are trained on high-quality data.
  • Run regression tests to advance your application confidently, minimizing the risk of introducing bugs or regressions.
  • Capture production analytics for product insights and continuous improvements, enabling data-driven decision-making.

Other Services LangSmith Offers for LLM Application Deployment

In addition to its core features, LangSmith offers several powerful services specifically tailored for LLM application development and deployment:

  • Traces: Traces provide insights into how language model calls are made using LCEL (LangChain Expression Language). You can trace the details of LLM calls to help with debugging, identify prompts that took a long time to execute, or detect failed executions. By analyzing these traces, you can improve the overall performance.
  • Hub: The Hub is a collaborative space for crafting, versioning, and commenting on prompts. As a team, you can create an initial version of a prompt, share it, and compare it with other versions to understand differences and improvements.
  • Annotation Queues: Annotation queues allow for adding human labels and feedback to traces, enhancing the accuracy and effectiveness of the LLM calls.

With its comprehensive suite of features and services, LangSmith is poised to revolutionize the way LLM applications are developed, deployed, and maintained. By addressing the unique challenges of working with these powerful models, LangSmith empowers developers and organizations to unlock the full potential of LLMs, paving the way for a future where AI-driven applications become an integral part of our daily lives.

Core Components of LangSmith UI

Core components of LangSmith's UI | debugging and testing LLMs | LLM development

LangSmith UI comprises four core components:

  • Projects: The Projects component is the foundation for building new LLM applications. It seamlessly integrates multiple LLM models from leading providers such as OpenAI and other organizations. This versatile component allows developers to leverage the capabilities of various LLMs, enabling them to create innovative and powerful applications tailored to their specific needs.
  • Datasets & Testing: Ensuring the quality and reliability of LLM applications is crucial, and LangSmith’s Datasets & Testing feature plays a pivotal role in this regard. It empowers developers to create and upload datasets designed for evaluation and training. These datasets can be used for benchmarking, establishing ground truth for evaluation, or fine-tuning the LLMs to enhance their performance and accuracy.
  • Annotation Queues: LangSmith recognizes the importance of human feedback in improving LLM applications. The Annotation Queues component lets users add valuable human annotations and feedback directly to their LLM projects. This feature facilitates the incorporation of human insights, helping to refine the models and enhance their effectiveness in real-world scenarios.
  • Prompts: The Prompts section is a centralized hub for managing and interacting with prompts essential for guiding LLM applications. Here, developers can create, modify, and experiment with prompts, tweaking them to achieve the desired results. This component streamlines the prompt development process and enables iterative improvements, ensuring that LLM applications deliver accurate and relevant responses.

With its comprehensive features and robust architecture, LangSmith empowers developers to efficiently build, test, and refine LLM applications throughout their entire lifecycle. From leveraging the latest LLM models to incorporating human feedback and managing datasets, LangSmith provides a seamless and streamlined experience, enabling developers to unlock the full potential of these powerful AI technologies.

How to Create a New Project in LangSmith?

Step 1: Explore the Default Project

Upon signing up for LangSmith, you’ll find that a default project is already enabled and ready to explore. However, as you delve deeper into LLM application development, you’ll likely want to create custom projects tailored to your needs.

Step 2: Create a New Project

To embark on this journey, simply navigate to the “Create New Project” section within the LangSmith platform. Here, you’ll be prompted to provide a name for your project, which should be descriptive and representative of the project’s purpose or domain.

Step 3: Add a Project Description

Additionally, LangSmith offers the option to include a detailed description of your project. This description can serve as a comprehensive overview, outlining the project’s objectives, intended use cases, or any other relevant information that will help you and your team members effectively collaborate and stay aligned throughout the development process.

Step 4: Incorporate Datasets

One of LangSmith’s key features is its ability to incorporate datasets for evaluation and training purposes. When creating a new project, you’ll notice a dropdown menu labeled “Choose Default.” Initially, this menu may not display any available datasets. However, LangSmith provides a seamless way to add your custom datasets.

By clicking on the “Add Dataset” button, you can upload or import the dataset you wish to use for your project. This could be a collection of text files, structured data, or any other relevant data source that will be the foundation for evaluating and fine-tuning your LLM models.

Step 5: Include Project Metadata

Furthermore, LangSmith allows you to include metadata with your project. Metadata can encompass a wide range of information, such as project tags, categories, or any other relevant details that will help you organize and manage your projects more effectively.

Step 6: Submit Your Project

Once you’ve provided the necessary project details, including the name, description (if applicable), dataset, and metadata, you can submit your new project for creation. With just a few clicks, LangSmith will set up a dedicated workspace for your LLM application development with the tools and resources you need to bring your ideas to life.

How to Create a New Project in LangSmith?

Step 7: Access and Manage Your Project

After creating your new project in LangSmith, easily access it by navigating to the “Projects” icon and sorting the list alphabetically by name. 

Your newly created project will be visible. Simply click on its name or details to open the dedicated workspace tailored for LLM application development. Within this workspace, you’ll find all the necessary tools and resources to develop, test, and refine your LLM application.

How to Create a New Project in LangSmith? | debugging and testing LLMs | LLM development
How to Create a New Project in LangSmith?

Step 8: Explore the “Test-1-Demo” Section

Access the “Test-1-Demo” Section

As you delve into your new project within LangSmith, you’ll notice the “Test-1-Demo” section. This area provides a comprehensive overview of your project’s performance, including detailed information about prompt testing, LLM calls, input/output data, and latency metrics.

Understand Initial Empty Sections

Initially, since you haven’t yet tested any prompts using the Prompt Playground or executed any Root Runs or LLM Calls, the sections for “All Runs,” “Input,” “Output,” and “All About Latency” may appear empty. However, this is where LangSmith’s analysis and filtering capabilities truly shine.

Step 8.3: Utilize “Stats Total Tokens”

On the right-hand side, you’ll find the “Stats Total Tokens” section, which offers various filtering options to help you gain insights into your project’s performance. For instance, you can apply filters to identify whether there were any interruptions during the execution or to analyze the time taken to generate the output.

Let’s explore LangSmith’s default project to understand these filtering capabilities better. By navigating to the default project and accessing the “Test-1-Demo” section, you can observe real-world examples of how these filters can be applied and the insights they can provide.

Apply Filtering Options

The filtering options within LangSmith allow you to slice and dice the performance data. Moreover, they enable you to identify bottlenecks, optimize prompts, and fine-tune your LLM models for optimal efficiency and accuracy. Whether you’re interested in analyzing latency, token counts, or any other relevant metrics, LangSmith’s powerful filtering tools empower you to comprehensively understand your project’s performance, paving the way for continuous improvement and refinement.

How to Create a New Project in LangSmith? | debugging and testing LLMs | LLM development

Explore Additional Filters

You’ll find various options and filters to explore under the “Default” project in the “Test-1-Demo” section. One option lets you view data from the “Last 2 Days,” providing insights into recent performance metrics. Additionally, you can access the “LLM Calls” option. This option offers detailed information about the interactions between your application and the LLMs employed. Therefore, enabling you to optimize performance and resource utilization.

How to Create a New Project in LangSmith?

Step 9: Create and Test Prompts

To analyze your project’s performance, you’ll need to begin by creating a prompt. Navigate to the left-hand icons and select the “Prompts” option, the last icon in the list. Here, you can create a new prompt by providing a descriptive name. Once you’ve created the prompt, proceed to the “Prompt Playground” section. In this area, you can input your prompt, execute it, and observe various factors such as latency, outputs, and other performance metrics. By leveraging the “Prompt Playground,” you can gain valuable insights into your project’s behavior, enabling you to optimize root runs, LLM calls, and overall efficiency.

How to Create a New Project in LangSmith? | debugging and testing LLMs | LLM development

To explore LangSmith’s capabilities, start by navigating to the “Prompts” section, represented by the last icon on the left-hand side of the interface. Here, you can create a new prompt by providing a descriptive name. Once you’ve named your prompt, proceed to the “Prompt Playground” area. This dedicated space allows you to input and execute your prompt, enabling you to analyze its performance and observe various metrics, such as latency and outputs.

Step 11: Integrate API Keys and Models

Next, click on the “+prompt” button. You will find fields for a System Message and a Human Message. Furthermore, you can also provide your OpenAI API key to use models like ChatGPT 3.5 or enter their respective API keys to use other available models. You can test several free models.

How to Create a New Project in LangSmith?

Experimenting with System and Human Messages in LangSmith

Here’s a sample System Message and Human Message to experiment with and analyze using LangSmith:

System Message

You are a counselor who answers students’ general questions to help them with their career options. You need to extract information from the user’s message, including the student’s name, level of studies, current grades, and preferable career options.

Human Message

Good morning. I am Shruti, and I am very confused about what subjects to take in high school next semester. In class 10, I took mathematics majors and biology. I am also interested in arts as I am very good at fine arts. However, my grades in maths and biology were not very good. They went down by 0.7 CGPA from a 4 CGPA in class 9. The response should be formatted like this: {student name: “”, current level of studies: “”, current grades: “”, career: “”}

Experimenting with System and Human Messages in LangSmith

When you submit it by selecting the model, you can adjust parameters like temperature to fine-tune, tweak, and improve its performance. After receiving the output, you can monitor the results for further performance enhancement.

Experimenting with System and Human Messages in LangSmith | debugging and testing LLMs | LLM development

Return to the project icon to see an update regarding the prompt experimentation. Click on it to review and analyze the results.

Experimenting with System and Human Messages in LangSmith

When you select the prompt versions you have tested, you can review their detailed characteristics to refine and enhance the output responses.

You will see information such as the number of tokens used, latency, and associated costs. Additionally, you can apply filters on the right-side panel to identify failed prompts or those that took more than 10 seconds to generate. This allows you to experiment, conduct further analysis, and improve performance.

Experimenting with System and Human Messages in LangSmith | debugging and testing LLMs | LLM development

Using the WebUI provided by LangSmith, you can trace, evaluate, and monitor your prompt versions. You can create prompts and choose to keep them public for sharing or private. Additionally, you can experiment with annotations and datasets for benchmarking purposes.


In conclusion, you can create a Retrieval-Augmented Generation (RAG) application with a vector database and integrate it seamlessly with LangChain and LangSmith. This integration allows for automated updates within LangSmith, enhancing the efficiency and effectiveness of your LLM development and its application. Stay tuned for the next article to delve deeper into this process. Additionally, we will explore additional advanced features and techniques to optimize your LLM workflows further.

Frequently Asked Questions

Q1. What is the difference between LangSmith and LangChain?

A. LangSmith is a DevOps platform designed for developing, testing, deploying, and monitoring large language model (LLM) applications. It offers tools for performance monitoring, dataset management, and collaborative development. LangChain, on the other hand, is a framework for building applications using LLMs, focusing on creating and managing prompts and chains. While LangChain aids in prototyping LLM applications, LangSmith supports their productionization and operational monitoring.

Q2. Is LangSmith free to use?

A. LangSmith offers a free tier that provides access to its core features, allowing users to start developing, testing, and deploying LLM applications without initial cost. However, for advanced features, larger datasets, and more extensive usage, LangSmith may require a subscription plan or pay-as-you-go model.

Q3. Can I use LangSmith without LangChain?

A. Yes, LangSmith can be used independently of LangChain.

Q4. Can I use LangSmith locally?

A. Currently, LangSmith is primarily a cloud-based platform, providing a comprehensive suite of tools and services for LLM application development and deployment. While local usage is limited, LangSmith offers robust API and integration capabilities, allowing developers to manage aspects of their LLM applications locally while leveraging cloud resources for more intensive tasks such as monitoring and dataset management.

Sahitya Arya 13 Jun, 2024

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Responses From Readers