Let’s be honest: building AI agents is exciting, but debugging them, not so much. As we push the boundaries of agentic AI, the complexity of our systems is skyrocketing. We have all been there, staring at a trace with hundreds of steps, trying to figure out why an agent hallucinated or chose the wrong tool. Integrated into LangSmith, Polly is an AI-powered assistant designed to help developers debug, analyze, and engineer better agents. It is a meta layer of intelligence, ironically an agent for agents. This article covers Polly’s setup, its capabilities, and how it helps you build better agents.
The transition from simple LLM chains to autonomous agents has introduced a new class of debugging challenges that manual inspection can no longer solve efficiently. LangChain identified that agents are fundamentally harder to engineer due to three factors:
Polly solves this by acting as a partner that understands agent architectures, allowing you to bypass manual log scanning and instead ask natural-language questions about your system’s performance.
Since Polly is an embedded feature of LangSmith, you don’t install Polly directly. Instead, you enable LangSmith tracing in your application. Once your agent’s data is flowing into the platform, Polly activates automatically.
First, ensure you have the LangSmith SDK in your environment. Run the following command in your terminal:
pip install -U langsmith
Get your API key from the LangSmith settings page and set the following environment variables. This tells your application to start logging traces to the LangSmith cloud.
import os
# Enable tracing (required for Polly to see your data)
os.environ["LANGSMITH_TRACING"] = "true"
# Set your API Key
os.environ["LANGSMITH_API_KEY"] = "ls__..."
# Optional: Organize your traces into a specific project
os.environ["LANGSMITH_PROJECT"] = "my-agent-production"
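If you prefer not to hard-code configuration in Python, the same variables can be exported in your shell before launching the application. This is a minimal sketch; the API key below is a placeholder, not a real key.

```shell
# Equivalent shell setup: export the variables before starting your app.
export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="ls__..."            # placeholder; use your real key
export LANGSMITH_PROJECT="my-agent-production"

# Quick sanity check that the variables are set
echo "$LANGSMITH_TRACING $LANGSMITH_PROJECT"
```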
That’s it. If you’re using LangChain, tracing is automatic. If you’re using the OpenAI SDK directly, wrap your client to enable visibility.
from openai import OpenAI
from langsmith import wrappers
# Wrap the OpenAI client to capture inputs/outputs automatically
client = wrappers.wrap_openai(OpenAI())
# Run your agent as normal
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze the latest Q3 financial report."}],
)
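For custom agent code that uses neither LangChain nor the OpenAI SDK, LangSmith’s `@traceable` decorator can log a function’s inputs and outputs as a run. The sketch below is illustrative: `summarize_report` is a hypothetical placeholder for a real LLM call, and a no-op fallback is included so the snippet runs even where the `langsmith` package isn’t installed.

```python
try:
    from langsmith import traceable
except ImportError:
    # Fallback no-op decorator so this sketch runs without the langsmith package.
    def traceable(fn):
        return fn

@traceable
def summarize_report(text: str) -> str:
    # Placeholder for a real LLM call; when LANGSMITH_TRACING is enabled,
    # the decorator records this function's inputs and outputs as a run.
    return f"Summary: {text[:40]}..."

result = summarize_report("Q3 revenue grew 12% year over year.")
print(result)
```

Every decorated function then appears as its own step in the trace, which is exactly the granularity Polly inspects.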
Once you run the above steps, navigate to the trace view or threads view in the LangSmith UI. You will see a Polly icon in the bottom right corner.
Polly is not just a chatbot wrapper. It is deeply integrated into the LangSmith infrastructure to perform three critical tasks:
In the Trace view, Polly analyzes individual agent executions to identify subtle failure modes that might be buried in the middle of a long run. You can ask specific diagnostic questions like:

Polly doesn’t just surface information. It understands agent behaviour patterns and can identify issues you’d miss.
Debugging state is notoriously difficult, especially when an agent works fine for ten turns and fails on the eleventh. Polly can access information from entire conversation threads, allowing it to spot patterns over time, summarize interactions, and identify exactly when and why an agent lost track of critical context.
You can ask questions like:

This is especially powerful for debugging those frustrating issues where the agent was working fine and then suddenly it wasn’t. Polly can pinpoint exactly where and why things changed.
Perhaps the most powerful feature for developers is Polly’s ability to act as an expert prompt engineer. The system prompt is the brain of any deep agent, and Polly can help iterate on it. You can describe the desired behaviour in natural language, and Polly will update the prompt, define structured output schemas, configure tool definitions, and optimize prompt length without losing critical instructions.
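To make "structured output schemas" and "tool definitions" concrete, here is a hedged illustration of the kind of artifacts such an iteration might produce, using the JSON-Schema-style function format common to LLM tool calling. The `analyze_report` tool and its fields are hypothetical examples, not anything Polly literally emits.

```python
import json

# Hypothetical structured output schema an agent might be asked to follow.
report_schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "risk_level": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["summary", "risk_level"],
}

# Hypothetical tool definition wrapping that schema as the tool's parameters.
tool_definition = {
    "type": "function",
    "function": {
        "name": "analyze_report",
        "description": "Analyze a financial report and return a structured summary.",
        "parameters": report_schema,
    },
}

print(json.dumps(tool_definition, indent=2))
```

Tightening schemas like this is one way prompt iteration reduces hallucinated or malformed agent output.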

Polly’s intelligence is built on top of LangSmith’s robust tracing infrastructure, which captures everything your agent does. It ingests three layers of data: runs, traces, and threads.
Because LangSmith already captures the inputs, outputs, latency, and token counts for every step, Polly has perfect information about the agent’s world. It doesn’t need to guess what happened.
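The hierarchy of those layers can be pictured as follows. This is a conceptual sketch, not the actual LangSmith schema: individual runs (LLM or tool calls) nest into traces (end-to-end executions), and traces group into threads (multi-turn conversations).

```python
from dataclasses import dataclass, field

@dataclass
class Run:            # a single step: one LLM call or one tool call
    name: str
    inputs: dict
    outputs: dict
    latency_ms: float
    tokens: int

@dataclass
class Trace:          # one end-to-end agent execution, made of runs
    runs: list[Run] = field(default_factory=list)

@dataclass
class Thread:         # a multi-turn conversation, made of traces
    traces: list[Trace] = field(default_factory=list)

# Example: total token usage across a thread -- the kind of aggregate
# question Polly can answer because every step is already recorded.
thread = Thread(traces=[
    Trace(runs=[Run("llm_call", {"q": "hi"}, {"a": "hello"}, 120.0, 42)]),
])
total = sum(r.tokens for t in thread.traces for r in t.runs)
print(total)  # 42
```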
Polly represents a significant shift in how we approach the lifecycle of AI development. It acknowledges that as our agents become more autonomous and complex, the tools we use to maintain them must evolve in parallel. By transforming debugging from a manual, forensic search through logs into a natural language dialogue, Polly allows developers to focus less on hunting for errors and more on architectural improvements. Ultimately, having an intelligent partner that understands your system’s state isn’t just a convenience; it is becoming a necessity for engineering the next generation of reliable, production-grade agents.
Q. How does Polly help with agent development?
A. It helps you debug and analyze complex agents without digging through enormous prompts or long traces. You can ask direct questions about mistakes, decision points, or odd behavior, and Polly pulls the answers from your LangSmith data.
Q. How do I enable Polly?
A. You just turn on LangSmith tracing with the SDK and your API key. Once your agent runs and logs show up in LangSmith, Polly becomes available automatically in the UI.
Q. What data does Polly have access to?
A. It has full access to runs, traces, and threads, so it understands how your agent works internally. That context lets it diagnose failures, track long-term behavior, and even help refine system prompts.