The LLM world is advancing fast, and the next chapter in AI application development is here. LangChain Expression Language (LCEL) isn't just an upgrade; it's a game-changer. Initially known for proof-of-concepts, LangChain has rapidly evolved into a powerhouse Python library for LLM interactions. With the introduction of LCEL in August 2023, it's now easier than ever to turn ideas into robust, scalable applications. This blog dives deep into LCEL, demonstrating how it simplifies complex workflows and empowers developers to harness the full potential of AI. Whether you're new to LLM applications or a seasoned coder, LCEL promises to change how you build and deploy custom LLM chains.
In this article, we’ll learn what LCEL is, how it works, and the essentials of LCEL chains, pipes, and Runnables.
This article was published as a part of the Data Science Blogathon.
The LangChain Expression Language (LCEL) provides a "minimalist" code layer for creating chains of LangChain components, built as an abstraction over some intriguing Python ideas. At its core is the pipe operator, similar to Unix pipes, which passes the output of one function as the input to the next.
LCEL comes with strong support for streaming output, asynchronous execution, batch processing, and automatic parallelization.
Using LCEL, we build our chains with the pipe operator (|) rather than with Chain objects.
Let us first refresh some concepts related to LLM chain creation. A basic LLM chain consists of three components: a prompt template, an LLM, and an output parser. There can be many variations on this, which we will see later in the code examples.
Let us understand how the pipe operator works by creating our own small pipe-friendly function.
When the Python interpreter sees the | operator between two objects (like a | b), it attempts to feed object b into the __or__ method of object a, i.e. it calls a.__or__(b). That means these patterns are equivalent:
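For instance, plain integers already implement __or__ (as bitwise OR), so we can see the equivalence directly:
# With integers, | is bitwise OR, and both spellings call the same method:
print(5 | 3)          # 7
print((5).__or__(3))  # 7 - explicitly calling the __or__ magic method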
Let us use this pipe operator to create our own Runnable class. It will consume a function and turn it into an object that can be chained with other functions using the | operator.
class Runnable:
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Called for `self | other`: build a new function that runs self.func
        # first and feeds its output into the function on the right-hand side.
        def chained_func(*args, **kwargs):
            return other(self.func(*args, **kwargs))
        # Wrap the chained function so it can itself be chained further.
        return Runnable(chained_func)

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)
Now let us use this Runnable class to chain two functions together: one doubles its input and the other adds one. The code below chains them together and runs the chain on the input 5.
def double(x):
    return 2 * x

def add_one(x):
    return x + 1

# wrap the functions with Runnable
runnable_double = Runnable(double)
runnable_add_one = Runnable(add_one)

# chain them by calling the __or__ method explicitly
chain = runnable_double.__or__(runnable_add_one)
chain(5)  # returns 11

# chain the runnable functions together with the | operator
double_then_add_one = runnable_double | runnable_add_one

# invoke the chain
result = double_then_add_one(5)
print(result)  # Output: 11
Let us walk through how the above code works, step by step:
runnable_double | runnable_add_one: This operation triggers the __or__ magic method (operator overload) of runnable_double, which returns a new Runnable wrapping the chained function.
double_then_add_one(5): This calls the __call__ method of the double_then_add_one object.
In essence, the Runnable class and the overloaded | operator provide a mechanism to chain functions together, where the output of one function becomes the input of the next. This can lead to more readable and maintainable code when dealing with a series of function calls.
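As a quick sanity check, here is a short trace of how the chained object evaluates when invoked (reusing the Runnable, double, and add_one definitions above):
# double_then_add_one(5)
#   -> __call__ runs chained_func(5)
#   -> chained_func returns add_one(double(5))
#   -> add_one(10)
#   -> 11
print(double_then_add_one(5))  # 11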
Now we will create a simple LLM chain using LCEL to see how it makes code more readable and intuitive.
# Install Libraries
!pip install langchain_cohere langchain --quiet
We need to generate a free API key to use the Cohere LLM. Visit the Cohere website and log in using a Google or GitHub account. Once logged in, you will land on the Cohere dashboard page as shown below.
Click on the API Keys option. You will see that a free Trial API key has already been generated.
### Setup Keys
import os
os.environ["COHERE_API_KEY"] = "YOUR API KEY"
from langchain_core.prompts import PromptTemplate
from langchain_core.prompts import ChatPromptTemplate
from langchain_cohere import ChatCohere
from langchain.schema.output_parser import StrOutputParser
# LLM Instance
llm = ChatCohere(model="command-r", temperature=0)
#Create Prompt
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)
# Create Output Parser
output_parser = StrOutputParser()
# LCEL CHAIN
chain = prompt | llm | output_parser
question = """
I have five apples. I throw two away. I eat one. How many apples do I have left?
"""
response = chain.invoke({"question": question})
print(response)
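For comparison, here is a rough sketch of how the same chain might look in the older, pre-LCEL style using the legacy LLMChain class (reusing the prompt, llm, and output_parser objects defined above):
from langchain.chains import LLMChain

# legacy-style chain: components are passed as keyword arguments
legacy_chain = LLMChain(llm=llm, prompt=prompt, output_parser=output_parser)

# the result comes back as a dict; the generated answer lives under "text"
legacy_response = legacy_chain.invoke({"question": question})
print(legacy_response["text"])
The LCEL version reads as a single left-to-right pipeline, which is what makes the pipe syntax more readable and intuitive.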
When working with LCEL, we may need to modify the flow of values, or the values themselves, as they pass between components. For this, we can use runnables. Let us understand how to use the Runnable classes provided by LangChain through a RAG (retrieval-augmented generation) example.
One point about LangChain Expression Language is that any two runnables can be “chained” together into sequences. The output of the previous runnable’s .invoke() call is passed as input to the next runnable. This can be done using the pipe operator (|), or the more explicit .pipe() method, which does the same thing.
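For example, the chain built earlier with the | operator can also be written with .pipe() (a small sketch reusing the prompt, llm, and output_parser objects defined above):
# two equivalent ways to compose the same sequence of runnables
chain_with_operator = prompt | llm | output_parser
chain_with_pipe = prompt.pipe(llm).pipe(output_parser)

# both chains behave identically
print(chain_with_pipe.invoke({"question": "What is LCEL?"}))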
We shall learn about three types of Runnables: RunnableParallel, RunnablePassthrough, and RunnableLambda.
The workflow for the RAG pipeline is shown in the image below. Let us now build this RAG pipeline to understand the usage of the Runnable interfaces.
!pip install --quiet langchain langchain_cohere langchain_community docarray
We create two vector stores to demonstrate the use of RunnableParallel and RunnablePassthrough.
from langchain_cohere import CohereEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch

embedding = CohereEmbeddings(
    model="embed-english-light-v3.0",
)

vecstore_a = DocArrayInMemorySearch.from_texts(
    ["half the info will be here", "Zoozoo birthday is the 17th September"],
    embedding=embedding
)
vecstore_b = DocArrayInMemorySearch.from_texts(
    ["and half here", "Zoozoo was born in 1990"],
    embedding=embedding
)
Here, the input to chain.invoke is passed to the retrieval component, where it flows simultaneously along two paths. One path goes to retriever_a, whose output is stored under the "context" key and passed to the next component in the chain. The RunnablePassthrough object acts as a "passthrough": it takes whatever input the current component (retrieval) receives and exposes it unchanged in the component's output under the "question" key. The input question is therefore available to the prompt component via the "question" key.
from langchain_core.runnables import (
    RunnableParallel,
    RunnablePassthrough
)
retriever_a = vecstore_a.as_retriever()
retriever_b = vecstore_b.as_retriever()
# LLM Instance
llm = ChatCohere(model="command-r", temperature=0)
prompt_str = """Answer the question below using the context:
Context: {context}
Question: {question}
Answer: """
prompt = ChatPromptTemplate.from_template(prompt_str)
retrieval = RunnableParallel(
    {"context": retriever_a, "question": RunnablePassthrough()}
)
chain = retrieval | prompt | llm | output_parser
Invoke the chain:
out = chain.invoke("when was Zoozoo born exact year?")
print(out)
Output:
We now pass the question to both retrievers in parallel to provide additional context to the prompt.
# Using Both retrievers parallely
prompt_str = """Answer the question below using the context:
Context:
{context_a}
{context_b}
Question: {question}
Answer: """
prompt = ChatPromptTemplate.from_template(prompt_str)
retrieval = RunnableParallel(
    {
        "context_a": retriever_a, "context_b": retriever_b,
        "question": RunnablePassthrough()
    }
)
chain = retrieval | prompt | llm | output_parser
out = chain.invoke("when was Zoozoo born exact date?")
print(out)
Output:
Now we will see an example of using RunnableLambda to wrap ordinary Python functions, similar to what we did earlier when exploring the | operator.
from langchain_core.runnables import RunnableLambda

def add_five(x):
    return x + 5

def multiply_by_two(x):
    return x * 2

# wrap the functions with RunnableLambda
add_five = RunnableLambda(add_five)
multiply_by_two = RunnableLambda(multiply_by_two)

# chain and invoke: (3 + 5) * 2 = 16
chain = add_five | multiply_by_two
chain.invoke(3)
We can use RunnableLambda to define our own custom functions and add them to an LLM chain.
The LLM response contains several attributes; we will create a custom function, extract_token, to display the token counts for the input question and the output response.
prompt_str = "You know 1 short line about {topic}?"
prompt = ChatPromptTemplate.from_template(prompt_str)
def extract_token(x):
    # token usage metadata attached by the Cohere chat model
    token_count = x.additional_kwargs['token_count']
    response = (f"{x.content}\n"
                f"Input Token Count: {token_count['input_tokens']}\n"
                f"Output Token Count: {token_count['output_tokens']}")
    return response
get_token = RunnableLambda(extract_token)
chain = prompt | llm | get_token
output = chain.invoke({"topic": "Artificial Intelligence"})
print(output)
Output:
LCEL also offers a number of other features, such as asynchronous execution, streaming, and batch processing.
prompt_str = "You know 1 short line about {topic}?"
prompt = ChatPromptTemplate.from_template(prompt_str)
chain = prompt | llm | output_parser
# ---------invoke--------- #
result_with_invoke = chain.invoke("AI")
# ---------batch--------- #
result_with_batch = chain.batch(["AI", "LLM", "Vector Database"])
print(result_with_batch)
# ---------stream--------- #
for chunk in chain.stream("Artificial Intelligence write 5 lines"):
print(chunk, flush=True, end="")
Your application's frontend and backend are typically independent, which means the frontend makes requests to the backend. If you have many users, your backend may need to handle several requests at once.
Since most of the time in LangChain code is spent waiting between API calls, we can leverage asynchronous code to improve API scalability. If you want to understand why this matters, I recommend reading the "concurrent burgers" story in the FastAPI documentation. There is no need to worry about the implementation, because async methods are already available if you use LCEL:
.ainvoke() / .abatch() / .astream(): asynchronous versions of invoke, batch, and stream.
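As a rough illustration, the synchronous calls from the previous snippet can be rewritten with their async counterparts (a sketch reusing the same chain, to be run inside an asyncio event loop or an async-capable notebook):
import asyncio

async def main():
    # asynchronous single call
    result = await chain.ainvoke("AI")
    print(result)

    # asynchronous batch over several inputs
    results = await chain.abatch(["AI", "LLM", "Vector Database"])
    print(results)

    # asynchronous streaming, chunk by chunk
    async for chunk in chain.astream("Artificial Intelligence write 5 lines"):
        print(chunk, flush=True, end="")

asyncio.run(main())  # in a notebook, use `await main()` instead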
LangChain achieves these "out of the box" features by exposing a unified interface called "Runnable".
LangChain Expression Language introduces a fresh approach to building LLM applications in Python. Despite its unique syntax, LCEL offers a unified interface that streamlines industrialization with built-in features like streaming, asynchronous processing, and dynamic configurations. Automatic parallelization improves performance by executing independent tasks concurrently. Furthermore, LCEL's composability empowers developers to effortlessly create and customize chains, ensuring code remains flexible and adaptable to changing requirements. Embracing LCEL promises not only streamlined development but also optimized execution, making it a compelling choice for modern LLM applications.
Q. How does LCEL improve performance?
A. LCEL enables automatic parallelization of tasks, which enhances execution speed by running multiple operations concurrently.
Q. What do Runnable interfaces offer developers?
A. Runnable interfaces allow developers to chain functions easily, improving code readability and maintainability.
Q. How does LCEL handle asynchronous processing?
A. LCEL provides async methods like .ainvoke(), .abatch(), and .astream(), which handle multiple requests efficiently, enhancing API scalability.
Q. What are the limitations of LCEL?
A. LCEL is not fully PEP compliant, and it is a DSL (domain-specific language). There are also input-output dependencies: if we want to access intermediate outputs, we have to pass them all the way to the end of the chain.
Q. Why should developers consider using LCEL?
A. Developers should consider LCEL for its unified interface, composability, and advanced features, making it ideal for building scalable and efficient Python applications.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.