
In this tutorial, you will:
- Create an Agentic RAG QA chatbot with OpenAI, LangGraph, Couchbase, and Agent Catalog
- Trace the agent's function calls, including retrieval and LLM calls, using Arize AX
- Create a dataset to benchmark performance
- Evaluate performance using an LLM as a judge
- Experiment with different chunk sizes, overlaps, and numbers of retrieved documents (k) to see how these affect the performance of the Agentic RAG pipeline
- Compare these experiments in Arize AX
Notebook Setup
First, let's install the required packages and set our API keys:
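A minimal setup cell might look like the following. The exact package list is an assumption based on the libraries used later in this notebook, and the Arize credentials are only needed for tracing and logging experiments:

```python
%pip install -q langchain langchain-openai langchain-couchbase langgraph agentc \
    arize-otel openinference-instrumentation-langchain arize-phoenix-evals

import os
from getpass import getpass

# Keys used throughout the notebook; adjust to match your environment.
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
os.environ["ARIZE_SPACE_ID"] = getpass("Arize Space ID: ")
os.environ["ARIZE_API_KEY"] = getpass("Arize API key: ")
```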
Setup Couchbase
You'll need to set up your Couchbase cluster by doing the following:
1. Create an account at Couchbase Cloud
2. Create a free cluster with the Data, Index, and Search services enabled*
3. Create cluster access credentials
4. Allow access to the cluster from your local machine
5. Create a bucket to store your documents
Initialize Couchbase cluster
Once you've set up your cluster, you can connect to it using LangChain's Couchbase package. Collect the following information from your cluster:
- Connection string
- Username
- Password
- Bucket name
- Scope name
- Collection name
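As a sketch, connecting with the Couchbase Python SDK looks roughly like this; the connection details below are placeholders, and the resulting cluster object is what the LangChain Couchbase vector store and retriever are built on:

```python
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

# Placeholder values -- replace with the details collected from your cluster.
CB_CONNECTION_STRING = "couchbases://cb.xxxxxxxx.cloud.couchbase.com"
CB_USERNAME = "cluster-access-username"
CB_PASSWORD = "cluster-access-password"
BUCKET_NAME = "docs"
SCOPE_NAME = "shared"
COLLECTION_NAME = "pages"

auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
cluster = Cluster(CB_CONNECTION_STRING, ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=10))  # fail fast if the cluster is unreachable
```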
Retriever Tool
Create tools and prompts with Agent Catalog
Fetch our retriever tool from the Agent Catalog using the agentc provider. In the future, when more tools (and/or prompts) are required and the application grows more complex, the Agent Catalog SDK and CLI can be used to automatically fetch tools based on the use case (semantic search) or by name. For instructions on how this tool was created and more capabilities of Agent Catalog, please refer to the documentation here.
Create Agent
Agent State
We will define a graph of agents to help all involved agents communicate with each other better. Agents communicate through a state object that is passed to each node and modified with that node's output. Our state will be a list of messages, and each node in our graph will append to it.
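A minimal state definition, following the standard LangGraph pattern, might look like this:

```python
from typing import Annotated, Sequence, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # add_messages is a reducer: each node's returned messages are appended
    # to this list rather than replacing it.
    messages: Annotated[Sequence[BaseMessage], add_messages]
```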
Define the Nodes and Edges
We can lay out an agentic RAG graph like this:
- The state is a set of messages
- Each node will update (append to) state
- Conditional edges decide which node to visit next
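For example, the agent node can be a thin wrapper around the LLM with the retriever tool bound to it. This sketch assumes `AgentState` from above and `tools`, a list containing the retriever tool fetched from Agent Catalog; the model name is an arbitrary choice:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def call_model(state: AgentState):
    """Agent node: let the LLM answer directly or request a retriever tool call."""
    response = llm.bind_tools(tools).invoke(state["messages"])
    # The add_messages reducer appends this response to state["messages"].
    return {"messages": [response]}
```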
Define Graph
- Start with an agent, call_model
- The agent makes a decision to call a function
- If so, then action to call the tool (retriever)
- Then call the agent again with the tool output added to messages (state)
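Wiring this up with LangGraph might look like the following sketch, using the prebuilt ToolNode and tools_condition helpers; the node names and the `tools` list are assumptions carried over from the sketches above:

```python
from langgraph.graph import END, START, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition

workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)        # LLM decides whether to call the retriever
workflow.add_node("action", ToolNode(tools))  # executes the requested tool call

workflow.add_edge(START, "agent")
workflow.add_conditional_edges(
    "agent",
    tools_condition,                # routes to the tool node when a tool call is present
    {"tools": "action", END: END},  # otherwise the graph finishes
)
workflow.add_edge("action", "agent")  # feed the tool output back to the agent

graph = workflow.compile()
```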
Visualize the graph
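One way to render the compiled graph in Colab is via its Mermaid representation, assuming `graph` is the compiled graph from the sketch above:

```python
from IPython.display import Image, display

# Draws the agent -> action -> agent loop defined above.
display(Image(graph.get_graph().draw_mermaid_png()))
```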
Run the graph
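A single query can then be streamed through the graph; the question below is just an illustrative example:

```python
from langchain_core.messages import HumanMessage

inputs = {"messages": [HumanMessage(content="Which Couchbase services does this tutorial require?")]}

for state in graph.stream(inputs, stream_mode="values"):
    # Print the newest message at each step: the tool call, the retrieved
    # documents, and finally the LLM's answer.
    state["messages"][-1].pretty_print()
```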
View the trace in the Arize AX UI
Once you've run a single query, you can see the trace in the Arize AX UI with each step taken by the retriever, the embedding, and the LLM query. Click through the queries to better understand how the query engine is performing. Arize AX can be used to understand and troubleshoot your RAG app by surfacing:
- Application latency
- Token usage
- Runtime exceptions
- Retrieved documents
- Embeddings
- LLM parameters
- Prompt templates
- Tool descriptions
- LLM function calls
- And more!

Generate a synthetic dataset of questions
We will run our Agent against the dataset of questions we generate, and then evaluate the results.
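One way to do this is to collect each answer into a dataframe so the same rows can later be passed to the evaluators. The column names here, and the reuse of `graph` and `HumanMessage` from the sketches above, are assumptions:

```python
import pandas as pd

# `questions` is the list of synthetic questions generated above.
results_df = pd.DataFrame({"question": questions})

def run_agent(question: str) -> str:
    final_state = graph.invoke({"messages": [HumanMessage(content=question)]})
    return final_state["messages"][-1].content

results_df["answer"] = results_df["question"].apply(run_agent)
```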
Evaluating your Agentic RAG using an LLM as a Judge
Now that we have run a set of test cases, we can create evaluators to measure the performance of our run. This way, we don't have to manually inspect every single trace to see if the LLM is doing the right thing. First, we'll define the prompts for the evaluators. There are two evaluators we will use for this example:
- Retrieval Relevance: This evaluator checks if the reference text selected by the retriever is relevant to the question.
- QA Correctness: This evaluator checks if the answer correctly answers the question based on the reference text provided.
We'll run both evaluators with the llm_classify function. This function uses LLMs to evaluate your LLM calls and gives them labels and explanations. You can read more details here.
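As a sketch, the QA correctness evaluator can be run with Phoenix's built-in template and rails. The dataframe columns (input, output, reference) and the judge model choice are assumptions, and argument names may differ slightly across arize-phoenix-evals versions:

```python
import pandas as pd
from phoenix.evals import (
    OpenAIModel,
    QA_PROMPT_RAILS_MAP,
    QA_PROMPT_TEMPLATE,
    llm_classify,
)

# One row per test case: the question, the agent's answer, and the retrieved
# documents concatenated as reference text.
eval_df = pd.DataFrame({
    "input": ["Which Couchbase services does this tutorial require?"],
    "output": ["The Data, Index, and Search services must be enabled."],
    "reference": ["Create a free cluster with the Data, Index, and Search services enabled."],
})

judge = OpenAIModel(model="gpt-4o")

qa_correctness = llm_classify(
    dataframe=eval_df,
    model=judge,
    template=QA_PROMPT_TEMPLATE,
    rails=list(QA_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,  # keep the judge's reasoning alongside each label
)
print(qa_correctness[["label", "explanation"]])
```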
Experiment with different k-values and chunk sizes
Re-run the experiments with different k-values and chunk sizes, then log the results to Arize AX to see how performance changes. Let's set up our evaluators so we can compare the runs.
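A sketch of such a sweep is below; `build_vector_store` and `run_experiment` are hypothetical helpers standing in for the notebook's own indexing and experiment-logging steps:

```python
# Hypothetical parameter sweep over chunking and retrieval settings.
configs = [
    {"chunk_size": 512, "chunk_overlap": 50, "k": 2},
    {"chunk_size": 1024, "chunk_overlap": 100, "k": 4},
]

for cfg in configs:
    vector_store = build_vector_store(  # hypothetical: re-chunk and re-index the documents
        chunk_size=cfg["chunk_size"],
        chunk_overlap=cfg["chunk_overlap"],
    )
    run_experiment(  # hypothetical: run the agent and evaluators, log results to Arize AX
        vector_store,
        k=cfg["k"],
        experiment_name=f"chunk{cfg['chunk_size']}_overlap{cfg['chunk_overlap']}_k{cfg['k']}",
    )
```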