
In this tutorial we will:
  1. Build a RAG application using Llama-Index
  2. Set up Phoenix as a trace collector for the Llama-Index application
  3. Use Phoenix’s evals library to compute LLM generated evaluations of our RAG app responses
  4. Use the Arize SDK to export the traces and evaluations to Arize AX
You can read more about LLM tracing in Arize AX here.

Install Dependencies

Let’s set up the notebook with its dependencies.
# Dependencies needed to build the Llama Index RAG application
!pip install -qq gcsfs llama-index-llms-openai llama-index-embeddings-openai llama-index-core

# Dependencies needed to export spans and send them to our collector: Phoenix
!pip install -qq llama-index-callbacks-arize-phoenix

# Install Phoenix to generate evaluations
!pip install -qq "arize-phoenix[evals]>7.0.0"

# Install the Arize AX SDK with the `Tracing` extra to export Phoenix data to Arize AX
!pip install -qq "arize[Tracing]>7.29.0"

Set up Phoenix as a Trace Collector in our LLM app

To get started, launch the Phoenix app. Make sure to open the app in your browser using the link below.
import phoenix as px

session = px.launch_app()
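If the link does not appear in your environment, the session object returned by px.launch_app() also exposes the URL directly (a minimal sketch):
# Print the URL of the running Phoenix UI
print(session.url)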
Once you have started a Phoenix server, you can start your LlamaIndex application and configure it to send traces to Phoenix. To do this, you will have to configure Phoenix as the global handler.
from llama_index.core import set_global_handler

set_global_handler("arize_phoenix")
That’s it! The Llama-Index application we build next will send traces to Phoenix.

Build Your Llama Index RAG Application

Start by setting your OpenAI API key if it is not already set as an environment variable.
import os
from getpass import getpass

OPENAI_API_KEY = globals().get("OPENAI_API_KEY") or getpass(
    "πŸ”‘ Enter your OpenAI API key: "
)
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
This example uses a RetrieverQueryEngine over a pre-built index of the Arize AX documentation, but you can use whatever LlamaIndex application you like. Download the pre-built index of the Arize AX docs from cloud storage and instantiate your storage context.
from gcsfs import GCSFileSystem
from llama_index.core import StorageContext

file_system = GCSFileSystem(project="public-assets-275721")
index_path = "arize-phoenix-assets/datasets/unstructured/llm/llama-index/arize-docs/index/"
storage_context = StorageContext.from_defaults(
    fs=file_system,
    persist_dir=index_path,
)
We are now ready to instantiate the query engine that will perform retrieval-augmented generation (RAG). A query engine is a generic interface in LlamaIndex that allows you to ask questions over your data: it takes in a natural-language query and returns a rich response. Query engines are built on top of retrievers, and you can compose multiple query engines to achieve more advanced capabilities.
from llama_index.llms.openai import OpenAI
from llama_index.core import (
    Settings,
    load_index_from_storage,
)
from llama_index.embeddings.openai import OpenAIEmbedding


Settings.llm = OpenAI(model="gpt-4o")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
index = load_index_from_storage(
    storage_context,
)
query_engine = index.as_query_engine()
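If you want more control over retrieval, as_query_engine accepts optional keyword arguments; for example, similarity_top_k controls how many chunks are retrieved per query (the value 5 below is purely illustrative):
# Optional: retrieve the top 5 most similar chunks instead of the default
# query_engine = index.as_query_engine(similarity_top_k=5)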
Let’s test our app by asking a question about the Arize AX documentation:
response = query_engine.query(
    "What is Arize AX and how can it help me as an AI Engineer?"
)
print(response)
Great! Our application works!

Use the instrumented Query Engine

We will download a dataset of questions for our RAG application to answer.
from urllib.request import urlopen
import json

queries_url = "http://storage.googleapis.com/arize-phoenix-assets/datasets/unstructured/llm/context-retrieval/arize_docs_queries.jsonl"
queries = []
with urlopen(queries_url) as response:
    for line in response:
        line = line.decode("utf-8").strip()
        data = json.loads(line)
        queries.append(data["query"])

queries[:5]
We use the instrumented query engine and get responses from our RAG app.
from tqdm.notebook import tqdm

N = 10  # Sample size
qa_pairs = []
for query in tqdm(queries[:N]):
    resp = query_engine.query(query)
    qa_pairs.append((query, resp))
To see the questions and answers in Phoenix, use the link printed when we launched the Phoenix server.
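If you no longer have the link handy, you can recover it from the active session (a minimal sketch using Phoenix’s active_session helper):
# Print the URL of the running Phoenix UI
print(px.active_session().url)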

Run Evaluations on the data in Phoenix

We will use the Phoenix client to extract data in the format each evaluation expects, and Phoenix’s evaluators to run the evaluations on our RAG application.
from phoenix.session.evaluation import get_qa_with_reference

px_client = px.Client()  # Define phoenix client
queries_df = get_qa_with_reference(
    px_client
)  # Get question, answer and reference data from phoenix
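A quick preview of the extracted data helps confirm the traces were captured; each row holds a question, the generated answer, and the retrieved reference context.
# Preview the question, answer, and reference data pulled from the traces
queries_df.head()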
Next, we enable concurrent evaluations for better performance.
import nest_asyncio

nest_asyncio.apply()  # needed for concurrent evals in notebook environments
Then, we define our evaluators and run the evaluations.
from phoenix.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    QAEvaluator,
    run_evals,
)

eval_model = OpenAIModel(
    model="gpt-4o",
)
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)

hallucination_eval_df, qa_correctness_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_correctness_evaluator],
    provide_explanation=True,
)
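Before logging the results, it is worth skimming them; a quick sanity check, assuming the eval dataframes carry Phoenix’s standard label and explanation columns:
# Count how many responses were judged factual vs. hallucinated
print(hallucination_eval_df["label"].value_counts())
# Inspect the QA correctness judgments alongside their explanations
qa_correctness_eval_df.head()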
Finally, we log the evaluations to Phoenix.
from phoenix.trace import SpanEvaluations

px_client.log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(
        eval_name="QA_Correctness", dataframe=qa_correctness_eval_df
    ),
)

Export data to Arize AX

Get data into dataframes

We extract the spans and evals dataframes from the Phoenix client.
tds = px_client.get_trace_dataset()
spans_df = tds.get_spans_dataframe(include_evaluations=False)
spans_df.head()
evals_df = tds.get_evals_dataframe()
evals_df.head()

Initialize Arize Client

from arize.pandas.logger import Client
Sign up/log in to your Arize AX account here. Find your space ID and API key. Copy/paste into the cell below.
SPACE_ID = globals().get("SPACE_ID") or getpass(
    "πŸ”‘ Enter your Arize AX Space ID: "
)
API_KEY = globals().get("API_KEY") or getpass("πŸ”‘ Enter your Arize AX API Key: ")

arize_client = Client(
    space_id=SPACE_ID,
    api_key=API_KEY,
)
model_id = "tutorial-tracing-llama-index-rag-export-from-phoenix"
model_version = "1.0"
Lastly, we use log_spans from the Arize client to log our spans data; if we have evaluations, we can also pass the optional evals_dataframe.
response = arize_client.log_spans(
    dataframe=spans_df,
    evals_dataframe=evals_df,
    model_id=model_id,
    model_version=model_version,
)

# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(
        f"❌ logging failed with response code {response.status_code}, {response.text}"
    )
else:
    print("βœ… You have successfully logged traces set to Arize AX")