Teams with complex experiment pipelines may need to run experiments remotely, outside of Arize AX. The results can still be logged to Arize AX afterwards (the example below uses client.experiments.create), so that every experiment is recorded for tracking and comparison.

Steps to log an experiment

1. Store the experiment results in a dataframe

The example experiment dataframe uses the following columns:
  • result is the output of the LLM pipeline.
  • label, score, and explanation_text are the evaluation outputs. Step 3 logs them under the evaluator name correctness.
  • example_id is the dataset row ID, which is needed to map each result to the dataset row holding its inputs and expected outputs. It is added to the dataframe in step 2.
import pandas as pd

# Example DataFrame: one row per experiment run
experiment_run_df = pd.DataFrame(
    {
        "result": [
            "The telephone was invented by **Alexander Graham Bell**.",
            "The invention of the light bulb is commonly attributed to **Thomas Edison**"
        ],
        "label": ["correct", "incorrect"],
        "score": [1, 0],
        "explanation_text": [
            "This statement is accurate because Alexander Graham Bell is credited with inventing the telephone.",
            "This statement is inaccurate; others like Humphry Davy and Joseph Swan made earlier versions of the light bulb.",
        ],
    }
)
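
When results come back from a remote pipeline as a list of records rather than as column lists, the same dataframe can be assembled with pandas. The following is a minimal sketch, assuming each record is a dict whose keys match the columns above. The `remote_records` name and its contents are hypothetical placeholders, not output from a real run.

```python
import pandas as pd

# Hypothetical records, shaped the way a remote pipeline run might return them.
remote_records = [
    {
        "result": "Answer A",
        "label": "correct",
        "score": 1,
        "explanation_text": "Matches the expected output.",
    },
    {
        "result": "Answer B",
        "label": "incorrect",
        "score": 0,
        "explanation_text": "Contradicts the expected output.",
    },
]

# Fix the column list explicitly: a key missing from a record then shows up
# as NaN instead of silently changing the dataframe's shape.
expected_columns = ["result", "label", "score", "explanation_text"]
experiment_run_df = pd.DataFrame(remote_records, columns=expected_columns)

print(experiment_run_df.shape)
```

Keeping the column list in one place also makes it easy to compare against the field mappings defined in the next step.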

2. Define column mappings

This code sets up mappings that tell Arize AX which dataframe columns to read: example_id identifies the dataset example, result holds the LLM output, and label, score, and explanation_text hold the evaluator's label, score, and explanation.
from arize.experiments import (
    ExperimentTaskFieldNames,
    EvaluationResultFieldNames,
)

# Map the dataset example ID and the LLM task output to dataframe columns
task_fields = ExperimentTaskFieldNames(
    example_id="example_id", output="result"
)

# Define field mappings for evaluator
evaluator_fields = EvaluationResultFieldNames(
    label="label",
    score="score",
    explanation="explanation_text",
)

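# Attach each dataset example's ID to the experiment runs as example_id.
# `client` is the ArizeClient created in step 3 and `dataset_id` is the ID of the
# target dataset; both must be defined before this code runs.
dataset_examples = client.datasets.list_examples(dataset_id=dataset_id, all=True)
dataset_df = dataset_examples.to_df()
# The assignment aligns on the dataframe index, so the runs must be in the
# same order as the dataset rows.
experiment_run_df["example_id"] = dataset_df["id"]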
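
The assignment above pairs rows by index, so it depends on the experiment runs arriving in the same order as the dataset examples. If the runs carry a key that also exists in the dataset, joining on that key is less fragile. The sketch below uses plain pandas and assumes a shared `question` column, which is hypothetical; substitute whichever input field your dataset actually contains. The `id` column follows the code above, and the small dataframes are stand-ins for real data.

```python
import pandas as pd

# Stand-in for dataset_examples.to_df(); the "question" column is an assumption.
dataset_df = pd.DataFrame(
    {
        "id": ["ex-1", "ex-2"],
        "question": ["Who invented the telephone?", "Who invented the light bulb?"],
    }
)

# Experiment runs that arrive in a different order than the dataset rows.
experiment_run_df = pd.DataFrame(
    {
        "question": ["Who invented the light bulb?", "Who invented the telephone?"],
        "result": ["Thomas Edison", "Alexander Graham Bell"],
    }
)

# Join on the shared key. validate="many_to_one" raises if one question
# maps to more than one dataset row.
id_lookup = dataset_df[["question", "id"]].rename(columns={"id": "example_id"})
experiment_run_df = experiment_run_df.merge(
    id_lookup, on="question", how="left", validate="many_to_one"
)

# Fail early if any run could not be matched to a dataset example.
unmatched = experiment_run_df["example_id"].isna()
if unmatched.any():
    raise ValueError(f"{int(unmatched.sum())} run(s) have no matching dataset example")
```

A left join keeps every experiment run, so unmatched rows surface as missing IDs rather than disappearing.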

3. Log the experiment

Log the experiment to Arize AX, passing the task field mappings and registering the evaluator field mappings under the evaluator name correctness.
from arize import ArizeClient

client = ArizeClient(api_key="your-arize-api-key")

experiment = client.experiments.create(
    name="my_experiment",
    dataset_id=dataset_id,
    experiment_runs=experiment_run_df,
    task_fields=task_fields,
    evaluator_columns={"correctness": evaluator_fields},
)
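
A quick local check before the create call above can catch mapping mistakes without a round trip to the server. This is a sketch in plain pandas: `check_experiment_df` is a hypothetical helper, not part of the Arize SDK, and the required column names are the ones mapped in step 2.

```python
import pandas as pd


def check_experiment_df(df: pd.DataFrame, required_columns: list) -> list:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []

    # Every column referenced by the field mappings must exist in the dataframe.
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")

    # Each run needs an example_id so it can be tied to a dataset row.
    if "example_id" in df.columns and df["example_id"].isna().any():
        problems.append("example_id contains null values")

    return problems


required = ["example_id", "result", "label", "score", "explanation_text"]

# Illustrative dataframe with made-up values.
sample_df = pd.DataFrame(
    {
        "example_id": ["ex-1"],
        "result": ["Alexander Graham Bell"],
        "label": ["correct"],
        "score": [1],
        "explanation_text": ["Matches the expected output."],
    }
)

problems = check_experiment_df(sample_df, required)
if problems:
    raise ValueError("; ".join(problems))
```

Returning a list of problems instead of raising on the first one makes it possible to report every issue in a single pass.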