Teams with complex experiment pipelines may need to run experiments remotely, outside Arize. They can still log the results to Arize (the example below calls client.experiments.create) to keep a record of each experiment for tracking and comparison.

Steps to log an experiment

1. Store the experiment results in a pandas DataFrame

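The example experiment below has four columns:
  • result is the output of the LLM pipeline.
  • label is the label from the correctness evaluation (correct or incorrect).
  • score is the numeric evaluation score (1 or 0).
  • explanation_text is the evaluator's explanation for the label.
An example_id column, holding the dataset row ID, is added in step 2. It is needed to map each result to the dataset row with its inputs and expected outputs.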
import pandas as pd

# Example DataFrame: one row per experiment run
experiment_run_df = pd.DataFrame(
    {
        "result": [
            "The telephone was invented by **Alexander Graham Bell**.", 
            "The invention of the light bulb is commonly attributed to **Thomas Edison**"
        ],
        "label": ["correct", "incorrect"],
        "score": [1, 0],
        "explanation_text": [
            "This statement is accurate because Alexander Graham Bell is credited with inventing the telephone.",
            "This statement is inaccurate; others like Humphry Davy and Joseph Swan made earlier versions of the light bulb.",
        ],
    }
)
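pd.DataFrame builds the table from a dict of equal-length column lists, and each row becomes one experiment run. As a plain-Python sketch (standard library only, so it runs without pandas), the same data can be length-checked and viewed row by row. The helper to_rows is hypothetical and not part of the Arize SDK.

```python
from __future__ import annotations


def to_rows(columns: dict[str, list]) -> list[dict]:
    """Convert a column-oriented dict (the shape passed to pd.DataFrame) into row dicts."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # pd.DataFrame also raises ValueError when column lengths differ.
        raise ValueError(f"columns have different lengths: {lengths}")
    names = list(columns)
    # zip(*...) walks all columns in lockstep, yielding one tuple per row.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


experiment_columns = {
    "result": [
        "The telephone was invented by **Alexander Graham Bell**.",
        "The invention of the light bulb is commonly attributed to **Thomas Edison**",
    ],
    "label": ["correct", "incorrect"],
    "score": [1, 0],
    "explanation_text": [
        "This statement is accurate because Alexander Graham Bell is credited with inventing the telephone.",
        "This statement is inaccurate; others like Humphry Davy and Joseph Swan made earlier versions of the light bulb.",
    ],
}

# Each row is one experiment run: an LLM output plus its evaluation.
for row in to_rows(experiment_columns):
    print(row["label"], row["score"])
```

Running it prints "correct 1" and then "incorrect 0", one line per run.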

2. Define column mappings

This code maps the DataFrame's columns to the fields Arize expects: example_id links each run to its dataset example, result holds the LLM output, and label, score, and explanation_text hold the evaluator outputs.
from arize.experiments import (
    ExperimentTaskFieldNames,
    EvaluationResultFieldNames,
)

# Map the task fields: the dataset example ID column and the LLM output column
task_fields = ExperimentTaskFieldNames(
    example_id="example_id", output="result"
)

# Define field mappings for evaluator
evaluator_fields = EvaluationResultFieldNames(
    label="label",
    score="score",
    explanation="explanation_text",
)

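# Copy the dataset's example IDs into the experiment DataFrame.
# Assumes client (an ArizeClient, created as in step 3) and dataset_id are already defined.
# pandas aligns this assignment on the index, so with default indexes the IDs are matched
# by row position: the results must be in the same order as the dataset rows.
dataset_examples = client.datasets.list_examples(dataset_id=dataset_id, all=True)
dataset_df = dataset_examples.to_df()
experiment_run_df["example_id"] = dataset_df["id"]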
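Because the assignment above matches IDs by position, results that come back in a different order would be silently attached to the wrong examples. A stricter alternative is to look each ID up by a field shared between the dataset and the results. The sketch below uses only the standard library and assumes such a field exists; the helper attach_example_ids, the question field, and the IDs ex-1 and ex-2 are all hypothetical.

```python
from __future__ import annotations


def attach_example_ids(results: list[dict], dataset_examples: list[dict], key: str) -> list[dict]:
    """Return copies of result rows with example_id looked up via a shared key field."""
    id_by_key: dict = {}
    for example in dataset_examples:
        if example[key] in id_by_key:
            # A duplicated key would make the lookup ambiguous, so refuse it.
            raise ValueError(f"duplicate key in dataset: {example[key]!r}")
        id_by_key[example[key]] = example["id"]

    matched = []
    for row in results:
        if row[key] not in id_by_key:
            # Fail loudly instead of logging a run against the wrong example.
            raise KeyError(f"no dataset example for {row[key]!r}")
        matched.append({**row, "example_id": id_by_key[row[key]]})
    return matched


# Hypothetical data: the dataset rows are listed in a different order from the results.
dataset_examples = [
    {"id": "ex-2", "question": "Who invented the light bulb?"},
    {"id": "ex-1", "question": "Who invented the telephone?"},
]
results = [
    {"question": "Who invented the telephone?", "label": "correct"},
    {"question": "Who invented the light bulb?", "label": "incorrect"},
]

for row in attach_example_ids(results, dataset_examples, key="question"):
    print(row["example_id"], row["label"])
```

In pandas, the equivalent is a merge on the shared column, where passing validate="one_to_one" to DataFrame.merge raises if a key is duplicated on either side.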

3. Log the experiment

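Pass the DataFrame and the field mappings to client.experiments.create. The key in evaluator_columns, here "correctness", names the evaluator whose label, score, and explanation are logged.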
from arize import ArizeClient

client = ArizeClient(api_key="your-arize-api-key")

experiment = client.experiments.create(
    name="my_experiment",
    dataset_id=dataset_id,
    experiment_runs=experiment_run_df,
    task_fields=task_fields,
    evaluator_columns={"correctness": evaluator_fields},
)
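evaluator_columns is a dict keyed by evaluator name, which suggests that several evaluators can be mapped from one DataFrame. The sketch below mimics how such a mapping resolves against a run row. It is standard-library only and is not the SDK's implementation: EvalFields is a hypothetical stand-in for EvaluationResultFieldNames, and missing_columns and resolve are hypothetical helpers. The same check can serve as a pre-flight test that every mapped column exists before calling create.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EvalFields:
    """Hypothetical stand-in for EvaluationResultFieldNames: holds column names, not values."""
    label: str | None = None
    score: str | None = None
    explanation: str | None = None


def mapped_columns(mapping: EvalFields) -> dict[str, str]:
    """Return {field: column} for every field that is mapped to a column."""
    return {
        f.name: getattr(mapping, f.name)
        for f in fields(mapping)
        if getattr(mapping, f.name) is not None
    }


def missing_columns(columns: set[str], evaluator_columns: dict[str, EvalFields]) -> dict[str, list[str]]:
    """Per evaluator, list the mapped column names that the run table lacks."""
    missing = {}
    for name, mapping in evaluator_columns.items():
        absent = [col for col in mapped_columns(mapping).values() if col not in columns]
        if absent:
            missing[name] = absent
    return missing


def resolve(row: dict, evaluator_columns: dict[str, EvalFields]) -> dict[str, dict]:
    """Gather each evaluator's label/score/explanation values from one run row."""
    return {
        name: {field: row[col] for field, col in mapped_columns(mapping).items()}
        for name, mapping in evaluator_columns.items()
    }


evaluator_columns = {
    "correctness": EvalFields(label="label", score="score", explanation="explanation_text"),
}
run_columns = {"example_id", "result", "label", "score", "explanation_text"}
print(missing_columns(run_columns, evaluator_columns))  # {}

row = {"example_id": "ex-1", "result": "Bell", "label": "correct", "score": 1, "explanation_text": "Accurate."}
print(resolve(row, evaluator_columns))
# {'correctness': {'label': 'correct', 'score': 1, 'explanation': 'Accurate.'}}
```

With a real DataFrame, set(experiment_run_df.columns) supplies the column set for the pre-flight check.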