Once experiments are defined, they can be integrated into your development workflow as a systematic way to validate changes to your application. In practice, this means updating the underlying code that your experiment task calls, such as prompt changes, model swaps, retrieval logic, or system configuration, and then rerunning the experiment to observe how those changes affect evaluation metrics.

Because experiments in Arize AX are tied to a fixed dataset and evaluation setup, you can clearly see how metrics evolve as your system changes. This allows you to compare results across runs and identify whether a change led to an improvement, a regression, or a tradeoff across different quality dimensions.

Over time, this creates a measurable history of how your application has evolved and helps teams make decisions based on data rather than intuition.
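To make the comparison step concrete, here is a minimal sketch of comparing metrics across two runs on the same fixed dataset. The metric names and scores are made up for illustration and are not produced by the Arize AX SDK; the point is that per-metric deltas make improvements, regressions, and tradeoffs immediately visible.

```python
# Illustrative only: metric scores from two experiment runs over the same
# fixed dataset. These values are invented for demonstration purposes.
baseline = {"correctness": 0.82, "actionability": 0.61, "tone": 0.90}
improved = {"correctness": 0.84, "actionability": 0.78, "tone": 0.87}

def compare_runs(before: dict, after: dict) -> dict:
    """Return the per-metric delta between two experiment runs."""
    return {metric: round(after[metric] - before[metric], 4) for metric in before}

deltas = compare_runs(baseline, improved)
for metric, delta in deltas.items():
    direction = "improvement" if delta > 0 else "regression" if delta < 0 else "no change"
    print(f"{metric}: {delta:+.2f} ({direction})")
```

In this hypothetical run, actionability rises sharply while tone dips slightly, which is exactly the kind of tradeoff a fixed dataset and evaluation setup let you catch before shipping a change.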
Let’s demonstrate this workflow by creating a new version of our support agent with enhanced instructions that emphasize actionability, then running an experiment to compare it against the initial run.
We’ll create a new version of the agent with enhanced instructions that emphasize specific, actionable responses. The key change is the instructions parameter in the agent’s prompt. For the complete implementation, including the task function, see the reference notebook.
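The shape of that change can be sketched as follows. The instruction strings and the build_prompt helper below are hypothetical, not taken from the reference notebook; they only illustrate that the two agent versions differ solely in the instructions text that the experiment task feeds into the prompt.

```python
# Hypothetical instruction texts: the only difference between the two agent
# versions is what we pass as the instructions portion of the prompt.
INSTRUCTIONS_V1 = "You are a helpful support agent. Answer the customer's question."

INSTRUCTIONS_V2 = (
    "You are a helpful support agent. Answer the customer's question with "
    "specific, actionable steps: number each step, name the exact setting or "
    "page involved, and end with what the customer should see if it worked."
)

def build_prompt(instructions: str, question: str) -> str:
    """Assemble the prompt the experiment task would send to the model."""
    return f"{instructions}\n\nCustomer question: {question}"

# Swapping INSTRUCTIONS_V1 for INSTRUCTIONS_V2 is the entire code change;
# rerunning the experiment then shows its effect on the evaluation metrics.
print(build_prompt(INSTRUCTIONS_V2, "How do I reset my password?"))
```

Because the dataset and evaluators stay fixed, any movement in the metrics between the two runs can be attributed to this single instruction change.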