Once experiments are defined, they can be integrated into your development workflow as a systematic way to validate changes to your application. In practice, this means updating the underlying code that your experiment task calls, such as prompt changes, model swaps, retrieval logic, or system configuration, and then rerunning the experiment to observe how those changes affect evaluation metrics.

Because experiments in Arize AX are tied to a fixed dataset and evaluation setup, you can clearly see how metrics evolve as your system changes. This allows you to compare results across runs and identify whether a change led to an improvement, a regression, or a tradeoff across different quality dimensions.

Over time, this creates a measurable history of how your application has evolved and helps teams make decisions based on data rather than intuition.
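To make the comparison step concrete, here is a minimal sketch of comparing metrics across two runs on the same fixed dataset. The metric names and scores are made up for illustration and are not produced by the Arize AX SDK; the point is that per-metric deltas make improvements, regressions, and tradeoffs immediately visible.

```python
# Illustrative only: metric scores from two experiment runs over the same
# fixed dataset. These values are invented for demonstration purposes.
baseline = {"correctness": 0.82, "actionability": 0.61, "tone": 0.90}
improved = {"correctness": 0.84, "actionability": 0.78, "tone": 0.87}

def compare_runs(before: dict, after: dict) -> dict:
    """Return the per-metric delta between two experiment runs."""
    return {metric: round(after[metric] - before[metric], 4) for metric in before}

deltas = compare_runs(baseline, improved)
for metric, delta in deltas.items():
    direction = "improvement" if delta > 0 else "regression" if delta < 0 else "no change"
    print(f"{metric}: {delta:+.2f} ({direction})")
```

In this hypothetical run, actionability rises sharply while tone dips slightly, which is exactly the kind of tradeoff a fixed dataset and evaluation setup let you catch before shipping a change.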
Let’s demonstrate this workflow by creating a new version of our support agent with enhanced instructions that emphasize actionability, then running an experiment to compare it against the initial run.
We’ll create a new version of the agent with enhanced instructions that emphasize specific, actionable responses. The key change is the instructions parameter in the agent’s prompt. For the complete implementation, including the task function, see the reference notebook.
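The shape of that change can be sketched as follows. The instruction strings and the build_prompt helper below are hypothetical, not taken from the reference notebook; they only illustrate that the two agent versions differ solely in the instructions text that the experiment task feeds into the prompt.

```python
# Hypothetical instruction texts: the only difference between the two agent
# versions is what we pass as the instructions portion of the prompt.
INSTRUCTIONS_V1 = "You are a helpful support agent. Answer the customer's question."

INSTRUCTIONS_V2 = (
    "You are a helpful support agent. Answer the customer's question with "
    "specific, actionable steps: number each step, name the exact setting or "
    "page involved, and end with what the customer should see if it worked."
)

def build_prompt(instructions: str, question: str) -> str:
    """Assemble the prompt the experiment task would send to the model."""
    return f"{instructions}\n\nCustomer question: {question}"

# Swapping INSTRUCTIONS_V1 for INSTRUCTIONS_V2 is the entire code change;
# rerunning the experiment then shows its effect on the evaluation metrics.
print(build_prompt(INSTRUCTIONS_V2, "How do I reset my password?"))
```

Because the dataset and evaluators stay fixed, any movement in the metrics between the two runs can be attributed to this single instruction change.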