Key Capabilities
- Automatic tracing of all LLM calls during experiments
- Concurrent execution for faster evaluation
- Dry-run mode for testing without logging
- Built-in evaluator support
- Compare experiments side-by-side in the UI
Run an Experiment
Execute a task function across your dataset examples with automatic evaluation, then log the results to Arize. High-level flow:
- Resolve the dataset and download examples (cached if enabled)
- Execute the task and evaluators with configurable concurrency
- Upload results to Arize (unless in dry-run mode)
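The three-step flow above can be sketched in plain Python. Everything here is a conceptual stand-in, not the actual Arize SDK surface: `run_experiment`, its parameters, and the task/evaluator functions are hypothetical names used only to illustrate how the task, evaluators, concurrency, error handling, and dry-run mode fit together.

```python
from concurrent.futures import ThreadPoolExecutor

def run_experiment(examples, task, evaluators,
                   concurrency=4, dry_run=False, exit_on_error=False):
    """Conceptual sketch: run the task and evaluators over dataset
    examples with bounded parallelism, then (optionally) upload."""
    def process(example):
        try:
            output = task(example)  # step 2a: execute the task
        except Exception:
            if exit_on_error:
                raise               # stop on the first error
            return {"example": example, "output": None, "evals": {}}
        # step 2b: score the output with each evaluator
        evals = {fn.__name__: fn(output, example) for fn in evaluators}
        return {"example": example, "output": output, "evals": evals}

    # step 2: configurable concurrency via a thread pool
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(process, examples))

    if not dry_run:
        pass  # step 3: upload results to Arize (omitted in this sketch)
    return results

# Hypothetical task and evaluator, for illustration only
def uppercase_task(example):
    return example["input"].upper()

def exact_match(output, example):
    return float(output == example["expected"])

examples = [{"input": "hi", "expected": "HI"},
            {"input": "no", "expected": "NO"}]
results = run_experiment(examples, uppercase_task, [exact_match], dry_run=True)
```

With `dry_run=True` the upload step is skipped, so the same call can be used to validate the task and evaluators locally before a full run.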
Dry Run Mode
Execute your experiment locally without logging results to Arize. Use this to test your task and evaluators before committing to a full run.
Concurrency Control
Control parallelism for faster execution.
Error Handling
Stop execution on the first error encountered.
OpenTelemetry Tracing
Set the global OpenTelemetry tracer provider for the experiment run.
List Experiments
List all experiments, optionally filtered by dataset.
Create an Experiment
Log pre-computed experiment results to Arize. Use this when you’ve already executed your experiment elsewhere and want to record the results. Unlike run(), this does not execute the task; it only logs existing results.
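A minimal sketch of what "pre-computed results" might look like before logging. The record fields and the `validate_results` helper are illustrative assumptions, not the schema the Arize API actually expects; the point is only that each record already carries its output and scores, since no task is executed.

```python
# Results already produced elsewhere; field names are hypothetical.
precomputed = [
    {"example_id": "ex-1", "output": "HI", "scores": {"exact_match": 1.0}},
    {"example_id": "ex-2", "output": "no", "scores": {"exact_match": 0.0}},
]

def validate_results(results):
    """Illustrative client-side sanity check before logging: every record
    needs an example id and an output, and scores must be numeric."""
    for r in results:
        if "example_id" not in r or "output" not in r:
            raise ValueError("each record needs example_id and output")
        if not all(isinstance(v, (int, float)) for v in r["scores"].values()):
            raise ValueError("scores must be numeric")
    return len(results)

count = validate_results(precomputed)
# `precomputed` would then be passed to the create call to record the results
```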
Get an Experiment
Retrieve experiment details and metadata by name or ID. When using a name, provide dataset and optionally space to disambiguate.
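The name-versus-ID lookup can be sketched as follows. The in-memory `experiments` store and `resolve_experiment` are hypothetical; the sketch only illustrates why a name lookup needs `dataset` (and optionally `space`): IDs are unique on their own, while names are only unique within a dataset.

```python
def resolve_experiment(store, *, experiment_id=None, name=None,
                       dataset=None, space=None):
    """Illustrative lookup: prefer the globally unique ID; otherwise
    narrow a name match by dataset and, if given, space."""
    if experiment_id is not None:
        return store[experiment_id]
    if name is None or dataset is None:
        raise ValueError("name lookup requires dataset (and optionally space)")
    matches = [e for e in store.values()
               if e["name"] == name and e["dataset"] == dataset
               and (space is None or e["space"] == space)]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} experiments match name={name!r}")
    return matches[0]

# Hypothetical store: the same name exists in two datasets
experiments = {
    "exp-1": {"name": "baseline", "dataset": "ds-1", "space": "sp-A"},
    "exp-2": {"name": "baseline", "dataset": "ds-2", "space": "sp-A"},
}

by_id = resolve_experiment(experiments, experiment_id="exp-1")
by_name = resolve_experiment(experiments, name="baseline", dataset="ds-2")
```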
Delete an Experiment
Delete an experiment by name or ID. This operation is irreversible. There is no response from this call.
List Experiment Runs
Retrieve individual runs from an experiment with pagination support. Pass all=True to fetch all runs via Flight (ignores limit).
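The pagination behaviour can be sketched with a paging loop. `fetch_page` and `fetch_all_runs` are hypothetical stand-ins for the API calls: paging accumulates results one limited page at a time, whereas `all=True` (as described above) replaces the loop with a single bulk transfer over Flight that ignores the limit.

```python
def fetch_page(all_runs, limit, offset):
    """Stand-in for one paginated API call (names are illustrative)."""
    return all_runs[offset:offset + limit]

def fetch_all_runs(all_runs, page_size=2):
    """Page until an empty page signals the end -- conceptually the
    result you get in one shot with all=True."""
    runs, offset = [], 0
    while True:
        page = fetch_page(all_runs, page_size, offset)
        if not page:
            return runs
        runs.extend(page)
        offset += page_size

runs = [{"id": f"run-{i}"} for i in range(5)]
everything = fetch_all_runs(runs)
```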