The `tasks` client methods are currently in ALPHA. The API may change without notice. A one-time warning is emitted on first use.
Key Capabilities
- Create project-based tasks that run continuously against live spans
- Create dataset-based tasks that evaluate experiment results
- Create `run_experiment` tasks that drive LLM calls on the server
- Trigger on-demand task runs with custom data windows
- Poll task runs until completion with configurable timeout
- Cancel in-progress runs
- List and filter task runs by status
List Tasks
List tasks you have access to, with optional filtering by space, project, dataset, or type. Valid `task_type` values are `"template_evaluation"`, `"code_evaluation"`, and `"run_experiment"`.
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.
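A minimal sketch of listing tasks. The import path, `Client` constructor, and `tasks.list` method name below are illustrative assumptions, not confirmed SDK names; the filter parameters come from this section.

```python
# Hypothetical client surface: the import path, constructor, and
# `tasks.list` method name are assumptions and may differ in the SDK.
from arize import Client

client = Client(api_key="...")

# Both filters are optional; task_type narrows by kind of task.
tasks = client.tasks.list(
    space="my-space",
    task_type="template_evaluation",  # or "code_evaluation", "run_experiment"
)
for task in tasks:
    print(task.name, task.task_type)
```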
Create an Evaluation Task
Create a new evaluation task. Evaluation tasks can target either a project (live spans) or a dataset (experiment results).
Project-Based Task
A project-based task continuously evaluates incoming spans. Set `is_continuous=True` to run the task on every new span, or `False` to run it only on demand.
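For example, a project-based task might look like this. This sketch reuses the hypothetical `client` from above; the `tasks.create` method name and the evaluator dict shape are assumptions, while the keyword arguments follow the parameter table below.

```python
# Sketch: continuously evaluate live spans in a project.
# `tasks.create` and the evaluator dict shape are hypothetical.
task = client.tasks.create(
    name="toxicity-check",
    task_type="template_evaluation",
    project="my-llm-app",   # project-based: target a project, not a dataset
    space="my-space",
    evaluators=[{"name": "toxicity", "template": "..."}],
    sampling_rate=0.25,     # evaluate 25% of incoming spans
    is_continuous=True,     # run on every new span
)
```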
Dataset-Based Task
A dataset-based task evaluates examples from one or more experiments. At least one `experiment_ids` entry is required.
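A dataset-based sketch under the same assumptions (hypothetical `tasks.create`; note the required `experiment_ids`):

```python
# Sketch: evaluate results from an existing experiment.
task = client.tasks.create(
    name="correctness-on-experiments",
    task_type="template_evaluation",
    dataset="qa-golden-set",     # dataset-based: target a dataset
    space="my-space",
    experiment_ids=["exp_123"],  # at least one experiment is required
    evaluators=[{"name": "correctness", "template": "..."}],
)
```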
Column Mappings and Filters
Each evaluator in the task can have its own column mappings (to map template variables to span attribute names) and a per-evaluator query filter, as the sketch after the parameter table shows.
| Parameter | Type | Description |
|---|---|---|
| name | str | Task name. Must be unique within the space. |
| task_type | str | "template_evaluation" or "code_evaluation". |
| evaluators | list | List of evaluators to attach. At least one is required. |
| project | str | Target project name or ID. Required when dataset is not provided. |
| dataset | str | Target dataset name or ID. Required when project is not provided. |
| space | str | Space name or ID used to disambiguate name-based resolution for project and dataset. |
| experiment_ids | list[str] | Required (at least one) when dataset is provided. |
| sampling_rate | float | Fraction of spans to evaluate (0–1). Project-based tasks only. |
| is_continuous | bool | True to run on every new span; False for on-demand only. |
| query_filter | str | Task-level SQL-style filter applied to all evaluators. |
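The sketch below combines a task-level filter with per-evaluator mappings and filters. The evaluator dict keys (`column_mappings`, `query_filter`) are assumed shapes for illustration, not confirmed SDK fields.

```python
# Sketch: task-level filter plus per-evaluator mapping and filter.
# The evaluator dict keys shown here are assumptions.
task = client.tasks.create(
    name="hallucination-check",
    task_type="template_evaluation",
    project="my-llm-app",
    space="my-space",
    query_filter="attributes.llm.model_name = 'gpt-4o'",  # applies to all evaluators
    evaluators=[
        {
            "name": "hallucination",
            "template": "Is the {output} grounded in the {context}?",
            # Map template variables to span attribute names.
            "column_mappings": {
                "output": "attributes.llm.output_messages",
                "context": "attributes.retrieval.documents",
            },
            # Narrower filter applied on top of the task-level one.
            "query_filter": "attributes.retrieval.documents IS NOT NULL",
        },
    ],
)
```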
Create a Run-Experiment Task
A `run_experiment` task drives all LLM calls on the server using the AI integration specified in `run_configuration`; no local callable is required.
The `run_configuration` argument accepts a `TemplateEvaluationRunConfig` instance or a plain dict matching one of the supported schemas; the SDK wraps it for you.
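A sketch of creating one, using the hypothetical `tasks.create` from earlier; the `run_configuration` dict keys below are placeholder assumptions, not a documented schema.

```python
# Sketch: the server drives LLM calls using the configured integration.
# The run_configuration keys below are placeholder assumptions.
task = client.tasks.create(
    name="prompt-v2-experiment",
    task_type="run_experiment",
    dataset="qa-golden-set",
    space="my-space",
    run_configuration={
        "provider": "openai",
        "model": "gpt-4o-mini",
        "prompt_template": "Answer the question: {question}",
    },
)
```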
Get a Task
Retrieve a task by name or ID. When using a name, provide `space` to disambiguate.
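For instance (hypothetical `tasks.get` method name):

```python
# Sketch: resolve by name within a space, or directly by global ID.
task = client.tasks.get(task="toxicity-check", space="my-space")
task = client.tasks.get(task="task_global_id")
```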
Update a Task
Update mutable fields on an existing task. At least one update field must be provided. Pass `query_filter=None` to clear the existing filter; omit any other argument to leave it unchanged.
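A sketch with the hypothetical `tasks.update` method name; the parameter semantics follow the description above.

```python
# Sketch: omitted arguments stay unchanged; query_filter=None clears it.
client.tasks.update(
    task="toxicity-check",
    space="my-space",
    sampling_rate=0.5,   # change one mutable field
    query_filter=None,   # explicitly clear the existing filter
)
```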
Delete a Task
Delete a task and its associated configuration. This operation is irreversible.
Task Runs
Trigger a Run
Trigger an on-demand run for a task. The run starts in `"pending"` status. The accepted parameters depend on the task’s type.
Evaluation tasks (`template_evaluation` / `code_evaluation`):
| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | required | Task name or ID to trigger. |
| space | str | None | Space name or ID used to disambiguate the task lookup. Recommended when resolving by name. |
| data_start_time | datetime | None | Start of the data window to evaluate. |
| data_end_time | datetime | now | End of the data window. Defaults to the current time. |
| max_spans | int | 10,000 | Maximum number of spans to process. |
| override_evaluations | bool | False | Re-evaluate data that already has labels. |
| experiment_ids | list[str] | None | Experiment IDs to run against (dataset-based tasks only). |
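A sketch of triggering an evaluation run over the last 24 hours of spans. The `tasks.trigger_run` method name is hypothetical; the keyword arguments follow the table above.

```python
# Sketch: on-demand run over a custom data window.
from datetime import datetime, timedelta, timezone

run = client.tasks.trigger_run(
    task="toxicity-check",
    space="my-space",
    data_start_time=datetime.now(timezone.utc) - timedelta(days=1),
    max_spans=5000,              # cap the number of spans processed
    override_evaluations=False,  # skip spans that already have labels
)
print(run.status)  # starts as "pending"
```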
`run_experiment` tasks:
| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | required | Task name or ID to trigger. |
| space | str | None | Space name or ID used to disambiguate the task lookup. |
| experiment_name | str | required | Display name for the experiment to be created. Must be unique within the dataset. |
| dataset_version_id | str | latest | Dataset version global ID. Defaults to the latest version. |
| max_examples | int | None | Maximum number of examples to run. When omitted, all examples are used. Mutually exclusive with example_ids. |
| example_ids | list[str] | None | Specific dataset example global IDs to run against. Mutually exclusive with max_examples. |
| tracing_metadata | dict[str, Any] | None | Arbitrary key-value metadata attached to the run’s traces. |
| evaluation_task_ids | list[str] | None | Task global IDs of evaluation tasks to trigger after the experiment run completes. |
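And for a `run_experiment` task, using the same hypothetical `tasks.trigger_run`; the arguments follow the table above.

```python
# Sketch: create a new experiment from the latest dataset version.
run = client.tasks.trigger_run(
    task="prompt-v2-experiment",
    space="my-space",
    experiment_name="prompt-v2-2025-06-01",  # unique within the dataset
    max_examples=100,                        # or example_ids=[...], not both
    tracing_metadata={"git_sha": "abc123"},
    evaluation_task_ids=["task_global_id"],  # evals to trigger afterwards
)
```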
List Runs
List runs for a task, with optional status filtering. Valid `status` values: `"pending"`, `"running"`, `"completed"`, `"failed"`, `"cancelled"`.
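For example (hypothetical `tasks.list_runs` method name):

```python
# Sketch: inspect failed runs for a task.
runs = client.tasks.list_runs(task="toxicity-check", status="failed")
for run in runs:
    print(run.id, run.status)
```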
Get a Run
Retrieve a specific run by its ID.
Cancel a Run
Cancel a run that is currently `"pending"` or `"running"`.
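A sketch that fetches a run and cancels it if still in flight; the `tasks.get_run` and `tasks.cancel_run` method names are hypothetical.

```python
# Sketch: retrieve a run by ID, then cancel it if not yet terminal.
run = client.tasks.get_run("run_global_id")
if run.status in ("pending", "running"):
    client.tasks.cancel_run(run.id)
```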
Wait for a Run
Poll a run until it reaches a terminal state (`"completed"`, `"failed"`, or `"cancelled"`). Raises `TimeoutError` if the run does not complete within `timeout` seconds.
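A sketch using a hypothetical `tasks.wait_for_run` method name, with `timeout` in seconds as described above:

```python
# Sketch: block until the run reaches a terminal state, up to 10 minutes.
try:
    run = client.tasks.wait_for_run("run_global_id", timeout=600)
    print(run.status)  # "completed", "failed", or "cancelled"
except TimeoutError:
    print("run did not finish within the timeout")
```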