
Documentation Index

Fetch the complete documentation index at: https://arize-ax.mintlify.dev/docs/llms.txt

Use this file to discover all available pages before exploring further.

The ax tasks commands are currently in ALPHA. The API may change without notice. A one-time warning is emitted on first use.
The ax tasks commands let you create and manage evaluation tasks and their runs on the Arize platform. Tasks automatically score spans in a project or evaluate experiment results using your LLM-as-judge evaluators.

ax tasks list

List evaluation tasks, optionally filtered by space, project, dataset, or type.
ax tasks list [--space <id>] [--project <id>] [--dataset <id>] [--name <filter>] [--task-type <type>] [--limit <n>] [--cursor <cursor>]
Options:
  --space       Filter tasks by space name or ID
  --project     Filter tasks by project name or ID
  --dataset     Filter tasks by dataset name or ID
  --name        Case-insensitive substring filter on task name
  --task-type   Filter by type: template_evaluation, code_evaluation, or run_experiment
  --limit       Maximum number of results to return (default: 15)
  --cursor      Pagination cursor for the next page
Examples:
ax tasks list --space sp_abc123
ax tasks list --space sp_abc123 --task-type template_evaluation
ax tasks list --project proj_abc123 --output tasks.json

ax tasks create

Create a new task. Dispatches internally based on --task-type. For evaluation tasks (template_evaluation or code_evaluation), either --project or --dataset must be provided, but not both. Run-experiment tasks (run_experiment) require --dataset and --run-configuration.
ax tasks create \
  --name <name> \
  --task-type <type> \
  [--evaluators <json-array>] \
  [--run-configuration <json>] \
  (--project <name-or-id> | --dataset <name-or-id>)
Options:
  --name                Task name (must be unique within the space)
  --task-type           template_evaluation, code_evaluation, or run_experiment
  --evaluators          JSON array of evaluator objects (required for evaluation tasks; see format below)
  --run-configuration   JSON object (or @file.json) specifying the run configuration (required for run_experiment tasks)
  --project             Target project name or ID; mutually exclusive with --dataset (evaluation tasks only)
  --space               Space name or ID (required when resolving --project or --dataset by name)
  --dataset             Target dataset name or ID; mutually exclusive with --project for evaluation tasks; required for run_experiment tasks
  --experiment-ids      Comma-separated experiment global IDs (evaluation tasks only)
  --sampling-rate       Fraction of spans to evaluate, 0–1 (project evaluation tasks only)
  --is-continuous / --no-continuous   Run the task continuously on incoming data (evaluation tasks only)
  --query-filter        Task-level SQL-style filter applied to all evaluators (evaluation tasks only)
Evaluators JSON format:
[
  {
    "evaluator_id": "ev_abc123",
    "query_filter": null,
    "column_mappings": null
  }
]
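Each entry may carry an optional per-evaluator query_filter and column_mappings. A minimal sketch of assembling and validating such an array from the shell before passing it to --evaluators (the evaluator ID is a placeholder, not a real value):

```shell
# Assemble an evaluators array in a file (the ID is a placeholder).
cat > evaluators.json <<'EOF'
[
  {
    "evaluator_id": "ev_abc123",
    "query_filter": null,
    "column_mappings": null
  }
]
EOF

# Sanity-check the JSON before interpolating it into the command line,
# e.g. --evaluators "$(cat evaluators.json)"
python3 -m json.tool evaluators.json
```

Keeping the array in a file avoids shell-quoting mistakes when the JSON grows beyond a single evaluator.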
Run configuration JSON format (run_experiment tasks):
{
  "experiment_type": "llm_generation",
  "ai_integration_id": "...",
  "model_name": "gpt-4o",
  "messages": [{"role": "user", "content": "{{input}}"}]
}
Examples:
Project-based evaluation task (continuous):
ax tasks create \
  --name "Relevance Monitor" \
  --task-type template_evaluation \
  --project proj_abc123 \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --is-continuous \
  --sampling-rate 0.1
Dataset-based evaluation task:
ax tasks create \
  --name "Experiment Evaluation" \
  --task-type template_evaluation \
  --dataset ds_xyz789 \
  --experiment-ids "exp_abc123,exp_def456" \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --no-continuous
Run-experiment task:
ax tasks create \
  --name "GPT-4o Summarization" \
  --task-type run_experiment \
  --dataset ds_xyz789 \
  --run-configuration '{"experiment_type": "llm_generation", "ai_integration_id": "ai_abc", "model_name": "gpt-4o", "messages": [{"role": "user", "content": "{{input}}"}]}'

ax tasks create-evaluation

Create a new evaluation task (template_evaluation or code_evaluation). Requires --name, --task-type, --evaluators, and one of --project / --dataset.
ax tasks create-evaluation \
  --name <name> \
  --task-type <type> \
  --evaluators <json-array> \
  (--project <name-or-id> | --dataset <name-or-id>)
Options:
  --name             Task name (must be unique within the space)
  --task-type        template_evaluation or code_evaluation
  --evaluators       JSON array of evaluator objects (see format above)
  --project          Target project name or ID; mutually exclusive with --dataset
  --space            Space name or ID (required when using a project name)
  --dataset          Target dataset name or ID; mutually exclusive with --project
  --experiment-ids   Comma-separated experiment global IDs (required for dataset-based tasks)
  --sampling-rate    Fraction of data to evaluate, 0–1 (project tasks only)
  --is-continuous / --no-continuous   Run the task continuously on incoming data
  --query-filter     Task-level query filter applied to all evaluators
Example:
ax tasks create-evaluation \
  --name "Relevance Monitor" \
  --task-type template_evaluation \
  --project proj_abc123 \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --sampling-rate 0.1 \
  --is-continuous

ax tasks create-run-experiment

Create a new run_experiment task. Requires --name, --dataset, and --run-configuration.
ax tasks create-run-experiment \
  --name <name> \
  --dataset <name-or-id> \
  --run-configuration <json>
Options:
  --name                Task name (must be unique within the space)
  --dataset             Dataset name or ID to run experiments against
  --run-configuration   JSON object (or @file.json) specifying the run configuration
  --space               Space name or ID
Example:
ax tasks create-run-experiment \
  --name "GPT-4o Summarization" \
  --dataset ds_xyz789 \
  --run-configuration @./run_config.json
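The @./run_config.json file referenced above can be prepared and validated like this; the shape follows the run-configuration format shown under ax tasks create, and the integration ID is a placeholder:

```shell
# Write the run configuration referenced as @./run_config.json.
# ai_integration_id is a placeholder; substitute your own integration's ID.
cat > run_config.json <<'EOF'
{
  "experiment_type": "llm_generation",
  "ai_integration_id": "ai_abc",
  "model_name": "gpt-4o",
  "messages": [{"role": "user", "content": "{{input}}"}]
}
EOF

# Validate before handing the file to --run-configuration
python3 -m json.tool run_config.json
```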

ax tasks get

Get a task by name or ID.
ax tasks get <name-or-id>
Example:
ax tasks get task_abc123

ax tasks update

Update mutable fields on an existing task. The SDK auto-dispatches based on the task’s type; supplying a field that is not valid for the resolved task type raises an error. At least one field must be provided.
ax tasks update <name-or-id> [--space <id>] [--name <name>] [--sampling-rate <n>] [--is-continuous|--no-continuous] [--query-filter <expr>] [--evaluators <json>] [--run-configuration <json>]
Options:
  --space, -s           Space name or ID (required when resolving the task by name)
  --name, -n            New task display name
  --sampling-rate       Sampling rate between 0 and 1 (evaluation tasks only)
  --is-continuous / --no-continuous   Whether the task runs continuously (evaluation tasks only)
  --query-filter        Task-level query filter (evaluation tasks only); pass --query-filter "" to clear the existing filter
  --evaluators          JSON array replacing the full evaluator list (evaluation tasks only; same shape as ax tasks create --evaluators)
  --run-configuration   JSON object (or @file.json) replacing the run configuration (run_experiment tasks only); the entire stored config is replaced atomically
Example:
ax tasks update task_abc123 --name "Relevance Monitor v2" --sampling-rate 0.25

ax tasks delete

Delete a task and its associated configuration. This operation is irreversible.
ax tasks delete <name-or-id> [--space <id>] [--force]
Options:
  --space, -s   Space name or ID (required when resolving the task by name)
  --force, -f   Skip the confirmation prompt
Example:
ax tasks delete task_abc123 --force

ax tasks trigger-run

Trigger an on-demand run for a task. The run starts in pending status. The SDK auto-dispatches based on the task’s type; supplying a flag that is not valid for the resolved task type raises an error. Pass --wait to block until the run reaches a terminal state.
ax tasks trigger-run <task-id> [--data-start-time <time>] [--data-end-time <time>] [--max-spans <n>] [--override-evaluations] [--experiment-ids <ids>] [--experiment-name <name>] [--dataset-version-id <id>] [--max-examples <n>] [--tracing-metadata <json>] [--wait] [--poll-interval <s>] [--timeout <s>]
Options:
  --data-start-time      ISO 8601 start of the data window to evaluate (evaluation tasks only)
  --data-end-time        ISO 8601 end of the data window (evaluation tasks only; defaults to now)
  --max-spans            Maximum number of spans to process (evaluation tasks only; default: 10,000)
  --override-evaluations / --no-override-evaluations   Re-evaluate data that already has labels (evaluation tasks only)
  --experiment-ids       Comma-separated experiment global IDs (dataset-based evaluation tasks only)
  --experiment-name      Display name for the experiment to be created (required for run_experiment tasks)
  --dataset-version-id   Dataset version global ID (base64); defaults to the latest version (run_experiment tasks only)
  --max-examples         Maximum number of examples to run (run_experiment tasks only)
  --tracing-metadata     JSON object (or @file.json) of key/value pairs attached to experiment traces (run_experiment tasks only)
  --wait, -w             Block until the run reaches a terminal state
  --poll-interval        Seconds between polling attempts when using --wait (default: 5)
  --timeout              Maximum seconds to wait when using --wait (default: 600)
Examples:
# Trigger a run and return immediately
ax tasks trigger-run task_abc123

# Trigger a run over a specific time window
ax tasks trigger-run task_abc123 \
  --data-start-time 2024-01-01T00:00:00Z \
  --data-end-time 2024-02-01T00:00:00Z

# Trigger a run and wait for it to finish
ax tasks trigger-run task_abc123 --wait

# Trigger and wait with a custom timeout
ax tasks trigger-run task_abc123 --wait --timeout 300 --poll-interval 10
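--tracing-metadata accepts the same @file.json form as the other JSON flags. A sketch of preparing such a file; the keys and values here are illustrative, not a required schema:

```shell
# Illustrative tracing metadata; any flat JSON object of key/value pairs works.
cat > tracing_metadata.json <<'EOF'
{"git_sha": "abc1234", "triggered_by": "ci"}
EOF

# Validate, then pass with: --tracing-metadata @tracing_metadata.json
python3 -m json.tool tracing_metadata.json
```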

ax tasks list-runs

List runs for a task, with optional status filtering.
ax tasks list-runs <task-id> [--status <status>] [--limit <n>] [--cursor <cursor>]
Options:
  --status   Filter by run status: pending, running, completed, failed, cancelled
  --limit    Maximum number of results to return (default: 15)
  --cursor   Pagination cursor for the next page
Examples:
ax tasks list-runs task_abc123
ax tasks list-runs task_abc123 --status completed
ax tasks list-runs task_abc123 --status failed --output runs.json

ax tasks get-run

Get a task run by its global ID.
ax tasks get-run <run-id>
Example:
ax tasks get-run run_abc123

ax tasks cancel-run

Cancel a task run. Only valid when the run is pending or running.
ax tasks cancel-run <run-id> [--force]
Options:
  --force   Skip the confirmation prompt
Examples:
ax tasks cancel-run run_abc123
ax tasks cancel-run run_abc123 --force

ax tasks wait-for-run

Poll a task run until it reaches a terminal state (completed, failed, or cancelled). Exits with an error if the run does not complete within the timeout.
ax tasks wait-for-run <run-id> [--poll-interval <s>] [--timeout <s>]
Options:
  --poll-interval   Seconds between polling attempts (default: 5)
  --timeout         Maximum seconds to wait before failing (default: 600)
Examples:
ax tasks wait-for-run run_abc123
ax tasks wait-for-run run_abc123 --timeout 300 --poll-interval 10
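Conceptually, the wait loop checks the run's status every --poll-interval seconds until a terminal state is reached or the timeout elapses. A pure-shell sketch of that loop, where check_status is a stand-in for querying the run (not a real ax call):

```shell
# Conceptual sketch of the polling loop behind --wait / wait-for-run.
# Terminal states: completed, failed, cancelled.
INTERVAL=1   # stands in for --poll-interval
TIMEOUT=5    # stands in for --timeout
elapsed=0
status=pending

check_status() {
  # Stand-in for querying the run; pretends the run finishes after 2 seconds.
  if [ "$elapsed" -ge 2 ]; then echo completed; else echo running; fi
}

while [ "$elapsed" -lt "$TIMEOUT" ]; do
  status=$(check_status)
  case "$status" in
    completed|failed|cancelled) break ;;
  esac
  sleep "$INTERVAL"
  elapsed=$((elapsed + INTERVAL))
done

echo "final status: $status" | tee final_status.txt
```

With this shape, hitting the timeout simply exits the loop with the last observed non-terminal status, which mirrors the CLI's behavior of failing when the run does not complete in time.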