2026-05-13

Build Arize-Powered Tooling in Go with the New Go SDK v2

May 13, 2026New SDKs and REST APIsThe first release of the Arize Go SDK v2 (client-go-v2 v0.1.0) is now available, so you can integrate Arize directly into Go services and pipelines.

Core client — arize.NewClient wires HTTP transport, headers, and configuration with environment-variable overrides and secret masking.
Resource resolution — Look up spaces, projects, datasets, experiments, prompts, evaluators, annotation configs, annotation queues, AI integrations, and tasks by name or ID.
Typed errors and pre-release guardrails — Structured HTTP error types with Unwrap support, plus one-time warnings on alpha/beta endpoints.

The code example below initializes the Go SDK client with an API key:

package main

import (
    "log"
    "os"

    "github.com/Arize-ai/client-go-v2"
)

func main() {
    client, err := arize.NewClient(arize.Config{
        APIKey: os.Getenv("ARIZE_API_KEY"), // set ARIZE_API_KEY in your environment before running
    })
    if err != nil {
        log.Fatal(err)
    }
    // use client to make API calls
}

Visit the Go SDK v2 documentation to learn more.

Automatically Add Spans to Labeling Queues

May 13, 2026New AnnotationsYou can now configure a labeling queue to automatically pull in spans matching optional filter criteria, so you can build review pipelines without manual curation. Selecing a project as the datasource when creating a queue.

Selecing a project as the datasource when creating a queue.

Selecting a project as the datasource when creating a queue to automatically pull in spans.

Project datasource — Select a project as the data source when creating a queue.
Query filter — Optionally scope which spans are routed for labeling (for example, attributes.openinference.span.kind = 'LLM').
Sampling rate — Route a representative slice of traffic when matching span volume exceeds annotation bandwidth.
Continuous and backfill modes — Enable continuous ingestion to keep pace with new traffic, backfill to seed the queue from existing spans, or both.
Queue cap — Set an optional cap to prevent unbounded growth and keep the queue manageable.

All settings are adjustable after queue creation, and deduplication is built in.Learn more about labeling queues.

Run an Evaluator on Selected Experiments in One Click

May 13, 2026Improvement Datasets and ExperimentsPick experiments in the Experiments table, click Run Evaluator, and the evaluator creation dialog opens with those experiments already pre-populated—no need to re-select them by hand. Selected experiments are now pre-populated in the evaluator dialog.

Selected experiments are now pre-populated in the evaluator dialog.For more information, refer to the run offline evals on experiments documentation.

Manage Users, Roles, and Invitations from the Python SDK

May 12, 2026New SDKs and REST APIsarize-python-sdk v8.24.0 adds full CRUD support for the /v2/users endpoints so you can manage your account’s user base programmatically instead of clicking through the UI.

Lifecycle operations — users.list(), get(), create(), update(), and delete().
Invitation and password flows — users.resend_invitation() and users.reset_password() automate the most common admin chores.
Typed domain models — User, organization, and space roles return as Pydantic models, so ax users list produces clean to_df output instead of crashing on raw API types.

The code example below covers listing users, creating a new user with a role assignment, and managing invitation and password flows:

from arize.users.types import PredefinedUserRole

users = client.users.list(
    email="@acme.com",             # optional substring filter
    status=["active", "invited"],  # omit to return all statuses
)
for user in users:
    print(user.id, user.email)

user = client.users.create(
    name="Ada Lovelace",
    email="ada@example.com",                    # used as the idempotency key
    role=PredefinedUserRole(name="member"),      # "admin", "member", or "annotator"
    invite_mode="email_link",                    # "none", "email_link", or "temporary_password"
)

client.users.resend_invitation(user_id=user.id)  # target user must be in "invited" state
client.users.reset_password(user_id=user.id)     # user must authenticate via password, not SSO

Learn more about managing users with the Python SDK.

Fixes and Improvements

May 7–13, 2026Custom Metrics

Improvement PROJECT is now accepted as an alias for MODEL in custom metric SQL, so you can write FROM project to match how tracing projects are named elsewhere in the platform. Existing FROM model queries are unaffected.

Alyx

Improvement The Alyx home agent can now list traces directly, so you can ask for recent traces without having to switch surfaces first.
Fix Editing a Prompt Playground prompt through Alyx no longer fails when the model returns the prompt as a JSON-encoded string—Alyx parses it automatically.
Fix The Alyx read_prompt tool validates prompt IDs before calling, eliminating a class of failed reads.
Fix Re-opening an Alyx chat with a custom_trace_view widget no longer renders a fresh Accept button, so users can’t accidentally create duplicate views.

Annotations

Improvement A new background job soft-deletes annotation queue records that have been annotated or sat untouched for a year, keeping queue tables from growing unbounded.
Fix Creating an annotation queue via POST /v2/annotation-queues no longer returns 500 errors for accounts whose user names aren’t "admin".

Evaluators

Fix Toggling Enable Tracing on an existing template-eval online task now persists on save. Previously the field was silently dropped when patching legacy (pre–Eval Hub) tasks.

Models and Integrations

Improvement The integration setup flow now shows tooltips on each field, so it’s easier to understand what each value should be before submitting.

SDKs and APIs

Fix POST /v2/experiments now returns 400 Bad Request for schema mismatches such as an eval.*.score type mismatch, instead of a generic 500.

Tracing and Sessions

Fix Switching to pretty JSON formatting in the trace view no longer causes UI issues on large payloads.

2026-05-06

Visualize Evaluator Score Distributions Across Spans and Experiments

May 6, 2026New Dashboards and VisualizationsEval score charts are now available to all users. Visualize how your evaluator scores distribute across spans and experiments directly from the model overview and tracing pages—no configuration required.

Review and Confirm Alyx Proposals Before They Take Effect

May 6, 2026Improvement AlyxThree Alyx operations that previously applied changes silently now route through a visible confirmation drawer before taking effect. You can review, edit, and accept or skip each proposal.

Eval Form Proposals: Alyx surfaces an editable drawer with the proposed evaluator name, display name, template, and classification choices before saving.
Task Creation: A review drawer shows the full task configuration—target, evaluator, run mode, sampling rate—before the task is created.
Task Configuration: Configuring task parameters through Alyx now always requires confirmation, regardless of which page you’re on.

All three respect the existing “auto-accept evals & tasks” toggle.

Wire Experiment Runs into Automated Pipelines with the run_experiment REST API

May 6, 2026New SDKs and REST APIsThe v2 REST API now supports experiment run tasks. Create, update, and trigger run_experiment tasks programmatically to wire experiment runs into automated pipelines.

Assign Multiple Annotation Queue Records to a Reviewer in Bulk

May 6, 2026New AnnotationsAssign multiple annotation queue records to a reviewer in a single operation from the queue UI.

Control Annotation Queue Capacity with Per-Queue Record Limits

May 6, 2026Improvement AnnotationsSet and clear a custom max_records cap on individual annotation queues from the queue settings UI, overriding the global account default.

Fixes and Improvements

May 1–6, 2026

Fix Models and Integrations Azure OpenAI o-family models (o1, o3-mini, o4-mini) now work correctly in Prompt Playground and evals—the default API version is updated to 2025-04-01-preview so you no longer need to enter it manually.
Fix Datasets and Experiments The “View Experiment Traces” button now returns correct results for experiments run via the Arize Python SDK, which uses experiment_id rather than dataset_id.
Fix Evaluators Eval result columns in eval.<name>.<field> format generated by AX experiment evals are no longer dropped before the output is returned.
Fix Evaluators “View Task Logs” from an eval feedback tooltip now opens the exact task run instead of an approximate lookup that failed for renamed evaluators and older runs.

2026-04-30

RBAC (GA)

April 13, 2026Role-Based Access Control is now generally available on Arize AX. Define custom roles, scope permissions to spaces and resources, and manage who can create, read, update, and delete platform resources from a single, auditable surface.

Resource Restrictions & Role Bindings: Restrict access to specific resources and bind roles to users or service accounts as the building blocks of fine-grained access control.
API Key Permissions: New USER_KEY_CREATE, SERVICE_KEY_READ, and SERVICE_KEY_UPDATE permissions, plus developer permission checks on API key mutations, give admins precise control over who can mint and rotate keys.
User Creation Hardening: GraphQL API key creation has been removed in favor of REST flows, and a permissions toggle on user creation makes default access explicit at onboarding time.

Alyx Improvements

April 4–30, 2026

Floating Alyx Button: A persistent Alyx launcher follows you across the platform, with a pulse indicator that highlights when context-relevant suggestions are available.
Alyx History Menu: Pick up where you left off—recent Alyx conversations are now grouped in a dedicated history menu so you can resume threads without losing context.
Ask Question Tool: Alyx can now ask clarifying questions mid-task to confirm intent before taking action, reducing wrong turns on multi-step requests.
Alyx Writes Code Evaluators: Describe what you want to measure and Alyx drafts the code evaluator, wires up parameters, and previews results before saving.
Auto-Fix Variable Mappings on Eval & Task Forms: A new Alyx button on evaluator and task creation pages auto-detects and fixes broken variable mappings using actual column data—Alyx can also be auto-triggered to repair mappings without leaving the form.
Variable Mapping Subagent in Playground: The same auto-mapping subagent is now available inside Prompt Playground, with user-defined columns accepted as hints to speed up resolution.
Run Experiments from Datasets: Trigger experiment runs through Alyx directly from the dataset page—no need to navigate to the experiments surface.
Unify Query Filters with Alyx: Translate natural language into structured query filters across tracing, sessions, and evaluations from one consistent Alyx flow.
Eval Hub Alyx: Alyx is now embedded throughout Eval Hub, including a dedicated assistant on the evaluator detail page and a “Start with Alyx” prompt on empty states.
Datasets Empty State: New Alyx entry point on the datasets page guides you through creating your first dataset and configuring evaluations.
Skills on Home Page: The home-page Alyx now exposes surface-specific skills, making it easier to discover what Alyx can do in your current context.
Materialized Column Awareness: Alyx is now aware of materialized input and output columns when filtering, returning more accurate results on projects that use derived attributes.
Integrations Slideover: AI integrations now surface in a slideover within Alyx chat so you can view connection details without leaving the conversation.
Claude Models in Canada: Anthropic Claude models are now enabled for Alyx in the Canada region for customers in that data residency.

Evaluator Improvements

April 4–30, 2026

Combined Eval & Task Form: Creating an evaluator and its task is now a single, streamlined form—configure the evaluator, define the task, and save in one pass instead of two.
Release Evals in Playground: Promote evaluators from draft to released directly in Prompt Playground and reuse them across experiments and online tasks.
Test Code Eval on Example: A new “Test on example” button runs your code evaluator against a single example so you can validate logic before scoring at scale.
All Variables in Eval Template: Evaluator templates now accept all available variables, removing prior caps on the number of inputs you can wire into a prompt.
Optimization Direction “None”: Add a none option to evaluator optimization direction for evaluators where higher or lower scores aren’t inherently better.
Save Evaluator Version on Run: Running an experiment now records the exact evaluator version used, making historical comparisons reproducible.
Eval Metadata in Tracing Details: Evaluator metadata—name, version, score, explanation—is now surfaced inline in trace span details for at-a-glance triage.
Streamlined Eval Task Menu: The evaluator tasks menu has been reorganized for faster access to common operations.
List Evals in Tracing Tasks Button: The tracing toolbar now shows the active evaluators on a project, so you can confirm what’s scoring before drilling in.

Custom Metrics

April 4–30, 2026

Space-Scoped Custom Metrics: Custom metrics are now scoped to spaces instead of individual models, so you can define a metric once and reuse it across every model and dashboard in the space. A migration moves existing metrics to space scope automatically.

Annotation Queues

April 4–30, 2026

Records-Per-Queue Caps: Set a cap on the number of records a single annotation queue can hold, preventing unbounded growth and runaway labeling costs.
Duplicate & Capacity Surfacing: When adding records to a queue, the UI now surfaces the count of duplicates skipped and any capacity restrictions hit, so reviewers know exactly what landed.
Manual Annotation Submission: Reviewers can now submit annotations manually from the queue without going through the full assignment flow.
CSV / JSONL Download: Export annotation queue records to CSV or JSONL for offline review or downstream pipelines.
User-ID-to-Email Rename: Annotation column names that previously used opaque user IDs are now rendered with email addresses for readability.
Buffered Annotation Updates: User annotation updates are now buffered through the existing ingestion path for more reliable persistence under load.
Preselected Configs from Annotation Columns: When configuring annotations on a dataset, the UI now preselects configs based on existing annotation columns to avoid redundant setup.

Datasets & Experiments

April 4–30, 2026

Sortable Example Columns: Sort by any column on the dataset examples table, including custom annotation columns and metrics.
Span Dataset Metrics Bar: Span dataset versions now show a metrics stats bar at the top for instant visibility into volume, scores, and drift.
Expand & Collapse Dataset Rows: Long dataset rows can now be expanded inline to inspect full input/output payloads without leaving the table.
Image Hover Preview: Image cells in experiment slideovers now load a larger preview on hover, making it easier to spot visual regressions across runs.
Nested JSON Preview: Nested JSON values in dataset cells render with collapsible structure instead of a flat string blob.
CSV Integer Support: CSV uploads now correctly preserve integer types instead of coercing them to floats or strings.
Avg. Latency on Experiments: Experiments now show average latency alongside score and cost columns, with consistent eval-experiment formatting across the table.

Tracing & Sessions

April 4–30, 2026

Project Source Mapping Configs: Define source mapping configurations per project to standardize how raw span attributes map to canonical input, output, and metadata columns. A new Save button replaces blur-based auto-save in the source mapping editor, and source mapping is now supported on on-prem deployments.
Materialized Input & Output Columns: Project source mappings can now materialize input and output columns up front, accelerating filters and downstream evals on high-volume projects.
Multi-Span Admission Path: Span/trace/session evaluation tasks can now ingest and score multi-span queries through a unified admission path, with continuous grouping and sizing for higher throughput.
Sessions Columns & Expansion: The sessions tab now supports custom columns and per-row expand/collapse for quick session-level triage.
Allow Double Quotes in Auto-Add Filter: Auto-add filters now accept double-quoted values, fixing a class of “no results” bugs on string filters.
Auto Granularity in Monitor Charts: Monitor metric charts default to a granularity that returns data, eliminating empty time-series on first load.

Dashboards & Visualization

April 4–30, 2026

Charting on Project Home: Project home pages now include charting widgets so you can see traffic, evaluation scores, and latency trends without opening a dedicated dashboard.
Bar & Distribution Charts: Dashboards now support bar charts and distribution charts, with click-through links to the underlying logs and tasks.
Eval Chart Card: A new evaluator chart card with skeleton loader surfaces score trends faster, including a task-level variant for online evaluators.
Metrics Tab & Starring Evals: A new Metrics tab consolidates eval performance views, and you can now star favorite evaluators for quick filtering.
Performance Breakdown Column Selector: Choose which columns to break down by in performance analysis without editing the dashboard config.
Parent Sub-Task Logs: Task logs now show parent and sub-task hierarchy so you can debug nested online evaluations end-to-end.
Automatic Alert Sleuth: When a monitor alert fires, AX now auto-runs a root-cause investigation across recent traces and surfaces likely contributing factors directly on the alert.

Webhooks & Events

April 4–30, 2026

Webhooks UI: Configure, test, and manage webhooks from a dedicated UI with payload previews and delivery history.
Test Webhooks Button: Send a synthetic event to your webhook endpoint with a single click to verify auth and payload shape before going live.
ES-Powered Events Page: The Webhooks panel has been replaced with an Events page backed by Elasticsearch, supporting full-text search, structured filters, and faster lookups across delivery history.

Playground

April 4–30, 2026

Prompt Switcher: Switch between prompts in the playground without losing your current context, making side-by-side comparison faster.
Postprocessing Query Validation: Validation now runs on postprocessing queries in the UI, catching syntax and reference errors before you run the prompt.
PNG Image Display: PNG image inputs now render directly in the playground without requiring a data-URI prefix.
Reset Last-Used LLM: The “last used LLM” is no longer pinned in local storage, so a fresh playground always defaults to your space’s preferred model.
Default Integration per Space: Set a default AI integration per space from Evals or Playground via a new modal, so prompts run against the right provider out of the box.

Data Fabric

April 4–30, 2026

Azure Blob Storage: Data Fabric now supports Azure Blob Storage as a source, alongside existing GCS and S3 support.
Delta Lake Format: Specify Delta Lake as both an input format and an output destination for Data Fabric jobs.
Azure Data Fabric UI: Refreshed UI for configuring Azure data sources, including a dedicated table format column for clarity on input shape.

Models & Integrations

April 4–30, 2026

Anthropic Opus 4.7: Claude Opus 4.7 is now available across Alyx, Playground, and evaluators.
OpenAI o3 & o4-mini: OpenAI’s o3 and o4-mini reasoning models are now selectable for prompts and evaluations.
GPT-5.4 Family: Added gpt-5.4-nano-2026-03-17, gpt-5.4-mini, and gpt-5.4-mini-2026-03-17 to the OpenAI provider.
GPT-5.5 & GPT-5.5-Pro: Latest OpenAI flagship and pro variants are now supported.

SDKs & REST APIs

April 4–30, 2026New endpoints and SDK clients for managing platform resources programmatically:

Organizations API (REST v2 + Python & JS SDKs): Full lifecycle for organizations—list, get, create, update, and delete—through the REST API and both Python and JavaScript SDKs, including a top-level deleteOrganization on the AX client.
Spaces Delete: Delete spaces programmatically through the REST API and SDKs, completing the space management surface.
Tasks Update & Delete: Update and delete evaluation tasks through the REST API and both SDKs, plus matching ax tasks update and ax tasks delete CLI commands.
Batch Annotation Endpoints: New batch annotation endpoints for datasets and experiments let you submit annotations across many examples or runs in a single request.
Annotate Examples & Runs (SDKs): Python and JavaScript SDKs gain annotate_examples / annotateDatasetExamples and annotate_runs / annotateExperimentRuns for ergonomic batch annotation flows.
Resource Restrictions & Role Bindings: New ResourceRestrictionsClient and RoleBindingsClient subclients in Python and JS expose the GA RBAC surface, with restrictResource, unrestrictResource, and role binding operations.
Custom Code Evaluator Configs: The evaluators API now accepts code-evaluator-specific configuration fields, enabling programmatic creation of code evals with parameters.
Partial Success on Spans Delete: Spans Delete responses now report partial success with per-span outcomes, so bulk deletes surface exactly which spans were affected.
DatasetWithExampleIds Response: Dataset example endpoints now return a DatasetWithExampleIds response variant, exposing example IDs alongside dataset metadata.
Public SDK Type Aliases: Both Python and JavaScript SDKs now re-export public type aliases for all subdomains, eliminating the need to import from _generated.
Prompts v2 API Audit Improvements: The Prompts v2 API has improved audit logging on create, update, and label operations for clearer change history.

CLI Commands

April 4–30, 2026New command groups and capabilities for the ax CLI:

ax organizations: Full CRUD for organizations from the terminal—list, get, create, update, and delete.
ax resource-restrictions & ax role-bindings: Manage RBAC resource restrictions and role bindings from the CLI to script access provisioning.
ax tasks update & ax tasks delete: Update task configurations or remove tasks directly from the terminal.
ax spaces delete: Delete spaces with a confirmation prompt to prevent accidental removal.
Profile API Key Hint: ax profile create now shows a hint about expected API key format, reducing setup errors on first use.
Name/ID Resolution on api_keys.create(): Pass either a space name or ID when creating API keys—the CLI resolves names automatically via API lookup.
Background Update Check & Upgrade Command: The CLI now checks for new versions in the background and exposes an upgrade command to pull the latest release without leaving the terminal.
Single-Host Flags for On-Prem: New --single-host and --single-port flags streamline on-prem deployments where all services share a single hostname.

2026-04-03

Alyx Improvements

March 20–April 3, 2026

Trace Aggregations: Alyx can now aggregate numeric values from child spans in traces, grouped by attributes in the root span. Ask questions like “compute average token usage by router type” or “calculate total cost by user email” and get instant results.
Auto-Fix Column Mappings: Alyx automatically detects and fixes broken evaluator column mappings using actual column data. It discovers eval-task pairs needing fixes, verifies data coverage, previews sample values for semantic fit, and applies full mapping updates while preserving correct mappings.
Auto-Select Preview Spans: After fixing evaluator variable mappings, the preview panel now automatically selects the relevant span with data for each mapped column—no manual selection needed.
Playground Onboarding: New “Start with Alyx” action on the Playgrounds list opens a new playground and guides you through setting up a customer support bot with datasets, prompts, evaluations, and experiments.
Custom Metrics Creation: Ask Alyx to create custom metrics using natural language. Alyx generates the query, shows a confirmation drawer for you to review and approve the name and query, and applies it automatically.
Improved Error Messages: Alyx now shows clear, user-facing error messages when internal model provider errors occur, instead of generic failure states.

Evaluator Improvements

March 23–April 3, 2026

Code Evals in Eval Hub: Code evaluators are now first-class citizens in Eval Hub—create code evaluators (template or custom), version them, and reuse across tasks and experiments. Update an evaluator to save a new version, just like template evaluators.
Custom Code Evals, Revamped: Live validation surfaces issues in your code block before you submit. Custom evals now support parameters (self.param_name) and evaluate params, bringing them to parity with template evaluators.
Column Mapping Preview: The same preview panel from template evaluators is now available for code evals, with clear warnings for missing mappings, unresolved columns, and valid state.
Code Evals in Playground: Select code evaluators from Eval Hub directly in Prompt Playground to score experiment runs.
Optimization Direction: Evaluators now support optimization direction configuration, letting you specify whether higher or lower scores are better for your evaluation criteria.
Manual Mode Access: Evaluators are now accessible from the home page in both onboarding and normal views in manual mode.
Playground Eval Config: Configure evaluators directly in the playground with classification choices and custom scores, explanation toggles, and save configurations to the Eval Hub with version history.
Task from Span: Create evaluator tasks directly from trace spans in the slideover, keeping the UI in sync with in-flight and newly created evaluators.
Hide Null Outputs: New toggle in experiments to hide rows where all experiment outputs are null, defaulting to on. State persists via URL.

SDKs & REST APIs

March 20–April 3, 2026New SDK clients and REST API endpoints for managing platform resources programmatically:

Annotation Queues (Python & JavaScript SDKs): Complete annotation queue management in both SDKs with queue CRUD operations, record management, annotation submission, and record assignment. Supports both ALL and RANDOM assignment methods.
Tasks API (Python & JavaScript SDKs): Comprehensive evaluation tasks support in both SDKs with task CRUD operations, task run management including trigger, list, get, and cancel. Python SDK includes a wait helper with configurable polling and timeout.
Spans API with Annotations: The Spans API now returns annotations and evaluations in structured form, including user email lookup for user annotations.
List Spans (JavaScript SDK): The JavaScript SDK now supports listing spans with filtering and pagination capabilities.

CLI Commands

March 20–April 2, 2026New command groups and capabilities for the ax CLI:

New Command Groups: Six new command groups added—ax evaluators for evaluator and version management, ax tasks for evaluation task operations including wait-for-run, ax api-keys for API key lifecycle management, ax ai-integrations for managing OpenAI, Azure, Bedrock, Vertex AI, Anthropic, and custom providers, ax prompts for full prompt lifecycle with versions and labels, and ax roles for role management.
Name Filters: Added —name / -n option to all list commands that support it, including ai-integrations, annotation-configs, datasets, evaluators, projects, prompts, and tasks. Filter by case-insensitive substring.
Classification Config: Configure classification evaluators from the terminal with —classification-choices for label-to-score mappings, —direction for optimization direction, and —data-granularity for evaluation scope.
Agent Skills Install: Interactively install agent skills through the CLI with both interactive and non-interactive options.

Tracing Improvements

March 20–April 2, 2026

Sessions Metrics Bar: New metrics bar for sessions provides at-a-glance visibility into key session statistics and performance indicators.
Linkable Trace Views: Share specific trace views with colleagues using direct links. Each trace view now has a unique URL for easy collaboration and reference.

Dashboard & Visualization

March 20–April 2, 2026

Cost Formatting: Dashboard charts and pivot tables automatically detect LLM cost dimensions and apply currency formatting with dollar-prefixed values on axes, full precision in tooltips, and auto-defaulting Y-axis labels to “Cost ($)”.
Pivot Table Cardinality: Non-numeric dimensions in pivot tables now show cardinality and count metrics, making it easier to analyze categorical data distribution.
Preview Variables: Navigate to latest data and select columns directly in the preview variables panel for faster workflow.

Model & Integration Updates

March 20–27, 2026

Gemini 3.1 Models: Added Gemini 3.1 Pro Preview and Flash Lite Preview to the Vertex AI provider.
OTLP JSON Support: The OTLP HTTP endpoint now accepts application/json content type in addition to protobuf, making it easier to test with curl and integrate with languages that lack strong protobuf support.

Annotation Improvements

March 25–31, 2026

Queue Record Deletion: Delete annotation queue records individually or in bulk with new management capabilities.
Accessibility Enhancements: Visual accessibility improvements for annotation queues including zebra striping with alternating background colors, bold titles, and increased spacing between configs.

2026-03-19

Saved Views on Tracing

March 13–19, 2026Save filters, columns, sort, and time range on Tracing and reuse them anytime. Use the Views dropdown in the Tracing toolbar to switch views, set a personal default per project, or start from built-in Arize Default and Errors views—without reapplying the same setup each session.See Saved views on Tracing.Note: Sort order in saved views applies to timestamp and latency on Tracing today. Support for sorting on additional columns is planned; when available, that sort state will be included in saved views as well.

Dashboard export options

March 13–19, 2026

Export dashboards for offline analysis and sharing:

Full dashboard PDF: Download a PDF of the entire dashboard.
Single-widget PDF: Download a PDF of one widget.
Widget CSV: Download CSV data for a widget.

Works across dashboard types, including tracing project overviews, so you can share insights with stakeholders who do not have platform access.

SDKs & REST APIs

March 13–19, 2026New SDK clients and REST API endpoints for managing platform resources programmatically:

Evaluators (Python & JavaScript SDKs): Create, manage, and version evaluators programmatically through Python and JavaScript SDKs. Full create, read, update, and delete operations for evaluators, plus list, create, and retrieve for evaluator versions. Enables automated evaluator lifecycle management and integration of evaluation workflows into existing development processes.
Prompts (Python & JavaScript SDKs): Manage prompts and prompt versions through Python and JavaScript SDKs with full create, read, update, and delete operations across all prompt and version endpoints. Includes label management for organizing and retrieving specific versions. Set labels like “production” on any version for easy resolution, and labels automatically move when reassigned.
AI Integrations (Python SDK): Connect Arize AX with AI frameworks programmatically through the Python SDK. List, create, update, and delete integrations to automate the setup of instrumentation and monitoring across your AI applications.
Roles Management: Create, read, update, and delete custom roles through the API, enabling programmatic role management and automated access control workflows across your organization.
Name-Based Search: Find resources faster with case-insensitive name search across all major list endpoints including projects, prompts, datasets, experiments, spaces, annotation configs, and annotation queues. Flexible substring matching enables quick resource location without remembering exact names.
Space Deletion: Manage the full space lifecycle through the API with the ability to delete spaces programmatically, completing the set of space management operations.

Evaluator Improvements

March 13–19, 2026

Evaluator preview: The evaluator preview has been updated and the creation flow is streamlined so you can move from setup to validation with fewer steps.
Online Task Resources Configuration: Configure CPU and memory resources for online task evaluators to optimize performance and cost based on workload requirements. Provides more granular control over how evaluations run at scale.

Annotation Improvements

March 13–19, 2026

Queue Records Management: Add and delete records from annotation queues with new record operations. Assignee and status filters now persist in the URL for easy sharing and bookmarking, with an empty state UI for guidance when no filters are applied.
Session Slideover Annotations: The annotate button has moved from individual messages to the trace level in session conversations for clearer context. A trace number subtitle now appears in the annotation panel, making it clear which trace is being annotated.
View Source Data: Clicking “View Source Data” in annotation queues now opens the full span dialog instead of a limited preview, providing complete trace context for more informed annotation decisions.

2026-03-18

Alyx Improvements

March 13–18, 2026

Multi-Span Support: Alyx can now analyze and work with multiple spans simultaneously, enabling more powerful conversational debugging and analysis workflows across traces without switching context.
Dataset Page Context: Alyx now has richer awareness on dataset pages, including selected experiments, latest available experiments, and active evaluators. Provides more relevant, context-specific assistance when working with datasets and experiments.
Auto-Trigger on Destination Pages: When the home page Alyx links to another page, it can now auto-open the destination page’s Alyx with context pre-loaded, providing seamless continued help across navigation.

2026-03-17

CLI Commands

March 13–17, 2026New CLI commands for managing spaces and profiles:

Spaces Management: Create, list, get, and update spaces directly from the command line with formatted table output. The Spaces API has been promoted to stable, providing reliable programmatic access to space management.
Profile Recovery: The CLI now gracefully handles invalid or extra configuration fields instead of crashing. A new profile fix command helps diagnose and repair broken profile configurations, enabling quick recovery from configuration issues without manual file editing.

Bedrock Integration Updates

March 16–17, 2026

Bearer Token Authentication: Added support for AWS Bedrock integrations using bearer token authentication, providing a simpler alternative to IAM-based credential management for teams that prefer token-based auth workflows.
Inference Profile Support: AWS Bedrock inference profiles are now fully supported without requiring model IDs to be in a predefined allowlist. Unknown inference profile identifiers pass through directly to Bedrock for native error handling, enabling immediate use of custom or newly released profiles.

Dataset Summary Columns

March 17, 2026Dataset pages now display summary evaluation scores and annotation counts with token usage metrics in dedicated columns. Quickly assess dataset quality and annotation progress at a glance without opening individual records, streamlining your data review workflow.

2026-03-11

REST APIs

March 6–11, 2026New REST API endpoints for managing core platform resources programmatically:

Prompt Version & Label Management: Create, retrieve, and manage prompt versions with labels like “production” for easy resolution. List all versions of a prompt, create new versions with commit messages, and set/replace labels on versions. Labels automatically move when reassigned, ensuring the correct version is always referenced.
API Keys Management (Python & JavaScript SDKs): Create, list, and delete API keys programmatically through Python and JavaScript SDKs. Two types of keys available: user keys that inherit individual user permissions and can be created in multiples, and service keys tied to bot users for organizational continuity. Enables secure automation and integration with external systems while ensuring service accounts remain functional even when team members leave.
AI Integrations: List, create, update, and delete integrations with cursor-based pagination and space filtering. Connect Arize AX with various AI frameworks like OpenAI Agents, LangGraph, and Autogen for automatic instrumentation and monitoring with just a few lines of code, dramatically simplifying observability infrastructure.
Evaluators: List evaluators programmatically with cursor-based pagination and space filtering. Each evaluator includes ID, name, description, space ID, task type, tags, current version, and timestamps. Transform subjective AI outputs into measurable, trackable metrics that enable teams to confidently iterate on applications and ensure consistent quality at scale.
Annotation Queue Records: Retrieve annotation queue records via REST API with pagination support. View annotations, assigned users with status, source data, and evaluations for each record. Streamlines the annotation workflow by providing programmatic access to centralized data labeling processes, ensuring organized and consistent annotation tracking across teams.

Alyx Improvements

March 6–9, 2026

Home Chat: Interactive chat interface on the home page with revolving Alyx title changes. Get context-aware assistance for AI engineering tasks including troubleshooting traces, optimizing prompts, building evaluations, and analyzing experiments. Alyx provides relevant, surface-specific skills and can translate natural language into queries, offering intelligent analysis to streamline workflows.
Bedrock Integration: Use AWS Bedrock integrations as the model powering Alyx. Pass provider parameters (region, anthropic_version, etc.) through the full stack from UI to generative services. A warning icon appears when a Bedrock integration is selected without a configured region, opening a dialog to persist parameters to localStorage for automatic message retry.
Session Reading: Alyx can now read and analyze sessions with two new tools: get session table preview for an overview of the sessions table, and get session data to see all traces within a session. Regardless of selected time range, tools that fetch assets by ID now look in a minimum 60-day window. Improved large JSON compression logic maintains valid JSON structure and adds pagination indicators.

Saved Views for Playground

March 11, 2026Save complete snapshots of your prompt experimentation sessions including model configurations, parameters, prompts, inputs, and results. This makes AI prompt development reproducible and collaborative—preserve your work, return to previous experiments without recreating setups from scratch, share exact configurations with teammates, and track how prompts evolve over time with version control.

Evaluators in Playground

March 11, 2026Test and measure LLM system performance in real-time before deploying evaluations. Run evaluators directly in the Prompt Playground to assess how well your prompts, agents, or retrieval systems perform on specific tasks by creating custom metrics. Validate and optimize your LLM applications through immediate feedback, ensuring better performance before full implementation.

2026-03-10

Trace Views

March 10, 2026Query, filter, and inspect captured traces to analyze specific system behaviors and identify issues in your LLM applications or agents. Construct queries using various operators and filters (including AI-assisted search) to narrow down traces by attributes, time ranges, or complex multi-span patterns. Efficiently navigate large volumes of trace data and pinpoint exactly where your AI systems need improvement.

2026-03-09

CLI Commands

March 9, 2026New CLI commands for managing annotation configs and exporting data:

Annotation Configs CRUD: Create, list, and delete annotation configs through the CLI. Reusable schemas define how to structure human feedback and evaluations across your workspace with consistent rubrics (categorical labels, continuous scores, or freeform text). Consistent annotations enable better error analysis, help build high-quality training datasets, and provide reliable ground truth data for improving prompts and model fine-tuning.
Dataset, Experiment, and Span Export: Export datasets, experiments, and spans using new CLI commands. Download all examples from datasets, all runs from experiments, or spans filtered by trace/span/session ID. Append examples to existing datasets from inline JSON or files (CSV, JSON, JSONL, Parquet) with client-side structural validation and server-side field-level validation.

March 9, 2026Access your tracing projects directly from the navigation menu with a hover menu displaying your top 5 projects. This dedicated workspace organizes all related traces, evaluations, and sessions for a specific use case or model in one place, allowing you to isolate different products or models and visualize agent execution patterns to identify bottlenecks and understand agent reasoning.

2026-03-06

Gemini Provider Support

March 6, 2026Connect your Google API key to access Gemini and Gemma models directly within the Arize AX platform. Use Gemini models for prompt playground experimentation, model evaluation, and function calling with structured outputs. Gemini models support function calling and structured output; Gemma models do not support these features but are available for general tasks.

2026-03-05

March 5, 2026Create code evaluators directly from the New Evaluator dropdown menu. The evaluator-first creation experience includes an inline data source selector for projects and datasets. Code evaluators provide fast, consistent, and efficient evaluation for objective criteria like keyword checks, URL validation, or compliance rules without the variability of LLM-based judgments.

Test LLM Evaluator While Creating

March 5, 2026Test evaluators with a single-shot evaluation against the currently selected datasource row and see results (label/score/explanation) inline while creating or editing. The “Test Evaluator” button in the preview variables panel allows validation of evaluation criteria, input variables, and expected outputs before running full evaluations, saving time and ensuring accurate quality assessments.

2026-02-24

February 24, 2026

What’s new with Alyx

Alyx is your AI-powered agent across Arize AX—Alyx can use Arize for you, so you get context-aware help wherever you’re working. We’ve built a top-tier agent that’s actually useful: your best friend for building, debugging, and improving AI. This release highlights new skills, model options, and integrations.

Alyx by surface

Alyx adapts to where you are in the platform and exposes the right skills:

Trace slideover — Trace troubleshooting, span analysis, annotations, build evals
Prompt Playground — Optimize prompts, build evals, run experiments
Eval Builder / Task Builder — Build custom evals, configure tasks
Traces table (AI Search) — Natural language to filter syntax in the traces table
Traces page — Multi-trace analysis, pattern discovery, categorization
Datasets & Experiments — Analyze experiments, manage datasets, synthetic data

Cross-cutting skills

ArizeQL Generator — Turn natural language or existing code (SQL, Python) into AQL for custom metrics
Optimize Prompts — Improve prompt quality or target specific issues

Use your own integrations

Use OpenAI or Anthropic (via AWS Bedrock and Vertex AI) with Alyx. Add and manage integrations in Settings → Account Settings → Integrations. More providers coming soon.

Default model

Claude Sonnet 4.5 is now the default model for Alyx, giving you a strong balance of speed and capability out of the box. You can change the model in Alyx settings or via your configured integrations.

2026-02-17

February 17, 2026

Enhanced Scheduled Task Management

Edit scheduled automation tasks directly from the table with full control over:

Target execution date modification using a date/time picker
Arguments editing via comma-separated text fields
Input JSON payload updates with built-in validation
Real-time status validation ensuring only scheduled tasks can be modified

2026-02-12

Claude Opus 4.6 on AWS Bedrock

Access Anthropic’s most advanced AI model through AWS Bedrock, featuring:

State-of-the-art performance for coding, agentic workflows, and complex enterprise tasks
Extended context windows supporting up to 1 million tokens
Enterprise-grade deployment through AWS infrastructure

2026-02-12

Tracing Monitors Unification

Streamlined monitor creation experience with a unified flow for all LLM and tracing monitors:

Single creation path through the Tracing Project Monitor flow
Improved model selection with paginated ComboBox and model type filtering
Prefilled attributes from empty-state action cards for faster setup
Type column in ML monitors table for better organization

2026-02-12

Annotation Config REST API

New REST API endpoint GET /v2/annotation-configs provides programmatic access to annotation configurations with:

Paginated listing of all accessible annotation configs
Complete metadata including type, optimization direction, labels, and score ranges
RBAC enforcement ensuring users only see configs they have permission to access

2026-02-12

Prompt Version Persistence in Playground

LLM integration selection is now preserved when saving and loading prompts, ensuring:

Correct model integration restored when loading saved prompts
Consistent behavior across UI saves and Alyx-initiated saves
Version continuity maintaining integration settings across prompt iterations

2026-02-11

Enhanced Dashboard Templates

Improved dashboard creation experience with:

Updated empty state showing action cards for Token Tracking, Latency templates, and blank dashboards
Pre-filled dashboard names for faster setup
Streamlined modals with automatic model version detection for LLM projects

2026-02-11

Prompts in Experiments Table

The experiments table now displays prompt information directly with:

Prompt name and version shown in a dedicated column
Template preview on hover showing system and user messages
Full prompt slideover with detailed message inspection
Direct playground navigation opening experiments with their original prompts loaded

2026-02-11

Enhanced Annotation Queue Filtering

Improved annotation queue management with flexible filtering options:

Status-based filtering to view records by completion state
User assignment filtering to see records assigned to specific annotators
Combined filters supporting status and user criteria simultaneously

2026-02-11

Improved Evaluator Preview Panel

Enhanced span preview in evaluator configuration with:

Smart column sorting showing columns with values first
Alphabetical organization within value/no-value groups
Better usability when working with projects containing many columns

2026-02-11

RBAC Full Lineage Support

Comprehensive role-based access control with complete resource lineage:

Fine-grained permissions for prompts, evaluators, and annotation configs
Hierarchical authorization following space and account relationships
Consistent enforcement across mutations, resolvers, and shields

2026-02-11

Text Annotation Updates via SDK

The annotations API now supports updating freeform text annotations programmatically, enabling:

Bulk annotation updates through the SDK
Automated annotation workflows for text feedback
Complete annotation lifecycle management via API

2026-02-12

Eval Traces in Playground Experiments

Direct trace links from playground experiment results with:

Clickable trace links in experiment comparison views
Span-level visibility for evaluations kicked off via playground
Seamless debugging from experiment results to detailed trace data

2026-02-12

Invocation Parameters for Online Tasks

Online task LLM configuration now supports full invocation parameter customization:

Provider-specific parameters for each LLM provider
Flexible model configuration separating client setup from invocation settings
Consistent interface across different LLM providers

2026-02-10

Delete Projects and Models from Tables

Delete projects and models directly from the table view with improved safety controls:

Action column in tables - access delete functionality without navigating to individual project pages
Type-to-confirm deletion - must enter the exact project or model name to prevent accidental deletions
Streamlined workflow - manage your workspace organization more efficiently

2026-02-10

Create and Manage Prompts via API

New REST API endpoints enable programmatic prompt management:

Create prompts - POST /v2/prompts to save prompts with model parameters, tools, and response formats
List prompts - GET /v2/prompts with filtering by space, cursor-based pagination, and sort by last updated
Get prompt details - GET /v2/prompts/{prompt_id} to retrieve specific prompt configurations
Delete prompts - DELETE /v2/prompts/{prompt_id} to remove prompts programmatically
Version control - systematically manage prompt variations across environments and track changes over time

2026-02-10

Query Spans with Powerful Filters

New /v2/spans endpoint enables sophisticated trace data analysis:

Complex filtering - query spans by status code, time range, and custom attributes
Pagination support - efficiently retrieve large result sets with cursor-based navigation
Project-scoped access - filter spans within specific projects with proper RBAC enforcement
Programmatic debugging - quickly identify errors, performance bottlenecks, and behavioral patterns without manual searching

2026-02-09

Escaped Curly Braces in Eval Templates

Include literal JSON examples in evaluation prompts without triggering template variable errors:

Backslash escape syntax - use \{...\} to include curly braces as literal text
JSON examples support - safely include code snippets like \{"component":"chat-response"\} in prompts
Cross-stack compatibility - works consistently across frontend, backend TypeScript, Python, and Go services
Template flexibility - combine template variables {variable} with escaped literals \{literal\} in the same prompt

2026-02-09

Trace Links in Playground Experiments

Navigate directly from experiment results to full trace details:

Clickable trace links - view complete execution details for each playground run
Available in multiple views - access traces from dataset pages, experiment comparison views, and the playground
Bridge experimentation and observability - understand both output quality and execution behavior (latency, costs, tool calls) for every experimental run

2026-02-04

Internal Playground Tracing

All playground interactions are now automatically traced for debugging and analysis:

Automatic capture - playground experiments generate traces without manual instrumentation
Full observability - view latency, token counts, and execution details for every playground run
Debugging capabilities - troubleshoot issues by examining complete execution paths
Performance metrics - track and optimize playground interactions with detailed span data

2026-02-05

Enhanced Experiment Metadata

Comprehensive tracing metadata makes it easier to connect experiments to their evaluation results:

Run and evaluator tracking - runId and evaluatorId link traces to specific experiment executions
Parent task relationships - parentTaskId and parentTaskRunId connect chained evaluations
Row identifiers - spanId for tracing data and exampleId for dataset evaluations
Prompt templates - capture template strings and variables for reproducibility

2026-02-06

Select All Spans Moved to Floating Panel

Improved trace table UI with better multi-select controls:

Floating action panel - “Select All Spans” button relocated to the floating panel for easier access
Cleaner table view - removed dedicated row for select-all, reducing visual clutter
Streamlined selection - manage span selections more efficiently with grouped controls

2026-02-06

Annotation Queue Progress Tracking

Better visibility into annotation workload distribution:

Total assigned records - see how many records are assigned to annotators in each data cluster
Workload balancing - monitor task distribution across annotation teams to ensure balanced allocation

2026-02-06

ax-client SDK Published

The official Arize AX client SDK is now available for TypeScript/JavaScript projects:

Datasets API - create, list, update, and delete datasets programmatically
Experiments API - manage experiment runs and retrieve results
Examples API - append and update examples in datasets
Type-safe imports - full TypeScript support with generated types
Environment variable configuration - simple setup with API keys

2026-02-03

Clickable Span Table Rows

Navigate directly to span details by clicking anywhere in the row:

Full row clickable - entire table row is now interactive for easier navigation
Faster debugging - quickly jump to span details without precise clicking on specific cells
Improved UX - more intuitive interaction pattern consistent with modern data tables

2026-02-04

Dataset Examples Sorting

Organize dataset examples with flexible sorting options:

Sort by creation date - see newest or oldest examples first
Sort by update date - identify recently modified examples
Sort by ID - consistent ordering for reproducible views

2026-02-03

Pivot Table Dashboard Widgets

Create custom analysis views with configurable pivot tables:

Group by dimensions - organize data by categorical fields
Configurable metrics - select numeric columns with aggregation functions (sum, average, count)
Ad-hoc analysis - explore data patterns without predefined reports
Dashboard integration - save pivot configurations as reusable dashboard widgets

2026-02-03

Authentication Type for LLM Integrations

Database support for different authentication methods with LLM providers:

Auth type column - stores authentication method for each LLM integration
Multiple auth strategies - prepares platform for API keys, OAuth, and service accounts
Provider flexibility - manage credentials differently for various LLM providers

2026-02-03

Phoenix Import Dedicated Tab

Simplified migration experience for Phoenix users:

Dedicated import tab - separate “Import From Phoenix” tab in project creation
Traces-only focus - streamlined UI showing only trace connection form
Cleaner onboarding - removed Phoenix card from LLM tracing setup to reduce confusion
Feature flag control - controlled rollout with enablePhoenixMigration flag

2026-02-05

Claude Opus 4.6 on Vertex AI

Access Anthropic’s most advanced model through Google Cloud:

Vertex AI integration - Claude Opus 4.6 now available in playground and online tasks
1M token context window - handle extensive documents and conversations
Enhanced reasoning - state-of-the-art performance on coding and complex workflows
Enterprise deployment - leverage Google Cloud infrastructure for Claude models

2026-02-03

External HTTPS Access to Clusters

Simplified network access for external integrations:

Port 443 ingress - external HTTPS traffic now allowed to cluster endpoints
Secure external access - properly configured TLS/SSL for organization-wide availability
Integration support - enables webhooks, API callbacks, and external monitoring tools

2026-02-03

Deployment Activity Heatmap

Visualize deployment patterns over time with a GitHub-style contribution graph:

52-week view - see deployment activity for the past year at a glance
Color-coded frequency - darker cells indicate more deployments on that day
Interactive tooltips - hover to see date, deployment count, and services deployed
Summary statistics - total deployments and max-per-day metrics
Filter compatibility - heatmap updates based on service filters

2026-02-04

AI Features Timeline Improvements

Better organization of AI features with date-based filtering:

Recent activity focus - main tab shows only features from the last 7 days
Full history tab - new “History” tab provides access to all past features
View full history link - easily navigate between recent and historical views
Direct URL support - ?tab=history parameter for bookmarking and sharing

2026-02-04

Improved Feature Generation Reliability

More robust AI feature generation with better error handling:

Exponential backoff - automatic retries for rate limits and transient errors
Branch diff context - includes current changes in prompts for better iteration
Operation tracking - operationUuid field tracks async generation jobs
Collapsible logs UI - view generation logs with auto-scroll and syntax highlighting

2026-02-09

GKE Infrastructure Upgrades

Enhanced Kubernetes infrastructure for improved performance and cost efficiency:

GKE 1.33.5 - updated cluster and node pool versions
Spot node pools - 60-91% cost savings for fault-tolerant workloads
Extended IAM permissions - SecuritySMTP role updated with required access tokens
DevOps logging access - roles/logging.viewer granted for improved debugging

2026-01-31

Enhanced Usage Monitoring

New comprehensive usage tracking and reporting features for better resource management:

Datasource-level breakdowns for granular usage visibility
Account-based tracking with improved join keys for accurate reporting
10-minute update intervals for near real-time usage insights
Automated cleanup of expired data for accurate retention calculations

2026-01-31

Enhanced Platform Stability

January 2026Numerous improvements to platform reliability and performance:

Configuration drift resolution in GCP Terraform
Enhanced error handling across services
Improved logging and monitoring for faster troubleshooting
Database migration optimizations for schema updates
Better resource management for high-volume workloads

2026-01-30

Improved Onboarding Experience

Streamlined onboarding with enhanced user flows:

Redesigned onboarding cards with clearer visual hierarchy
“My First Playground” experience for hands-on experimentation
Role collection during signup for personalized setup
Custom hover states matching each card’s accent color

2026-01-30

Real-Time Evaluations

Run evaluations immediately on incoming data with real-time ingestion:

Instant evaluation of production traces without delays
Latent evaluation support for updating earlier spans
Seamless cutover between batch and real-time processing
Available across all Arize AX tiers by default

2026-01-30

AWS Bedrock Custom Endpoints

Enhanced AWS Bedrock integration for enterprise deployments:

Custom base URL support for private endpoints
Inference profile ARNs for multi-region routing
Custom model configurations for specialized deployments
Simplified regional management with unified tracking

2026-01-30

Wildcard Array Path Variables

Access array data more flexibly in templates and experiments:

Wildcard (*) patterns to reference all array elements
Last-index (-1) access for the most recent item
Automatic generation of wildcard variants for convenience
Support in task variables and experiment columns

2026-01-29

Improved Queue Management

Better user experience when managing annotation queues:

Duplicate detection with clear error messages
Added and skipped record counts after bulk operations
Actionable feedback when attempting to add existing records

2026-01-27

Enhanced RBAC System

Fine-grained access control with the new RBAC system:

Custom roles with specific permissions
Space-level role bindings for granular access management
Coexistence with legacy roles during migration
UI support for role assignment across all user management pages
Automatic fallback to legacy roles when custom roles are deleted

2026-01-27

Custom Metrics with LIKE Operator

More powerful filtering in custom metrics:

LIKE and ILIKE operators for pattern matching
Wildcard support with % syntax
Case-insensitive matching with ILIKE
Optimized query execution for performance

2026-01-27

Dashboard Template Filtering

Cleaner dashboard creation experience:

LLM-only space filtering shows only relevant templates
Context-aware templates based on project types
Reduced clutter in template selection
Consistent experience across spaces and projects

2026-01-27

Foundation for advanced tabular data visualization:

Grouped categorical dimensions for organized views
Configurable numeric columns with aggregations
Flexible filtering and time range support
Dashboard integration ready

2026-01-27

Evaluator Hub: Reusable Evaluators

We’re excited to introduce the Evaluator Hub - a centralized place to create, version, and reuse evaluators across all your evaluation tasks.

Why Reusable Evaluators?

Previously, evaluators were defined inline each time you created a task. This led to duplicated configurations, inconsistent evaluation criteria, and extra setup overhead. With the Evaluator Hub, you define an evaluator once and use it everywhere - ensuring consistent, reliable evaluations across your organization.Key benefits:

Consistency: The same evaluator definition is used across tasks, eliminating drift in evaluation criteria
Reliability: LLM configuration (model, provider, parameters) is set at the evaluator level, ensuring the evaluator is tested and validated with a specific model before being deployed to production tasks
Version Control: Track changes to evaluators over time with commit messages, making it easy to audit and roll back if needed
Flexibility: Column mappings let you reuse the same evaluator across datasets with different schemas by mapping template variables to your data columns

What’s New

Evaluator Hub tab: Browse, search, and manage all your evaluators in one place
Running Tasks tab: View and manage your active evaluation tasks
“Use Evaluator” action: Quickly create a task with any evaluator pre-selected
Column Mappings: Map evaluator template variables to your datasource columns when adding an evaluator to a task
Evaluator Versioning: Create new versions of evaluators with commit messages to track changes

Getting Started

Navigate to Evaluators in the left sidebar
Click New Evaluator to create your first reusable evaluator
Choose from pre-built templates or create a custom evaluation from scratch
Use your evaluator in tasks by clicking Use Evaluator or selecting it when creating a new task

The Evaluator Hub currently supports LLM-as-a-Judge evaluators. Support for reusable code evaluators is coming soon.

2026-01-23

Session Evaluations with Conversation Context

Evaluate entire conversation flows with new virtual attributes:

{conversation} template variable for session-level evaluations
Chronologically ordered input/output pairs
Automatic aggregation of multi-turn dialogues
Root span filtering for accurate session context

2026-01-23

Tracing Configuration for Evaluation Tasks

Enable detailed debugging for evaluation tasks:

Toggle tracing on/off in Advanced Options
Automatic trace generation for monitoring and debugging
Persistent settings saved with your tasks
Production-ready visibility into evaluation execution

2026-01-23

Improved Error Handling for Exceptions

Better filtering and debugging capabilities:

Filter by exception.type and exception.message in the UI
OpenInference semantic convention support for exceptions
Consistent data structure across datasources
Faster troubleshooting of error patterns

2026-01-23

SAML Role Mapping Search

Navigate large role mapping configurations easily:

Client-side search across attributes, spaces, roles, and organizations
Visual highlighting of search matches
Keyboard navigation through results
Improved usability for enterprise customers

2026-01-22

Enhanced Dashboard Time Persistence

Your dashboard preferences now persist automatically:

Auto-save time range, time zone, and granularity selections
Instant restoration when returning to dashboards
Per-dashboard settings for customized views
Seamless experience across sessions

2026-01-22

Resizable Trace Slideover

Customize your viewing experience:

Draggable slideover width for optimal layout
Persistent sizing preferences across sessions
Better content visibility for long traces

2026-01-21

Trace Table Performance Improvements

Faster loading times for the tracing table:

30-50% faster initial load times
String truncation for large content
Lazy loading of full values in tooltips
Minimal impact on user experience

2026-01-21

Scatter Plot Widgets

Explore relationships between variables with scatter plots:

Correlation analysis for two numeric dimensions
Interactive data points for detailed investigation
Dashboard integration for visual analytics
Customizable axes and filtering

2026-01-21

Enhanced Session Slideover

Better conversation visualization and navigation:

Trace labels with links to detailed views
Visual separators between traces
Hover highlighting synchronized between list and conversation
Improved readability for multi-turn interactions

2026-01-21

Experiment Task Timeout Configuration

Accommodate long-running evaluations:

Configurable timeout parameter beyond 120 seconds
Function-level control in run_experiment and evaluate_experiment
Backward compatibility with default values
Support for complex evaluators requiring extended processing

2026-01-21

Configurable Experiment Timeout

Handle complex evaluation scenarios:

Custom timeout values for long-running tasks
Per-experiment configuration for flexibility
Backward compatible defaults for existing code

2026-01-20

Expandable Trace Hierarchy

View trace structure directly in the table:

Expand traces to see child spans inline
Hierarchical visualization without opening slideouts
Faster navigation through complex traces
Contextual understanding of request flow

2026-01-20

Custom Prompt Release Labels

Organize and track prompt versions with custom labels:

Tag prompt versions with meaningful identifiers
Environment markers like “staging” or “production”
Dynamic label suggestions from existing prompts
Easy retrieval of specific prompt releases

2026-01-16

Eval Hub Enhancements

Improved evaluation management and visibility:

Model information in evaluator listings with provider icons
Evaluator counts in running tasks with hover details
Automatic save when creating or editing evaluators
Streamlined task flow for faster evaluation setup

2026-01-16

Todo List Management Improvements

More reliable task tracking in Alyx conversations:

Visual status indicators for all todo states
Dynamic reminders with exact update calls needed
Plan preservation across human-in-the-loop pauses
Clearer instructions positioned near the plan

2026-01-15

Span-to-Queue Workflow

Add spans and dataset examples to annotation queues seamlessly:

Multiple entry points from spans table, trace slideover, and queue records
New or existing queue selection
Batch operations for efficient queue population
Dataclusters integration for reliable processing

2026-01-15

Atlantis Terraform Automation

Streamlined infrastructure-as-code workflows:

Pull request integration for Terraform plans
Automated plan posting as PR comments
DevOps team permissions for webhook debugging
Structured review process before applying changes

2026-01-14

Java SDK Space ID Support

Modern authentication for Java applications:

Space ID authentication (space keys deprecated)
Backward compatibility maintained with existing constructors
Updated documentation and examples
Test coverage for new authentication method

2026-01-14

Enhanced Space Model Schema

More control over data retention and lookback:

Space-level schema lookback overrides for custom retention
Model-specific configurations for unique requirements
Flexible data management across different use cases

2026-01-14

Exact Match Code Evaluator

New built-in evaluator for validation:

String equality checks for exact matches
Expected vs actual comparisons for testing
Multi-field access with dataset row support
Alphabetically sorted evaluator list in UI

2026-01-12

Enhanced Annotation Configs

More powerful annotation workflows with improved configs:

Color-coded categories based on optimization direction
Read-only view for reviewing existing configs
Optimization direction control (maximize, minimize, or none)
Clear label guidance for consistent evaluations

2026-01-12

Batch Annotation Updates

Efficiently annotate large volumes of data:

Optimization direction support in annotation configs
Category-based labeling for issue detection
Best practice guidance for naming and structure
Streamlined categorization workflows

2026-01-09

Stacked Bar Chart Widgets

Visualize multi-dimensional data with new chart types:

Stacked bar charts for comparing categories over time
Optimized queries for fast rendering
Customizable groupings and dimensions
Dashboard integration for comprehensive monitoring

2026-01-09

Enhanced Eval Hub Empty States

Better guidance for getting started:

Improved empty state design with clear next steps
Documentation links for learning resources
Actionable cards for common workflows

2026-01-08

Google Analytics 4 BigQuery Sync

Automated analytics data export:

Daily GA4 to BigQuery transfers via Terraform
Raw event data access for advanced analysis
Overcome GA4 limitations like sampling and retention
Custom reporting capabilities with full data access

2026-01-08

Vertex AI Migration

Updated integration with Google Cloud AI:

Seamless Vertex AI connectivity for LLM applications
Enhanced observability for Google Cloud deployments
Modernized instrumentation for better tracing

2026-01-08

Generative Service Monitoring

Comprehensive monitoring for evaluation infrastructure:

Uptime and health alerts with paging
CPU and memory monitoring with warnings
Dedicated Grafana dashboard for visibility
Runbook documentation for incident response

2026-01-07

Custom Model Migrations

Expanded support for custom integrations:

Custom model endpoint support in evaluations
Higher traffic model optimization for performance
Flexible integration options for enterprise deployments

2026-01-06

Prompt Optimization on Experiments

Run prompt optimization directly on experiment results:

Experiment selector in optimization task creation
Dynamic column resolution for experiment data
Enhanced iteration on proven prompts
Seamless workflow from experiments to optimization

2026-01-05

Labeling Queue Annotations

More flexible annotation management:

Clear annotations (reset to null) anywhere
Support across spans, queues, and experiments for consistent workflows
Improved annotation lifecycle management

2025-12-18

Multi-Span Filters

Filter traces using multiple span conditions with:

AND, OR, NOT operators for combining conditions
Indirectly Calls (->) and Directly Calls (=>) relationship filters
Up to 5 filters to find complex patterns like “spans where A calls B, but not C” or “traces containing both X and Y”
Parentheses to build complex queries and pinpoint the traces that matter

2025-12-15

Support for Opus & Haiku 4.5

Expanded LLM model support to include Opus & Haiku 4.5 models in the playground and online tasks.

2025-12-12

Support for GPT-5.2 and GPT-5.2 Pro

Expanded LLM model support to include GPT-5.2 models in the playground and online tasks.

2025-12-10

Improved Playground Views

The Prompt Playground page is now Playgrounds, where you can use your Playground Views! This change allows you to easily navigate to a configuration of the prompt playground you’ve saved. A playground view saves a complete snapshot of your current prompt playground session, allowing you to preserve your work, share configurations with teammates, or return to previous experiments. Here’s what’s saved in a view:

LLM Config (provider, model selection, model params, custom endpoint settings)
Prompt Setup (messages, roles, message content, tool and function calls)
Generated Results (when setup with datasets)
Connected Datasource (dataset or span)

If you want to start a playground from scratch, you can create a view using the + Playground button at the top of the Playgrounds page.

2025-12-05

Realtime Ingestion for all new Arize AX Spaces

Any spaces created on or after 12/5/25 will use realtime ingestion by default! This eliminates the delay between sending traces and seeing them in the platform, giving you instant visibility into your production workloads.

2025-11-20

Structured Outputs Support for Playground

This release adds full structured output support to the playground, giving users precise control over the fields an LLM must return. Models that implement the OpenAI API schema (including OpenAI, Azure, and compatible custom endpoints) now support structured outputs end-to-end. When saving prompts, the structured output JSON is stored alongside other LLM parameters for seamless reuse. Tooltips have also been added to clearly indicate when a model or provider does not support structured outputs.

2025-11-10

Session Annotations

This release introduces Session Annotations, making it easier than ever to capture human insights without disrupting your workflow. You can now add notes directly from the Session Page—no context switching required.Annotations are supported at two levels:

Input/Output Level: Attach insights to specific output messages, automatically linked to the root span of the trace.
Span Level: Dive deeper into a trace and annotate individual spans for precise, context-rich feedback.

Together, these capabilities make it simple to highlight issues, call out successes, and integrate human feedback seamlessly into your debugging and evaluation process.

2025-11-05

Integrations Revamp

This release delivers major improvements to how integrations are managed, scoped, and configured. Integrations can now be targeted to specific orgs and spaces, and the UI has been refreshed to clearly separate AI Providers from Monitoring Integrations.A new creation flow supports both simple API-based setups and flexible custom endpoints, including multi-model configurations with defaults or custom names. Users can also add multiple keys for the same provider, enabling more granular control and easier management at scale.

2025-11-03

OpenInference TypeScript 2.0

See the OpenInference TypeScript Core package for more details.

Added easy manual instrumentation with the same decorators, wrappers, and attribute helpers found in the Python openinference-instrumentation package.
Introduced function tracing utilities that automatically create spans for sync/async function execution, including specialized wrappers for chains, agents, and tools.
Added decorator-based method tracing, enabling automatic span creation on class methods via the @observe decorator.
Expanded attribute helper utilities for standardized OpenTelemetry metadata creation, including helpers for inputs/outputs, LLM operations, embeddings, retrievers, and tool definitions.
Overall, tracing workflows, agent behavior, and external tool calls is now significantly simpler and more consistent across languages.

2025-10-30

API-Driven Monitors

We now have API-Triggered Monitors: A monitor type that only evaluates when triggered via API call, instead of on a fixed schedule. Ideal for teams running evaluations after events like batch ingestions, model retraining, or CI/CD workflows.

2025-10-30

Automatic Threshold Ranges for Monitors

You can now set upper and lower bounds using our new Auto Threshold options!Arize AX can automatically determine the right threshold for your alerts based on your historical data. This is ideal for most users who want to start monitoring without manually tuning thresholds.

2025-10-29

Data Fabric

We’re excited to introduce Data Fabric, a new capability that automatically synchronizes production trace data, evaluations, and annotations from Arize AX into your cloud data warehouse every 60 minutes in Iceberg format—giving you an always-current, query-ready source of truth.

2025-10-28

New Timeline Tab for Traces

You can now see a timeline view when you click into a trace! The new timeline view is right next to the Trace Tree and Agent Graph tabs, and it shows the execution flow and duration of each span.

2025-10-24

Sort Datasets and Experiment Listing Table

You can now sort your datasets and experiments by name, number of experiments, creation date, and more!

2025-10-15

Support for Tool Call IDs in OpenInference Messages

This update introduces full support for tool_call_id and tool_call.id in OpenInference message semantics. These identifiers are now stored alongside input and output messages. Tool call IDs now appear in the trace slideover’s input/output and attributes tabs.

2025-10-14

We’ve added a Data Region selector to the login page, allowing users to choose their preferred data region during sign-in. This helps ensure compliance and improved performance based on regional data needs.

2025-10-13

Add Auth Failures to Tracing

This release adds tracing for authentication failures, enabling better visibility and debugging of auth-related issues across systems.

2025-10-12

Total Traces on Stats Bar

Your projects now display the total number of traces directly in the Stats Bar at the top for quicker visibility into overall activity.

2025-10-10

Support for GPT OSS Models on Bedrock

We’ve added support for GPT open-source models available on Bedrock — try them out now in the Prompt Playground!

2025-10-05

Expanded LLM Support

Expanded LLM model support to include Claude models on Bedrock and Vertex, Titan Text Premiere, Amazon Nova Premiere, Gemini 2.5 Flash/Pro, and new GPT OSS and DeepSeek models —offering broader coverage across top providers.

2025-10-03

You can now define time settings per widget in dashboards! This enhancement adds flexibility by letting you set custom time ranges at the widget level — without losing the ability to apply a global dashboard time range. It’s a powerful way to dig deeper into data and run more tailored analyses.

2025-10-01

Autocomplete for Annotations on Datasets

You can now autocomplete annotation variables when editing eval templates in the playground or directly from dataset slideovers. This makes building and managing evals faster and more intuitive.

See more

2026

2025

2024

2023

2022

2021

History

Documentation Index

​Build Arize-Powered Tooling in Go with the New Go SDK v2

​Automatically Add Spans to Labeling Queues

​Run an Evaluator on Selected Experiments in One Click

​Manage Users, Roles, and Invitations from the Python SDK

​Fixes and Improvements

​Visualize Evaluator Score Distributions Across Spans and Experiments

​Review and Confirm Alyx Proposals Before They Take Effect

​Wire Experiment Runs into Automated Pipelines with the run_experiment REST API

​Assign Multiple Annotation Queue Records to a Reviewer in Bulk

​Control Annotation Queue Capacity with Per-Queue Record Limits

​Fixes and Improvements

​RBAC (GA)

​Alyx Improvements

​Evaluator Improvements

​Custom Metrics

​Annotation Queues

​Datasets & Experiments

​Tracing & Sessions

​Dashboards & Visualization

​Webhooks & Events

​Playground

​Data Fabric

​Models & Integrations

​SDKs & REST APIs

​CLI Commands

​Alyx Improvements

​Evaluator Improvements

​SDKs & REST APIs

​CLI Commands

​Tracing Improvements

​Dashboard & Visualization

​Model & Integration Updates

​Annotation Improvements

​Saved Views on Tracing

​Dashboard export options

​SDKs & REST APIs

​Evaluator Improvements

​Annotation Improvements

​Alyx Improvements

​CLI Commands

​Bedrock Integration Updates

​Dataset Summary Columns

​REST APIs

​Alyx Improvements

​Saved Views for Playground

​Evaluators in Playground

​Trace Views

​CLI Commands

​Projects in Navigation

​Gemini Provider Support

​Code Evaluator in New Evaluator Dropdown

​Test LLM Evaluator While Creating

​What’s new with Alyx

​Alyx by surface

​Cross-cutting skills

​Use your own integrations

​Default model

​Enhanced Scheduled Task Management

​Claude Opus 4.6 on AWS Bedrock

​Tracing Monitors Unification

​Annotation Config REST API

​Prompt Version Persistence in Playground

​Enhanced Dashboard Templates

​Prompts in Experiments Table

​Enhanced Annotation Queue Filtering

​Improved Evaluator Preview Panel

​RBAC Full Lineage Support

​Text Annotation Updates via SDK

​Eval Traces in Playground Experiments

​Invocation Parameters for Online Tasks

​Delete Projects and Models from Tables

​Create and Manage Prompts via API

​Query Spans with Powerful Filters

​Escaped Curly Braces in Eval Templates

​Trace Links in Playground Experiments

​Internal Playground Tracing

​Enhanced Experiment Metadata

​Select All Spans Moved to Floating Panel

Build Arize-Powered Tooling in Go with the New Go SDK v2

Automatically Add Spans to Labeling Queues

Run an Evaluator on Selected Experiments in One Click

Manage Users, Roles, and Invitations from the Python SDK

Fixes and Improvements

Visualize Evaluator Score Distributions Across Spans and Experiments

Review and Confirm Alyx Proposals Before They Take Effect

Wire Experiment Runs into Automated Pipelines with the run_experiment REST API

Assign Multiple Annotation Queue Records to a Reviewer in Bulk

Control Annotation Queue Capacity with Per-Queue Record Limits

Fixes and Improvements

RBAC (GA)

Alyx Improvements

Evaluator Improvements

Custom Metrics

Annotation Queues

Datasets & Experiments

Tracing & Sessions

Dashboards & Visualization

Webhooks & Events

Playground

Data Fabric

Models & Integrations

SDKs & REST APIs

CLI Commands

Alyx Improvements

Evaluator Improvements

SDKs & REST APIs

CLI Commands

Tracing Improvements

Dashboard & Visualization

Model & Integration Updates

Annotation Improvements

Saved Views on Tracing

Dashboard export options

SDKs & REST APIs

Evaluator Improvements

Annotation Improvements

Alyx Improvements

CLI Commands

Bedrock Integration Updates

Dataset Summary Columns

REST APIs

Alyx Improvements

Saved Views for Playground

Evaluators in Playground

Trace Views

CLI Commands

Projects in Navigation

Gemini Provider Support

Code Evaluator in New Evaluator Dropdown

Test LLM Evaluator While Creating

What’s new with Alyx

Alyx by surface

Cross-cutting skills

Use your own integrations

Default model

Enhanced Scheduled Task Management

Claude Opus 4.6 on AWS Bedrock

Tracing Monitors Unification

Annotation Config REST API

Prompt Version Persistence in Playground

Enhanced Dashboard Templates

Prompts in Experiments Table

Enhanced Annotation Queue Filtering

Improved Evaluator Preview Panel

RBAC Full Lineage Support

Text Annotation Updates via SDK

Eval Traces in Playground Experiments

Invocation Parameters for Online Tasks

Delete Projects and Models from Tables

Create and Manage Prompts via API

Query Spans with Powerful Filters

Escaped Curly Braces in Eval Templates

Trace Links in Playground Experiments

Internal Playground Tracing

Enhanced Experiment Metadata

Select All Spans Moved to Floating Panel

Annotation Queue Progress Tracking

ax-client SDK Published