2026-02-10
Delete Projects and Models from Tables
Delete projects and models directly from the table view with improved safety controls:- Action column in tables - access delete functionality without navigating to individual project pages
- Type-to-confirm deletion - must enter the exact project or model name to prevent accidental deletions
- Streamlined workflow - manage your workspace organization more efficiently
2026-02-10
Create and Manage Prompts via API
New REST API endpoints enable programmatic prompt management:- Create prompts - POST
/v2/promptsto save prompts with model parameters, tools, and response formats - List prompts - GET
/v2/promptswith filtering by space, cursor-based pagination, and sort by last updated - Get prompt details - GET
/v2/prompts/{prompt_id}to retrieve specific prompt configurations - Delete prompts - DELETE
/v2/prompts/{prompt_id}to remove prompts programmatically - Version control - systematically manage prompt variations across environments and track changes over time
2026-02-10
Query Spans with Powerful Filters
New/v2/spans endpoint enables sophisticated trace data analysis:- Complex filtering - query spans by status code, time range, and custom attributes
- Pagination support - efficiently retrieve large result sets with cursor-based navigation
- Project-scoped access - filter spans within specific projects with proper RBAC enforcement
- Programmatic debugging - quickly identify errors, performance bottlenecks, and behavioral patterns without manual searching
2026-02-09
Escaped Curly Braces in Eval Templates
Include literal JSON examples in evaluation prompts without triggering template variable errors:- Backslash escape syntax - use
\{...\}to include curly braces as literal text - JSON examples support - safely include code snippets like
\{"component":"chat-response"\}in prompts - Cross-stack compatibility - works consistently across frontend, backend TypeScript, Python, and Go services
- Template flexibility - combine template variables
{variable}with escaped literals\{literal\}in the same prompt
2026-02-09
Trace Links in Playground Experiments
Navigate directly from experiment results to full trace details:- Clickable trace links - view complete execution details for each playground run
- Available in multiple views - access traces from dataset pages, experiment comparison views, and the playground
- Bridge experimentation and observability - understand both output quality and execution behavior (latency, costs, tool calls) for every experimental run
2026-02-04
Internal Playground Tracing
All playground interactions are now automatically traced for debugging and analysis:- Automatic capture - playground experiments generate traces without manual instrumentation
- Full observability - view latency, token counts, and execution details for every playground run
- Debugging capabilities - troubleshoot issues by examining complete execution paths
- Performance metrics - track and optimize playground interactions with detailed span data
2026-02-05
Enhanced Experiment Metadata
Comprehensive tracing metadata makes it easier to connect experiments to their evaluation results:- Run and evaluator tracking -
runIdandevaluatorIdlink traces to specific experiment executions - Parent task relationships -
parentTaskIdandparentTaskRunIdconnect chained evaluations - Row identifiers -
spanIdfor tracing data andexampleIdfor dataset evaluations - Prompt templates - capture template strings and variables for reproducibility
2026-02-06
Select All Spans Moved to Floating Panel
Improved trace table UI with better multi-select controls:- Floating action panel - “Select All Spans” button relocated to the floating panel for easier access
- Cleaner table view - removed dedicated row for select-all, reducing visual clutter
- Streamlined selection - manage span selections more efficiently with grouped controls
2026-02-06
Annotation Queue Progress Tracking
Better visibility into annotation workload distribution:- Total assigned records - see how many records are assigned to annotators in each data cluster
- Progress calculation - track completion rates based on assigned vs. completed annotations
- Workload balancing - monitor task distribution across annotation teams to ensure balanced allocation
2026-02-06
ax-client SDK Published
The official Arize AX client SDK is now available for TypeScript/JavaScript projects:- Datasets API - create, list, update, and delete datasets programmatically
- Experiments API - manage experiment runs and retrieve results
- Examples API - append and update examples in datasets
- Type-safe imports - full TypeScript support with generated types
- Environment variable configuration - simple setup with API keys
2026-02-03
Clickable Span Table Rows
Navigate directly to span details by clicking anywhere in the row:- Full row clickable - entire table row is now interactive for easier navigation
- Faster debugging - quickly jump to span details without precise clicking on specific cells
- Improved UX - more intuitive interaction pattern consistent with modern data tables
2026-02-04
Dataset Examples Sorting
Organize dataset examples with flexible sorting options:- Sort by creation date - see newest or oldest examples first
- Sort by update date - identify recently modified examples
- Sort by ID - consistent ordering for reproducible views
- Druid integration - sorting optimized at the database level for performance
2026-02-03
Pivot Table Dashboard Widgets
Create custom analysis views with configurable pivot tables:- Group by dimensions - organize data by categorical fields
- Configurable metrics - select numeric columns with aggregation functions (sum, average, count)
- Ad-hoc analysis - explore data patterns without predefined reports
- Dashboard integration - save pivot configurations as reusable dashboard widgets
2026-02-03
Authentication Type for LLM Integrations
Database support for different authentication methods with LLM providers:- Auth type column - stores authentication method for each LLM integration
- Multiple auth strategies - prepares platform for API keys, OAuth, and service accounts
- Provider flexibility - manage credentials differently for various LLM providers
2026-02-03
Phoenix Import Dedicated Tab
Simplified migration experience for Phoenix users:- Dedicated import tab - separate “Import From Phoenix” tab in project creation
- Traces-only focus - streamlined UI showing only trace connection form
- Cleaner onboarding - removed Phoenix card from LLM tracing setup to reduce confusion
- Feature flag control - controlled rollout with
enablePhoenixMigrationflag
2026-02-05
Claude Opus 4.6 on Vertex AI
Access Anthropic’s most advanced model through Google Cloud:- Vertex AI integration - Claude Opus 4.6 now available in playground and online tasks
- 1M token context window - handle extensive documents and conversations
- Enhanced reasoning - state-of-the-art performance on coding and complex workflows
- Enterprise deployment - leverage Google Cloud infrastructure for Claude models
2026-02-03
External HTTPS Access to Clusters
Simplified network access for external integrations:- Port 443 ingress - external HTTPS traffic now allowed to cluster endpoints
- Secure external access - properly configured TLS/SSL for organization-wide availability
- Integration support - enables webhooks, API callbacks, and external monitoring tools
2026-02-03
Deployment Activity Heatmap
Visualize deployment patterns over time with a GitHub-style contribution graph:- 52-week view - see deployment activity for the past year at a glance
- Color-coded frequency - darker cells indicate more deployments on that day
- Interactive tooltips - hover to see date, deployment count, and services deployed
- Summary statistics - total deployments and max-per-day metrics
- Filter compatibility - heatmap updates based on service filters
2026-02-04
AI Features Timeline Improvements
Better organization of AI features with date-based filtering:- Recent activity focus - main tab shows only features from the last 7 days
- Full history tab - new “History” tab provides access to all past features
- View full history link - easily navigate between recent and historical views
- Direct URL support -
?tab=historyparameter for bookmarking and sharing
2026-02-04
Improved Feature Generation Reliability
More robust AI feature generation with better error handling:- Exponential backoff - automatic retries for rate limits and transient errors
- Branch diff context - includes current changes in prompts for better iteration
- Operation tracking -
operationUuidfield tracks async generation jobs - Collapsible logs UI - view generation logs with auto-scroll and syntax highlighting
2026-02-09
GKE Infrastructure Upgrades
Enhanced Kubernetes infrastructure for improved performance and cost efficiency:- GKE 1.33.5 - updated cluster and node pool versions
- Spot node pools - 60-91% cost savings for fault-tolerant workloads
- Extended IAM permissions - SecuritySMTP role updated with required access tokens
- DevOps logging access -
roles/logging.viewergranted for improved debugging
2026-01-31
Enhanced Usage Monitoring
New comprehensive usage tracking and reporting features for better resource management:- Datasource-level breakdowns for granular usage visibility
- Account-based tracking with improved join keys for accurate reporting
- 10-minute update intervals for near real-time usage insights
- Automated cleanup of expired data for accurate retention calculations
2026-01-31
Enhanced Platform Stability
January 2026Numerous improvements to platform reliability and performance:- Configuration drift resolution in GCP Terraform
- Enhanced error handling across services
- Improved logging and monitoring for faster troubleshooting
- Database migration optimizations for schema updates
- Better resource management for high-volume workloads
2026-01-30
Improved Onboarding Experience
Streamlined onboarding with enhanced user flows:- Redesigned onboarding cards with clearer visual hierarchy
- “My First Playground” experience for hands-on experimentation
- Role collection during signup for personalized setup
- Custom hover states matching each card’s accent color
2026-01-30
Real-Time Evaluations
Run evaluations immediately on incoming data with real-time ingestion:- Instant evaluation of production traces without delays
- Latent evaluation support for updating earlier spans
- Seamless cutover between batch and real-time processing
- Available across all Arize AX tiers by default
2026-01-30
AWS Bedrock Custom Endpoints
Enhanced AWS Bedrock integration for enterprise deployments:- Custom base URL support for private endpoints
- Inference profile ARNs for multi-region routing
- Custom model configurations for specialized deployments
- Simplified regional management with unified tracking
2026-01-30
Wildcard Array Path Variables
Access array data more flexibly in templates and experiments:- Wildcard (
*) patterns to reference all array elements - Last-index (
-1) access for the most recent item - Automatic generation of wildcard variants for convenience
- Support in task variables and experiment columns
2026-01-29
Improved Queue Management
Better user experience when managing annotation queues:- Duplicate detection with clear error messages
- Added and skipped record counts after bulk operations
- Actionable feedback when attempting to add existing records
2026-01-27
Enhanced RBAC System
Fine-grained access control with the new RBAC system:- Custom roles with specific permissions
- Space-level role bindings for granular access management
- Coexistence with legacy roles during migration
- UI support for role assignment across all user management pages
- Automatic fallback to legacy roles when custom roles are deleted
2026-01-27
Custom Metrics with LIKE Operator
More powerful filtering in custom metrics:- LIKE and ILIKE operators for pattern matching
- Wildcard support with
%syntax - Case-insensitive matching with ILIKE
- Direct Druid mapping for performance
2026-01-27
Dashboard Template Filtering
Cleaner dashboard creation experience:- LLM-only space filtering shows only relevant templates
- Context-aware templates based on project types
- Reduced clutter in template selection
- Consistent experience across spaces and projects
2026-01-27
Pivot Table Widget Schema
Foundation for advanced tabular data visualization:- Grouped categorical dimensions for organized views
- Configurable numeric columns with aggregations
- Flexible filtering and time range support
- Dashboard integration ready
2026-01-27
Evaluator Hub: Reusable Evaluators
We’re excited to introduce the Evaluator Hub - a centralized place to create, version, and reuse evaluators across all your evaluation tasks.Why Reusable Evaluators?
Previously, evaluators were defined inline each time you created a task. This led to duplicated configurations, inconsistent evaluation criteria, and extra setup overhead. With the Evaluator Hub, you define an evaluator once and use it everywhere - ensuring consistent, reliable evaluations across your organization.Key benefits:- Consistency: The same evaluator definition is used across tasks, eliminating drift in evaluation criteria
- Reliability: LLM configuration (model, provider, parameters) is set at the evaluator level, ensuring the evaluator is tested and validated with a specific model before being deployed to production tasks
- Version Control: Track changes to evaluators over time with commit messages, making it easy to audit and roll back if needed
- Flexibility: Column mappings let you reuse the same evaluator across datasets with different schemas by mapping template variables to your data columns
What’s New
- Evaluator Hub tab: Browse, search, and manage all your evaluators in one place
- Running Tasks tab: View and manage your active evaluation tasks
- “Use Evaluator” action: Quickly create a task with any evaluator pre-selected
- Column Mappings: Map evaluator template variables to your datasource columns when adding an evaluator to a task
- Evaluator Versioning: Create new versions of evaluators with commit messages to track changes
Getting Started
- Navigate to Evaluators in the left sidebar
- Click New Evaluator to create your first reusable evaluator
- Choose from pre-built templates or create a custom evaluation from scratch
- Use your evaluator in tasks by clicking Use Evaluator or selecting it when creating a new task
2026-01-23
Session Evaluations with Conversation Context
Evaluate entire conversation flows with new virtual attributes:{conversation}template variable for session-level evaluations- Chronologically ordered input/output pairs
- Automatic aggregation of multi-turn dialogues
- Root span filtering for accurate session context
2026-01-23
Tracing Configuration for Evaluation Tasks
Enable detailed debugging for evaluation tasks:- Toggle tracing on/off in Advanced Options
- Automatic trace generation for monitoring and debugging
- Persistent settings saved with your tasks
- Production-ready visibility into evaluation execution
2026-01-23
Improved Error Handling for Exceptions
Better filtering and debugging capabilities:- Filter by
exception.typeandexception.messagein the UI - OpenInference semantic convention support for exceptions
- Consistent data structure across datasources
- Faster troubleshooting of error patterns
2026-01-23
SAML Role Mapping Search
Navigate large role mapping configurations easily:- Client-side search across attributes, spaces, roles, and organizations
- Visual highlighting of search matches
- Keyboard navigation through results
- Improved usability for enterprise customers
2026-01-22
Enhanced Dashboard Time Persistence
Your dashboard preferences now persist automatically:- Auto-save time range, time zone, and granularity selections
- Instant restoration when returning to dashboards
- Per-dashboard settings for customized views
- Seamless experience across sessions
2026-01-22
Resizable Trace Slideover
Customize your viewing experience:- Draggable slideover width for optimal layout
- Persistent sizing preferences across sessions
- Better content visibility for long traces
2026-01-21
Trace Table Performance Improvements
Faster loading times for the tracing table:- 30-50% faster initial load times
- String truncation for large content
- Lazy loading of full values in tooltips
- Minimal impact on user experience
2026-01-21
Scatter Plot Widgets
Explore relationships between variables with scatter plots:- Correlation analysis for two numeric dimensions
- Interactive data points for detailed investigation
- Dashboard integration for visual analytics
- Customizable axes and filtering
2026-01-21
Enhanced Session Slideover
Better conversation visualization and navigation:- Trace labels with links to detailed views
- Visual separators between traces
- Hover highlighting synchronized between list and conversation
- Improved readability for multi-turn interactions
2026-01-21
Experiment Task Timeout Configuration
Accommodate long-running evaluations:- Configurable timeout parameter beyond 120 seconds
- Function-level control in run_experiment and evaluate_experiment
- Backward compatibility with default values
- Support for complex evaluators requiring extended processing
2026-01-21
Configurable Experiment Timeout
Handle complex evaluation scenarios:- Custom timeout values for long-running tasks
- Per-experiment configuration for flexibility
- Backward compatible defaults for existing code
2026-01-20
Expandable Trace Hierarchy
View trace structure directly in the table:- Expand traces to see child spans inline
- Hierarchical visualization without opening slideouts
- Faster navigation through complex traces
- Contextual understanding of request flow
2026-01-20
Custom Prompt Release Labels
Organize and track prompt versions with custom labels:- Tag prompt versions with meaningful identifiers
- Environment markers like “staging” or “production”
- Dynamic label suggestions from existing prompts
- Easy retrieval of specific prompt releases
2026-01-16
Eval Hub Enhancements
Improved evaluation management and visibility:- Model information in evaluator listings with provider icons
- Evaluator counts in running tasks with hover details
- Automatic save when creating or editing evaluators
- Streamlined task flow for faster evaluation setup
2026-01-16
Todo List Management Improvements
More reliable task tracking in Alyx conversations:- Visual status indicators for all todo states
- Dynamic reminders with exact update calls needed
- Plan preservation across human-in-the-loop pauses
- Clearer instructions positioned near the plan
2026-01-15
Span-to-Queue Workflow
Add spans and dataset examples to annotation queues seamlessly:- Multiple entry points from spans table, trace slideover, and queue records
- New or existing queue selection
- Batch operations for efficient queue population
- Dataclusters integration for reliable processing
2026-01-15
Atlantis Terraform Automation
Streamlined infrastructure-as-code workflows:- Pull request integration for Terraform plans
- Automated plan posting as PR comments
- DevOps team permissions for webhook debugging
- Structured review process before applying changes
2026-01-14
Java SDK Space ID Support
Modern authentication for Java applications:- Space ID authentication (space keys deprecated)
- Backward compatibility maintained with existing constructors
- Updated documentation and examples
- Test coverage for new authentication method
2026-01-14
Enhanced Space Model Schema
More control over data retention and lookback:- Space-level schema lookback overrides for custom retention
- Model-specific configurations for unique requirements
- Flexible data management across different use cases
2026-01-14
Exact Match Code Evaluator
New built-in evaluator for validation:- String equality checks for exact matches
- Expected vs actual comparisons for testing
- Multi-field access with dataset row support
- Alphabetically sorted evaluator list in UI
2026-01-12
Enhanced Annotation Configs
More powerful annotation workflows with improved configs:- Color-coded categories based on optimization direction
- Read-only view for reviewing existing configs
- Optimization direction control (maximize, minimize, or none)
- Clear label guidance for consistent evaluations
2026-01-12
Batch Annotation Updates
Efficiently annotate large volumes of data:- Optimization direction support in annotation configs
- Category-based labeling for issue detection
- Best practice guidance for naming and structure
- Streamlined categorization workflows
2026-01-09
Stacked Bar Chart Widgets
Visualize multi-dimensional data with new chart types:- Stacked bar charts for comparing categories over time
- Druid-powered queries for fast rendering
- Customizable groupings and dimensions
- Dashboard integration for comprehensive monitoring
2026-01-09
Enhanced Eval Hub Empty States
Better guidance for getting started:- Improved empty state design with clear next steps
- Documentation links for learning resources
- Actionable cards for common workflows
2026-01-08
Google Analytics 4 BigQuery Sync
Automated analytics data export:- Daily GA4 to BigQuery transfers via Terraform
- Raw event data access for advanced analysis
- Overcome GA4 limitations like sampling and retention
- Custom reporting capabilities with full data access
2026-01-08
Vertex AI Migration
Updated integration with Google Cloud AI:- Seamless Vertex AI connectivity for LLM applications
- Enhanced observability for Google Cloud deployments
- Modernized instrumentation for better tracing
2026-01-08
Generative Service Monitoring
Comprehensive monitoring for evaluation infrastructure:- Uptime and health alerts with paging
- CPU and memory monitoring with warnings
- Dedicated Grafana dashboard for visibility
- Runbook documentation for incident response
2026-01-07
Custom Model Migrations
Expanded support for custom integrations:- Custom model endpoint support in evaluations
- Higher traffic model optimization for performance
- Flexible integration options for enterprise deployments
2026-01-06
Prompt Optimization on Experiments
Run prompt optimization directly on experiment results:- Experiment selector in optimization task creation
- Dynamic column resolution for experiment data
- Enhanced iteration on proven prompts
- Seamless workflow from experiments to optimization
2026-01-05
Labeling Queue Annotations
More flexible annotation management:- Clear annotations (reset to null) anywhere
- Support across spans, queues, and experiments for consistent workflows
- Improved annotation lifecycle management
2025-12-18
Multi-Span Filters
Filter traces using multiple span conditions with:- AND, OR, NOT operators for combining conditions
- Indirectly Calls (->) and Directly Calls (=>) relationship filters
- Up to 5 filters to find complex patterns like “spans where A calls B, but not C” or “traces containing both X and Y”
- Parentheses to build complex queries and pinpoint the traces that matter

2025-12-15
Support for Opus & Haiku 4.5
Expanded LLM model support to include Opus & Haiku 4.5 models in the playground and online tasks.2025-12-12
Support for GPT-5.2 and GPT-5.2 Pro
Expanded LLM model support to include GPT-5.2 models in the playground and online tasks.2025-12-10
Improved Playground Views
The Prompt Playground page is now Playgrounds, where you can use your Playground Views! This change allows you to easily navigate to a configuration of the prompt playground you’ve saved. A playground view saves a complete snapshot of your current prompt playground session, allowing you to preserve your work, share configurations with teammates, or return to previous experiments. Here’s what’s saved in a view:- LLM Config (provider, model selection, model params, custom endpoint settings)
- Prompt Setup (messages, roles, message content, tool and function calls)
- Generated Results (when setup with datasets)
- Connected Datasource (dataset or span)

2025-12-05
Realtime Ingestion for all new Arize AX Spaces
Any spaces created on or after 12/5/25 will use realtime ingestion by default! This eliminates the delay between sending traces and seeing them in the platform, giving you instant visibility into your production workloads.2025-11-20
Structured Outputs Support for Playground
2025-11-10
Session Annotations
- Input/Output Level: Attach insights to specific output messages, automatically linked to the root span of the trace.
- Span Level: Dive deeper into a trace and annotate individual spans for precise, context-rich feedback.
2025-11-05
Integrations Revamp

2025-11-03
OpenInference TypeScript 2.0
- Added easy manual instrumentation with the same decorators, wrappers, and attribute helpers found in the Python
openinference-instrumentationpackage. - Introduced function tracing utilities that automatically create spans for sync/async function execution, including specialized wrappers for chains, agents, and tools.
- Added decorator-based method tracing, enabling automatic span creation on class methods via the
@observedecorator. - Expanded attribute helper utilities for standardized OpenTelemetry metadata creation, including helpers for inputs/outputs, LLM operations, embeddings, retrievers, and tool definitions.
- Overall, tracing workflows, agent behavior, and external tool calls is now significantly simpler and more consistent across languages.
2025-10-30
API-Driven Monitors

2025-10-30
Automatic Threshold Ranges for Monitors
You can now set upper and lower bounds using our new Auto Threshold options!Arize AX can automatically determine the right threshold for your alerts based on your historical data. This is ideal for most users who want to start monitoring without manually tuning thresholds.2025-10-29
Data Fabric

2025-10-28
New Timeline Tab for Traces

2025-10-24
Tags

- Describe source (
from-prod,EHR-record) - Encode purpose (
ab-test,regression-test) - Indicate readiness (
golden,deprecated) - Group by config (
threshold-0.85,cohort_5)
Space level, under Space Settings. They can be reused across entities that belong to that space (Datasets, Experiments, and more). More on Tags.2025-10-22
Sort Datasets and Experiment Listing Table

2025-10-15
Support for Tool Call IDs in OpenInference Messages
This update introduces full support fortool_call_id and tool_call.id in OpenInference message semantics. These identifiers are now stored alongside input and output messages. Tool call IDs now appear in the trace slideover’s input/output and attributes tabs.2025-10-14
Add Data Region to Login Page

2025-10-13
Add Auth Failures to Tracing
This release adds tracing for authentication failures, enabling better visibility and debugging of auth-related issues across systems.2025-10-12
Total Traces on Stats Bar

2025-10-10
Support for GPT OSS Models on Bedrock

2025-10-05
Expanded LLM Support
Expanded LLM model support to include Claude models on Bedrock and Vertex, Titan Text Premiere, Amazon Nova Premiere, Gemini 2.5 Flash/Pro, and new GPT OSS and DeepSeek models —offering broader coverage across top providers.2025-10-03
Dashboard Widget Time Setting

2025-10-01
Autocomplete for Annotations on Datasets

2025-08-18
Dataset Management Upgrades
The Datasets interface has been improved with CSV upload fixes, search capabilities on the Datasets List Page, and REST API support for dataset deletion.See more
2026
2025
2024
2023
2022
2021