The arize.utils module provides helper utilities for common preprocessing tasks. These are primarily useful when building evaluators for online tasks that operate on data exported from Arize.
Online Task Utilities
extract_nested_data_to_column
Extract deeply nested attributes from complex data structures into new DataFrame columns.
This function is designed for use in online task evaluators. Data exported from Arize often contains columns with nested structures (lists of dicts, JSON strings) — for example, LLM message arrays stored under attributes.llm.output_messages. This function resolves a dot-delimited attribute path against those structures and creates new flat columns, making the values accessible to evaluators.
| Parameter | Type | Description |
|---|---|---|
| attributes | list[str] | Dot-delimited attribute paths to extract (e.g. ["attributes.llm.output_messages.0.message.content"]) |
| df | pd.DataFrame | Input DataFrame, typically exported from Arize |
Returns: pd.DataFrame with the extracted attributes as additional columns. Rows where any of the requested attributes cannot be resolved are dropped.
Raises: ColumnNotFoundError if no column in df matches any prefix of a requested attribute.
How it works: For each attribute string (e.g. "attributes.llm.output_messages.0.message.content"):
- Finds the longest prefix that matches an existing column name (e.g. "attributes.llm.output_messages")
- Uses the remainder as a path to introspect into each row’s value (e.g. "0.message.content")
- Creates a new column whose name is exactly the attribute string, holding the extracted values
- Drops rows where any of the new columns could not be resolved
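The steps above can be sketched in plain pandas. This is an illustrative re-implementation of the described behavior, not the library's source: the names extract_sketch, _split_on_longest_column, and _walk are hypothetical, a built-in KeyError stands in for ColumnNotFoundError, and the handling of JSON strings and list indices is an assumption based on the description.

```python
import json
from typing import Any

import pandas as pd


def _split_on_longest_column(df: pd.DataFrame, attribute: str) -> tuple[str, list[str]]:
    """Find the longest dot-delimited prefix that is a column; return it with the leftover path."""
    parts = attribute.split(".")
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in df.columns:
            return prefix, parts[end:]
    # Stand-in for the library's ColumnNotFoundError (assumption).
    raise KeyError(f"no column matches any prefix of {attribute!r}")


def _walk(value: Any, path: list[str]) -> Any:
    """Follow `path` into dicts, lists, and JSON strings; return None if any step fails."""
    for key in path:
        if isinstance(value, str):
            try:
                value = json.loads(value)  # nested data may be stored as a JSON string
            except json.JSONDecodeError:
                return None
        if isinstance(value, dict) and key in value:
            value = value[key]
        elif isinstance(value, (list, tuple)) and key.isdigit() and int(key) < len(value):
            value = value[int(key)]
        else:
            return None
    return value


def extract_sketch(df: pd.DataFrame, attributes: list[str]) -> pd.DataFrame:
    """Hypothetical stand-in that mirrors the four documented steps."""
    out = df.copy()
    for attribute in attributes:
        column, path = _split_on_longest_column(out, attribute)
        values = [_walk(v, path) for v in out[column]]
        out[attribute] = pd.Series(values, index=out.index, dtype=object)
    # Drop rows where any requested attribute could not be resolved.
    return out.dropna(subset=attributes)


df = pd.DataFrame(
    {
        "attributes.llm.output_messages": [
            [{"message": {"content": "hi"}}],           # list of dicts
            '[{"message": {"content": "from json"}}]',  # same shape, stored as a JSON string
            [],                                         # index 0 cannot be resolved
        ]
    }
)
result = extract_sketch(df, ["attributes.llm.output_messages.0.message.content"])
```

In this sketch the third row is dropped because index 0 does not exist in its empty list, which matches the documented row-dropping behavior. The sketch treats a resolved value of None the same as an unresolved path; whether the library does so is not stated in the description above.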
Example: