Utilities

The arize.utils module provides helper utilities for common preprocessing tasks. These are primarily useful when building evaluators for online tasks that operate on data exported from Arize.

Online Task Utilities

`extract_nested_data_to_column`

Extract deeply nested attributes from complex data structures into new DataFrame columns. This function is designed for use in online task evaluators. Data exported from Arize often contains columns with nested structures (lists of dicts, JSON strings) — for example, LLM message arrays stored under attributes.llm.output_messages. This function resolves a dot-delimited attribute path against those structures and creates new flat columns, making the values accessible to evaluators.

from arize.utils.online_tasks import extract_nested_data_to_column

Signature:

extract_nested_data_to_column(
    attributes: list[str],
    df: pd.DataFrame,
) -> pd.DataFrame

Parameters:

Parameter	Type	Description
`attributes`	`list[str]`	Dot-delimited attribute paths to extract (e.g. `["attributes.llm.output_messages.0.message.content"]`)
`df`	`pd.DataFrame`	Input DataFrame, typically exported from Arize

Returns: A new pd.DataFrame with the extracted attributes as additional columns. Rows where any of the requested attributes cannot be resolved are dropped. Raises: ColumnNotFoundError if no column in df matches any prefix of a requested attribute.

How it works: For each attribute string (e.g. "attributes.llm.output_messages.0.message.content"):

Finds the longest prefix that matches an existing column name (e.g. "attributes.llm.output_messages")
Uses the remainder as a path to introspect into each row’s value (e.g. "0.message.content")
Creates a new column named exactly attribute with the extracted values
Drops rows where any of the new columns could not be resolved

The introspection handles nested dicts, lists (by integer index), JSON strings, and dotted dict keys.

Example:

import pandas as pd
from arize.utils.online_tasks import extract_nested_data_to_column

# DataFrame exported from Arize — output_messages is a list of message dicts
df = pd.DataFrame({
    "span_id": ["s1", "s2"],
    "attributes.llm.output_messages": [
        [{"message.role": "assistant", "message.content": "The capital of France is Paris."}],
        [{"message.role": "assistant", "message.content": "Shakespeare wrote Romeo and Juliet."}],
    ],
})

# Extract the assistant's reply into a flat column
result = extract_nested_data_to_column(
    attributes=["attributes.llm.output_messages.0.message.content"],
    df=df,
)

print(result["attributes.llm.output_messages.0.message.content"].tolist())
# ["The capital of France is Paris.", "Shakespeare wrote Romeo and Juliet."]

Use in an online task evaluator:

import pandas as pd
from arize.utils.online_tasks import extract_nested_data_to_column

def my_evaluator(df: pd.DataFrame) -> pd.DataFrame:
    # Extract nested content before scoring
    df = extract_nested_data_to_column(
        attributes=[
            "attributes.input.value",
            "attributes.llm.output_messages.0.message.content",
        ],
        df=df,
    )

    # Now use the flat columns to compute scores
    df["eval.MyEval.score"] = df.apply(
        lambda row: score(
            row["attributes.input.value"],
            row["attributes.llm.output_messages.0.message.content"],
        ),
        axis=1,
    )
    return df

Version 8

Version 7

Online Task Utilities

`extract_nested_data_to_column`

Version 8

Version 7

Documentation Index

​Online Task Utilities

​extract_nested_data_to_column

Online Task Utilities

`extract_nested_data_to_column`