The arize.utils module provides helper utilities for common preprocessing tasks. These are primarily useful when building evaluators for online tasks that operate on data exported from Arize.

Online Task Utilities

extract_nested_data_to_column

Extract deeply nested attributes from complex data structures into new DataFrame columns. This function is designed for use in online task evaluators. Data exported from Arize often contains columns with nested structures (lists of dicts, JSON strings) — for example, LLM message arrays stored under attributes.llm.output_messages. This function resolves a dot-delimited attribute path against those structures and creates new flat columns, making the values accessible to evaluators.
from arize.utils.online_tasks import extract_nested_data_to_column
Signature:
extract_nested_data_to_column(
    attributes: list[str],
    df: pd.DataFrame,
) -> pd.DataFrame
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| attributes | list[str] | Dot-delimited attribute paths to extract (e.g. ["attributes.llm.output_messages.0.message.content"]) |
| df | pd.DataFrame | Input DataFrame, typically exported from Arize |
Returns: A new pd.DataFrame with the extracted attributes as additional columns. Rows in which any requested attribute cannot be resolved are dropped.
Raises: ColumnNotFoundError if no column in df matches any prefix of a requested attribute.
How it works: For each attribute string (e.g. "attributes.llm.output_messages.0.message.content"):
  1. Finds the longest prefix that matches an existing column name (e.g. "attributes.llm.output_messages")
  2. Uses the remainder as a path to introspect into each row’s value (e.g. "0.message.content")
  3. Creates a new column whose name is the full attribute string (e.g. "attributes.llm.output_messages.0.message.content") and fills it with the extracted values
  4. Drops rows where any of the new columns could not be resolved
The introspection handles nested dicts, lists (by integer index), JSON strings, and dict keys that themselves contain dots (such as "message.content" in the example below).
Example:
import pandas as pd
from arize.utils.online_tasks import extract_nested_data_to_column

# DataFrame exported from Arize — output_messages is a list of message dicts
df = pd.DataFrame({
    "span_id": ["s1", "s2"],
    "attributes.llm.output_messages": [
        [{"message.role": "assistant", "message.content": "The capital of France is Paris."}],
        [{"message.role": "assistant", "message.content": "Shakespeare wrote Romeo and Juliet."}],
    ],
})

# Extract the assistant's reply into a flat column
result = extract_nested_data_to_column(
    attributes=["attributes.llm.output_messages.0.message.content"],
    df=df,
)

print(result["attributes.llm.output_messages.0.message.content"].tolist())
# ['The capital of France is Paris.', 'Shakespeare wrote Romeo and Juliet.']
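For comparison, the hand-written pandas code for this single fixed path might look like the following. This is illustrative only. It assumes every row already holds a Python list with at least one message, so unlike extract_nested_data_to_column it raises on a JSON string or an empty list rather than dropping the row.

```python
import pandas as pd

df = pd.DataFrame({
    "span_id": ["s1", "s2"],
    "attributes.llm.output_messages": [
        [{"message.role": "assistant", "message.content": "The capital of France is Paris."}],
        [{"message.role": "assistant", "message.content": "Shakespeare wrote Romeo and Juliet."}],
    ],
})

# Reach into the first message of each row and read its "message.content" key.
df["attributes.llm.output_messages.0.message.content"] = df[
    "attributes.llm.output_messages"
].map(lambda messages: messages[0]["message.content"])

print(df["attributes.llm.output_messages.0.message.content"].tolist())
# ['The capital of France is Paris.', 'Shakespeare wrote Romeo and Juliet.']
```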
Use in an online task evaluator:
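import pandas as pd
from arize.utils.online_tasks import extract_nested_data_to_column


def score(question: str, answer: str) -> float:
    # Hypothetical placeholder scorer: replace with your own evaluation logic.
    return 1.0 if answer else 0.0

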
def my_evaluator(df: pd.DataFrame) -> pd.DataFrame:
    # Extract nested content before scoring
    df = extract_nested_data_to_column(
        attributes=[
            "attributes.input.value",
            "attributes.llm.output_messages.0.message.content",
        ],
        df=df,
    )

    # Now use the flat columns to compute scores
    df["eval.MyEval.score"] = df.apply(
        lambda row: score(
            row["attributes.input.value"],
            row["attributes.llm.output_messages.0.message.content"],
        ),
        axis=1,
    )
    return df