Trace-level evaluations provide granular insights into individual user interactions, enabling you to assess performance on a per-request basis. This approach is particularly valuable for identifying specific successes and failures in end-to-end system performance.
We’ll go through the following steps:
- Set up tracing for a movie recommendation agent with OpenAI Agents SDK
- Build and capture individual traces representing single user requests
- Evaluate each trace across key dimensions (Tool Usage, Recommendation Relevance)
- Format evaluation outputs to match Arize AX’s schema
- Log results back to Arize AX for monitoring and analysis
Notebook Walkthrough
We will go through key code snippets on this page. To follow the full tutorial, check out the notebook or watch the video above.
Build Movie Recommendation Agent
Create a movie recommendation agent with three specialized tools:

Create and Test the Agent
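A minimal sketch of such an agent and a quick smoke test, assuming the OpenAI Agents SDK (`Agent`, `Runner`, `function_tool`); the tool names, canned data, and instructions here are illustrative, not the notebook's exact code:

```python
# Plain functions with canned data stand in for real tool backends, so the
# logic can be exercised without network access.
MOVIES_BY_GENRE = {
    "sci-fi": ["Arrival", "Interstellar", "Blade Runner 2049"],
    "comedy": ["The Grand Budapest Hotel", "Game Night"],
}

def search_movies(genre: str) -> list[str]:
    """Return candidate movie titles for a genre."""
    return MOVIES_BY_GENRE.get(genre.lower(), [])

def get_movie_details(title: str) -> str:
    """Return a short synopsis for a title."""
    return f"{title}: details would come from a movie database here."

def check_streaming_availability(title: str) -> str:
    """Return where a title can be streamed."""
    return f"{title} is available on an example streaming service."

def build_agent():
    # Imported lazily so the tool logic above stays importable without the SDK.
    from agents import Agent, function_tool
    return Agent(
        name="Movie Recommendation Agent",
        instructions=(
            "Recommend movies. Search by genre first, then fetch details, "
            "then check streaming availability before answering."
        ),
        tools=[
            function_tool(search_movies),
            function_tool(get_movie_details),
            function_tool(check_streaming_availability),
        ],
    )

if __name__ == "__main__":
    from agents import Runner
    result = Runner.run_sync(build_agent(), "Recommend a sci-fi movie for tonight")
    print(result.final_output)
```

Each tool is a plain function; `function_tool` derives the tool schema from its signature and docstring, which is why both are spelled out.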

Generate Multiple Traces
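This step can be sketched as a loop over a handful of user questions, each run producing one trace. The tracing setup assumes `arize.otel.register` and the OpenInference OpenAI Agents instrumentor; the space ID, API key, and project name are placeholders:

```python
# Questions chosen to exercise different genres and tools; each agent run
# below is captured as one end-to-end trace in Arize AX.
QUESTIONS = [
    "Recommend a feel-good comedy for a family movie night",
    "I loved Interstellar -- what sci-fi should I watch next?",
    "Where can I stream Blade Runner 2049?",
    "Suggest something scary but not too gory",
]

def generate_traces(agent, questions=QUESTIONS):
    """Run the agent once per question; each run emits one trace."""
    from agents import Runner  # lazy import: requires the OpenAI Agents SDK
    return [Runner.run_sync(agent, q).final_output for q in questions]

if __name__ == "__main__":
    from agents import Agent
    from arize.otel import register
    from openinference.instrumentation.openai_agents import OpenAIAgentsInstrumentor

    # Point an OTel tracer provider at Arize AX (placeholder credentials),
    # then auto-instrument agent runs, tool calls, and LLM calls as spans.
    tracer_provider = register(
        space_id="YOUR_SPACE_ID",
        api_key="YOUR_API_KEY",
        project_name="movie-recommendation-agent",
    )
    OpenAIAgentsInstrumentor().instrument(tracer_provider=tracer_provider)

    agent = Agent(
        name="Movie Recommendation Agent",
        instructions="Recommend movies the user will enjoy.",
    )
    for answer in generate_traces(agent):
        print(answer)
```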
Run the agent with various questions to generate multiple traces for evaluation:

Get Span Data from Arize AX
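Assuming the Arize Python export client (`ArizeExportClient`), the export and root-span preparation might look like the following sketch; the `parent_id` column name comes from the OpenInference span schema and should be verified against your own export:

```python
import pandas as pd

def prepare_root_spans(spans_df: pd.DataFrame) -> pd.DataFrame:
    """Trace-level evals want one row per trace, so keep only root spans
    (those with no parent), which carry the end-to-end input and output."""
    return spans_df[spans_df["parent_id"].isna()].reset_index(drop=True)

def export_spans(space_id: str, api_key: str, project_name: str, start, end) -> pd.DataFrame:
    """Pull span data for a tracing project into a dataframe."""
    from arize.exporter import ArizeExportClient  # lazy: needs the `arize` package
    from arize.utils.types import Environments

    client = ArizeExportClient(api_key=api_key)
    return client.export_model_to_df(
        space_id=space_id,
        model_id=project_name,            # tracing projects are exported by name
        environment=Environments.TRACING,
        start_time=start,
        end_time=end,
    )
```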
Export your traces from Arize AX and prepare them for evaluation:

Define and Run Evaluators
Tool Calling Order Evaluation
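One way to implement this LLM-as-a-judge check, assuming Phoenix's `llm_classify` helper; the template wording, rails, and dataframe column names are illustrative:

```python
# Judge template; {input} and {tool_calls} must match column names in the
# dataframe passed to llm_classify.
TOOL_ORDER_TEMPLATE = """You are evaluating whether an AI agent called its tools
in a correct, logical order for the user's request.

[User request]: {input}
[Tool calls, in the order made]: {tool_calls}

A correct order searches for candidate movies before fetching details, and
fetches details before checking streaming availability. Respond with a single
word, "correct" or "incorrect".
"""

def evaluate_tool_order(traces_df):
    """Run the judge over one row per trace; returns labels and explanations."""
    from phoenix.evals import OpenAIModel, llm_classify
    return llm_classify(
        dataframe=traces_df,              # needs "input" and "tool_calls" columns
        template=TOOL_ORDER_TEMPLATE,
        model=OpenAIModel(model="gpt-4o"),
        rails=["correct", "incorrect"],   # constrain the judge's output
        provide_explanation=True,         # adds an "explanation" column
    )
```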
Evaluate whether the agent uses tools in the correct logical sequence:

Recommendation Relevance Evaluation
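One way to sketch this evaluator, assuming Phoenix's `llm_classify`; the template and column names are illustrative:

```python
# Judge template; {input} and {output} must match column names in the
# dataframe passed to llm_classify.
RELEVANCE_TEMPLATE = """You are evaluating whether a movie recommendation
agent's answer actually matches what the user asked for.

[User request]: {input}
[Agent answer]: {output}

Respond with a single word, "relevant" or "irrelevant".
"""

def evaluate_relevance(traces_df):
    """Judge each trace's final answer against the user's request."""
    from phoenix.evals import OpenAIModel, llm_classify
    return llm_classify(
        dataframe=traces_df,              # needs "input" and "output" columns
        template=RELEVANCE_TEMPLATE,
        model=OpenAIModel(model="gpt-4o"),
        rails=["relevant", "irrelevant"],
        provide_explanation=True,
    )
```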
Evaluate whether the movie recommendations match the user’s request:

Log Results Back to Arize AX
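A sketch of the formatting and logging step; the `eval.<name>.label` / `eval.<name>.explanation` column convention and the `log_evaluations_sync` call are assumptions drawn from the Arize Python SDK and should be checked against its documentation:

```python
import pandas as pd

def format_evals(span_ids, evals_df: pd.DataFrame, eval_name: str) -> pd.DataFrame:
    """Reshape llm_classify output into Arize's eval column convention,
    keyed by span id so each result attaches to the right trace."""
    return pd.DataFrame({
        "context.span_id": list(span_ids),
        f"eval.{eval_name}.label": evals_df["label"].tolist(),
        f"eval.{eval_name}.explanation": evals_df["explanation"].tolist(),
    })

def log_evals(formatted: pd.DataFrame, space_id: str, api_key: str, project_name: str):
    """Send formatted eval rows back to the Arize AX tracing project."""
    from arize.pandas.logger import Client  # lazy: needs the `arize` package
    Client(space_id=space_id, api_key=api_key).log_evaluations_sync(
        formatted, project_name
    )
```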
Format and log the evaluation results back to Arize AX for monitoring:

View Results in Arize AX
After logging the evaluations, you can view the results in the Traces tab of your Arize AX project. The evaluation results will populate for each trace, allowing you to:
- Monitor trace-level performance metrics
- Identify patterns in agent effectiveness
- Track recommendation quality and relevance
