Session-level evaluations provide a holistic view of entire interactions, enabling you to assess broader patterns and answer high-level questions about user experience and system performance.
We’ll go through the following steps:
- Set up tracing for multi-turn AI tutor conversations
- Aggregate spans into structured sessions with truncation support
- Evaluate sessions across multiple dimensions (Correctness, Goal Completion, Frustration)
- Format evaluation outputs to match Arize AX’s schema
- Log results back to Arize AX for monitoring and analysis
Notebook Walkthrough
We will go through key code snippets on this page. To follow the full tutorial, check out the notebook (Session level evals for chatbot cookbook) or the walkthrough video.
Build AI Tutor with Session Tracking
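Create an AI tutor that tracks conversations using the `using_attributes` context manager from OpenInference, which attaches a session ID to every span created inside it so that all turns of a conversation can later be grouped together.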
Prepare Spans for Session-Level Evaluation
Use the Arize AX client to export your spans as a dataframe.
Group Spans by Session with Truncation
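Here is a minimal sketch of grouping plus truncation, written against plain Python dicts such as the output of `df.to_dict("records")` on the exported dataframe. The column names are assumed OpenInference-style keys, so check them against your own export. `approx_tokens` is a crude word count standing in for a real tokenizer. The truncation policy (keep the opening message, which often states the student's goal, plus the most recent turns) is one reasonable choice and not necessarily the one the notebook uses.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed column names (OpenInference-style); verify against your export.
SESSION_COL = "attributes.session.id"
INPUT_COL = "attributes.input.value"
OUTPUT_COL = "attributes.output.value"
TIME_COL = "start_time"


def approx_tokens(text: str) -> int:
    # Crude whitespace-based estimate; swap in a real tokenizer if needed.
    return len(text.split())


def truncate_messages(messages: List[str], max_tokens: int) -> List[str]:
    """Keep the first message plus as many of the most recent messages as fit,
    replacing the dropped middle with a marker (the marker is not counted)."""
    if len(messages) <= 1 or sum(approx_tokens(m) for m in messages) <= max_tokens:
        return messages
    head = messages[0]
    budget = max_tokens - approx_tokens(head)
    tail: List[str] = []
    for msg in reversed(messages[1:]):  # walk backwards from the newest message
        cost = approx_tokens(msg)
        if cost > budget:
            break
        tail.insert(0, msg)
        budget -= cost
    dropped = len(messages) - 1 - len(tail)
    return [head, f"[... {dropped} message(s) truncated ...]"] + tail


def group_spans_by_session(rows: List[Dict], max_tokens: int = 3000) -> Dict[str, str]:
    """Return {session_id: transcript}, one (possibly truncated) transcript per session."""
    by_session: Dict[str, List[Dict]] = defaultdict(list)
    for row in rows:
        if row.get(SESSION_COL):  # spans without a session ID cannot be grouped
            by_session[row[SESSION_COL]].append(row)
    sessions: Dict[str, str] = {}
    for sid, spans in by_session.items():
        spans.sort(key=lambda r: r[TIME_COL])  # chronological turn order
        messages: List[str] = []
        for span in spans:
            messages.append(f"Student: {span[INPUT_COL]}")
            messages.append(f"Tutor: {span[OUTPUT_COL]}")
        sessions[sid] = "\n".join(truncate_messages(messages, max_tokens))
    return sessions


# Usage with toy rows (out of order, plus one span that has no session ID).
rows = [
    {SESSION_COL: "s1", TIME_COL: 2, INPUT_COL: "Show an example", OUTPUT_COL: "d/dx x^2 = 2x"},
    {SESSION_COL: "s1", TIME_COL: 1, INPUT_COL: "What is a derivative?", OUTPUT_COL: "A rate of change"},
    {SESSION_COL: None, TIME_COL: 3, INPUT_COL: "orphan", OUTPUT_COL: "ignored"},
]
print(group_spans_by_session(rows)["s1"])
```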
Here, we group the spans into a session dataframe with one row per session. If a session's messages exceed the token limit, part of them is truncated so that long sessions do not overflow the evaluator model's context window.
Session Evaluations
Session Correctness Evaluation
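Session-level correctness is usually judged by an LLM that reads the entire transcript in a single prompt. The sketch below assumes a hypothetical `CORRECTNESS_TEMPLATE` and a `judge` callable that sends a prompt to your model and returns its text reply. It focuses on the step that is easy to get wrong: snapping a free-text reply onto a fixed set of labels.

```python
from typing import Callable, List

# Hypothetical judge prompt; the notebook's actual template may differ.
CORRECTNESS_TEMPLATE = """You are evaluating an AI tutor's conversation with a student.
Read the full session below and decide whether every tutor response is
factually accurate and educationally sound.

[BEGIN SESSION]
{transcript}
[END SESSION]

Answer with exactly one word: "correct" or "incorrect"."""

CORRECTNESS_RAILS = ["correct", "incorrect"]


def snap_to_rails(raw: str, rails: List[str]) -> str:
    """Match the reply's first word exactly against the allowed labels,
    so "incorrect" is never mistaken for "correct"."""
    tokens = raw.strip().lower().split()
    first = tokens[0].strip('.,"\'!:') if tokens else ""
    return first if first in rails else "unparseable"


def evaluate_correctness(transcript: str, judge: Callable[[str], str]) -> dict:
    prompt = CORRECTNESS_TEMPLATE.format(transcript=transcript)
    label = snap_to_rails(judge(prompt), CORRECTNESS_RAILS)
    score = {"correct": 1.0, "incorrect": 0.0}.get(label)  # None if unparseable
    return {"label": label, "score": score}


# Usage with a fake judge; replace it with a call to your LLM client.
fake_judge = lambda prompt: "Correct."
print(evaluate_correctness("Student: What is 2+2?\nTutor: 4", fake_judge))
```

Returning `unparseable` with a `None` score, rather than guessing, keeps malformed judge replies visible in the results instead of silently counting them as passes or failures.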
Evaluate whether the AI tutor provides factually accurate and educationally sound responses.
Session Frustration Evaluation
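Frustration shows up mostly in what the student writes. One option, assumed here rather than taken from the notebook, is to show the judge only the student's turns, numbered in order. The sketch assumes transcripts in a `Student: ...` / `Tutor: ...` line format and a `judge` callable that returns the model's text reply. `FRUSTRATION_TEMPLATE` is hypothetical.

```python
from typing import Callable, List

# Hypothetical judge prompt that only sees the student's side of the session.
FRUSTRATION_TEMPLATE = """Below are all messages a student sent to an AI tutor in one session,
in order. Does the student show signs of frustration (repeated questions,
complaints, giving up, expressions of confusion or annoyance)?

{student_messages}

Respond with exactly one label: "frustrated" or "not_frustrated"."""


def student_turns(transcript: str, prefix: str = "Student:") -> List[str]:
    """Keep only the student's lines; assumes one 'Student: ...' line per turn."""
    return [
        line[len(prefix):].strip()
        for line in transcript.splitlines()
        if line.startswith(prefix)
    ]


def evaluate_frustration(transcript: str, judge: Callable[[str], str]) -> dict:
    numbered = "\n".join(
        f"{i}. {msg}" for i, msg in enumerate(student_turns(transcript), start=1)
    )
    reply = judge(FRUSTRATION_TEMPLATE.format(student_messages=numbered))
    label = reply.strip().strip('."').lower()
    if label not in ("frustrated", "not_frustrated"):
        label = "unparseable"
    score = {"frustrated": 1.0, "not_frustrated": 0.0}.get(label)  # None if unparseable
    return {"label": label, "score": score}


# Usage with a fake judge; replace it with a call to your LLM client.
transcript = "Student: what is x?\nTutor: a variable\nStudent: I still don't get it!!\nTutor: ok"
print(evaluate_frustration(transcript, lambda prompt: "frustrated"))
```

The trade-off is that, without the tutor's replies, the judge cannot tell whether the tutor caused the frustration. Multi-line student messages would also need a different parser than the line-prefix one shown here.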
Evaluate whether the student shows signs of frustration during the session.
Session Goal Achievement Evaluation
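Goal achievement needs a little more structure, because the judge first has to infer what the student was trying to learn. Here is a minimal sketch that assumes a hypothetical `GOAL_TEMPLATE` asking for a JSON reply and a `judge` callable that returns the model's raw text. Malformed or off-schema replies are labeled `unparseable` instead of raising.

```python
import json
from typing import Callable

# Hypothetical judge prompt; doubled braces survive str.format as literal braces.
GOAL_TEMPLATE = """You are reviewing a tutoring session between a student and an AI tutor.
1. Identify the student's learning goal(s).
2. Decide whether the tutor helped the student achieve them by the end.

[BEGIN SESSION]
{transcript}
[END SESSION]

Reply with JSON only: {{"goal": "<short description>", "label": "achieved" or "not_achieved", "explanation": "<one sentence>"}}"""


def evaluate_goal_achievement(transcript: str, judge: Callable[[str], str]) -> dict:
    raw = judge(GOAL_TEMPLATE.format(transcript=transcript))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = None
    # Reject anything that is not a JSON object with an allowed label.
    if not isinstance(parsed, dict) or parsed.get("label") not in ("achieved", "not_achieved"):
        return {"label": "unparseable", "score": None, "explanation": raw}
    label = parsed["label"]
    return {
        "label": label,
        "score": 1.0 if label == "achieved" else 0.0,
        # Keep the inferred goal next to the verdict so reviewers can check it.
        "explanation": f"Goal: {parsed.get('goal', '?')}. {parsed.get('explanation', '')}".strip(),
    }


# Usage with a fake judge; replace it with a call to your LLM client.
fake_judge = lambda prompt: json.dumps({
    "goal": "understand derivatives",
    "label": "achieved",
    "explanation": "The student applied the power rule correctly at the end.",
})
print(evaluate_goal_achievement("Student: ...\nTutor: ...", fake_judge))
```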
Evaluate whether the tutor successfully helped the student achieve their learning goals.
Log Evaluations Back to Arize AX
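Before logging, each session's results have to be flattened into one row per span. The sketch below rests on two assumptions that you should confirm against the notebook and the Arize AX docs. First, session-level evaluations are attached to one representative span per session (here, the earliest). Second, columns follow the `context.span_id` plus `eval.<name>.label` / `.score` / `.explanation` convention. The span dict keys and eval names are hypothetical.

```python
from typing import Dict, List


def earliest_span_per_session(spans: List[dict]) -> Dict[str, str]:
    """Map each session ID to the ID of its earliest span (assumed attach point)."""
    first: Dict[str, dict] = {}
    for span in spans:
        sid = span["session_id"]
        if sid not in first or span["start_time"] < first[sid]["start_time"]:
            first[sid] = span
    return {sid: s["span_id"] for sid, s in first.items()}


def to_arize_eval_rows(
    results: Dict[str, Dict[str, dict]],
    root_span_ids: Dict[str, str],
) -> List[dict]:
    """results: {session_id: {eval_name: {"label", "score", "explanation"}}}.
    Produces one row per session using the assumed eval.<name>.<field> columns."""
    rows = []
    for session_id, evals in results.items():
        span_id = root_span_ids.get(session_id)
        if span_id is None:
            continue  # no span to attach this session's evaluations to
        row = {"context.span_id": span_id}
        for name, result in evals.items():
            for field in ("label", "score", "explanation"):
                if result.get(field) is not None:  # keeps 0.0 scores, skips missing fields
                    row[f"eval.{name}.{field}"] = result[field]
        rows.append(row)
    return rows


# Usage with toy data.
spans = [
    {"session_id": "s1", "span_id": "b", "start_time": 2},
    {"session_id": "s1", "span_id": "a", "start_time": 1},
]
results = {
    "s1": {
        "session_correctness": {"label": "correct", "score": 1.0},
        "session_frustration": {"label": "not_frustrated", "score": 0.0,
                                "explanation": "Calm throughout."},
    }
}
rows = to_arize_eval_rows(results, earliest_span_per_session(spans))
print(rows)
```

`pandas.DataFrame(rows)` then gives a dataframe you can pass to the logging call used in the notebook.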
Format and log the evaluation results back to Arize AX for monitoring.
View Results in Arize AX
After logging the evaluations, you can view the results in the Sessions tab of your Arize AX project. The evaluation results populate for each session, allowing you to:
- Monitor session-level performance metrics
- Identify patterns in tutor effectiveness
- Track student satisfaction and engagement
- Compare different evaluation dimensions across sessions
