- Coherence: Does the agent maintain a consistent and logical conversation flow?
- Context Retention: Does the agent remember and correctly utilize information from earlier in the conversation?
- Goal Achievement: Did the user successfully achieve their overall goal by the end of the session?
- Task Progression: For multi-step tasks, does the conversation progress logically toward completion?
- Session-Level Evaluations via UI
- Session-Level Evaluations via Code
To run evaluations at the session level in the UI, set the evaluator scope to “Session” for each evaluator you want to operate at that level. The evaluation output then populates next to each session. Hover over an evaluation to filter by result or to view details such as the score and explanation.
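For intuition about what a session-scoped evaluator does, the idea can be sketched in plain Python: group every turn by its session identifier, render the whole conversation as one transcript, and hand that transcript to a judge that returns a label and an explanation. This is a minimal sketch and not the platform's API. The names `Turn`, `SessionEval`, `build_transcripts`, `evaluate_sessions`, and `placeholder_judge` are all hypothetical, and the judge here is a trivial keyword check standing in for an LLM call.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    session_id: str  # hypothetical field: identifier shared by all turns in a session
    order: int       # position of the turn within its session
    user: str        # user message
    agent: str       # agent reply


@dataclass
class SessionEval:
    session_id: str
    label: str
    explanation: str


def build_transcripts(turns: list[Turn]) -> dict[str, str]:
    """Group turns by session and render each session as a single transcript."""
    by_session: dict[str, list[Turn]] = defaultdict(list)
    for turn in turns:
        by_session[turn.session_id].append(turn)

    transcripts: dict[str, str] = {}
    for session_id, session_turns in by_session.items():
        # Evaluate the conversation in the order it actually happened.
        session_turns.sort(key=lambda t: t.order)
        lines: list[str] = []
        for t in session_turns:
            lines.append(f"User: {t.user}")
            lines.append(f"Agent: {t.agent}")
        transcripts[session_id] = "\n".join(lines)
    return transcripts


def evaluate_sessions(
    turns: list[Turn],
    judge: Callable[[str], tuple[str, str]],
) -> list[SessionEval]:
    """Apply one judge call per session, not per turn or per trace."""
    results: list[SessionEval] = []
    for session_id, transcript in build_transcripts(turns).items():
        label, explanation = judge(transcript)
        results.append(SessionEval(session_id, label, explanation))
    return results


def placeholder_judge(transcript: str) -> tuple[str, str]:
    """Stand-in for an LLM judge: only checks whether the final line says 'booked'."""
    last_line = transcript.splitlines()[-1]
    if "booked" in last_line.lower():
        return "goal_achieved", "Final agent turn confirms the booking."
    return "goal_not_achieved", "Final agent turn does not confirm the booking."
```

Calling `evaluate_sessions(turns, placeholder_judge)` yields one `SessionEval` per session. In practice the judge would be replaced by a prompt that asks a model about coherence, context retention, goal achievement, or task progression over the full transcript.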
