
- Span-Level Evaluation: Evaluates an individual step, such as a single LLM call or tool call.
- Trace-Level Evaluation: Evaluates a full chain of steps to assess reasoning quality and flow across multiple calls.
- Session-Level Evaluation: Evaluates the overall end-to-end interaction or conversation, focusing on user experience and outcome quality.
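
The three levels above nest inside each other, which a small sketch can make concrete. The following is a minimal illustration only: the `Span`, `Trace`, and `Session` classes, the `goal_achieved` flag, and the scoring rules are all hypothetical and are not taken from any particular evaluation library. Real evaluators would typically use richer signals, such as LLM judges or human ratings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One individual step, such as a single LLM call or tool call (hypothetical type)."""
    name: str
    output: str
    error: bool = False


@dataclass
class Trace:
    """A full chain of steps (hypothetical type)."""
    spans: List[Span] = field(default_factory=list)


@dataclass
class Session:
    """An end-to-end interaction made of one or more traces (hypothetical type)."""
    traces: List[Trace] = field(default_factory=list)
    goal_achieved: bool = False  # assumed outcome signal, e.g. from user feedback


def evaluate_span(span: Span) -> float:
    """Span level: 1.0 if the step produced non-empty output without an error."""
    return 1.0 if (not span.error and span.output.strip()) else 0.0


def evaluate_trace(trace: Trace) -> float:
    """Trace level: fraction of steps in the chain that succeeded."""
    if not trace.spans:
        return 0.0
    return sum(evaluate_span(s) for s in trace.spans) / len(trace.spans)


def evaluate_session(session: Session) -> float:
    """Session level: mean trace score, or 0.0 if the overall goal was not met."""
    if not session.traces or not session.goal_achieved:
        return 0.0
    return sum(evaluate_trace(t) for t in session.traces) / len(session.traces)


if __name__ == "__main__":
    first = Trace([Span("llm_call", "plan"), Span("tool_call", "", error=True)])
    second = Trace([Span("llm_call", "final answer")])
    session = Session(traces=[first, second], goal_achieved=True)

    print(evaluate_trace(first))      # 0.5
    print(evaluate_session(session))  # 0.75
```

In this sketch each level aggregates the one below it, but that is a design choice rather than a requirement: a session-level evaluator can just as well score the conversation directly, independent of span or trace scores.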