Why are annotations critical?
Annotations enable deep error analysis, which is the first step toward writing meaningful evals and understanding where performance falls short.
- A well-annotated dataset is essential for testing and refining eval templates.
- Annotations also provide a structured way to capture human feedback that can be fed back into prompt optimization and fine-tuning.
- Annotations produce high-quality labeled data that can serve as reliable ground truth.
What is an Annotation Config?
Annotation Configs allow you to define consistent annotation schemas that can be reused across your workspace, ensuring evaluations are structured and comparable over time.
- Annotation Name: Provide a clear, descriptive name for your annotation. This helps others identify its purpose (e.g., Correctness or Response Helpfulness).
- Annotation Config Type: Choose how you want to capture feedback:
- Categorical Options – Assign predefined labels (e.g., Correct / Incorrect, Helpful / Unhelpful).
- Continuous Score – Apply a numeric score or range to quantify performance (e.g., 0–1 for relevance).
- Freeform Text – Enter open-ended feedback for qualitative evaluations.
- Optimization Direction: Specify how the annotation is evaluated: Maximize when higher scores are better, or Minimize when lower scores are better.
- Define Labels or Scores: Depending on your selected type, define the label categories or scoring range. For example: Correct (score = 1) and Incorrect (score = 0).
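As a rough illustration of how these pieces fit together, the sketch below expresses a categorical config as plain data. The field names are illustrative assumptions only, not an Arize AX API; configs themselves are created in the UI as described above.

```python
# Illustrative only: a categorical annotation config expressed as plain data.
# The keys below (name, type, optimization_direction, labels) are assumptions
# for illustration, not an Arize AX API -- configs are created in the UI.
correctness_config = {
    "name": "Correctness",
    "type": "categorical",                 # categorical options, continuous score, or freeform text
    "optimization_direction": "maximize",  # higher scores are better
    "labels": {
        "Correct": 1,                      # label mapped to its score
        "Incorrect": 0,
    },
}
```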
Add Annotations in the UI
Traces
Annotations can be applied at the span level for LLM use cases. Within the span, you can click the icon to annotate. From here, you can choose an existing annotation config or create a new one.
Experiments
You can annotate experiment results in Arize to capture human feedback. As you iterate and make system changes, this feedback serves as a strong signal for identifying improvements or regressions.
Create Annotations via API
Annotations can also be performed via our Python SDK, using the log_annotations function to attach human feedback.
Note: Annotations can be applied to spans up to 14 days prior to the current day. To apply annotations beyond this lookback window, please reach out to support@arize.com.
Logging the annotation
Important Prerequisite: Before logging annotations using the SDK, you must first configure the annotation within the Arize AX UI.
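Assuming the Arize Python SDK is installed (pip install arize), a minimal setup sketch for the pandas Client is shown below. The constructor parameters shown (space_id, api_key) are assumptions that can vary by SDK version, so check the SDK reference for your install.

```python
# Minimal sketch: initialize the Arize pandas Client used to log annotations.
# The constructor parameters (space_id, api_key) are assumptions and may differ
# across SDK versions -- consult the Arize Python SDK reference.
import os

from arize.pandas.logger import Client

client = Client(
    space_id=os.environ["ARIZE_SPACE_ID"],  # your Arize space identifier
    api_key=os.environ["ARIZE_API_KEY"],    # your Arize API key
)
```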
Annotations Dataframe Schema
The annotations_dataframe requires the following columns:
- context.span_id: The unique identifier of the span to which the annotations should be attached.
- Annotation Columns: Columns following the pattern annotation.<annotation_name>.<suffix>:
  - <annotation_name>: A name for your annotation (e.g., quality, correctness, sentiment). Should contain only alphanumeric characters and underscores.
  - <suffix>: Defines the type and metadata of the annotation. Valid suffixes are:
    - label: For categorical annotations (e.g., "good", "bad", "spam"). The value should be a string.
    - score: For numerical annotations (e.g., a rating from 1-5). The value should be numeric (int or float).
  - You must provide at least one annotation.<annotation_name>.label or annotation.<annotation_name>.score column for each annotation you want to log.
- updated_by (Optional): A string indicating who made the annotation (e.g., "user_id_123", "annotator_team_a"). If not provided, the SDK automatically sets this to "SDK Logger".
- updated_at (Optional): A timestamp indicating when the annotation was made, represented as milliseconds since the Unix epoch (integer). If not provided, the SDK automatically sets this to the current UTC time.
- annotation.notes (Optional): A column containing free-form text notes that apply to the entire span, not a specific annotation label/score. The value should be a string.
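To make the schema concrete, the sketch below builds an annotations dataframe with the columns described above and logs it with log_annotations, reusing the client from the setup sketch earlier. The keyword arguments (dataframe, project_name) and the example values are assumptions for illustration; verify the exact call signature against the SDK reference.

```python
# Sketch: build an annotations dataframe matching the schema above and log it.
# Column names follow the documented annotation.<name>.<suffix> pattern; the
# log_annotations keyword arguments (dataframe, project_name) are assumptions.
import time

import pandas as pd

annotations_df = pd.DataFrame(
    {
        # Span IDs of the spans to annotate (placeholder values).
        "context.span_id": ["span_id_1", "span_id_2"],
        # One annotation named "correctness": a label plus a numeric score.
        "annotation.correctness.label": ["Correct", "Incorrect"],
        "annotation.correctness.score": [1, 0],
        # Optional metadata: who annotated, and when (ms since the Unix epoch).
        "updated_by": ["annotator_team_a", "annotator_team_a"],
        "updated_at": [int(time.time() * 1000)] * 2,
        # Optional free-form notes that apply to the whole span.
        "annotation.notes": ["Matches ground truth", "Cites a nonexistent source"],
    }
)

# Assumed call shape; the "correctness" config must already exist in the Arize AX UI.
client.log_annotations(
    dataframe=annotations_df,
    project_name="my-llm-project",  # placeholder project name
)
```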