You can explore individual traces and see what happened in a single request. But any non-trivial LLM app (agent, RAG, or chatbot) can fail in many different ways on the same input, and when something goes wrong, the problem could be anywhere in the chain. This page shows three ways to do error analysis across many traces at once (Skills in your coding agent, Alyx in the product, or manual annotation) and how to validate a fix once you’ve found a pattern.
Error analysis
A single failing trace is just one data point; error analysis across many traces surfaces the patterns you actually want to fix. Three approaches get you from raw failures to labeled error modes: the manual approach walks through annotation and clustering step by step, while Skills and Alyx use AI to produce categories directly.
- Arize Skills
- Alyx
- Manually
Run the `arize-trace` skill in your AI coding agent (Claude Code, Cursor, Codex, Windsurf, and 40+ others) to export traces and analyze them locally. Set up authentication, install the skill, then ask your agent to analyze your traces. Other prompts to try: “why are these traces failing”, “group last night’s errors by root cause”. The skill runs `ax traces export` under the hood; your agent reads the output and reports patterns. What you get back: a written summary of common patterns across the exported traces. Fast to run. Best for quickly spotting what is going wrong before you commit to deeper analysis.
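The grouping step the agent performs over the exported file amounts to tallying failed spans by root cause. A minimal sketch, assuming the export is a JSON array of span records with `status` and `error_type` fields — a hypothetical schema for illustration, not the actual `ax traces export` output format:

```python
import json
import tempfile
from collections import Counter

def group_errors(path: str) -> Counter:
    """Tally failed spans by error type from an exported trace file.

    Assumes each record carries "status" and "error_type" keys;
    the real export schema may differ.
    """
    with open(path) as f:
        spans = json.load(f)
    return Counter(
        s.get("error_type", "unknown")
        for s in spans
        if s.get("status") == "ERROR"
    )

# Demo with a small inline export instead of a real exported file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([
        {"status": "OK"},
        {"status": "ERROR", "error_type": "invalid_hotel_id"},
        {"status": "ERROR", "error_type": "invalid_hotel_id"},
        {"status": "ERROR", "error_type": "timeout"},
    ], f)

counts = group_errors(f.name)
print(counts.most_common())
```

The most common error type surfaces first, which is exactly the signal you want before committing to deeper analysis.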
Skills and Alyx are the fastest way to do a first pass across large volumes of traces. Manual annotation takes longer but typically produces sharper, more reliable categories. Use it on the error modes that matter most.
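The manual path is a small labeling loop: read each failing trace, attach a short free-text note, then collapse the notes into named error modes. A toy illustration of that collapse step, using hypothetical trace dicts and simple keyword rules (none of this is the Arize API; real clustering can replace the rules later):

```python
from collections import Counter

# Hypothetical failing traces with free-text notes written while
# reading each trace by hand.
annotated_traces = [
    {"trace_id": "t1", "note": "tool returned a hotel id that does not exist"},
    {"trace_id": "t2", "note": "retrieval returned unrelated documents"},
    {"trace_id": "t3", "note": "hotel id in tool call was malformed"},
    {"trace_id": "t4", "note": "retrieval missed the relevant passage"},
]

def categorize(note: str) -> str:
    """Collapse a free-text annotation into a named error mode."""
    if "hotel id" in note:
        return "invalid_hotel_id"
    if "retriev" in note:
        return "retrieval_miss"
    return "uncategorized"

counts = Counter(categorize(t["note"]) for t in annotated_traces)
for mode, n in counts.most_common():
    print(f"{mode}: {n}")
```

The payoff of the manual pass is that the category names come from your own reading of the traces, which is why they tend to be sharper than AI-generated ones.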
Fix the error mode
If you used Skills or Alyx, pick the most impactful pattern from the output and give it a name (e.g., `invalid_hotel_id`). Then validate a fix before shipping:
- Save failing traces to a dataset: those traces become your regression suite for future prompt tweaks, retrieval updates, tool changes, or any other fix.
- Run an experiment: compare the old vs new version against the dataset; ship the fix only if the experiment shows improvement without regressions on other examples.
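The validate-before-shipping step boils down to running both versions over the saved failing examples plus previously passing ones, and accepting the fix only if the score improves without regressions. A minimal sketch, where `run_experiment`, the `app` callables, and the per-example `check` predicates are all hypothetical stand-ins, not the Arize experiments API:

```python
def run_experiment(app, dataset) -> float:
    """Return the fraction of dataset examples this app version passes.

    `app` maps an input string to an output string; each example
    supplies a `check` predicate that grades the output.
    """
    passed = sum(1 for ex in dataset if ex["check"](app(ex["input"])))
    return passed / len(dataset)

# Toy regression suite: two saved failures plus one example
# that the old version already handled correctly.
dataset = [
    {"input": "book hotel 123", "check": lambda out: "123" in out},
    {"input": "book hotel 456", "check": lambda out: "456" in out},
    {"input": "cancel my booking", "check": lambda out: "cancel" in out},
]

# Old version always emits a bogus hotel id; new version echoes the
# requested id (the hypothesized fix for invalid_hotel_id).
old_app = lambda q: "cancelled" if "cancel" in q else "booked hotel 000"
new_app = lambda q: "cancelled" if "cancel" in q else "booked hotel " + q.split()[-1]

old_score = run_experiment(old_app, dataset)
new_score = run_experiment(new_app, dataset)
ship = new_score > old_score  # in practice, also check per-example regressions
```

Comparing per-example results, not just aggregate scores, is what catches a fix that repairs the target error mode while silently breaking something else.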
