Common Types of Datasets
Golden Datasets: Compare Against Ideal Outputs
A golden dataset provides a consistent, trusted “ground truth” for LLM outputs. By hand-labeling ideal responses, you create a stable benchmark for objectively measuring and comparing the performance of different models and prompt versions over time.
Regression Datasets: Focus on Areas of Improvement
A regression dataset captures examples where your application previously failed or performed poorly. These datasets are crucial for ensuring that fixes persist over time and that later changes don’t reintroduce old bugs. Examples are often pulled from user feedback or from logs that show problematic behavior.
Flexible Dataset Format
Arize supports flexible dataset formats so you can structure data in the way that best fits your LLM application:
1. Key-Value Pairs: Flexible for multi-input/multi-output tasks such as function calls, agents, or classification, ensuring complex workflows can be tested consistently.
| Input | Context | Output |
|---|---|---|
| "What is Paul Graham known for?" | "Paul Graham is an investor, entrepreneur, and computer scientist known for…" | "Paul Graham is known for co-founding Y Combinator…" |
| Input | Output |
|---|---|
| "do you have to have two license plates in ontario" | "True" |
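As a rough illustration of how key-value examples like the rows above can act as a golden dataset, the sketch below stores each example as a plain Python dict and scores a model by exact match against the hand-labeled output. It deliberately avoids any Arize API. The names `golden_examples`, `exact_match_rate`, and `always_true` are hypothetical, and exact match is only one possible comparison, assumed here for simplicity.

```python
from typing import Callable, Dict, List

# Each example is a dict of key-value pairs; the keys mirror the table columns.
# The single row is taken from the Input/Output table above.
golden_examples: List[Dict[str, str]] = [
    {
        "input": "do you have to have two license plates in ontario",
        "output": "True",
    },
]


def exact_match_rate(
    examples: List[Dict[str, str]],
    predict: Callable[[str], str],
) -> float:
    """Return the fraction of examples whose prediction equals the golden output.

    Comparison ignores case and surrounding whitespace. An empty dataset
    scores 0.0 rather than raising a division error.
    """
    if not examples:
        return 0.0
    hits = 0
    for example in examples:
        predicted = predict(example["input"])
        if predicted.strip().lower() == example["output"].strip().lower():
            hits += 1
    return hits / len(examples)


def always_true(question: str) -> str:
    """Hypothetical stand-in for a real model call; always answers "True"."""
    return "True"


score = exact_match_rate(golden_examples, always_true)
print(f"exact-match rate: {score:.2f}")
```

The same function could be run over a regression dataset: after a fix, rerunning it on the previously failing examples indicates whether the fix still holds. Free-form outputs such as the Paul Graham answer would likely need a softer comparison than exact match.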