
- Use Pydantic Evals to evaluate your LLM app for a simple question-answering task.
- Log your results to Arize to track your experiments and traces.
Install dependencies
Set up API keys and imports
Set up Arize
Add our auto-instrumentation for OpenAI using arize-otel.
Define the Evaluation Dataset
Create a dataset of test cases using Pydantic Evals for a question-answering task.
- Each Case represents a single test with an input (question) and an expected output (answer).
- The Dataset aggregates these cases for evaluation.
Set up the LLM task to evaluate
Run your experiment and evaluation
View results in Arize
