Advanced: Evaluator as a Class
Users can run an experiment by creating an evaluator that inherits from the Evaluator(ABC) base class in the Arize Python SDK. The evaluator takes a single dataset row as input and returns an EvaluationResult dataclass. This is an alternative you can use if you prefer object-oriented programming over functional programming.
Eval Class Inputs
The supported evaluator argument names are listed below:

| Parameter name | Description | Example |
|---|---|---|
| input | experiment run input | def eval(input): ... |
| output | experiment run output | def eval(output): ... |
| dataset_row | the entire row of the data, with every column as a dictionary key | def eval(dataset_row): ... |
| metadata | experiment metadata | def eval(metadata): ... |
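For illustration, the argument names in an evaluator's signature determine what gets passed in at run time. A minimal function-style sketch (the `expected` column name is a hypothetical example, not a required column):

```python
def exact_match(output, dataset_row):
    # Receives both the experiment run output and the full dataset row
    # as a dictionary; "expected" is a hypothetical column name here.
    return float(output == dataset_row.get("expected"))

# Returns 1.0 when the run output matches the row's expected value.
exact_match("Paris", {"input": "Capital of France?", "expected": "Paris"})
```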
EvaluationResult Outputs
The evaluator's return value can be a score, a label, a tuple of (score, label, explanation), or an EvaluationResult dataclass.

| Return Type | Description |
|---|---|
| EvaluationResult | Score, label, and explanation |
| float | Score output |
| string | Label string output |
Code Evaluator as Class
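A minimal sketch of a class-based code evaluator. The EvaluationResult dataclass below is a local stand-in so the example is self-contained; in practice you would import Evaluator and EvaluationResult from the Arize SDK and inherit from Evaluator(ABC). The `expected` column name is a hypothetical example.

```python
from dataclasses import dataclass


# Stand-in for the SDK's EvaluationResult dataclass (illustrative only;
# import the real one from the Arize SDK in your code).
@dataclass
class EvaluationResult:
    score: float
    label: str
    explanation: str = ""


class ExactMatchEvaluator:
    """Compares the experiment run output to the row's expected column."""

    def evaluate(self, *, output, dataset_row, **kwargs) -> EvaluationResult:
        expected = dataset_row.get("expected")
        matched = output == expected
        return EvaluationResult(
            score=float(matched),
            label="match" if matched else "mismatch",
            explanation=f"output={output!r}, expected={expected!r}",
        )


result = ExactMatchEvaluator().evaluate(
    output="Paris",
    dataset_row={"input": "Capital of France?", "expected": "Paris"},
)
# result.score == 1.0, result.label == "match"
```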
LLM Evaluator as Class Example
Here’s an example of an LLM evaluator that checks for hallucinations in the model output. The Phoenix Evals package is designed for running evaluations in code. The HallucinationEvaluator class evaluates whether the output of an experiment contains hallucinations by comparing it to the expected output using an LLM. The llm_classify function runs the eval, and the evaluator returns an EvaluationResult that includes a score, label, and explanation.
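A minimal sketch of the class structure such an evaluator could take. To keep the example self-contained and runnable, the `classify_hallucination` function below is a hypothetical stand-in for the LLM call; a real evaluator would instead call `llm_classify` from Phoenix Evals with a hallucination prompt template and an LLM model. The EvaluationResult dataclass and the `expected` column name are likewise illustrative stand-ins:

```python
from dataclasses import dataclass


# Stand-in for the SDK's EvaluationResult dataclass (illustrative only).
@dataclass
class EvaluationResult:
    score: float
    label: str
    explanation: str = ""


def classify_hallucination(output: str, expected: str) -> tuple[str, str]:
    # Hypothetical stand-in for an llm_classify call: a real implementation
    # would send the output and reference answer to an LLM and get back a
    # label plus an explanation.
    label = "factual" if expected.lower() in output.lower() else "hallucinated"
    return label, f"compared output against expected answer {expected!r}"


class HallucinationEvaluator:
    """Checks whether the experiment output contradicts the expected answer."""

    def evaluate(self, *, output, dataset_row, **kwargs) -> EvaluationResult:
        label, explanation = classify_hallucination(
            output, dataset_row["expected"]
        )
        return EvaluationResult(
            score=1.0 if label == "factual" else 0.0,
            label=label,
            explanation=explanation,
        )


result = HallucinationEvaluator().evaluate(
    output="The Eiffel Tower is in Paris.",
    dataset_row={"input": "Where is the Eiffel Tower?", "expected": "Paris"},
)
```

The design mirrors the table above: the evaluate method pulls the run output and the full dataset row, delegates the judgment to the (stand-in) classifier, and packages the result as a score, label, and explanation.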