You may have evaluators that run on large datasets or rely on additional external data sources. To help manage resources and control costs, Arize gives you the flexibility to decide when and how your evals are run and tracked. With these self-managed evals, you stay in control of execution, data, and evaluator configuration.
First, export your traces from Arize. Visit the LLM Tracing tab to view your traces and export them in code: click the export button and choose Export to Notebook to get boilerplate code you can copy and paste into your evaluator.
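The exported boilerplate uses the `ArizeExportClient` to pull your traces into a pandas DataFrame. A minimal sketch is below; the space ID, model ID, and time range are placeholders you would replace with the values from your own Export to Notebook snippet:

```python
import os
from datetime import datetime, timedelta

from arize.exporter import ArizeExportClient
from arize.utils.types import Environments

# Placeholder credentials and IDs -- replace these with the values
# from your own "Export to Notebook" snippet.
client = ArizeExportClient(api_key=os.environ["ARIZE_API_KEY"])

primary_df = client.export_model_to_df(
    space_id="YOUR_SPACE_ID",
    model_id="YOUR_MODEL_ID",
    environment=Environments.TRACING,
    start_time=datetime.now() - timedelta(days=7),
    end_time=datetime.now(),
)
```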
We will run through a sample LLM-as-a-judge eval. First, define an evaluation template:
```python
MY_SAMPLE_TEMPLATE = '''
You are evaluating the positivity or negativity of the responses to questions.

[BEGIN DATA]
************
[Question]: {input}
************
[Response]: {output}
[END DATA]

Please focus on the tone of the response.
Your answer must be a single word, either "positive" or "negative".
'''
```
Check which attributes are present in your traces DataFrame:
```python
primary_df.columns
```
If you’re using OpenAI traces, set the input/output variables like this:
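The exact column names depend on your instrumentation; for OpenInference-instrumented OpenAI traces, the input and output typically live in `attributes.input.value` and `attributes.output.value`. A sketch of the mapping, assuming those column names (verify them against the `primary_df.columns` output above):

```python
# Map the span attribute columns onto the {input} and {output}
# variables referenced in MY_SAMPLE_TEMPLATE. The column names here
# are assumptions -- confirm them against primary_df.columns.
primary_df["input"] = primary_df["attributes.input.value"]
primary_df["output"] = primary_df["attributes.output.value"]
```

With the variables in place, you can run the eval in your own environment. One option is the Phoenix evals library's `llm_classify`, which applies the template to each row of the DataFrame; this sketch assumes an OpenAI judge model, and the model choice and rails are illustrative rather than prescribed:

```python
from phoenix.evals import OpenAIModel, llm_classify

# Run the LLM-as-a-judge eval over the exported traces, restricting
# answers to the two labels the template allows.
eval_results = llm_classify(
    dataframe=primary_df,
    template=MY_SAMPLE_TEMPLATE,
    model=OpenAIModel(model="gpt-4o"),
    rails=["positive", "negative"],
    provide_explanation=True,
)
```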