Documentation Index
Fetch the complete documentation index at: https://arize-ax.mintlify.dev/docs/llms.txt
Use this file to discover all available pages before exploring further.
The arize-phoenix-evals library uses an LLM-as-judge to grade model output: hallucinations, factuality, helpfulness, toxicity, custom rubrics. Plug Vertex AI in as the judge by passing provider="vertex" to the LLM(...) wrapper, then build a create_classifier(...) evaluator and run it over a DataFrame with evaluate_dataframe(...).
Prerequisites
- Python 3.11+
- A Google Cloud project with the Vertex AI API enabled
- A service account or user with the roles/aiplatform.user IAM role
- Authenticated Application Default Credentials (gcloud auth application-default login) or a service account JSON file referenced by GOOGLE_APPLICATION_CREDENTIALS
Install
The vertex provider dispatches via the LiteLLM backend to the regional aiplatform.googleapis.com endpoint. google-auth is required so LiteLLM can resolve Application Default Credentials; without it, the first eval call exits with ModuleNotFoundError: No module named 'google'.
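A typical install line for the pieces named above (pandas is included because evaluate_dataframe operates on DataFrames; pin versions as your project requires):

```shell
# Evals library + LiteLLM backend + google-auth for ADC resolution.
# Omitting google-auth triggers the ModuleNotFoundError described above.
pip install arize-phoenix-evals litellm google-auth pandas
```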
Configure credentials
Vertex AI uses Google Cloud auth, not an API key. Authenticate locally and tell the SDK which project/region to target. VERTEXAI_PROJECT is mandatory; the SDK exits with Could not resolve project_id if it isn't set. VERTEXAI_LOCATION is optional and defaults to us-central1; set it explicitly when you need a different region (e.g. europe-west1 for EU residency, or to match the region where the target model is enabled).
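A minimal setup sketch (the project ID and region values are placeholders; substitute your own):

```shell
# Create Application Default Credentials on this machine (one-time, interactive).
gcloud auth application-default login

# Required: the SDK cannot resolve a project on its own.
export VERTEXAI_PROJECT="my-gcp-project"

# Optional: defaults to us-central1 when unset.
export VERTEXAI_LOCATION="europe-west1"
```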
Set up the eval LLM
gemini-2.5-flash is a strong default judge — fast and cheap relative to gemini-2.5-pro. The judge’s job is classification, not generation, so a smaller model is often sufficient.
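A minimal sketch of wiring up the judge. The import path phoenix.evals.llm.LLM is an assumption about the 2.x package layout; verify it against your installed version:

```python
import os

# Classification work, so a small, fast model is a reasonable default judge.
JUDGE_MODEL = "gemini-2.5-flash"

def make_judge():
    # Lazy import so this sketch loads even without the library installed;
    # the import path is an assumption -- check your installed version.
    from phoenix.evals.llm import LLM
    return LLM(provider="vertex", model=JUDGE_MODEL)

# Only construct the client when Vertex credentials are configured.
if os.environ.get("VERTEXAI_PROJECT"):
    judge = make_judge()
```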
Run an evaluation
This example builds a hallucination classifier and grades two sample question/answer pairs against a reference. The pattern generalizes: swap the prompt template, choices, and DataFrame columns for whatever metric you want to evaluate.
Expected output
The returned DataFrame gains a hallucination_execution_details column (status + exceptions + timing) and a hallucination_score column holding each evaluator result's full dict (name, score, label, explanation, metadata, kind, direction), useful for surfacing the LLM's reasoning, persisting eval rows back to Arize AX, or filtering retries.
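Putting the pieces together, a sketch of the run described above. The import paths, the create_classifier keyword names, and the sample rows are assumptions; adapt them to your installed arize-phoenix-evals version and your own data:

```python
import os

# Prompt template: placeholders must match the DataFrame column names.
TEMPLATE = """You will be given a query, a reference text, and an answer.
Decide whether the answer is faithful to the reference text.

Query: {input}
Reference: {reference}
Answer: {output}

Respond with exactly one word: "factual" or "hallucinated".
"""

# Two sample rows; in practice this comes from your own eval dataset.
ROWS = [
    {
        "input": "What is the capital of France?",
        "reference": "Paris is the capital and largest city of France.",
        "output": "The capital of France is Paris.",
    },
    {
        "input": "What is the capital of France?",
        "reference": "Paris is the capital and largest city of France.",
        "output": "The capital of France is Marseille.",
    },
]

def run_eval():
    # Imports are assumptions about the 2.x layout; check your version.
    import pandas as pd
    from phoenix.evals.llm import LLM
    from phoenix.evals import create_classifier, evaluate_dataframe

    judge = LLM(provider="vertex", model="gemini-2.5-flash")
    hallucination = create_classifier(
        name="hallucination",
        prompt_template=TEMPLATE,
        llm=judge,
        choices={"factual": 1.0, "hallucinated": 0.0},
    )
    df = pd.DataFrame(ROWS)
    # Adds hallucination_score and hallucination_execution_details columns.
    return evaluate_dataframe(dataframe=df, evaluators=[hallucination])

# Only hit the Vertex endpoint when credentials are configured.
if os.environ.get("VERTEXAI_PROJECT"):
    result = run_eval()
    print(result.columns.tolist())
```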
Troubleshooting
- ModuleNotFoundError: No module named 'google'. The google-auth package isn't installed. Add it to your install line (pip install ... google-auth ...) or, equivalently, install litellm[google], which pulls in the full google-cloud-aiplatform SDK plus its auth deps.
- Permission denied on resource project .../PERMISSION_DENIED. The principal in your ADC doesn't have roles/aiplatform.user (or finer-grained Vertex permissions) on the project, or you authenticated with end-user credentials that have no quota project. Grant the role in the IAM console, then run gcloud auth application-default set-quota-project <PROJECT_ID>.
- Reauthentication needed / expired credentials. Run gcloud auth application-default login again, or rotate the service account key referenced by GOOGLE_APPLICATION_CREDENTIALS.
- Could not resolve project_id. VERTEXAI_PROJECT isn't set and ADC didn't surface a default project. Either export VERTEXAI_PROJECT explicitly or run gcloud config set project <PROJECT_ID> before gcloud auth application-default login.
- 404 NOT_FOUND for the model. The model isn't available in the region you set for VERTEXAI_LOCATION (or in the default us-central1 if you didn't set one). Check the Vertex AI generative model availability matrix and swap regions accordingly.
- All rows return the same label. Your prompt template isn't differentiating cases. Make sure each row's {input}/{output}/{reference} columns expose enough context for the judge to discriminate, and that choices lists every label your prompt asks the LLM to emit.
- Some rows fail with timeout / rate-limit. Pass max_retries= to evaluate_dataframe(...) (defaults to 3). For large batches, also pass initial_per_second_request_rate=... to LLM(...) to throttle.
- Logging results back to Arize AX. This guide stops at producing the eval DataFrame. To attach those evals to existing spans in an Arize AX project, use log_evaluations_sync on arize.Client.
- Using the Gemini API instead of Vertex. Set GEMINI_API_KEY and switch to provider="google"; see the Gemini evals doc for the full pattern.