The arize-phoenix-evals library uses an LLM-as-judge to grade model output — hallucinations, factuality, helpfulness, toxicity, custom rubrics. Plug Bedrock-hosted models in as the judge by passing provider="bedrock" to the LLM(...) wrapper, then build a create_classifier(...) evaluator and run it over a DataFrame with evaluate_dataframe(...).
Prerequisites
- Python 3.11+
- AWS credentials with `bedrock:InvokeModel` permission on the model you want to judge with
- The target foundation model enabled in your AWS region's Bedrock model access page
Install
The bedrock provider uses the LiteLLM backend under the hood; boto3 provides the AWS SDK and SigV4 signing.
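A typical install looks like the following. The package and dependency names are assumed from the description above; check your version's docs for the exact extras.

```shell
# phoenix-evals judge library, the LiteLLM backend it routes Bedrock calls
# through, and boto3 for AWS auth (names assumed; verify against your version).
pip install arize-phoenix-evals litellm boto3
```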
Configure credentials
The bedrock provider picks up the standard AWS credential chain — env vars, shared credentials file (~/.aws/credentials), or an attached IAM role. Set the env vars directly if you don't already have AWS credentials configured:
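For example, the standard variables can be set from Python before constructing the judge (the values below are placeholders, not real credentials):

```python
import os

# Placeholder credentials: substitute your own, or skip this entirely and rely
# on the shared credentials file or an attached IAM role instead.
os.environ.setdefault("AWS_ACCESS_KEY_ID", "AKIA...")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "...")
os.environ.setdefault("AWS_DEFAULT_REGION", "us-east-1")
```

Because setdefault is used, any credentials already present in the environment win over the placeholders.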
Set up the eval LLM
Anthropic models on Bedrock are invoked through a cross-region inference profile, so prepend the us. / eu. prefix to the model id rather than using the base id. See the Bedrock model catalog for the id to use in your region.
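A minimal construction sketch, assuming the LLM wrapper import path from phoenix-evals 2.x (the path and model id may differ in your version and region):

```python
from phoenix.evals.llm import LLM  # import path assumed; may vary by version

# provider="bedrock" routes calls through LiteLLM + boto3; the model id uses
# the us. cross-region inference profile prefix described above.
judge = LLM(
    provider="bedrock",
    model="us.anthropic.claude-sonnet-4-6",
)
```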
Run an evaluation
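A sketch of the full flow (API names as used in this guide; the model id, prompt wording, and sample rows are illustrative, and a live Bedrock call requires valid AWS credentials):

```python
import pandas as pd

# Import paths assumed from phoenix-evals 2.x; adjust to your installed version.
from phoenix.evals import create_classifier, evaluate_dataframe
from phoenix.evals.llm import LLM

# Bedrock-hosted judge via the cross-region inference profile id.
judge = LLM(provider="bedrock", model="us.anthropic.claude-sonnet-4-6")

# Binary hallucination classifier: choices map labels to numeric scores.
hallucination = create_classifier(
    name="hallucination",
    llm=judge,
    prompt_template=(
        "Decide whether the answer is supported by the reference text.\n\n"
        "Reference: {reference}\nQuestion: {input}\nAnswer: {output}\n\n"
        "Respond with 'factual' or 'hallucinated'."
    ),
    choices={"factual": 1.0, "hallucinated": 0.0},
)

# Two sample question/answer pairs; column names must match the template slots.
df = pd.DataFrame(
    {
        "input": ["Who wrote Hamlet?", "When did Apollo 11 land?"],
        "output": ["Hamlet was written by Christopher Marlowe.", "July 20, 1969."],
        "reference": [
            "Hamlet is a tragedy written by William Shakespeare.",
            "Apollo 11 landed on the Moon on July 20, 1969.",
        ],
    }
)

results = evaluate_dataframe(dataframe=df, evaluators=[hallucination])
print(results[["hallucination_score", "hallucination_execution_details"]])
```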
This example builds a hallucination classifier and grades two sample question/answer pairs against a reference. The pattern generalizes: replace the prompt template, choices, and DataFrame columns with whatever metric you want to evaluate.

Expected output

The returned DataFrame adds a hallucination_execution_details column (status + exceptions + timing) alongside the hallucination_score column, which holds each evaluator result's full dict (name, score, label, explanation, metadata, kind, direction). This is useful for surfacing the LLM's reasoning, persisting eval rows back to Arize AX, or filtering retries.
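Since each hallucination_score cell is a dict, the label and explanation can be flattened into plain columns for reporting. A sketch with illustrative values shaped like the fields listed above (the rows are made up, not real evaluator output):

```python
import pandas as pd

# Rows shaped like the result dicts described above (values are fabricated
# for illustration only).
results = pd.DataFrame(
    {
        "hallucination_score": [
            {"name": "hallucination", "score": 1.0, "label": "factual",
             "explanation": "The answer matches the reference.",
             "metadata": {}, "kind": "classifier", "direction": "maximize"},
            {"name": "hallucination", "score": 0.0, "label": "hallucinated",
             "explanation": "The answer contradicts the reference.",
             "metadata": {}, "kind": "classifier", "direction": "maximize"},
        ]
    }
)

# Flatten the dicts into plain columns for filtering and display.
results["label"] = results["hallucination_score"].map(lambda d: d["label"])
results["explanation"] = results["hallucination_score"].map(lambda d: d["explanation"])
print(results["label"].tolist())  # ['factual', 'hallucinated']
```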
Troubleshooting
- AccessDeniedException / UnrecognizedClientException. Your AWS credentials don't have `bedrock:InvokeModel` on the target model, or the credentials aren't being picked up. Verify with `aws sts get-caller-identity` and confirm the role has Bedrock permissions.
- ValidationException: ... on-demand throughput isn't supported. The base Anthropic model id (e.g. `anthropic.claude-sonnet-4-6`) requires a cross-region inference profile. Switch to the regional prefix (`us.anthropic.claude-sonnet-4-6` for US, `eu....` for EU).
- AccessDeniedException: You don't have access to the model. The model isn't enabled in your region. Enable it on the Bedrock model access page.
- All rows return the same label. Your prompt template isn't differentiating cases. Make sure each row's `{input}`/`{output}`/`{reference}` columns expose enough context for the judge to discriminate, and that `choices` lists every label your prompt asks the LLM to emit.
- Some rows fail with timeout / rate-limit. Pass `max_retries=` to `evaluate_dataframe(...)` (defaults to 3). For large batches, also pass `initial_per_second_request_rate=...` to `LLM(...)` to throttle.
- Logging results back to Arize AX. This guide stops at producing the eval DataFrame. To attach those evals to existing spans in an Arize AX project, use `log_evaluations_sync` on `arize.Client`.
- Assuming a role from a different account. Use `boto3.client("sts").assume_role(...)`, export the temporary credentials as env vars, then call `LLM(...)`; the provider will pick them up on the next request.
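The cross-account flow in the last bullet can be sketched as follows. The role ARN and session name are placeholders, and running this requires live AWS credentials permitted to assume the role:

```python
import os
import boto3

# Assume the cross-account role; the ARN below is a placeholder.
sts = boto3.client("sts")
resp = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/bedrock-judge",
    RoleSessionName="phoenix-evals-judge",
)
creds = resp["Credentials"]

# Export the temporary credentials; the bedrock provider reads the standard
# env vars on its next request, so no LLM(...) reconstruction is needed.
os.environ["AWS_ACCESS_KEY_ID"] = creds["AccessKeyId"]
os.environ["AWS_SECRET_ACCESS_KEY"] = creds["SecretAccessKey"]
os.environ["AWS_SESSION_TOKEN"] = creds["SessionToken"]
```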