Best Practices
The Prompt Playground is a powerful tool for testing LLM Evaluators before deploying them as an online task. Within this environment, users can easily iterate on and improve their evaluator configurations:

- The Prompt Template: Experiment with different prompt structures to see which works best. For example, if you're iterating on a hallucination evaluator template, you might experiment with adding few-shot examples (see the first sketch after this list).
- The LLM Model: Compare how different LLM models, such as GPT-4o-mini or o1-mini, affect evaluation results. You can also explore performance across various providers (e.g., Anthropic models via Bedrock or Gemini via Vertex AI) and adjust LLM parameters as needed (see the second sketch below).
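As a rough illustration of the first point, the sketch below shows one way to iterate on a hallucination evaluator template by adding few-shot examples. The template text, variable names, and labels are illustrative assumptions, not a built-in template.

```python
# Baseline hallucination evaluator template (illustrative, not the product's
# built-in template). The {context} and {answer} variables are placeholders.
HALLUCINATION_TEMPLATE = """You are checking whether an answer is supported by the provided context.

Context: {context}
Answer: {answer}

Respond with exactly one label: "factual" or "hallucinated".
"""

# Few-shot variant: prepend worked examples so the judge model sees the
# expected label format before evaluating the real input.
FEW_SHOT_EXAMPLES = """Example 1
Context: The Eiffel Tower is in Paris.
Answer: The Eiffel Tower is located in Paris, France.
Label: factual

Example 2
Context: The Eiffel Tower is in Paris.
Answer: The Eiffel Tower was built in 1920 in Lyon.
Label: hallucinated

"""

FEW_SHOT_TEMPLATE = FEW_SHOT_EXAMPLES + HALLUCINATION_TEMPLATE

# Fill the template with a sample record to preview the final prompt.
prompt = FEW_SHOT_TEMPLATE.format(
    context="The report covers Q3 revenue of $2.1M.",
    answer="Q3 revenue was $5M.",
)
print(prompt)
```

Comparing the baseline and few-shot variants side by side on the same examples makes it easy to see whether the added examples actually change the evaluator's labels.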
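For the second point, here is a minimal sketch of running the same evaluator prompt against different judge models, using the OpenAI Python client as an example provider. The model names, the sample prompt, and the `run_evaluator` helper are assumptions for illustration; an API key is expected in the `OPENAI_API_KEY` environment variable.

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# Illustrative evaluator prompt; in practice this would come from your template.
prompt = (
    "Context: The report covers Q3 revenue of $2.1M.\n"
    "Answer: Q3 revenue was $5M.\n"
    'Respond with exactly one label: "factual" or "hallucinated".'
)

def run_evaluator(model: str, prompt: str, **params) -> str:
    """Run the evaluator prompt against one judge model and return its label."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        **params,  # optional LLM parameters, e.g. temperature
    )
    return response.choices[0].message.content.strip()

# Compare how candidate judge models label the same input.
for model in ["gpt-4o-mini", "o1-mini"]:
    print(f"{model}: {run_evaluator(model, prompt)}")

# Adjusting LLM parameters for a model that supports them:
print(run_evaluator("gpt-4o-mini", prompt, temperature=0.0))
```

The same pattern applies to other providers (Bedrock, Vertex AI, etc.): keep the prompt fixed and vary only the model and its parameters so differences in evaluation results can be attributed to the judge model itself.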