Trace LangChain Agent & Microsoft Risk+Safety Evaluators (Microsoft Foundry)

Deploy and orchestrate AI agents at scale: governed, observable, and integrated for enterprise transformation. Microsoft Foundry offers a rich library of enterprise-grade evaluation capabilities, such as the Risk and Safety evaluators, while Arize AX delivers observability, evaluation, and experimentation workflows for continuous improvement. Combined, they let organizations close the loop between insight and action, turning Responsible AI from policy into practice. The result is a continuous feedback system in which the same evaluators that power offline testing also monitor live production traffic, and data moves from trace logs to evaluation results to experiment dashboards.

This tutorial follows the examples illustrated in this blog post:

Blog: Evaluating and Improving AI Agents at Scale with Microsoft Foundry

This tutorial covers two sections:

1. Azure AI Foundry and Arize for Agent Observability and Evaluation

This notebook demonstrates how to:

Build a LangChain multi-chain agent on Azure AI Foundry while tracing all operations to Arize AX for observability
Leverage Microsoft Risk and Safety Evaluators to evaluate LLM behavior
Log evaluation results to Arize AX for visibility
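The last two steps can be pictured with a minimal sketch: run an evaluator over each traced span and flatten its result into one span-keyed row for logging. This is an illustration under stated assumptions, not the notebook's code. The evaluator is a hypothetical local stub rather than a call to Azure, and both the result keys (`hate_unfairness`, `hate_unfairness_score`, `hate_unfairness_reason`) and the `eval.<name>.*` column names are assumed shapes that should be checked against the notebook and the SDK versions you use.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a Risk and Safety evaluator. The result keys are an
# assumed shape; a real evaluator would call the Azure service instead.
def stub_hate_unfairness_evaluator(query: str, response: str) -> Dict[str, object]:
    flagged = "stereotype" in response.lower()
    return {
        "hate_unfairness": "High" if flagged else "Very low",
        "hate_unfairness_score": 6 if flagged else 0,
        "hate_unfairness_reason": "Toy keyword rule, for illustration only.",
    }

def to_eval_row(span_id: str, result: Dict[str, object],
                metric: str = "hate_unfairness") -> Dict[str, object]:
    """Flatten one evaluator result into a row keyed by span ID."""
    # Column names below are an assumed convention for span-level evaluations.
    return {
        "context.span_id": span_id,
        f"eval.{metric}.label": result[metric],
        f"eval.{metric}.score": result[f"{metric}_score"],
        f"eval.{metric}.explanation": result[f"{metric}_reason"],
    }

def evaluate_spans(spans: List[Dict[str, str]],
                   evaluator: Callable[..., Dict[str, object]]) -> List[Dict[str, object]]:
    """Run the evaluator on each span's input/output pair and collect rows."""
    return [
        to_eval_row(s["span_id"], evaluator(query=s["input"], response=s["output"]))
        for s in spans
    ]

# Made-up spans standing in for exported trace data.
spans = [
    {"span_id": "a1", "input": "Describe our hiring policy.",
     "output": "We hire on skills and experience."},
    {"span_id": "b2", "input": "Describe the team.",
     "output": "That relies on a stereotype about engineers."},
]

rows = evaluate_spans(spans, stub_hate_unfairness_evaluator)
for row in rows:
    print(row["context.span_id"],
          row["eval.hate_unfairness.label"],
          row["eval.hate_unfairness.score"])
```

Keeping the flattening step separate from the evaluator call means the stub can be swapped for a real evaluator without touching the logging shape.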

Notebook Tutorial - Foundry Agent Observability and Evaluation

Screenshot showing the Arize AX agent graph view with aggregate span-level evaluation performance.

Screenshot showing the Microsoft hate and unfairness evaluation metric attached to a span.

Screenshot showing a summarized dashboard with key observability metrics and evaluation KPI metrics.

2. Azure Risk and Safety Evaluators on Arize Datasets+Experiments

This notebook demonstrates how to use Azure Risk and Safety Evaluators with Arize Datasets+Experiments to track and visualize experiments and evaluations in Arize. We will use the Hate Unfairness Evaluator to evaluate the output of an Azure AI Foundry agent.
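As a rough picture of that workflow, the sketch below runs two agent variants over the same small dataset, scores every output with an evaluator, and compares the mean score per run. Everything in it is a hypothetical stand-in: `EvalResult`, `run_experiment`, the two toy agents, and the keyword-based evaluator are illustrative names, not the Arize or Azure APIs. The pass/fail cutoff of 3 on a severity score where higher means worse is also an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

@dataclass
class EvalResult:
    """Local stand-in for a per-row evaluation result."""
    score: float
    label: str
    explanation: str

# Hypothetical evaluator: higher score means more severe (assumed scale).
def stub_evaluator(query: str, response: str) -> EvalResult:
    score = 5.0 if "those people" in response.lower() else 0.0
    label = "fail" if score >= 3 else "pass"  # assumed cutoff
    return EvalResult(score, label, "Toy keyword rule, for illustration only.")

# Two toy agent variants standing in for different prompts or models.
def baseline_agent(question: str) -> str:
    return f"Those people always ask about {question}."

def revised_agent(question: str) -> str:
    return f"Here is a neutral answer about {question}."

def run_experiment(name: str, dataset: List[Dict[str, object]],
                   task: Callable[[str], str],
                   evaluator: Callable[[str, str], EvalResult]) -> Dict[str, object]:
    """Run the task on every dataset row, evaluate it, and summarize."""
    rows = []
    for example in dataset:
        output = task(example["question"])
        result = evaluator(example["question"], output)
        rows.append({
            "id": example["id"],
            "output": output,
            "score": result.score,
            "label": result.label,
            "explanation": result.explanation,
        })
    return {"name": name, "rows": rows,
            "mean_score": mean([r["score"] for r in rows])}

# Made-up dataset rows.
dataset = [
    {"id": 1, "question": "refund policy"},
    {"id": 2, "question": "shipping times"},
]

baseline = run_experiment("baseline", dataset, baseline_agent, stub_evaluator)
revised = run_experiment("revised", dataset, revised_agent, stub_evaluator)
for run in (baseline, revised):
    print(run["name"], run["mean_score"])
```

Running both variants against the same rows is what makes the per-row and aggregate comparisons meaningful: any difference in score comes from the agent, not the data.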

Notebook Tutorial - Using Foundry Evaluators on Arize Datasets + Experiments

Screenshot showing experiment runs on the dataset and a comparison of the hate and unfairness evaluation metric in Arize AX.

Screenshot showing a row-level comparison of experiment runs in Arize AX with hate and unfairness scores, labels, and explanations.
