

LlamaIndex Workflows are a building block for complex, event-driven LLM applications — each @step is a typed handler that emits the next event. Arize AX captures every workflow run — each step invocation, the events flowing between them, and the LLM calls made inside steps — via the openinference-instrumentation-llama-index package, the same instrumentor that covers core LlamaIndex.
If you’ve already followed the LlamaIndex tracing guide, workflows are already traced — there is one instrumentor for both. This page is a workflow-focused setup guide that you can follow standalone.
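The shape of a workflow — typed events passed between handlers, each handler returning the next event — can be pictured with plain dataclasses before touching LlamaIndex at all. This is a toy analogy using only the standard library; the class and field names are invented for illustration and are not part of LlamaIndex's API:

```python
from dataclasses import dataclass


# Toy event types, standing in for LlamaIndex's StartEvent / StopEvent.
@dataclass
class Question:
    text: str


@dataclass
class Answer:
    text: str


# Each "step" is a typed handler: it accepts one event type
# and returns the next event in the chain.
def answer_step(ev: Question) -> Answer:
    return Answer(text=f"You asked: {ev.text}")


def run(start: Question) -> Answer:
    # A real workflow engine routes events to steps by type;
    # this toy version has exactly one step.
    return answer_step(start)


result = run(Question(text="Why is the ocean salty?"))
print(result.text)  # → You asked: Why is the ocean salty?
```

An instrumentor can observe exactly this structure: each handler invocation becomes a span, and the events are the data flowing between spans.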

Prerequisites

Launch Arize AX

  1. Sign in to your Arize AX account.
  2. From Space Settings, copy your Space ID and API Key. You will set them as ARIZE_SPACE_ID and ARIZE_API_KEY below.

Install

pip install arize-otel \
  openinference-instrumentation-llama-index \
  llama-index llama-index-llms-openai

Configure credentials

export ARIZE_SPACE_ID="<your-space-id>"
export ARIZE_API_KEY="<your-api-key>"
export ARIZE_PROJECT_NAME="llamaindex-workflows-tracing-example"
export OPENAI_API_KEY="<your-openai-api-key>"
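Before running anything, it can help to confirm that all four variables are actually visible to the Python process. A minimal sketch — the helper name below is ours, not part of any SDK:

```python
import os

REQUIRED_VARS = (
    "ARIZE_SPACE_ID",
    "ARIZE_API_KEY",
    "ARIZE_PROJECT_NAME",
    "OPENAI_API_KEY",
)


def missing_vars(env=os.environ):
    # Return the names of any required variables that are unset or empty.
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("All credentials are set.")
```

A common failure mode is exporting the variables in one shell and running the example in another; this check catches that before any network call is made.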

Setup tracing

# instrumentation.py
import os

from arize.otel import register
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

tracer_provider = register(
    space_id=os.environ["ARIZE_SPACE_ID"],
    api_key=os.environ["ARIZE_API_KEY"],
    project_name=os.environ["ARIZE_PROJECT_NAME"],
)

LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
print("Arize AX tracing initialized for LlamaIndex Workflows.")

Run LlamaIndex Workflows

# example.py

# Importing instrumentation first ensures tracing is set up
# before `llama_index` is imported.
from instrumentation import tracer_provider

import asyncio

from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
)
from llama_index.llms.openai import OpenAI


class OceanFactWorkflow(Workflow):
    @step
    async def answer(self, ev: StartEvent) -> StopEvent:
        # OpenAI reads OPENAI_API_KEY from the environment.
        llm = OpenAI(model="gpt-5")
        response = await llm.acomplete(ev.question)
        return StopEvent(result=str(response))


async def main() -> None:
    workflow = OceanFactWorkflow(timeout=180)
    result = await workflow.run(
        question="Why is the ocean salty? Answer in two sentences.",
    )
    print(result)


asyncio.run(main())

Expected output

Arize AX tracing initialized for LlamaIndex Workflows.
The ocean is salty because rivers continuously dissolve mineral salts from rocks and soil and carry them to the sea, where they accumulate over millions of years. Water leaves the ocean through evaporation but the salts remain, steadily concentrating until reaching today's roughly 3.5% salinity.

Verify in Arize AX

  1. Open your Arize AX space and select project llamaindex-workflows-tracing-example.
  2. You should see a new trace within ~30 seconds containing an OceanFactWorkflow.run parent span wrapping a step span (OceanFactWorkflow.answer) and a nested OpenAI.acomplete LLM child span with the prompt, response, and token usage attached.
  3. If no traces appear, see Troubleshooting.

Troubleshooting

  • No traces in Arize AX. Confirm ARIZE_SPACE_ID and ARIZE_API_KEY are set in the same shell that runs example.py. Enable OpenTelemetry debug logs with export OTEL_LOG_LEVEL=debug and re-run.
  • Workflow ran but no spans appear. LlamaIndexInstrumentor().instrument(...) must run before any llama_index import. Make sure instrumentation.py is the first import in your entry point.
  • 401 from OpenAI. Verify OPENAI_API_KEY is set and has access to gpt-5. Swap for a model your key can call.
  • Step did not return a StopEvent. Workflows finish only when a step returns StopEvent (or StartEvent rolls into a chain that eventually does). Check each @step’s return type.
  • WorkflowTimeoutError: Operation timed out after N.0 seconds. LlamaIndex Workflow has its own timeout — 45 s by default — separate from any HTTP-client timeout your LLM library uses. Reasoning-heavy models (gpt-5, o3, etc.) can blow past that on the first call. Pass timeout=180 (or similar) to the workflow constructor as shown in the Run section.
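The workflow-level timeout behaves much like wrapping the run in asyncio.wait_for: when the deadline passes, the awaited work is cancelled regardless of any HTTP-client timeout underneath. A standard-library illustration of the concept (not LlamaIndex internals):

```python
import asyncio


async def slow_llm_call() -> str:
    # Stand-in for a long-running model call (e.g. a reasoning-heavy model).
    await asyncio.sleep(10)
    return "done"


async def main() -> None:
    try:
        # A 0.1 s deadline, analogous to Workflow(timeout=...).
        await asyncio.wait_for(slow_llm_call(), timeout=0.1)
    except asyncio.TimeoutError:
        print("timed out")


asyncio.run(main())  # prints "timed out"
```

Raising the workflow's timeout (as the example does with timeout=180) simply moves this deadline; it does not change how long the underlying HTTP client will wait.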

Resources

LlamaIndex Workflows Documentation

OpenInference LlamaIndex Instrumentor

LlamaIndex Tracing Guide