Arize AX supports adding spans from your projects to datasets. The trace data from an application with errors or faulty evals can become fuel for ongoing development. You can use our tracing filters or ✨AI Search to curate your dataset.
If you’d like to create your datasets programmatically, you can use our clients to create, update, and delete datasets.
To start, let’s install the packages we need:
pip install --pre arize pandas
You can get your API key by navigating to the “Settings” page.
Let’s set up the Arize Dataset Client to create or update a dataset. See here for API reference.
from arize import ArizeClient

client = ArizeClient(api_key="your-arize-api-key")
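Hard-coding the key is fine for a quick test, but reading it from the environment keeps it out of source control. The following is a minimal sketch, assuming the environment variable names ARIZE_API_KEY and ARIZE_SPACE_ID. Both names and the load_arize_settings helper are our own choices for illustration and are not something the client is documented here to read automatically.

```python
import os


def load_arize_settings(env=None):
    """Return the API key and space ID from environment variables.

    ARIZE_API_KEY and ARIZE_SPACE_ID are hypothetical names chosen for this
    sketch; use whatever names fit your deployment.
    """
    env = os.environ if env is None else env
    required = ("ARIZE_API_KEY", "ARIZE_SPACE_ID")
    # Collect every missing name so the error reports them all at once.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"api_key": env["ARIZE_API_KEY"], "space_id": env["ARIZE_SPACE_ID"]}


# Usage (requires the arize package and real credentials):
# from arize import ArizeClient
# settings = load_arize_settings()
# client = ArizeClient(api_key=settings["api_key"])
```

The helper accepts an optional mapping so it can be exercised without touching the real environment.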
You can create many different kinds of datasets. The examples below are sorted by complexity.
Simple dataset
Dataset with prompt template & variables
This is a simple dataset with just string values for the columns.
import pandas as pd

# Example dataset
inventions_dataset = pd.DataFrame({
    "attributes.input.value": ["Telephone", "Light Bulb"],
    "attributes.output.value": ["Alexander Graham Bell", "Thomas Edison"],
})

dataset = client.datasets.create(
    space_id="your-arize-space-id",
    name="test_invention_dataset",
    examples=inventions_dataset,
)
dataset_id = dataset.id
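If your examples start out as a list of input/output pairs rather than as columns, a small helper can reshape them into the column layout used above. This is only a sketch: records_to_columns is a hypothetical helper, not part of the Arize client, and it relies on nothing beyond the two column names shown in the example.

```python
INPUT_COLUMN = "attributes.input.value"
OUTPUT_COLUMN = "attributes.output.value"


def records_to_columns(records):
    """Convert (input, output) string pairs into a dict of equal-length columns."""
    columns = {INPUT_COLUMN: [], OUTPUT_COLUMN: []}
    for index, (input_value, output_value) in enumerate(records):
        # The simple dataset uses plain strings, so reject anything else early.
        if not isinstance(input_value, str) or not isinstance(output_value, str):
            raise TypeError(f"Record {index} must contain two strings")
        columns[INPUT_COLUMN].append(input_value)
        columns[OUTPUT_COLUMN].append(output_value)
    return columns


columns = records_to_columns([
    ("Telephone", "Alexander Graham Bell"),
    ("Light Bulb", "Thomas Edison"),
])
# columns can be wrapped with pd.DataFrame(columns) and passed as `examples`
# to client.datasets.create(...) in the same way as the DataFrame above.
```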
Datasets in Arize AX support flexible columns, so you can also add the prompt template and variables to each row.
In this example, we are setting attributes.llm.prompt_template.variables. We are using the OpenInference semantic conventions, and Arize AX will automatically import these as input variables.
import pandas as pd
import json

PROMPT_TEMPLATE = """You are an expert in the history of technological inventions.
Identify the individual or organization that created the following invention.
Invention: {invention}"""

data = [
    {
        "attributes.llm.prompt_template.template": PROMPT_TEMPLATE,
        "attributes.llm.prompt_template.variables": json.dumps({
            "invention": "Telephone",
        }),
        "attributes.output.value": "Alexander Graham Bell"
    }
]
df = pd.DataFrame(data)

dataset = client.datasets.create(
    space_id="your-arize-space-id",
    name="prompt_invention_dataset",
    examples=df,
)
dataset_id = dataset.id
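To see how the template and the JSON-encoded variables relate, it can help to build a few rows and substitute the variables locally. The sketch below uses a shortened template and Python's str.format as a stand-in for substitution; how Arize AX itself fills in the variables may differ. make_row and render_prompt are hypothetical helpers written for this illustration.

```python
import json

# Shortened template for the sketch; {invention} is the only variable.
PROMPT_TEMPLATE = "Invention: {invention}"


def make_row(template, variables, expected_output):
    """Build one dataset row; the variables are stored as a JSON string."""
    return {
        "attributes.llm.prompt_template.template": template,
        "attributes.llm.prompt_template.variables": json.dumps(variables),
        "attributes.output.value": expected_output,
    }


def render_prompt(row):
    """Illustrative only: decode the variables and substitute them locally."""
    variables = json.loads(row["attributes.llm.prompt_template.variables"])
    return row["attributes.llm.prompt_template.template"].format(**variables)


rows = [
    make_row(PROMPT_TEMPLATE, {"invention": "Telephone"}, "Alexander Graham Bell"),
    make_row(PROMPT_TEMPLATE, {"invention": "Light Bulb"}, "Thomas Edison"),
]
rendered = [render_prompt(row) for row in rows]
```

The rows list has the same shape as data in the example above, so it can be turned into a DataFrame and uploaded the same way.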
In some cases, the data you have might not be enough to cover all the scenarios you want to test. This is where you can use Alyx for Synthetic Dataset Generation:
Suggested Prompt: “Generate a synthetic dataset of 20 examples that cover…”
Use When: You need labeled examples to test, fine-tune, or evaluate prompts without relying on real user data.
Description: Creates artificial examples that mimic real-world scenarios, enabling faster experimentation.
You can save your generated examples as a dataset and test them directly in the playground.