Create versioned datasets for experimentation, evaluation, and fine-tuning. Datasets are immutable and version-controlled automatically.

Key Capabilities

  • Create datasets from Python dicts or pandas DataFrames
  • Automatic versioning on updates
  • Efficient bulk operations via Arrow Flight for large datasets
  • Cache datasets locally for faster experiment iteration
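To make the versioning model concrete, here is a toy in-memory sketch. It assumes, as the capabilities above suggest, that every update produces a new version derived from the latest one while earlier versions stay untouched. `DatasetVersion` and `VersionedDataset` are hypothetical names for illustration only; real versions live on the server and are not managed through these classes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable snapshot of a dataset (conceptual model only)."""
    number: int
    examples: Tuple[Dict[str, Any], ...]


class VersionedDataset:
    """Toy model: every update appends a new version; old versions never change."""

    def __init__(self, examples: List[Dict[str, Any]]) -> None:
        # Version 1 holds copies of the initial examples.
        self.versions: List[DatasetVersion] = [
            DatasetVersion(1, tuple(dict(e) for e in examples))
        ]

    @property
    def latest(self) -> DatasetVersion:
        return self.versions[-1]

    def append(self, new_examples: List[Dict[str, Any]]) -> DatasetVersion:
        # Build a new snapshot on top of the latest one instead of mutating it.
        version = DatasetVersion(
            self.latest.number + 1,
            self.latest.examples + tuple(dict(e) for e in new_examples),
        )
        self.versions.append(version)
        return version


ds = VersionedDataset([{"query": "What is the capital of France?"}])
v2 = ds.append([{"query": "Who wrote Romeo and Juliet?"}])
print(v2.number, len(ds.versions[0].examples), len(v2.examples))  # 2 1 2
```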

List Datasets

List all datasets with optional filtering by space or name.
```python
# `client` is an already-initialized SDK client (setup not shown on this page)
resp = client.datasets.list(
    space="your-space-name-or-id",  # optional
    name="my-dataset",              # optional substring filter
    limit=50,
)

for dataset in resp.datasets:
    print(dataset.id, dataset.name)
```
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.
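Because `name` is a substring filter, a search for "my-dataset" can also return "my-dataset-v2". A minimal sketch for narrowing a listing to an exact name on the client side; `find_dataset_id` is a hypothetical helper, and it assumes only that each entry in `resp.datasets` has the `id` and `name` attributes used above. It sees just the page that was returned, so raise `limit` or paginate if the dataset could be further down.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def find_dataset_id(datasets: Iterable[Any], exact_name: str) -> Optional[str]:
    """Return the id of the dataset whose name equals exact_name, or None."""
    matches = [d.id for d in datasets if d.name == exact_name]
    if len(matches) > 1:
        # Same name in several spaces: the caller must pick a space explicitly.
        raise ValueError(f"{len(matches)} datasets are named {exact_name!r}")
    return matches[0] if matches else None


# Stand-ins for resp.datasets (only the id and name attributes are assumed).
listed = [
    SimpleNamespace(id="ds_1", name="my-dataset"),
    SimpleNamespace(id="ds_2", name="my-dataset-v2"),
]
print(find_dataset_id(listed, "my-dataset"))  # ds_1
```

With a real listing, call `find_dataset_id(resp.datasets, "my-dataset")`.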

Create a Dataset

Create a new dataset with examples for evaluation or experimentation.
```python
examples = [
    {
        "query": "What is the capital of France?",
        "expected_output": "Paris",
        "eval.Correctness.label": "correct",
    },
    {
        "query": "Who wrote Romeo and Juliet?",
        "expected_output": "William Shakespeare",
        "eval.Correctness.label": "correct",
    },
]

dataset = client.datasets.create(
    space="your-space-name-or-id",
    name="my-test-dataset",
    examples=examples,
)
```
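Examples are plain dicts, so they can come from any source. A minimal sketch that loads them from a CSV file with the standard library, assuming one example per row and the CSV header as the field names; `load_examples_from_csv` and its `required` column check are hypothetical, not part of the SDK.

```python
import csv
from typing import Dict, Iterable, List


def load_examples_from_csv(
    path: str, required: Iterable[str] = ("query", "expected_output")
) -> List[Dict[str, str]]:
    """Read a CSV file into the list-of-dicts shape passed as `examples=`.

    `required` is this sketch's own sanity check, not an SDK rule.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file, so fall back to no columns.
        missing = set(required) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"CSV is missing columns: {sorted(missing)}")
        # Each data row becomes one example dict keyed by the header names.
        return [dict(row) for row in reader]
```

The returned list can be passed as `examples=` to `client.datasets.create`. Note that `csv.DictReader` yields every value as a string.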

Get a Dataset

Retrieve a specific dataset by name or ID. When you pass a name, also pass space so the name can be resolved unambiguously.
```python
dataset = client.datasets.get(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
)

print(dataset)
```
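Besides the SDK's own local caching, repeated `get` calls inside a single script can be avoided with a plain memo wrapper. This is a hypothetical application-side sketch (`DatasetCache` is not an SDK class). It assumes that a name-based lookup can return a newer version after examples are appended, so it exposes an `invalidate` method.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class DatasetCache:
    """Hypothetical in-process memo cache around a fetch function such as client.datasets.get."""

    def __init__(self, fetch: Callable[..., Any]) -> None:
        self._fetch = fetch
        self._store: Dict[Tuple[str, Optional[str]], Any] = {}

    def get(self, dataset: str, space: Optional[str] = None) -> Any:
        key = (dataset, space)
        if key not in self._store:
            kwargs: Dict[str, str] = {"dataset": dataset}
            if space is not None:
                kwargs["space"] = space
            # Only the first request for this (dataset, space) pair reaches the API.
            self._store[key] = self._fetch(**kwargs)
        return self._store[key]

    def invalidate(self, dataset: str, space: Optional[str] = None) -> None:
        # Call after append_examples so the next get() fetches the new version.
        self._store.pop((dataset, space), None)
```

For example, `cache = DatasetCache(client.datasets.get)` followed by `cache.get("my-test-dataset", space="your-space-name-or-id")` fetches once and reuses the result afterwards. A name and an ID for the same dataset are cached under different keys.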

Delete a Dataset

Delete a dataset by name or ID. Deletion is irreversible, and the call does not return a value.
```python
client.datasets.delete(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
)

print("Dataset deleted successfully")
```
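Since deletion cannot be undone, scripts that delete datasets may want a guard against typos or stale variables. A minimal sketch, assuming you pass in `client.datasets.delete` as the callable; `delete_dataset_confirmed` is a hypothetical helper that deletes only when the confirmation string repeats the identifier exactly.

```python
from typing import Callable, Dict, Optional


def delete_dataset_confirmed(
    delete: Callable[..., None],
    dataset: str,
    confirm: str,
    space: Optional[str] = None,
) -> bool:
    """Call `delete` only if `confirm` matches `dataset`; return whether it ran."""
    if confirm != dataset:
        # Mismatch: refuse rather than risk deleting the wrong dataset.
        return False
    kwargs: Dict[str, str] = {"dataset": dataset}
    if space is not None:
        kwargs["space"] = space
    delete(**kwargs)
    return True
```

For example: `delete_dataset_confirmed(client.datasets.delete, "my-test-dataset", confirm=input("Type the dataset name to confirm: "), space="your-space-name-or-id")`.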

List Dataset Examples

Retrieve examples from a dataset with pagination support. To fetch every example instead of a single page, pass all=True; the examples are then retrieved via Arrow Flight and limit is ignored.
```python
resp = client.datasets.list_examples(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    limit=100,
)

for example in resp.examples:
    print(example)
```
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.
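The choice between a single page and the full dataset can be wrapped in one helper. A sketch under the semantics described above (`all=True` fetches everything via Flight and ignores `limit`); `fetch_examples` is hypothetical, and `client` is whatever SDK client you already use.

```python
from typing import Any, Dict, Optional


def fetch_examples(
    client: Any, dataset: str, space: Optional[str] = None, limit: Optional[int] = None
) -> Any:
    """Fetch one page when `limit` is set, or the whole dataset when it is None."""
    kwargs: Dict[str, Any] = {"dataset": dataset}
    if space is not None:
        kwargs["space"] = space
    if limit is None:
        # all=True fetches every example via Flight; limit would be ignored anyway.
        kwargs["all"] = True
    else:
        kwargs["limit"] = limit
    return client.datasets.list_examples(**kwargs)
```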

Append Dataset Examples

Add new examples to an existing dataset. Examples are appended to the latest dataset version by default.
```python
new_examples = [
    {
        "query": "What is machine learning?",
        "expected_output": "A subset of AI focused on learning from data",
        "eval.Correctness.label": "correct",
    },
    {
        "query": "Who invented Python?",
        "expected_output": "Guido van Rossum",
        "eval.Correctness.label": "correct",
    },
]

updated_dataset = client.datasets.append_examples(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    examples=new_examples,
)
```
Note: Do not include system-managed fields (id, created_at, updated_at) in your examples. These are automatically generated by the server. Learn more: Datasets Documentation
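When reusing examples copied from another dataset (assumed here to be plain dicts), the system-managed fields have to be dropped before calling `append_examples`. A minimal sketch; `strip_system_fields` is a hypothetical helper, and the field list is the one named in the note above.

```python
from typing import Any, Dict, Iterable, List

# Server-generated fields that must not be sent back (from the note above).
SYSTEM_FIELDS = frozenset({"id", "created_at", "updated_at"})


def strip_system_fields(examples: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of `examples` without server-managed keys; inputs are not modified."""
    return [{k: v for k, v in ex.items() if k not in SYSTEM_FIELDS} for ex in examples]


# Illustrative input, e.g. an example copied from another dataset.
copied = [
    {
        "id": "ex_123",
        "created_at": "2024-01-01",
        "query": "Who invented Python?",
        "expected_output": "Guido van Rossum",
    }
]
print(strip_system_fields(copied))
# [{'query': 'Who invented Python?', 'expected_output': 'Guido van Rossum'}]
```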