# OUMI DATASETS

> Upload, generate, and validate datasets in Oumi

The Oumi Agent makes it easy to generate high-quality <Tooltip headline="Dataset" tip="Structured data used for training or evaluation." cta="Full Definition" href="/reference/key-terms#dataset">datasets</Tooltip> at any stage of the machine learning workflow. Using natural language prompts, you can create training data from scratch, [synthesize datasets from failure modes](/guides/evaluations/failure-modes), or analyze and refine existing data, without writing pipeline code.

What traditionally requires weeks of manual collection, cleaning, and formatting can be completed in hours. The Oumi Agent automates the most time-consuming parts of dataset preparation, such as schema validation, format conversion, and iterative refinement, so your team spends less time on data plumbing and more time on model quality. Because datasets are generated on demand and scoped to your task, you also avoid the cost of sourcing or licensing large generic datasets that may not fit your use case.

## STRUCTURE & CONTENTS

An <Tooltip headline="Dataset" tip="Structured data used for training or evaluation." cta="Full Definition" href="/reference/key-terms#dataset">Oumi dataset</Tooltip> is a structured collection of prompts and responses used to either train a model or evaluate its performance. Depending on your workflow, a dataset may include:

* **Prompt–response pairs** for supervised fine-tuning
* **Prompts only**, where model outputs are generated and evaluated separately
* **Multi-turn conversations** for dialogue-based training or benchmarking
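
As a rough illustration, the three shapes above might look like the following rows. This is a minimal sketch that borrows the `messages` layout from the JSONL example later in this guide. Representing a prompt-only row as a conversation with no assistant turn is an assumption, and `turn_roles` is a hypothetical helper, not part of Oumi.

```python
# Illustrative rows only; role names follow the JSONL example in this guide.

# Prompt-response pair: one user turn followed by one assistant turn.
prompt_response = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

# Assumption: a prompt-only row is a conversation without an assistant turn.
prompt_only = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
    ]
}

# Multi-turn conversation: alternating user and assistant turns.
multi_turn = {
    "messages": [
        {"role": "user", "content": "How do I make pasta?"},
        {"role": "assistant", "content": "Boil water and add pasta for 8-10 minutes."},
        {"role": "user", "content": "Should I salt the water?"},
        {"role": "assistant", "content": "Yes, salt it before adding the pasta."},
    ]
}


def turn_roles(row):
    """Hypothetical helper: return the sequence of roles in a row."""
    return [message["role"] for message in row["messages"]]
```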

## UPLOADING DATASETS

You can upload datasets directly into Oumi in a variety of common formats, including [JSON](https://huggingface.co/datasets/oumi-ai/examples/blob/main/example_contents.json), [JSONL](https://huggingface.co/datasets/oumi-ai/examples/blob/main/example_contents.jsonl), [CSV](https://huggingface.co/datasets/oumi-ai/examples/blob/main/example_contents.csv), and [Parquet](https://huggingface.co/datasets/oumi-ai/examples/blob/main/example_contents.parquet).

All Oumi datasets follow a standardized internal <Tooltip headline="Conversation" tip="Oumi’s internal dataset format with role-based messages." cta="Full Definition" href="/reference/key-terms#conversation">Conversation format</Tooltip> that defines how messages, roles, and metadata are structured. During upload, Oumi automatically validates and converts your data into this format, ensuring it works seamlessly with training, evaluation, data synthesis, and analysis tools across the platform as well as modern machine learning pipelines.
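
To give a feel for what this conversion involves, here is a minimal sketch that turns CSV rows into the `messages` layout shown in the JSONL example in this guide. It is not Oumi's actual conversion logic. The column names `prompt` and `response` and the function `csv_to_jsonl` are assumptions made for illustration.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text, prompt_col="prompt", response_col="response"):
    """Convert CSV text into JSONL text, one conversation per line.

    The column names are hypothetical; adjust them to match your file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Every row becomes a conversation that starts with a user turn.
        messages = [{"role": "user", "content": row[prompt_col]}]
        # Add an assistant turn only when the row has a non-empty response.
        response = row.get(response_col)
        if response:
            messages.append({"role": "assistant", "content": response})
        lines.append(json.dumps({"messages": messages}))
    return "\n".join(lines)


sample = "prompt,response\nWhat is 2+2?,2+2 equals 4.\n"
print(csv_to_jsonl(sample))
```

Because the platform validates and converts uploads automatically, a script like this is only useful if you want to inspect or pre-shape your data locally.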

### CONTEXT FILES

Oumi also supports uploading context files to ground your models in proprietary or domain-specific data. This allows you to incorporate internal documents, knowledge bases, or other private content into your workflows.

To learn more, please see [Uploading context files](/guides/files).

## EXAMPLE USAGE

Here's an example of a well-formed Oumi dataset in <Tooltip headline="JSONL" tip="File format with one JSON object per line." cta="Full Definition" href="/reference/key-terms#jsonl-json-lines">JSONL</Tooltip> format:

```jsonl theme={null}
{"messages": [{"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}], "metadata": {"source": "geography"}}
{"messages": [{"role": "user", "content": "How do I make pasta?"}, {"role": "assistant", "content": "Boil water and add pasta for 8-10 minutes."}], "metadata": {"source": "cooking"}}
{"messages": [{"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "2+2 equals 4."}], "metadata": {"source": "math"}}
```

In addition to the standard `messages` field, each row can include an optional `metadata` field: a dictionary of key-value pairs describing that row, such as its `source`.
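
If you want to check a file locally before uploading, a small script along these lines can catch common mistakes. This is a sketch based only on the row layout shown above, not on Oumi's own validator. The set of allowed roles (including `system`) and the function `validate_row` are assumptions.

```python
import json

# Assumption: roles beyond "user" and "assistant" are not confirmed by this guide.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_row(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        row = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(row, dict):
        return ["row must be a JSON object"]

    problems = []
    messages = row.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("'messages' must be a non-empty list")
    else:
        for i, message in enumerate(messages):
            if not isinstance(message, dict):
                problems.append(f"message {i} must be an object")
                continue
            role = message.get("role")
            if not isinstance(role, str) or role not in ALLOWED_ROLES:
                problems.append(f"message {i} has an unexpected role: {role!r}")
            if not isinstance(message.get("content"), str):
                problems.append(f"message {i} needs string 'content'")

    # 'metadata' is optional, but must be a dictionary when present.
    if "metadata" in row and not isinstance(row["metadata"], dict):
        problems.append("'metadata' must be a dictionary")
    return problems
```

To check a whole file, call `validate_row` on each non-empty line and report the line numbers that return a non-empty list.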

* [More dataset examples](https://huggingface.co/oumi-ai/datasets)

## WHAT'S NEXT

<CardGroup cols={4}>
  <Card title="Add datasets" icon="database" href="/guides/datasets/create">
    Upload and import datasets into the Oumi platform.
  </Card>

  <Card title="Add context files" icon="file-arrow-up" href="/guides/datasets/add-files">
    Upload and import files to contextualize and ground your data.
  </Card>

  <Card title="Data explorer" icon="rocket" href="/guides/datasets/exploring">
    Explore, inspect, and validate your datasets.
  </Card>

  <Card title="Recipes" icon="rocket" href="/guides/datasets/recipes">
    Add new datasets using guided workflows.
  </Card>
</CardGroup>
