# HOW IT WORKS

> Creating synthetic datasets with Oumi

Oumi makes it easy to synthesize data for a wide range of machine learning use cases, whether you’re training a model from scratch, fine-tuning an existing one, or improving performance through targeted evaluation criteria.

***

## DATA SYNTHESIS METHODS

Oumi’s data synthesis capabilities are built directly into the platform, enabling data generation to integrate seamlessly into your end-to-end workflow.

You can synthesize data using the following methods via the Oumi UI:

* [Using a data synthesis recipe](#using-a-data-synthesis-recipe)
* [Generate completions](#generate-completions)
* [Generate from failure modes](#generate-from-failure-modes)
* [Generate from scratch](#generate-from-scratch)

These options provide flexible, scalable ways to generate high-quality datasets aligned with your modeling goals, no matter where you are in the workflow.

### USING A DATA SYNTHESIS RECIPE

Data synthesis recipes provide a reusable, structured approach to generating high-quality synthetic datasets. Designed as configurable blueprints, these recipes define how data should be created for specific use cases or modeling needs, giving you a fast and consistent starting point.

To learn more, please see [Data synthesis recipes](/guides/data-synthesis-recipes).

### GENERATE COMPLETIONS

Generate completions lets you create training data by supplying inputs and using a model to produce corresponding outputs. You can configure the process by selecting the model and any existing datasets to use, or by directly inserting the JSON configuration in the Oumi Builder.

To generate completions:

1. Go to the **Datasets** page and click on `Create Dataset`.
2. In the Builder window, click on `Generate Completions`.
3. Set your `Model` and `Source Dataset`, and specify any optional parameters. Alternatively, copy/paste your JSON configuration into the text area on the right-hand side.
4. Click `Execute` to start the job. Once processing is complete, the new dataset will appear on your **Datasets** page.

<video autoPlay controls muted loop playsInline allowFullScreen className="w-full aspect-video rounded-xl" src="https://mintcdn.com/oumi/-C82V_kXqoBIcXEj/videos/generate-completions.mp4?fit=max&auto=format&n=-C82V_kXqoBIcXEj&q=85&s=07c6a5b66c9cfe663233dd273ea18861" data-path="videos/generate-completions.mp4" />
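Conceptually, a completions job pairs each input from the source dataset with an output produced by the selected model. The sketch below illustrates that idea in plain Python. It is not Oumi's API or its JSON configuration schema: the function `generate_completions`, the `prompt` and `completion` field names, and the `toy_model` stand-in are all hypothetical and exist only for illustration.

```python
from typing import Callable


def generate_completions(
    source_rows: list[dict[str, str]],
    model: Callable[[str], str],
) -> list[dict[str, str]]:
    """Pair each source prompt with a model-produced completion.

    Hypothetical helper: the row layout ("prompt"/"completion") is an
    assumption, not the format Oumi uses.
    """
    dataset = []
    for row in source_rows:
        prompt = row["prompt"]
        dataset.append({"prompt": prompt, "completion": model(prompt)})
    return dataset


def toy_model(prompt: str) -> str:
    # Stand-in for a real model call; it only upper-cases the prompt.
    return prompt.upper()


source = [{"prompt": "hello"}, {"prompt": "define recall"}]
completions = generate_completions(source, toy_model)
```

In the Builder, the `Model` and `Source Dataset` fields play the roles of `model` and `source_rows` here; the platform handles the model calls and dataset storage for you.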

### GENERATE FROM FAILURE MODES

Generating synthetic datasets through failure modes allows you to create targeted training data to fix known model errors and improve overall performance.

To generate a dataset from failure modes:

1. Select an evaluation on the **Evaluations** page.
2. Click `Review Failure Modes`.
3. On the **Failure Modes** page, choose which evaluators' items to include in the generated dataset.
4. Click `Generate Dataset...`.
5. Give your synthesis job a **Display Name** and click `Run Synthesis`.

<video autoPlay controls muted loop playsInline allowFullScreen className="w-full aspect-video rounded-xl" src="https://mintcdn.com/oumi/-C82V_kXqoBIcXEj/videos/generate-from-failure.mp4?fit=max&auto=format&n=-C82V_kXqoBIcXEj&q=85&s=f92a3ffa542a626622310df131f30802" data-path="videos/generate-from-failure.mp4" />
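To make the selection step more concrete, here is a minimal sketch of what "choosing which evaluators' items to include" amounts to: keeping only the evaluation items that failed under the evaluators you selected, so they can seed new training data. The `EvalItem` record, its fields, and `select_failure_seeds` are assumptions made for this example; they do not describe how Oumi stores evaluation results.

```python
from dataclasses import dataclass


@dataclass
class EvalItem:
    # Hypothetical record of one evaluated example.
    prompt: str
    evaluator: str
    passed: bool


def select_failure_seeds(
    items: list[EvalItem], included_evaluators: set[str]
) -> list[str]:
    """Return prompts that failed under any of the selected evaluators."""
    return [
        item.prompt
        for item in items
        if not item.passed and item.evaluator in included_evaluators
    ]


results = [
    EvalItem("Summarize the refund policy.", "faithfulness", passed=False),
    EvalItem("Translate 'hello' to French.", "faithfulness", passed=True),
    EvalItem("List three safety steps.", "format", passed=False),
]
seeds = select_failure_seeds(results, {"faithfulness"})
```

Only the first item is kept: the second passed, and the third failed under an evaluator that was not selected.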

### GENERATE FROM SCRATCH

Generating from scratch lets you build synthetic datasets with full control over attributes, sources, and output format.

To generate a dataset from scratch:

1. On the **Datasets** page, click the `Create Dataset` button.
2. Click `Generate from Scratch`.
3. Configure your synthesis settings on the `INPUTS` tab.
4. Click `Execute`.
5. Give your job a **Display Name** and click `Generate Dataset`.

<video autoPlay controls muted loop playsInline allowFullScreen className="w-full aspect-video rounded-xl" src="https://mintcdn.com/oumi/-C82V_kXqoBIcXEj/videos/gen-from-scratch.mp4?fit=max&auto=format&n=-C82V_kXqoBIcXEj&q=85&s=c45289dfc2b22ec60b1c0b52ee4e4ba3" data-path="videos/gen-from-scratch.mp4" />
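As a rough illustration of what control over attributes and output format can mean, the sketch below expands a small set of attribute values into one prompt per combination and serializes the rows as JSON Lines. Everything in it is an assumption for the sake of the example: the attribute names, the template, the `build_prompts` helper, and the JSONL output are not taken from Oumi's `INPUTS` tab or configuration schema.

```python
import itertools
import json


def build_prompts(
    attributes: dict[str, list[str]], template: str
) -> list[dict[str, str]]:
    """Create one row per combination of attribute values (hypothetical helper)."""
    names = list(attributes)
    rows = []
    for values in itertools.product(*(attributes[name] for name in names)):
        combo = dict(zip(names, values))
        # Fill the template with this combination and keep the attributes
        # alongside the prompt so each row records how it was generated.
        rows.append({**combo, "prompt": template.format(**combo)})
    return rows


attributes = {"topic": ["billing", "shipping"], "tone": ["formal", "casual"]}
rows = build_prompts(
    attributes, "Write a {tone} customer question about {topic}."
)

# One JSON object per line is a common interchange format for datasets.
jsonl = "\n".join(json.dumps(row) for row in rows)
```

Two topics and two tones yield four rows; in a real synthesis job, a model would then turn each prompt into a full example.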

You can check the job status in the activity log. Your newly synthesized dataset will be available on your **Datasets** page when the job completes.
