DATA SYNTHESIS RECIPES

Data synthesis recipes provide a quick starting point for creating datasets tailored to specific configurations or use cases. Each recipe defines how to generate a custom dataset, making it easy to reuse and adapt across projects.

WHAT IS A DATA SYNTHESIS RECIPE?

A data synthesis recipe is a configuration that tells the synthesis engine how to generate a dataset. Rather than writing static examples by hand, you define structured rules that describe:

What should vary across samples
How examples should be generated
Which model should perform generation
How many samples to create

You can think of a recipe as a data generation blueprint. Attributes define the ingredients, templates define the instructions, and the synthesis engine produces consistent, scalable outputs by systematically combining them. This approach allows you to:

Generate large, diverse datasets from compact specifications
Systematically target failure modes identified during evaluation
Maintain consistency across generated examples
Iterate quickly by adjusting attributes instead of rewriting data

By separating structure (attributes and templates) from scale (number of samples), recipes make synthetic data generation controllable, reproducible, and aligned with measurable training goals.
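To make the separation of structure and scale concrete, here is a minimal Python sketch of the idea. It is not Oumi's recipe format or API: the `Recipe` class, its field names, the `build_prompts` helper, and the `example-model` value are all hypothetical, chosen only to mirror the four things a recipe describes. The sketch stops at building prompts and does not call a model.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Recipe:
    """Hypothetical recipe structure; field names are illustrative only."""
    attributes: dict[str, list[str]]  # what should vary across samples
    template: str                     # how each example should be generated
    model: str                        # which model would perform generation
    num_samples: int                  # how many samples to create


def build_prompts(recipe: Recipe, seed: int = 0) -> list[str]:
    """Fill the template with attribute combinations until num_samples is reached."""
    names = list(recipe.attributes)
    # Every combination of attribute values (the "ingredients").
    combos = list(itertools.product(*(recipe.attributes[n] for n in names)))
    # A seeded shuffle keeps the output reproducible across runs.
    random.Random(seed).shuffle(combos)
    prompts = []
    for i in range(recipe.num_samples):
        values = combos[i % len(combos)]  # cycle if num_samples exceeds the combinations
        prompts.append(recipe.template.format(**dict(zip(names, values))))
    return prompts


recipe = Recipe(
    attributes={"topic": ["billing", "shipping"], "tone": ["polite", "frustrated"]},
    template="Write a {tone} customer question about {topic}.",
    model="example-model",  # placeholder, unused in this sketch
    num_samples=6,
)
prompts = build_prompts(recipe)
for prompt in prompts:
    print(prompt)
```

In this sketch, changing `num_samples` scales the output without touching the attributes or the template, and adding a value to an attribute widens coverage without rewriting any examples.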

ACCESSING & EXECUTING DATA SYNTHESIS RECIPES

You can access and manage all your data synthesis recipes in one location:

Go to the Recipes page.
Filter for data synthesis recipes by selecting Synthesis in the Type drop-down menu.
Click the recipe name to open it in the Builder.
Click the Execute button to run the recipe and start the synthesis job.

Your newly synthesized dataset will appear under datasets once the job is finished.
