Skip to main content
Oumi data recipes are guided workflows for creating and adding datasets quickly and consistently. Instead of manually configuring every step, Oumi recipes streamline common data tasks for generating, structuring, and preparing datasets for machine learning. Whether you’re synthesizing new data, transforming existing data, or preparing a dataset for training, Oumi recipes reduce setup time and minimize errors.

Why use recipes?

Recipes are designed to:
  • Standardize dataset creation workflows
  • Reduce manual configuration
  • Enforce consistent formatting
  • Accelerate experimentation
  • Lower the risk of structural or quality issues
They’re especially useful when working with repeated processes such as data generation for distillation, domain adaptation, or filling coverage gaps. Depending on your workflow, recipes can help you:
  • Generate synthetic instruction-response pairs
  • Transform raw data into the Conversation format
  • Augment existing datasets
  • Apply consistent preprocessing steps
  • Prepare datasets for training or evaluation
Each recipe encapsulates best practices so you don’t need to configure everything from scratch.

Getting started with dataset recipes

To learn how to use Oumi recipes to synthesize training data, please see Synthetic Data Creation.