Oumi makes it easy to synthesize data for a wide range of machine learning use cases, whether you’re training a model from scratch, fine-tuning an existing one, or improving performance through targeted evaluation criteria.

Data synthesis methods

Oumi’s data synthesis capabilities are built directly into the platform, so data generation fits into your end-to-end workflow. In the Oumi UI, you can synthesize data by using a data synthesis recipe, generating completions, generating from failure modes, or generating from scratch. These options give you flexible, scalable ways to generate datasets aligned with your modeling goals at any stage of your workflow.

Using a data synthesis recipe

Data synthesis recipes provide a reusable, structured approach to generating high-quality synthetic datasets. Designed as configurable blueprints, these recipes define how data should be created for specific use cases or modeling needs, giving you a fast and consistent starting point. To learn more, please see Data synthesis recipes.
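As a rough illustration of the "configurable blueprint" idea, the sketch below models a recipe as a frozen dataclass whose defaults can be overridden for a single job without changing the base recipe. The `SynthesisRecipe` class and its fields are hypothetical and do not reflect Oumi's actual recipe format.

```python
from dataclasses import asdict, dataclass, replace


# Hypothetical recipe shape for illustration only; not Oumi's recipe schema.
@dataclass(frozen=True)
class SynthesisRecipe:
    name: str
    task_description: str
    num_samples: int = 100
    output_fields: tuple[str, ...] = ("input", "output")

    def customize(self, **overrides) -> "SynthesisRecipe":
        # Return a new recipe with selected fields overridden; the base stays unchanged.
        return replace(self, **overrides)

    def to_config(self) -> dict:
        # Flatten the recipe into a plain dict, e.g. for serializing to JSON.
        return asdict(self)


base = SynthesisRecipe(
    name="qa-pairs",
    task_description="Question/answer pairs about product docs",
)
small_run = base.customize(num_samples=10)
print(small_run.to_config())
```

Freezing the dataclass keeps the shared blueprint immutable, so every customization produces a separate, traceable configuration.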

Generate completions

Generate completions lets you create training data by supplying inputs and using a model to produce the corresponding outputs. You can configure the job by selecting a model and a source dataset, or by pasting a JSON configuration directly into the Oumi Builder. To generate completions:
  1. Go to the Datasets page and click Create Dataset.
  2. In the Builder window, click Generate Completions.
  3. Set your Model and Source Dataset, and specify any optional parameters. Alternatively, copy/paste your JSON configuration into the text area on the right-hand side.
  4. Click Execute to start the job. Once processing is complete, the new dataset will appear on your Datasets page.
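Conceptually, generating completions pairs each input from a source dataset with a model-produced output. The sketch below shows that loop in plain Python. `generate_completion` is a hypothetical stand-in for the model you select in the Builder (here it simply echoes the prompt), and the `prompt`/`completion` field names are assumptions, not Oumi's dataset schema.

```python
import json
from typing import Callable, Iterable


def generate_completion(prompt: str) -> str:
    # Hypothetical placeholder model: a real job would call the selected model instead.
    return f"Echo: {prompt}"


def build_completion_dataset(
    source_rows: Iterable[dict],
    model: Callable[[str], str],
    input_field: str = "prompt",
) -> list[dict]:
    # For each source row, keep the input and attach the model's output.
    dataset = []
    for row in source_rows:
        prompt = row[input_field]
        dataset.append({"prompt": prompt, "completion": model(prompt)})
    return dataset


source = [{"prompt": "What is 2 + 2?"}, {"prompt": "Name a primary color."}]
rows = build_completion_dataset(source, generate_completion)

# Serialize as JSON Lines, a common format for training datasets.
jsonl = "\n".join(json.dumps(r) for r in rows)
print(jsonl)
```

Passing the model in as a callable keeps the loop independent of any particular inference backend.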

Generate from failure modes

Generating synthetic datasets from failure modes lets you create targeted training data that addresses known model errors and improves overall performance. To generate a dataset from failure modes:
  1. Select an evaluation on the Evaluations page.
  2. Click Review Failure Modes.
  3. On the Failure Modes page, select the evaluators whose failed items should be included in the generated dataset.
  4. Click Generate Dataset...
  5. Give your synthesis job a Display Name and click Run Synthesis.
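The failure-mode flow starts from evaluation results and keeps only the items that the selected evaluators flagged as failures; those items then seed targeted synthesis. A minimal sketch of that selection step, assuming a hypothetical result record with `evaluator`, `prompt`, and `passed` fields (not Oumi's evaluation output format):

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    # Hypothetical evaluation record; field names are illustrative assumptions.
    evaluator: str
    prompt: str
    passed: bool


def select_failures(results: list[EvalResult], evaluators: set[str]) -> list[EvalResult]:
    # Keep failed items only from the evaluators chosen on the Failure Modes page.
    return [r for r in results if not r.passed and r.evaluator in evaluators]


results = [
    EvalResult("factuality", "Who wrote Hamlet?", passed=True),
    EvalResult("factuality", "What year did WWII end?", passed=False),
    EvalResult("tone", "Reply to an angry customer.", passed=False),
]
seeds = select_failures(results, evaluators={"factuality"})
for item in seeds:
    print(item.evaluator, "->", item.prompt)
```

Narrowing the evaluator set keeps the generated data focused on the specific weaknesses you want to fix.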

Generate from scratch

Generating from scratch lets you build synthetic datasets with full control over attributes, sources, and output format. To generate a dataset from scratch:
  1. On the Datasets page, click the Create Dataset button.
  2. Click Generate from Scratch.
  3. Configure your synthesis settings on the INPUTS tab.
  4. Click Execute.
  5. Give your job a Display Name and click Generate Dataset.
You can check the job status in the activity log. Your newly synthesized dataset will be available on your Datasets page when the job completes.
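One common way to get control over attributes when generating from scratch is to enumerate combinations of attribute values and turn each combination into a generation prompt. The sketch below illustrates that technique; the attribute names, values, and prompt template are hypothetical and are not taken from Oumi's INPUTS tab.

```python
import itertools
import random

# Hypothetical attributes; each combination defines one sample to generate.
ATTRIBUTES = {
    "topic": ["billing", "shipping", "returns"],
    "tone": ["formal", "casual"],
}

PROMPT_TEMPLATE = "Write a {tone} customer-support question about {topic}."


def attribute_combinations(attributes: dict[str, list[str]]) -> list[dict[str, str]]:
    # Cartesian product over all attribute values (3 topics x 2 tones = 6 combos here).
    keys = list(attributes)
    return [dict(zip(keys, values)) for values in itertools.product(*attributes.values())]


def sample_prompts(n: int, seed: int = 0) -> list[str]:
    # Sample up to n distinct combinations reproducibly and fill in the template.
    combos = attribute_combinations(ATTRIBUTES)
    rng = random.Random(seed)
    chosen = rng.sample(combos, k=min(n, len(combos)))
    return [PROMPT_TEMPLATE.format(**c) for c in chosen]


for prompt in sample_prompts(4):
    print(prompt)
```

Seeding the random generator makes a run reproducible, and enumerating the full product guarantees coverage of every attribute combination when the space is small.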