Data synthesis methods
Oumi’s data synthesis capabilities are built into the platform, so data generation fits directly into your end-to-end workflow. You can synthesize data in the Oumi UI using any of the following methods:
- Using a data synthesis recipe
- Generate completions
- Generate from failure modes
- Generate from scratch
Using a data synthesis recipe
Data synthesis recipes provide a reusable, structured approach to generating high-quality synthetic datasets. Designed as configurable blueprints, these recipes define how data should be created for specific use cases or modeling needs, giving you a fast and consistent starting point. To learn more, see Data synthesis recipes.
Generate completions
Generate completions lets you create training data by supplying inputs and using a model to produce the corresponding outputs. You can configure the process by selecting the model and any existing datasets to use, or by pasting a JSON configuration directly into the Oumi Builder. To generate completions:
- Go to the Datasets page and click Create Dataset.
- In the Builder window, click Generate Completions.
- Set your Model and Source Dataset, and specify any optional parameters. Alternatively, paste your JSON configuration into the text area on the right-hand side.
- Click Execute to start the job. Once processing is complete, the new dataset appears on your Datasets page.
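If you take the JSON route, it can help to confirm that the text parses before pasting it into the Builder. The following is a minimal sketch of such a local check. The key names `model` and `source_dataset` are hypothetical placeholders that mirror the Model and Source Dataset fields named above; this page does not specify the actual configuration schema, so adjust them to match the configuration you are working with.

```python
import json

# Hypothetical key names: the real schema is not given on this page.
REQUIRED_KEYS = ("model", "source_dataset")


def check_config_text(text: str) -> list[str]:
    """Return a list of problems found in a JSON configuration string."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        # Report where parsing failed so the typo is easy to locate.
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    return [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]


# Placeholder values for illustration only.
draft = '{"model": "example-model", "source_dataset": "example-dataset"}'
problems = check_config_text(draft)
print("OK to paste" if not problems else "\n".join(problems))
```

An empty list means the text is well-formed JSON and contains the placeholder keys; it does not guarantee that the platform will accept the configuration.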
Generate from failure modes
Generating synthetic datasets from failure modes lets you create targeted training data that addresses known model errors and improves overall performance. To generate a dataset from failure modes:
- Select an evaluation on the Evaluations page.
- Click Review Failure Modes.
- On the Failure Modes page, choose which evaluators’ items to include when generating the dataset.
- Click Generate Dataset.
- Give your synthesis job a Display Name and click Run Synthesis.
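Conceptually, choosing which evaluators’ items to include amounts to filtering the evaluation results down to the failed items from the selected evaluators. The sketch below illustrates that idea only. It is not how Oumi implements the feature, and the `EvalItem` type, its fields, and the evaluator names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalItem:
    """Hypothetical record of one evaluated item."""
    prompt: str
    evaluator: str  # name of the evaluator that scored this item
    passed: bool


def select_failures(items, include_evaluators):
    """Keep failed items whose evaluator is in the included set."""
    wanted = set(include_evaluators)
    return [it for it in items if not it.passed and it.evaluator in wanted]


# Illustrative data; evaluator names are made up.
items = [
    EvalItem("Summarize the memo.", "faithfulness", False),
    EvalItem("Translate to French.", "fluency", True),
    EvalItem("List three risks.", "format", False),
]
seeds = select_failures(items, ["faithfulness"])
print([s.prompt for s in seeds])
```

Including more evaluators widens the set of failures that seed the new dataset, while including fewer keeps the generated data focused on one kind of error.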
Generate from scratch
Generating from scratch lets you build synthetic datasets with full control over attributes, sources, and output format. To generate a dataset from scratch:
- On the Datasets page, click the Create Dataset button.
- Click Generate from Scratch.
- Configure your synthesis settings on the INPUTS tab.
- Click Execute.
- Give your job a Display Name and click Generate Dataset.