Overview
This quickstart demonstrates how to build a custom model that can outperform a leading large model in just a few steps. You'll be using:
- Large model (judge and large-model benchmarking): GPT-5.2
- Small model (for fine-tuning): Qwen/Qwen2.5-3B-Instruct
Task definition
You will develop an AI model for a bank that classifies customer support queries by intent, enabling accurate routing in a banking context.
Workflow steps
Oumi provides a fully automated, end-to-end workflow. For this use case, the process includes:
- Uploading your datasets and having Oumi automatically analyze them for potential issues
- Defining evaluators (judges) to assess accuracy and ensure proper output formatting
- Benchmarking both the large and small models on a test dataset to establish performance baselines
- Fine-tuning the small model using your training data to create a custom version
- Re-running the evaluations to see your custom model outperform the larger model's baseline
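Classification by intent matters because the predicted class drives routing. The sketch below shows what that downstream routing step looks like; the label IDs and team names are hypothetical examples, not the actual dataset's label space.

```python
# Toy intent-based routing table. Label IDs and team names are
# illustrative assumptions, not taken from the quickstart datasets.
INTENT_TO_TEAM = {
    0: "card_services",   # e.g. "my card was declined"
    1: "fraud_team",      # e.g. "I see a charge I don't recognize"
    2: "loans_desk",      # e.g. "what is the rate on a personal loan?"
}

def route(intent_class: int) -> str:
    """Route a classified query to the matching support queue."""
    return INTENT_TO_TEAM.get(intent_class, "general_support")

print(route(1))   # a known class routes to its team
print(route(42))  # an unknown class falls back to general support
```

An accurate classifier keeps queries out of the fallback queue, which is why the evaluators later in this quickstart measure both accuracy and output format.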
Project Setup
Start by creating a new project in your workspace:
- Click New Project.
- Give your project a Project Name.
- Provide a description in Project Context. You can also invite your team members to your project by clicking the Invite team members button and selecting their usernames.
- Click Create Project.
Set Up Your Datasets
You can use the following datasets for this quickstart. Download them to your local machine.

From your project's Overview page:
Oumi requires that datasets follow a specific format. Please see Datasets to learn more.
- Click the Create button and select Dataset from the menu.
- Select Upload a Dataset (to the right of Create Dataset).
- Provide a Dataset Name for your dataset and select the JSONL files you downloaded in the previous step.
- Click Create Dataset. Oumi will start uploading your datasets and automatically run a series of quality checks.
Create Your Evaluators & Baselines
Next, define your evaluators for measuring baseline model performance. The Oumi Agent makes it easy to create custom evaluators for any metric using natural language prompts. You'll need two evaluators for this example: one to measure accuracy against ground truth, the other to validate that outputs are correctly formatted (i.e., an integer within the valid class range).

From the Agent pane on the right-hand side of your screen:
Please see Evaluations to learn more about how to assess your model’s quality using Oumi.
- Give the Oumi Agent the following prompt:
- The Agent will analyze your existing project assets and guide you through the steps for creating your model baselines and evaluations. Select GPT-5.2 as your strong model, Qwen2.5-3B-Instruct as your small model, and GPT-5.2 as your judge model.
- Once the Agent finishes configuring your evaluation jobs, click Run It to kick off each evaluation. Oumi will run the two evaluations in parallel.
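The two evaluator checks described above can be sketched in plain Python. This is a minimal illustration, not an Oumi API; the function names and the assumed label-space size are placeholders.

```python
NUM_CLASSES = 77  # assumption: the number of intent classes in the dataset

def is_valid_format(output: str, num_classes: int = NUM_CLASSES) -> bool:
    """Formatting evaluator: output must parse as an integer in the valid class range."""
    try:
        label = int(output.strip())
    except ValueError:
        return False
    return 0 <= label < num_classes

def accuracy(predictions: list[str], ground_truth: list[int]) -> float:
    """Accuracy evaluator: exact match against ground-truth labels.
    Malformed outputs count as wrong, so the two checks reinforce each other."""
    correct = sum(
        1
        for pred, truth in zip(predictions, ground_truth)
        if is_valid_format(pred) and int(pred.strip()) == truth
    )
    return correct / len(ground_truth)

print(accuracy(["12", "3", "oops"], [12, 4, 7]))  # 1 of 3 correct
```

Running both evaluators over the same test set is what lets you compare the large and small models on equal footing before fine-tuning.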
Compare and select your two evaluations.
Fine-tune & Evaluate Your Custom Model
After reviewing the baselines, you can fine-tune your custom small model to improve accuracy and close the gap.

From the Agent-provided options:
- Select Fine-tune Qwen2.5-3B.
- Review the training guidance and plan as the Agent automatically sets up your training configuration. Select SFT with LoRA as your training method and parameter strategy.
- Click Start Training and Approve to kick off the training job.
- Once the training job completes, run an evaluation against your new custom model to ensure that it beats your previous baselines. Click Run Evaluation to kick off the job.
- Click Done for now to wrap up the workflow.
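To make the SFT-with-LoRA choice concrete, the sketch below lists hyperparameters of the kind such a run typically uses and shows why LoRA is cheap for a ~3B model. The values are common community defaults for illustration only; the Agent chooses the actual configuration for you.

```python
# Illustrative SFT-with-LoRA settings for a ~3B model. These are assumed
# community defaults, not values prescribed by Oumi.
lora_training_plan = {
    "base_model": "Qwen/Qwen2.5-3B-Instruct",
    "method": "sft",        # supervised fine-tuning on labeled examples
    "lora_r": 16,           # adapter rank: capacity of the low-rank update
    "lora_alpha": 32,       # scaling factor, often 2 * r
    "lora_dropout": 0.05,
    "learning_rate": 2e-4,  # LoRA tolerates higher LRs than full fine-tuning
    "num_epochs": 3,
}

# LoRA trains far fewer parameters than full fine-tuning: for an adapted
# weight W of shape (d, k), it learns B (d, r) and A (r, k) with r << min(d, k).
d, k, r = 2048, 2048, lora_training_plan["lora_r"]
full_params = d * k
lora_params = r * (d + k)
print(f"trainable fraction per adapted matrix: {lora_params / full_params:.2%}")
```

This small trainable footprint is why a fine-tuned adapter on a 3B model can be trained quickly on your uploaded data while the base weights stay frozen.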