
Overview

This quickstart demonstrates how to build a custom model that can outperform a leading large model in just a few steps. You’ll be using:
  • Large model (judge and large-model benchmarking): GPT-5.2
  • Small model (for fine-tuning): Qwen/Qwen2.5-3B-Instruct
By the end, you’ll have a fine-tuned small model that outperforms the larger model, validated through Oumi’s evaluations, which you’ll set up as part of the workflow.

Task definition

You will develop an AI model for a bank that classifies customer support queries by intent, enabling accurate routing in a banking context.
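To make the task concrete, here is a minimal sketch of what the finished model does: map a customer query to an integer intent label, which the bank's systems then use for routing. The label names and routing targets below are illustrative assumptions, not the actual dataset's label set.

```python
# Hypothetical intent labels for a banking support classifier.
# The real dataset defines its own label set; these are examples only.
INTENT_LABELS = {
    1: "card_lost_or_stolen",
    2: "dispute_transaction",
    3: "account_balance",
}

def route(predicted_label: int) -> str:
    """Return a routing destination for a predicted intent label,
    falling back to a human agent for unknown labels."""
    return INTENT_LABELS.get(predicted_label, "fallback_human_agent")

print(route(2))   # dispute_transaction
print(route(99))  # fallback_human_agent
```

The model's only job is to emit the integer label; everything downstream (routing, escalation) keys off that prediction.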

Workflow steps

Oumi provides a fully automated, end-to-end workflow. For this use case, the process includes:
  • Uploading your datasets and having Oumi automatically analyze them for potential issues
  • Defining evaluators (judges) to assess accuracy and ensure proper output formatting
  • Benchmarking both the large and small models on a test dataset to establish performance baselines
  • Fine-tuning the small model using your training data to create a custom version
  • Re-running the evaluations to see your custom model outperform the larger model’s baseline
You can complete this quickstart using the Oumi Agent or just the platform UI.

Step 1: Project Setup

Start by creating a new project in your workspace:
  1. Click on New Project.
  2. Give your project a Project Name.
  3. Provide a description in Project Context. You can also invite your team members to your project by clicking the Invite team members button and selecting their usernames.
  4. Click Create Project.


Step 2: Set Up Your Datasets

You can use the following datasets for this quickstart. Download them to your local machine:
Oumi requires that datasets follow a specific format. Please see Datasets to learn more.
From your project’s Overview page:
  1. Click the Create button and select Dataset from the menu.
  2. Select Upload a Dataset (to the right of Create Dataset).
  3. Provide a Dataset Name for your dataset and select the JSONL files you downloaded in the previous step.
  4. Click Create Dataset. Oumi will start uploading your datasets and automatically run a series of quality checks.
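As a rough illustration of what a conversation-style JSONL record for this task might look like, here is a hedged sketch; see the Datasets documentation for the exact schema Oumi expects. The `metadata` fields `label` and `label_name` hold the ground truth that the accuracy judge reads in a later step.

```python
import json

# Illustrative record only; field layout is an assumption based on the
# label/label_name metadata described in this quickstart, not Oumi's
# authoritative schema.
record = {
    "messages": [
        {"role": "user", "content": "I lost my card, what should I do?"},
        {"role": "assistant", "content": "1"},
    ],
    "metadata": {"label": 1, "label_name": "card_lost_or_stolen"},
}

# Each line of a .jsonl file is one such record serialized as JSON.
line = json.dumps(record)
parsed = json.loads(line)
print(parsed["metadata"]["label_name"])
```

If a record fails to round-trip through `json.loads` one line at a time, the file is not valid JSONL, which is the kind of issue Oumi's automated quality checks surface.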


Step 3: Create Your Evaluators & Baselines

Next, define your evaluators for measuring baseline model performance. The Oumi Agent makes it easy to create custom evaluators for any metric using natural language prompts. You’ll need two evaluators for this example: one to measure accuracy against ground truth, and one to validate that outputs are correctly formatted (i.e., an integer within the valid class range).
Please see Evaluations to learn more about how to assess your model’s quality using Oumi.
From the Agent pane on the right-hand side of your screen:
  1. Give the Oumi Agent the following prompt:
You are building a model to classify and route customer support queries for a bank.
The model should determine the customer’s intent based on a provided conversation. Start by creating baselines for performance benchmarking and fine-tuning. 
First, evaluate a strong model on the uploaded test dataset, then evaluate a small language model on the test dataset. Define two custom judges for the evaluations: one to determine whether the output is correct (measuring accuracy using the ground truth labels stored in the dataset’s metadata fields as `label` and `label_name`), the other to determine whether the output is valid (is an integer between 1 and the number of classes).
To make the evaluations run faster, do not create failure modes.
  2. The Agent analyzes your existing project assets and guides you through the steps for creating your model baselines and evaluations. Select GPT-5.2 as your strong model, Qwen2.5-3B-Instruct as your small model, and GPT-5.2 as your judge model.
  3. Once the Agent finishes configuring your evaluation jobs, click Run It to kick off each evaluation. Oumi will run the two evaluations in parallel.
Once your evaluation jobs finish, review the results. Your strong model should outperform your base model. Oumi enables you to match that performance with your custom small model, which you’ll do by fine-tuning it on the training dataset. You can also review your in-depth evaluation results side by side: from the Evaluations page, click Compare and select your two evaluations.
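The two judges defined in the prompt above reduce to simple checks. The sketch below illustrates that logic under the stated assumptions (model output arrives as a string; valid labels run from 1 to the number of classes); in Oumi the judges themselves are LLM-based evaluators configured through the Agent, and the class count here is hypothetical.

```python
NUM_CLASSES = 77  # hypothetical class count; use your dataset's actual count

def is_valid(output: str, num_classes: int = NUM_CLASSES) -> bool:
    """Validity judge: output parses as an integer in [1, num_classes]."""
    try:
        value = int(output.strip())
    except ValueError:
        return False
    return 1 <= value <= num_classes

def is_correct(output: str, ground_truth_label: int) -> bool:
    """Accuracy judge: output is valid and matches the `label` stored
    in the record's metadata."""
    return is_valid(output) and int(output.strip()) == ground_truth_label

print(is_valid("12"), is_correct("12", 12))   # True True
print(is_valid("0"), is_correct("abc", 3))    # False False
```

Separating the two checks matters: an invalid output (e.g., free text instead of an integer) should fail the format judge rather than being silently scored as a wrong answer.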

Step 4: Fine-tune & Evaluate Your Custom Model

After reviewing the baselines, you can fine-tune your custom small model to improve accuracy and close the gap. From the Agent-provided options:
  1. Select Fine-tune Qwen2.5-3B.
  2. Review the training guidance and plan as the Agent automatically sets up your training configuration. Select SFT with LoRA as your training method and parameter strategy.
  3. Click Start Training and approve to kick off the training job.
  4. Once the training job completes, run an evaluation against your new custom model to ensure that it beats your previous baselines. Click Run Evaluation to kick off the job.
  5. Click Done for now to wrap up the workflow.
Your custom, high-quality AI model tailored to your classification task is now ready for use. The entire process should take only a matter of hours, not months.
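For orientation, the SFT-with-LoRA choice above corresponds to a training configuration roughly like the following sketch. The Agent generates the actual configuration for you; the field names and hyperparameter values here are illustrative assumptions, not Oumi's exact schema.

```yaml
# Illustrative SFT + LoRA sketch -- not an authoritative Oumi config.
model:
  model_name: Qwen/Qwen2.5-3B-Instruct   # the small model being fine-tuned
data:
  train:
    datasets:
      - dataset_path: train.jsonl        # the training set uploaded earlier
training:
  trainer_type: TRL_SFT                  # supervised fine-tuning
  use_peft: true                         # train LoRA adapters, not full weights
peft:
  lora_r: 16                             # assumed adapter rank
  lora_alpha: 32
  lora_dropout: 0.05
```

LoRA trains a small set of adapter weights on top of the frozen base model, which is why a 3B-parameter model can be fine-tuned quickly and cheaply for a narrow task like intent classification.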

Next steps

You’ve now built a custom model with Oumi, leveraging the strength of a large model while gaining the efficiency and control of a smaller one, without complex or costly development. From here, you’re ready to iterate, scale, and apply this example to your own use cases. Explore the Oumi Workflow and dive deeper into the available options and configurations for building custom AI models in Oumi.