The following prompts enable a range of machine learning tasks and workflows, including model development, data generation, evaluation, and project management.
Please note that while this collection highlights common use cases, it represents only a subset of the Oumi Agent’s full capabilities.

End-to-end model building

Build a full workflow to create a model that handles customer inquiries.
Help me build a custom model for handling customer support tickets, including refunds, billing questions, and shipping issues
Build an end-to-end workflow for a code-focused model.
Build me a coding assistant that can explain errors, generate code snippets, and review pull requests
Create a tutoring model for students.
Help me train a model that tutors middle school students in algebra, explaining concepts step by step
Train a model that mimics a specific tone and style.
I want to build a model that writes marketing copy in our brand voice: friendly, concise, and professional
Create a model for initial patient intake.
Help me create a model that performs initial medical triage by asking patients about symptoms and suggesting urgency levels
Create a model for drafting outbound sales emails.
Help me train a model that drafts personalized cold outreach emails based on prospect company and role
Train a model to flag inappropriate content.
Build a content moderation model that classifies user-generated posts as safe, needs review, or policy violation

Data synthesis: general

Create a synthetic dataset for a specific task.
Generate 500 synthetic training examples for a customer support chatbot that handles refund requests, billing disputes, and account issues
Create data covering a wide range of situations.
Generate a training dataset with diverse scenarios for a travel booking assistant, covering flights, hotels, cancellations, and itinerary changes
Create data with realistic back-and-forth dialogue.
Generate synthetic multi-turn conversations between a user and a technical support agent troubleshooting Wi-Fi connectivity issues
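Multi-turn examples like the one requested above are commonly stored as a list of role-tagged messages. A minimal sketch of that shape, using the widely used `role`/`content` message schema (the exact layout Oumi expects may differ):

```python
import json

# One synthetic multi-turn example in the common chat-messages format.
# Field names ("role", "content") are illustrative assumptions, not a
# confirmed Oumi schema.
example = {
    "messages": [
        {"role": "user", "content": "My Wi-Fi keeps dropping every few minutes."},
        {"role": "assistant", "content": "Let's narrow it down. Does it drop on all devices, or just one?"},
        {"role": "user", "content": "Just my laptop."},
        {"role": "assistant", "content": "Try forgetting the network and reconnecting, then update your Wi-Fi driver."},
    ]
}

# Datasets of conversations are often stored one example per line (JSONL).
line = json.dumps(example)
```

Alternating user/assistant turns like this teach the model to ask clarifying questions before proposing a fix.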
Create data targeting tricky or unusual inputs.
Generate training data focused on edge cases for a food ordering bot: ambiguous orders, dietary restrictions, out-of-stock items, and off-topic requests
Create data that follows a particular communication style.
Generate training examples for a friendly, casual chatbot that helps users pick outfit recommendations based on occasion and weather
Create labeled data for classification tasks.
Generate 300 examples of customer feedback classified into categories: product quality, shipping speed, customer service, and pricing
Create data where responses follow a specific format.
Generate training examples where the assistant responds with structured JSON containing fields: intent, confidence, and suggested_action
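The structured-output prompt above names three fields. A quick sketch of what one such target completion could look like (field names come from the prompt; the values are illustrative):

```python
import json

# A target completion in the structured format requested above.
# Values are made up for illustration.
response = {
    "intent": "refund_request",
    "confidence": 0.92,
    "suggested_action": "route_to_billing",
}

# Training pairs would store the serialized JSON as the assistant's reply,
# so the fine-tuned model learns to emit parseable output.
completion = json.dumps(response)
parsed = json.loads(completion)
```

Round-tripping each generated example through a JSON parser like this is a cheap way to filter out malformed completions before training.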
Create a held-out dataset for evaluation purposes.
Generate 100 evaluation examples for a customer support bot covering common and edge-case scenarios to use as a test set

Data synthesis: improved samples

Refine and enhance samples in an existing dataset.
Improve the quality of samples in my existing dataset by making responses more detailed, accurate, and consistent in tone
Target improvements on data that scored poorly in evaluation.
Improve the samples in my dataset that scored low on helpfulness and accuracy based on my evaluation results

Data synthesis: generate completions

Generate model completions for a prompt-only dataset.
Generate completions for my prompt-only dataset using GPT-4o
Re-generate responses with a different model.
Generate new completions for my dataset, replacing the existing responses, using Claude
Add responses with a specific persona or behavior.
Generate completions for my dataset using GPT-4o with the system instruction: You are a helpful and concise technical support agent
Control response creativity/randomness.
Generate completions for my dataset using a temperature of 0.3 for more deterministic responses
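Why a low temperature makes responses more deterministic: temperature divides the model's raw scores before they are turned into probabilities, so small values sharpen the distribution toward the top token. A self-contained sketch with made-up logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
p_default = softmax(logits, temperature=1.0)
p_cool = softmax(logits, temperature=0.3)

# At temperature 0.3, probability mass concentrates on the top-scoring
# token, so sampled responses vary less from run to run.
```

At temperature 1.0 the top token gets roughly 63% of the mass here; at 0.3 it gets about 96%.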

Model training

Parameter-efficient fine-tuning on your data.
Fine-tune a model on my training dataset using LoRA
Update all model weights for maximum customization.
Set up full fine-tuning for my model on my training dataset
Use a teacher model to train a student model.
Train a model using on-policy distillation with a teacher model
Customize training configuration.
Fine-tune a model on my dataset with 3 epochs, a learning rate of 2e-5, and LoRA rank 16
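A sense of why LoRA rank 16 is "parameter-efficient": instead of updating a full weight matrix, LoRA trains two small low-rank factors. A back-of-the-envelope comparison with illustrative dimensions (not tied to any specific model):

```python
# Trainable parameters for one hypothetical weight matrix:
# full fine-tuning updates every weight; LoRA trains two low-rank
# adapters A (rank x d_in) and B (d_out x rank).
d_out, d_in = 4096, 4096   # illustrative size for one projection layer
rank = 16

full_params = d_out * d_in           # 16,777,216 weights updated
lora_params = rank * (d_in + d_out)  # 131,072 adapter weights

ratio = lora_params / full_params    # well under 1% of the full count
```

The same arithmetic per layer is why LoRA runs fit on much smaller GPUs than full fine-tuning.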
Include a validation dataset for monitoring.
Train a model on my training dataset and use my validation dataset to monitor training progress
Point training at a known dataset.
Fine-tune Llama on my customer support dataset

Evaluation: evaluator/judge creation

Evaluate how helpful model responses are.
Create an evaluator that judges how helpful and complete the model's responses are
Evaluate factual correctness.
Create an evaluator that judges whether the model's responses are factually accurate and free of hallucinations
Evaluate responses for harmful content.
Create an evaluator that judges whether responses are safe, avoiding harmful, biased, or inappropriate content
Evaluate communication style.
Create an evaluator that judges whether the model maintains a friendly, professional tone throughout its responses
Evaluate adherence to instructions.
Create an evaluator that judges how well the model follows the specific instructions given in the user's prompt
Evaluate response brevity.
Create an evaluator that judges whether responses are concise and to the point without unnecessary verbosity
Evaluate generated code.
Create an evaluator that judges the quality of generated code: correctness, readability, and adherence to best practices
Set up multiple evaluators at once.
Help me create evaluators for helpfulness, accuracy, safety, and tone for my customer support model

Evaluation: running evaluations

Benchmark an unmodified model before training.
Run a baseline evaluation on the base model using my test dataset and evaluators before I fine-tune it
Assess quality after training.
Evaluate my fine-tuned model using the same test dataset and evaluators I used for the baseline
Test an API model’s performance on your task.
Evaluate GPT-4o on my test dataset using my evaluators to see how it performs on my task
Benchmark two models side by side.
Help me compare my fine-tuned model against the base model by running evaluations on both with the same dataset and evaluators
Run evaluation on a particular dataset.
Run an evaluation on my model using my latest test dataset

Project exploration & resource management

See what datasets exist in your project.
List all my datasets
Inspect what’s inside a dataset.
Show me what's inside my dataset. Preview the first few items.
See models you’ve fine-tuned.
Show me all the models I've trained
See available judges.
What evaluators do I have set up?
See past evaluation runs.
Show me all my evaluation runs and their results
Monitor running operations.
What's the status of my running jobs?
Locate the most recently created item.
Show me the most recently created dataset
See what models are available for fine-tuning.
What models are available for training?

See models you can use for data generation.
What models can I use for data synthesis?
See models you can use as judges.
What models are available for evaluation?
Pick up where you left off.
Where did we leave off? What should I do next?
Investigate what went wrong.
Show me any failed operations and what went wrong

Resource cleanup

Remove a dataset from the project.
Delete my old test dataset
Remove a trained model.
Delete the model I trained last week
Remove an evaluation run.
Delete my failed evaluation run
Remove a judge configuration.
Delete the evaluator I'm no longer using

Platform knowledge

Understand SFT vs OPD.
What training methods do you support and when should I use each?
Understand parameter update strategies.
What's the difference between LoRA and full fine-tuning? Which should I choose?
Understand data generation options.
What types of data synthesis are available and when should I use each?
Understand how evaluation works.
How does the evaluation workflow work? What do I need to set up?
Understand teacher-student training.
What is on-policy distillation and when should I use it instead of SFT?
Get an overview of the end-to-end process.
Walk me through the full workflow for building a custom model from scratch

Iteration & improvement

Understand what the scores mean.
Help me analyze my evaluation results and identify where my model is weakest
Create data to fix specific weaknesses.
Generate more training data focused on the areas where my model scored lowest in evaluation
Modify training config and try again.
Retrain my model with a lower learning rate and more epochs to see if results improve
Expand evaluation coverage.
Add a new evaluator to measure conciseness. My model's responses are too long.
Check whether your changes improved performance.
Re-evaluate my model after retraining to see if the scores improved compared to the baseline

Tips for best results

  • Be specific: include details about your task, audience, and desired format for better results.
  • Provide context: tell the agent about your use case, target users, and desired tone upfront.
  • Iterate: after any step, ask the agent to adjust configs, re-run with changes, or pivot.
  • Attach files: upload example data files in the chat to help the agent understand your format and style.