The following prompts enable a range of machine learning tasks and workflows, including model development, data generation, evaluation, and project management.
Please note that while this collection highlights common use cases, it represents only a subset of the Oumi Agent’s full capabilities.

End-to-end model building

Build a full workflow to create a model that handles customer inquiries.
Help me build a custom model for handling customer support tickets, including refunds, billing questions, and shipping issues
Build an end-to-end workflow for a code-focused model.
Build me a coding assistant that can explain errors, generate code snippets, and review pull requests
Create a tutoring model for students.
Help me train a model that tutors middle school students in algebra, explaining concepts step by step
Train a model that mimics a specific tone and style.
I want to build a model that writes marketing copy in our brand voice: friendly, concise, and professional
Create a model for initial patient intake.
Help me create a model that performs initial medical triage by asking patients about symptoms and suggesting urgency levels
Create a model for drafting outbound sales emails.
Help me train a model that drafts personalized cold outreach emails based on prospect company and role
Train a model to flag inappropriate content.
Build a content moderation model that classifies user-generated posts as safe, needs review, or policy violation

Data synthesis: general

Create a synthetic dataset for a specific task.
Generate 500 synthetic training examples for a customer support chatbot that handles refund requests, billing disputes, and account issues
Create data covering a wide range of situations.
Generate a training dataset with diverse scenarios for a travel booking assistant, covering flights, hotels, cancellations, and itinerary changes
Create data with realistic back-and-forth dialogue.
Generate synthetic multi-turn conversations between a user and a technical support agent troubleshooting Wi-Fi connectivity issues
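Multi-turn examples like the one requested above are commonly stored as a list of role-tagged messages. A minimal sketch of that shape, using the widely used `role`/`content` message schema (the exact layout Oumi expects may differ):

```python
import json

# One synthetic multi-turn example in the common chat-messages format.
# Field names ("role", "content") are illustrative assumptions, not a
# confirmed Oumi schema.
example = {
    "messages": [
        {"role": "user", "content": "My Wi-Fi keeps dropping every few minutes."},
        {"role": "assistant", "content": "Let's narrow it down. Does it drop on all devices, or just one?"},
        {"role": "user", "content": "Just my laptop."},
        {"role": "assistant", "content": "Try forgetting the network and reconnecting, then update your Wi-Fi driver."},
    ]
}

# Datasets of conversations are often stored one example per line (JSONL).
line = json.dumps(example)
```

Alternating user/assistant turns like this teach the model to ask clarifying questions before proposing a fix.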
Create data targeting tricky or unusual inputs.
Generate training data focused on edge cases for a food ordering bot: ambiguous orders, dietary restrictions, out-of-stock items, and off-topic requests
Create data that follows a particular communication style.
Generate training examples for a friendly, casual chatbot that helps users pick outfit recommendations based on occasion and weather
Create labeled data for classification tasks.
Generate 300 examples of customer feedback classified into categories: product quality, shipping speed, customer service, and pricing
Create data where responses follow a specific format.
Generate training examples where the assistant responds with structured JSON containing fields: intent, confidence, and suggested_action
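The structured-output prompt above names three fields. A quick sketch of what one such target completion could look like (field names come from the prompt; the values are illustrative):

```python
import json

# A target completion in the structured format requested above.
# Values are made up for illustration.
response = {
    "intent": "refund_request",
    "confidence": 0.92,
    "suggested_action": "route_to_billing",
}

# Training pairs would store the serialized JSON as the assistant's reply,
# so the fine-tuned model learns to emit parseable output.
completion = json.dumps(response)
parsed = json.loads(completion)
```

Round-tripping each generated example through a JSON parser like this is a cheap way to filter out malformed completions before training.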
Create a held-out dataset for evaluation purposes.
Generate 100 evaluation examples for a customer support bot covering common and edge-case scenarios to use as a test set

Data synthesis: improved samples

Refine and enhance samples in an existing dataset.
Improve the quality of samples in my existing dataset by making responses more detailed, accurate, and consistent in tone
Target improvements on data that scored poorly in evaluation.
Improve the samples in my dataset that scored low on helpfulness and accuracy based on my evaluation results

Data synthesis: generate completions

Generate model completions for a prompt-only dataset.
Generate completions for my prompt-only dataset using GPT-4o
Re-generate responses with a different model.
Generate new completions for my dataset, replacing the existing responses, using Claude
Add responses with a specific persona or behavior.
Generate completions for my dataset using GPT-4o with the system instruction: You are a helpful and concise technical support agent
Control response creativity/randomness.
Generate completions for my dataset using a temperature of 0.3 for more deterministic responses
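Why a low temperature makes responses more deterministic: temperature divides the model's raw scores before they are turned into probabilities, so small values sharpen the distribution toward the top token. A self-contained sketch with made-up logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
p_default = softmax(logits, temperature=1.0)
p_cool = softmax(logits, temperature=0.3)

# At temperature 0.3, probability mass concentrates on the top-scoring
# token, so sampled responses vary less from run to run.
```

At temperature 1.0 the top token gets roughly 63% of the mass here; at 0.3 it gets about 96%.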

Model training

Parameter-efficient fine-tuning on your data.
Fine-tune a model on my training dataset using LoRA
Update all model weights for maximum customization.
Set up full fine-tuning for my model on my training dataset
Use a teacher model to train a student model.
Train a model using on-policy distillation with a teacher model
Customize training configuration.
Fine-tune a model on my dataset with 3 epochs, a learning rate of 2e-5, and LoRA rank 16
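A sense of why LoRA rank 16 is "parameter-efficient": instead of updating a full weight matrix, LoRA trains two small low-rank factors. A back-of-the-envelope comparison with illustrative dimensions (not tied to any specific model):

```python
# Trainable parameters for one hypothetical weight matrix:
# full fine-tuning updates every weight; LoRA trains two low-rank
# adapters A (rank x d_in) and B (d_out x rank).
d_out, d_in = 4096, 4096   # illustrative size for one projection layer
rank = 16

full_params = d_out * d_in           # 16,777,216 weights updated
lora_params = rank * (d_in + d_out)  # 131,072 adapter weights

ratio = lora_params / full_params    # well under 1% of the full count
```

The same arithmetic per layer is why LoRA runs fit on much smaller GPUs than full fine-tuning.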
Include a validation dataset for monitoring.
Train a model on my training dataset and use my validation dataset to monitor training progress
Point training at a known dataset.
Fine-tune Llama on my customer support dataset

Evaluation: evaluator/judge creation

Evaluate how helpful model responses are.
Create an evaluator that judges how helpful and complete the model's responses are
Evaluate factual correctness.
Create an evaluator that judges whether the model's responses are factually accurate and free of hallucinations
Evaluate responses for harmful content.
Create an evaluator that judges whether responses are safe, avoiding harmful, biased, or inappropriate content
Evaluate communication style.
Create an evaluator that judges whether the model maintains a friendly, professional tone throughout its responses
Evaluate adherence to instructions.
Create an evaluator that judges how well the model follows the specific instructions given in the user's prompt
Evaluate response brevity.
Create an evaluator that judges whether responses are concise and to the point without unnecessary verbosity
Evaluate generated code.
Create an evaluator that judges the quality of generated code: correctness, readability, and adherence to best practices
Set up multiple evaluators at once.
Help me create evaluators for helpfulness, accuracy, safety, and tone for my customer support model

Evaluation: running evaluations

Benchmark an unmodified model before training.
Run a baseline evaluation on the base model using my test dataset and evaluators before I fine-tune it
Assess quality after training.
Evaluate my fine-tuned model using the same test dataset and evaluators I used for the baseline
Test an API model’s performance on your task.
Evaluate GPT-4o on my test dataset using my evaluators to see how it performs on my task
Benchmark two models side by side.
Help me compare my fine-tuned model against the base model by running evaluations on both with the same dataset and evaluators
Run evaluation on a particular dataset.
Run an evaluation on my model using my latest test dataset

Project exploration & resource management

See what datasets exist in your project.
List all my datasets
Inspect what’s inside a dataset.
Show me what's inside my dataset. Preview the first few items.
See models you’ve fine-tuned.
Show me all the models I've trained
See available judges.
What evaluators do I have set up?
See past evaluation runs.
Show me all my evaluation runs and their results
Monitor running operations.
What's the status of my running jobs?
Locate the most recently created item.
Show me the most recently created dataset
See what models are available for fine-tuning.
What models are available for training?

See models you can use for data generation.
What models can I use for data synthesis?
See models you can use as judges.
What models are available for evaluation?
Pick up where you left off.
Where did we leave off? What should I do next?
Investigate what went wrong.
Show me any failed operations and what went wrong

Resource cleanup

Remove a dataset from the project.
Delete my old test dataset
Remove a trained model.
Delete the model I trained last week
Remove an evaluation run.
Delete my failed evaluation run
Remove a judge configuration.
Delete the evaluator I'm no longer using

Platform knowledge

Understand SFT vs OPD.
What training methods do you support and when should I use each?
Understand parameter update strategies.
What's the difference between LoRA and full fine-tuning? Which should I choose?
Understand data generation options.
What types of data synthesis are available and when should I use each?
Understand how evaluation works.
How does the evaluation workflow work? What do I need to set up?
Understand teacher-student training.
What is on-policy distillation and when should I use it instead of SFT?
Get an overview of the end-to-end process.
Walk me through the full workflow for building a custom model from scratch

Iteration & improvement

Understand what the scores mean.
Help me analyze my evaluation results and identify where my model is weakest
Create data to fix specific weaknesses.
Generate more training data focused on the areas where my model scored lowest in evaluation
Modify training config and try again.
Retrain my model with a lower learning rate and more epochs to see if results improve
Expand evaluation coverage.
Add a new evaluator to measure conciseness. My model's responses are too long.
Check whether your changes improved performance.
Re-evaluate my model after retraining to see if the scores improved compared to the baseline

Tips for best results

  • Be specific: include details about your task, audience, and desired format for better results.
  • Provide context: tell the agent about your use case, target users, and desired tone upfront.
  • Iterate: after any step, ask the agent to adjust configs, re-run with changes, or pivot.
  • Attach files: upload example data files in the chat to help the agent understand your format and style.