The following prompts enable a range of machine learning tasks and workflows, including model development, data generation, evaluation, and project management.
Please note that while this collection highlights common use cases, it represents only a subset of the Oumi Agent’s full capabilities.
End-to-end model building
Build a Customer Support Bot
Build a full workflow to create a model that handles customer inquiries. Help me build a custom model for handling customer support tickets, including refunds, billing questions, and shipping issues
Build a Coding Assistant
Build an end-to-end workflow for a code-focused model. Build me a coding assistant that can explain errors, generate code snippets, and review pull requests
Build a Tutoring Model
Create a tutoring model for students. Help me train a model that tutors middle school students in algebra, explaining concepts step by step
Build a Brand Voice Model
Train a model that mimics a specific tone and style. I want to build a model that writes marketing copy in our brand voice: friendly, concise, and professional
Build a Medical Triage Assistant
Create a model for initial patient intake. Help me create a model that performs initial medical triage by asking patients about symptoms and suggesting urgency levels
Build a Legal Document Summarizer
Train a model to summarize legal texts. Build me a model that summarizes legal contracts into plain-language bullet points for non-lawyers
Build a Sales Email Generator
Create a model for drafting outbound sales emails. Help me train a model that drafts personalized cold outreach emails based on prospect company and role
Build a Content Moderator
Train a model to flag inappropriate content. Build a content moderation model that classifies user-generated posts as safe, needs review, or policy violation
Data synthesis: general
Generate Training Data from Scratch
Create a synthetic dataset for a specific task. Generate 500 synthetic training examples for a customer support chatbot that handles refund requests, billing disputes, and account issues
Generate Diverse Scenario Data
Create data covering a wide range of situations. Generate a training dataset with diverse scenarios for a travel booking assistant, covering flights, hotels, cancellations, and itinerary changes
Generate Multi-Turn Conversations
Create data with realistic back-and-forth dialogue. Generate synthetic multi-turn conversations between a user and a technical support agent troubleshooting Wi-Fi connectivity issues
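One common on-disk shape for multi-turn data is a list of role/content messages per sample (the OpenAI-style chat format); the exact schema your training tooling expects may differ. A minimal sketch:

```python
# Sketch of one multi-turn training sample as a list of chat messages.
# The role/content shape is a common convention; adjust to your tooling's schema.

def make_turn(role: str, content: str) -> dict:
    """Build one message, validating the role."""
    assert role in {"system", "user", "assistant"}, f"unknown role: {role}"
    return {"role": role, "content": content}

conversation = [
    make_turn("system", "You are a technical support agent for Wi-Fi issues."),
    make_turn("user", "My laptop keeps dropping the Wi-Fi connection."),
    make_turn("assistant", "Let's start simple: does it drop on other networks too?"),
    make_turn("user", "No, only on my home network."),
    make_turn("assistant", "Then the router is the likely culprit. Try rebooting it first."),
]

# Structural check: after the system turn, roles must alternate user/assistant.
roles = [m["role"] for m in conversation[1:]]
assert roles == ["user", "assistant"] * (len(roles) // 2)
```

Validating the alternation up front catches malformed samples before they reach training.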
Generate Edge Case Data
Create data targeting tricky or unusual inputs. Generate training data focused on edge cases for a food ordering bot: ambiguous orders, dietary restrictions, out-of-stock items, and off-topic requests
Generate Data with Specific Tone
Create data that follows a particular communication style. Generate training examples for a friendly, casual chatbot that helps users pick outfit recommendations based on occasion and weather
Generate Classification Data
Create labeled data for classification tasks. Generate 300 examples of customer feedback classified into categories: product quality, shipping speed, customer service, and pricing
Generate Structured Output Data
Create data where responses follow a specific format. Generate training examples where the assistant responds with structured JSON containing fields: intent, confidence, and suggested_action
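When generating structured-output data like the example above, it helps to validate each sample against the target schema. A minimal sketch, using the intent/confidence/suggested_action fields from the prompt (type and range rules are illustrative assumptions):

```python
import json

# Hypothetical schema for the structured-output example: each assistant
# response is a JSON object with intent, confidence, and suggested_action.
REQUIRED_FIELDS = {"intent": str, "confidence": float, "suggested_action": str}

def validate_response(raw: str) -> dict:
    """Parse a model response and check it matches the expected schema."""
    obj = json.loads(raw)
    for field, typ in REQUIRED_FIELDS.items():
        if field not in obj:
            raise ValueError(f"missing field: {field}")
        if not isinstance(obj[field], typ):
            raise TypeError(f"{field} should be {typ.__name__}")
    if not 0.0 <= obj["confidence"] <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return obj

sample = '{"intent": "refund_request", "confidence": 0.92, "suggested_action": "route_to_billing"}'
parsed = validate_response(sample)
```

Running every generated sample through a validator like this keeps malformed JSON out of the training set.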
Generate Test/Evaluation Data
Create a held-out dataset for evaluation purposes. Generate 100 evaluation examples for a customer support bot covering common and edge-case scenarios to use as a test set
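The key property of a held-out test set is that it never overlaps the training data. A deterministic split is a simple way to guarantee that (the 80/20 fraction and seed below are illustrative):

```python
import random

def train_test_split(examples: list, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle deterministically, then hold out the last test_fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = examples[:]          # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

examples = [f"example_{i}" for i in range(500)]
train, test = train_test_split(examples, test_fraction=0.2)
```

Fixing the seed makes the split reproducible, so later evaluations compare against the same held-out examples.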
Data synthesis: improved samples
Improve Existing Dataset Quality
Refine and enhance samples in an existing dataset. Improve the quality of samples in my existing dataset by making responses more detailed, accurate, and consistent in tone
Improve Low-Scoring Samples
Target improvements on data that scored poorly in evaluation. Improve the samples in my dataset that scored low on helpfulness and accuracy based on my evaluation results
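Targeted improvement amounts to filtering: keep the samples that scored well and route only the low scorers to a rewrite step. A minimal sketch with invented field names and scores:

```python
# Sketch: select samples that scored below a threshold on a judge axis,
# so a synthesis step can rewrite only those. Fields/scores are illustrative.
dataset = [
    {"prompt": "How do I get a refund?", "response": "Check the site.", "helpfulness": 0.3},
    {"prompt": "Why was I double-billed?", "response": "Duplicate charges usually reverse within 3 days.", "helpfulness": 0.9},
    {"prompt": "Where is my package?", "response": "No idea.", "helpfulness": 0.2},
]

THRESHOLD = 0.5
needs_rewrite = [s for s in dataset if s["helpfulness"] < THRESHOLD]
keep = [s for s in dataset if s["helpfulness"] >= THRESHOLD]
```

Rewriting only `needs_rewrite` is cheaper than regenerating the whole dataset and preserves samples that already score well.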
Data synthesis: generate completions
Generate Completions for Prompt-Only Data
Generate model completions for a prompt-only dataset. Generate completions for my prompt-only dataset using GPT-4o
Replace Existing Completions
Re-generate responses with a different model. Generate new completions for my dataset, replacing the existing responses, using Claude
Generate Completions with System Instruction
Add responses with a specific persona or behavior. Generate completions for my dataset using GPT-4o with the system instruction: You are a helpful and concise technical support agent
Generate Completions with Custom Temperature
Control response creativity/randomness. Generate completions for my dataset using a temperature of 0.3 for more deterministic responses
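Why does a temperature of 0.3 make responses more deterministic? Sampling temperature divides the model's logits before the softmax, which sharpens the probability distribution toward the top token. A self-contained sketch of the effect (the logits are invented):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cool = softmax_with_temperature(logits, 0.3)   # near-deterministic
warm = softmax_with_temperature(logits, 1.0)   # more spread out

# The top token gets far more probability mass at temperature 0.3 than at 1.0.
assert cool[0] > warm[0]
```

At temperature 0.3 the top token here takes roughly 96% of the mass versus about 63% at 1.0, which is why low temperatures suit tasks needing consistent answers and higher ones suit creative generation.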
Model training
Fine-Tune with LoRA
Parameter-efficient fine-tuning on your data. Fine-tune a model on my training dataset using LoRA
Run Full Fine-Tuning
Update all model weights for maximum customization. Set up full fine-tuning for my model on my training dataset
Train with On-Policy Distillation
Use a teacher model to train a student model. Train a model using on-policy distillation with a teacher model
Train with Specific Hyperparameters
Customize training configuration. Fine-tune a model on my dataset with 3 epochs, a learning rate of 2e-5, and LoRA rank 16
Train with Validation Set
Include a validation dataset for monitoring. Train a model on my training dataset and use my validation dataset to monitor training progress
Fine-Tune a Specific Model
Point training at a known dataset. Fine-tune Llama on my customer support dataset
Evaluation: evaluator/judge creation
Create a Helpfulness Judge
Evaluate how helpful model responses are. Create an evaluator that judges how helpful and complete the model's responses are
Create an Accuracy Judge
Evaluate factual correctness. Create an evaluator that judges whether the model's responses are factually accurate and free of hallucinations
Create a Safety Judge
Evaluate responses for harmful content. Create an evaluator that judges whether responses are safe, avoiding harmful, biased, or inappropriate content
Create a Tone Judge
Evaluate communication style. Create an evaluator that judges whether the model maintains a friendly, professional tone throughout its responses
Create an Instruction-following Judge
Evaluate adherence to instructions. Create an evaluator that judges how well the model follows the specific instructions given in the user's prompt
Create a Conciseness Judge
Evaluate response brevity. Create an evaluator that judges whether responses are concise and to the point without unnecessary verbosity
Create a Code Quality Judge
Evaluate generated code. Create an evaluator that judges the quality of generated code: correctness, readability, and adherence to best practices
Create a Multi-Axis Evaluation Suite
Set up multiple evaluators at once. Help me create evaluators for helpfulness, accuracy, safety, and tone for my customer support model
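Once several evaluators are running, the useful output is a per-axis summary that points at the weakest axis. A minimal aggregation sketch (axis names match the evaluators above; the scores are invented for illustration):

```python
from statistics import mean

# Per-sample judge scores, grouped by evaluation axis. Values are invented.
scores = {
    "helpfulness": [0.9, 0.7, 0.8],
    "accuracy":    [1.0, 0.6, 0.9],
    "safety":      [1.0, 1.0, 1.0],
    "tone":        [0.8, 0.9, 0.6],
}

# Average each axis, then find where the model is weakest.
report = {axis: round(mean(vals), 3) for axis, vals in scores.items()}
weakest_axis = min(report, key=report.get)
```

The weakest axis is a natural input to the iteration loop later in this document: generate more training data targeting that axis, retrain, and re-evaluate.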
Evaluation: running evaluations
Run a Baseline Evaluation
Benchmark an unmodified model before training. Run a baseline evaluation on the base model using my test dataset and evaluators before I fine-tune it
Evaluate a Fine-Tuned Model
Assess quality after training. Evaluate my fine-tuned model using the same test dataset and evaluators I used for the baseline
Evaluate an API Model
Test an API model’s performance on your task. Evaluate GPT-4o on my test dataset using my evaluators to see how it performs on my task
Compare Two Models
Benchmark two models side by side. Help me compare my fine-tuned model against the base model by running evaluations on both with the same dataset and evaluators
Evaluate with Specific Dataset
Run evaluation on a particular dataset. Run an evaluation on my model using my latest test dataset
Project exploration & resource management
See what datasets exist in your project. Show me all the datasets in my project
Inspect what’s inside a dataset. Show me what's inside my dataset. Preview the first few items.
See models you’ve fine-tuned. Show me all the models I've trained
See available judges. What evaluators do I have set up?
See past evaluation runs. Show me all my evaluation runs and their results
Monitor running operations. What's the status of my running jobs?
Locate the most recently created item. Show me the most recently created dataset
List Available Models for Training
See what models are available for fine-tuning. What models are available for training?
List Available Models for Synthesis
See models you can use for data generation. What models can I use for data synthesis?
List Available Models for Evaluation
See models you can use as judges. What models are available for evaluation?
Pick up where you left off. Where did we leave off? What should I do next?
Investigate what went wrong. Show me any failed operations and what went wrong
Resource cleanup
Remove a dataset from the project. Delete my old test dataset
Remove a trained model. Delete the model I trained last week
Remove an evaluation run. Delete my failed evaluation run
Remove a judge configuration. Delete the evaluator I'm no longer using
Concepts & explanations
Explain Training Methods
Understand SFT vs OPD. What training methods do you support and when should I use each?
Explain LoRA vs Full Fine-Tuning
Understand parameter update strategies. What's the difference between LoRA and full fine-tuning? Which should I choose?
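A back-of-the-envelope calculation makes the LoRA vs full fine-tuning trade-off concrete: LoRA approximates the weight update to a d_out × d_in matrix with a low-rank product B @ A, where B is d_out × r and A is r × d_in. A sketch with a hidden size typical of a ~7B model (illustrative numbers, per weight matrix):

```python
# Trainable-parameter comparison for one d_out x d_in weight matrix:
# full fine-tuning updates the whole matrix; rank-r LoRA trains only B and A.

def full_params(d_out: int, d_in: int) -> int:
    return d_out * d_in

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    return d_out * rank + rank * d_in

d_out = d_in = 4096          # hidden size typical of a ~7B model
rank = 16

full = full_params(d_out, d_in)        # 16,777,216 parameters per matrix
lora = lora_params(d_out, d_in, rank)  # 131,072 parameters per matrix
reduction = full / lora                # 128x fewer trainable parameters
```

This is why LoRA fits on smaller GPUs and trains faster, while full fine-tuning updates every weight and offers maximum customization at much higher cost.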
Explain Data Synthesis Options
Understand data generation options. What types of data synthesis are available and when should I use each?
Explain Evaluation Workflow
Understand how evaluation works. How does the evaluation workflow work? What do I need to set up?
Explain On-Policy Distillation
Understand teacher-student training. What is on-policy distillation and when should I use it instead of SFT?
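The core signal in on-policy distillation is a divergence between the student's and teacher's next-token distributions, computed on outputs the student itself generates. A toy sketch of that per-token loss, using reverse KL as the divergence (the distributions and the choice of reverse KL here are illustrative assumptions, not a statement of any particular implementation):

```python
import math

# Toy sketch of the on-policy distillation signal: the loss pushes the
# student's next-token distribution toward the teacher's on the student's
# own samples. Distributions over a 3-word toy vocabulary are invented.

def reverse_kl(student: list, teacher: list) -> float:
    """KL(student || teacher), summed over the vocabulary."""
    return sum(s * math.log(s / t) for s, t in zip(student, teacher) if s > 0)

teacher_probs = [0.7, 0.2, 0.1]
confident_student = [0.72, 0.18, 0.10]   # close to the teacher: small loss
drifted_student   = [0.10, 0.10, 0.80]   # far from the teacher: large loss

low_loss = reverse_kl(confident_student, teacher_probs)
high_loss = reverse_kl(drifted_student, teacher_probs)
assert low_loss < high_loss   # training decreases this divergence
```

Because the divergence is measured on the student's own generations rather than a fixed dataset, the teacher corrects the mistakes the student actually makes, which is the "on-policy" part of the method.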
Understand the Full Workflow
Get an overview of the end-to-end process. Walk me through the full workflow for building a custom model from scratch
Iteration & improvement
Analyze Evaluation Results
Understand what the scores mean. Help me analyze my evaluation results and identify where my model is weakest
Generate Targeted Training Data
Create data to fix specific weaknesses. Generate more training data focused on the areas where my model scored lowest in evaluation
Retrain with Adjusted Hyperparameters
Modify training config and try again. Retrain my model with a lower learning rate and more epochs to see if results improve
Add a New Evaluator
Expand evaluation coverage. Add a new evaluator to measure conciseness. My model's responses are too long.
Re-Evaluate After Changes
Check whether your adjustments helped. Re-evaluate my model after changes to see if performance has improved
Re-Evaluate After Retraining
Compare post-retraining scores to earlier runs. Re-evaluate my model after retraining to see if the scores improved compared to the baseline
Tips for best results
Be specific: include details about your task, audience, and desired format for better results.
Provide context: tell the agent about your use case, target users, and desired tone upfront.
Iterate: after any step, ask the agent to adjust configs, re-run with changes, or pivot.
Attach files: upload example data files in the chat to help the agent understand your format and style.