UNDERSTANDING RESULTS

Once your evaluation job has finished, go to Evaluations and click on your evaluation to view the results. Each evaluator’s score indicates how well the model performed against the defined criteria. Clicking on Explore Results gives access to individual sample-level results.

INTERPRETING RESULTS

You can use your evaluation results to:

Compare baseline models to fine-tuned models
Identify regressions or improvements after model changes
Decide whether to retrain, adjust data, or refine evaluators
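As a minimal sketch of the first two uses, the comparison can be done by hand once you have the per-evaluator scores for each model. The function name `compare_scores`, the `tolerance` parameter, and the evaluator names below are hypothetical and not part of any platform API. The sketch assumes scores are plain numbers where higher is better.

```python
from typing import Dict


def compare_scores(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.01,
) -> Dict[str, str]:
    """Label each shared evaluator as improvement, regression, or unchanged.

    Assumes higher scores are better. `tolerance` is an illustrative
    threshold below which a difference is treated as noise.
    """
    verdicts = {}
    # Only compare evaluators that were run on both models.
    for name in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[name] - baseline[name]
        if delta > tolerance:
            verdicts[name] = "improvement"
        elif delta < -tolerance:
            verdicts[name] = "regression"
        else:
            verdicts[name] = "unchanged"
    return verdicts


# Hypothetical scores copied by hand from two evaluation runs.
baseline_scores = {"accuracy": 0.80, "safety": 0.90, "tone": 0.500}
fine_tuned_scores = {"accuracy": 0.90, "safety": 0.70, "tone": 0.505}

verdicts = compare_scores(baseline_scores, fine_tuned_scores)
for evaluator, verdict in verdicts.items():
    print(f"{evaluator}: {verdict}")
```

A regression on any evaluator is a signal to look at the sample-level results for that evaluator before deciding whether to retrain, adjust data, or refine the evaluator itself.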

Evaluation results are most useful when paired with failure mode analysis and data synthesis. By identifying where and why a model underperforms, you can systematically generate targeted data to address those weaknesses, creating a tight feedback loop between evaluation, diagnosis, and improvement. This integrated approach makes iteration more efficient and lets you measure the resulting gains in model performance.
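The diagnosis step can start with a simple tally of failing samples. The sketch below assumes you have sample-level results available as a list of records. The field names `evaluator` and `passed` and the function `count_failures` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_failures(samples: List[Dict[str, object]]) -> List[Tuple[str, int]]:
    """Count failing samples per evaluator, most frequent first."""
    counts = Counter(s["evaluator"] for s in samples if not s["passed"])
    return counts.most_common()


# Hypothetical sample-level records.
sample_results = [
    {"id": 1, "evaluator": "safety", "passed": False},
    {"id": 2, "evaluator": "accuracy", "passed": True},
    {"id": 3, "evaluator": "safety", "passed": False},
    {"id": 4, "evaluator": "accuracy", "passed": False},
]

failures = count_failures(sample_results)
for evaluator, n in failures:
    print(f"{evaluator}: {n} failing samples")
```

The evaluators at the top of this list are the natural first candidates for targeted data generation.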
