Once your evaluation job has finished, go to Evaluations and click on your evaluation to view the results. Each evaluator’s score indicates how well the model performed against the defined criteria. Clicking on Explore Results gives access to individual sample-level results.

Interpreting results

You can use your evaluation results to:
  • Compare baseline models to fine-tuned models
  • Identify regressions or improvements after model changes
  • Decide whether to retrain, adjust data, or refine evaluators
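As a rough illustration of the first two points, the sketch below compares per-evaluator scores from a baseline run and a fine-tuned run and labels each evaluator as a regression, an improvement, or unchanged. The score dictionaries, the evaluator names, the `compare_scores` helper, and the `tolerance` value are all hypothetical, and the sketch assumes higher scores are better. It does not reflect any particular export format, so adapt it to however you copy scores out of the results view.

```python
from typing import Dict


def compare_scores(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.01,
) -> Dict[str, str]:
    """Label each shared evaluator by how the candidate moved versus the baseline.

    Assumes higher scores are better. Differences within +/- tolerance are
    treated as noise and reported as "unchanged".
    """
    verdicts = {}
    # Only compare evaluators that appear in both runs.
    for name in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[name] - baseline[name]
        if delta < -tolerance:
            verdicts[name] = "regression"
        elif delta > tolerance:
            verdicts[name] = "improvement"
        else:
            verdicts[name] = "unchanged"
    return verdicts


# Hypothetical scores, entered by hand for illustration only.
baseline_scores = {"accuracy": 0.82, "helpfulness": 0.74, "conciseness": 0.90}
finetuned_scores = {"accuracy": 0.88, "helpfulness": 0.745, "conciseness": 0.85}

verdicts = compare_scores(baseline_scores, finetuned_scores)
for evaluator, verdict in verdicts.items():
    print(f"{evaluator}: {verdict}")
```

A tolerance band like this is one simple way to avoid reacting to small score fluctuations. The right threshold depends on your evaluators and sample size.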
Evaluation results are most useful when paired with failure mode analysis and data synthesis. Once you identify where and why a model underperforms, you can generate targeted data to address those weaknesses. This creates a feedback loop between evaluation, diagnosis, and improvement, so each iteration focuses on the weaknesses the previous evaluation exposed.
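One possible way to start the diagnosis step is to tally failing samples by failure mode, so the most common weaknesses are addressed first. The sketch below assumes each sample-level result has been reduced to a dictionary with a `score` and a manually assigned `failure_mode` label. These field names, the `failure_mode_counts` helper, and the `pass_threshold` value are hypothetical and not part of any documented schema.

```python
from collections import Counter
from typing import Dict, List, Tuple


def failure_mode_counts(
    samples: List[Dict], pass_threshold: float = 0.5
) -> List[Tuple[str, int]]:
    """Count failure modes among samples scoring below pass_threshold.

    Returns (failure_mode, count) pairs, most frequent first.
    """
    failing = [s for s in samples if s["score"] < pass_threshold]
    counts = Counter(s["failure_mode"] for s in failing)
    return counts.most_common()


# Hypothetical sample-level results, labeled by hand for illustration only.
samples = [
    {"id": 1, "score": 0.9, "failure_mode": None},
    {"id": 2, "score": 0.2, "failure_mode": "wrong_format"},
    {"id": 3, "score": 0.1, "failure_mode": "hallucination"},
    {"id": 4, "score": 0.3, "failure_mode": "wrong_format"},
]

ranked = failure_mode_counts(samples)
for mode, count in ranked:
    print(f"{mode}: {count} failing samples")
```

The failure modes at the top of the ranking are natural candidates for targeted data generation in the next iteration.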