Your next steps after evaluation depend on your predefined success thresholds. In general, you should:
  • Review the summary metrics and confirm the model version, dataset, and configuration used for the run.
  • If the evaluation was saved as a recipe, use it to verify reproducibility and compare results with prior runs.
  • If metrics meet or exceed your threshold (for example, 98% safety against a 95% requirement), you may proceed to deploy the model.
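The threshold check above can be sketched as a simple gate. This is a minimal illustration, not an Oumi API: the metric names, the `meets_threshold` helper, and the example values are all hypothetical.

```python
# Hypothetical pass/fail gate against predefined success thresholds.
# Metric names and values below are illustrative, not Oumi output.

def meets_threshold(metrics: dict, thresholds: dict) -> bool:
    """Return True only if every required metric meets or exceeds its threshold."""
    return all(metrics.get(name, 0.0) >= required
               for name, required in thresholds.items())

run_metrics = {"safety": 0.98, "accuracy": 0.91}   # hypothetical evaluation results
requirements = {"safety": 0.95, "accuracy": 0.90}  # predefined success criteria

if meets_threshold(run_metrics, requirements):
    print("All thresholds met: proceed to deployment.")
else:
    print("Below threshold: inspect detailed outputs first.")
```

Keeping the comparison in one place like this makes the deployment decision reproducible: the same metrics and requirements always yield the same verdict.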
If results fall below your success criteria, examine the detailed outputs to identify failure patterns. Adjust training data or configurations as needed, and refine your evaluators if they do not accurately reflect real-world expectations. Oumi supports targeted data synthesis to address specific failure modes, enabling efficient fine-tuning and re-evaluation in the next iteration.
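One common way to surface failure patterns from detailed outputs is to group failed examples by a tag and count them. This is a hedged sketch: the record fields (`passed`, `category`) and the sample data are assumptions for illustration, not an Oumi output schema.

```python
# Sketch: grouping failed evaluation examples by category to find patterns.
# The "passed"/"category" fields and records are hypothetical.
from collections import Counter

results = [
    {"id": 1, "passed": True,  "category": "math"},
    {"id": 2, "passed": False, "category": "refusal"},
    {"id": 3, "passed": False, "category": "refusal"},
    {"id": 4, "passed": False, "category": "math"},
]

failure_counts = Counter(r["category"] for r in results if not r["passed"])
for category, count in failure_counts.most_common():
    print(f"{category}: {count} failure(s)")
```

A tally like this shows which failure mode dominates, which is exactly the signal targeted data synthesis needs: synthesize more examples for the worst category, fine-tune, and re-evaluate.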