To improve a model effectively, you first need a clear way to measure its performance. In Oumi, evaluators score model outputs against defined criteria, helping you assess quality before and after training.

Built-in & Custom Evaluators

Oumi includes built-in evaluators (such as instruction following, safety, topic adherence, and truthfulness) to help you quickly establish baselines and gather early feedback. You can review, edit, and reuse these evaluators across evaluations, or create custom ones with the Builder, which lets you define the exact inputs your judge should consider. Alternatively, you can describe your desired evaluator to the Oumi Agent in natural language: specify the scoring criteria, select the evaluator model, and include additional dataset fields for context as needed.
Custom evaluators are reusable and should focus on a single, clearly defined property to ensure consistent and reliable results.
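
To make the pieces of an evaluator concrete, here is a minimal sketch in Python of the information a custom evaluator brings together: a single scoring criterion, a rubric for the judge, a judge model, and extra dataset fields for context. The class and field names below are hypothetical placeholders for illustration, not the Oumi SDK; in the product you would define these same pieces through the Builder or by describing them to the Oumi Agent.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluatorSpec:
    """Illustrative container for a custom evaluator definition.

    Hypothetical sketch, not the Oumi SDK: it only mirrors the pieces
    described above (criterion, rubric, judge model, context fields).
    """

    name: str            # reusable identifier for the evaluator
    criterion: str       # the single, clearly defined property being scored
    rubric: str          # scoring instructions the judge model follows
    judge_model: str     # the model that scores the outputs
    context_fields: list[str] = field(default_factory=list)  # extra dataset columns shown to the judge


# One evaluator per property keeps scores consistent and easy to interpret.
faithfulness = EvaluatorSpec(
    name="faithfulness-v1",
    criterion="faithfulness",
    rubric=(
        "Score 1 if the response is fully supported by the reference "
        "answer; score 0 if it contradicts it or invents information."
    ),
    judge_model="gpt-4o",  # placeholder; choose any supported judge model
    context_fields=["reference_answer"],
)

print(faithfulness)
```

Keeping each evaluator scoped to one criterion, as in this sketch, is what makes it reusable across evaluations: you can combine several narrow evaluators rather than maintaining one broad, ambiguous rubric.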

What’s next

Defining Evaluators

Establish criteria for measuring model performance

Evaluator Recipes

Save and reuse evaluator configurations