To run an evaluation against an existing model, go to Evaluations and click Run an Evaluation, then select Judge-Based Evaluation in the Builder. Under the INPUTS tab, provide the following information:
  • Model - A hosted or custom model for evaluation.
  • Evaluators - One or more evaluators to score model outputs.
  • Dataset - The dataset to evaluate against.
  • Failure Mode Analysis (optional) - Whether to generate failure modes automatically.
  • Inference Configurations (optional) - Inference parameters like Temperature, Max Tokens, Seed, Requests Per Minute.
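The inputs above map naturally onto a single job configuration. The sketch below illustrates that shape in Python; all field names and values are hypothetical and do not reflect the platform's actual API schema.

```python
# Illustrative sketch of the inputs a judge-based evaluation job collects.
# Every key and value here is a made-up example, not a real API contract.
evaluation_job = {
    "model": "my-hosted-model",           # hosted or custom model to evaluate
    "evaluators": ["helpfulness-judge"],  # one or more evaluators that score outputs
    "dataset": "support-tickets-v2",      # dataset to evaluate against
    "failure_mode_analysis": True,        # optional: auto-generate failure modes
    "inference_config": {                 # optional inference parameters
        "temperature": 0.7,
        "max_tokens": 1024,
        "seed": 42,
        "requests_per_minute": 60,
    },
}

# Sanity-check the required inputs before launching.
assert evaluation_job["model"], "a model is required"
assert evaluation_job["evaluators"], "at least one evaluator is required"
assert evaluation_job["dataset"], "a dataset is required"
```

The optional fields can simply be omitted; the required trio (model, evaluators, dataset) is what the Builder insists on before the job can launch.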
After confirming and launching the evaluation job, you can view the results on the Evaluations page.