To run an evaluation against an existing model, go to Evaluations and click Run an Evaluation, then select Judge-Based Evaluation in the Builder. Under the INPUTS tab, provide the following information:
  • Model - A hosted or custom model for evaluation.
  • Evaluators - One or more evaluators to score model outputs.
  • Dataset - The dataset to evaluate against.
  • Failure Mode Analysis (optional) - Whether to generate failure modes automatically.
  • Inference Configurations (optional) - Inference parameters like Temperature, Max Tokens, Seed, Requests Per Minute.
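The inputs above map naturally onto a single job configuration. The sketch below illustrates that shape in Python; all field names and values are hypothetical and do not reflect the platform's actual API schema.

```python
# Illustrative sketch of the inputs a judge-based evaluation job collects.
# Every key and value here is a made-up example, not a real API contract.
evaluation_job = {
    "model": "my-hosted-model",           # hosted or custom model to evaluate
    "evaluators": ["helpfulness-judge"],  # one or more evaluators that score outputs
    "dataset": "support-tickets-v2",      # dataset to evaluate against
    "failure_mode_analysis": True,        # optional: auto-generate failure modes
    "inference_config": {                 # optional inference parameters
        "temperature": 0.7,
        "max_tokens": 1024,
        "seed": 42,
        "requests_per_minute": 60,
    },
}

# Sanity-check the required inputs before launching.
assert evaluation_job["model"], "a model is required"
assert evaluation_job["evaluators"], "at least one evaluator is required"
assert evaluation_job["dataset"], "a dataset is required"
```

The optional fields can simply be omitted; the required trio (model, evaluators, dataset) is what the Builder insists on before the job can launch.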
After confirming and launching the evaluation job, you can view the results on the Evaluations page.