# EVALUATION RECIPE SCHEMA DEFINITIONS

> Schema definitions for creating evaluations

## OVERVIEW

This document describes the schema used to configure an evaluation.

The recipe defines:

* The model being evaluated
* The evaluators (judges) used
* Inference settings
* The dataset used for evaluation
* Whether to run failure mode analysis

<Note>Sections marked with (\*) are **required**. Within each table, required fields are marked with ✓ in the **Required** column.</Note>

***

# SCHEMA STRUCTURE

```json theme={null}
{
  "recipe": {
    "recipeConfig": {
      "type": "evaluate",
      "evaluationConfig": {}
    }
  }
}
```

***

# RECIPE \*

Root object containing the evaluation configuration.

| Field        | Type   | Required | Description                                 |
| ------------ | ------ | -------- | ------------------------------------------- |
| recipeConfig | object | ✓        | Defines the evaluation recipe configuration |

***

# RECIPECONFIG \*

Declares the recipe type and contains the evaluation configuration.

| Field            | Type   | Required | Description              |
| ---------------- | ------ | -------- | ------------------------ |
| type             | string | ✓        | Type of recipe           |
| evaluationConfig | object | ✓        | Evaluation configuration |

## ALLOWED VALUES

| Field | Value        |
| ----- | ------------ |
| type  | `"evaluate"` |

***

# EVALUATIONCONFIG \*

Defines how the model evaluation should run.

| Field                | Type      | Required | Description                            |
| -------------------- | --------- | -------- | -------------------------------------- |
| evaluationType       | string    | ✓        | Evaluation strategy                    |
| modelIdentifier      | object    | ✓        | Model to evaluate                      |
| evaluators           | object\[] | ✓        | List of evaluators to run              |
| inferenceConfig      | object    | ✓        | Inference configuration                |
| dataset              | object    | ✓        | Dataset used for evaluation            |
| generateFailureModes | boolean   | ✓        | Whether to analyze evaluation failures |

***

## EVALUATIONTYPE

Defines the evaluation approach.

| Allowed Values   |
| ---------------- |
| `"single_model"` |

***

# MODELIDENTIFIER \*

Specifies the **model being evaluated**.

| Field          | Type   | Required | Description                                |
| -------------- | ------ | -------- | ------------------------------------------ |
| modelType      | enum   | ✓        | Model provider type                        |
| modelName      | string | ✓        | Model name or identifier                   |
| modelId        | number |          | Platform model ID, used for custom models (`CUSTOM_CLOUD_STORAGE`) |
| modelVersionId | number |          | Specific model version (latest if omitted) |
| apiKeys        | object |          | Optional API keys                          |
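For illustration, a `modelIdentifier` for a custom model stored on the platform might look like the following. The `modelName`, `modelId`, and `modelVersionId` values are placeholders, not real identifiers; omit `modelVersionId` to evaluate the latest version.

```json theme={null}
{
  "modelIdentifier": {
    "modelType": "CUSTOM_CLOUD_STORAGE",
    "modelName": "my-finetuned-model",
    "modelId": 101,
    "modelVersionId": 3
  }
}
```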

***

## MODELTYPE

Supported model providers.

| Value                  | Description                             |
| ---------------------- | --------------------------------------- |
| `CUSTOM_CLOUD_STORAGE` | Custom model stored in platform storage |
| `ANTHROPIC_API`        | Anthropic API model                     |
| `OPENAI_API`           | OpenAI API model                        |
| `GEMINI_API`           | Google Gemini API model                 |
| `VERTEX_API`           | Google Vertex AI model                  |
| `OUMI_API`             | Oumi API model                          |

***

## APIKEYS

Optional provider API credentials. Supply these only if you are not using credentials already configured on the platform.

| Field        | Type   | Description              |
| ------------ | ------ | ------------------------ |
| anthropic    | string | Anthropic API key        |
| openai       | string | OpenAI API key           |
| googleGemini | string | Google Gemini API key    |
| googleVertex | string | Google Vertex AI API key |
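A minimal sketch of a `modelIdentifier` that supplies its own Gemini key. The pairing of `GEMINI_API` with the `googleGemini` key field is inferred from the MODELTYPE and APIKEYS tables rather than stated explicitly, and presumably only the key matching `modelType` needs to be set. The model name and key are placeholders.

```json theme={null}
{
  "modelIdentifier": {
    "modelType": "GEMINI_API",
    "modelName": "your-gemini-model-name",
    "apiKeys": {
      "googleGemini": "your-gemini-api-key"
    }
  }
}
```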

***

# EVALUATORS \*

List of **judge evaluators** used during evaluation.

| Field            | Type   | Required | Description                           |
| ---------------- | ------ | -------- | ------------------------------------- |
| evaluatorId      | number | ✓        | ID of the evaluator                   |
| evaluatorVersion | number |          | Evaluator version (latest if omitted) |

### EXAMPLE

```json theme={null}
{
  "evaluators": [
    {
      "evaluatorId": 10,
      "evaluatorVersion": 2
    }
  ]
}
```

***

# INFERENCECONFIG \*

Defines inference parameters for the **evaluated model**.

| Field                 | Type   | Required | Description                     |
| --------------------- | ------ | -------- | ------------------------------- |
| inferenceTemperature  | number |          | Sampling temperature            |
| inferenceMaxNewTokens | number |          | Maximum number of new tokens to generate |
| inferenceSeed         | number |          | Random seed for reproducibility |
| requestsPerMinute     | number |          | Maximum API requests per minute (rate limit) |
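An `inferenceConfig` that sets all four fields might look like this. The values are illustrative only. A low temperature combined with a fixed seed is a common choice when you want repeatable outputs, although whether a seed is honored may vary by provider (an assumption; this schema does not say).

```json theme={null}
{
  "inferenceConfig": {
    "inferenceTemperature": 0.0,
    "inferenceMaxNewTokens": 1024,
    "inferenceSeed": 1234,
    "requestsPerMinute": 60
  }
}
```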

***

# DATASET \*

Defines the dataset used for evaluation.

| Field          | Type   | Required | Description                         |
| -------------- | ------ | -------- | ----------------------------------- |
| datasetId      | number | ✓        | Dataset identifier                  |
| datasetVersion | number |          | Dataset version (latest if omitted) |
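To pin the evaluation to a specific dataset version rather than the latest one, set `datasetVersion`. The IDs below are placeholders.

```json theme={null}
{
  "dataset": {
    "datasetId": 42,
    "datasetVersion": 1
  }
}
```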

***

# GENERATEFAILUREMODES \*

Controls whether failure mode analysis is generated.

| Value   | Description                           |
| ------- | ------------------------------------- |
| `true`  | Analyze and categorize model failures |
| `false` | Skip failure mode analysis            |

Failure modes may include:

* hallucination
* incorrect reasoning
* missing information
* formatting errors

***

# COMPLETE EXAMPLE

```json theme={null}
{
  "recipe": {
    "recipeConfig": {
      "type": "evaluate",
      "evaluationConfig": {
        "evaluationType": "single_model",
        "modelIdentifier": {
          "modelType": "OPENAI_API",
          "modelName": "gpt-4.1",
          "apiKeys": {
            "openai": "sk-xxxxxxxx"
          }
        },
        "evaluators": [
          {
            "evaluatorId": 12
          }
        ],
        "inferenceConfig": {
          "inferenceTemperature": 0.2,
          "inferenceMaxNewTokens": 512
        },
        "dataset": {
          "datasetId": 42
        },
        "generateFailureModes": true
      }
    }
  }
}
```

***

# VALIDATION RULES

* `evaluators` must contain at least one evaluator
* `recipeConfig.type` must equal `"evaluate"`
* `evaluationType` must equal `"single_model"`
* `modelName` is required
* `modelType` must be one of the values listed under MODELTYPE
* `datasetId` must reference a valid dataset
* `generateFailureModes` must be a boolean
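Read together, the tables and rules above suggest that the following is a minimal valid recipe with every optional field omitted. This is inferred from the **Required** columns rather than stated explicitly: `inferenceConfig` is required but has no required fields of its own, so it is shown as an empty object, and `apiKeys` is omitted on the assumption that platform credentials are configured for the provider. The model name and IDs are placeholders.

```json theme={null}
{
  "recipe": {
    "recipeConfig": {
      "type": "evaluate",
      "evaluationConfig": {
        "evaluationType": "single_model",
        "modelIdentifier": {
          "modelType": "ANTHROPIC_API",
          "modelName": "your-anthropic-model-name"
        },
        "evaluators": [
          {
            "evaluatorId": 12
          }
        ],
        "inferenceConfig": {},
        "dataset": {
          "datasetId": 42
        },
        "generateFailureModes": false
      }
    }
  }
}
```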