> ## Documentation Index
> Fetch the complete documentation index at: https://docs.oumi.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# EVALUATOR RECIPE SCHEMA DEFINITIONS

> Schema definitions for configuring evaluators

## OVERVIEW

The following describes the configuration schema used to create evaluators in Oumi, including:

* Metadata about the evaluator
* Evaluation parameters
* Model configuration
* Scoring and explanation settings

<Note>Fields marked with (\*) are **required**.</Note>

***

## SCHEMA STRUCTURE

```json theme={null}
{
  "displayName": "",
  "description": "",
  "params": {},
  "modelIdentifier": {},
  "inferenceConfig": {},
  "responseFilterMode": "",
  "generateScoreExplanation": true
}
```

***

# DISPLAYNAME \*

Human-readable name of the evaluator.

## PROPERTIES

| Field       | Type   | Required | Description                           |
| ----------- | ------ | -------- | ------------------------------------- |
| displayName | string | ✓        | Name used to identify the evaluator   |
| description | string |          | Optional description of the evaluator |

### CONSTRAINTS

| Field       | Constraint          |
| ----------- | ------------------- |
| displayName | Minimum length: `1` |

### EXAMPLE

```json theme={null}
{
  "displayName": "Answer Quality Evaluator",
  "description": "Evaluates model responses using a judge model."
}
```

***

# PARAMS \*

Defines the evaluation parameters and scoring behavior.

## PROPERTIES

| Field          | Type    | Required | Description                                |
| -------------- | ------- | -------- | ------------------------------------------ |
| evaluatorType  | string  | ✓        | Type of evaluator                          |
| prompt         | string  | ✓        | Prompt used to guide the evaluation model  |
| judgmentScores | object  |          | Definition of scoring categories           |
| dataFields     | object  |          | Defines input fields used by the evaluator |
| isMultiturn    | boolean |          | Indicates whether evaluation is multi-turn |

***

## EVALUATORTYPE

Defines the evaluator mechanism.

| Allowed Values |
| -------------- |
| `"judge"`      |

***

## PROMPT \*

Prompt provided to the evaluator model to guide scoring behavior.

### CONSTRAINTS

| Field  | Constraint          |
| ------ | ------------------- |
| prompt | Minimum length: `1` |

### EXAMPLE

```json theme={null}
{
  "params": {
    "evaluatorType": "judge",
    "prompt": "Evaluate whether the assistant's response is accurate and helpful."
  }
}
```

***

## JUDGMENTSCORES

Defines the scoring structure used by the evaluator.

This object typically contains the set of **evaluation metrics** or **labels** the evaluator should output.

Example structure:

```json theme={null}
{
  "judgmentScores": {
    "accuracy": {},
    "helpfulness": {},
    "safety": {}
  }
}
```

***

## DATAFIELDS

Defines the dataset fields that are used by the evaluator.

Example structure:

```json theme={null}
{
  "dataFields": {
    "input": "user_question",
    "response": "model_answer",
    "reference": "ground_truth"
  }
}
```

***

## ISMULTITURN

Indicates whether the evaluator processes **multi-turn conversations**.

| Value   | Meaning                         |
| ------- | ------------------------------- |
| `true`  | Evaluates conversation history  |
| `false` | Evaluates single-turn responses |

***

# MODELIDENTIFIER \*

Defines the model used by the evaluator.

This object follows the **Model Identifier schema**.

Example:

```json theme={null}
{
  "modelIdentifier": {
    "modelType": "llm",
    "modelName": "Judge Model",
    "modelId": "judge_model",
    "modelVersionId": "v1"
  }
}
```

***

# INFERENCECONFIG \*

Defines runtime inference behavior for the evaluator model.

Example:

```json theme={null}
{
  "inferenceConfig": {
    "inferenceTemperature": 0.0,
    "inferenceMaxNewTokens": 256
  }
}
```

***

# RESPONSEFILTERMODE

Controls which parts of the model output are used for evaluation.

| Value                   | Description                                |
| ----------------------- | ------------------------------------------ |
| `THINKING_AND_RESPONSE` | Includes both reasoning and final response |
| `RESPONSE_ONLY`         | Uses only the final response               |
| `THINKING_ONLY`         | Uses only the reasoning output             |

***

# GENERATESCOREEXPLANATION \*

Determines whether the evaluator should generate an explanation for the score.

| Value   | Description                                |
| ------- | ------------------------------------------ |
| `true`  | Returns a textual explanation of the score |
| `false` | Returns only the score                     |

***

# COMPLETE EXAMPLE

```json theme={null}
{
  "displayName": "Answer Quality Evaluator",
  "description": "Evaluates responses using a judge model",
  "params": {
    "evaluatorType": "judge",
    "prompt": "Score the assistant response for accuracy and helpfulness.",
    "isMultiturn": false,
    "judgmentScores": {
      "accuracy": {},
      "helpfulness": {}
    },
    "dataFields": {
      "input": "question",
      "response": "answer"
    }
  },
  "modelIdentifier": {
    "modelType": "
```
