## Overview
The following describes the configuration schema used to create evaluators in Oumi, including:
- Metadata about the evaluator
- Evaluation parameters
- Model configuration
- Scoring and explanation settings
Fields marked with (*) are required.
## Schema structure

```json
{
  "displayName": "",
  "description": "",
  "params": {},
  "modelIdentifier": {},
  "inferenceConfig": {},
  "responseFilterMode": "",
  "generateScoreExplanation": true
}
```
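As a rough illustration of the required-field and minimum-length rules described in this document, a minimal stand-alone validator might look like the sketch below. The function name, message strings, and return convention are our own assumptions, not part of Oumi:

```python
# Minimal sketch: check an evaluator config dict against the rules in this
# document (required top-level fields; displayName and params.prompt must be
# non-empty strings). Hypothetical helper, not an Oumi API.

REQUIRED_TOP_LEVEL = (
    "displayName",
    "params",
    "modelIdentifier",
    "inferenceConfig",
    "generateScoreExplanation",
)

def validate_evaluator_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field in REQUIRED_TOP_LEVEL:
        if field not in cfg:
            problems.append(f"missing required field: {field}")
    # Constraint: displayName has minimum length 1.
    name = cfg.get("displayName")
    if not isinstance(name, str) or len(name) < 1:
        problems.append("displayName must be a string of length >= 1")
    # Constraint: params.evaluatorType and params.prompt are required,
    # and prompt has minimum length 1.
    params = cfg.get("params") or {}
    for key in ("evaluatorType", "prompt"):
        value = params.get(key)
        if not isinstance(value, str) or len(value) < 1:
            problems.append(f"params.{key} must be a non-empty string")
    return problems
```

A well-formed config yields an empty problem list; an empty dict reports every missing field.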
## displayName *
Human-readable name of the evaluator.
### Properties

| Field | Type | Required | Description |
|---|---|---|---|
| displayName | string | ✓ | Name used to identify the evaluator |
| description | string | | Optional description of the evaluator |
### Constraints

| Field | Constraint |
|---|---|
| displayName | Minimum length: 1 |
### Example

```json
{
  "displayName": "Answer Quality Evaluator",
  "description": "Evaluates model responses using a judge model."
}
```
## params *
Defines the evaluation parameters and scoring behavior.
### Properties

| Field | Type | Required | Description |
|---|---|---|---|
| evaluatorType | string | ✓ | Type of evaluator |
| prompt | string | ✓ | Prompt used to guide the evaluation model |
| judgmentScores | object | | Definition of scoring categories |
| dataFields | object | | Defines input fields used by the evaluator |
| isMultiturn | boolean | | Indicates whether evaluation is multi-turn |
### evaluatorType
Specifies which evaluation mechanism to use. In the examples in this document the value is `"judge"`, selecting a judge-model evaluator.
### prompt *
Prompt provided to the evaluator model to guide scoring behavior.
### Constraints

| Field | Constraint |
|---|---|
| prompt | Minimum length: 1 |
### Example

```json
{
  "params": {
    "evaluatorType": "judge",
    "prompt": "Evaluate whether the assistant's response is accurate and helpful."
  }
}
```
### judgmentScores
Defines the scoring structure used by the evaluator.
This object typically contains the set of evaluation metrics or labels the evaluator should output.
Example structure:

```json
{
  "judgmentScores": {
    "accuracy": {},
    "helpfulness": {},
    "safety": {}
  }
}
```
### dataFields
Defines the dataset fields that are used by the evaluator.
Example structure:

```json
{
  "dataFields": {
    "input": "user_question",
    "response": "model_answer",
    "reference": "ground_truth"
  }
}
```
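To make the role of `dataFields` concrete, the sketch below shows how such a mapping (evaluator field name to dataset column name) could be used to pull evaluator inputs out of one dataset record. The helper function and the record layout are illustrative assumptions, not an Oumi API:

```python
# Sketch: apply a dataFields mapping to extract the values an evaluator
# needs from a single dataset record. Hypothetical helper for illustration.

def extract_eval_inputs(record: dict, data_fields: dict) -> dict:
    """Map each evaluator field to the value in its configured dataset column."""
    return {eval_field: record[column] for eval_field, column in data_fields.items()}

# The mapping from the example structure above.
data_fields = {
    "input": "user_question",
    "response": "model_answer",
    "reference": "ground_truth",
}

# An illustrative dataset record using those column names.
record = {
    "user_question": "What is the capital of France?",
    "model_answer": "Paris.",
    "ground_truth": "Paris",
}

print(extract_eval_inputs(record, data_fields))
```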
### isMultiturn
Indicates whether the evaluator processes multi-turn conversations.
| Value | Meaning |
|---|---|
| `true` | Evaluates conversation history |
| `false` | Evaluates single-turn responses |
## modelIdentifier *
Defines the model used by the evaluator.
This object follows the Model Identifier schema.
Example:

```json
{
  "modelIdentifier": {
    "modelType": "llm",
    "modelName": "Judge Model",
    "modelId": "judge_model",
    "modelVersionId": "v1"
  }
}
```
## inferenceConfig *
Defines runtime inference behavior for the evaluator model.
Example:

```json
{
  "inferenceConfig": {
    "inferenceTemperature": 0.0,
    "inferenceMaxNewTokens": 256
  }
}
```
## responseFilterMode
Controls which parts of the model output are used for evaluation.
| Value | Description |
|---|---|
| `THINKING_AND_RESPONSE` | Includes both reasoning and final response |
| `RESPONSE_ONLY` | Uses only the final response |
| `THINKING_ONLY` | Uses only the reasoning output |
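Assuming the model output has already been split into a reasoning part and a final response, the three modes amount to a simple selection. The sketch below is a hypothetical helper illustrating that selection, not an Oumi API:

```python
# Sketch: select which part(s) of a model output the evaluator sees,
# given a responseFilterMode value. Assumes the output has already been
# separated into `thinking` and `response` strings upstream.

def filter_output(thinking: str, response: str, mode: str) -> str:
    if mode == "THINKING_AND_RESPONSE":
        # Both the reasoning and the final answer are evaluated.
        return f"{thinking}\n{response}"
    if mode == "RESPONSE_ONLY":
        return response
    if mode == "THINKING_ONLY":
        return thinking
    raise ValueError(f"unknown responseFilterMode: {mode}")
```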
## generateScoreExplanation *
Determines whether the evaluator should generate an explanation for the score.
| Value | Description |
|---|---|
| `true` | Returns a textual explanation of the score |
| `false` | Returns only the score |
## Complete example
```json
{
  "displayName": "Answer Quality Evaluator",
  "description": "Evaluates responses using a judge model",
  "params": {
    "evaluatorType": "judge",
    "prompt": "Score the assistant response for accuracy and helpfulness.",
    "isMultiturn": false,
    "judgmentScores": {
      "accuracy": {},
      "helpfulness": {}
    },
    "dataFields": {
      "input": "question",
      "response": "answer"
    }
  },
  "modelIdentifier": {
    "modelType": "llm",
    "modelName": "Judge Model",
    "modelId": "judge_model",
    "modelVersionId": "v1"
  },
  "inferenceConfig": {
    "inferenceTemperature": 0.0,
    "inferenceMaxNewTokens": 256
  },
  "responseFilterMode": "RESPONSE_ONLY",
  "generateScoreExplanation": true
}
```