OVERVIEW

This page describes the configuration schema used to create evaluators in Oumi. It covers:

Metadata about the evaluator
Evaluation parameters
Model configuration
Scoring and explanation settings

Fields marked with an asterisk (*) in a heading, or with ✓ in a properties table, are required.

SCHEMA STRUCTURE

{
  "displayName": "",
  "description": "",
  "params": {},
  "modelIdentifier": {},
  "inferenceConfig": {},
  "responseFilterMode": "",
  "generateScoreExplanation": true
}

DISPLAYNAME *

Human-readable name of the evaluator.

PROPERTIES

Field	Type	Required	Description
displayName	string	✓	Name used to identify the evaluator
description	string		Optional description of the evaluator

CONSTRAINTS

Field	Constraint
displayName	Minimum length: `1`

EXAMPLE

{
  "displayName": "Answer Quality Evaluator",
  "description": "Evaluates model responses using a judge model."
}

PARAMS *

Defines the evaluation parameters and scoring behavior.

PROPERTIES

Field	Type	Required	Description
evaluatorType	string	✓	Type of evaluator
prompt	string	✓	Prompt used to guide the evaluation model
judgmentScores	object		Definition of scoring categories
dataFields	object		Defines input fields used by the evaluator
isMultiturn	boolean		Indicates whether evaluation is multi-turn

EVALUATORTYPE *

Defines the evaluator mechanism.

Allowed Values
`"judge"`

PROMPT *

Prompt provided to the evaluator model to guide scoring behavior.

CONSTRAINTS

Field	Constraint
prompt	Minimum length: `1`

EXAMPLE

{
  "params": {
    "evaluatorType": "judge",
    "prompt": "Evaluate whether the assistant's response is accurate and helpful."
  }
}

JUDGMENTSCORES

Defines the scoring structure used by the evaluator. This object typically contains the set of evaluation metrics or labels the evaluator should output. Example structure:

{
  "judgmentScores": {
    "accuracy": {},
    "helpfulness": {},
    "safety": {}
  }
}

DATAFIELDS

Defines the dataset fields that are used by the evaluator. Example structure:

{
  "dataFields": {
    "input": "user_question",
    "response": "model_answer",
    "reference": "ground_truth"
  }
}

ISMULTITURN

Indicates whether the evaluator processes multi-turn conversations.

Value	Meaning
`true`	Evaluates conversation history
`false`	Evaluates single-turn responses
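
As a minimal sketch, a params object for an evaluator that scores whole conversations might set the flag as shown here. The prompt text is a placeholder written for illustration, not taken from an Oumi recipe.

```json
{
  "params": {
    "evaluatorType": "judge",
    "prompt": "Evaluate whether the assistant stays accurate and consistent across the conversation.",
    "isMultiturn": true
  }
}
```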

MODELIDENTIFIER *

Defines the model used by the evaluator. This object follows the Model Identifier schema. Example:

{
  "modelIdentifier": {
    "modelType": "llm",
    "modelName": "Judge Model",
    "modelId": "judge_model",
    "modelVersionId": "v1"
  }
}

INFERENCECONFIG *

Defines runtime inference behavior for the evaluator model. Example:

{
  "inferenceConfig": {
    "inferenceTemperature": 0.0,
    "inferenceMaxNewTokens": 256
  }
}

RESPONSEFILTERMODE

Controls which parts of the model output are used for evaluation.

Value	Description
`THINKING_AND_RESPONSE`	Includes both reasoning and final response
`RESPONSE_ONLY`	Uses only the final response
`THINKING_ONLY`	Uses only the reasoning output
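
For illustration, assuming the judge should score only the final answer and ignore any intermediate reasoning, the field could be set as in this fragment:

```json
{
  "responseFilterMode": "RESPONSE_ONLY"
}
```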

GENERATESCOREEXPLANATION *

Determines whether the evaluator should generate an explanation for the score.

Value	Description
`true`	Returns a textual explanation of the score
`false`	Returns only the score
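
As a small sketch, an evaluator that should return scores without accompanying text would presumably disable the explanation like this:

```json
{
  "generateScoreExplanation": false
}
```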

COMPLETE EXAMPLE

{
  "displayName": "Answer Quality Evaluator",
  "description": "Evaluates responses using a judge model",
  "params": {
    "evaluatorType": "judge",
    "prompt": "Score the assistant response for accuracy and helpfulness.",
    "isMultiturn": false,
    "judgmentScores": {
      "accuracy": {},
      "helpfulness": {}
    },
    "dataFields": {
      "input": "question",
      "response": "answer"
    }
  },
  "modelIdentifier": {
    "modelType": "
