Builder

A visual interface in Oumi used to create and configure machine learning assets such as datasets, evaluators, evaluations, and training workflows.

Conversation

The standardized internal format used by Oumi to represent dataset records: each conversation is a sequence of messages with defined roles (e.g., user, assistant) and associated metadata.
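A minimal sketch of what one conversation record might look like; the exact field names in Oumi's internal schema may differ, so treat this shape as illustrative:

```python
import json

# Illustrative conversation record (field names are an assumption,
# not Oumi's exact schema): a list of role-tagged messages plus metadata.
conversation = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ],
    "metadata": {"source": "example", "language": "en"},
}

print(json.dumps(conversation, indent=2))
```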

Data explorer

A tool in Oumi for inspecting, filtering, and analyzing datasets to better understand their structure, quality, and content.

Data provenance

Metadata that records the origin, transformations, and lineage of data within a dataset, helping ensure transparency, traceability, and reproducibility.

Data synthesis

The automated generation of new training or evaluation data using models or rules to expand, augment, or balance existing datasets.

Dataset

A structured collection of prompts, responses, or conversations used for training, evaluating, or analyzing machine learning models.

Dense

A neural network architecture where every parameter participates in every forward pass, meaning all parts of the model are active for each input.

Evaluation

The process of running a model against a dataset and scoring its outputs using evaluators to measure performance.

Evaluator

A scoring function or model that assesses the quality of model outputs according to specific criteria, such as accuracy, safety, or instruction adherence.

Failure modes

Recurring patterns where a model produces incorrect, unsafe, or undesired outputs, often used to guide dataset improvements and retraining.

Full-weight Fine-Tuning (FFT)

A training method where all parameters of a model are updated during fine-tuning.

Hyperparameter

A configurable setting that influences how a machine learning model trains or generates predictions. Examples include learning rate, temperature, batch size, and max tokens.

Instruction following

An evaluation criterion that measures how well a model adheres to the instructions given in a prompt.

JSON Lines (JSONL)

A file format where each line is a separate JSON object, commonly used for storing and streaming structured machine learning datasets.
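Because every line is a standalone JSON object, JSONL files can be written and read one record at a time without loading the whole file. A small stdlib-only sketch:

```python
import io
import json

# Two example dataset records.
records = [
    {"prompt": "2 + 2 = ?", "response": "4"},
    {"prompt": "Capital of Japan?", "response": "Tokyo"},
]

# Write: one json.dumps() call per line (an in-memory buffer stands in
# for a file here).
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Read: parse each non-empty line independently, which streams well
# for large datasets.
buf.seek(0)
loaded = [json.loads(line) for line in buf if line.strip()]
print(loaded)
```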

Judge

A model or evaluation system that scores or compares model outputs based on defined criteria.

LLM-as-a-Judge

An evaluation technique where a large language model is used to assess the quality or correctness of another model’s output.
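A sketch of how a judge prompt might be assembled; real judge prompts and scoring scales vary by evaluation framework, so the template and helper below are illustrative assumptions:

```python
# Hypothetical LLM-as-a-Judge prompt template (not Oumi's actual prompt):
# the judge model receives the original instruction and the candidate
# response, and is asked to return a score.
JUDGE_TEMPLATE = """You are an impartial judge. Rate the response to the
instruction on a 1-5 scale for accuracy and instruction adherence.

Instruction: {instruction}
Response: {response}

Reply with only the integer score."""


def build_judge_prompt(instruction: str, response: str) -> str:
    # Fill the template; the resulting string would be sent to the judge model.
    return JUDGE_TEMPLATE.format(instruction=instruction, response=response)


prompt = build_judge_prompt("Name the largest planet.", "Jupiter.")
print(prompt)
```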

Low-Rank Adaptation (LoRA)

A parameter-efficient fine-tuning technique that freezes the original model weights and trains a small set of additional low-rank matrices, rather than modifying the entire model.
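The core idea can be shown numerically: a frozen weight matrix W is augmented by a trainable low-rank product B·A, so the effective weight is W + B·A while only the small B and A are stored and trained. The shapes below are toy-sized for illustration, not Oumi's API:

```python
# Minimal numeric sketch of the LoRA idea (pure Python, illustrative).
def matmul(X, Y):
    # Naive matrix multiply for small illustrative matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r = 4, 1  # model dimension 4, adapter rank 1 (r << d)

# Frozen base weight: identity for clarity.
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

B = [[0.1], [0.2], [0.0], [0.3]]  # d x r, trainable
A = [[0.5, 0.0, 0.5, 0.0]]        # r x d, trainable

# The low-rank update materializes as a full d x d matrix at use time,
# but only 2*d*r parameters are ever stored or trained.
delta = matmul(B, A)
W_eff = [[w + dw for w, dw in zip(wr, dr)] for wr, dr in zip(W, delta)]

# Trainable params: 2*d*r = 8 vs. full fine-tuning's d*d = 16;
# the savings grow quadratically with d.
print(W_eff[0])
```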

Max tokens

A parameter that limits the maximum number of tokens a model can generate in a single response.

Mixture-of-Experts (MoE)

A model architecture where multiple specialized sub-networks (experts) are available, and only a subset is activated for each input.
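A toy routing sketch makes the "only a subset is activated" point concrete: a gate scores every expert, but only the top-k actually run for a given input. This is illustrative pseudocode-level Python, not a real MoE layer:

```python
# Toy Mixture-of-Experts routing (illustrative, not Oumi's implementation).
def expert_a(x):
    return x * 2


def expert_b(x):
    return x + 10


def expert_c(x):
    return -x


experts = [expert_a, expert_b, expert_c]


def moe_forward(x, gate_scores, k=1):
    # Pick the indices of the k highest-scoring experts; the rest stay
    # idle, so compute per input stays small even with many experts.
    top = sorted(range(len(experts)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    # Combine the active experts' outputs, weighted by normalized scores.
    total = sum(gate_scores[i] for i in top)
    return sum(gate_scores[i] / total * experts[i](x) for i in top)


print(moe_forward(3.0, gate_scores=[0.1, 0.7, 0.2], k=1))  # only expert_b runs
```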

Model

A machine learning system that processes input data and generates predictions or outputs.

Open-weight LLMs

Large language models whose trained weights are publicly available for download and fine-tuning.

Parameter-Efficient Fine-Tuning (PEFT)

Training techniques that adapt a model by updating a small number of parameters rather than the full model.

Parquet

A columnar storage file format optimized for large-scale data processing and analytics.

Retrieval-Augmented Generation (RAG)

A technique that improves model responses by retrieving relevant external information and incorporating it into generation.

Recipe

A reusable configuration file that defines a workflow for tasks such as data synthesis, training, or evaluation.

Requests per minute (RPM)

A rate limit parameter that controls how many API requests can be sent within one minute.
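A client-side view of an RPM cap can be sketched as a sliding 60-second window: requests are allowed only while fewer than the limit have been sent in the last minute. This is a toy illustration; real API clients typically rely on server-enforced limits plus retry/backoff:

```python
import collections


class RpmLimiter:
    """Toy sliding-window limiter for a requests-per-minute cap."""

    def __init__(self, rpm):
        self.rpm = rpm
        self.sent = collections.deque()  # timestamps of recent requests

    def allow(self, now):
        # Drop timestamps that fell out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        # Admit the request only if the window still has room.
        if len(self.sent) < self.rpm:
            self.sent.append(now)
            return True
        return False


limiter = RpmLimiter(rpm=2)
# Third request at t=2 is rejected; by t=61 the window has cleared.
print([limiter.allow(t) for t in (0, 1, 2, 61)])
```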

Safety

An evaluation criterion that measures whether model outputs avoid harmful, unsafe, or policy-violating content.

Seed

A value used to initialize random processes so that results can be reproduced consistently.
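The effect is easy to demonstrate with Python's standard library; ML frameworks expose analogous seed functions for their own random-number generators:

```python
import random

# Seeding the RNG makes "random" draws repeatable across runs.
random.seed(42)
first = [random.randint(0, 100) for _ in range(3)]

random.seed(42)  # same seed -> same sequence
second = [random.randint(0, 100) for _ in range(3)]

print(first == second)  # the two sequences are identical
```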

Supervised fine-tuning (SFT)

A training process where a model learns from labeled prompt–response examples.

Temperature

A parameter that controls randomness in model output generation; higher values increase diversity while lower values make outputs more deterministic.
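One common implementation, shown here as a sketch, divides the model's logits by the temperature before the softmax: values below 1 sharpen the distribution toward the top token, values above 1 flatten it toward uniform.

```python
import math


def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/T, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max before exp to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.5)  # near-greedy sampling
hot = softmax_with_temperature(logits, 2.0)   # closer to uniform

# The top token's probability shrinks as temperature grows.
print(cold[0], hot[0])
```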

Topic adherence

An evaluation criterion that measures how well a model stays focused on the subject of the prompt.

Truthfulness

An evaluation criterion that assesses whether a model’s output is factually accurate and not misleading.