> ## Documentation Index
> Fetch the complete documentation index at: https://docs.oumi.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# ANALYZE DATASETS

> Measure, track, and improve the quality of your datasets in Oumi

Oumi automatically scans your uploaded datasets for quality issues, highlights problematic entries, and lets you fix them directly in the platform.

## HOW IT WORKS

When you upload a dataset to the platform, Oumi automatically runs a series of quality tests to identify potential issues. Any failed tests pinpoint conversations that may need to be removed to enhance overall quality.

You can then directly remove problematic rows using the Oumi UI, or export individual quality tests for further analysis.

Oumi currently provides the following quality tests:

* **Total tokens exceed `8000`** - Flags conversations where the total token count exceeds `8,000`, helping identify overly long entries that may affect processing or model performance.

* **Non-alternating user/assistant turns** - Flags conversations where messages do not strictly alternate between user and assistant, ensuring the dataset follows a consistent turn-taking structure.

* **Empty turns detected** - Flags conversations containing empty or whitespace-only messages, which can introduce noise or reduce dataset quality.

## ACCESSING DATASET QUALITY TESTS

After successfully uploading a dataset, you can access its quality tests from the **Datasets** page:

1. Click the `Quality Tests` link for your dataset.
2. Under the `Quality Tests` tab, you can expand each test to view its details.
3. Click `Export Quality Tests` to export the quality tests as a JSONL or Parquet file.

<video autoPlay controls muted loop playsInline allowFullScreen className="w-full aspect-video rounded-xl" src="https://mintcdn.com/oumi/-C82V_kXqoBIcXEj/videos/datasets-analyze.mp4?fit=max&auto=format&n=-C82V_kXqoBIcXEj&q=85&s=5b7020159ab9a6df4963f756ac4cea4c" data-path="videos/datasets-analyze.mp4" />

## DELETING FAILED ROWS

To immediately remove rows flagged by Oumi's quality tests during dataset upload:

1. Click the `Quality Tests` link for the dataset with the failing tests.
2. Under the `Quality Tests` tab, you can expand each test and drill down further to analyze row-level errors.
3. Click `Delete Failed Rows` to delete the problematic rows. Oumi will then re-analyze your new dataset version.

<Note>Rather than modifying the original dataset, Oumi creates a new version and applies changes there, ensuring all dataset states are version-controlled and fully recoverable.</Note>

<video autoPlay controls muted loop playsInline allowFullScreen className="w-full aspect-video rounded-xl" src="https://mintcdn.com/oumi/7fz32-rMZVuumNeP/videos/datasets-analyze-deleterows.mp4?fit=max&auto=format&n=7fz32-rMZVuumNeP&q=85&s=ff590b02f506b2228820a90a273a4bb2" data-path="videos/datasets-analyze-deleterows.mp4" />

## RESTORING PREVIOUS DATASET VERSIONS

To restore a previous dataset:

1. On the **Datasets** page, click the name of your dataset.
2. Click the three-dot menu to the right of the dataset version you want to restore.
3. Select `Restore Version`. Oumi will restore the selected dataset version as a new version and automatically rerun the quality checks.

<video autoPlay controls muted loop playsInline allowFullScreen className="w-full aspect-video rounded-xl" src="https://mintcdn.com/oumi/7fz32-rMZVuumNeP/videos/analyze-restore-dataset.mp4?fit=max&auto=format&n=7fz32-rMZVuumNeP&q=85&s=daff525a59ccab6e2089bdb1d6bfec89" data-path="videos/analyze-restore-dataset.mp4" />
