> ## Documentation Index
> Fetch the complete documentation index at: https://docs.oumi.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# DEPLOYMENT OPTIONS

> Export and serve your trained model for production use

The Oumi Agent guides you through the final step of the model development lifecycle, from exporting your trained model to selecting and configuring the right inference target. Using your performance goals, latency requirements, and infrastructure constraints as inputs, it recommends whether to deploy locally or to a cloud provider and helps configure your serving setup accordingly.

Deployment decisions that typically involve days of benchmarking and infrastructure research can be made in minutes. By matching your requirements to proven deployment configurations, the Oumi Agent reduces the risk of over-provisioning compute and helps you avoid costly trial-and-error with inference settings. Oumi exports models in a standard format compatible with popular inference engines, so you retain full flexibility over where and how you serve.

***

## DEPLOYMENT WORKFLOW

Deployment in Oumi follows a straightforward sequence:

1. **Export** your trained model from the Oumi platform
2. **Choose an inference target:** run locally on your own hardware, or deploy to a cloud provider
3. **Serve the model** using a compatible inference engine (e.g., vLLM, Hugging Face Transformers)
4. **Monitor and iterate:** re-evaluate and retrain as production data evolves

***

## CHOOSING A DEPLOYMENT TARGET

The right deployment target depends on your latency requirements, data privacy needs, and infrastructure preferences.

|                  | Local Inference                               | Cloud Inference                              |
| ---------------- | --------------------------------------------- | -------------------------------------------- |
| **Best for**     | Development, testing, air-gapped environments | Production, high-throughput, scalable APIs   |
| **Hardware**     | Your own GPU or CPU                           | Cloud GPU instances (AWS, GCP, Lambda, etc.) |
| **Data privacy** | Full control; data never leaves your machine  | Depends on provider and configuration        |
| **Setup effort** | Low; single command with vLLM                 | Moderate; instance provisioning required     |
| **Scalability**  | Limited to local resources                    | Scales horizontally on demand                |
| **Cost**         | Infrastructure you already own                | Pay-per-use or reserved instance pricing     |

***

## LOCAL INFERENCE

Run your exported model directly on your own hardware using [vLLM](https://github.com/vllm-project/vllm) or Hugging Face Transformers. This is the fastest way to get a model running after export and is ideal for iterative testing, internal tools, and privacy-sensitive workloads.

[Learn more about Local Inference →](/guides/deployment/local-inference)

***

## CLOUD INFERENCE

Deploy your exported model to a cloud provider for scalable, production-grade serving. Oumi-exported models are compatible with several managed inference platforms and GPU cloud providers, including AWS Bedrock and Lambda.

[Learn more about Cloud Inference →](/guides/deployment/cloud-inference)

***

## WHAT'S NEXT

<CardGroup cols={3}>
  <Card title="Exporting your model" icon="file-export" href="/guides/deployment/exporting-models">
    Download your trained model artifacts from Oumi.
  </Card>

  <Card title="Local inference" icon="laptop" href="/guides/deployment/local-inference">
    Serve your model on your own hardware with vLLM or Hugging Face.
  </Card>

  <Card title="Cloud inference" icon="cloud" href="/guides/deployment/cloud-inference">
    Deploy to AWS, Lambda, or another GPU cloud provider.
  </Card>
</CardGroup>
