HOSTED INFERENCE

OVERVIEW

Once you’ve generated training data, fine-tuned a model, and evaluated its performance, the final step is deployment. Hosted inference lets you take a model trained on the Oumi Platform and serve it as a live API endpoint, making it available for real-time use.

The Deployments feature is currently in beta.

Each deployed model is assigned its own dedicated endpoint for inference, and you retain full control over its lifecycle, with the ability to create or remove deployments as needed.

ACCESSING DEPLOYMENTS

Hosted inference requires a trained model in your project, so make sure one exists before you create a deployment.

At the top of the Models page, click the Deploy Model button. Alternatively, click the + Create Deployment button on the Deployments page.
In the Deploy Model modal window, select either Custom Oumi Model or External Model, then complete the fields for that option:

Custom Oumi Model

Provide a unique Deployment Name.
Select a Model from the drop-down.
Click Start → to deploy your model.

External Model

Provide a unique Deployment Name.
Select a Provider from the drop-down.
Select your External Model from the drop-down.
Insert the API key for your provider.
Click Start → to deploy your model.
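
To summarize what each option asks for, the form fields above can be sketched as a small checklist in Python. This is an illustration only: DeploymentRequest and missing_fields are hypothetical names invented for this sketch and are not part of any Oumi SDK or API. It assumes the only rules are the ones listed above: a unique Deployment Name and a Model for both options, plus a Provider and an API key for External Model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DeploymentRequest:
    """Hypothetical mirror of the Deploy Model form; not an Oumi API type."""

    kind: str                       # "custom" or "external"
    deployment_name: str
    model: str
    provider: Optional[str] = None  # External Model only
    api_key: Optional[str] = None   # External Model only


def missing_fields(req: DeploymentRequest, existing_names: set[str]) -> list[str]:
    """Return the problems that would need fixing before clicking Start."""
    problems = []
    if not req.deployment_name:
        problems.append("deployment_name is required")
    elif req.deployment_name in existing_names:
        problems.append("deployment_name must be unique")
    if not req.model:
        problems.append("model is required")
    if req.kind == "external":
        # External models additionally need a provider and its API key.
        if not req.provider:
            problems.append("provider is required")
        if not req.api_key:
            problems.append("api_key is required")
    elif req.kind != "custom":
        problems.append("kind must be 'custom' or 'external'")
    return problems
```

For example, an External Model request that supplies only a name and a model would report the provider and the API key as missing, while the same fields are sufficient for a Custom Oumi Model.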
