Model Deployment
Full lifecycle of self-hosted model instances · Load balancing · Health checks
Model deployment manages the lifecycle of self-hosted model instances. SaaS APIs (OpenAI / Anthropic, etc.) go directly through the Interface platform and are not managed here.
When to Use This
| I have… | Use |
|---|---|
| OpenAI / Claude API key | Interface platform |
| Self-hosted Llama / Qwen / DeepSeek | Model deployment (this page) |
| Internal OpenAI-compatible service | Also this page |
Core Actions
New Deployment
- Pick a model (from those already registered in Model definition)
- Configure the instance:
  - Address + port (e.g. http://10.0.0.5:8000/v1)
  - Resources (GPU count / memory, display only)
  - Replicas
- After creation, a connectivity test runs automatically (sends a minimal request)
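The exact request the platform sends is not documented here; as a minimal sketch, assuming an OpenAI-compatible endpoint like the example address above (the model name below is a placeholder), an equivalent check looks like:

```python
import requests

def check_endpoint(base_url: str, model: str, timeout: float = 10.0) -> bool:
    """Send a one-token chat completion to verify the instance responds."""
    resp = requests.post(
        f"{base_url}/chat/completions",
        json={
            "model": model,  # placeholder model name, not from the docs
            "messages": [{"role": "user", "content": "ping"}],
            "max_tokens": 1,
        },
        timeout=timeout,
    )
    return resp.ok

# Using the example address configured above
print(check_endpoint("http://10.0.0.5:8000/v1", "qwen2.5-7b-instruct"))
```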
Modify Deployment
- Scale replicas
- Change endpoint (rolling migration)
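The migration itself is handled by the platform; purely as a sketch of the rolling order of operations (the pool list and helper below are hypothetical, not part of the product):

```python
def rolling_migrate(pool: list[str], old: str, new: str) -> list[str]:
    """Add the new endpoint before removing the old one, so at least
    one instance keeps serving traffic throughout the switch."""
    pool = pool + [new]            # new instance starts taking requests
    # ...wait for in-flight requests on the old endpoint to drain...
    return [ep for ep in pool if ep != old]

pool = rolling_migrate(
    ["http://10.0.0.5:8000/v1"],
    old="http://10.0.0.5:8000/v1",
    new="http://10.0.0.7:8000/v1",
)
```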
Delete
Taking down a deployment = immediate removal from the load-balancer pool.
Load Balancing
For multi-instance deployments:
| Strategy | Behavior |
|---|---|
| Round Robin | Requests distributed evenly across instances in turn (default) |
| Weighted Round Robin | Distribute by weight (heavier weight on stronger machines) |
| Failover priority | Primary-backup: backup instances receive traffic only when the primary fails |
Health checks auto-eject unhealthy instances and auto-restore healthy ones.
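Strategy selection happens in the deployment settings; as a sketch of how weighted round-robin distributes traffic (the endpoints and weights below are hypothetical):

```python
import itertools

def weighted_round_robin(instances: dict[str, int]):
    """Cycle through endpoints in proportion to their weights:
    a weight-2 machine receives twice as many requests per round."""
    expanded = [ep for ep, weight in instances.items() for _ in range(weight)]
    return itertools.cycle(expanded)

picker = weighted_round_robin({
    "http://10.0.0.5:8000/v1": 2,  # stronger machine, double the traffic
    "http://10.0.0.6:8000/v1": 1,
})
endpoint = next(picker)  # ask the picker before each forwarded request
```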
Health Checks
Real-time monitoring:
- Reachability (heartbeat every N seconds)
- Response time (P95)
- Error rate
3 consecutive failures → eject → keep probing every N seconds → re-add when healthy.
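As an illustration of that rule (a sketch only, not the platform's implementation; the probe interval is an assumed value for "N seconds"):

```python
import time

FAILURE_THRESHOLD = 3   # consecutive failures before ejection
PROBE_INTERVAL = 10     # assumed value for the "N seconds" heartbeat

def monitor(endpoint: str, probe, pool: set[str]) -> None:
    """Eject after 3 consecutive failures, keep probing, re-add when healthy."""
    failures = 0
    while True:
        if probe(endpoint):             # e.g. the connectivity check sketched earlier
            failures = 0
            pool.add(endpoint)          # restore to the load-balancer pool
        else:
            failures += 1
            if failures >= FAILURE_THRESHOLD:
                pool.discard(endpoint)  # eject from the load-balancer pool
        time.sleep(PROBE_INTERVAL)
```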
Typical Scenarios
1 · Pure Self-Hosted Private
Internal Qwen / DeepSeek deployment with 4 instances on Evose.
2 · Hybrid Cloud
Some SaaS APIs + some self-hosted. SaaS goes through Interface platform; self-hosted on this page. Pick the org default in Default model configuration.
3 · Cross-Region Failover
Instances deployed in multiple regions with the Failover priority strategy: traffic stays on the primary region and shifts to the backups only when it fails.
Next Steps
- SaaS API integration → Interface platform
- How models are used by the business → Default model configuration
- Multi-model routing strategy → Interface platform · Routing