Model Deployment
Full lifecycle of self-hosted model instances · Load balancing · Health checks
Model deployment manages the lifecycle of self-hosted model instances. SaaS APIs (OpenAI / Anthropic, etc.) go directly through the Interface platform and are not managed here.
When to Use This
| I have… | Use |
|---|---|
| OpenAI / Claude API key | Interface platform |
| Self-hosted Llama / Qwen / DeepSeek | Model deployment (this page) |
| Internal OpenAI-compatible service | Also this page |
Core Actions
New Deployment
- Pick a model (from those already registered in Model definition)
- Configure the instance:
  - Address + port (e.g. http://10.0.0.5:8000/v1)
  - Resources (GPU count / memory, display only)
  - Replicas
- After creation, a connectivity test runs automatically (sends a minimal request)
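The exact request the platform sends is not documented here; as a minimal sketch, assuming an OpenAI-compatible endpoint like the example address above (the model name below is a placeholder), an equivalent check looks like:

```python
import requests

def check_endpoint(base_url: str, model: str, timeout: float = 10.0) -> bool:
    """Send a one-token chat completion to verify the instance responds."""
    resp = requests.post(
        f"{base_url}/chat/completions",
        json={
            "model": model,  # placeholder model name, not from the docs
            "messages": [{"role": "user", "content": "ping"}],
            "max_tokens": 1,
        },
        timeout=timeout,
    )
    return resp.ok

# Using the example address configured above
print(check_endpoint("http://10.0.0.5:8000/v1", "qwen2.5-7b-instruct"))
```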
Modify Deployment
- Scale replicas
- Change endpoint (rolling migration)
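The migration itself is handled by the platform; purely as a sketch of the rolling order of operations (the pool list and helper below are hypothetical, not part of the product):

```python
def rolling_migrate(pool: list[str], old: str, new: str) -> list[str]:
    """Add the new endpoint before removing the old one, so at least
    one instance keeps serving traffic throughout the switch."""
    pool = pool + [new]            # new instance starts taking requests
    # ...wait for in-flight requests on the old endpoint to drain...
    return [ep for ep in pool if ep != old]

pool = rolling_migrate(
    ["http://10.0.0.5:8000/v1"],
    old="http://10.0.0.5:8000/v1",
    new="http://10.0.0.7:8000/v1",
)
```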
Delete
Taking down a deployment = immediate removal from the load-balancer pool.
Load Balancing
For multi-instance deployments:
| Strategy | Behavior |
|---|---|
| Round Robin | Requests distributed evenly across instances in turn (default) |
| Weighted Round Robin | Distribute by weight (heavier weight on stronger machines) |
| Failover priority | Primary-backup: backup instances receive traffic only when the primary fails |
Health checks auto-eject unhealthy instances and auto-restore healthy ones.
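Strategy selection happens in the deployment settings; as a sketch of how weighted round-robin distributes traffic (the endpoints and weights below are hypothetical):

```python
import itertools

def weighted_round_robin(instances: dict[str, int]):
    """Cycle through endpoints in proportion to their weights:
    a weight-2 machine receives twice as many requests per round."""
    expanded = [ep for ep, weight in instances.items() for _ in range(weight)]
    return itertools.cycle(expanded)

picker = weighted_round_robin({
    "http://10.0.0.5:8000/v1": 2,  # stronger machine, double the traffic
    "http://10.0.0.6:8000/v1": 1,
})
endpoint = next(picker)  # ask the picker before each forwarded request
```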
Health Checks
Real-time monitoring:
- Reachability (heartbeat every N seconds)
- Response time (P95)
- Error rate
3 consecutive failures → eject → keep probing every N seconds → re-add when healthy.
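As an illustration of that rule (a sketch only, not the platform's implementation; the probe interval is an assumed value for "N seconds"):

```python
import time

FAILURE_THRESHOLD = 3   # consecutive failures before ejection
PROBE_INTERVAL = 10     # assumed value for the "N seconds" heartbeat

def monitor(endpoint: str, probe, pool: set[str]) -> None:
    """Eject after 3 consecutive failures, keep probing, re-add when healthy."""
    failures = 0
    while True:
        if probe(endpoint):             # e.g. the connectivity check sketched earlier
            failures = 0
            pool.add(endpoint)          # restore to the load-balancer pool
        else:
            failures += 1
            if failures >= FAILURE_THRESHOLD:
                pool.discard(endpoint)  # eject from the load-balancer pool
        time.sleep(PROBE_INTERVAL)
```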
Typical Scenarios
1 · Pure Self-Hosted Private
Internal Qwen / DeepSeek deployment with 4 instances on Evose.
2 · Hybrid Cloud
Some SaaS APIs + some self-hosted. SaaS goes through Interface platform; self-hosted on this page. Pick the org default in Default model configuration.
3 · Cross-Region Failover
Instances deployed in multiple regions with the Failover priority strategy: traffic stays on the primary region and shifts to the backups only when it fails.
Next Steps
- SaaS API integration → Interface platform
- How models are used by the business → Default model configuration
- Multi-model routing strategy → Interface platform · Routing