Evose
GovernModel Platform

Model Deployment

Full lifecycle of self-hosted model instances · Load balancing · Health checks

Model deployment manages the lifecycle of self-hosted model instances. SaaS APIs (OpenAI / Anthropic, etc.) go directly through the Interface platform and are not managed here.

When to Use This

I have…Use
OpenAI / Claude API keyInterface platform
Self-hosted Llama / Qwen / DeepSeekModel deployment (this page)
Internal OpenAI-compatible serviceAlso this page

Core Actions

New Deployment

  1. Pick a model (from those already registered in Model definition)
  2. Configure the instance:
    • Address + port (http://10.0.0.5:8000/v1)
    • Resources (GPU count / memory — display only)
    • Replicas
  3. After creation, a connectivity test runs automatically (sends a minimal request)

Modify Deployment

  • Scale replicas
  • Change endpoint (rolling migration)

Delete

Taking down a deployment = immediate removal from the load-balancer pool.

Load Balancing

For multi-instance deployments:

StrategyBehavior
Round RobinRound-robin (default)
Weighted Round RobinDistribute by weight (more weight on stronger machines)
Failover priorityPrimary-backup — backups used only when primary fails

Health checks auto-eject unhealthy instances and auto-restore healthy ones.

Health Checks

Real-time monitoring:

  • Reachability (heartbeat every N seconds)
  • Response time (P95)
  • Error rate

3 consecutive failures → eject → keep probing every N seconds → re-add when healthy.

Typical Scenarios

1 · Pure Self-Hosted Private

Internal Qwen / DeepSeek deployment, with 4 instances on Evose:

Deployment 1: qwen-max, 4 instances, Round Robin
Deployment 2: qwen-embedding, 2 instances, primary-backup Failover

2 · Hybrid Cloud

Some SaaS APIs + some self-hosted. SaaS goes through Interface platform; self-hosted on this page. Pick the org default in Default model configuration.

3 · Cross-Region Failover

Model GPT-4
├─ Interface platform OpenAI US → Round Robin 50%
├─ Interface platform Azure China → Round Robin 30%
└─ Self-hosted OpenAI-compatible → Failover backup 20%

Next Steps

On this page