Sync v2.0 version of code to github repo

This commit is contained in:
Jiang-Jia-Jun
2025-06-29 23:29:37 +00:00
parent d151496038
commit 92c2cfa2e7
597 changed files with 78776 additions and 22905 deletions

# Global Scheduler: Multi-Instance Load Balancing
## Design Overview
Based on real-time workload, idle cluster nodes autonomously pull tasks from busier nodes, process them, and push the execution results back to the originating node.
### What Problem Does the Global Scheduler Solve?
![Local Scheduler](images/LocalScheduler.png)
Standard load balancing distributes requests with a round-robin strategy, so every inference instance receives the same number of requests.
In LLM scenarios, however, processing time varies significantly with input and output token counts. Even with an equal request distribution, inference completion times differ substantially across instances.
The global scheduler solves this imbalance through cluster-level optimization.
## How the Global Scheduler Works
![Global Scheduler](images/GlobalScheduler.png)
As shown:
* Nodes 1, 2, and n each receive 3 requests
* At time T:
* Node 1 completes all tasks
* Nodes 2 and n process their second requests (1 request remaining each)
* Node 1 steals a task from Node 2, processes it, and pushes the result to Node 2's response queue (sketched with Redis commands below)
This secondary load-balancing strategy:
✅ Maximizes cluster resource utilization
✅ Reduces Time-To-First-Token (TTFT)
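Conceptually, the steal can be pictured as atomically moving a pending task between per-node queues in Redis (which the global scheduler uses as its backing store, as described below). The commands here are a minimal illustration only: the key names and payloads are hypothetical and do not reflect FastDeploy's internal key layout.
```bash
# Hypothetical key names for illustration; FastDeploy's real keys differ.
# A pending request sits in node 2's task queue:
redis-cli LPUSH node2:tasks '{"req_id":"r5","prompt":"..."}'
# Idle node 1 atomically moves (steals) the oldest task from node 2's
# queue into its own (LMOVE requires Redis 6.2 or newer):
redis-cli LMOVE node2:tasks node1:tasks RIGHT LEFT
# After running inference, node 1 pushes the result to node 2's response
# queue so node 2 can return it to the client that sent the request:
redis-cli LPUSH node2:responses '{"req_id":"r5","output":"..."}'
```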
# How to Use the Global Scheduler
## Prerequisite: Redis Setup
### conda installation
```bash
# Install
conda install redis
# Launch
nohup redis-server > redis.log 2>&1 &
```
### apt installation (Debian/Ubuntu)
```bash
# Install
sudo apt install redis-server -y
# Launch
sudo systemctl start redis-server
```
### yum installation (RHEL/CentOS)
```bash
# Install
sudo yum install redis -y
# Launch
sudo systemctl start redis
```
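Whichever installation method you used, verify that Redis is reachable before launching FastDeploy:
```bash
redis-cli ping
# Expected reply: PONG
```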
## Launching FastDeploy
```bash
python -m fastdeploy.entrypoints.openai.api_server \
--port 8801 \
--metrics-port 8802 \
--engine-worker-queue-port 8803 \
--scheduler-name global \
--scheduler-ttl 900 \
--scheduler-host "127.0.0.1" \
--scheduler-port 6379 \
--scheduler-db 0 \
--scheduler-password "" \
--scheduler-topic "default" \
--scheduler-min-load-score 3 \
--scheduler-load-shards-num 1
```
See [Scheduler Launch Parameters](../online_serving/scheduler.md) for a description of each startup option.
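Once an instance is running, it serves the standard OpenAI-compatible API. A quick smoke test with curl (the model name here is a placeholder; use whatever model your instance actually serves):
```bash
# "default" is a placeholder model name, not a FastDeploy-defined value.
curl -s http://127.0.0.1:8801/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "default", "messages": [{"role": "user", "content": "Hello"}]}'
```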
### Deployment Notes
1. Execute this command on multiple machines to create additional inference instances.
2. For single-machine multi-instance deployments, give each instance its own unique ports (see the sketch below).
3. Use Nginx for external load balancing alongside the global scheduler's internal balancing.
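For example, a second instance on the same machine only needs different service ports; the scheduler flags are unchanged, so it registers with the same Redis and joins the same cluster (the port numbers below are arbitrary examples):
```bash
# Second instance on the same host: only the three service ports differ.
python -m fastdeploy.entrypoints.openai.api_server \
--port 8811 \
--metrics-port 8812 \
--engine-worker-queue-port 8813 \
--scheduler-name global \
--scheduler-ttl 900 \
--scheduler-host "127.0.0.1" \
--scheduler-port 6379 \
--scheduler-db 0 \
--scheduler-password "" \
--scheduler-topic "default" \
--scheduler-min-load-score 3 \
--scheduler-load-shards-num 1
```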