# Global Scheduler: Multi-Instance Load Balancing
## Design Overview

Cluster nodes autonomously pull tasks from other nodes during idle periods based on real-time workload, then push execution results back to the originating node.

### What Problem Does the Global Scheduler Solve?



Standard load balancing distributes requests using a round-robin strategy, ensuring an equal number of requests per inference instance.

In LLM scenarios, however, request processing time varies significantly with input/output token counts. Even when requests are distributed equally, inference completion times differ substantially across instances.

The global scheduler addresses this imbalance through cluster-level optimization.
## How the Global Scheduler Works


As shown:

* Nodes 1, 2, and n each receive 3 requests
* At time T:
  * Node 1 has completed all of its tasks
  * Nodes 2 and n are processing their second requests (1 request remaining each)
  * Node 1 steals a task from Node 2, processes it, and pushes the result to Node 2’s response queue

This secondary load balancing strategy:

✅ Maximizes cluster resource utilization
✅ Reduces Time-To-First-Token (TTFT)
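To make the pull-and-push-back cycle concrete, the same idea can be sketched with plain `redis-cli` commands. This is an illustration only: the key names (`node2:tasks`, `node2:responses`) and payloads are hypothetical, not FastDeploy's actual Redis schema.

```bash
# Illustration only: key names and payloads are hypothetical,
# not FastDeploy's actual Redis schema.

# A request is waiting in busy Node 2's task queue:
redis-cli LPUSH node2:tasks '{"id": "req-7", "prompt": "..."}'

# Idle Node 1 steals the oldest pending task from Node 2:
redis-cli RPOP node2:tasks

# After running inference, Node 1 pushes the result back to
# Node 2's response queue, where Node 2 picks it up and replies:
redis-cli LPUSH node2:responses '{"id": "req-7", "output": "..."}'
```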
# How to Use the Global Scheduler

## Prerequisite: Redis Setup

### conda installation
```bash
conda install redis
# Launch
nohup redis-server > redis.log 2>&1 &
```
### apt installation (Debian/Ubuntu)

```bash
# Install
sudo apt install redis-server -y
# Launch
sudo systemctl start redis-server
```
### yum installation (RHEL/CentOS)

```bash
# Install
sudo yum install redis -y
# Launch
sudo systemctl start redis
```
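Whichever method you used, it is worth confirming that Redis is reachable before launching any FastDeploy instances, since every node in the cluster connects to it:

```bash
# Should print PONG if Redis is listening on the default port 6379
redis-cli ping
```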
## Launching FastDeploy

```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --port 8801 \
    --metrics-port 8802 \
    --engine-worker-queue-port 8803 \
    --model baidu/ERNIE-4.5-0.3B-Paddle \
    --scheduler-name global \
    --scheduler-ttl 900 \
    --scheduler-host "127.0.0.1" \
    --scheduler-port 6379 \
    --scheduler-db 0 \
    --scheduler-password "" \
    --scheduler-topic "default" \
    --scheduler-min-load-score 3 \
    --scheduler-load-shards-num 1
```
See [Scheduler Launching Parameters](../online_serving/scheduler.md) for a description of each scheduler flag.
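Once an instance is up, a quick way to smoke-test it is through its OpenAI-compatible endpoint; this sketch assumes the standard `/v1/chat/completions` route and the port passed via `--port` above:

```bash
# Minimal smoke test against the instance launched above (port 8801);
# assumes the standard OpenAI-compatible /v1/chat/completions route.
curl -s http://127.0.0.1:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "baidu/ERNIE-4.5-0.3B-Paddle",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```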
### Deployment notes

1. Execute this command on multiple machines to create inference instances
2. For single-machine multi-instance deployments, ensure each instance uses unique ports (see the sketch below)
3. Use Nginx for external load balancing alongside the global scheduler’s internal balancing
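As a sketch of note 2, a second instance on the same machine only needs different values for its three ports; the port numbers below are arbitrary, and the scheduler flags are kept identical to the first instance so both register with the same Redis-backed cluster:

```bash
# Hypothetical second instance on the same host: only the three ports differ;
# the scheduler flags match the first instance so both join the same cluster.
python -m fastdeploy.entrypoints.openai.api_server \
    --port 8811 \
    --metrics-port 8812 \
    --engine-worker-queue-port 8813 \
    --model baidu/ERNIE-4.5-0.3B-Paddle \
    --scheduler-name global \
    --scheduler-ttl 900 \
    --scheduler-host "127.0.0.1" \
    --scheduler-port 6379 \
    --scheduler-db 0 \
    --scheduler-password "" \
    --scheduler-topic "default" \
    --scheduler-min-load-score 3 \
    --scheduler-load-shards-num 1
```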