Sync v2.0 version of code to github repo

This commit is contained in:
Jiang-Jia-Jun
2025-06-29 23:29:37 +00:00
parent d151496038
commit 92c2cfa2e7
597 changed files with 78776 additions and 22905 deletions

# Global Scheduler: Multi-Instance Load Balancing
## Design Overview
Based on real-time workload, idle cluster nodes autonomously pull tasks from busier nodes, process them, and push the execution results back to the originating node.
### What Problem Does the Global Scheduler Solve?
![Local Scheduler](images/LocalScheduler.png)
Standard load balancing distributes requests with a round-robin strategy, so every inference instance receives the same number of requests.
In LLM scenarios, however, processing time varies significantly with input and output token counts. Even with an equal request distribution, inference completion times differ substantially across instances.
The global scheduler solves this imbalance through cluster-level optimization.
## How the Global Scheduler Works
![Global Scheduler](images/GlobalScheduler.png)
As shown:
* Nodes 1, 2, and n each receive 3 requests
* At time T:
* Node 1 completes all tasks
* Nodes 2 and n process their second requests (1 request remaining each)
* Node 1 steals a task from Node 2, processes it, and pushes the result to Node 2's response queue (sketched with Redis commands below)
This secondary load-balancing strategy:
✅ Maximizes cluster resource utilization
✅ Reduces Time-To-First-Token (TTFT)
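Conceptually, the steal can be pictured as atomically moving a pending task between per-node queues in Redis (which the global scheduler uses as its backing store, as described below). The commands here are a minimal illustration only: the key names and payloads are hypothetical and do not reflect FastDeploy's internal key layout.
```bash
# Hypothetical key names for illustration; FastDeploy's real keys differ.
# A pending request sits in node 2's task queue:
redis-cli LPUSH node2:tasks '{"req_id":"r5","prompt":"..."}'
# Idle node 1 atomically moves (steals) the oldest task from node 2's
# queue into its own (LMOVE requires Redis 6.2 or newer):
redis-cli LMOVE node2:tasks node1:tasks RIGHT LEFT
# After running inference, node 1 pushes the result to node 2's response
# queue so node 2 can return it to the client that sent the request:
redis-cli LPUSH node2:responses '{"req_id":"r5","output":"..."}'
```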
# How to Use the Global Scheduler
## Prerequisite: Redis Setup
### conda installation
```bash
# Install
conda install redis
# Launch
nohup redis-server > redis.log 2>&1 &
```
### apt installation (Debian/Ubuntu)
```bash
# Install
sudo apt install redis-server -y
# Launch
sudo systemctl start redis-server
```
### yum installation (RHEL/CentOS)
```bash
# Install
sudo yum install redis -y
# Launch
sudo systemctl start redis
```
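Whichever installation method you used, verify that Redis is reachable before launching FastDeploy:
```bash
redis-cli ping
# Expected reply: PONG
```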
## Launching FastDeploy
```bash
python -m fastdeploy.entrypoints.openai.api_server \
--port 8801 \
--metrics-port 8802 \
--engine-worker-queue-port 8803 \
--scheduler-name global \
--scheduler-ttl 900 \
--scheduler-host "127.0.0.1" \
--scheduler-port 6379 \
--scheduler-db 0 \
--scheduler-password "" \
--scheduler-topic "default" \
--scheduler-min-load-score 3 \
--scheduler-load-shards-num 1
```
See [Scheduler Launch Parameters](../online_serving/scheduler.md) for a description of each startup option.
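Once an instance is running, it serves the standard OpenAI-compatible API. A quick smoke test with curl (the model name here is a placeholder; use whatever model your instance actually serves):
```bash
# "default" is a placeholder model name, not a FastDeploy-defined value.
curl -s http://127.0.0.1:8801/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "default", "messages": [{"role": "user", "content": "Hello"}]}'
```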
### Deployment Notes
1. Execute this command on multiple machines to create additional inference instances.
2. For single-machine multi-instance deployments, give each instance its own unique ports (see the sketch below).
3. Use Nginx for external load balancing alongside the global scheduler's internal balancing.
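For example, a second instance on the same machine only needs different service ports; the scheduler flags are unchanged, so it registers with the same Redis and joins the same cluster (the port numbers below are arbitrary examples):
```bash
# Second instance on the same host: only the three service ports differ.
python -m fastdeploy.entrypoints.openai.api_server \
--port 8811 \
--metrics-port 8812 \
--engine-worker-queue-port 8813 \
--scheduler-name global \
--scheduler-ttl 900 \
--scheduler-host "127.0.0.1" \
--scheduler-port 6379 \
--scheduler-db 0 \
--scheduler-password "" \
--scheduler-topic "default" \
--scheduler-min-load-score 3 \
--scheduler-load-shards-num 1
```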