FastDeploy/docs/online_serving/scheduler.md

# Scheduler

FastDeploy currently supports two types of schedulers: **Local Scheduler** and **Global Scheduler**. The Global Scheduler is designed for large-scale clusters, enabling secondary load balancing across nodes based on real-time workload metrics.

## Scheduling Strategies

### Local Scheduler
The Local Scheduler functions similarly to a memory manager, performing eviction policies based on **task queue length** and **TTL** configurations.

### Global Scheduler
The Global Scheduler is implemented using Redis. Each node actively steals tasks from others when its GPU is idle, then pushes the execution results back to the originating node.

### PD-Separated Scheduler
Building upon the Global Scheduler, FastDeploy introduces the **PD-Separated Scheduling Strategy**, specifically optimized for large language model inference scenarios. It decouples the inference pipeline into two distinct phases:
- **Prefill Phase**: Builds KV cache, which is compute-intensive with high memory usage but low latency.
- **Decode Phase**: Performs autoregressive decoding, which is sequential and time-consuming but requires less memory.

By separating roles (prefill nodes handle request processing while decode nodes manage generation), this strategy enables finer-grained resource allocation, improving throughput and GPU utilization.

## Configuration Parameters
| Parameter Name                       | Type     | Required | Default   | Scope                  | Description                                                                 |
| ------------------------------------ | -------- | -------- | --------- | ---------------------- | --------------------------------------------------------------------------- |
| scheduler_name                       | str      | No       | local     | local,global,splitwise | Scheduler type: `local`, `global`, or `splitwise`                          |
| scheduler_max_size                   | int      | No       | -1        | local                  | Maximum task queue length                                                  |
| scheduler_ttl                        | int      | No       | 900       | local,global,splitwise | Maximum task time-to-live (seconds)                                        |
| scheduler_host                       | str      | No       | 127.0.0.1 | global,splitwise       | Redis server host                                                          |
| scheduler_port                       | int      | No       | 6379      | global,splitwise       | Redis server port                                                          |
| scheduler_db                         | int      | No       | 0         | global,splitwise       | Redis database index                                                       |
| scheduler_password                   | str      | No       | ""        | global,splitwise       | Redis access password                                                      |
| scheduler_topic                      | str      | No       | default   | global,splitwise       | Nodes under the same topic participate in task scheduling                  |
| scheduler_min_load_score             | float    | No       | 3         | global                 | Minimum load threshold for task stealing (idle nodes steal from busy ones) |
| scheduler_load_shards_num            | int      | No       | 1         | global                 | Number of shards for cluster load tracking                                 |
| scheduler_sync_period                | int      | No       | 5         | splitwise              | Node load synchronization interval (seconds)                               |
| scheduler_expire_period              | int      | No       | 3000      | splitwise              | Node heartbeat expiration time (seconds)                                   |
| scheduler_release_load_expire_period | int      | No       | 600       | splitwise              | Request expiration time for load release (seconds)                         |
| scheduler_reader_parallel            | int      | No       | 4         | splitwise              | Number of output reader threads                                            |
| scheduler_writer_parallel            | int      | No       | 4         | splitwise              | Number of writer threads                                                   |
| scheduler_reader_batch_size          | int      | No       | 200       | splitwise              | Batch size for fetching results from Redis                                 |
| scheduler_writer_batch_size          | int      | No       | 200       | splitwise              | Batch size for writing results to Redis                                    |