Run the Examples on NVIDIA CUDA GPU

Prepare the Environment

Refer to the NVIDIA CUDA GPU Installation guide to pull the Docker image, for example:

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0

NVIDIA MLNX_OFED and Redis come pre-installed in the container.
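
To start a container from this image, here is a minimal sketch; the flags below (GPU access, host networking for RDMA, shared-memory size) are common choices for multi-GPU serving, not taken from this repo, so adjust them to your environment:

# Hypothetical invocation; tune --shm-size and networking for your setup
docker run -it --gpus all --net=host --shm-size=16g \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0 \
    /bin/bash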

Build and install FastDeploy

git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy

# Build with RDMA support (relies on the pre-installed MLNX_OFED stack)
export ENABLE_FD_RDMA=1

# Argument 1: Whether to build wheel package (1 for yes, 0 for compile only)
# Argument 2: Python interpreter path
# Argument 3: Whether to compile CPU inference operators
# Argument 4: Target GPU architectures
bash build.sh 1 python false [80,90]
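
After the build finishes, a quick sanity check is to import the package from the same Python interpreter passed to build.sh (the __version__ attribute is an assumption here, not verified against this release):

# Verify the wheel installed into the active environment
python -c "import fastdeploy; print(fastdeploy.__version__)"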

Run the Examples

Run the shell scripts in this directory, e.g. bash start_v0_tp1.sh or bash start_v1_tp1.sh.

Note that there are two methods for splitwise deployment:

  • v0: uses splitwise_scheduler or dp_scheduler; requests are scheduled inside the engine.
  • v1: uses router; requests are scheduled by a standalone router.
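
Once the instances started by either script are up, you can send a test request. A hypothetical example, assuming the scripts expose an OpenAI-compatible chat completions endpoint on localhost port 8188 (check the scripts for the actual host and port):

curl -X POST http://localhost:8188/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Hello"}], "stream": false}'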

Run the Examples on Kunlunxin XPU

Coming soon...