mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Run the Examples on NVIDIA CUDA GPU

Prepare the Environment

Follow the NVIDIA CUDA GPU Installation guide to pull the Docker image, for example:

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0

NVIDIA MLNX_OFED and Redis are pre-installed in the Docker image.

Build and install FastDeploy

git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy

export ENABLE_FD_RDMA=1

# Argument 1: whether to build the wheel package (1 = build wheel, 0 = compile only)
# Argument 2: Python interpreter path
# Argument 3: whether to compile CPU inference operators
# Argument 4: target GPU architectures (quoted so the shell does not treat the brackets as a glob pattern)
bash build.sh 1 python false '[80,90]'

Run the Examples

Run one of the shell scripts in this directory, for example bash start_v0_tp1.sh or bash start_v1_tp1.sh.

There are two methods for splitwise deployment:

v0: uses splitwise_scheduler or dp_scheduler; requests are scheduled in the engine.
v1: uses a router; requests are scheduled in the router.
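
The start scripts in this directory appear to follow a start_<method>_<layout>.sh naming pattern (start_v0_tp1.sh, start_v1_tp1.sh, start_v1_tp2.sh, start_v1_dp2.sh). A minimal sketch of picking one by method and layout is given below. The function pick_start_script is a hypothetical helper, not part of FastDeploy, and the reading of tp/dp as the parallel layout is an assumption based on the file names alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FastDeploy): map a deployment method
# (v0 = scheduling in the engine, v1 = scheduling in the router) and a
# layout suffix to one of the start scripts shipped in this directory.
pick_start_script() {
    local method="$1"          # v0 or v1
    local layout="${2:-tp1}"   # tp1, tp2 or dp2; defaults to tp1
    local script="start_${method}_${layout}.sh"

    # Accept only the combinations that exist as files in this directory.
    case "$script" in
        start_v0_tp1.sh|start_v1_tp1.sh|start_v1_tp2.sh|start_v1_dp2.sh)
            printf '%s\n' "$script"
            ;;
        *)
            printf 'no start script for: %s %s\n' "$method" "$layout" >&2
            return 1
            ;;
    esac
}
```

For example, bash "$(pick_start_script v1 tp2)" would launch the router-based deployment script start_v1_tp2.sh, while an unsupported combination such as v0 dp2 returns a non-zero status instead of running anything.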

Run the Examples on Kunlunxin XPU

Coming soon...