# Run the Examples on NVIDIA CUDA GPU
## Prepare the Environment
Refer to the NVIDIA CUDA GPU Installation guide to pull the Docker image, for example:

```bash
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0
```
NVIDIA MLNX_OFED and Redis come pre-installed in this Docker container.
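As a minimal sketch of starting the container (the mount path, working directory, and shared-memory size here are illustrative assumptions, not values prescribed by this guide), you can run the image with GPU access and host networking:

```bash
# Illustrative only: the mount path, working directory, and --shm-size value are assumptions.
docker run --gpus all --network host --shm-size 64g \
    -v "$(pwd)":/workspace -w /workspace \
    -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0 bash
```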
## Build and install FastDeploy
```bash
git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy
export ENABLE_FD_RDMA=1
# Argument 1: Whether to build wheel package (1 for yes, 0 for compile only)
# Argument 2: Python interpreter path
# Argument 3: Whether to compile CPU inference operators
# Argument 4: Target GPU architectures
bash build.sh 1 python false [80,90]
```
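After the build finishes, a minimal sanity check is to import the installed package. This sketch only assumes the wheel installs a `fastdeploy` Python package, matching the repository name.

```bash
# Assumes the built wheel installs the `fastdeploy` Python package.
python -c "import fastdeploy; print('FastDeploy imported successfully')"
```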
## Run the Examples
Run the shell scripts in this directory, e.g. `bash start_v0_tp1.sh` or `bash start_v1_tp1.sh`. Once the service is up, you can send it a test request (see the example after the list below).
Note that there are two methods for splitwise deployment:
- v0: using `splitwise_scheduler` or `dp_scheduler`, where requests are scheduled inside the engine.
- v1: using the router, where requests are scheduled by the router.
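Once a start script has brought up the API server, a quick way to verify the deployment is to send a chat request. The sketch below assumes the service exposes an OpenAI-compatible `/v1/chat/completions` endpoint on port 8000 and accepts a placeholder model name; the actual port and model name are set in the start scripts, so adjust accordingly.

```bash
# Port, endpoint path, and model name below are assumptions; check the start script for the actual values.
curl http://127.0.0.1:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "default",
          "messages": [{"role": "user", "content": "Hello!"}]
        }'
```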
# Run the Examples on Kunlunxin XPU
Coming soon...