# Multi-Node Deployment

## Overview
Multi-node deployment addresses scenarios where a single machine's GPU memory is insufficient to deploy a large model, by enabling tensor parallelism across multiple machines.
## Environment Preparation

### Network Requirements
- All nodes must be within the same local network
- Ensure bidirectional connectivity between all nodes (test with `ping` and `nc -zv`, as sketched below)
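A quick connectivity check might look like the following (a minimal sketch; 8182 is the engine worker queue port from the deployment example below, and `nc -zv` assumes a netcat build that supports those flags):

```bash
# Run from each node against every other node.
ping -c 3 192.168.1.102        # basic reachability

# Verify that the ports used by the deployment are reachable,
# e.g. the engine worker queue port from the example below.
nc -zv 192.168.1.102 8182
```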
### Software Requirements
- Install the same version of FastDeploy on all nodes (see the version check below)
- [Recommended] Install and configure MPI (OpenMPI or MPICH)
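One quick way to compare installed versions across nodes (a minimal sketch; it assumes FastDeploy was installed via pip):

```bash
# Run on every node and compare the reported versions; they must match.
pip list | grep -i fastdeploy
```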
## Tensor Parallel Deployment

### Recommended Launch Method
We recommend using `mpirun` for one-command startup, without manually starting each node.
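A possible `mpirun` invocation is sketched below, assuming OpenMPI with a hostfile listing one launcher process per node; the exact flags, and how they interact with FastDeploy's launcher, are assumptions rather than documented behavior:

```bash
# Hypothetical hostfile: one launcher process per node.
cat > hostfile <<'EOF'
192.168.1.101 slots=1
192.168.1.102 slots=1
EOF

# Launch the same api_server command (full flags shown in the
# online example below) on both nodes with a single mpirun call.
mpirun -np 2 --hostfile hostfile \
    python -m fastdeploy.entrypoints.openai.api_server \
        --model baidu/ERNIE-4.5-300B-A47B-Paddle \
        --tensor-parallel-size 16 \
        --ips 192.168.1.101,192.168.1.102
```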
### Usage Instructions

- Execute the same command on all machines
- The order of IPs in the `ips` parameter determines the node startup sequence; the first IP is designated as the master node
- Ensure all nodes can resolve each other's hostnames

Online inference startup example:
```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-300B-A47B-Paddle \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --max-model-len 32768 \
    --max-num-seqs 32 \
    --tensor-parallel-size 16 \
    --graph-optimization-config '{"use_cudagraph":false}' \
    --no-enable-prefix-caching \
    --disable-custom-all-reduce \
    --ips 192.168.1.101,192.168.1.102
```
> 💡 Multi-node tensor parallel deployment currently does not support CUDA Graphs, Prefix Caching, or Custom AllReduce; these features must be explicitly disabled in the deployment command, as in the example above.
Offline startup example:

```python
from fastdeploy.engine.sampling_params import SamplingParams
from fastdeploy.entrypoints.llm import LLM

model_name_or_path = "baidu/ERNIE-4.5-300B-A47B-Paddle"
sampling_params = SamplingParams(temperature=0.1, max_tokens=30)
llm = LLM(model=model_name_or_path, tensor_parallel_size=16, ips="192.168.1.101,192.168.1.102")

# Only the master node accepts generation requests; worker nodes just serve shards.
if llm._check_master():
    output = llm.generate(prompts="Who are you?", use_tqdm=True, sampling_params=sampling_params)
    print(output)
```
Notes:

- Only the master node can receive completion requests
- Always send requests to the master node (the first IP in the `ips` list); see the request example below
- The master node distributes workloads across all nodes
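For example, with the online deployment above, a request to the master node might look like this (a sketch assuming the server exposes the standard OpenAI-compatible `/v1/chat/completions` route on port 8180):

```bash
# Send requests to the master node (the first IP in --ips) only.
curl http://192.168.1.101:8180/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "baidu/ERNIE-4.5-300B-A47B-Paddle",
        "messages": [{"role": "user", "content": "Who are you?"}]
      }'
```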
## Parameter Description

### `ips` Parameter

- Type: `string`
- Format: comma-separated IPv4 addresses
- Description: specifies the IP addresses of all nodes in the deployment group
- Required: only for multi-node deployments
- Example: `"192.168.1.101,192.168.1.102,192.168.1.103"`
### `tensor_parallel_size` Parameter

- Type: `integer`
- Description: total number of GPUs across all nodes
- Required: yes
- Example: for 2 nodes with 8 GPUs each, set to 16 (see the sanity check below)
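As a sanity check on these two parameters, a tiny hypothetical helper (not part of FastDeploy) could verify the arithmetic before launch:

```python
# Hypothetical pre-launch check: tensor_parallel_size should equal
# the number of nodes in `ips` times the GPUs available per node.
ips = "192.168.1.101,192.168.1.102"
tensor_parallel_size = 16
gpus_per_node = 8  # assumption: homogeneous nodes with 8 GPUs each

num_nodes = len(ips.split(","))
assert tensor_parallel_size == num_nodes * gpus_per_node, (
    f"tensor_parallel_size ({tensor_parallel_size}) != "
    f"{num_nodes} nodes x {gpus_per_node} GPUs per node"
)
print(f"OK: {num_nodes} nodes x {gpus_per_node} GPUs = {tensor_parallel_size}")
```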