mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-31 11:56:44 +08:00

Files

ltd0924 5a84324798 [Doc] Add multinode deployment documents (#3417 )

* Create multi-node_deployment.md

* Create multi-node_deployment.md

* Update mkdocs.yml

2025-08-15 10:37:04 +08:00

2.6 KiB

Raw Blame History

Multi-Node Deployment

Overview

Multi-node deployment addresses scenarios where a single machine's GPU memory is insufficient to support deployment of large models by enabling tensor parallelism across multiple machines.

Environment Preparation

Network Requirements

All nodes must be within the same local network
Ensure bidirectional connectivity between all nodes (test using ping and nc -zv)

Software Requirements

Install the same version of FastDeploy on all nodes
[Recommended] Install and configure MPI (OpenMPI or MPICH)

Tensor Parallel Deployment

Recommended Launch Method

We recommend using mpirun for one-command startup without manually starting each node.

Usage Instructions

Execute the same command on all machines
The IP order in the ips parameter determines the node startup sequence
The first IP will be designated as the master node
Ensure all nodes can resolve each other's hostnames

Online inference startup example:

python -m fastdeploy.entrypoints.openai.api_server \  
--model baidu/ERNIE-4.5-300B-A47B-Paddle \  
--port 8180 \  
--metrics-port 8181 \  
--engine-worker-queue-port 8182 \  
--max-model-len 32768 \  
--max-num-seqs 32 \  
--tensor-parallel-size 16 \  
--ips 192.168.1.101,192.168.1.102

Offline startup example:

from fastdeploy.engine.sampling_params import SamplingParams  
from fastdeploy.entrypoints.llm import LLM  

model_name_or_path = "baidu/ERNIE-4.5-300B-A47B-Paddle"  

sampling_params = SamplingParams(temperature=0.1, max_tokens=30)  
llm = LLM(model=model_name_or_path, tensor_parallel_size=16, ips="192.168.1.101,192.168.1.102")  
if llm._check_master():  
    output = llm.generate(prompts="Who are you?", use_tqdm=True, sampling_params=sampling_params)  
    print(output)

Notes:

Only the master node can receive completion requests
Always send requests to the master node (the first IP in the ips list)
The master node will distribute workloads across all nodes

Parameter Description

`ips` Parameter

Type: string
Format: Comma-separated IPv4 addresses
Description: Specifies the IP addresses of all nodes in the deployment group
Required: Only for multi-node deployments
Example: "192.168.1.101,192.168.1.102,192.168.1.103"

`tensor_parallel_size` Parameter

Type: integer
Description: Total number of GPUs across all nodes
Required: Yes
Example: For 2 nodes with 8 GPUs each, set to 16

2.6 KiB Raw Blame History