mirror of
				https://github.com/PaddlePaddle/FastDeploy.git
				synced 2025-10-31 11:56:44 +08:00 
			
		
		
		
	
		
			
				
	
	
	
		
			2.6 KiB
		
	
	
	
	
	
	
	
			
		
		
	
	
			2.6 KiB
		
	
	
	
	
	
	
	
Multi-Node Deployment
Overview
Multi-node deployment addresses scenarios where a single machine's GPU memory is insufficient to support deployment of large models by enabling tensor parallelism across multiple machines.
Environment Preparation
Network Requirements
- All nodes must be within the same local network
- Ensure bidirectional connectivity between all nodes (test using pingandnc -zv)
Software Requirements
- Install the same version of FastDeploy on all nodes
- [Recommended] Install and configure MPI (OpenMPI or MPICH)
Tensor Parallel Deployment
Recommended Launch Method
We recommend using mpirun for one-command startup without manually starting each node.
Usage Instructions
- Execute the same command on all machines
- The IP order in the ipsparameter determines the node startup sequence
- The first IP will be designated as the master node
- Ensure all nodes can resolve each other's hostnames
- 
Online inference startup example: python -m fastdeploy.entrypoints.openai.api_server \ --model baidu/ERNIE-4.5-300B-A47B-Paddle \ --port 8180 \ --metrics-port 8181 \ --engine-worker-queue-port 8182 \ --max-model-len 32768 \ --max-num-seqs 32 \ --tensor-parallel-size 16 \ --ips 192.168.1.101,192.168.1.102
- 
Offline startup example: from fastdeploy.engine.sampling_params import SamplingParams from fastdeploy.entrypoints.llm import LLM model_name_or_path = "baidu/ERNIE-4.5-300B-A47B-Paddle" sampling_params = SamplingParams(temperature=0.1, max_tokens=30) llm = LLM(model=model_name_or_path, tensor_parallel_size=16, ips="192.168.1.101,192.168.1.102") if llm._check_master(): output = llm.generate(prompts="Who are you?", use_tqdm=True, sampling_params=sampling_params) print(output)
- 
Notes: 
- Only the master node can receive completion requests
- Always send requests to the master node (the first IP in the ips list)
- The master node will distribute workloads across all nodes
Parameter Description
ips Parameter
- Type: string
- Format: Comma-separated IPv4 addresses
- Description: Specifies the IP addresses of all nodes in the deployment group
- Required: Only for multi-node deployments
- Example: "192.168.1.101,192.168.1.102,192.168.1.103"
tensor_parallel_size Parameter
- Type: integer
- Description: Total number of GPUs across all nodes
- Required: Yes
- Example: For 2 nodes with 8 GPUs each, set to 16
