Multi-Node Deployment

Overview

Multi-node deployment addresses scenarios where a single machine's GPU memory is insufficient to hold a large model: the model is sharded via tensor parallelism across multiple machines.

Environment Preparation

Network Requirements

  1. All nodes must be within the same local network
  2. Ensure bidirectional connectivity between all nodes (test using ping and nc -zv)
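
For example, from each node you can check reachability of every other node (the IP address and port below are illustrative; note that nc -zv needs a listening TCP port, such as the engine worker queue port once the service is up):

    # Run from each node against every other node's IP.
    ping -c 3 192.168.1.102
    nc -zv 192.168.1.102 8182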

Software Requirements

  1. Install the same version of FastDeploy on all nodes
  2. [Recommended] Install and configure MPI (OpenMPI or MPICH)
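
For example, to confirm that every node runs the same version (this assumes the fastdeploy module exposes __version__, as most Python packages do):

    # Run on every node; the printed versions must match exactly.
    python -c "import fastdeploy; print(fastdeploy.__version__)"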

Tensor Parallel Deployment

We recommend using mpirun for a one-command startup, so that each node does not have to be started manually.
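
A minimal sketch of such a launch, assuming OpenMPI with a hostfile listing both nodes; the exact flags vary by MPI distribution, and the api_server arguments are the same as in the online example below:

    # hostfile (one line per node):
    #   192.168.1.101 slots=1
    #   192.168.1.102 slots=1
    mpirun -np 2 --hostfile hostfile \
        python -m fastdeploy.entrypoints.openai.api_server \
        --ips 192.168.1.101,192.168.1.102 \
        --tensor-parallel-size 16
        # ...remaining arguments as in the online example below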

Usage Instructions

  1. Execute the same command on all machines
  2. The IP order in the ips parameter determines the node startup sequence
  3. The first IP will be designated as the master node
  4. Ensure all nodes can resolve each other's hostnames
  • Online inference startup example:

    python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-300B-A47B-Paddle \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --max-model-len 32768 \
    --max-num-seqs 32 \
    --tensor-parallel-size 16 \
    --graph-optimization-config '{"use_cudagraph":false}' \
    --no-enable-prefix-caching \
    --disable-custom-all-reduce \
    --ips 192.168.1.101,192.168.1.102
    

💡 Multi-node tensor parallel deployment currently does not support CUDA Graphs, Prefix Caching, or Custom AllReduce, and these features must be explicitly disabled in the deployment command.

  • Offline startup example:

    from fastdeploy.engine.sampling_params import SamplingParams
    from fastdeploy.entrypoints.llm import LLM
    
    model_name_or_path = "baidu/ERNIE-4.5-300B-A47B-Paddle"
    
    sampling_params = SamplingParams(temperature=0.1, max_tokens=30)
    # 16 GPUs in total across the two nodes listed in ips.
    llm = LLM(model=model_name_or_path, tensor_parallel_size=16, ips="192.168.1.101,192.168.1.102")
    # Only the master node (the first IP in ips) accepts requests.
    if llm._check_master():
        output = llm.generate(prompts="Who are you?", use_tqdm=True, sampling_params=sampling_params)
        print(output)
    
  • Notes:

    • Only the master node can receive completion requests
    • Always send requests to the master node (the first IP in the ips list)
    • The master node will distribute workloads across all nodes
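
For example, once the service above is running, a completion request to the master node might look like this (the /v1/chat/completions route is the standard OpenAI-compatible path and is assumed here; the port matches --port from the deployment command):

    # Send requests to the master node only (the first IP in --ips).
    curl -s http://192.168.1.101:8180/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "baidu/ERNIE-4.5-300B-A47B-Paddle",
        "messages": [{"role": "user", "content": "Who are you?"}]
      }'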

Parameter Description

ips Parameter

  • Type: string
  • Format: Comma-separated IPv4 addresses
  • Description: Specifies the IP addresses of all nodes in the deployment group
  • Required: Only for multi-node deployments
  • Example: "192.168.1.101,192.168.1.102,192.168.1.103"

tensor_parallel_size Parameter

  • Type: integer
  • Description: Total number of GPUs across all nodes
  • Required: Yes
  • Example: For 2 nodes with 8 GPUs each, set to 16
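
As a quick sanity check, you can count the GPUs visible on each node and confirm that the per-node counts sum to tensor_parallel_size, for example with nvidia-smi:

    # Run on every node; the counts across nodes must sum to --tensor-parallel-size.
    nvidia-smi --list-gpus | wc -l    # e.g. 8 per node on 2 nodes -> set 16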