[Doc] Add multinode deployment documents (#3417)

* Create multi-node_deployment.md * Create multi-node_deployment.md * Update mkdocs.yml
2025-12-24 13:28:13 +08:00 · 2025-08-15 10:37:04 +08:00
parent f0f00a6025
commit 5a84324798
3 changed files with 146 additions and 0 deletions
--- a/docs/features/multi-node_deployment.md
+++ b/docs/features/multi-node_deployment.md
@@ -0,0 +1,71 @@
+# Multi-Node Deployment
+
+## Overview  
+Multi-node deployment addresses scenarios where a single machine's GPU memory is insufficient to support deployment of large models by enabling tensor parallelism across multiple machines.
+
+## Environment Preparation  
+#### Network Requirements  
+1. All nodes must be within the same local network  
+2. Ensure bidirectional connectivity between all nodes (test using `ping` and `nc -zv`)  
+
+#### Software Requirements  
+1. Install the same version of FastDeploy on all nodes  
+2. [Recommended] Install and configure MPI (OpenMPI or MPICH)  
+
+## Tensor Parallel Deployment  
+
+### Recommended Launch Method  
+We recommend using mpirun for one-command startup without manually starting each node.  
+
+### Usage Instructions  
+1. Execute the same command on all machines  
+2. The IP order in the `ips` parameter determines the node startup sequence  
+3. The first IP will be designated as the master node  
+4. Ensure all nodes can resolve each other's hostnames  
+
+* Online inference startup example:  
+    ```shell  
+    python -m fastdeploy.entrypoints.openai.api_server \  
+    --model baidu/ERNIE-4.5-300B-A47B-Paddle \  
+    --port 8180 \  
+    --metrics-port 8181 \  
+    --engine-worker-queue-port 8182 \  
+    --max-model-len 32768 \  
+    --max-num-seqs 32 \  
+    --tensor-parallel-size 16 \  
+    --ips 192.168.1.101,192.168.1.102  
+    ```  
+
+* Offline startup example:  
+    ```python  
+    from fastdeploy.engine.sampling_params import SamplingParams  
+    from fastdeploy.entrypoints.llm import LLM  
+      
+    model_name_or_path = "baidu/ERNIE-4.5-300B-A47B-Paddle"  
+      
+    sampling_params = SamplingParams(temperature=0.1, max_tokens=30)  
+    llm = LLM(model=model_name_or_path, tensor_parallel_size=16, ips="192.168.1.101,192.168.1.102")  
+    if llm._check_master():  
+        output = llm.generate(prompts="Who are you?", use_tqdm=True, sampling_params=sampling_params)  
+        print(output)  
+    ```  
+
+* Notes:  
+- Only the master node can receive completion requests  
+- Always send requests to the master node (the first IP in the ips list)  
+- The master node will distribute workloads across all nodes  
+
+### Parameter Description  
+
+#### `ips` Parameter  
+- **Type**: `string`  
+- **Format**: Comma-separated IPv4 addresses  
+- **Description**: Specifies the IP addresses of all nodes in the deployment group  
+- **Required**: Only for multi-node deployments  
+- **Example**: `"192.168.1.101,192.168.1.102,192.168.1.103"`  
+
+#### `tensor_parallel_size` Parameter  
+- **Type**: `integer`  
+- **Description**: Total number of GPUs across all nodes  
+- **Required**: Yes  
+- **Example**: For 2 nodes with 8 GPUs each, set to 16