diff --git a/docs/features/multi-node_deployment.md b/docs/features/multi-node_deployment.md
index ca1ee94f5..cb7daaaec 100644
--- a/docs/features/multi-node_deployment.md
+++ b/docs/features/multi-node_deployment.md
@@ -36,9 +36,14 @@ We recommend using mpirun for one-command startup without manually starting each
   --max-model-len 32768 \
   --max-num-seqs 32 \
   --tensor-parallel-size 16 \
+  --graph-optimization-config '{"use_cudagraph":false}' \
+  --no-enable-prefix-caching \
+  --disable-custom-all-reduce \
   --ips 192.168.1.101,192.168.1.102
 ```
 
+> :bulb: Multi-node tensor parallel deployment currently does not support CUDA Graphs, Prefix Caching, or Custom AllReduce, and these features must be explicitly disabled in the deployment command.
+
 * Offline startup example:
 
 ```python
diff --git a/docs/zh/features/multi-node_deployment.md b/docs/zh/features/multi-node_deployment.md
index 7789f588f..a712c7c1f 100644
--- a/docs/zh/features/multi-node_deployment.md
+++ b/docs/zh/features/multi-node_deployment.md
@@ -36,9 +36,14 @@
   --max-model-len 32768 \
   --max-num-seqs 32 \
   --tensor-parallel-size 16 \
+  --graph-optimization-config '{"use_cudagraph":false}' \
+  --no-enable-prefix-caching \
+  --disable-custom-all-reduce \
   --ips 192.168.1.101,192.168.1.102
 ```
 
+> :bulb: Multi-node tensor parallel deployment does not yet support CUDA Graphs, Prefix Caching, or Custom AllReduce; they must be explicitly disabled in the deployment command.
+
 * Offline startup example:
 
 ```python