Modified to support custom all reduce by default (#3538)

2025-12-24 13:28:13 +08:00 · 2025-08-22 16:59:05 +08:00
parent 27666ee586
commit df7c31012b
15 changed files with 18 additions and 30 deletions
--- a/docs/parameters.md
+++ b/docs/parameters.md
@@ -37,7 +37,7 @@ When using FastDeploy to deploy models (including offline inference and service
 | ```reasoning_parser``` | `str` | Specify the reasoning parser to extract reasoning content from model output |
 | ```use_cudagraph```                | `bool`      | Whether to use cuda graph, default False. It is recommended to read [graph_optimization.md](./features/graph_optimization.md) carefully before opening. Custom all-reduce needs to be enabled at the same time in multi-card scenarios. |
 | ```graph_optimization_config```    | `dict[str]`       | Can configure parameters related to calculation graph optimization, the default value is'{"use_cudagraph":false, "graph_opt_level":0, "cudagraph_capture_sizes": null }'，Detailed description reference [graph_optimization.md](./features/graph_optimization.md)|
-| ```enable_custom_all_reduce``` | `bool` | Enable Custom all-reduce, default: False |
+| ```disable_custom_all_reduce``` | `bool` | Disable Custom all-reduce, default: False |
 | ```splitwise_role``` | `str` | Whether to enable splitwise inference, default value: mixed, supported parameters: ["mixed", "decode", "prefill"] |
 | ```innode_prefill_ports``` | `str` | Internal engine startup ports for prefill instances (only required for single-machine PD separation), default: None |
 | ```guided_decoding_backend``` | `str` | Specify the guided decoding backend to use, supports `auto`, `xgrammar`, `off`, default: `off` |
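
Note on the flag inversion in this hunk: replacing `enable_custom_all_reduce` (default False) with `disable_custom_all_reduce` (default False) means custom all-reduce is now on unless the user opts out. The snippet below is a minimal sketch of that opt-out pattern using Python's standard `argparse`. The `--disable-custom-all-reduce` spelling and the helper names `build_parser` and `custom_all_reduce_enabled` are assumptions for illustration, not FastDeploy's actual argument-parsing code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical opt-out flag (illustration only)."""
    parser = argparse.ArgumentParser(description="opt-out flag sketch")
    # store_true leaves the value False when the flag is absent, so the
    # feature stays enabled by default and the flag turns it off.
    parser.add_argument(
        "--disable-custom-all-reduce",
        action="store_true",
        default=False,
        help="Disable custom all-reduce (enabled when this flag is omitted).",
    )
    return parser


def custom_all_reduce_enabled(argv: list[str]) -> bool:
    """Return True when custom all-reduce should be used for the given argv."""
    args = build_parser().parse_args(argv)
    return not args.disable_custom_all_reduce


if __name__ == "__main__":
    print(custom_all_reduce_enabled([]))
    print(custom_all_reduce_enabled(["--disable-custom-all-reduce"]))
```

Callers that previously passed the enable flag to turn the feature on would, under this pattern, pass nothing; only callers that need the feature off have to pass the new flag.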