Modified to support custom all reduce by default (#3538)

zhink authored 2025-08-22 16:59:05 +08:00 · committed by GitHub
parent 27666ee586
commit df7c31012b
15 changed files with 18 additions and 30 deletions

@@ -18,7 +18,7 @@ FastDeploy's `GraphOptimizationBackend` design architecture is as follows, **som
## 1. GraphOptimizationBackend current usage restrictions
In CUDAGraph multi-device inference tasks, the Custom all-reduce operator is required to perform multi-GPU all-reduce.
-Before version 2.2, neither the CUDAGraph nor the Custom all-reduce operators were enabled by default. You need to add `--enable-custom-all-reduce` to the startup command to manually enable it.
+Before version 2.2, CUDAGraph was not enabled by default; since version 2.2, the Custom all-reduce operator is enabled by default.
### 1.1 Custom all-reduce must be enabled in multi-device scenarios
The `FLAGS_max_partition_size` environment variable controls the `gridDim` execution configuration of the kernel in CascadeAppend Attention, and a dynamic execution configuration will cause CUDAGraph execution to fail.
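
For readers of this change, a minimal launch sketch follows. The serving entrypoint, model path, tensor-parallel size, and the `FLAGS_max_partition_size` value are illustrative assumptions, not documented values; only `--enable-custom-all-reduce` and `FLAGS_max_partition_size` themselves come from the documentation above.

```bash
# Minimal multi-GPU launch sketch (pre-2.2 behavior). The entrypoint, model path,
# and tensor-parallel size are assumptions for illustration.

# Pin FLAGS_max_partition_size to a fixed value so the CascadeAppend Attention kernel
# keeps a static gridDim execution configuration; a dynamic configuration breaks
# CUDAGraph execution. The value 32768 is an assumed example, not a documented default.
export FLAGS_max_partition_size=32768

# Before version 2.2, Custom all-reduce had to be enabled manually with this flag;
# since 2.2 it is enabled by default and the flag can be dropped.
python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/model \
    --tensor-parallel-size 2 \
    --enable-custom-all-reduce
```

With this commit, the final flag becomes unnecessary in the default configuration, since Custom all-reduce is enabled by default.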