Modified to support custom all reduce by default (#3538)

zhink authored 2025-08-22 16:59:05 +08:00 · committed by GitHub
parent 27666ee586
commit df7c31012b
15 changed files with 18 additions and 30 deletions

@@ -18,7 +18,7 @@ FastDeploy's `GraphOptimizationBackend` design architecture is as follows, **som
## 1. GraphOptimizationBackend current usage restrictions
In CUDAGraph multi-device inference tasks, the Custom all-reduce operator is required to perform multi-GPU all-reduce.
-Before version 2.2, neither the CUDAGraph nor the Custom all-reduce operators were enabled by default. You need to add `--enable-custom-all-reduce` to the startup command to manually enable it.
+Before version 2.2, CUDAGraph was not enabled by default; since version 2.2, the Custom all-reduce operator is enabled by default.
### 1.1 Custom all-reduce must be enabled in multi-device scenarios
The `FLAGS_max_partition_size` environment variable controls the `gridDim` execution configuration of the kernel in CascadeAppend Attention, and a dynamic execution configuration will cause CUDAGraph execution to fail.
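
For readers of this change, a minimal launch sketch follows. The serving entrypoint, model path, tensor-parallel size, and the `FLAGS_max_partition_size` value are illustrative assumptions, not documented values; only `--enable-custom-all-reduce` and `FLAGS_max_partition_size` themselves come from the documentation above.

```bash
# Minimal multi-GPU launch sketch (pre-2.2 behavior). The entrypoint, model path,
# and tensor-parallel size are assumptions for illustration.

# Pin FLAGS_max_partition_size to a fixed value so the CascadeAppend Attention kernel
# keeps a static gridDim execution configuration; a dynamic configuration breaks
# CUDAGraph execution. The value 32768 is an assumed example, not a documented default.
export FLAGS_max_partition_size=32768

# Before version 2.2, Custom all-reduce had to be enabled manually with this flag;
# since 2.2 it is enabled by default and the flag can be dropped.
python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/model \
    --tensor-parallel-size 2 \
    --enable-custom-all-reduce
```

With this commit, the final flag becomes unnecessary in the default configuration, since Custom all-reduce is enabled by default.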