mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2025-12-24 13:28:13 +08:00
Modified to support custom all reduce by default (#3538)
@@ -77,8 +77,7 @@ Add the following lines to the startup parameters
 ```
 Notes:
 1. Usually, no additional parameters need to be set, but CUDAGraph will generate some additional memory overhead, which may need to be adjusted in some scenarios with limited memory. For detailed parameter adjustments, please refer to [GraphOptimizationBackend](../features/graph_optimization.md) for related configuration parameter descriptions
-2. When CUDAGraph is enabled, if running with multi-GPUs TP>1, `--enable-custom-all-reduce` must be specified at the same time.
-3. When CUDAGraph is enabled, the scenario of `max-model-len > 32768` is not currently supported.
+2. When CUDAGraph is enabled, the scenario of `max-model-len > 32768` is not currently supported.
 
 #### 2.2.6 Rejection Sampling
 **Idea:**
@@ -87,8 +87,7 @@ Add the following lines to the startup parameters
 ```
 Notes:
 1. Usually, no additional parameters need to be set, but CUDAGraph will generate some additional memory overhead, which may need to be adjusted in some scenarios with limited memory. For detailed parameter adjustments, please refer to [GraphOptimizationBackend](../features/graph_optimization.md) for related configuration parameter descriptions
-2. When CUDAGraph is enabled, if running with multi-GPUs TP>1, `--enable-custom-all-reduce` must be specified at the same time.
-3. When CUDAGraph is enabled, the scenario of `max-model-len > 32768` is not currently supported.
+2. When CUDAGraph is enabled, the scenario of `max-model-len > 32768` is not currently supported.
 
 #### 2.2.6 Rejection Sampling
 **Idea:**
@@ -132,12 +132,10 @@ CUDAGraph is a GPU computing acceleration technology provided by NVIDIA. It achi
 Add the following lines to the startup parameters
 ```
 --use-cudagraph
---enable-custom-all-reduce
 ```
 Notes:
 1. Usually, no additional parameters need to be set, but CUDAGraph will generate some additional memory overhead, which may need to be adjusted in some scenarios with limited memory. For detailed parameter adjustments, please refer to [GraphOptimizationBackend](../features/graph_optimization.md) for related configuration parameter descriptions
-2. When CUDAGraph is enabled, if running with multi-GPUs TP>1, `--enable-custom-all-reduce` must be specified at the same time.
-3. When CUDAGraph is enabled, the scenario of `max-model-len > 32768` is not currently supported.
+2. When CUDAGraph is enabled, the scenario of `max-model-len > 32768` is not currently supported.
 
 ## FAQ
 If you encounter any problems during use, you can refer to [FAQ](./FAQ.md).