[TSP] Support qwen3 moe tsp + cudagraph (#4871)

* support qwen3_moe tsp mode * fix * fix * update * update * update * fix * support external_rmsnorm * update * fix
2025-12-24 13:28:13 +08:00 · 2025-11-10 23:37:51 +08:00
parent fb2eb403ab
commit 3dc0ffa46d
28 changed files with 173 additions and 273 deletions
--- a/docs/parameters.md
+++ b/docs/parameters.md
@@ -40,6 +40,8 @@ When using FastDeploy to deploy models (including offline inference and service
 | ```use_cudagraph```                | `bool`      | __[DEPRECATED]__ CUDAGraph is enabled by default since version 2.3. It is recommended to read [graph_optimization.md](./features/graph_optimization.md) carefully before opening. |
 | ```graph_optimization_config```    | `dict[str]`       | Can configure parameters related to calculation graph optimization, the default value is'{"use_cudagraph":true, "graph_opt_level":0}'，Detailed description reference [graph_optimization.md](./features/graph_optimization.md)|
 | ```disable_custom_all_reduce``` | `bool` | Disable Custom all-reduce, default: False |
+| ```use_internode_ll_two_stage``` | `bool` | Use two stage communication in deepep moe, default: False |
+| ```disable_sequence_parallel_moe``` | `bool` | Disable sequence parallel moe, default: False |
 | ```splitwise_role``` | `str` | Whether to enable splitwise inference, default value: mixed, supported parameters: ["mixed", "decode", "prefill"] |
 | ```innode_prefill_ports``` | `str` | Internal engine startup ports for prefill instances (only required for single-machine PD separation), default: None |
 | ```guided_decoding_backend``` | `str` | Specify the guided decoding backend to use, supports `auto`, `xgrammar`, `off`, default: `off` |