[Graph Optimization][Speculative Decoding] Fix a bug in CUDAGraph + MTP + EP (#4430)

* Fix the MTP dummy-run bug

* Make the target model and the draft model use the same flag

* Avoid the MoE bug in CUDAGraph padding

* In MTP, replace use_cudagraph with step_use_cudagraph (see the sketch below)
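
A minimal sketch of the idea behind the last two bullets, assuming a proposer class shaped like the one touched in this diff; the class name, attributes, and buffers below are illustrative, not FastDeploy's actual implementation.

class MTPProposer:
    """Hypothetical draft-model proposer for MTP speculative decoding."""

    def __init__(self, draft_model):
        self.draft_model = draft_model
        self.captured_graph = None   # filled in by a capture pass, if CUDAGraph is enabled
        self.static_input = None     # static input buffer used during capture
        self.static_output = None    # static output buffer produced by capture

    def run(self, full_hidden_states, step_use_cudagraph=False):
        # step_use_cudagraph is computed once per step by the target-model runner
        # (forward_meta.step_use_cudagraph in the diff below) and passed in, so the
        # draft model makes the same replay-or-eager decision as the target model.
        if step_use_cudagraph and self.captured_graph is not None:
            # Typical CUDA graph replay: copy inputs into the captured static
            # buffer, replay the recorded kernels, read back the static output.
            self.static_input.copy_(full_hidden_states)
            self.captured_graph.replay()
            return self.static_output
        # Eager path: e.g. dummy runs or batch shapes with no captured graph.
        return self.draft_model(full_hidden_states)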
RAM
2025-10-17 14:22:05 +08:00
committed by GitHub
parent cfd93c0966
commit 920df5be5a
3 changed files with 29 additions and 33 deletions


@@ -1253,7 +1253,9 @@ class GPUModelRunner(ModelRunnerBase):
         if self.speculative_decoding:
             if self.speculative_method == "mtp":
-                self.proposer.run(full_hidden_states=model_output)
+                self.proposer.run(
+                    full_hidden_states=model_output, step_use_cudagraph=self.forward_meta.step_use_cudagraph
+                )
             else:
                 self.proposer.run(share_inputs=self.share_inputs)
@@ -1600,7 +1602,9 @@ class GPUModelRunner(ModelRunnerBase):
         # 6. Speculative decode
         if self.speculative_decoding:
             if self.speculative_method == "mtp":
-                self.proposer.run(full_hidden_states=model_output)
+                self.proposer.run(
+                    full_hidden_states=model_output, step_use_cudagraph=self.forward_meta.step_use_cudagraph
+                )
             else:
                 self.proposer.run(share_inputs=self.share_inputs)
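
As a hedged illustration of why the call sites above pass a per-step flag rather than a static use_cudagraph setting: with CUDAGraph padding, a step can only replay a graph when its padded batch size was actually captured, so the decision has to be made per step and then shared by the target model and the MTP proposer. The function and parameter names below are assumptions, not code from this repository.

def compute_step_use_cudagraph(cudagraph_enabled, padded_batch_size, captured_batch_sizes):
    # True only if graphs are enabled for this runner AND the current step's
    # padded batch size has a captured graph to replay; otherwise the step
    # (target model and draft model alike) falls back to eager execution.
    return cudagraph_enabled and padded_batch_size in captured_batch_sizes

# e.g. forward_meta.step_use_cudagraph = compute_step_use_cudagraph(
#          use_cudagraph, padded_bs, {1, 2, 4, 8, 16})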