[Speculative Decoding][MTP]Support attn mask offset (#4641)

* [MTP]Merge support attn (#4591)
* support mask_offset in speculative decoding
* fix dummy run output
* add unit test
* fix unit test import
* support attn_mask_offset in mtp mode
* add update_attn_mask op
* fix unit test && fix code-style
2025-12-24 13:28:13 +08:00 · 2025-11-03 10:08:01 +08:00
parent f44f4bafd1
commit 11398790d3
13 changed files with 638 additions and 111 deletions
--- a/custom_ops/setup_ops.py
+++ b/custom_ops/setup_ops.py
@@ -305,6 +305,7 @@ elif paddle.is_compiled_with_cuda():
        "gpu_ops/merge_prefill_decode_output.cu",
        "gpu_ops/limit_thinking_content_length_v1.cu",
        "gpu_ops/limit_thinking_content_length_v2.cu",
+        "gpu_ops/update_attn_mask_offsets.cu",
    ]

    # pd_disaggregation