[Feature][MTP] Support MTP for rl-model (#4009)

* qk norm for speculate decode C16
* support mtp in v1_scheduler mode
* support mtp rope_3d
* support mtp features
* add unit test && del some log

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: xiaoxiaohehe001 <hiteezsf@163.com>
2025-10-05 16:48:03 +08:00 · 2025-09-10 13:34:37 +08:00
parent cce2410fad
commit 2f473ba966
21 changed files with 1465 additions and 531 deletions
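
Reviewer note: the `token_processor.py` hunk in this commit lowers the logging interval in `_compute_speculative_status` from 10 to 1, so the statistics are logged on every step rather than every tenth. A minimal standalone sketch of the accept-ratio arithmetic and the interval gate follows. The function names `accept_ratio` and `should_log` are hypothetical, and the zero-token guard is an assumption added for illustration, not something shown in the diff. The docstring's interpretation (each step contributes one token from the target model, the rest come from accepted drafts) is likewise an assumption about how the counters are maintained.

```python
def accept_ratio(total_step: int, number_of_output_tokens: int) -> float:
    """Share of output tokens beyond one per decoding step.

    Mirrors the expression in the diff:
        1 - total_step * 1.0 / number_of_output_tokens
    Assumption: each step yields one token from the target model, so the
    remaining tokens are accepted draft tokens.
    """
    if number_of_output_tokens == 0:
        # Guard added for this sketch only; not part of the diff.
        return 0.0
    return 1 - total_step / number_of_output_tokens


def should_log(speculative_stats_step: int, interval: int) -> bool:
    """Return True when the step counter falls on the logging interval."""
    return speculative_stats_step % interval == 0


if __name__ == "__main__":
    # 50 decoding steps that produced 100 tokens in total.
    print(accept_ratio(50, 100))
    # With interval = 1 every step logs; with interval = 10 only every tenth.
    print([s for s in range(1, 21) if should_log(s, 10)])
    print(all(should_log(s, 1) for s in range(1, 21)))
```

With `interval = 1` the gate is always true, which is convenient when debugging but produces one log line per step.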
--- a/fastdeploy/output/token_processor.py
+++ b/fastdeploy/output/token_processor.py
@@ -261,7 +261,7 @@ class TokenProcessor:

    def _compute_speculative_status(self):
        # TODO(liuzichang): Supplement more statistics
-        interval = 10
+        interval = 1
        if self.speculative_stats_step % interval == 0:
            accept_ratio = 1 - self.total_step * 1.0 / self.number_of_output_tokens
            spec_logger.info(