[Feature][MTP] Support MTP for rl-model (#4009)

* qk norm for speculate decode C16
* support mtp in v1_scheduler mode
* support mtp rope_3d
* support mtp features
* add unit test && del some log

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: xiaoxiaohehe001 <hiteezsf@163.com>
2025-09-27 04:46:16 +08:00 · 2025-09-10 13:34:37 +08:00
parent cce2410fad
commit 2f473ba966
21 changed files with 1465 additions and 531 deletions
--- a/fastdeploy/config.py
+++ b/fastdeploy/config.py
@@ -889,7 +889,7 @@ class CacheConfig:
        else:
            self.kv_cache_ratio = 0.75
        self.enc_dec_block_num = 0 if current_platform.is_iluvatar() else 2
-        self.prealloc_dec_block_slot_num_threshold = 5
+        self.prealloc_dec_block_slot_num_threshold = 12
        self.cache_dtype = "bfloat16"
        self.model_cfg = None
        self.enable_chunked_prefill = False