Extract eh_proj Layer from ParallelLMHead for MTP to Avoid Weight Transposition Issue (#2707)

* fix mtp eh_proj layer

* fix mtp update_cfg function

* fix docstring

* simplify class name
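The eh_proj extraction itself lives in a file not shown in this hunk. As a motivation sketch: a ParallelLMHead-style layer stores its projection weight in one orientation, so reusing it for eh_proj would force a transpose of the loaded weight, while a dedicated layer stores the weight in the orientation it is used. The shapes and names below are illustrative assumptions, not the PR's actual code; the point is only that both orientations yield the same result, but the dedicated layout skips the transpose at load time.

```python
def matmul(x, w):
    # Naive matrix multiply: x is (m x k), w is (k x n).
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

def transpose(w):
    return [list(col) for col in zip(*w)]

hidden = 2
# Weight as an LM-head-style layer might store it: (out, in) = (hidden, 2*hidden).
# (Assumed orientation for illustration only.)
w_lm_style = [[1, 0, 2, 0],
              [0, 1, 0, 2]]
# Weight as a dedicated eh_proj layer stores it: (in, out) = (2*hidden, hidden).
w_eh_style = transpose(w_lm_style)

# Concatenated [embedding; hidden_state] input, shape (1, 2*hidden).
x = [[1, 2, 3, 4]]

# Dedicated layer: the weight is used exactly as stored.
y = matmul(x, w_eh_style)
# Reused LM-head weight: a transpose is required before every use.
y_via_transpose = matmul(x, transpose(w_lm_style))

print(y, y_via_transpose)  # -> [[7, 10]] [[7, 10]]
```

Same numerics either way; the dedicated layer simply removes the load-time transposition step.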
GoldPancake
2025-07-04 14:15:04 +08:00
committed by GitHub
parent a5ae88ded9
commit e7fa57ebae
3 changed files with 136 additions and 4 deletions


@@ -68,8 +68,7 @@ class MTPProposer(Proposer):
         """
         Update config for MTP from global config
         """
-        self.model_config.architectures[0] = self.model_config.architectures[
-            0].replace("MoeForCausalLM", "MTPForCausalLM")
+        self.model_config.architectures[0] = "Ernie4_5_MTPForCausalLM"
         self.speculative_config.sharing_model = main_model
         self.model_config.num_layers = 1
         self.parallel_config.model_name_or_path = (
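The hunk above replaces a string substitution on the architecture name with a direct assignment. A minimal sketch of the two approaches side by side (the `ModelConfig` stand-in here is hypothetical, not the project's real config class):

```python
# Hypothetical minimal stand-in for the model config used in update_cfg.
class ModelConfig:
    def __init__(self, architectures):
        self.architectures = architectures

# Old approach: derive the MTP class name via string replacement.
cfg_old = ModelConfig(["Ernie4_5_MoeForCausalLM"])
cfg_old.architectures[0] = cfg_old.architectures[0].replace(
    "MoeForCausalLM", "MTPForCausalLM")

# New approach: assign the class name directly.
cfg_new = ModelConfig(["Ernie4_5_MoeForCausalLM"])
cfg_new.architectures[0] = "Ernie4_5_MTPForCausalLM"

print(cfg_old.architectures[0])  # -> Ernie4_5_MTPForCausalLM
print(cfg_new.architectures[0])  # -> Ernie4_5_MTPForCausalLM
```

Both forms produce the same name for this model family; the direct assignment is simpler and does not silently no-op if the source architecture string ever changes.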