* qk norm for speculate decode C16
* support mtp in v1_scheduler mode
* support mtp rope_3d
* support mtp features
* add unit test && del some log
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: xiaoxiaohehe001 <hiteezsf@163.com>
* get use_output from fd_config
* add clear TODO description
* add mask_offset para to align with develop
* fix bug
* fix use_output logic
* fix sot bug