9 Commits

Author SHA1 Message Date
xiaoxiaohehe001
f265a26f8b support mtp rope_3d (#3791)
* support mtp rope_3d

* Update speculate_write_cache_with_rope_kernel.cu
2025-09-04 17:18:05 +08:00
Yuan Xiaolan
fa58a9fa8f qk norm for speculate decode C16 (#3637) 2025-09-03 14:53:56 +08:00
Liumengyuan
e93d4cfcdd Add with_output version AppendAttention (#3302)
* get use_output from fd_config

* add clear TODO description

* add mask_offset para to align with develop

* fix bug

* fix use_output logic

* fix sot bug
2025-08-28 17:10:18 +08:00
lzy
1e06b9fa6d make append_attn supports mask_offset (#3138)
* make append_attn supports mask_offset

* add unittest
2025-08-14 03:40:55 -07:00
Ryan
ed6bff215a fix custom op order rms_norm_eps (#3348) 2025-08-13 10:12:49 +08:00
Yuan Xiaolan
7ce00e597c support qk norm (#3145) 2025-08-05 16:46:14 +08:00
周周周
ddb10ac509 [Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
周周周
aa76085d1f [Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
2025-07-16 20:10:57 +08:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00