Commit Graph

7 Commits

Author SHA1 Message Date
Yuan Xiaolan
1e86418c4a optimize dy_cfp8's performance (#4145)
Co-authored-by: carryyu <569782149@qq.com>
2025-09-19 09:35:28 +08:00
Yuan Xiaolan
25aa2d94aa cp dynamic Cfp8 (#4120)
* supports dynamic Cfp8

* add unittest

* fix dynamic Cfp8 computing error

* fix Cfp8 for RL load

---------

Co-authored-by: carryyu <569782149@qq.com>
2025-09-17 11:55:47 +08:00
freeliuzc
76759108c9 [Feature][SpeculativeDecoding]Support tree-attention (#3514)
* support tree-attention

* fix merge bug

* fix unit-test api

* fix merge bug
2025-08-22 13:36:41 +08:00
lzy
1e06b9fa6d make append_attn supports mask_offset (#3138)
* make append_attn supports mask_offset

* add unittest
2025-08-14 03:40:55 -07:00
周周周
ddb10ac509 [Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
周周周
aa76085d1f [Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
2025-07-16 20:10:57 +08:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code
2025-06-09 19:20:15 +08:00