Yuan Xiaolan
|
1e86418c4a
|
optimize dy_cfp8's performance (#4145)
Co-authored-by: carryyu <569782149@qq.com>
|
2025-09-19 09:35:28 +08:00 |
|
Yuan Xiaolan
|
25aa2d94aa
|
cp dynamic Cfp8 (#4120)
* supports dynamic Cfp8
* add unittest
* fix dynamic Cfp8 computing error
* fix Cfp8 for RL load
---------
Co-authored-by: carryyu <569782149@qq.com>
|
2025-09-17 11:55:47 +08:00 |
|
freeliuzc
|
76759108c9
|
[Feature][SpeculativeDecoding]Support tree-attention (#3514)
* support tree-attention
* fix merge bug
* fix unit-test api
* fix merge bug
|
2025-08-22 13:36:41 +08:00 |
|
lzy
|
1e06b9fa6d
|
make append_attn supports mask_offset (#3138)
* make append_attn supports mask_offset
* add unittest
|
2025-08-14 03:40:55 -07:00 |
|
周周周
|
ddb10ac509
|
[Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)
* remove padding_offsets from atten
|
2025-07-17 18:41:31 +08:00 |
|
周周周
|
aa76085d1f
|
[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
|
2025-07-16 20:10:57 +08:00 |
|
jiangjiajun
|
684703fd72
|
[LLM] First commit the llm deployment code
|
2025-06-09 19:20:15 +08:00 |
|