Commit Graph

14 Commits

Author SHA1 Message Date
RAM
d850660872 [Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)
* reset decoder_block_shape_q buffer

* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch

* update decode_max_tile_size

* fix pre-commit

* update block_multihead_attn_backend

* update flas attn backend

* update MLA Attention

* update XPU Attention

* update gcu,iluvatar model runner

* Update MTP

* fix MTP bug
2025-07-31 00:09:31 +08:00
lizhenyun01
238766e403 fix c4 prompt_cache 2025-07-28 14:31:37 +08:00
lizhenyun01
6235ef3881 fix chunk_prefill 2025-07-24 12:00:52 +08:00
lizhenyun01
29c3292f02 support c4 attn && fix cache 2025-07-24 12:00:52 +08:00
lizhenyun01
e51f018577 support chunk_prefill in fa3 2025-07-23 12:19:20 +08:00
K11OntheBoat
e991777757 [Feature] DeepseekV3 use pd_build_static_op (#2948)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 15:03:41 +08:00
K11OntheBoat
8020927f50 [BugFix] Rename attention params of deepseekv3 (#2939)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 14:01:30 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
周周周
d306944f4f remove cum_offsets from get_block_shape_and_split_kv_block (#2913)
* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove cum_offsets from get_block_shape_and_split_kv_block

* remove cum_offsets from get_block_shape_and_split_kv_block
2025-07-18 16:13:32 +08:00
周周周
ddb10ac509 [Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
周周周
aa76085d1f [Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
2025-07-16 20:10:57 +08:00
Jiang-Jia-Jun
05c670e593 [Sync] Update to latest code (#2679)
* [Sync] Update to latest code

* Add new code files

* Add new code files

* update code

* Try to fix build.sh

* Try to fix build.sh

* Update code

* Update requirements.txt

* Update code

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00