周周周
6fa34102e8
[Others]get_block_shape_and_split_kv_block clean code ( #5123 )
2025-11-20 16:40:04 +08:00
Haonan Luo
1b9f351d21
Support GPT-OSS-BF16 ( #4240 )
...
* [Feature] AppendAtten support sinks & HEAD_DIM=64
* fix bug
* fix bug
* fix bug
* fix bug
* [Feature] support gpt-oss
* fix bug
* add mask
* support-gpt-oss
* support-gpt-oss
* fix long seq
* support wint8
* support wint8
* support wint8
* update test
* change sliding windows init pos
---------
Co-authored-by: ming1753 <ideaminghp@163.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
2025-10-20 14:44:58 +08:00
Ryan
49cea8fb1c
[SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp ( #3694 )
...
* rm inplace info && to(gpu)
* update append_attention
* unpin paddle version
* add full_cuda_graph=False
* add blank line
---------
Co-authored-by: SigureMo <sigure.qaq@gmail.com >
2025-10-17 10:57:55 +08:00
Sunny-bot1
a751d977bc
[Optimization] Fuse get_max_len and get_kv_max_len ( #4369 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* opt split_q_block
* fuse max_lens and max kv_len
2025-10-13 20:35:00 +08:00
Nyakku Shigure
5f80862578
Remove redundant inplace outputs for append_attention ( #4340 )
2025-10-10 10:21:27 +08:00
RAM
aa27b03bc0
[Executor]CUDAGraph support Speculate Decode ( #3769 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* success run ngram
* Revert "[Code Simplification] remove cum_offsets (#3410 )"
This reverts commit 32b39620bc .
* success run ngram5 tp4 42bs
* success run ngram5 tp4 42bs
* mtp draft commit
* add decorator for target model
* enable draft model in cudagraph v0.5
* revert revrt cum_offset
* enable target model in cudagraph v0.9 And clean debug code
* Revert "success run ngram"
This reverts commit 8351e83993 .
* add reverted code
* enable target model in cudagraph v0.9
* solve comment
* fix bid < 0
* Enable Target Model Padding And Draft Model in cudagraph
* solve problem
* delete rebuild padding debug note
* fast compile
* Add capture list for mtp
* success run 256 tp1 mtp
* Enable Lite TP2 Bsz256
* realy enable tp2 bsz 256
* fix problem
* Solve problem for Draft model in cudagraph
* Solve comment
* replace emptytensor as zeros
* Solve comments
* Revert "fast compile"
This reverts commit 834639a7ff .
* fix bug
* fix merge bug
* fix typo
* fix bug
---------
Co-authored-by: lizexu <2694294196@qq.com >
Co-authored-by: littledgg <1658565283@qq.com >
Co-authored-by: zeroRains <linjunlu@zerorains.top >
Co-authored-by: gongshaotian <gstain5555@outlook.com >
2025-10-09 21:18:29 +08:00
lzy
af49b81ffd
supports dynamic Cfp8 ( #3767 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* supports dynamic Cfp8
* add unittest
2025-09-07 20:41:29 -07:00
周周周
f6f726c773
clean code in sttantion ( #3917 )
2025-09-05 20:49:01 +08:00
xiaoxiaohehe001
f265a26f8b
support mtp rope_3d ( #3791 )
...
* support mtp rope_3d
* Update speculate_write_cache_with_rope_kernel.cu
2025-09-04 17:18:05 +08:00
Yuan Xiaolan
fa58a9fa8f
qk norm for speculate decode C16 ( #3637 )
2025-09-03 14:53:56 +08:00
Liumengyuan
e93d4cfcdd
Add with_output version AppendAttention ( #3302 )
...
* get use_output from fd_config
* add clear TODO description
* add mask_offset para to align with develop
* fix bug
* fix use_output logic
* fix sot bug
2025-08-28 17:10:18 +08:00
lzy
1e06b9fa6d
make append_attn supports mask_offset ( #3138 )
...
* make append_attn supports mask_offset
* add unittest
2025-08-14 03:40:55 -07:00
Ryan
ed6bff215a
fix custom op order rms_norm_eps ( #3348 )
2025-08-13 10:12:49 +08:00
Yuan Xiaolan
7ce00e597c
support qk norm ( #3145 )
2025-08-05 16:46:14 +08:00
周周周
ddb10ac509
[Inference, rename] remove padding_offsets from atten use batch_id_per_token ( #2880 )
...
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
周周周
aa76085d1f
[Attention] remove cum_offsets from atten, and use cu_seqlens_q ( #2870 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870 )
2025-07-16 20:10:57 +08:00
jiangjiajun
684703fd72
[LLM] First commit the llm deployment code
2025-06-09 19:20:15 +08:00