lzy
|
5300e73f8b
|
[Others] Maintain the mtp branch temporarily. (#5446) (#5621)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
|
2025-12-17 06:03:25 -08:00 |
|
freeliuzc
|
d7d633a285
|
[Cherry-Pick][CI]Fix write qknorm cache bug in speculative decoding(#5491) (#5617)
* [liuzichang spend 10 dyas]fix write qknorm cache bug
* fix 'fix cachekv bug''
|
2025-12-17 20:08:51 +08:00 |
|
chen
|
5bdef760a2
|
[Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache (#5486) (#5536)
|
2025-12-15 15:53:34 +08:00 |
|
lzy
|
c71a44c7e5
|
supports mtp split_kv_attn (#5343)
|
2025-12-03 12:40:16 +08:00 |
|
K11OntheBoat
|
2e1680838f
|
[PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251)
* Support deepseekv3 cache transfer for PD deploy
* clean some log info
---------
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
|
2025-12-02 14:11:50 +08:00 |
|
lizhenyun01
|
aba4fc657f
|
[Feature] support flash_mask_attention backend (#5134)
* [Feature] suppert flash_mask_attention backend
* fix unittest
* clean code
|
2025-11-28 10:12:16 +08:00 |
|
freeliuzc
|
2d1dade5e2
|
[Speculative Decoding][MTP] Support static CacheKV C8 quantization and optimize memory usage (#5155)
* support static cachekv c8 quantization in mtp mode
* optimize memory allocation
|
2025-11-21 15:10:13 +08:00 |
|
周周周
|
385fe6dade
|
[Others] clean code (#5133)
|
2025-11-20 18:44:08 +08:00 |
|
周周周
|
6fa34102e8
|
[Others]get_block_shape_and_split_kv_block clean code (#5123)
|
2025-11-20 16:40:04 +08:00 |
|
chen
|
d58c1db8a0
|
[Feature][OP] Append Attn Support CUDA-PDL (#5072)
|
2025-11-17 20:47:33 +08:00 |
|
周周周
|
b23e684b67
|
revert group size 3 (#5079)
|
2025-11-17 18:54:13 +08:00 |
|
Sunny-bot1
|
8a4ddb29df
|
Revert "[BugFix] Revert skip capture (#5023)" (#5080)
|
2025-11-17 16:14:55 +08:00 |
|
Sunny-bot1
|
249feca65a
|
[BugFix] Revert skip capture (#5023)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* Revert "[BugFix][Metax] Fix metax compile issue in get_block_shape_and_split_kv_block (#5000)"
This reverts commit 05da8e34c0.
* Revert "skip DtoH capture (#4988)"
This reverts commit 5b24013d46.
|
2025-11-13 23:52:51 -08:00 |
|
周周周
|
c0a4393d72
|
[ATTENTION] unitest (#4962)
|
2025-11-14 13:45:53 +08:00 |
|
carryyu
|
6c3d1da62f
|
fix conflicts
|
2025-11-13 20:30:29 +08:00 |
|
Sunny-bot1
|
05da8e34c0
|
[BugFix][Metax] Fix metax compile issue in get_block_shape_and_split_kv_block (#5000)
* fix metax compile
* fix
|
2025-11-13 00:55:06 -08:00 |
|
Sunny-bot1
|
5b24013d46
|
skip DtoH capture (#4988)
|
2025-11-13 10:57:44 +08:00 |
|
周周周
|
6e01be28e0
|
format code (#4720)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
|
2025-11-01 19:13:50 +08:00 |
|
Zhenghai Zhang
|
1712e1351b
|
【Hackathon 9th No.86】autogen MoeFastHardamardImplWrapper template_instantiation (#4592)
* autogen MoeFastHardamardImplWrapper template_instantiation
* fix codestyle
* fix codestyle
* add impl cu files
|
2025-10-30 10:28:36 +08:00 |
|
RAM
|
86d5006a57
|
[Graph Optimization][Speculative Decoding] Update yaml and fix typo (#4612)
|
2025-10-28 11:43:26 +08:00 |
|
xiaozude
|
f7069b8057
|
[Metax] adapt DeepSeek (#4498)
|
2025-10-24 10:14:53 +08:00 |
|
Sunny-bot1
|
8718fa34b2
|
support static C8 (#4568)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
|
2025-10-23 22:01:03 +08:00 |
|
RAM
|
9dc5c3e370
|
[Graph Optimization] Support CUDAGraph Padding + MTP (#4545)
* Support CUDAGraph Padding + MTP
* support orther write cache kernel
|
2025-10-23 20:57:26 +08:00 |
|
Haonan Luo
|
1b9f351d21
|
Support GPT-OSS-BF16 (#4240)
* [Feature] AppendAtten support sinks & HEAD_DIM=64
* fix bug
* fix bug
* fix bug
* fix bug
* [Feature] support gpt-oss
* fix bug
* add mask
* support-gpt-oss
* support-gpt-oss
* fix long seq
* support wint8
* support wint8
* support wint8
* update test
* change sliding windows init pos
---------
Co-authored-by: ming1753 <ideaminghp@163.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
|
2025-10-20 14:44:58 +08:00 |
|
周周周
|
817210e47f
|
[ATTN]delete code and add ffn and moe layer level test (#4440)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* delete code
* delete code
* delete code
* commit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
|
2025-10-19 16:23:11 +08:00 |
|
Sunny-bot1
|
930f7b781c
|
[Optimization] Put get_block_shape_and_split_kv_block in cuda graph for append attention backend (#4443)
* get block in cuda graph
* fix sot
|
2025-10-17 10:59:56 +08:00 |
|
Zhenghai Zhang
|
6adfbe07ad
|
【Hackathon 9th No.86】autogen MultiQueryDecoderAttention template_instantiation -part (#4383)
* split MultiQueryDecoderAttention template_instantiation
* update comment
* CI
|
2025-10-16 17:08:19 +08:00 |
|
Sunny-bot1
|
6d0cc0dd9c
|
[Optimization] Optimize split_q_block kernel (#4367)
|
2025-10-15 11:28:00 +08:00 |
|
Sunny-bot1
|
a751d977bc
|
[Optimization] Fuse get_max_len and get_kv_max_len (#4369)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* opt split_q_block
* fuse max_lens and max kv_len
|
2025-10-13 20:35:00 +08:00 |
|
AIbin
|
f7eaca3971
|
【Bug Fix】mla enables tensorcore by default (#4354)
* mla tensor-core kernel is enabled by default
|
2025-10-10 20:45:16 +08:00 |
|
lzy
|
3aa04fbf21
|
[MTP][Cfp8]supports spec dynamic cfp8 (#4290)
* supports spec dynamic cfp8
* supports spec dynamic cfp8
---------
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
|
2025-10-10 16:08:10 +08:00 |
|
Zhenghai Zhang
|
c46d5e48f8
|
【Hackathon 9th No.86】autogen MultiQueryAppendC8Attention template_instantiation -part (#4330)
* split MultiQueryAppendC8Attention template_instantiation
* update setup_ops.py
* fix ci
* fix bug
|
2025-10-10 15:07:48 +08:00 |
|
AIbin
|
c4ebaf8a07
|
【Inference Optimize】MLA Tensor-Core is enabled by default (#4335)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
|
2025-10-10 10:54:56 +08:00 |
|
RAM
|
aa27b03bc0
|
[Executor]CUDAGraph support Speculate Decode (#3769)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* success run ngram
* Revert "[Code Simplification] remove cum_offsets (#3410)"
This reverts commit 32b39620bc.
* success run ngram5 tp4 42bs
* success run ngram5 tp4 42bs
* mtp draft commit
* add decorator for target model
* enable draft model in cudagraph v0.5
* revert revrt cum_offset
* enable target model in cudagraph v0.9 And clean debug code
* Revert "success run ngram"
This reverts commit 8351e83993.
* add reverted code
* enable target model in cudagraph v0.9
* solve comment
* fix bid < 0
* Enable Target Model Padding And Draft Model in cudagraph
* solve problem
* delete rebuild padding debug note
* fast compile
* Add capture list for mtp
* success run 256 tp1 mtp
* Enable Lite TP2 Bsz256
* realy enable tp2 bsz 256
* fix problem
* Solve problem for Draft model in cudagraph
* Solve comment
* replace emptytensor as zeros
* Solve comments
* Revert "fast compile"
This reverts commit 834639a7ff.
* fix bug
* fix merge bug
* fix typo
* fix bug
---------
Co-authored-by: lizexu <2694294196@qq.com>
Co-authored-by: littledgg <1658565283@qq.com>
Co-authored-by: zeroRains <linjunlu@zerorains.top>
Co-authored-by: gongshaotian <gstain5555@outlook.com>
|
2025-10-09 21:18:29 +08:00 |
|
Zhenghai Zhang
|
aa045aa84f
|
fix typos (#4274)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
|
2025-09-27 09:25:43 +08:00 |
|
Yohanna
|
44010cee13
|
FIX] Fix CUDA error(700): 'cudaErrorIllegalAddress' in CascadeAppendWriteCacheKVQKV cache_kernel(). Continue when batch_id_per_token[token_idx] is default value -1. (#4218)
|
2025-09-24 14:08:49 +08:00 |
|
Yuan Xiaolan
|
de8638b1e9
|
fix dynamic Cfp8 computing error (#4119)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2025-09-16 20:21:49 +08:00 |
|
AIbin
|
a7392a0ff9
|
【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886)
* support MLA chunk_size auto search & cuda_graph
|
2025-09-11 10:46:09 +08:00 |
|
chen
|
637d96c6ae
|
[Feature] Support zai-org/GLM-4.5-Air BF16 model (#3928)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* support glm45_air
|
2025-09-10 19:36:10 +08:00 |
|
freeliuzc
|
7ee100903f
|
support rope_3d in spec mode (#4034)
|
2025-09-10 03:15:05 -07:00 |
|
lzy
|
f12159b630
|
del batch id per token (#3963)
* Update decoder_write_cache_with_rope_kernel.cu
del batch_id_per_token
* Update decoder_write_cache_with_rope_impl.cuh
* Update test_append_attention.py
* Update test_append_attention.py
|
2025-09-08 21:58:34 +08:00 |
|
co63oc
|
aadd6a94d8
|
fix typos (#3951)
|
2025-09-08 15:22:41 +08:00 |
|
Jundong Liu
|
3d0aaa5923
|
[Excutor] Experiment Feature-Support Prefill in cudagraph (#3459)
* Support prefill in Cudagraph
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5
* Solve problem about encoder_num_blocks_x_cpu
* Add early-exit mechanism for attention kernel
* fix test case about append-attention
* Update testcode, Add annotations to related tensors
* move get_input_length_list
* solve test_code
* Add annotations about early-exit for attention kernel
* Add annotations about early-exit for attention kernel2
* solve comment
* solve mtp
---------
Co-authored-by: RAM <gstian5555@outlook.com>
|
2025-09-08 13:12:24 +08:00 |
|
lzy
|
af49b81ffd
|
supports dynamic Cfp8 (#3767)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* supports dynamic Cfp8
* add unittest
|
2025-09-07 20:41:29 -07:00 |
|
周周周
|
f6f726c773
|
clean code in sttantion (#3917)
|
2025-09-05 20:49:01 +08:00 |
|
xiaoxiaohehe001
|
f265a26f8b
|
support mtp rope_3d (#3791)
* support mtp rope_3d
* Update speculate_write_cache_with_rope_kernel.cu
|
2025-09-04 17:18:05 +08:00 |
|
Yuan Xiaolan
|
fa58a9fa8f
|
qk norm for speculate decode C16 (#3637)
|
2025-09-03 14:53:56 +08:00 |
|
lzy
|
7a521bbf62
|
Modify mask_offset‘s format (#3525)
* modify mask_offset in decode
* modify mask_offset unittest
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2025-09-02 03:05:35 -07:00 |
|
lizhenyun01
|
bed09ae8f8
|
fix mask_offset in append_attn (#3745)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix mask_offset in append_attn
* fix test
|
2025-08-31 15:03:16 +08:00 |
|
xiaoxiaohehe001
|
ad319a87cc
|
support fa3 rope3d (#3622)
|
2025-08-27 11:31:29 +08:00 |
|