RAM
|
aa27b03bc0
|
[Executor]CUDAGraph support Speculate Decode (#3769)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* success run ngram
* Revert "[Code Simplification] remove cum_offsets (#3410)"
This reverts commit 32b39620bc.
* success run ngram5 tp4 42bs
* success run ngram5 tp4 42bs
* mtp draft commit
* add decorator for target model
* enable draft model in cudagraph v0.5
* revert revrt cum_offset
* enable target model in cudagraph v0.9 And clean debug code
* Revert "success run ngram"
This reverts commit 8351e83993.
* add reverted code
* enable target model in cudagraph v0.9
* solve comment
* fix bid < 0
* Enable Target Model Padding And Draft Model in cudagraph
* solve problem
* delete rebuild padding debug note
* fast compile
* Add capture list for mtp
* success run 256 tp1 mtp
* Enable Lite TP2 Bsz256
* realy enable tp2 bsz 256
* fix problem
* Solve problem for Draft model in cudagraph
* Solve comment
* replace emptytensor as zeros
* Solve comments
* Revert "fast compile"
This reverts commit 834639a7ff.
* fix bug
* fix merge bug
* fix typo
* fix bug
---------
Co-authored-by: lizexu <2694294196@qq.com>
Co-authored-by: littledgg <1658565283@qq.com>
Co-authored-by: zeroRains <linjunlu@zerorains.top>
Co-authored-by: gongshaotian <gstain5555@outlook.com>
|
2025-10-09 21:18:29 +08:00 |
|
lzy
|
af49b81ffd
|
supports dynamic Cfp8 (#3767)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* supports dynamic Cfp8
* add unittest
|
2025-09-07 20:41:29 -07:00 |
|
lzy
|
7a521bbf62
|
Modify mask_offset‘s format (#3525)
* modify mask_offset in decode
* modify mask_offset unittest
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2025-09-02 03:05:35 -07:00 |
|
freeliuzc
|
76759108c9
|
[Feature][SpeculativeDecoding]Support tree-attention (#3514)
* support tree-attention
* fix merge bug
* fix unit-test api
* fix merge bug
|
2025-08-22 13:36:41 +08:00 |
|
lzy
|
1e06b9fa6d
|
make append_attn supports mask_offset (#3138)
* make append_attn supports mask_offset
* add unittest
|
2025-08-14 03:40:55 -07:00 |
|
周周周
|
ddb10ac509
|
[Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)
* remove padding_offsets from atten
|
2025-07-17 18:41:31 +08:00 |
|
周周周
|
aa76085d1f
|
[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
Deploy GitHub Pages / deploy (push) Has been cancelled
[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
|
2025-07-16 20:10:57 +08:00 |
|
jiangjiajun
|
684703fd72
|
[LLM] First commit the llm deployment code
|
2025-06-09 19:20:15 +08:00 |
|