YuanRisheng
|
a2ec2c4152
|
[FDConfig]Remove max_model_len in FDConfig (#4350)
* modify max_model_len
* fix unittest
* fix unittest
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
|
2025-10-11 14:04:17 +08:00 |
|
RAM
|
aa27b03bc0
|
[Executor]CUDAGraph support Speculate Decode (#3769)
* success run ngram
* Revert "[Code Simplification] remove cum_offsets (#3410)"
This reverts commit 32b39620bc.
* success run ngram5 tp4 42bs
* mtp draft commit
* add decorator for target model
* enable draft model in cudagraph v0.5
* revert the cum_offsets revert
* enable target model in cudagraph v0.9 And clean debug code
* Revert "success run ngram"
This reverts commit 8351e83993.
* add reverted code
* enable target model in cudagraph v0.9
* solve comment
* fix bid < 0
* Enable Target Model Padding And Draft Model in cudagraph
* solve problem
* delete rebuild padding debug note
* fast compile
* Add capture list for mtp
* success run 256 tp1 mtp
* Enable Lite TP2 Bsz256
* really enable tp2 bsz 256
* fix problem
* Solve problem for Draft model in cudagraph
* Solve comment
* replace empty tensor with zeros
* Solve comments
* Revert "fast compile"
This reverts commit 834639a7ff.
* fix bug
* fix merge bug
* fix typo
* fix bug
---------
Co-authored-by: lizexu <2694294196@qq.com>
Co-authored-by: littledgg <1658565283@qq.com>
Co-authored-by: zeroRains <linjunlu@zerorains.top>
Co-authored-by: gongshaotian <gstain5555@outlook.com>
|
2025-10-09 21:18:29 +08:00 |
|
YuanRisheng
|
2e9e53ff7e
|
[FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config (#4116)
* remove max_num_batched_tokens in parallel config
* remove max_num_seqs
* update test case
* fix test
* fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
|
2025-09-17 10:43:35 +08:00 |
|
co63oc
|
d6369b4d51
|
fix typos (#3684)
|
2025-09-01 17:50:17 +08:00 |
|
lzy
|
48d760539b
|
fix deepcopy(tp_group) in spec (#3648)
|
2025-08-29 16:08:21 +08:00 |
|
freeliuzc
|
52eda7fdb3
|
[Feature][MTP]support new speculative decoding method named hybrid mtp with ngram (#3610)
|
2025-08-26 14:29:22 +08:00 |
|
YuanRisheng
|
6ccc10ad47
|
Unify server-side and model-side Config (Part1) (#3018)
* move cache config
* fix mtp
|
2025-07-28 10:51:52 +08:00 |
|
freeliuzc
|
667547be59
|
support chunk_prefill in MTP (#2705)
|
2025-07-04 11:55:48 +08:00 |
|
Jiang-Jia-Jun
|
92c2cfa2e7
|
Sync v2.0 version of code to github repo
|
2025-06-29 23:29:37 +00:00 |
|