YuBaoku
f1ea3830aa
[CI] remove ernie-4_5-vl test_consistency_between_runs ( #4846 )
...
* [CI] update paddlepaddle-gpu==3.2.0 in release/2.2
* [CI] debug paddleformers==0.3.0 in release/2.2
* [CI] update paddlepaddle==3.2.0 in release/2.2
* [CI] remove ernie-4_5-vl test_consistency_between_runs
2025-11-06 14:19:04 +08:00
chen
f660188a85
[cp][BugFix]2.2_fix_custom_ar_unstable_result ( #4436 )
...
* [BugFix]Dev fix custom ar unstable result (#4437 )
* code check
2025-10-17 16:04:54 +08:00
luukunn
aebe12a58d
[fix]update apply_chat_template ( #4249 )
...
* [fix]Modify follow-up push parameters and Modify the verification method for thinking length (#4086 )
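* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the thinking length verification method
* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the thinking length verification method
* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the thinking length verification method
* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the thinking length verification method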
* add completion_token_ids
* add logger
* fix reasoning_max_tokens ParameterError
* add unittest
* add unittest
* add unittest
* add unittest
* add unittest
* add unit test
* fix
* [fix]update apply_chat_template (#4137 )
* update apply_chat_template
* fix unittest
* fix unittest
* fix
* fix
* fix unit test
* fix
* fix unit test
* add unit test
2025-09-25 16:41:56 +08:00
chen
f38b174a75
Fix noaux_tc cuda Error 700 in CUDAGraph and Add wfp8apf8 moe quant method ( #4115 )
...
* improve per_token_quant_fp8 performance
* support moe wfp8apf8
* check glm test
* fix noaux_tc op in cudagraph, support noaux_tc return the correct
* check
* check inf and overwrite score in noaux_tc
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-22 21:27:37 +08:00
luukunn
6b47773bd6
[fix]Modify follow-up push parameters and Modify the verification method for thinking length ( #4177 )
...
* [fix]Modify follow-up push parameters and Modify the verification method for thinking length (#4086 )
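* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the thinking length verification method
* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the thinking length verification method
* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the thinking length verification method
* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the thinking length verification method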
* add completion_token_ids
* add logger
* fix reasoning_max_tokens ParameterError
* add unittest
* add unittest
* add unittest
* add unittest
* add unittest
* add unit test
* fix
2025-09-22 21:12:05 +08:00
Sunny-bot1
4f460db556
[CP2.2] Machete support group scale & wint8 & v1 loader ( #4166 )
...
* support v1 loader for machete (#3999 )
* [Optimize] Support WINT8 and group scale for Machete (#3905 )
* [Optimize] Machete using group scale default (#4121 )
2025-09-19 11:13:12 +08:00
K11OntheBoat
7f9a9b37f3
Support limit thinking lengths ( #4070 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-09-17 12:40:08 +08:00
chen
fbb4e0f8d1
[CP]Glm45 air 2.2 ( #4073 )
...
* [Feature] Support zai-org/GLM-4.5-Air BF16 model (#3928 )
* support glm45_air
* [Feature] GLM-45-AIR Support Mix Quantization(Dense wfp8afp8 and wint8 triton_moe_backend) (#4051 )
* check
* fix v1 load for mix and wint8
* check --quantizations 'None'
* check
* support RL rollout
* check v1 loader
* check glm rollout_model, change wfp8afp8 per_token_cast_to_fp8 to native impl
* check rollout moe gate begin layer_id
* check rollout e_score_correction_bias
* delete infer_to_train_mapping={}
* code check
2025-09-15 18:52:58 +08:00
chenjian
4f8ff478b3
[Feature] Support mixed deployment with yiyan adapter in release22 ( #3974 )
...
* [Feature] Support mixed deployment with yiyan adapter in release2.2
* [Feature] Support mixed deployment with yiyan adapter in release2.2
* fix metrics
* add unit test
* add unit test
* add unit test
* add unit test
* add unit test
* add unit test
2025-09-10 16:01:13 +08:00
yangjianfengo1
dfc94371ee
【FIX】Change the name of sparse attn from moba to plas ( #4006 )
...
* update docs
* 【docs】 update readme (#4000 )
* update docs
* update readme
* update docs
* 【FIX】Change the name of sparse attn from moba to plas (#3845 )
* update docs
* update docs
* update docs
* update docs
* rename moba to plas
* code style
* update ci
* code style
* update ci
* code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-10 10:04:29 +08:00
zhuzixuan
d43c2f2577
[Optimize]Error messages about Model api. ( #3839 ) ( #3972 )
...
* add v1/models interface related
* add model parameters
* default model verification
* unit test
* check model err_msg
* unit test
* type annotation
* model parameter in response
* modify document description
* modify document description
* unit test
* verification
* verification update
* model_name
* pre-commit
* update test case
* update test case
* Update tests/entrypoints/openai/test_serving_models.py
* Update tests/entrypoints/openai/test_serving_models.py
* Update tests/entrypoints/openai/test_serving_models.py
* Update tests/entrypoints/openai/test_serving_models.py
* Update fastdeploy/entrypoints/openai/serving_models.py
* Optimize error messages.
---------
Co-authored-by: yangzichao01 <yangzichao01@baidu.com>
Co-authored-by: Yzc216 <101054010+Yzc216@users.noreply.github.com>
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-09 10:58:11 +08:00
Zhang Yulong
8903f937f9
update ci ( #3953 )
2025-09-08 14:21:25 +08:00
bukejiyu
051e4a881c
ignore ( #3949 )
2025-09-07 23:57:48 +08:00
chenjian
41cd3e24c9
[Feature] Enable prefix caching as default ( #3816 )
...
* [Feature] Enable prefix caching as default
* [Feature] Enable prefix caching as default
* Set prefix caching as default
* skip dynamic load
* fix kill bug
* fix kill bug
* fix kill bug
* fix ci
* fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-06 09:51:34 +08:00
Zhang Yulong
11b18e5ef0
add cache queue port ( #3904 ) ( #3926 )
...
* add cache queue port
* add cache queue port
* add cache queue port
2025-09-06 00:00:12 +08:00
chenjian
fb1e0d6a87
[Feature] Set scheduler v1 as default ( #3812 )
...
* [Feature] Set scheduler v1 as default
* [Feature] Set scheduler v1 as default
* [Feature] Set scheduler v1 as default
* [Feature] Set scheduler v1 as default
* [Feature] Set scheduler v1 as default
* [Feature] Set scheduler v1 as default
2025-09-04 11:02:10 +08:00
SunLei
8c0e7d6fe9
Support for async processor added. ( #3870 )
...
* Support for async processor added.
* remove yappi code
2025-09-04 10:35:08 +08:00
yangjianfengo1
b56b015d85
fix port ( #3865 )
...
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-04 10:02:08 +08:00
lizhenyun01
bed09ae8f8
fix mask_offset in append_attn ( #3745 )
...
* fix mask_offset in append_attn
* fix test
2025-08-31 15:03:16 +08:00
SunLei
b9af95cf1c
[Feature] Add AsyncTokenizerClient&ChatResponseProcessor with remote encode&decode support. ( #3674 )
...
* [Feature] add AsyncTokenizerClient
* add decode_image
* Add response_processors with remote decode support.
* [Feature] add tokenizer_base_url startup argument
* Revert comment removal and restore original content.
* [Feature] Non-streaming requests now support remote image decoding.
* Fix parameter type issue in decode_image call.
* Keep completion_token_ids when return_token_ids = False.
* add copyright
2025-08-30 17:06:26 +08:00
lizexu123
b21e085f3e
[Code Simplification] delete print ( #3729 )
2025-08-30 16:19:07 +08:00
lizexu123
455205f991
[Features] support hugging face qwen3 moe ( #3649 )
...
* split ut
* qwen3-30B-A3B
* fix
* add test
* add test_torch_model.py
* fix test_torch_model.py
* delete print
* fix moe
* delete init.py
* fix
* fix
---------
Co-authored-by: bukejiyu <395822456@qq.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
2025-08-30 15:26:05 +08:00
YUNSHEN XIE
a18afcfdd9
Optimize coverage jobs ( #3683 )
2025-08-30 00:12:40 +08:00
yangjianfengo1
3754a9906d
[Feature] block sparse attention ( #3668 )
...
* support sparse attn
* fix bug
* code style
* fix moba attn get kv shape
* fix A100 compilation
* codestyle
* code style
* code style
* code style
* fix conflict
* add unit tests
* code style
* increase eblite loading time
* fix bug
* for ci
* for ci
* for ci
* for ci
* support mlp block size 128
* add unit tests for small operators
* fix mlp unit test
* add the environment variables to config
* fix rollout config
* fix GPU memory
* add test server
* add test server
* fix mlp: use full attn for the last layer
2025-08-29 19:46:30 +08:00
Yuan Xiaolan
c71ee0831c
add w4afp8 offline script ( #3636 )
2025-08-29 17:56:05 +08:00
Ryan
45f81b34f0
add dtype int32 ( #3692 )
2025-08-29 14:56:35 +08:00
李泳桦
88297240e7
[feat] completion api supports passing input token ids in either prompt or prompt_token_ids ( #3311 )
...
* [feat] completion api supports passing input token ids in either `prompt` or `prompt_token_ids`
* [fix] update comment
* [fix] fix type error
* [test] add a unittest file for serving api test
* [test] try to fix ci error
* [chore] rename test function names
* [test] try to fix ci error
* [test] try to fix ci error
* [test] add tests for qwen
2025-08-29 14:19:42 +08:00
周周周
17b414c2df
MoE Default use triton's blockwise fp8 in TP Case ( #3678 )
2025-08-29 11:07:30 +08:00
Echo-Nie
43d5bd62b4
【Hackathon 9th No.70】supplementary unit test for CPUPlatform and CUDAPlatform ( #3580 )
...
* supplementary unit tests for the CUDAPlatform and CPUPlatform modules
* update the "is_cuda" to "is_cuda_and_available"
* fix pre-commit
---------
Co-authored-by: Tao Luo <luotao02@baidu.com>
2025-08-29 10:34:05 +08:00
Yuanle Liu
4957908275
add input_processor plugin ( #3657 )
...
* add input_processor plugin
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
2025-08-28 22:53:57 +08:00
Divano
17731a8acd
add concurrency cases ( #3689 )
2025-08-28 18:30:19 +08:00
Liumengyuan
e93d4cfcdd
Add with_output version AppendAttention ( #3302 )
...
* get use_output from fd_config
* add clear TODO description
* add mask_offset para to align with develop
* fix bug
* fix use_output logic
* fix sot bug
2025-08-28 17:10:18 +08:00
bukejiyu
73cf6096da
fix ( #3676 )
...
* fix
* update
2025-08-28 17:06:32 +08:00
co63oc
d4fc893fe3
fix typos ( #3633 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-28 14:42:24 +08:00
Echo-Nie
7afcd4b776
【Hackathon 9th No.77】supplementary unit test for get_filtered_metrics ( #3578 )
...
* supplementary unit tests for the fastdeploy/metrics/metrics/get_filtered_metrics module
* fix pre-commit
---------
Co-authored-by: Tao Luo <luotao02@baidu.com>
2025-08-28 10:39:02 +08:00
ltd0924
3d92fb09f7
[BugFix] fix parameter is 0 ( #3592 )
...
* Update engine_client.py
* fix
* Update common_engine.py
2025-08-28 09:52:36 +08:00
Sunny-bot1
479c8b85d3
[Optimize]support machete weight only gemm ( #3561 )
...
* support machete weight only gemm
* add generate
* update
* fix
* change file location
* add sm_version limit
* fix
* fix
* fix ci
* fix coverage
* fix xpu
2025-08-28 09:49:58 +08:00
Zero Rains
e37e86b3b8
[V1 Loader]support param create and load for wint2 and xpu backend ( #3581 )
...
* support wint2 backend
* [V1 Loader]support param create and load for wint2 and xpu backend
* update weight shape name
* update
* update
* update baseline.txt
* update model name
* update baseline.txt
* fix codestyle
* remove debug code
2025-08-28 09:49:36 +08:00
Jiang-Jia-Jun
c694fa2879
Revert "[Feature] block sparse attention ( #3209 )" ( #3647 )
...
This reverts commit 646a0c2fd8 .
2025-08-27 17:35:04 +08:00
lzy
1265f6c192
deepgemm don't support tp+ep (for ci) ( #3638 )
...
* deepgemm don't support tp+ep (for ci)
* deepgemm don't support tp+ep (for ci)
2025-08-27 16:39:19 +08:00
xjkmfa
afb9f327ef
【CI case】for echo finish_reason text_after_process and raw_prediction check ( #3630 )
...
* Add ci case for min token and max token
* 【CI case】include total_tokens in the last packet of completion interface stream output
* echo&finish_reason&text_after_process&raw_prediction check
* echo&finish_reason&text_after_process&raw_prediction check
* echo&finish_reason&text_after_process&raw_prediction check
* echo&finish_reason&text_after_process&raw_prediction check
* echo&finish_reason&text_after_process&raw_prediction check
---------
Co-authored-by: xujing43 <xujing43@baidu.com>
2025-08-27 15:21:16 +08:00
YUNSHEN XIE
85afa72763
fix publish task ( #3635 )
...
* fix publish task
* disable ut
2025-08-27 11:14:53 +08:00
yangjianfengo1
646a0c2fd8
[Feature] block sparse attention ( #3209 )
...
* support sparse attn
* fix bug
* code style
* fix moba attn get kv shape
* fix A100 compilation
* codestyle
* code style
* code style
* code style
* fix conflict
* add unit tests
* code style
* increase eblite loading time
* fix bug
* for ci
* for ci
* for ci
* for ci
* support mlp block size 128
* add unit tests for small operators
* fix mlp unit test
* add the environment variables to config
* fix rollout config
2025-08-26 07:16:04 -07:00
RAM
f0a362af18
[CUDAGraph]Switch the scope so that output buffer of CUDAGraph can automatically release ( #3612 )
...
* fix typo
* fix typo
* add print dot files
* fix bug
* Switch the scope so that output buffer of cudagraph can automatically release
* Revert "add print dot files"
This reverts commit dc21809eb5 .
2025-08-26 21:28:19 +08:00
Yuanle Liu
cbce94a00e
rename ernie_xxx to ernie4_5_xxx ( #3621 )
...
* rename ernie_xxx to ernie4_5_xxx
* ci fix
2025-08-26 19:29:27 +08:00
YuanRisheng
642480f5f6
[CI] Standard unittest ( #3606 )
...
* standard unittest
* fix bugs
* fix script
2025-08-26 19:03:11 +08:00
bukejiyu
3200a80de3
[v1 loader]support fp8 ( #3593 )
...
* support fp8
* update ci
2025-08-26 02:42:46 -07:00
freeliuzc
52eda7fdb3
[Feature][MTP]support new speculative decoding method named hybrid mtp with ngram ( #3610 )
2025-08-26 14:29:22 +08:00
Ryan
a5b4866ff1
[CudaGraph][SOT] Add unit tests for splitting the static graph into piecewise graphs that support cuda_graph ( #3590 )
...
* add unittest
* change sot_warmup_sizes
* wtf; add missed commit
2025-08-26 11:25:04 +08:00
Sunny-bot1
c68c3c4b8b
[Feature] bad words support v1 scheduler and specify token ids ( #3608 )
...
* support bad_words_token_ids
* docs
* fix test
* fix
* bad words support kvcache v1 and token ids
* fix
2025-08-25 20:14:51 -07:00