yzwu
dc7facaa7f
[Iluvatar GPU] fix ci error caused by rebuild_padding param and cuda graph ( #4504 )
2025-10-21 21:41:41 +08:00
RAM
d70aacfbdc
[FDConfig] Turn on the CUDAGraph + MultiModel switch ( #4512 )
2025-10-21 06:21:26 -07:00
SunLei
809c1ac7ec
feat: add post-processing step for pool_output ( #4462 )
...
* feat: add post-processing step for pool_output
* bugfix
* fix: test_serving_embedding
* fix test_request_to_batch_dicts
* fix: code style
2025-10-21 20:24:26 +08:00
RAM
7cbe6b2472
[FDConfig] Turn on the CUDAGraph + Speculative Decoding switch ( #4511 )
2025-10-21 03:34:16 -07:00
ltd0924
fb76cdfb4f
[Feature] Support mm model close prefix cache ( #4459 )
...
* [Feature] support prefix cache in DP
* fix
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
* [BugFix] fix workers more than 1
* fix
* Update api_server.py
* fix
* Update api_server.py
* fix
* [Feature] Support mm model close prefix cache
* Update api_server.py
* Update engine_client.py
* Update engine_client.py
* add test
* Update test_chat.py
* fix
* fix
* Update test_chat.py
* Update test_chat.py
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-21 15:37:59 +08:00
SunLei
ee915220bd
[Speculative Decoding] Add draft_logprobs Support for Speculative Decode MTP ( #4467 )
...
* feat: add draft_logprobs for Speculative Decode MTP
* feat: add draft_logprobs for Speculative Decode MTP
* feat: add draft_logprobs for Speculative Decode MTP
* fix: postprocess for speculative decode
* test: test_speculative_decoding_use_logprobs
* fix: test_completion_echo
* fix test_max_streaming_tokens
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-21 14:57:50 +08:00
RAM
775edcc09a
[Executor] Default use CUDAGraph ( #3594 )
...
* add start intercept
* Adjustment GraphOptConfig
* pre-commit
* default use cudagraph
* set default value
* default use cuda graph
* pre-commit
* fix test case bug
* disable rl
* fix moba attention
* only support gpu
* Temporarily disable PD Disaggregation
* set max_num_seqs of test case as 1
* set max_num_seqs and temperature
* fix max_num_batched_tokens bug
* close cuda graph
* success run wint2
* profile run with max_num_batched_tokens
* 1.add c++ memchecker 2.success run wint2
* update a800 yaml
* update docs
* 1. delete check 2. fix plas attn test case
* default use use_unique_memory_pool
* add try-except for warmup
* ban mtp, mm, rl
* fix test case mock
* fix ci bug
* fix form_model_get_output_topp0 bug
* fix ci bug
* refine deepseek ci
* refine code
* Disable PD
* fix sot yaml
2025-10-21 14:25:45 +08:00
gaoziyuan
d85ef5352a
【BugFix】fix ep buffer clear ( #4450 )
...
* fix
* fix
2025-10-21 10:56:00 +08:00
ltd0924
a498736af5
[APIServer] support define gunicorn timeout ( #4496 )
...
* [BUGFIX] clear request #4286
* [BugFix] support define gunicorn timeout
* Update utils.py
* Update utils.py
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-20 23:36:07 +08:00
Yuanle Liu
cef3164c3b
Optimizing the performance of think length limit using custom operators ( #4279 )
...
* delete impl
* delete min_length&max_length
* support limit thinking content strategy
* fix
* fix
* fix
* update
* fix set_value_by_flags_and_idx
* fix
* fix
* fix
* fix
* update
* fix
* fix
* fix typo
* fix ci
* fix
* fix
* support mtp
* fix
* fix
* update
* update
2025-10-20 21:09:13 +08:00
yinwei
bf03b6fcea
fix vl bug ( #4485 )
2025-10-20 20:13:34 +08:00
yyssys
97ee3c403a
[XPU]Fix w4a8 garbled code issue ( #4493 )
2025-10-20 19:41:11 +08:00
李泳桦
b8d235445e
[fix] remove cache tensor creation for cache_transfer_manager ( #4420 )
...
* [fix] remove cache tensor creation for cache_transfer_manager
* [fix] fix code style
* [fix] fix code style
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-20 16:19:56 +08:00
bukejiyu
de2eaf4f81
add qwen-2.5-7B-PRM/ernie-rm ( #4319 )
2025-10-20 15:31:03 +08:00
GoldPancake
47595a2480
[Feature] support mtp logprob ( #4464 )
...
* support mtp logprob
* fix unitest
2025-10-20 15:18:12 +08:00
Haonan Luo
1b9f351d21
Support GPT-OSS-BF16 ( #4240 )
...
* [Feature] AppendAtten support sinks & HEAD_DIM=64
* fix bug
* fix bug
* fix bug
* fix bug
* [Feature] support gpt-oss
* fix bug
* add mask
* support-gpt-oss
* support-gpt-oss
* fix long seq
* support wint8
* support wint8
* support wint8
* update test
* change sliding windows init pos
---------
Co-authored-by: ming1753 <ideaminghp@163.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
2025-10-20 14:44:58 +08:00
SuperNova
80a16c4c87
[fix] adjust mctlass moe api ( #4474 )
2025-10-20 14:23:54 +08:00
zhuzixuan
1e59905e34
Optimization of ‘tools’ in request fields ( #4380 )
...
* Remove multiple 'tools'
* Remove multiple 'tools'
* Remove multiple 'tools'
* Remove multiple 'tools'
2025-10-20 11:04:08 +08:00
RAM
528c55776e
[Graph Optimization][Speculative Decoding] Fix the bug of CUDAGraph + MTP + EP ( #4456 )
...
* Fix MTP dummy run bug
* Target Model and Draft Model using the same flag
* In mtp replace use_cudagraph as step_use_cudagraph
2025-10-20 10:38:55 +08:00
kxz2002
b5b993e48e
【feature】support n parameter ( #4273 )
...
* support n parameter
* pre-commit check
* pre-commit check
* restore format_and_add_data
* update n_param
* bug fix index - str to int
* bug fix del child_task
* bug fix metrics
* add debug info
* add debug info2
* remove debug info
* change connecting symbol to '-'
* bugfix change connecting symbol
* bugfix change connecting symbol2
* unit tests fix
* unit test fix2
* unittest add param n=2
* n param add unit tests and adapt to echo
* pre-commit fix
* resolve review
* adjust stop reason
* add unittest for _create_chat_completion_choice
* modify unittest
* solve conflict
* solve conflict
* resolve conflict
---------
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com >
Co-authored-by: gaoziyuan <m13689897706@163.com >
2025-10-17 20:51:59 +08:00
kxz2002
8ccfd975b5
LLM.chat add "tools" param ( #4415 )
...
* llm add tools param initial commit
* llm add tools param bugfix
* offline add tools add unittests
* fix preprocessor
* move tools parameter into tasks
* change variable name
2025-10-17 20:25:03 +08:00
yinwei
a64c0408b9
[XPU]Fix w4a8 precision bug && rollback moe algo ( #4463 )
...
* fix w4a8 precision bug
* add env
* code style check
2025-10-17 18:27:53 +08:00
chen
63ef593450
check paddle version for v1 loader ( #4473 )
2025-10-17 17:25:03 +08:00
yzwu
4b661512ca
[Iluvatar GPU] Adapt VL model ( #4313 )
2025-10-17 16:13:38 +08:00
Ayakouji
a3e0a15495
fix seqlen sync ( #4442 )
2025-10-17 14:37:52 +08:00
ddchenhao66
14785eb65d
[XPU] abstract a hardware-agnostic operator wrapper for prefix cache and specify xpu device id definition ( #4455 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-17 14:05:33 +08:00
lizexu123
c234b995ab
[Feature] support pooling model dummy_run ( #4345 )
...
* support qwen3-embedding
* fix ci bug
* support pooling dummy_run
* fix
* delete print
* parallel_config.max_model_len
* delete is_pooling_model in dummy_run
* fix
* fd_model
* fix embedding load
* fix
* fix post_process
2025-10-17 13:30:55 +08:00
Ryan
15b6b8dc25
[CINN] Remove the restriction of automatically falling back to SOT after enabling CINN ( #4411 )
...
* remove CINN limitation
* fix unitest
* fix codestyle
2025-10-17 12:51:07 +08:00
chen
b134e6afe6
[BugFix]Dev fix custom ar unstable result ( #4437 )
2025-10-17 11:47:16 +08:00
Ryan
6160145f82
[SOT] Change warnings to errors and remove fallback operations ( #4378 )
...
* Change warnings to errors and remove fallback operations
* fix unitest
* fix codestyle
2025-10-17 11:27:04 +08:00
chenjian
0413c32b8f
[Optimize] Set preempted schedule log as info level ( #4453 )
2025-10-17 11:25:46 +08:00
Sunny-bot1
930f7b781c
[Optimization] Put get_block_shape_and_split_kv_block in cuda graph for append attention backend ( #4443 )
...
* get block in cuda graph
* fix sot
2025-10-17 10:59:56 +08:00
Ryan
49cea8fb1c
[SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp ( #3694 )
...
* rm inplace info && to(gpu)
* update append_attention
* unpin paddle version
* add full_cuda_graph=False
* add blank line
---------
Co-authored-by: SigureMo <sigure.qaq@gmail.com >
2025-10-17 10:57:55 +08:00
YuanRisheng
a37c9416ac
[FDConfig]Remove reasoning_parser/guided_decoding_backend/disable_any_whitespace/device_ids in FDConfig ( #4362 )
...
* remove devices id
* fix unittest
* fix ce
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com >
2025-10-17 10:40:59 +08:00
xiaolei373
d1637db86a
modify_comment ( #4460 )
2025-10-17 10:10:09 +08:00
chen
db82e9a022
[BugFix]Fix wfp8afp8 triton moe group_topk renormalized=True ( #4449 )
...
* fix group_topk renormalized=True
* check test
2025-10-16 23:17:48 +08:00
xiaolei373
dbca63f862
[bugfix] kill cache_transfer_manager process ( #4401 )
2025-10-16 20:45:24 +08:00
YuanRisheng
0355235fb9
[FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig ( #4400 )
...
* delete some attr in parallel config
* delete comment
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com >
2025-10-16 20:00:37 +08:00
Ryan
b87e2c6184
[CUDAGraph]Add support for custom all-reduce operators under SOT mode ( #4386 )
2025-10-16 19:31:19 +08:00
zhupengyang
26ff2f8683
[XPU] refine fused moe ( #4219 )
2025-10-16 19:04:07 +08:00
Jianyu Li
3bbe99eae7
[Intel HPU] Enable dist sampler on intel hpu platform ( #4445 )
2025-10-16 19:02:27 +08:00
LiqinruiG
4251ac5e95
【Fix】 remove text_after_process & raw_prediction ( #4421 )
...
* remove text_after_process & raw_prediction
* remove text_after_process & raw_prediction
2025-10-16 19:00:18 +08:00
kevin
f72be7a2c8
[BUG] fix ep bug ( #4275 )
...
* fix ep bug
* update code
* update code
* update code
* [BugFix] fix config bugs (#4370 )
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* Update expert_service.py
* Update expert_service.py
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* update code
---------
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-16 16:46:40 +08:00
SunLei
5abf59715d
perf: optimize ZMQ communication with async queue and single-threaded… ( #4444 )
...
* perf: optimize ZMQ communication with async queue and single-threaded model
* perf: _async_output_busy_loop
* fix: async_output_queue init
2025-10-16 15:46:26 +08:00
Lucas
a5063b96c8
[XPU] moe support VL 0-dim input ( #4408 )
2025-10-16 14:01:01 +08:00
gaoziyuan
fd5dd1a0f1
[Bugfix]fix ep clear buffer perf ( #4389 )
...
* fix
* Update fused_moe_backend_base.py
2025-10-16 13:05:39 +08:00
chenjian
670aaa3f83
[Bug fix] Fix pd for x1 thinking ( #4433 )
2025-10-16 12:03:45 +08:00
ddchenhao66
8e392f0ea6
[XPU] support prefix cache ( #4423 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-16 11:27:41 +08:00
ltd0924
5bde20b0c9
[BugFix] fix config bugs ( #4370 )
...
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* Update expert_service.py
* Update expert_service.py
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-16 10:25:21 +08:00
SunLei
b4b579a7ed
Feature:Add support for Pooling Model Embedding and provide an OpenAI-compatible API. ( #4344 )
...
* feat: add OpenAIServing
* feat: add ZmqOpenAIServing & OpenAIServingEmbedding
* feat: Refine the basic ServingEngine class and introduce ServingContext
* fix: codestyle
* fix: request
* fix: pooling_params
* feat: _process_chat_template_kwargs
* feat: support batch request
* feat: pooling_params verify & default parameters
---------
Co-authored-by: sunlei1024 <sunlei1024@example.com >
2025-10-15 19:42:59 +08:00