Commit Graph

3547 Commits

Author SHA1 Message Date
GoldPancake
47595a2480 [Feature] support mtp logprob (#4464)
* support mtp logprob

* fix unitest
2025-10-20 15:18:12 +08:00
Haonan Luo
1b9f351d21 Support GPT-OSS-BF16 (#4240)
* [Feature] AppendAtten support sinks & HEAD_DIM=64

* fix bug

* fix bug

* fix bug

* fix bug

* [Feature] support gpt-oss

* fix bug

* add mask

* support-gpt-oss

* support-gpt-oss

* fix long seq

* support wint8

* support wint8

* support wint8

* update test

* change sliding windows init pos

---------

Co-authored-by: ming1753 <ideaminghp@163.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
2025-10-20 14:44:58 +08:00
SuperNova
80a16c4c87 [fix] adjust mctlass moe api (#4474) 2025-10-20 14:23:54 +08:00
zhuzixuan
1e59905e34 Optimization of ‘tools’ in request fields (#4380)
* Remove multiple 'tools'

* Remove multiple 'tools'

* Remove multiple 'tools'

* Remove multiple 'tools'
2025-10-20 11:04:08 +08:00
RAM
528c55776e [Graph Optimization][Speculative Decoding] Fix the bug of CUDAGraph + MTP + EP (#4456)
* Fix MTP dummy run bug

* Target Model and Draft Model using the same flag

* In mtp replace use_cudagraph as step_use_cudagraph
2025-10-20 10:38:55 +08:00
YuBaoku
c4fc0073cf [CI] Handle unit test issues (#4483) 2025-10-20 10:13:21 +08:00
周周周
817210e47f [ATTN]delete code and add ffn and moe layer level test (#4440)
* delete code

* delete code

* delete code

* commit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit

* copmmit
2025-10-19 16:23:11 +08:00
kxz2002
b5b993e48e 【feature】support n parameter (#4273)
* support n parameter

* pre-commit check

* pre-commit check

* restore format_and_add_data

* update n_param

* bug fix index - str to int

* bug fix del child_task

* bug fix metrics

* add debug info

* add debug info2

* remove debug info

* change connecting symbol to '-'

* bugfix change connecting symbol

* bugfix change connecting symbol2

* unit tests fix

* unit test fix2

* unittest add param n=2

* n param add unit tests and adapt to echo

* pre-commit fix

* resolve review

* adjust stop reason

* add unittest for _create_chat_completion_choice

* modify unittest

* solve conflict

* solve conflict

* resolve conflict

---------

Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com>
Co-authored-by: gaoziyuan <m13689897706@163.com>
2025-10-17 20:51:59 +08:00
kxz2002
8ccfd975b5 LLM.chat add "tools" param (#4415)
* llm add tools param initial commit

* llm add tools param bugfix

* offline add tools add unittests

* fix preprocessor

* move tools parameter into tasks

* change variable name
2025-10-17 20:25:03 +08:00
yangjianfengo1
329d074326 [Docx] fix the broken link (#4479)
* Update docs

* Update docs
2025-10-17 18:28:50 +08:00
yinwei
a64c0408b9 [XPU]Fix w4a8 precision bug && rollback moe algo (#4463)
* fix w4a8 precision bug

* add env

* code style check
2025-10-17 18:27:53 +08:00
chen
63ef593450 check paddle version for v1 loader (#4473) 2025-10-17 17:25:03 +08:00
yzwu
4b661512ca [Iluvatar GPU] Adapt VL model (#4313) 2025-10-17 16:13:38 +08:00
yangjianfengo1
ba5c2b7e37 [Docx] add language (en/cn) switch links (#4470)
* add install docs

* Update docs

* Update docs
2025-10-17 15:47:41 +08:00
Ayakouji
a3e0a15495 fix seqlen sync (#4442) 2025-10-17 14:37:52 +08:00
xiaolei373
720697e265 add environment variables (#4466) 2025-10-17 14:20:01 +08:00
YuBaoku
01510876ab [CI] Fix partial instability issues (#4461) 2025-10-17 14:17:06 +08:00
ddchenhao66
14785eb65d [XPU] abstract a hardware-agnostic operator wrapper for prefix cache and specify xpu device id definition (#4455)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-17 14:05:33 +08:00
lizexu123
c234b995ab [Feature] support pooling model dummy_run (#4345)
* support qwen3-embedding

* fix ci bug

* support pooling dummy_run

* fix

* delete print

* parallel_config.max_model_len

* delete is_pooling_model in dummy_run

* fix

* fd_model

* fix embedding load

* fix

* fix post_process
2025-10-17 13:30:55 +08:00
Ryan
15b6b8dc25 [CINN] Remove the restriction of automatically falling back to SOT after enabling CINN (#4411)
* remove CINN limitation

* fix unitest

* fix codestyle
2025-10-17 12:51:07 +08:00
chen
b134e6afe6 [BugFix]Dev fix custom ar unstable result (#4437) 2025-10-17 11:47:16 +08:00
Ryan
6160145f82 [SOT] Change warnings to errors and remove fallback operations (#4378)
* Change warnings to errors and remove fallback operations

* fix unitest

* fix codestyle
2025-10-17 11:27:04 +08:00
chenjian
0413c32b8f [Optimize] Set preempted schedule log as info level (#4453) 2025-10-17 11:25:46 +08:00
Zero Rains
5885953211 [Others] add PR Template (#4452)
* add PR Template

* update

* update

* update

* update

* update

* update
2025-10-17 11:09:51 +08:00
Sunny-bot1
930f7b781c [Optimization] Put get_block_shape_and_split_kv_block in cuda graph for append attention backend (#4443)
* get block in cuda graph

* fix sot
2025-10-17 10:59:56 +08:00
Ryan
49cea8fb1c [SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp (#3694)
* rm inplace info && to(gpu)

* update append_attention

* unpin paddle version

* add full_cuda_graph=False

* add blank line

---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>
2025-10-17 10:57:55 +08:00
YuanRisheng
a37c9416ac [FDConfig]Remove reasoning_parser/guided_decoding_backend/disable_any_whitespace/device_ids in FDConfig (#4362)
* remove devices id

* fix unittest

* fix ce

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
2025-10-17 10:40:59 +08:00
xiaolei373
d1637db86a modify_comment (#4460) 2025-10-17 10:10:09 +08:00
chen
db82e9a022 [BugFix]Fix wfp8afp8 triton moe group_topk renormalized=True (#4449)
* fix group_topk renormalized=True

* check test
2025-10-16 23:17:48 +08:00
xiaolei373
dbca63f862 [bugfix] kill cache_transfer_manager process (#4401)
2025-10-16 20:45:24 +08:00
YuanRisheng
0355235fb9 [FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig (#4400)
* delete some attr in parallel config

* delete comment

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
2025-10-16 20:00:37 +08:00
Ryan
b87e2c6184 [CUDAGraph]Add support for custom all-reduce operators under SOT mode (#4386) 2025-10-16 19:31:19 +08:00
zhupengyang
26ff2f8683 [XPU] refine fused moe (#4219) 2025-10-16 19:04:07 +08:00
Jianyu Li
3bbe99eae7 [Intel HPU] Enable dist sampler on intel hpu platform (#4445) 2025-10-16 19:02:27 +08:00
LiqinruiG
4251ac5e95 【Fix】 remove text_after_process & raw_prediction (#4421)
* remove text_after_process &  raw_prediction

* remove text_after_process &  raw_prediction
2025-10-16 19:00:18 +08:00
Zhang Yulong
8f77adc381 Add data dictionary for API response processing (#4454)
Initialize data dictionary for response handling.
2025-10-16 17:23:11 +08:00
Zhenghai Zhang
6adfbe07ad 【Hackathon 9th No.86】autogen MultiQueryDecoderAttention template_instantiation -part (#4383)
* split MultiQueryDecoderAttention template_instantiation

* update comment

* CI
2025-10-16 17:08:19 +08:00
kevin
f72be7a2c8 [BUG] fix ep bug (#4275)
* fix ep bug

* update code

* update code

* update code

* [BugFix] fix config bugs (#4370)

* Update expert_service.py

* Update common_engine.py

* Update expert_service.py

* Update expert_service.py

* Update expert_service.py

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* update code

---------

Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-10-16 16:46:40 +08:00
SunLei
5abf59715d perf: optimize ZMQ communication with async queue and single-threaded… (#4444)
* perf: optimize ZMQ communication with async queue and single-threaded model

* perf: _async_output_busy_loop

* fix: async_output_queue init
2025-10-16 15:46:26 +08:00
Zhang Yulong
98f8c3703a Add filtering for failed requests in benchmark outputs (#4448)
Filter out requests with end_timestamp == 0.0
2025-10-16 14:57:47 +08:00
Zhang Yulong
9dc3968c13 [benchmark] Fix benchmark duration calculation logic (#4446)
* Fix benchmark duration calculation logic

Calculate benchmark duration using filtered outputs.

* Fix benchmark duration calculation using benchmark_outputs
2025-10-16 14:36:29 +08:00
Lucas
a5063b96c8 [XPU] moe support VL 0-dim input (#4408) 2025-10-16 14:01:01 +08:00
gaoziyuan
fd5dd1a0f1 [Bugfix]fix ep clear buffer perf (#4389)
* fix

* Update fused_moe_backend_base.py
2025-10-16 13:05:39 +08:00
chenjian
670aaa3f83 [Bug fix] Fix pd for x1 thinking (#4433) 2025-10-16 12:03:45 +08:00
ddchenhao66
8e392f0ea6 [XPU] support prefix cache (#4423)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-16 11:27:41 +08:00
ltd0924
5bde20b0c9 [BugFix] fix config bugs (#4370)
* Update expert_service.py

* Update common_engine.py

* Update expert_service.py

* Update expert_service.py

* Update expert_service.py

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-10-16 10:25:21 +08:00
Zhang Yulong
7f94f063ff Update benchmark_serving.py (#4438)
Discarded requests are still saved for result analysis
2025-10-15 20:36:19 +08:00
SunLei
b4b579a7ed Feature:Add support for Pooling Model Embedding and provide an OpenAI-compatible API. (#4344)
* feat: add OpenAIServing

* feat: add ZmqOpenAIServing & OpenAIServingEmbedding

* feat: Refine the basic ServingEngine class and introduce ServingContext

* fix: codestyle

* fix: request

* fix: pooling_params

* feat: _process_chat_template_kwargs

* feat: support batch request

* feat: pooling_params verify & default parameters

---------

Co-authored-by: sunlei1024 <sunlei1024@example.com>
2025-10-15 19:42:59 +08:00
freeliuzc
744287e1a9 fix param (#4419) 2025-10-15 18:44:24 +08:00
ltd0924
fbdb056de0 [BUGFIX] clear request #4286 (#4402)
Co-authored-by: ltd0924 <luotingdan@baidu.com>
2025-10-15 17:43:28 +08:00