Commit Graph

4071 Commits

Author SHA1 Message Date
Yonghua Li
35846909c7 [fix] fix scheduler hang when input length is very close to max_model_len (#5393)
2025-12-05 18:23:42 +08:00
Ayakouji
a8f8791668 [Optimization] Qwen2.5-VL support multi-batch prefill (#5269)
* update

* fix

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix dict access

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-12-05 18:22:39 +08:00
Lucas
8f2b85362d [XPU] support moe_expert_ffn TGEMM selection (#5375) 2025-12-05 17:49:40 +08:00
Lucas
3aed8d257d [XPU] redirect xvllm/xtdk/xhpc downloading log (#5388) 2025-12-05 17:34:17 +08:00
周周周
c83dc58105 [Feature] support Two batch overlap, mainly used in Prefill (#5078) 2025-12-05 14:58:50 +08:00
qwes5s5
1aefbef0b3 fix trace log (#5386) 2025-12-05 14:45:52 +08:00
lizhenyun01
d436640735 [BugFix] Fix flash_attn_backend 2025-12-05 14:33:38 +08:00
cmcamdy
86b6430582 fix split_rope_cache_kv_encoder in mix mtp (#5384) 2025-12-05 14:33:17 +08:00
Jiaxin Sui
b5a7abe624 [XPU] [CI] Change Paddle Version to Nightly (#5346)
* Enhance run_ci_xpu.sh with caching and prefill options

* Update model path and configuration in run_ci_xpu.sh

* Add '北朝' keyword to assertion in run_45vl.py

* Enhance process termination logic in run_ci_xpu.sh

* Set timeout for CI_XPU job to 60 minutes

* Remove extra newline in stop_processes function

* Update paddlepaddle-xpu installation command

Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.

* Update PaddlePaddle installation command
2025-12-05 13:01:29 +08:00
fmiao2372
ebe613ccc8 [Intel HPU] fix bug about RP 5138 (#5380) 2025-12-05 11:33:29 +08:00
Lucas
7b0b6e470a [XPU] support XDNN downloading function (#5365) 2025-12-05 11:16:45 +08:00
ming1753
dd2e9a14c7 [BugFix] Compatible with asynchronous functions (#5378)
* [BugFix] fix data_processor asyn bug

* fix bug
2025-12-05 11:05:21 +08:00
zccjjj
e927c65742 [XPU] [Optimization] [EP] EP communication optimization. (#5145)
2025-12-05 10:03:45 +08:00
bukejiyu
620d1da1c9 deepseek torch (#5373)
2025-12-04 23:26:53 +08:00
YuBaoku
1b5fd79d6b [CI] disable test_schedule_output.py in unit_test (#5377) 2025-12-04 23:18:23 +08:00
Juncai
7f4fff4d1e fix get_request from scheduler (#5369) 2025-12-04 21:59:10 +08:00
chenjian
3878a99b69 [Fearture] Support cache kv cache for output tokens (#4535)
* [Fearture] Support cache kv cache for output tokens

* fix bug

* fix ci bug

* improve coverage

* enable output caching by default

* fix ci

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-12-04 20:53:08 +08:00
Yonghua Li
b6f8069b36 [fix] update check_model_weights_status loop (#5249) 2025-12-04 19:43:01 +08:00
Yuanle Liu
41c63f6056 remove fastsafetensors (#5371) 2025-12-04 19:22:04 +08:00
xiegegege
b7e1e6c953 [CE]change yaml name 2025-12-04 19:14:11 +08:00
Nyakku Shigure
f88c159de1 [BugFix] Exit if neither modern nor legacy wheel dir not found (#5367)
---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-04 16:45:48 +08:00
SunLei
3697110599 [Docs] update FAQ with logprobs MQ limits and deprecation (#5368)
* [doc] update FAQ with logprobs MQ limits and deprecation

* [doc] update FAQ with logprobs MQ limits and deprecation

* update faq
2025-12-04 15:57:04 +08:00
Yonghua Li
f4119d51b4 [PD Disaggregation] support DP via v1 router and decouple DP and EP (#5197)
* [fix] support DP via v1 router and decouple DP and EP

* [fix] fix scripts

* [fix] reset model path

* [fix] dp use get_output_ep, fix router port type, update scripts

* [merge] merge with latest code

* [chore] remove some debug log

* [fix] fix code style check

* [fix] fix test_multi_api_server for log_dir name

* [chore] reduce logs

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-04 15:38:43 +08:00
Longzhi Wang
5cd17fd662 [Models] Add forward_meta to moe models' forward function (#5138)
* [Models] Add forward_meta to moe models' forward function

* fix missing param

* fix

* fix

* fix forward_meta

* fix test and remove chunked MoE releated in config

* fix test

* fix

* fix
2025-12-04 13:26:58 +08:00
Juncai
f5bdb36e9b Reduce timeout in unittest (#5366) 2025-12-04 13:19:02 +08:00
fmiao2372
209006e6a6 [Intel HPU] fix memory fragmentation issue due to warmup process and fix moe all_reduce issue (#5357) 2025-12-04 11:29:41 +08:00
lizexu123
946025480e [Bug fix] fix pooling models (#5358)
* fix

* fix

* fix test

* fix gpu_model_runner

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-04 11:06:30 +08:00
qwes5s5
a52aea073c fix logprobs (#5335) 2025-12-04 10:38:51 +08:00
Echo-Nie
96ff402d44 [Optimization] Remove version constraints for setuptools, uvicorn, triton and safetensors, del fastsafetensors (#5330)
* Remove version constraints for setuptools, triton, and fastsafetensors.

* remove version for uvicorn

* fix according to review
2025-12-04 10:07:31 +08:00
Yuanle Liu
be0c960260 [BugFix] dynamic cache kv block_wise_fp8 not need create layer.cache_k_scale (#5362)
2025-12-03 05:32:59 -08:00
周周周
a36d60aa18 [FIX BUG] fix bug in TP in permute_x_fp8_kernel (#5350)
* commit

* commit

* commit

* commit

* commit

* commit
2025-12-03 05:17:37 -08:00
ming1753
5f8d4aedea [Feature] support audio tts (#5333) 2025-12-03 21:06:48 +08:00
Daci
83dbc4e5dd [Feature] Guided Decoding add LLguidance backend (#5124)
* llguidance

* add requirements_guided_decoding.txt and doc

* fix test_guidance_*.py

* fix test_guidance_*.py && mv

* fix llguidance choice

* test_guidance_*

* rm lazy loader

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-03 20:23:57 +08:00
ddchenhao66
4e8096bd0d [XPU] xpu support mm prefix cache (#5356)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-03 19:07:34 +08:00
xiaolei373
a4bb3e9960 [bugfix]remove metrics middleware (#5332)
2025-12-03 17:07:45 +08:00
lzy
f458cc5ba4 [Optimization]1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5353)
* [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM

* fix test_chunked_moe

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-03 16:42:10 +08:00
tianlef
04d35ace5e [CE]add wint4 ep (#5355) 2025-12-03 15:17:47 +08:00
Sunny-bot1
d5a9b75b4e fix cutlass ep (#5337) 2025-12-03 14:06:01 +08:00
lzy
690bcb8e50 [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5315) 2025-12-03 13:33:15 +08:00
Longzhi Wang
f6544c0b1b [CI] Add RD in env CI. (#5345)
* test

* [CI] modify env ci(add RD)

* test done
2025-12-03 13:18:17 +08:00
lzy
c71a44c7e5 supports mtp split_kv_attn (#5343) 2025-12-03 12:40:16 +08:00
YuBaoku
dfeabee123 [CI] Allow occasional distributed worker exit_code (#5341) 2025-12-03 10:56:59 +08:00
Jiang-Jia-Jun
0eb799a324 Update installation requirements for Kunlunxin XPU 2025-12-03 10:04:29 +08:00
Jiang-Jia-Jun
335ae0f4a4 Update installation requirements for Kunlunxin XPU 2025-12-03 10:04:17 +08:00
Longzhi Wang
21f138f68b [CI] Add env ci (#5331)
* test

* [CI] Add env ci

* test donw
2025-12-02 19:31:25 +08:00
YuBaoku
3e2c13d8c5 [CI] Disable queue state assertion temporarily (#5329) 2025-12-02 18:57:29 +08:00
Sunny-bot1
3629db4129 [Quantization] Support w4afp8 MoE dynamic quantization (#5282)
* support dynamic activation quant for w4afp8

* support dynamic w4afp8

* add test

* fix

* fix

---------

Co-authored-by: zhoutianzi666 <17801055074@163.com>
2025-12-02 18:56:16 +08:00
fmiao2372
429dd2b1db [Intel HPU] add example benchmark scripts for hpu (#5304)
* [Intel HPU] add example benchmark scripts for hpu

* Revise the code based on the copilot comments

* update code based on comments

* update ci ops version
2025-12-02 18:00:01 +08:00
周周周
fb7f951612 [UNITEST] add test (#5305) 2025-12-02 17:59:01 +08:00
Jiaxin Sui
8e0f4dfd0c [XPU] [CI] Xpu Ci Refactor (#5252)
* add xpu ci

* add case

* add case

* fix ci bug

* Update Docker image tag to 'latest' in CI workflow

* Fix set -e usage in run_xpu_ci_pytest.sh

* add pd case

* add case

* Configure pip to use Tsinghua mirror for dependencies

Set the global pip index URL to Tsinghua mirror.

* fix ci bug

* fix bug

* fix bug

---------

Co-authored-by: suijiaxin <suijiaxin@Suis-MacBook-Pro.local>
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511964.gajl.baidu.com>
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2025-12-02 17:15:51 +08:00