Commit Graph

326 Commits

Author SHA1 Message Date
lizhenyun01
4d2f478d53 [BugFix] fix TPDP mix parallel infer (#4583)
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-10-28 16:58:20 +08:00
freeliuzc
c63361fd1d [Speculative Decoding][MTP]Support mtp in epdptp mode (#4614)
* support mtp many features

* support mtp reshard in rl mode

* fix function

* support mtp ep

* support mtp in hybrid-dp-tp mode

* default open scheduler_v1 in mtp
2025-10-28 16:02:47 +08:00
ming1753
7681375a19 [BugFix] PaddleOCR-VL fix FD_DEBUG type and support v1 loader (#4605)
* [Bug Fix] PaddleOCRVL fix FD_DEBUG type and support HF model

* fix bug

* fix bug

* fix bug
2025-10-28 09:47:47 +08:00
周周周
3729e910a6 remove dev sync in prefill (#4598) 2025-10-27 19:54:43 +08:00
RAM
25a983ba9c 1.fix the bug of draft model with ep 2.fix sampler bug (#4589) 2025-10-27 17:47:34 +08:00
chen
5c63a089f6 [Feature] Support logprobs_mode (#4567) 2025-10-27 14:27:48 +08:00
CSWYF3634076
acd331780c [V1 loader] Qwen25 VL support v1 loader and torch style safetensors load (#4388)
* [BugFix] qwen2.5vl enable_thinking=true and image_patch_id bug fix

* [Docs]offline infer add apply_chat_template add_generation_prompt parameter

* [Model]qwen2.5VL support --use-cudagraph

* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test

* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test

* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v2

* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v3

* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v4

* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v5

* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v6

* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v7

* qwen25vl v1 loader

* qwen25vl v1 loader v2

* qwen25vl v1 loader v3

* qwen25vl v1 loader fix tp2 weight PySafeSlice

* qwen25vl v1 loader no test

* qwen25vl v1 loader add unit test

* qwen25vl v1 loader add unit test v2

* qwen25vl v1 loader add torch unit test v3

* qwen25vl v1 loader add torch unit test v4

* qwen25vl v1 loader add torch unit test v5

* qwen25vl v1 loader add torch unit test v6
2025-10-27 10:54:15 +08:00
Lucas
5c6105f4a2 [XPU] bind some OPs for VL model with pybind (#4522) 2025-10-27 10:50:08 +08:00
ming1753
e4e3cede7f [Feature] Support Paddle-OCR (#4396)
* init

* update code

* fix code style & disable thinking

* adapt for common_engine.update_mm_requests_chunk_size

* use 3d rope

* use flash_attn_unpadded

* opt siglip

* update to be compatible with the latest codebase

* fix typo

* optim OCR performance

* fix bug

* fix bug

* fix bug

* fix bug

* normalize name

* modify xpu rope

* revert logger

* fix bug

* fix bug

* fix bug

* support default_v1

* optim performance

* fix bug

---------

Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com>
Co-authored-by: zhangyue66 <zhangyue66@baidu.com>
2025-10-24 23:34:30 +08:00
yyssys
822dea8d5f [XPU]Moe uses a new operator (#4585)
* [XPU]Moe uses a new operator

* [XPU]Moe uses a new operator

* update response
2025-10-24 23:01:46 +08:00
Ryan
f42ed6d5f2 [Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching (#4578)
* add new branch for sot

* reorder

* fix batch bug
2025-10-24 18:36:52 +08:00
JYChen
83d45af1f3 fix import image_ops error on some platforms (#4559) 2025-10-24 16:09:20 +08:00
xiaozude
f7069b8057 [Metax] adapt DeepSeek (#4498) 2025-10-24 10:14:53 +08:00
zhupengyang
3a43dbf82d [XPU] merge apply_tp, ops support token_num = 0 (#4507) 2025-10-23 19:09:58 +08:00
Sunny-bot1
4ffe41a747 WINT4/WINT8 dense gemm default use Machete (#4451) 2025-10-23 17:57:59 +08:00
YuanRisheng
ac4f5ca272 delete useless code (#4544)
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
2025-10-23 13:40:34 +08:00
yzwu
dc7facaa7f [Iluvatar GPU] fix ci error caused by rebuild_padding param and cuda graph (#4504) 2025-10-21 21:41:41 +08:00
RAM
775edcc09a [Executor] Default use CUDAGraph (#3594)
* add start intercept

* Adjustment GraphOptConfig

* pre-commit

* default use cudagraph

* set default value

* default use cuda graph

* pre-commit

* fix test case bug

* disable rl

* fix moba attention

* only support gpu

* Temporarily disable PD Disaggregation

* set max_num_seqs of test case as 1

* set max_num_seqs and temperature

* fix max_num_batched_tokens bug

* close cuda graph

* success run wint2

* profile run with max_num_batched_tokens

* 1.add c++ memchecker 2.success run wint2

* update a800 yaml

* update docs

* 1. delete check 2. fix plas attn test case

* default use use_unique_memory_pool

* add try-except for warmup

* ban mtp, mm, rl

* fix test case mock

* fix ci bug

* fix form_model_get_output_topp0 bug

* fix ci bug

* refine deepseek ci

* refine code

* Disable PD

* fix sot yaml
2025-10-21 14:25:45 +08:00
gaoziyuan
d85ef5352a 【BugFix】fix ep buffer clear (#4450)
* fix

* fix
2025-10-21 10:56:00 +08:00
Yuanle Liu
cef3164c3b Optimizing the performance of think length limit using custom operators (#4279)
* delete impl

* delete min_length&max_length

* support limit thinking content strategy

* fix

* fix

* fix

* update

* fix set_value_by_flags_and_idx

* fix

* fix

* fix

* fix

* update

* fix

* fix

* fix typo

* fix ci

* fix

* fix

* support mtp

* fix

* fix

* update

* update
2025-10-20 21:09:13 +08:00
yinwei
bf03b6fcea fix vl bug (#4485) 2025-10-20 20:13:34 +08:00
yyssys
97ee3c403a [XPU]Fix w4a8 garbled code issue (#4493) 2025-10-20 19:41:11 +08:00
bukejiyu
de2eaf4f81 add qwen-2.5-7B-PRM/ernie-rm (#4319) 2025-10-20 15:31:03 +08:00
GoldPancake
47595a2480 [Feature] support mtp logprob (#4464)
* support mtp logprob

* fix unit test
2025-10-20 15:18:12 +08:00
Haonan Luo
1b9f351d21 Support GPT-OSS-BF16 (#4240)
* [Feature] AppendAtten support sinks & HEAD_DIM=64

* fix bug

* fix bug

* fix bug

* fix bug

* [Feature] support gpt-oss

* fix bug

* add mask

* support-gpt-oss

* support-gpt-oss

* fix long seq

* support wint8

* support wint8

* support wint8

* update test

* change sliding windows init pos

---------

Co-authored-by: ming1753 <ideaminghp@163.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
2025-10-20 14:44:58 +08:00
SuperNova
80a16c4c87 [fix] adjust mctlass moe api (#4474) 2025-10-20 14:23:54 +08:00
yinwei
a64c0408b9 [XPU]Fix w4a8 precision bug && rollback moe algo (#4463)
* fix w4a8 precision bug

* add env

* code style check
2025-10-17 18:27:53 +08:00
chen
63ef593450 check paddle version for v1 loader (#4473) 2025-10-17 17:25:03 +08:00
yzwu
4b661512ca [Iluvatar GPU] Adapt VL model (#4313) 2025-10-17 16:13:38 +08:00
Ayakouji
a3e0a15495 fix seqlen sync (#4442) 2025-10-17 14:37:52 +08:00
lizexu123
c234b995ab [Feature] support pooling model dummy_run (#4345)
* support qwen3-embedding

* fix ci bug

* support pooling dummy_run

* fix

* delete print

* parallel_config.max_model_len

* delete is_pooling_model in dummy_run

* fix

* fd_model

* fix embedding load

* fix

* fix post_process
2025-10-17 13:30:55 +08:00
chen
b134e6afe6 [BugFix]Dev fix custom ar unstable result (#4437) 2025-10-17 11:47:16 +08:00
Ryan
6160145f82 [SOT] Change warnings to errors and remove fallback operations (#4378)
* Change warnings to errors and remove fallback operations

* fix unit test

* fix codestyle
2025-10-17 11:27:04 +08:00
Sunny-bot1
930f7b781c [Optimization] Put get_block_shape_and_split_kv_block in cuda graph for append attention backend (#4443)
* get block in cuda graph

* fix sot
2025-10-17 10:59:56 +08:00
Ryan
49cea8fb1c [SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp (#3694)
* rm inplace info && to(gpu)

* update append_attention

* unpin paddle version

* add full_cuda_graph=False

* add blank line

---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>
2025-10-17 10:57:55 +08:00
YuanRisheng
a37c9416ac [FDConfig]Remove reasoning_parser/guided_decoding_backend/disable_any_whitespace/device_ids in FDConfig (#4362)
* remove devices id

* fix unittest

* fix ce

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
2025-10-17 10:40:59 +08:00
chen
db82e9a022 [BugFix]Fix wfp8afp8 triton moe group_topk renormalized=True (#4449)
* fix group_topk renormalized=True

* check test
2025-10-16 23:17:48 +08:00
YuanRisheng
0355235fb9 [FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig (#4400)
* delete some attr in parallel config

* delete comment

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
2025-10-16 20:00:37 +08:00
Ryan
b87e2c6184 [CUDAGraph]Add support for custom all-reduce operators under SOT mode (#4386) 2025-10-16 19:31:19 +08:00
zhupengyang
26ff2f8683 [XPU] refine fused moe (#4219) 2025-10-16 19:04:07 +08:00
Jianyu Li
3bbe99eae7 [Intel HPU] Enable dist sampler on intel hpu platform (#4445) 2025-10-16 19:02:27 +08:00
SunLei
5abf59715d perf: optimize ZMQ communication with async queue and single-threaded… (#4444)
* perf: optimize ZMQ communication with async queue and single-threaded model

* perf: _async_output_busy_loop

* fix: async_output_queue init
2025-10-16 15:46:26 +08:00
Lucas
a5063b96c8 [XPU] moe support VL 0-dim input (#4408) 2025-10-16 14:01:01 +08:00
gaoziyuan
fd5dd1a0f1 [Bugfix]fix ep clear buffer perf (#4389)
* fix

* Update fused_moe_backend_base.py
2025-10-16 13:05:39 +08:00
chenjian
670aaa3f83 [Bug fix] Fix pd for x1 thinking (#4433) 2025-10-16 12:03:45 +08:00
SunLei
b4b579a7ed Feature:Add support for Pooling Model Embedding and provide an OpenAI-compatible API. (#4344)
* feat: add OpenAIServing

* feat: add ZmqOpenAIServing & OpenAIServingEmbedding

* feat: Refine the basic ServingEngine class and introduce ServingContext

* fix: codestyle

* fix: request

* fix: pooling_params

* feat: _process_chat_template_kwargs

* feat: support batch request

* feat: pooling_params verify & default parameters

---------

Co-authored-by: sunlei1024 <sunlei1024@example.com>
2025-10-15 19:42:59 +08:00
Lucas
bdc0207277 [XPU] fix VL multi-batch accuracy issue (#4394) 2025-10-15 17:27:43 +08:00
bukejiyu
bcaa98ff9c V1 loader default (#4251)
* v1 loader

* update

* update
2025-10-15 16:49:17 +08:00
chen
4efd073a41 fix block_wise_fp8_v1_loader_moe_shape (#4384) 2025-10-15 14:08:53 +08:00
freeliuzc
582aebd48b [MTP]support mtp chunk_prefill_v1 (#4366)
* support mtp chunk_prefill_v1

* fix mtp chunkprefill output, fix unit test

* fix unit test

* fix save_output
2025-10-15 13:21:32 +08:00