Commit Graph

238 Commits

Author SHA1 Message Date
chen
4e392e8337 [BugFix]fix v1 loader lm head fp32 (#5270) (#5287)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
2025-11-28 17:52:25 +08:00
ming1753
38f6e6c7c6 [BugFix] fix triton fp8 bug (#4967)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
* [BugFix] fix triton fp8 bug

* fix bug
2025-11-12 19:28:09 +08:00
ming1753
197a0f7af4 [BugFix] fix VL fp8 bug when moe token_num is 0 (#4929)
* [BugFix] fix VL fp8 bug when moe token_num is 0

* fix bug
2025-11-10 21:16:10 +08:00
yinwei
0df488c7bb support wint8 & wint4 (#4837)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
2025-11-06 10:54:34 +08:00
yinwei
b4aa189483 [XPU] Support V1 Loader in Bf16 (#4746)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
* add v1 support for bf16

* update

* update

* update

* update

* update

* update code
2025-11-01 16:13:25 +08:00
李泳桦
a012e3608b [Feature] support logits processors (#4515)
* [feat] provide an interface for logits processors and a builtin LogitBiasLogitsProcessor

* [chore] fix code style

* [fix] add unit test & fix existing bugs

* [feat] add engine/worker arg --logits-processors

* [fix] redefine user args as logits_processors_args and fix some bugs

* [fix] fix test_sampler

* Update fastdeploy/model_executor/logits_processor/builtin.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/model_executor/logits_processor/__init__.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/model_executor/test_logits_processor.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [fix] fix typo

* Update fastdeploy/engine/sampling_params.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [fix] fix bracelet

* [chore] redefine logits processor interface: pass the entire share_inputs into LP, do not copy share_inputs and logits

* [doc] add docs

* [fix] fix logit bias processor not applied when decoding is too fast & add docs and tests

* [fix] fix redundant code

* [feat] skip apply() if no bias is specified

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-29 00:08:53 +08:00
Lucas
0a0c74e717 [XPU] Support PaddleOCR-VL model for XPU (#4529)
* [XPU] support PaddleOCR-VL in XPU

* [XPU] fix PaddleOCR-VL pos_emb_type
2025-10-28 20:35:04 +08:00
freeliuzc
c63361fd1d [Speculative Decoding][MTP]Support mtp in epdptp mode (#4614)
* support mtp many features

* support mtp reshard in rl mode

* fix function

* support mtp ep

* support mtp in hybird-dp-tp mode

* default open scheduler_v1 in mtp
2025-10-28 16:02:47 +08:00
周周周
3729e910a6 remove dev sync in prefill (#4598) 2025-10-27 19:54:43 +08:00
RAM
25a983ba9c 1.fix the bug of draft model with ep 2.fix sampler bug (#4589) 2025-10-27 17:47:34 +08:00
chen
5c63a089f6 [Feature] Support logprobs_mode (#4567) 2025-10-27 14:27:48 +08:00
Lucas
5c6105f4a2 [XPU] bind some OPs for VL model with pybind (#4522) 2025-10-27 10:50:08 +08:00
ming1753
e4e3cede7f [Feature] Support Paddle-OCR (#4396)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* init

* update code

* fix code style & disable thinking

* adapt for common_engine.update_mm_requests_chunk_size

* use 3d rope

* use flash_attn_unpadded

* opt siglip

* update to be compatible with the latest codebase

* fix typo

* optim OCR performance

* fix bug

* fix bug

* fix bug

* fix bug

* normlize name

* modify xpu rope

* revert logger

* fix bug

* fix bug

* fix bug

* support default_v1

* optim performance

* fix bug

---------

Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com>
Co-authored-by: zhangyue66 <zhangyue66@baidu.com>
2025-10-24 23:34:30 +08:00
yyssys
822dea8d5f [XPU]Moe uses a new operator (#4585)
* [XPU]Moe uses a new operator

* [XPU]Moe uses a new operator

* update response
2025-10-24 23:01:46 +08:00
xiaozude
f7069b8057 [Metax] adapt DeepSeek (#4498) 2025-10-24 10:14:53 +08:00
zhupengyang
3a43dbf82d [XPU] merge apply_tp, ops support token_num = 0 (#4507) 2025-10-23 19:09:58 +08:00
Sunny-bot1
4ffe41a747 WINT4/WINT8 dense gemm default use Machete (#4451) 2025-10-23 17:57:59 +08:00
gaoziyuan
d85ef5352a 【BugFix】fix ep buffer clear (#4450)
* fix

* fix
2025-10-21 10:56:00 +08:00
yinwei
bf03b6fcea fix vl bug (#4485) 2025-10-20 20:13:34 +08:00
yyssys
97ee3c403a [XPU]Fix w4a8 garbled code issue (#4493) 2025-10-20 19:41:11 +08:00
bukejiyu
de2eaf4f81 add qwen-2.5-7B-PRM/ernie-rm (#4319) 2025-10-20 15:31:03 +08:00
GoldPancake
47595a2480 [Feature] support mtp logprob (#4464)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support mtp logprob

* fix unitest
2025-10-20 15:18:12 +08:00
Haonan Luo
1b9f351d21 Support GPT-OSS-BF16 (#4240)
* [Feature] AppendAtten support sinks & HEAD_DIM=64

* fix bug

* fix bug

* fix bug

* fix bug

* [Feature] support gpt-oss

* fix bug

* add mask

* support-gpt-oss

* support-gpt-oss

* fix long seq

* support wint8

* support wint8

* support wint8

* update test

* change sliding windows init pos

---------

Co-authored-by: ming1753 <ideaminghp@163.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
2025-10-20 14:44:58 +08:00
SuperNova
80a16c4c87 [fix] adjust mctlass moe api (#4474) 2025-10-20 14:23:54 +08:00
yinwei
a64c0408b9 [XPU]Fix w4a8 precision bug && rollback moe algo (#4463)
* fix w4a8 precision bug

* add env

* code stype check
2025-10-17 18:27:53 +08:00
yzwu
4b661512ca [Iluvatar GPU] Adapt VL model (#4313) 2025-10-17 16:13:38 +08:00
lizexu123
c234b995ab [Feature] support pooling model dummy_run (#4345)
* support qwen3-embedding

* fix ci bug

* support pooling dummy_run

* fix

* delete print

* parallel_config.max_model_len

* delete is_pooling_model in dummy_run

* fix

* fd_model

* fix embedding load

* fix

* fix post_process
2025-10-17 13:30:55 +08:00
chen
b134e6afe6 [BugFix]Dev fix custom ar unstable result (#4437) 2025-10-17 11:47:16 +08:00
Sunny-bot1
930f7b781c [Optimization] Put get_block_shape_and_split_kv_block in cuda graph for append attention backend (#4443)
* get block in cuda graph

* fix sot
2025-10-17 10:59:56 +08:00
Ryan
49cea8fb1c [SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp (#3694)
* rm inplace info && to(gpu)

* update append_attention

* unpin paddle version

* add full_cuda_graph=False

* add blank line

---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>
2025-10-17 10:57:55 +08:00
chen
db82e9a022 [BugFix]Fix wfp8afp8 triton moe group_topk renormalized=True (#4449)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix group_topk renormalized=True

* check test
2025-10-16 23:17:48 +08:00
YuanRisheng
0355235fb9 [FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig (#4400)
* delete some attr in parallel config

* delete comment

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
2025-10-16 20:00:37 +08:00
zhupengyang
26ff2f8683 [XPU] refine fused moe (#4219) 2025-10-16 19:04:07 +08:00
Jianyu Li
3bbe99eae7 [Intel HPU] Enable dist sampler on intel hpu platform (#4445) 2025-10-16 19:02:27 +08:00
Lucas
a5063b96c8 [XPU] moe support VL 0-dim input (#4408) 2025-10-16 14:01:01 +08:00
gaoziyuan
fd5dd1a0f1 [Bugfix]fix ep clear buffer perf (#4389)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix

* Update fused_moe_backend_base.py
2025-10-16 13:05:39 +08:00
SunLei
b4b579a7ed Feature:Add support for Pooling Model Embedding and provide an OpenAI-compatible API. (#4344)
* feat: add OpenAIServing

* feat: add ZmqOpenAIServing & OpenAIServingEmbedding

* feat: Refine the basic ServingEngine class and introduce ServingContext

* fix: codestyle

* fix: request

* fix: pooling_params

* feat: _process_chat_template_kwargs

* feat: support batch request

* feat: pooling_params verify & default parameters

---------

Co-authored-by: sunlei1024 <sunlei1024@example.com>
2025-10-15 19:42:59 +08:00
Lucas
bdc0207277 [XPU] fix VL multi-batch accuracy issue (#4394) 2025-10-15 17:27:43 +08:00
bukejiyu
bcaa98ff9c V1 loader default (#4251)
* v1 laoder

* update

* update
2025-10-15 16:49:17 +08:00
chen
4efd073a41 fix block_wise_fp8_v1_loader_moe_shape (#4384) 2025-10-15 14:08:53 +08:00
zhupengyang
d6f775e33b [XPU] fix ep (#4393) 2025-10-15 11:41:05 +08:00
Sunny-bot1
a751d977bc [Optimization] Fuse get_max_len and get_kv_max_len (#4369)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* opt split_q_block

* fuse max_lens and max kv_len
2025-10-13 20:35:00 +08:00
YuanRisheng
a2ec2c4152 [FDConfig]Remove max_model_len in FDConfig (#4350)
* modify max_model_len

* fix unittest

* fix unittest

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
2025-10-11 14:04:17 +08:00
yinwei
20c7b741f4 [XPU] Support W4A8C8-TP4-300B Model (#4068)
* support w4a8

* delete ep block attn

* delete moe_topk_select

* update note

* update

* delte useless info

* update

* add some note

* fix some format

* update scale info

* add ans baseline

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-10-10 15:41:32 +08:00
RAM
aa27b03bc0 [Executor]CUDAGraph support Speculate Decode (#3769)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* success run ngram

* Revert "[Code Simplification] remove cum_offsets (#3410)"

This reverts commit 32b39620bc.

* success run ngram5 tp4 42bs

* success run ngram5 tp4 42bs

* mtp draft commit

* add decorator for target model

* enable draft model in cudagraph v0.5

* revert revrt cum_offset

* enable target model in cudagraph v0.9 And clean debug code

* Revert "success run ngram"

This reverts commit 8351e83993.

* add reverted code

* enable target model in cudagraph v0.9

* solve comment

* fix bid < 0

* Enable Target Model Padding And Draft Model in cudagraph

* solve problem

* delete rebuild padding debug note

* fast compile

* Add capture list for mtp

* success run 256 tp1 mtp

* Enable Lite TP2 Bsz256

* realy enable tp2 bsz 256

* fix problem

* Solve problem for Draft model in cudagraph

* Solve comment

* replace emptytensor as zeros

* Solve comments

* Revert "fast compile"

This reverts commit 834639a7ff.

* fix bug

* fix merge bug

* fix typo

* fix bug

---------

Co-authored-by: lizexu <2694294196@qq.com>
Co-authored-by: littledgg <1658565283@qq.com>
Co-authored-by: zeroRains <linjunlu@zerorains.top>
Co-authored-by: gongshaotian <gstain5555@outlook.com>
2025-10-09 21:18:29 +08:00
xiaozude
7c919070f7 [Metax] support cutlass moe & optimize flash attention (#4208)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-09-29 11:22:43 +08:00
李泳桦
6265f4385f [feat] support prefix cache clearing when /clear_load_weight is called (#4008)
* [feat] support clearing prefix cache (cherry-picked from release/2.1)

* [fix] fix ipc suffix, use port instead

* [fix] fix prefix caching not enabled

* [fix] fix key/value_cache_scales indent

* [fix] fix ep group all-reduce

* [fix] fix clear/update lock not working when workers > 1

* [chore] add preemption triggered info log

* [fix] fix code style

* [fix] fix max_num_seqs config

* [fix] do not force enable_prefix_caching=False in dynamic loading

* [fix] fix ci

* Revert "[fix] fix ci"

This reverts commit 0bc6d55cc8.

* [fix] initialize available_gpu_block_num with max_gpu_block_num

* [fix] fix config splitwise_role

* [fix] fix clearing caches synchronization and add more logs

* [chore] print cache_ready_signal in log

* [fix] fix scheduler_config.splitwise_role

* [fix] fix cache_messager cache_ready_signal create=True

* [fix] stop cache messager from launching in mixed deployment
2025-09-28 19:42:53 +08:00
Sunny-bot1
aa1cc09c5b fix machete pre quant (#4295) 2025-09-28 16:11:09 +08:00
Lucas
87179cb744 [XPU] support XPU VL model inference (#4030)
* [XPU] support XPU VL model inference

* fix image op import and device check

* rebase develop

* fix perf
2025-09-25 14:34:15 +08:00
chen
7c1fd19f0f [OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238) 2025-09-24 16:39:51 +08:00