FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Author	SHA1	Message	Date
Lucas	0a0c74e717	[XPU] Support PaddleOCR-VL model for XPU (#4529 ) * [XPU] support PaddleOCR-VL in XPU * [XPU] fix PaddleOCR-VL pos_emb_type	2025-10-28 20:35:04 +08:00
freeliuzc	c63361fd1d	[Speculative Decoding][MTP]Support mtp in epdptp mode (#4614 ) * support mtp many features * support mtp reshard in rl mode * fix function * support mtp ep * support mtp in hybird-dp-tp mode * default open scheduler_v1 in mtp	2025-10-28 16:02:47 +08:00
周周周	3729e910a6	remove dev sync in prefill (#4598 )	2025-10-27 19:54:43 +08:00
RAM	25a983ba9c	1.fix the bug of draft model with ep 2.fix sampler bug (#4589 )	2025-10-27 17:47:34 +08:00
chen	5c63a089f6	[Feature] Support logprobs_mode (#4567 )	2025-10-27 14:27:48 +08:00
Lucas	5c6105f4a2	[XPU] bind some OPs for VL model with pybind (#4522 )	2025-10-27 10:50:08 +08:00
ming1753	e4e3cede7f	[Feature] Support Paddle-OCR (#4396 ) * init * update code * fix code style & disable thinking * adapt for common_engine.update_mm_requests_chunk_size * use 3d rope * use flash_attn_unpadded * opt siglip * update to be compatible with the latest codebase * fix typo * optim OCR performance * fix bug * fix bug * fix bug * fix bug * normlize name * modify xpu rope * revert logger * fix bug * fix bug * fix bug * support default_v1 * optim performance * fix bug --------- Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com> Co-authored-by: zhangyue66 <zhangyue66@baidu.com>	2025-10-24 23:34:30 +08:00
yyssys	822dea8d5f	[XPU]Moe uses a new operator (#4585 ) * [XPU]Moe uses a new operator * [XPU]Moe uses a new operator * update response	2025-10-24 23:01:46 +08:00
xiaozude	f7069b8057	[Metax] adapt DeepSeek (#4498 )	2025-10-24 10:14:53 +08:00
zhupengyang	3a43dbf82d	[XPU] merge apply_tp, ops support token_num = 0 (#4507 )	2025-10-23 19:09:58 +08:00
Sunny-bot1	4ffe41a747	WINT4/WINT8 dense gemm default use Machete (#4451 )	2025-10-23 17:57:59 +08:00
gaoziyuan	d85ef5352a	【BugFix】fix ep buffer clear (#4450 ) * fix * fix	2025-10-21 10:56:00 +08:00
yinwei	bf03b6fcea	fix vl bug (#4485 )	2025-10-20 20:13:34 +08:00
yyssys	97ee3c403a	[XPU]Fix w4a8 garbled code issue (#4493 )	2025-10-20 19:41:11 +08:00
bukejiyu	de2eaf4f81	add qwen-2.5-7B-PRM/ernie-rm (#4319 )	2025-10-20 15:31:03 +08:00
GoldPancake	47595a2480	[Feature] support mtp logprob (#4464 ) * support mtp logprob * fix unitest	2025-10-20 15:18:12 +08:00
Haonan Luo	1b9f351d21	Support GPT-OSS-BF16 (#4240 ) * [Feature] AppendAtten support sinks & HEAD_DIM=64 * fix bug * fix bug * fix bug * fix bug * [Feature] support gpt-oss * fix bug * add mask * support-gpt-oss * support-gpt-oss * fix long seq * support wint8 * support wint8 * support wint8 * update test * change sliding windows init pos --------- Co-authored-by: ming1753 <ideaminghp@163.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>	2025-10-20 14:44:58 +08:00
SuperNova	80a16c4c87	[fix] adjust mctlass moe api (#4474 )	2025-10-20 14:23:54 +08:00
yinwei	a64c0408b9	[XPU]Fix w4a8 precision bug && rollback moe algo (#4463 ) * fix w4a8 precision bug * add env * code stype check	2025-10-17 18:27:53 +08:00
yzwu	4b661512ca	[Iluvatar GPU] Adapt VL model (#4313 )	2025-10-17 16:13:38 +08:00
lizexu123	c234b995ab	[Feature] support pooling model dummy_run (#4345 ) * support qwen3-embedding * fix ci bug * support pooling dummy_run * fix * delete print * parallel_config.max_model_len * delete is_pooling_model in dummy_run * fix * fd_model * fix embedding load * fix * fix post_process	2025-10-17 13:30:55 +08:00
chen	b134e6afe6	[BugFix]Dev fix custom ar unstable result (#4437 )	2025-10-17 11:47:16 +08:00
Sunny-bot1	930f7b781c	[Optimization] Put get_block_shape_and_split_kv_block in cuda graph for append attention backend (#4443 ) * get block in cuda graph * fix sot	2025-10-17 10:59:56 +08:00
Ryan	49cea8fb1c	[SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp (#3694 ) * rm inplace info && to(gpu) * update append_attention * unpin paddle version * add full_cuda_graph=False * add blank line --------- Co-authored-by: SigureMo <sigure.qaq@gmail.com>	2025-10-17 10:57:55 +08:00
chen	db82e9a022	[BugFix]Fix wfp8afp8 triton moe group_topk renormalized=True (#4449 ) * fix group_topk renormalized=True * check test	2025-10-16 23:17:48 +08:00
YuanRisheng	0355235fb9	[FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig (#4400 ) * delete some attr in parallel config * delete comment --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>	2025-10-16 20:00:37 +08:00
zhupengyang	26ff2f8683	[XPU] refine fused moe (#4219 )	2025-10-16 19:04:07 +08:00
Jianyu Li	3bbe99eae7	[Intel HPU] Enable dist sampler on intel hpu platform (#4445 )	2025-10-16 19:02:27 +08:00
Lucas	a5063b96c8	[XPU] moe support VL 0-dim input (#4408 )	2025-10-16 14:01:01 +08:00
gaoziyuan	fd5dd1a0f1	[Bugfix]fix ep clear buffer perf (#4389 ) * fix * Update fused_moe_backend_base.py	2025-10-16 13:05:39 +08:00
SunLei	b4b579a7ed	Feature：Add support for Pooling Model Embedding and provide an OpenAI-compatible API. (#4344 ) * feat: add OpenAIServing * feat: add ZmqOpenAIServing & OpenAIServingEmbedding * feat: Refine the basic ServingEngine class and introduce ServingContext * fix: codestyle * fix: request * fix: pooling_params * feat: _process_chat_template_kwargs * feat: support batch request * feat: pooling_params verify & default parameters --------- Co-authored-by: sunlei1024 <sunlei1024@example.com>	2025-10-15 19:42:59 +08:00
Lucas	bdc0207277	[XPU] fix VL multi-batch accuracy issue (#4394 )	2025-10-15 17:27:43 +08:00
bukejiyu	bcaa98ff9c	V1 loader default (#4251 ) * v1 laoder * update * update	2025-10-15 16:49:17 +08:00
chen	4efd073a41	fix block_wise_fp8_v1_loader_moe_shape (#4384 )	2025-10-15 14:08:53 +08:00
zhupengyang	d6f775e33b	[XPU] fix ep (#4393 )	2025-10-15 11:41:05 +08:00
Sunny-bot1	a751d977bc	[Optimization] Fuse get_max_len and get_kv_max_len (#4369 ) * opt split_q_block * fuse max_lens and max kv_len	2025-10-13 20:35:00 +08:00
YuanRisheng	a2ec2c4152	[FDConfig]Remove max_model_len in FDConfig (#4350 ) * modify max_model_len * fix unittest * fix unittest --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>	2025-10-11 14:04:17 +08:00
yinwei	20c7b741f4	[XPU] Support W4A8C8-TP4-300B Model (#4068 ) * support w4a8 * delete ep block attn * delete moe_topk_select * update note * update * delte useless info * update * add some note * fix some format * update scale info * add ans baseline --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-10-10 15:41:32 +08:00
RAM	aa27b03bc0	[Executor]CUDAGraph support Speculate Decode (#3769 ) * success run ngram * Revert "[Code Simplification] remove cum_offsets (#3410)" This reverts commit `32b39620bc`. * success run ngram5 tp4 42bs * success run ngram5 tp4 42bs * mtp draft commit * add decorator for target model * enable draft model in cudagraph v0.5 * revert revrt cum_offset * enable target model in cudagraph v0.9 And clean debug code * Revert "success run ngram" This reverts commit `8351e83993`. * add reverted code * enable target model in cudagraph v0.9 * solve comment * fix bid < 0 * Enable Target Model Padding And Draft Model in cudagraph * solve problem * delete rebuild padding debug note * fast compile * Add capture list for mtp * success run 256 tp1 mtp * Enable Lite TP2 Bsz256 * realy enable tp2 bsz 256 * fix problem * Solve problem for Draft model in cudagraph * Solve comment * replace emptytensor as zeros * Solve comments * Revert "fast compile" This reverts commit `834639a7ff`. * fix bug * fix merge bug * fix typo * fix bug --------- Co-authored-by: lizexu <2694294196@qq.com> Co-authored-by: littledgg <1658565283@qq.com> Co-authored-by: zeroRains <linjunlu@zerorains.top> Co-authored-by: gongshaotian <gstain5555@outlook.com>	2025-10-09 21:18:29 +08:00
xiaozude	7c919070f7	[Metax] support cutlass moe & optimize flash attention (#4208 )	2025-09-29 11:22:43 +08:00
李泳桦	6265f4385f	[feat] support prefix cache clearing when `/clear_load_weight` is called (#4008 ) * [feat] support clearing prefix cache (cherry-picked from release/2.1) * [fix] fix ipc suffix, use port instead * [fix] fix prefix caching not enabled * [fix] fix key/value_cache_scales indent * [fix] fix ep group all-reduce * [fix] fix clear/update lock not working when workers > 1 * [chore] add preemption triggered info log * [fix] fix code style * [fix] fix max_num_seqs config * [fix] do not force enable_prefix_caching=False in dynamic loading * [fix] fix ci * Revert "[fix] fix ci" This reverts commit `0bc6d55cc8`. * [fix] initialize available_gpu_block_num with max_gpu_block_num * [fix] fix config splitwise_role * [fix] fix clearing caches synchronization and add more logs * [chore] print cache_ready_signal in log * [fix] fix scheduler_config.splitwise_role * [fix] fix cache_messager cache_ready_signal create=True * [fix] stop cache messager from launching in mixed deployment	2025-09-28 19:42:53 +08:00
Sunny-bot1	aa1cc09c5b	fix machete pre quant (#4295 )	2025-09-28 16:11:09 +08:00
Lucas	87179cb744	[XPU] support XPU VL model inference (#4030 ) * [XPU] support XPU VL model inference * fix image op import and device check * rebase develop * fix perf	2025-09-25 14:34:15 +08:00
chen	7c1fd19f0f	[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238 )	2025-09-24 16:39:51 +08:00
lizexu123	e8318b7477	[BugFix] fix qwen3-embedding model tp>1 (#4223 ) * support qwen3-embedding * fix ci bug * fix * fix ci bug * fix ci bug * fix * fix qwen3-embedding * fix * fix * fix	2025-09-24 14:13:26 +08:00
chen	3161014e49	[BugFix]fix v1 loader moe bf16, and supoort dynamic_load_weight create quant param (#4229 ) * fix v1 loader moe bf16, and supoort dynamic_load_weight create quant param * include_stop_str_in_output=False not return eos text	2025-09-24 14:12:05 +08:00
fmiao2372	f1b5392e20	[Intel HPU] Support intel hpu platform (#4161 ) * [Intel HPU] Support intel hpu platform * fix some issues * apply precommit and move AttentionBackend_HPU * fix format issue * correct ops import * fix ci issue * update code in layers * fix code style issue * remove dense tp moe ep mode * fix enc_dec_block_num * fix rebase issue * rename hpu to gaudi in readme * rename ForwardMeta_HPU to HPUForwardMeta	2025-09-24 12:27:50 +08:00
chen	1a6283424e	Fix noaux_tc cuda Error 700 in CUDAGraph (#4174 )	2025-09-23 18:41:33 +08:00
lizexu123	c96a535a5d	[Feature] support qwen3-embedding model load (#4202 ) * support qwen3-embedding * fix ci bug * fix * fix ci bug * fix ci bug * fix	2025-09-23 00:14:35 -07:00
yangjianfengo1	4325b737e7	【FIX】Change the name of sparse attn from moba to plas (#4006 ) (#4076 ) * 【FIX】Change the name of sparse attn from moba to plas (#4006) * update docs * 【docs】 update readme (#4000) * update docs * update readme * update docs * 【FIX】Change the name of sparse attn from moba to plas (#3845) * update docs * update docs * update docs * update docs * change moba to plas * code style * update ci * code style * update ci * code style --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * fix max_num_seqs * fix test load attn --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-09-23 10:26:40 +08:00


232 Commits