FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Author	SHA1	Message	Date
zhupengyang	3a43dbf82d	[XPU] merge apply_tp, ops support token_num = 0 (#4507)	2025-10-23 19:09:58 +08:00
Sunny-bot1	4ffe41a747	WINT4/WINT8 dense gemm default use Machete (#4451)	2025-10-23 17:57:59 +08:00
YuanRisheng	ac4f5ca272	delete useless code (#4544) Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>	2025-10-23 13:40:34 +08:00
yzwu	dc7facaa7f	[Iluvatar GPU] fix ci error caused by rebuild_padding param and cuda graph (#4504)	2025-10-21 21:41:41 +08:00
RAM	775edcc09a	[Executor] Default use CUDAGraph (#3594) * add start intercept * Adjustment GraphOptConfig * pre-commit * default use cudagraph * set default value * default use cuda graph * pre-commit * fix test case bug * disable rl * fix moba attention * only support gpu * Temporarily disable PD Disaggregation * set max_num_seqs of test case as 1 * set max_num_seqs and temperature * fix max_num_batched_tokens bug * close cuda graph * success run wint2 * profile run with max_num_batched_tokens * 1.add c++ memchecker 2.success run wint2 * updatee a800 yaml * update docs * 1. delete check 2. fix plas attn test case * default use use_unique_memory_pool * add try-except for warmup * ban mtp, mm, rl * fix test case mock * fix ci bug * fix form_model_get_output_topp0 bug * fix ci bug * refine deepseek ci * refine code * Disable PD * fix sot yaml	2025-10-21 14:25:45 +08:00
gaoziyuan	d85ef5352a	【BugFix】fix ep buffer clear (#4450) * fix * fix	2025-10-21 10:56:00 +08:00
Yuanle Liu	cef3164c3b	Optimizing the performance of think length limit using custom operators (#4279) * delete impl * delete min_length&max_length * support limit thinking content strategy * fix * fix * fix * update * fix set_value_by_flags_and_idx * fix * fix * fix * fix * update * fix * fix * fix typo * fix ci * fix * fix * support mtp * fix * fix * update * update	2025-10-20 21:09:13 +08:00
yinwei	bf03b6fcea	fix vl bug (#4485)	2025-10-20 20:13:34 +08:00
yyssys	97ee3c403a	[XPU]Fix w4a8 garbled code issue (#4493)	2025-10-20 19:41:11 +08:00
bukejiyu	de2eaf4f81	add qwen-2.5-7B-PRM/ernie-rm (#4319)	2025-10-20 15:31:03 +08:00
GoldPancake	47595a2480	[Feature] support mtp logprob (#4464) * support mtp logprob * fix unitest	2025-10-20 15:18:12 +08:00
Haonan Luo	1b9f351d21	Support GPT-OSS-BF16 (#4240) * [Feature] AppendAtten support sinks & HEAD_DIM=64 * fix bug * fix bug * fix bug * fix bug * [Feature] support gpt-oss * fix bug * add mask * support-gpt-oss * support-gpt-oss * fix long seq * support wint8 * support wint8 * support wint8 * update test * change sliding windows init pos --------- Co-authored-by: ming1753 <ideaminghp@163.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>	2025-10-20 14:44:58 +08:00
SuperNova	80a16c4c87	[fix] adjust mctlass moe api (#4474)	2025-10-20 14:23:54 +08:00
yinwei	a64c0408b9	[XPU]Fix w4a8 precision bug && rollback moe algo (#4463) * fix w4a8 precision bug * add env * code stype check	2025-10-17 18:27:53 +08:00
chen	63ef593450	check paddle version for v1 loader (#4473)	2025-10-17 17:25:03 +08:00
yzwu	4b661512ca	[Iluvatar GPU] Adapt VL model (#4313)	2025-10-17 16:13:38 +08:00
Ayakouji	a3e0a15495	fix seqlen sync (#4442)	2025-10-17 14:37:52 +08:00
lizexu123	c234b995ab	[Feature] support pooling model dummy_run (#4345) * support qwen3-embedding * fix ci bug * support pooling dummy_run * fix * delete print * parallel_config.max_model_len * delete is_pooling_model in dummy_run * fix * fd_model * fix embedding load * fix * fix post_process	2025-10-17 13:30:55 +08:00
chen	b134e6afe6	[BugFix]Dev fix custom ar unstable result (#4437)	2025-10-17 11:47:16 +08:00
Ryan	6160145f82	[SOT] Change warnings to errors and remove fallback operations (#4378) * Change warnings to errors and remove fallback operations * fix unitest * fix codestyle	2025-10-17 11:27:04 +08:00
Sunny-bot1	930f7b781c	[Optimization] Put get_block_shape_and_split_kv_block in cuda graph for append attention backend (#4443) * get block in cuda graph * fix sot	2025-10-17 10:59:56 +08:00
Ryan	49cea8fb1c	[SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp (#3694) * rm inplace info && to(gpu) * update append_attention * unpin paddle version * add full_cuda_graph=False * add blank line --------- Co-authored-by: SigureMo <sigure.qaq@gmail.com>	2025-10-17 10:57:55 +08:00
YuanRisheng	a37c9416ac	[FDConfig]Remove reasoning_parser/guided_decoding_backend/disable_any_whitespace/device_ids in FDConfig (#4362) * remove devices id * fix unittest * fix ce --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>	2025-10-17 10:40:59 +08:00
chen	db82e9a022	[BugFix]Fix wfp8afp8 triton moe group_topk renormalized=True (#4449) * fix group_topk renormalized=True * check test	2025-10-16 23:17:48 +08:00
YuanRisheng	0355235fb9	[FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig (#4400) * delete some attr in parallel config * delete comment --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>	2025-10-16 20:00:37 +08:00
Ryan	b87e2c6184	[CUDAGraph]Add support for custom all-reduce operators under SOT mode (#4386)	2025-10-16 19:31:19 +08:00
zhupengyang	26ff2f8683	[XPU] refine fused moe (#4219)	2025-10-16 19:04:07 +08:00
Jianyu Li	3bbe99eae7	[Intel HPU] Enable dist sampler on intel hpu platform (#4445)	2025-10-16 19:02:27 +08:00
SunLei	5abf59715d	perf: optimize ZMQ communication with async queue and single-threaded… (#4444) * perf: optimize ZMQ communication with async queue and single-threaded model * perf: _async_output_busy_loop * fix: async_output_queue init	2025-10-16 15:46:26 +08:00
Lucas	a5063b96c8	[XPU] moe support VL 0-dim input (#4408)	2025-10-16 14:01:01 +08:00
gaoziyuan	fd5dd1a0f1	[Bugfix]fix ep clear buffer perf (#4389) * fix * Update fused_moe_backend_base.py	2025-10-16 13:05:39 +08:00
chenjian	670aaa3f83	[Bug fix] Fix pd for x1 thinking (#4433)	2025-10-16 12:03:45 +08:00
SunLei	b4b579a7ed	Feature：Add support for Pooling Model Embedding and provide an OpenAI-compatible API. (#4344) * feat: add OpenAIServing * feat: add ZmqOpenAIServing & OpenAIServingEmbedding * feat: Refine the basic ServingEngine class and introduce ServingContext * fix: codestyle * fix: request * fix: pooling_params * feat: _process_chat_template_kwargs * feat: support batch request * feat: pooling_params verify & default parameters --------- Co-authored-by: sunlei1024 <sunlei1024@example.com>	2025-10-15 19:42:59 +08:00
Lucas	bdc0207277	[XPU] fix VL multi-batch accuracy issue (#4394)	2025-10-15 17:27:43 +08:00
bukejiyu	bcaa98ff9c	V1 loader default (#4251) * v1 laoder * update * update	2025-10-15 16:49:17 +08:00
chen	4efd073a41	fix block_wise_fp8_v1_loader_moe_shape (#4384)	2025-10-15 14:08:53 +08:00
freeliuzc	582aebd48b	[MTP]support mtp chunk_prefill_v1 (#4366) * support mtp chunk_prefill_v1 * fix mtp chunkprefill output, fix unit test * fix unit test * fix save_output	2025-10-15 13:21:32 +08:00
zhupengyang	d6f775e33b	[XPU] fix ep (#4393)	2025-10-15 11:41:05 +08:00
Sunny-bot1	a751d977bc	[Optimization] Fuse get_max_len and get_kv_max_len (#4369) * opt split_q_block * fuse max_lens and max kv_len	2025-10-13 20:35:00 +08:00
YuanRisheng	a2ec2c4152	[FDConfig]Remove max_model_len in FDConfig (#4350) * modify max_model_len * fix unittest * fix unittest --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>	2025-10-11 14:04:17 +08:00
yinwei	20c7b741f4	[XPU] Support W4A8C8-TP4-300B Model (#4068) * support w4a8 * delete ep block attn * delete moe_topk_select * update note * update * delte useless info * update * add some note * fix some format * update scale info * add ans baseline --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-10-10 15:41:32 +08:00
RAM	aa27b03bc0	[Executor]CUDAGraph support Speculate Decode (#3769) * success run ngram * Revert "[Code Simplification] remove cum_offsets (#3410)" This reverts commit `32b39620bc`. * success run ngram5 tp4 42bs * success run ngram5 tp4 42bs * mtp draft commit * add decorator for target model * enable draft model in cudagraph v0.5 * revert revrt cum_offset * enable target model in cudagraph v0.9 And clean debug code * Revert "success run ngram" This reverts commit `8351e83993`. * add reverted code * enable target model in cudagraph v0.9 * solve comment * fix bid < 0 * Enable Target Model Padding And Draft Model in cudagraph * solve problem * delete rebuild padding debug note * fast compile * Add capture list for mtp * success run 256 tp1 mtp * Enable Lite TP2 Bsz256 * realy enable tp2 bsz 256 * fix problem * Solve problem for Draft model in cudagraph * Solve comment * replace emptytensor as zeros * Solve comments * Revert "fast compile" This reverts commit `834639a7ff`. * fix bug * fix merge bug * fix typo * fix bug --------- Co-authored-by: lizexu <2694294196@qq.com> Co-authored-by: littledgg <1658565283@qq.com> Co-authored-by: zeroRains <linjunlu@zerorains.top> Co-authored-by: gongshaotian <gstain5555@outlook.com>	2025-10-09 21:18:29 +08:00
chen	81959c7d88	[NewFeature]custom_allreduce support cudagraph recapture (#4305) * custom_allreduce support cudagraph recapture * add shut_down/restart default group	2025-09-29 15:56:54 +08:00
xiaozude	7c919070f7	[Metax] support cutlass moe & optimize flash attention (#4208)	2025-09-29 11:22:43 +08:00
李泳桦	6265f4385f	[feat] support prefix cache clearing when `/clear_load_weight` is called (#4008) * [feat] support clearing prefix cache (cherry-picked from release/2.1) * [fix] fix ipc suffix, use port instead * [fix] fix prefix caching not enabled * [fix] fix key/value_cache_scales indent * [fix] fix ep group all-reduce * [fix] fix clear/update lock not working when workers > 1 * [chore] add preemption triggered info log * [fix] fix code style * [fix] fix max_num_seqs config * [fix] do not force enable_prefix_caching=False in dynamic loading * [fix] fix ci * Revert "[fix] fix ci" This reverts commit `0bc6d55cc8`. * [fix] initialize available_gpu_block_num with max_gpu_block_num * [fix] fix config splitwise_role * [fix] fix clearing caches synchronization and add more logs * [chore] print cache_ready_signal in log * [fix] fix scheduler_config.splitwise_role * [fix] fix cache_messager cache_ready_signal create=True * [fix] stop cache messager from launching in mixed deployment	2025-09-28 19:42:53 +08:00
Sunny-bot1	aa1cc09c5b	fix machete pre quant (#4295)	2025-09-28 16:11:09 +08:00
K11OntheBoat	7b6cb72ab2	Fix wrong batch size of thinking_mask (#4296) Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”> Co-authored-by: xiegegege <46314656+xiegegege@users.noreply.github.com>	2025-09-28 14:56:42 +08:00
Zhong Hui	67e693b18b	fix ernie vl distributed attr. (#4215) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-09-26 14:18:49 +08:00
K11OntheBoat	4515ad21e9	Support limit thinking lengths (#4069) Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>	2025-09-25 19:55:56 +08:00
Lucas	87179cb744	[XPU] support XPU VL model inference (#4030) * [XPU] support XPU VL model inference * fix image op import and device check * rebase develop * fix perf	2025-09-25 14:34:15 +08:00

313 Commits