Commit Graph

277 Commits

Author SHA1 Message Date
lizan1999
ec6811f648 support token num = 0 (#5635)
Co-authored-by: lizan1999 <lizan03@baidu.com>
Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-19 10:20:38 +08:00
yzwu
ac013803f3 [Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555) 2025-12-18 02:14:25 -08:00
lizan1999
e1a9b282eb fix bug for EP+MTP (#5605)
Co-authored-by: lizan1999 <lizan03@baidu.com>
2025-12-18 14:34:54 +08:00
zhupengyang
8735cb5045 [XPU] refactor moe ffn (#5501)
- remove BKCL_DISPATCH_ALL_GATHER
- support sparse mode
- support moe quant_method
2025-12-18 14:14:05 +08:00
Yuanle Liu
cdc0004894 Revert "[Feature] add ue8m0 for per_token_quant_fp8 (#5563)" (#5611)
This reverts commit 73e1d6aa90.
2025-12-17 13:59:06 +08:00
Yuanle Liu
867803ae10 [BugFix] fix speculate_limit_thinking_content_length (#5590)
* fix speculate_limit_thinking_content_length

* update
2025-12-16 04:31:45 -08:00
chen
27ef3610b5 support glm fa3 (#5586) 2025-12-16 19:33:27 +08:00
fxyfxy777
73e1d6aa90 [Feature] add ue8m0 for per_token_quant_fp8 (#5563)
* ue8m0

* add default arg

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-16 18:40:12 +08:00
Echo-Nie
50100f98d7 [Feature] Support fusedmoe on Blackwell (#5325)
* update sm100

* fix

* fix style
2025-12-16 11:58:50 +08:00
freeliuzc
532f9ba227 [BugFix][Speculative Decoding](Spent many days to solve) Fix write qknorm cache bug in speculative decoding (#5491)
* [liuzichang spent 10 days] fix write qknorm cache bug

* fix 'fix cachekv bug'
2025-12-15 18:27:11 +08:00
ddchenhao66
9f70f4310e [PD Disaggregation][XPU] update_inputs_v1 operator supports PD (#5550)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-15 15:39:38 +08:00
chen
a389bb7c5c [Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache (#5486) 2025-12-12 17:10:17 +08:00
RuohengMa
12c76f8137 [XPU] add speculate_get_logits (#5497)
* [XPU] add speculate_step_system_cache

* [XPU] add speculate_step_system_cache

* [XPU] add speculate_get_logits

* delete context

* add ptr check

---------

Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-12 15:38:30 +08:00
Lucas
888c4b992d [XPU] refactor of block_attn param 'pos_emb_type' (#5511) 2025-12-12 14:30:09 +08:00
Juncai
d67388a479 [PD Disaggregation] Distinguish the pipelines for sending kv signal in different prefill (#5514)
* Distinguish the pipelines for sending kv signal in different prefill

* up
2025-12-12 14:05:36 +08:00
cmcamdy
3c1f7b85a4 [XPU] support get hidden state for mix (#5513)
* fix get hidden states

* fix code style

* fix code style
2025-12-12 10:31:20 +08:00
FocusLuo
c3aaa7e441 [BugFix] Fixed build script issue on Intel HPU platforms (#5455)
* [INTEL HPU]  Fixed build script issue for non-gpu platforms

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* [INTEL HPU] PR CI HPU will not use fixed version of fastdeploy_intel_hpu

Signed-off-by: Luo, Focus <focus.luo@intel.com>

---------

Signed-off-by: Luo, Focus <focus.luo@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-11 16:36:37 +08:00
Neil Zhu
4403a21d4b [Metax] refactor cutlass moe and optimize flash attention (#5361)
* [Metax] refactor moe and flash attention backend
---------

Co-authored-by: zhangchenyi_dl <16219492+zhangchenyidl@user.noreply.gitee.com>
2025-12-10 17:15:17 +08:00
Copilot
e38709b499 [BugFix] Fix limit_thinking early return logic in CUDA kernels (#5471)
* Initial plan

* [BugFix] Fix limit_thinking bug - change AND to OR in condition checks

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* Update Chinese comments to reflect OR logic instead of AND

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
2025-12-10 11:03:19 +08:00
lzy
99f607eef5 [Others] Maintain the mtp branch temporarily. (#5446) 2025-12-09 19:17:53 +08:00
lizexu123
95eab9f9ee [Feature] support stop_token_ids (#5399)
* support stop_token_ids

* fix

* delete chinese

* support both

* delete print
2025-12-09 17:49:12 +08:00
xiaozude
df67379bc3 [Metax] modify wrapSize to WARP_SIZE (#5442) 2025-12-09 01:44:02 -08:00
周周周
31410415db FA3 support qwen3 (#5441) 2025-12-09 16:16:16 +08:00
RuohengMa
8178e3fc6a [XPU] add speculate_step_system_cache (#5397)
* [XPU] add speculate_step_system_cache

* [XPU] add speculate_step_system_cache

---------

Co-authored-by: cmcamdy <1027740945@qq.com>
2025-12-09 14:40:11 +08:00
K11OntheBoat
8d99bac532 Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-09 14:17:30 +08:00
周周周
2aea8a3a60 [Others] Remove useless code (#5404) 2025-12-08 13:59:46 +08:00
GoldPancake
8545b705ed fix top_p_candidates (#5400)
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
2025-12-05 20:01:05 +08:00
Lucas
8f2b85362d [XPU] support moe_expert_ffn TGEMM selection (#5375) 2025-12-05 17:49:40 +08:00
Lucas
3aed8d257d [XPU] redirect xvllm/xtdk/xhpc downloading log (#5388) 2025-12-05 17:34:17 +08:00
cmcamdy
86b6430582 fix split_rope_cache_kv_encoder in mix mtp (#5384) 2025-12-05 14:33:17 +08:00
Lucas
7b0b6e470a [XPU] support XDNN downloading function (#5365) 2025-12-05 11:16:45 +08:00
Nyakku Shigure
f88c159de1 [BugFix] Exit if neither modern nor legacy wheel dir not found (#5367)
---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-04 16:45:48 +08:00
Yonghua Li
f4119d51b4 [PD Disaggregation] support DP via v1 router and decouple DP and EP (#5197)
* [fix] support DP via v1 router and decouple DP and EP

* [fix] fix scripts

* [fix] reset model path

* [fix] dp use get_output_ep, fix router port type, update scripts

* [merge] merge with latest code

* [chore] remove some debug log

* [fix] fix code style check

* [fix] fix test_multi_api_server for log_dir name

* [chore] reduce logs

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-04 15:38:43 +08:00
周周周
a36d60aa18 [FIX BUG] fix bug in TP in permute_x_fp8_kernel (#5350)
* commit

* commit

* commit

* commit

* commit

* commit
2025-12-03 05:17:37 -08:00
Sunny-bot1
d5a9b75b4e fix cutlass ep (#5337) 2025-12-03 14:06:01 +08:00
lzy
c71a44c7e5 supports mtp split_kv_attn (#5343) 2025-12-03 12:40:16 +08:00
Sunny-bot1
3629db4129 [Quantization] Support w4afp8 MoE dynamic quantization (#5282)
* support dynamic activation quant for w4afp8

* support dynamic w4afp8

* add test

* fix

* fix

---------

Co-authored-by: zhoutianzi666 <17801055074@163.com>
2025-12-02 18:56:16 +08:00
周周周
fb7f951612 [UNITEST] add test (#5305) 2025-12-02 17:59:01 +08:00
qw86972190
6048ea37bd [XPU]add enable_logprob (#5279)
* [XPU]Update document

* [XPU]Update documentation

* [XPU]add enable_logprob

* Fix code style issues

* "doc"

* "docs"

* "doc"

* Fix code style via pre-commit

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1498354.gajl.baidu.com>
2025-12-02 15:32:28 +08:00
K11OntheBoat
2e1680838f [PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251)
* Support deepseekv3 cache transfer for PD deploy

* clean some log info

---------

Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-02 14:11:50 +08:00
chen
aa35ce449d [Optimization] EP empty_input_forward Remove Communication (#5254) 2025-12-01 21:10:40 +08:00
cmcamdy
3149aed750 fix_gather_next_token (#5311) 2025-12-01 18:00:30 +08:00
K11OntheBoat
7bafcf1df3 [OP]Remove extra H2D in DeepGemm (#5262)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-11-28 14:23:44 +08:00
周周周
95243f012c [Others] add PADDLE_ENFORCE (#5288) 2025-11-28 14:23:35 +08:00
lizhenyun01
aba4fc657f [Feature] support flash_mask_attention backend (#5134)
* [Feature] support flash_mask_attention backend

* fix unittest

* clean code
2025-11-28 10:12:16 +08:00
cmcamdy
5a67a6d960 [XPU] support kernel for mtp(base) (#4748)
* [XPU] support kernel for mtp(base)

* [XPU] support kernel for mtp(base)

* format

* format

* format

* fix gather next token

* fix step && add test

* fix

* mv pre/post process

* add adjust batch / gather next token for mtp

* fix code style

* fix mtp kernel name

* fix mtp kernel test

* mv xpu pre/post process

* mv xpu pre/post process
2025-11-27 15:05:44 +08:00
GoldPancake
cfc5b0ccf9 [BugFix] fix mtp logprob bugs in chunk prefill (#5244)
* fix mtp logprob bugs in chunk prefill

* fix

* fix
2025-11-27 11:31:29 +08:00
freeliuzc
ba915e03e1 [BugFix]Fix attention mask bug in D-Node of PD-split mode (#5245) 2025-11-26 17:56:28 +08:00
xiaoxiaohehe001
61fc368066 [Fix] fix eplb noaux (#5239)
* fix eplb noaux

* fix eplb noaux
2025-11-26 17:50:51 +08:00
zccjjj
ea3bc5b4ca [XPU] Fix the error in MoeExpertFFN operator when valid_token_num=0 (#5196) 2025-11-25 10:07:20 +08:00