FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-17 22:21:48 +08:00

Author	SHA1	Message	Date
ltd0924	ecf2fd5b9a	[BugFix] vl encoder tokens dtype problem (#3069 )	2025-07-30 15:20:53 +08:00
Yuan Xiaolan	35935da9e5	support W4A8 EPLB (#3075 )	2025-07-30 14:34:12 +08:00
Yzc216	159767717d	[Feature] multi source download (#3072 ) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit * Change default download * change requirements.txt * modify English Documentation * documentation * modify model download path	2025-07-30 14:10:13 +08:00
YuanRisheng	99a70fc722	unify parallel config (#3070 )	2025-07-30 11:41:23 +08:00
Sunny-bot1	74aa31d15b	[Feature] support bad_words (#3055 ) * support bad_words * support online infer bad_words * update * add CI test * update * update * update --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-07-30 09:31:29 +08:00
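zhuzixuan	ad7bb52a28	Fix error when max_tokens=1 is passed (#3068 ) * Fix error when max_tokens=1 is passed * Fix error when max_tokens=1 is passed * Fix error when max_tokens=1 is passed * Fix error when max_tokens=1 is passed * Fix error when max_tokens=1 is passed * Fix error when max_tokens=1 is passed	2025-07-29 23:49:28 +08:00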
Ryan	73cfe1fd37	[SOT] Extend SOT warmup support to new hardware (#3032 ) * add new hardware * add_sot_warmup4new_hardware * fix conflict * rm Optional	2025-07-29 22:45:20 +08:00
Zero Rains	b2f9a42d87	[Feature] Support repetition early stop (#3024 ) * support repetition early stop and support user to set the parameter * remove log * fix codestyle * add the early_stop_config to rollout_config * update config and EarlyStopper class * fix the bug for triton * modify the stop method * update description * modify the usage for stop_flags --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-07-29 22:42:54 +08:00
Yuan Xiaolan	3214fb5393	support model loading for w4a8 offline quant (#3064 ) Support W4A8 EP loading of offline-quantized weights	2025-07-29 21:54:37 +08:00
Longzhi Wang	be0a0f2bb2	fix argument error in ep when pd (#3060 )	2025-07-29 17:17:24 +08:00
YuanRisheng	502ee92a0a	Unify server-side and model-side Config (Part3) (#3047 ) * merge model config * fix arch * fix rl	2025-07-29 17:07:44 +08:00
Longzhi Wang	907d561523	fix ep when paddle version mismatch (#3056 )	2025-07-29 15:06:49 +08:00
JYChen	dafe02a7b9	[stop sequence] support stop sequence (#3025 ) * stop seqs in multi-ends * unittest for gpu stop op * kernel tid==0	2025-07-29 14:17:37 +08:00
YuanRisheng	1a815b7a2a	Fix Speculative Config bug (#3049 ) * fix speculative bug * fix rl	2025-07-29 10:50:48 +08:00
yinwei	f2a528f9ae	[XPU] Support kvblock centralized management (#3017 )	2025-07-29 10:40:55 +08:00
Yuan Xiaolan	b1d787a272	[fix] w4a8 model loading and hadamard config (#3013 )	2025-07-28 18:17:59 +08:00
K11OntheBoat	83048bbe55	[Feature] Deepseekv3 supports cudagraph (#3041 ) Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>	2025-07-28 17:12:54 +08:00
AIbin	ec52d39e68	[Inference Optimize] Update wint2 weight n-dim reorder (#3042 )	2025-07-28 16:31:56 +08:00
YuanRisheng	bddf403576	Unify server-side and model-side Config (Part2) (#3035 ) * merge speculative and graph opt config * add attr	2025-07-28 15:31:48 +08:00
chen	01485cd28b	MTP rejection_topp add topk input (#3031 )	2025-07-28 13:58:45 +08:00
begin2023	dd877f38b1	[Perf] Remove unnecessary operations in non-cuda_graph (#3010 ) * [Perf] Remove unnecessary operations in non-cuda_graph * fix code logic * use suggestion comment * reduce function call * reduce function call * reduce function call * reduce function call	2025-07-27 20:38:29 -07:00
Longzhi Wang	247010d298	fix argument error (#3030 )	2025-07-28 11:03:29 +08:00
YuanRisheng	6ccc10ad47	Unify server-side and model-side Config (Part1) (#3018 ) * move cache config * fix mtp	2025-07-28 10:51:52 +08:00
李泳桦	69996a40da	[feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3020 ) * [feat] add disable_chat_template in chat api as a substitute for previous raw_request * [fix] pre-commit code check	2025-07-25 20:57:32 +08:00
Longzhi Wang	0700c90caa	[Feat] support mixed ep (#2969 ) * Support mixed ep * fix comment * fix comment * update mixep * fix conflict * fix typo * update * fix typo * fix code style * fix conflict	2025-07-25 15:29:30 +08:00
chen	332154f504	[feature] Support FA2 (#3009 )	2025-07-25 14:09:00 +08:00
EnflameGCU	8c167e130c	[GCU] Update post_process (#3012 )	2025-07-25 11:03:03 +08:00
xiaoxiaohehe001	2970b00dfa	[Feature] Support_eplb (#2997 ) * [Feature] support_eplb * [Feature] support_eplb * [Fix] fix mm ep	2025-07-24 20:22:45 +08:00
littledgg	f37d00e856	[Model] Provide clearer error for missing KV cache quantization scales (#3007 )	2025-07-24 20:15:00 +08:00
EnflameGCU	c40df1802e	[GCU] Update to develop (#2988 )	2025-07-24 19:30:52 +08:00
Yzc216	980126b83a	[Feature] multi source download (#3005 ) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit * Change default download * change requirements.txt * modify English Documentation * documentation	2025-07-24 17:42:09 +08:00
Zero Rains	0fb37ab7e4	update flake8 version to support pre-commit in python3.12 (#3000 ) * update flake8 version to support pre-commit in python3.12 * polish code	2025-07-24 01:43:31 -07:00
ltd0924	f935d6f862	[BugFix] fix multinode deployment (#2977 )	2025-07-24 15:04:04 +08:00
ltd0924	3792345c3a	[LLM] update function name (#2985 ) * [LLM] update function name	2025-07-24 15:03:40 +08:00
Yzc216	e14587a954	[Feature] multi-source download (#2986 ) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit	2025-07-24 14:26:37 +08:00
xiaoxiaohehe001	2c0ff068e2	[Fix] fix mm ep empty run (#2999 )	2025-07-24 14:15:55 +08:00
lizhenyun01	29c3292f02	support c4 attn && fix cache	2025-07-24 12:00:52 +08:00
lizexu123	832d25334a	[Code Simplification] fix init_distributed_environment() (#2982 )	2025-07-24 11:43:28 +08:00
bukejiyu	bfeb664ab8	update (#2978 )	2025-07-24 00:16:42 +08:00
chenjian	85a78d695d	[Feature] Support block scheduler v1 for FD (#2928 ) * Support FD block scheduler v1 * Support FD block scheduler v1 * Support FD block scheduler v1 * Fix according to copilot review * Fix according to review * Remove is_dummy * Fix bug when real_bsz=1 * Fix infer first token cost time --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-23 20:31:31 +08:00
Zero Rains	ca0f71bd39	polish code for prefill restrictions (#2991 )	2025-07-23 05:10:14 -07:00
chen	172e69fe17	FA3 fix bug (#2987 )	2025-07-23 19:07:43 +08:00
zhink	1272c7ce98	Fix performance degradation bug of custom_all_reduce (#2981 )	2025-07-23 17:45:44 +08:00
Zero Rains	850c9d98d4	[BugFix] Add prefill restrictions for chunked_prefill+VL (#2983 )	2025-07-23 01:45:57 -07:00
freeliuzc	a39a67334c	fix mtp bug in pd-split mode (#2970 )	2025-07-23 15:31:16 +08:00
lizexu123	9b22b8d2c3	delete max-len (#2959 )	2025-07-23 15:11:39 +08:00
chen	ad202272ed	[Infer] Improve the performance block_wise_fp8 of triton_moe_backend (#2942 )	2025-07-23 13:02:50 +08:00
lizhenyun01	e51f018577	support chunk_prefill in fa3	2025-07-23 12:19:20 +08:00
Ryan	95b5af24db	[SOT] Add sot warmup (NVIDIA GPU Only) (#2929 ) * add sot warmup * fix code style * change batch_size list * add param to config * rm free_list settings && set sot_warmup_sizes * finish debug with dynamic dims by type annotations * add profile_run guard * rm sth useless	2025-07-22 21:36:14 +08:00
Sunny-bot1	7c5e34e72d	[FIX]fix rejection sampling when topp=0 using _SAMPLING_EPS (#2967 ) * fix rejection sampling when topp=0 * fix	2025-07-22 05:53:37 -07:00

727 Commits