gaoziyuan
5224f6c434
support qwen3 name_mapping ( #3170 )
2025-08-04 16:37:40 +08:00
yinwei
bfef09dd73
update user email ( #3087 )
2025-07-31 14:25:31 +08:00
LokeZhou
1d46420c49
[cherry-pick][MM_PROCESS] add _extract_labels ( #2879 ) ( #2993 )
v2.0.3
2025-07-24 11:04:43 +08:00
ltd0924
fb0f284e67
[BugFix] fix prompt token ids type ( #2994 )
...
* Update serving_completion.py
* fix
* fix
2025-07-23 21:00:56 +08:00
Zero Rains
5d1788c7b5
polish code for prefill restrictions ( #2992 )
2025-07-23 05:12:01 -07:00
Zero Rains
abd238fc12
[Cherry-Pick][BugFix] Add prefill restrictions for chunked_prefill+VL ( #2984 )
2025-07-23 01:53:26 -07:00
Jiang-Jia-Jun
e5804b1d98
Revert "[LLM] fix multinode bugs ( #2945 )" ( #2971 )
...
This reverts commit b0f1e0eef4.
2025-07-22 21:23:48 +08:00
Sunny-bot1
8c43bc8176
[FIX 2.0.3] fix rejection sampling when top_p=0 using _SAMPLING_EPS ( #2966 )
...
* fix rejection sampling when topp=0
* fix
* fix
2025-07-22 05:53:04 -07:00
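The idea behind the top_p=0 fix above can be sketched as follows. This is a minimal illustration, not FastDeploy's actual sampler: when top_p is numerically zero, nucleus filtering would keep no probability mass, so values below a small epsilon fall back to greedy decoding. The `_SAMPLING_EPS` value and the function name are assumptions.

```python
# Minimal sketch: guard top-p (nucleus) filtering against top_p == 0.
_SAMPLING_EPS = 1e-5  # assumed threshold; the real constant may differ

def top_p_filter(probs, top_p):
    """Return the token indices kept by top-p filtering over `probs`."""
    if top_p < _SAMPLING_EPS:
        # Degenerate top_p: behave like greedy decoding (keep only argmax).
        return [max(range(len(probs)), key=probs.__getitem__)]
    # Sort indices by probability, descending, and keep the smallest
    # prefix whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept
```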
ltd0924
b0f1e0eef4
[LLM] fix multinode bugs ( #2945 )
...
* [LLM] fix multinode bugs
* [LLM] fix multinode bugs
* [LLM] fix multinode bugs
* [LLM] fix ci bugs
* fix ci bugs
* fix ci bugs
2025-07-22 20:23:37 +08:00
ming1753
69be77c8c0
[Feature] support prompt repetition_penalty ( #2954 )
...
* [Feature] support prompt repetition_penalty (#2806 )
* [Bug Fix] fix bug of prompt penalty (#2888 )
2025-07-22 19:42:33 +08:00
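The gist of extending repetition_penalty to prompt tokens can be sketched like this. A hypothetical illustration in the spirit of #2954, not the project's implementation: tokens already seen in the prompt or in the output so far are penalized in the logits.

```python
# Sketch: repetition penalty that also covers prompt tokens (names illustrative).
def apply_repetition_penalty(logits, prompt_ids, generated_ids, penalty):
    """Penalize logits of tokens appearing in the prompt or the output so far."""
    seen = set(prompt_ids) | set(generated_ids)
    out = list(logits)
    for t in seen:
        if out[t] > 0:
            out[t] /= penalty   # shrink positive logits
        else:
            out[t] *= penalty   # push negative logits further down
    return out
```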
gaoziyuan
535a15ab8f
[Fix]Fix vl when import fastdeploy and fix rl config rank bug ( #2953 )
...
* support vl ori_vocab_size
* support trainer_degree in name_mapping
* fix
* fix import error
* fix local rank
2025-07-22 04:40:27 -07:00
sg263
580460046f
merge 2.0.2 into 2.0.3 ( #2917 )
...
Co-authored-by: shige <shige@baidu.com>
2025-07-22 14:46:20 +08:00
Sunny-bot1
4dbc483713
[BugFix]Fix sample rejection ( #2908 ) ( #2949 )
...
* fix config
* fix rejection
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
2025-07-22 13:39:34 +08:00
gaoziyuan
4ead15822c
[Sync develop] support vl model name_mapping and ori_vocab_size ( #2915 )
...
* support vl ori_vocab_size
* support trainer_degree in name_mapping
* fix
2025-07-20 23:14:15 -07:00
Jiang-Jia-Jun
f941124402
[Feature] Support include_stop_str_in_output ( #2930 )
...
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-21 10:58:32 +08:00
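The behavior of the include_stop_str_in_output option (#2930) can be sketched as follows. This is an assumed illustration, not the server's actual post-processing: when generation hits a stop string, the flag decides whether the stop string itself stays in the returned text.

```python
# Sketch: truncating generated text at a stop string (names illustrative).
def finalize_output(text, stop_str, include_stop_str_in_output=False):
    """Truncate `text` at the first stop string; optionally keep the stop string."""
    idx = text.find(stop_str)
    if idx == -1:
        return text  # no stop string produced
    end = idx + len(stop_str) if include_stop_str_in_output else idx
    return text[:end]
```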
RAM
b89f083004
[Executor] Fix set capture sizes bug ( #2903 )
2025-07-18 10:58:05 +08:00
RAM
4d05ed596c
Update GraphOptimizationBackend docs ( #2899 )
2025-07-17 21:41:38 +08:00
ltd0924
bc1866af58
[LLM] delete fixed slot ( #2894 )
2025-07-17 20:44:55 +08:00
yulangz
fe237fe92b
[XPU][doc] pick xpu doc fix ( #2897 )
2025-07-17 20:01:40 +08:00
YUNSHEN XIE
3a480abcbb
enable CI workflow for pull requests targeting release/* branches ( #2886 )
...
* enable CI workflow for pull requests targeting release/* branches
* fix
2025-07-17 16:48:13 +08:00
Yuanle Liu
335609efb6
fix rollout_model and add rl ut ( #2882 )
2025-07-17 13:37:54 +08:00
Jiang-Jia-Jun
3464f75f98
Update setup.py
2025-07-16 23:45:05 +08:00
Jiang-Jia-Jun
09d0073fdc
[Sync Code] develop to release/2.0.3 ( #2873 )
...
* [LLM] support send batch data and aggregate data (#2860 )
* [LLM] support send batch data and aggregate data
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] update
* [LLM] Update Multinode Deployment (#2830 )
* [LLM] fix multinode bugs
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] fix ci bugs
* Update fastdeploy/engine/args_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [LLM] update random port
* [LLM] update random port
* [LLM] fix ci bugs
* fix ci bugs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:44:26 +08:00
Yuanle Liu
63d6e7ce06
fix and refine vl ( #2866 )
...
* refine vl config
* delete attn_sep
* fix vl accuracy
2025-07-16 05:59:28 -07:00
周周周
aa76085d1f
[Attention] remove cum_offsets from atten, and use cu_seqlens_q ( #2870 )
...
2025-07-16 20:10:57 +08:00
sg263
42b80182e0
[Trace] add opentelemetry ( #2852 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-16 15:33:25 +08:00
Yuanle Liu
dda4a9f848
rl update ( #2861 )
2025-07-16 00:33:10 -07:00
yangjianfengo1
a83a3eea5f
Read FLAGS_max_partition_size from an environment variable ( #2854 )
2025-07-16 14:14:21 +08:00
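Moving FLAGS_max_partition_size from a compile-time constant to the environment can be sketched like this. The default value and function name here are illustrative assumptions, not the project's actual values.

```python
# Sketch: read a tuning flag from the environment with a fallback default.
import os

def get_max_partition_size(default=32768):
    """Read FLAGS_max_partition_size from the environment, else use `default`."""
    return int(os.environ.get("FLAGS_max_partition_size", default))
```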
xiaoxiaohehe001
0d0340392f
[Fix] Fix mm ep weight init. ( #2855 )
...
* fix_45t_mm
* Update load_weight_utils.py
* Update load_weight_utils.py
2025-07-16 12:02:39 +08:00
YuanRisheng
0253381fb9
fix config ( #2858 )
2025-07-16 11:40:10 +08:00
freeliuzc
2d1184aefe
[Fix] fix expert_parallel bug in decoder stage ( #2848 )
2025-07-16 11:08:18 +08:00
yulangz
17314ee126
[XPU] Update doc and add scripts for downloading dependencies ( #2845 )
...
* [XPU] update xvllm download
* update supported models
* fix xpu model runner in huge memory with small model
* update doc
2025-07-16 11:05:56 +08:00
YuanRisheng
101ad33332
[BugFix] Fix Configs ( #2849 )
...
* fix config
* fix config
2025-07-15 19:50:36 -07:00
RAM
0fad10b35a
[Executor] CUDA Graph support padding batch ( #2844 )
...
* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support set graph optimization config from YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug
2025-07-15 19:49:01 -07:00
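The padding-batch idea above can be sketched as follows. A minimal illustration under assumed names, not the executor's real code: CUDA graphs are captured for a fixed list of batch sizes, and an incoming batch is padded up to the smallest captured size that fits (falling back to eager execution when none does).

```python
# Sketch: map a runtime batch size onto a captured CUDA-graph batch size.
import bisect

def pick_capture_size(batch_size, capture_sizes):
    """Return the smallest captured batch size >= batch_size, or None if none fits."""
    sizes = sorted(capture_sizes)
    i = bisect.bisect_left(sizes, batch_size)
    return sizes[i] if i < len(sizes) else None
```

A batch of 3 against captured sizes [1, 2, 4, 8] would be padded to 4; a batch of 9 would run outside the graphs.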
Yuanle Liu
61b3997b85
refactor rl get_name_mappings_to_training ( #2847 )
...
* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix
2025-07-15 07:31:42 -07:00
Zero Rains
e7bcbbab52
Merge vl execution path into normal execution path ( #2829 )
...
* merge vl model into gpu_model runner
Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6
* fix chinese
Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a
* fix the parse parameter
Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe
* fix the bug in online_inference
Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559
* polish code
Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c
2025-07-15 22:20:03 +08:00
zhenwenDang
5fc659b900
[Docs] add enable_logprob parameter description ( #2850 )
...
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-15 19:47:45 +08:00
ophilia-lee
33db137d0b
Add a YAML of default vLLM request parameters
2025-07-15 19:31:27 +08:00
lijingning
9d6a42b334
Adapt to vLLM lacking arrival_time; make the vLLM model field required; add a case number no to RequestFuncInput/RequestFuncOutput/SampleRequest
2025-07-15 19:31:27 +08:00
Jiang-Jia-Jun
1b712bba82
Update setup.py
2025-07-15 14:57:23 +08:00
AIbin
fd91da7b41
[Inference Optimize] Support wint2 triton kernel via triton_utils_v2 ( #2842 )
...
* update supported_models doc
2025-07-15 14:35:40 +08:00
bukejiyu
15c8c240b5
[vl] Use top_k from config.json ( #2831 )
2025-07-15 00:39:12 +08:00
freeliuzc
7cdd8d290d
[MTP] optimize mtp infer speed ( #2840 )
2025-07-14 19:50:22 +08:00
YuanRisheng
4c7b8bc458
Simplify the Config code ( #2770 )
...
* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe
2025-07-14 19:50:05 +08:00
freeliuzc
2e81792d64
[fix] fix 'force-reinstall all-depe-packages in build' ( #2837 )
2025-07-14 16:50:54 +08:00
AIbin
b7858c22d9
[Update Docs] update supported_models doc ( #2836 )
...
* update supported_models doc
2025-07-14 16:01:34 +08:00
GoldPancake
09bbac6de0
Add DeepGEMM pre-compile tools ( #2819 )
...
This tool compiles all possible kernels ahead of time from the model's config.json, avoiding the case where a request hits an uncompiled kernel and has to wait for JIT compilation.
2025-07-14 14:56:41 +08:00
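The pre-compile approach above can be sketched as follows. A hypothetical illustration of the idea behind #2819, not the actual tool: enumerate the GEMM shapes implied by a model's config.json so each kernel can be built once ahead of time. The config keys and the fixed batch-size list are assumptions.

```python
# Sketch: derive (m, n, k) GEMM shapes from a model config for pre-compilation.
import json

def enumerate_gemm_shapes(config_path, batch_sizes=(1, 8, 32)):
    """Enumerate GEMM shapes implied by a model's config.json (illustrative keys)."""
    with open(config_path) as f:
        cfg = json.load(f)
    hidden = cfg["hidden_size"]
    inter = cfg["intermediate_size"]
    shapes = []
    for m in batch_sizes:
        shapes.append((m, inter, hidden))  # up projection
        shapes.append((m, hidden, inter))  # down projection
    return shapes
```

Each shape would then be fed to the kernel builder once, so no request later pays a JIT cost.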
freeliuzc
7f64d408a9
[MTP] support expert-parallel in mtp ( #2835 )
2025-07-14 14:28:50 +08:00
lddfym
ece88596ed
fix spelling error ( #2827 )
2025-07-14 13:12:57 +08:00
bukejiyu
bad53c6b6e
[vl]remove duplicated load logic ( #2744 )
2025-07-13 07:36:26 +08:00