FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Author	SHA1	Message	Date
ZhangYulongg	9c8292fb19	update ci cases	2025-07-18 21:44:07 +08:00
ZhangYulongg	a5e95013b5	update ci cases	2025-07-18 21:44:07 +08:00
ZhangYulongg	93481a5478	update ci cases	2025-07-18 21:44:07 +08:00
ZhangYulongg	eb77b1be6d	update ci cases	2025-07-18 21:44:07 +08:00
ming1753	5328daa333	[Bug Fix] fix ep config bug (#2920 )	2025-07-18 19:12:56 +08:00
xiaoxiaohehe001	a42fc3f40b	[Feature] Support 45tVL EP FP8 Infer. (#2909 ) * support_mm_ep_fp8 * support_mm_ep	2025-07-18 17:57:15 +08:00
Jiang-Jia-Jun	fbe3547c95	[Feature] Support include_stop_str_in_output in chat/completion (#2910 ) * [Feature] Support include_stop_str_in_output in chat/completion * Add ci test for include_stop_str_in_output * Update version of openai * Fix ci test --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>	2025-07-18 16:59:18 +08:00
gaoziyuan	6efad14b95	support vl ori_vacab_size (#2900 )	2025-07-18 16:26:14 +08:00
周周周	d306944f4f	remove cum_offsets from get_block_shape_and_split_kv_block (#2913 ) * remove padding_offsets from get_padding_offset.cu * remove padding_offsets from get_padding_offset.cu * remove padding_offsets from get_padding_offset.cu * remove cum_offsets from get_block_shape_and_split_kv_block * remove cum_offsets from get_block_shape_and_split_kv_block	2025-07-18 16:13:32 +08:00
YUNSHEN XIE	e81137e581	fix ci workflow (#2896 )	2025-07-18 16:01:00 +08:00
RAM	cd52dc0f65	[Executor] Fix set capture sizes bug (#2902 )	2025-07-18 15:12:19 +08:00
周周周	1339e56282	[XPU] Remove padding_offsets from get_padding_offset.cu (#2911 )	2025-07-18 14:16:44 +08:00
YuanRisheng	0eb5dc18d3	[BugFix]Fix sample rejection (#2908 ) * fix config * fix rejection	2025-07-18 13:44:30 +08:00
sg263	e679567d59	[Trace]fix opentelemetry can not work in uvicorn (#2906 ) * add opentelemetry * add opentelemetry * add opentelemetry on dequeue * add opentelemetry on dequeue * add opentelemetry on dequeue * fix annotation * fix annotation when add opentelemetry * fix opentelemetry-instrumentation-fastapi * fix opentelemetry-bootstrap * fix opentelemetry can not work in uvicorn * move conf to env --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-17 23:16:45 +08:00
RAM	bbe2c5c968	Update GraphOptimizationBackend docs (#2898 )	2025-07-17 21:38:18 +08:00
ltd0924	4b14dca1d6	[LLM] delete fixed slots (#2893 )	2025-07-17 19:19:54 +08:00
yulangz	c8c280c4d3	[XPU][Doc] fix typo (#2892 )	2025-07-17 19:13:54 +08:00
周周周	ddb10ac509	[Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880 ) * remove padding_offsets from atten	2025-07-17 18:41:31 +08:00
freeliuzc	d49f8fb30a	[Feature][MTP] Support cacheKV transfer in per_chunk mode (#2890 ) * support chunk_prefill both normal and speculative_decoding(mtp) * optimize pd-disaggregation config * fix bug	2025-07-17 17:58:08 +08:00
ming1753	67180c1ff9	[Bug Fix] fix bug of prompt penalty (#2888 )	2025-07-17 17:21:37 +08:00
Xintong Yu	273efba76f	[Fix] remove misleading variables (#2841 ) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-17 16:49:14 +08:00
YUNSHEN XIE	1cfba5ba3e	enable CI workflow for pull requests targeting release/* branches (#2887 )	2025-07-17 16:48:03 +08:00
Jiang-Jia-Jun	31cab9f87b	Update test_openai.py	2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun	d3dfa1446c	Update test_openai.py	2025-07-17 16:07:07 +08:00
ltd0924	b630031414	[LLM] fix several bugs (#2878 )	2025-07-17 14:21:05 +08:00
LokeZhou	f50c25178b	[MM_PROCESS] add _extract_labels (#2879 )	2025-07-17 14:20:01 +08:00
Yuanle Liu	dbb9e2506b	Fix rollout_model init (#2881 )	2025-07-16 22:36:21 -07:00
ming1753	1f15ca21e4	[Feature] support prompt repetition_penalty (#2806 )	2025-07-17 12:05:52 +08:00
yulangz	7dfd2ea052	[XPU][doc] Update minimal fastdeploy required (#2863 ) * [XPU][doc] update minimal fastdeploy required	2025-07-17 11:33:22 +08:00
GoldPancake	42d4001400	[Features] Add speculative metrics (#2857 )	2025-07-17 11:08:55 +08:00
sg263	52aca233e8	[Trace] fix annotation when add opentelemetry (#2869 ) * add opentelemetry * add opentelemetry * add opentelemetry on dequeue * add opentelemetry on dequeue * add opentelemetry on dequeue * fix annotation * fix annotation when add opentelemetry * fix opentelemetry-instrumentation-fastapi * fix opentelemetry-bootstrap --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-17 10:29:16 +08:00
ltd0924	9c25dcca0b	[LLM] Update Multinode Deployment (#2830 ) * [LLM] fix multinode bugs * [LLM] update multinode deployment * [LLM] update multinode deployment * [LLM] update multinode deployment * [LLM] update multinode deployment * [LLM] update multinode deployment * [LLM] fix ci bugs * Update fastdeploy/engine/args_utils.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [LLM] update random port * [LLM] update random port * [LLM] fix ci bugs * fix ci bugs --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-16 23:42:54 +08:00
ltd0924	d245d1ca6c	[LLM] support send batch data and aggregate data (#2860 ) * [LLM] support send batch data and aggregate data * [LLM] fix ci bugs * [LLM] fix ci bugs * [LLM] fix ci bugs * [LLM] fix ci bugs * [LLM] update	2025-07-16 23:42:20 +08:00
Yuanle Liu	63d6e7ce06	fix and refine vl (#2866 ) * refine vl config * delete attn_sep * fix vl accuracy	2025-07-16 05:59:28 -07:00
周周周	aa76085d1f	[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870 )	2025-07-16 20:10:57 +08:00
sg263	42b80182e0	[Trace] add opentelemetry (#2852 ) * add opentelemetry * add opentelemetry * add opentelemetry on dequeue * add opentelemetry on dequeue * add opentelemetry on dequeue --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-16 15:33:25 +08:00
Yuanle Liu	dda4a9f848	rl update (#2861 )	2025-07-16 00:33:10 -07:00
yangjianfengo1	a83a3eea5f	Read FLAGS_max_partition_size from an environment variable (#2854 )	2025-07-16 14:14:21 +08:00
xiaoxiaohehe001	0d0340392f	[Fix] Fix mm ep weight init. (#2855 ) * fix_45t_mm * Update load_weight_utils.py * Update load_weight_utils.py	2025-07-16 12:02:39 +08:00
YuanRisheng	0253381fb9	fix config (#2858 )	2025-07-16 11:40:10 +08:00
freeliuzc	2d1184aefe	[Fix] fix expert_parallel bug in decoder stage (#2848 )	2025-07-16 11:08:18 +08:00
yulangz	17314ee126	[XPU] Update doc and add scripts for downloading dependencies (#2845 ) * [XPU] update xvllm download * update supported models * fix xpu model runner in huge memory with small model * update doc	2025-07-16 11:05:56 +08:00
YuanRisheng	101ad33332	[BugFix] Fix Configs (#2849 ) * fix config * fix config	2025-07-15 19:50:36 -07:00
RAM	0fad10b35a	[Executor] CUDA Graph support padding batch (#2844 ) * cuda graph support padding batch * Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes. * Do not insert max_num_seqs when the user specifies a capture list * Support set graph optimization config from YAML file * update cuda graph ci * fix ci bug * fix ci bug	2025-07-15 19:49:01 -07:00
Yuanle Liu	61b3997b85	refactor rl get_name_mappings_to_training (#2847 ) * refactor rl get_name_mappings_to_training * fix tp>1 * change variable name(ffn1->up_gate_proj/ffn2->down_proj) * change variable name(linear_weight->weight/linear_bias->bias) * add rl names mapping for vl * fix ernie 0.3B error * fix develop code * fix	2025-07-15 07:31:42 -07:00
Zero Rains	e7bcbbab52	Merge vl execution path into normal execution path (#2829 ) * merge vl model into gpu_model runner Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6 * fix chinese Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a * fix the parse parameter Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe * fix the bug in online_inference Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559 * polish code Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c	2025-07-15 22:20:03 +08:00
zhenwenDang	5fc659b900	[Docs] add enable_logprob parameter description (#2850 ) * add enable_logprob parameter description * add enable_logprob parameter description * add enable_logprob parameter description * add enable_logprob parameter description * add enable_logprob parameter description * add enable_logprob parameter description --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-15 19:47:45 +08:00
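ophilia-lee	33db137d0b	Add a YAML file of default vLLM request parameters	2025-07-15 19:31:27 +08:00
lijingning	9d6a42b334	Adapt to vLLM having no arrival_time; adapt to vLLM requiring model to be passed; add test case number field no to RequestFuncInput/RequestFuncOutput/SampleRequest	2025-07-15 19:31:27 +08:00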
Jiang-Jia-Jun	1b712bba82	Update setup.py	2025-07-15 14:57:23 +08:00


2738 Commits