FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-11-03 02:53:26 +08:00

Author	SHA1	Message	Date
周周周	ddb10ac509	[Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880 ) * remove padding_offsets from atten	2025-07-17 18:41:31 +08:00
freeliuzc	d49f8fb30a	[Feature][MTP] Support cacheKV transfer in per_chunk mode (#2890 ) * support chunk_prefill both normal and speculative_decoding(mtp) * optimize pd-disaggregation config * fix bug	2025-07-17 17:58:08 +08:00
ming1753	67180c1ff9	[Bug Fix] fix bug of prompt penalty (#2888 )	2025-07-17 17:21:37 +08:00
Xintong Yu	273efba76f	[Fix] remove misleading variables (#2841 ) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-17 16:49:14 +08:00
YUNSHEN XIE	1cfba5ba3e	enable CI workflow for pull requests targeting release/* branches (#2887 )	2025-07-17 16:48:03 +08:00
Jiang-Jia-Jun	31cab9f87b	Update test_openai.py	2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun	d3dfa1446c	Update test_openai.py	2025-07-17 16:07:07 +08:00
ltd0924	b630031414	[LLM] fix serval bugs (#2878 )	2025-07-17 14:21:05 +08:00
LokeZhou	f50c25178b	[MM_PROCESS] add _extract_labels (#2879 )	2025-07-17 14:20:01 +08:00
Yuanle Liu	dbb9e2506b	Fix rollout_model init (#2881 )	2025-07-16 22:36:21 -07:00
ming1753	1f15ca21e4	[Feature] support prompt repetition_penalty (#2806 )	2025-07-17 12:05:52 +08:00
yulangz	7dfd2ea052	[XPU][doc] Update minimal fastdeploy required (#2863 ) * [XPU][doc] update minimal fastdeploy required	2025-07-17 11:33:22 +08:00
GoldPancake	42d4001400	[Features] Add speculative metrics (#2857 )	2025-07-17 11:08:55 +08:00
sg263	52aca233e8	[Trace] fix annotation when add opentelemetry (#2869 ) * add opentelemetry * add opentelemetry * add opentelemetry on dequeue * add opentelemetry on dequeue * add opentelemetry on dequeue * fix annotation * fix annotation when add opentelemetry * fix opentelemetry-instrumentation-fastapi * fix pentelemetry-bootstrap --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-17 10:29:16 +08:00
ltd0924	9c25dcca0b	[LLM] Update Multinode Deployment (#2830 ) * [LLM] fix multinode bugs * [LLM] update multinode deployment * [LLM] update multinode deployment * [LLM] update multinode deployment * [LLM] update multinode deployment * [LLM] update multinode deployment * [LLM] fix ci bugs * Update fastdeploy/engine/args_utils.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [LLM] update random port * [LLM] update random port * [LLM] fix ci bugs * fix ci bugs --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-16 23:42:54 +08:00
ltd0924	d245d1ca6c	[LLM] support send batch data and aggregate data (#2860 ) * [LLM] support send batch data and aggregate data * [LLM] fix ci bugs * [LLM] fix ci bugs * [LLM] fix ci bugs * [LLM] fix ci bugs * [LLM] update	2025-07-16 23:42:20 +08:00
Yuanle Liu	63d6e7ce06	fix and refine vl (#2866 ) * refine vl config * delete attn_sep * fix vl accuracy	2025-07-16 05:59:28 -07:00
周周周	aa76085d1f	[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870 )	2025-07-16 20:10:57 +08:00
sg263	42b80182e0	[Trace] add opentelemetry (#2852 ) * add opentelemetry * add opentelemetry * add opentelemetry on dequeue * add opentelemetry on dequeue * add opentelemetry on dequeue --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-16 15:33:25 +08:00
Yuanle Liu	dda4a9f848	rl update (#2861 )	2025-07-16 00:33:10 -07:00
yangjianfengo1	a83a3eea5f	Read FLAGS_max_partition_size from an environment variable (#2854 )	2025-07-16 14:14:21 +08:00
xiaoxiaohehe001	0d0340392f	[Fix] Fix mm ep weight init. (#2855 ) * fix_45t_mm * Update load_weight_utils.py * Update load_weight_utils.py	2025-07-16 12:02:39 +08:00
YuanRisheng	0253381fb9	fix config (#2858 )	2025-07-16 11:40:10 +08:00
freeliuzc	2d1184aefe	[Fix] fix expert_parallel bug in decoder stage (#2848 )	2025-07-16 11:08:18 +08:00
yulangz	17314ee126	[XPU] Update doc and add scripts for downloading dependencies (#2845 ) * [XPU] update xvllm download * update supported models * fix xpu model runner in huge memory with small model * update doc	2025-07-16 11:05:56 +08:00
YuanRisheng	101ad33332	[BugFix] Fix Configs (#2849 ) * fix config * fix config	2025-07-15 19:50:36 -07:00
RAM	0fad10b35a	[Executor] CUDA Graph support padding batch (#2844 ) * cuda graph support padding batch * Integrate the startup parameters for the graph optimization backend and provide support for user - defined capture sizes. * Do not insert max_num_seqs when the user specifies a capture list * Support set graph optimization config from YAML file * update cuda graph ci * fix ci bug * fix ci bug	2025-07-15 19:49:01 -07:00
Yuanle Liu	61b3997b85	refactor rl get_name_mappings_to_training (#2847 ) * refactor rl get_name_mappings_to_training * fix tp>1 * change variable name(ffn1->up_gate_proj/ffn2->down_proj) * change variable name(linear_weight->weight/linear_bias->bias) * add rl names mapping for vl * fix ernie 0.3B error * fix develop code * fix	2025-07-15 07:31:42 -07:00
Zero Rains	e7bcbbab52	Merge vl execution path into normal execution path (#2829 ) * merge vl model into gpu_model runner Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6 * fix chinese Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a * fix the parse parameter Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe * fix the bug in online_inference Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559 * polish code Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c	2025-07-15 22:20:03 +08:00
zhenwenDang	5fc659b900	[Docs] add enable_logprob parameter description (#2850 ) * add enable_logprob parameter description * add enable_logprob parameter description * add enable_logprob parameter description * add enable_logprob parameter description * add enable_logprob parameter description * add enable_logprob parameter description --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-15 19:47:45 +08:00
ophilia-lee	33db137d0b	Add YAML file of default vLLM request parameters	2025-07-15 19:31:27 +08:00
lijingning	9d6a42b334	Adapt to vLLM having no arrival_time; adapt to vLLM requiring the model field; add test-case number `no` to RequestFuncInput/RequestFuncOutput/SampleRequest	2025-07-15 19:31:27 +08:00
Jiang-Jia-Jun	1b712bba82	Update setup.py	2025-07-15 14:57:23 +08:00
AIbin	fd91da7b41	【Inference Optimize】Support wint2 triton kernel about triton_utils_v2 (#2842 ) * update supported_models doc	2025-07-15 14:35:40 +08:00
bukejiyu	15c8c240b5	[vl] Use top_k from config.json (#2831 )	2025-07-15 00:39:12 +08:00
freeliuzc	7cdd8d290d	[MTP] optimize mtp infer speed (#2840 )	2025-07-14 19:50:22 +08:00
YuanRisheng	4c7b8bc458	Simplify the Config code (#2770 ) * simplify the code * fix vl * delete config * fix * perfect code * fix ci * fix xpu * fix xpu * fix server * resolve conflict * fix mtp * resolve conflict * fix xpu * fix xpu * fix vl * fix log * fix qwen moe * fix qwen moe * fix qwen moe	2025-07-14 19:50:05 +08:00
freeliuzc	2e81792d64	[fix] fix 'force-reinstall all-depe-packages in build' (#2837 )	2025-07-14 16:50:54 +08:00
AIbin	b7858c22d9	【Update Docs】update supported_models doc (#2836 ) * update supported_models doc	2025-07-14 16:01:34 +08:00
GoldPancake	09bbac6de0	Add DeepGEMM pre-compile tools (#2819 ) This tool allows you to compile all possible kernels in advance through the model's config.json, and avoids the situation where uncompiled kernel is encountered and JIT is executed when certain requests arrive.	2025-07-14 14:56:41 +08:00
freeliuzc	7f64d408a9	[MTP] support expert-parellel in mtp (#2835 )	2025-07-14 14:28:50 +08:00
lddfym	ece88596ed	fix spelling error (#2827 )	2025-07-14 13:12:57 +08:00
bukejiyu	bad53c6b6e	[vl]remove duplicated load logic (#2744 )	2025-07-13 07:36:26 +08:00
xiegegege	16940822a7	add result save for ci (#2824 ) LGTM	2025-07-12 23:34:46 +08:00
zhenwenDang	d48c03413f	Feature/logprob bug fix (#2817 ) * fix: handle missing logprobs at step 0 and incorrect finish reason with max_completion_tokens * Prevent response_logprobs.logprob_token_ids[0] from going out of bounds	2025-07-12 16:48:51 +08:00
gaoziyuan	e9e8443ea8	fix num_blocks_local when small size model in TP2 running mode (#2792 )	2025-07-12 12:50:48 +08:00
gaoziyuan	749b2e9c89	support qwen3moe name_mapping (#2820 )	2025-07-12 12:05:54 +08:00
Sunny-bot1	f6ad26fc08	fix topp default value (#2814 )	2025-07-11 17:10:21 +08:00
zhink	c08561c13a	[Feature] support tensor-parallel-size>num_key_value_heads for qwen3 (#2799 )	2025-07-11 15:09:43 +08:00
chen	2c3607407f	check (#2811 )	2025-07-11 13:54:52 +08:00


2721 Commits