周周周
ddb10ac509
[Inference, rename] remove padding_offsets from atten, use batch_id_per_token ( #2880 )
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
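As a sketch of the renamed mapping: batch_id_per_token records, for every token in the packed batch, which request it came from, replacing the old padding-offset arithmetic. The helper below is illustrative, not FastDeploy's actual kernel interface.

```python
import numpy as np

def build_batch_id_per_token(seq_lens):
    """For each token in a packed (unpadded) batch, record the request it belongs to.

    seq_lens: per-request token counts, e.g. [3, 1, 2]
    returns:  [0, 0, 0, 1, 2, 2]
    """
    return np.repeat(np.arange(len(seq_lens)), seq_lens)

print(build_batch_id_per_token([3, 1, 2]))  # [0 0 0 1 2 2]
```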
freeliuzc
d49f8fb30a
[Feature][MTP] Support cacheKV transfer in per_chunk mode ( #2890 )
* support chunk_prefill in both normal and speculative_decoding (mtp) modes
* optimize pd-disaggregation config
* fix bug
2025-07-17 17:58:08 +08:00
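For orientation, a toy sketch of per-chunk prefill: the prompt is consumed in fixed-size chunks so the KV cache can be filled, and in a PD-disaggregated setup transferred, chunk by chunk rather than in one shot. All names below, including the process_chunk callback, are hypothetical.

```python
def chunked_prefill(prompt_token_ids, chunk_size, process_chunk):
    """Feed the prompt through the model chunk by chunk.

    process_chunk(tokens, start) stands in for one forward pass that
    appends the chunk's KV entries to the cache (and could ship them
    to the decode worker per chunk in a disaggregated deployment).
    """
    for start in range(0, len(prompt_token_ids), chunk_size):
        chunk = prompt_token_ids[start:start + chunk_size]
        process_chunk(chunk, start)
```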
ming1753
67180c1ff9
[Bug Fix] fix prompt penalty bug ( #2888 )
2025-07-17 17:21:37 +08:00
Xintong Yu
273efba76f
[Fix] remove misleading variables ( #2841 )
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 16:49:14 +08:00
Jiang-Jia-Jun
31cab9f87b
Update test_openai.py
2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun
d3dfa1446c
Update test_openai.py
2025-07-17 16:07:07 +08:00
ltd0924
b630031414
[LLM] fix several bugs ( #2878 )
2025-07-17 14:21:05 +08:00
LokeZhou
f50c25178b
[MM_PROCESS] add _extract_labels ( #2879 )
2025-07-17 14:20:01 +08:00
Yuanle Liu
dbb9e2506b
Fix rollout_model init ( #2881 )
2025-07-16 22:36:21 -07:00
ming1753
1f15ca21e4
[Feature] support prompt repetition_penalty ( #2806 )
2025-07-17 12:05:52 +08:00
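Context for the prompt repetition_penalty feature: the common rule (popularized by CTRL) divides positive logits and multiplies negative ones for every token already seen, here extended to prompt tokens as well. A minimal NumPy sketch, not the actual FastDeploy kernel:

```python
import numpy as np

def apply_repetition_penalty(logits, prompt_ids, generated_ids, penalty=1.2):
    """Penalize every token id that appears in the prompt or in prior output."""
    for tok in set(prompt_ids) | set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= penalty  # make positive logits less likely
        else:
            logits[tok] *= penalty  # push negative logits further down
    return logits
```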
GoldPancake
42d4001400
[Features] Add speculative metrics ( #2857 )
2025-07-17 11:08:55 +08:00
sg263
52aca233e8
[Trace] fix annotation when add opentelemetry ( #2869 )
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 10:29:16 +08:00
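The packages named in the bullets (opentelemetry-instrumentation-fastapi, opentelemetry-bootstrap) are standard OpenTelemetry tooling; a minimal sketch of wiring them into a FastAPI service, with a span around the dequeue step as the bullets describe (the span and function names are illustrative):

```python
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)  # from opentelemetry-instrumentation-fastapi

tracer = trace.get_tracer(__name__)

def dequeue_request(queue):
    # Wrap the dequeue step in its own span, as the commit bullets describe.
    with tracer.start_as_current_span("dequeue"):
        return queue.get()
```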
ltd0924
9c25dcca0b
[LLM] Update Multinode Deployment ( #2830 )
* [LLM] fix multinode bugs
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] fix ci bugs
* Update fastdeploy/engine/args_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [LLM] update random port
* [LLM] update random port
* [LLM] fix ci bugs
* fix ci bugs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:42:54 +08:00
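One bullet updates random port selection; the portable way to obtain a free port is to bind to port 0 and let the OS choose, as in this sketch (an illustration of the idea, not FastDeploy's helper):

```python
import socket

def get_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]
```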
ltd0924
d245d1ca6c
[LLM] support send batch data and aggregate data ( #2860 )
* [LLM] support send batch data and aggregate data
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] update
2025-07-16 23:42:20 +08:00
Yuanle Liu
63d6e7ce06
fix and refine vl ( #2866 )
* refine vl config
* delete attn_sep
* fix vl accuracy
2025-07-16 05:59:28 -07:00
周周周
aa76085d1f
[Attention] remove cum_offsets from atten, and use cu_seqlens_q ( #2870 )
2025-07-16 20:10:57 +08:00
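cu_seqlens_q follows the usual varlen-attention convention (as in FlashAttention): cumulative query sequence lengths with a leading zero, so request i owns the packed token range [cu_seqlens_q[i], cu_seqlens_q[i+1]). A small sketch:

```python
import numpy as np

def build_cu_seqlens_q(seq_lens_q):
    """[3, 1, 2] -> [0, 3, 4, 6]; slice i of the packed tensor is
    tokens cu[i]:cu[i+1], which replaces per-token offset tables."""
    return np.concatenate(([0], np.cumsum(seq_lens_q)))

print(build_cu_seqlens_q([3, 1, 2]))  # [0 3 4 6]
```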
sg263
42b80182e0
[Trace] add opentelemetry ( #2852 )
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-16 15:33:25 +08:00
Yuanle Liu
dda4a9f848
rl update ( #2861 )
2025-07-16 00:33:10 -07:00
yangjianfengo1
a83a3eea5f
Get FLAGS_max_partition_size from an environment variable ( #2854 )
2025-07-16 14:14:21 +08:00
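The title (translated above) says the flag is now read from the environment; the usual pattern looks like the sketch below, where the 32768 default is illustrative rather than FastDeploy's actual value:

```python
import os

# Read the partition size from the environment, falling back to a default.
# The 32768 default here is illustrative, not the value FastDeploy uses.
max_partition_size = int(os.getenv("FLAGS_max_partition_size", "32768"))
```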
xiaoxiaohehe001
0d0340392f
[Fix] Fix mm ep weight init. ( #2855 )
* fix_45t_mm
* Update load_weight_utils.py
* Update load_weight_utils.py
2025-07-16 12:02:39 +08:00
YuanRisheng
0253381fb9
fix config ( #2858 )
2025-07-16 11:40:10 +08:00
freeliuzc
2d1184aefe
[Fix] fix expert_parallel bug in decoder stage ( #2848 )
2025-07-16 11:08:18 +08:00
yulangz
17314ee126
[XPU] Update doc and add scripts for downloading dependencies ( #2845 )
* [XPU] update xvllm download
* update supported models
* fix xpu model runner for small models on devices with huge memory
* update doc
2025-07-16 11:05:56 +08:00
YuanRisheng
101ad33332
[BugFix] Fix Configs ( #2849 )
* fix config
* fix config
2025-07-15 19:50:36 -07:00
RAM
0fad10b35a
[Executor] CUDA Graph support padding batch ( #2844 )
* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support set graph optimization config from YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug
2025-07-15 19:49:01 -07:00
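Replaying a captured CUDA graph requires a fixed batch shape, so runtime batches get padded up to the nearest captured size. A sketch of the lookup, with a made-up capture list:

```python
import bisect

def pad_to_captured_size(batch_size, capture_sizes):
    """Round batch_size up to the smallest captured CUDA-graph size.

    capture_sizes must be sorted ascending; the caller pads dummy
    requests up to the returned size so a pre-captured graph replays.
    """
    idx = bisect.bisect_left(capture_sizes, batch_size)
    if idx == len(capture_sizes):
        return None  # larger than any captured size: fall back to eager
    return capture_sizes[idx]

print(pad_to_captured_size(3, [1, 2, 4, 8]))  # 4
```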
Yuanle Liu
61b3997b85
refactor rl get_name_mappings_to_training ( #2847 )
* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix
2025-07-15 07:31:42 -07:00
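The renames listed in the bullets (ffn1 to up_gate_proj, ffn2 to down_proj, linear_weight to weight, linear_bias to bias) suggest a static table mapping training-side parameter names to inference-side ones; a toy sketch of applying such a table (the table contents come from the bullets, the helper itself is hypothetical):

```python
RENAMES = {
    "ffn1": "up_gate_proj",
    "ffn2": "down_proj",
    "linear_weight": "weight",
    "linear_bias": "bias",
}

def map_param_name(name: str) -> str:
    """Rewrite one training-side parameter name to its inference-side name."""
    for old, new in RENAMES.items():
        name = name.replace(old, new)
    return name

print(map_param_name("layers.0.ffn1.linear_weight"))  # layers.0.up_gate_proj.weight
```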
Zero Rains
e7bcbbab52
Merge vl execution path into normal execution path ( #2829 )
* merge vl model into gpu_model runner
Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6
* fix chinese
Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a
* fix the parse parameter
Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe
* fix the bug in online_inference
Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559
* polish code
Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c
2025-07-15 22:20:03 +08:00
AIbin
fd91da7b41
[Inference Optimize] Support wint2 triton kernel via triton_utils_v2 ( #2842 )
* update supported_models doc
2025-07-15 14:35:40 +08:00
bukejiyu
15c8c240b5
[vl] Use top_k from config.json ( #2831 )
2025-07-15 00:39:12 +08:00
freeliuzc
7cdd8d290d
[MTP] optimize mtp infer speed ( #2840 )
2025-07-14 19:50:22 +08:00
YuanRisheng
4c7b8bc458
Simplify the Config code ( #2770 )
* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe
2025-07-14 19:50:05 +08:00
freeliuzc
7f64d408a9
[MTP] support expert-parallel in mtp ( #2835 )
2025-07-14 14:28:50 +08:00
lddfym
ece88596ed
fix spelling error ( #2827 )
2025-07-14 13:12:57 +08:00
bukejiyu
bad53c6b6e
[vl] remove duplicated load logic ( #2744 )
2025-07-13 07:36:26 +08:00
zhenwenDang
d48c03413f
Feature/logprob bug fix ( #2817 )
* fix: handle missing logprobs at step 0 and incorrect finish reason with max_completion_tokens
* Prevent response_logprobs.logprob_token_ids[0] from going out of bounds
2025-07-12 16:48:51 +08:00
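The two bullets describe defensive handling: logprobs can be absent at step 0, and indexing logprob_token_ids[0] can run out of bounds. A hedged sketch of that kind of guard (the field name follows the bullet, the surrounding structure is assumed):

```python
def first_token_logprob_ids(response_logprobs):
    """Return the first logprob token-id entry, or None when it is missing."""
    if response_logprobs is None:
        return None  # step 0 may carry no logprobs at all
    ids = response_logprobs.logprob_token_ids
    if not ids:
        return None  # avoid indexing ids[0] on an empty list
    return ids[0]
```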
gaoziyuan
e9e8443ea8
fix num_blocks_local for small models in TP2 running mode ( #2792 )
2025-07-12 12:50:48 +08:00
gaoziyuan
749b2e9c89
support qwen3moe name_mapping ( #2820 )
2025-07-12 12:05:54 +08:00
Sunny-bot1
f6ad26fc08
fix top_p default value ( #2814 )
2025-07-11 17:10:21 +08:00
zhink
c08561c13a
[Feature] support tensor-parallel-size>num_key_value_heads for qwen3 ( #2799 )
2025-07-11 15:09:43 +08:00
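When tensor-parallel-size exceeds num_key_value_heads, KV heads can no longer be split one per rank; the standard workaround is to replicate them across ranks. A sketch of the head-count bookkeeping (illustrative, not the actual FastDeploy logic):

```python
def kv_heads_per_rank(num_kv_heads: int, tp_size: int) -> int:
    """Each rank gets a slice of the KV heads, or a full replica when
    there are more ranks than KV heads."""
    if tp_size <= num_kv_heads:
        assert num_kv_heads % tp_size == 0
        return num_kv_heads // tp_size
    # More ranks than KV heads: replicate, so query-head groups on
    # different ranks share copies of the same KV head.
    assert tp_size % num_kv_heads == 0
    return 1

print(kv_heads_per_rank(8, 4))  # 2 heads per rank
print(kv_heads_per_rank(4, 8))  # 1 replicated head per rank
```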
chen
2c3607407f
check ( #2811 )
2025-07-11 13:54:52 +08:00
lddfym
b5e4288704
Global scheduler supports hot-updating its configuration ( #2807 )
* Check if the controller port is available
* Global scheduler supports configuring hot updates
* add interface: /controller/scheduler
* add interface: /controller/scheduler
2025-07-11 13:38:07 +08:00
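A sketch of what the /controller/scheduler interface could look like as a FastAPI route pair; only the path comes from the bullets, while the payload shape and handlers are assumptions:

```python
from fastapi import FastAPI

app = FastAPI()
scheduler_config = {"policy": "global", "load_threshold": 0.8}  # illustrative fields

@app.get("/controller/scheduler")
def get_scheduler_config():
    return scheduler_config

@app.put("/controller/scheduler")
def update_scheduler_config(update: dict):
    # Hot update: merge new settings without restarting the service.
    scheduler_config.update(update)
    return scheduler_config
```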
yinwei
e98937cbba
delete useless file ( #2772 )
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-11 11:46:04 +08:00
Sunny-bot1
240d6236bc
[Fix] fix top_k_top_p sampling ( #2801 )
* fix topk-topp
* update
* add base_non_truncated
2025-07-10 22:35:10 +08:00
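For reference, the conventional order the name top_k_top_p implies: restrict to the k most likely tokens, then keep the smallest prefix of them whose cumulative probability reaches top_p. A NumPy sketch, not the fixed GPU kernel:

```python
import numpy as np

def top_k_top_p_filter(probs, top_k=50, top_p=0.9):
    """Keep the top-k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p; renormalize and return."""
    order = np.argsort(probs)[::-1][:top_k]      # top-k candidates, best first
    kept = probs[order]
    cutoff = np.searchsorted(np.cumsum(kept), top_p) + 1
    order, kept = order[:cutoff], kept[:cutoff]  # nucleus within the top-k
    out = np.zeros_like(probs)
    out[order] = kept / kept.sum()
    return out
```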
littledgg
59071268b6
[Executor] Move forward_meta.py to fastdeploy/model_executor ( #2774 )
* Use PEP 563 in attention.py and fix conflict
* merge commit
* Change what was left out last time
2025-07-10 20:36:51 +08:00
lizexu123
8c660a0dfb
[BugFix] fix RMSNorm rms_norm_eps ( #2797 )
* fix rms
* add vl
* fix
* add vl
* fix
* fix
2025-07-10 20:02:24 +08:00
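For reference, RMSNorm with its epsilon, the parameter this fix concerns: the eps sits inside the square root that normalizes by the root-mean-square. A NumPy sketch:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """y = x / sqrt(mean(x^2) + eps) * weight, over the last axis."""
    variance = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(variance + eps) * weight
```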
LiqinruiG
ce5adec877
[Doc] modify offline-inference docs ( #2800 )
* modify offline-inference docs
* [bug] remove tool_call_content
2025-07-10 19:41:12 +08:00
yulangz
830de5a925
[XPU] Supports TP4 deployment on 4,5,6,7 ( #2794 )
* Support running on devices 4,5,6,7 specified via XPU_VISIBLE_DEVICES
* Update the multi-device notes in the XPU docs
2025-07-10 16:48:08 +08:00
chen
d33105baeb
[Feature] Online Chat API Support Return logprobs ( #2777 )
* online chat support logprobs
* check xpu
* check vl_gpu_model_runner and xpu_model_runner
* get_worker() check platform
2025-07-10 16:33:40 +08:00
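Since the endpoint follows the OpenAI chat schema, requesting logprobs from a served model presumably looks like the standard client call below (the base_url, api_key, and model name are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint
resp = client.chat.completions.create(
    model="my-model",                     # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,                        # ask the server to return per-token logprobs
    top_logprobs=5,                       # and the 5 most likely alternatives per token
)
print(resp.choices[0].logprobs.content[0].logprob)
```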
K11OntheBoat
24f934f1f9
[BugFix] Fix low prediction accuracy of deepseekv3 ( #2798 )
2025-07-10 16:16:44 +08:00
Sunny-bot1
1e2319cbef
Rename top_p_sampling to top_k_top_p_sampling ( #2791 )
2025-07-10 00:09:25 -07:00