K11OntheBoat
93bb68aa71
[Feature] Marlin MoE backend supports DeepseekV3 ( #2962 )
...
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-22 18:11:15 +08:00
GoldPancake
dc67c10a7e
[Feature][MTP] Support multi-step MTP ( #2952 )
2025-07-22 16:26:29 +08:00
luukunn
920e6b3f60
[Fix] fix empty prompt_token_ids, update the parser's triggering condit… ( #2891 )
2025-07-22 16:13:05 +08:00
Zero Rains
89a485b69f
[Feature] Support using prefix-caching + cudagraph for inference ( #2924 )
...
* fix the bug in cudagraph+prefix-caching, though some issues remain with profiling
Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397
* add the signal to make sure cache manager launched
* fix judge condition
* remove useless control
* update control stream
* update
* fix xpu
* change the do_profile flag
* update
* add new threads to init cache_manager
---------
Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
Nyakku Shigure
48e6a0ca26
[SOT] Mark dynamic dims by type annotations ( #2771 )
...
* [SOT] Mark dynamic dims by type annotations
* fix conflict of forward_meta
* mark more attn backend
* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS
* auto infer implicit 0 dim dynamic dim
* revert manual marked dims
* revert missing update
* auto infer can use unsafe code in warmup stage
* check -> type_match
* fix codestyle
* restore blank line
* empty commit
* add need_warmup nonlocal;
* add doc for resolver
* add missing type hints
* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
李泳桦
2a8a2c06de
[fix] non-streaming api now returns full output ids if return_token_ids is enabled ( #2951 )
2025-07-22 14:35:56 +08:00
lifulll
2c6a9e887e
native top_p_sampling ( #2901 )
2025-07-22 14:09:59 +08:00
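The commit above names native top_p sampling. As a rough illustration of the idea only (not FastDeploy's kernel; the function and parameter names here are hypothetical), nucleus sampling keeps the smallest prefix of the probability-sorted vocabulary whose cumulative mass reaches top_p, renormalizes, and samples from it:

```python
import numpy as np

def top_p_sample(probs, top_p, rng=None):
    """Sketch of nucleus (top-p) sampling: keep the smallest set of tokens
    whose cumulative probability reaches top_p, renormalize, then sample.
    Illustrative only; not FastDeploy's API."""
    rng = rng or np.random.default_rng(0)
    order = np.argsort(probs)[::-1]                   # tokens by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1   # first index reaching top_p mass
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()
    return int(rng.choice(kept, p=kept_probs))
```

With `top_p=0.5` on `[0.5, 0.3, 0.15, 0.05]` only the most likely token survives the cutoff, so the sample is deterministic; larger `top_p` widens the candidate set.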
gaoziyuan
0eedbdaee0
fix import error ( #2944 )
2025-07-22 14:06:01 +08:00
K11OntheBoat
8020927f50
[BugFix] Rename attention params of deepseekv3 ( #2939 )
...
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-22 14:01:30 +08:00
Jiang-Jia-Jun
56102e91e1
[Polish] Return error message of raw_request ( #2946 )
...
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-22 10:21:32 +08:00
zhink
0262ef7eb3
custom all reduce support cuda graph ( #2938 )
...
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication
2025-07-21 22:52:03 +08:00
周周周
ff4569f135
remove some code in ep.py ( #2947 )
2025-07-21 22:44:57 +08:00
李泳桦
8a619e9db5
[Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body ( #2940 )
...
* [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body
* [fix] return_token_ids not working in curl request
* [test] improve some test cases of return_token_ids and prompt_token_ids
* [fix] the server responds ok even if request.messages is an empty list
2025-07-21 19:31:14 +08:00
littledgg
2845bde964
[Executor] Avoid OOM when starting the service with Chunked Prefill + CudaGraph enabled ( #2936 )
...
* [Executor] Avoid OOM when starting the service with Chunked Prefill + CudaGraph enabled
* Fix: Apply black formatting
2025-07-21 16:25:51 +08:00
Yuanle Liu
2f74e93d7e
use dist.all_reduce(min) to sync num_blocks_local ( #2933 )
...
* pre-commit all files check
* reduce min num_blocks_local
* fix nranks=1
* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
lizexu123
67990e0572
[Feature] support min_p_sampling ( #2872 )
...
* Fastdeploy support min_p
* add test_min_p
* fix
* min_p_sampling
* update
* delete vl_gpu_model_runner.py
* fix
* Align usage of min_p with vLLM
* fix
* modified unit test
* fix test_min_sampling
* pre-commit all files
* fix
* fix
* fix
* fix xpu_model_runner.py
2025-07-20 23:17:59 -07:00
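One bullet above mentions aligning min_p usage with vLLM. In vLLM-style semantics, min_p drops any token whose probability falls below min_p times the most likely token's probability. A minimal sketch of that filtering step, with hypothetical names (not FastDeploy's implementation):

```python
import numpy as np

def min_p_filter(probs, min_p):
    """Sketch of min_p filtering (vLLM-style semantics): discard tokens whose
    probability is below min_p * max(probs), then renormalize the rest.
    Illustrative only; not FastDeploy's API."""
    threshold = min_p * probs.max()      # scale the cutoff by the top token's probability
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum()
```

Because the threshold scales with the peak probability, min_p prunes aggressively on confident distributions and permissively on flat ones.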
gaoziyuan
95a214ae43
support trainer_degree in name_mapping ( #2935 )
2025-07-20 23:12:55 -07:00
ltd0924
cc4cec0a74
Update engine_client.py ( #2931 )
2025-07-21 11:42:16 +08:00
周周周
8c5407d9e4
remove cum_offsets from ForwardMeta ( #2925 )
2025-07-19 23:57:27 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
ming1753
5328daa333
[Bug Fix] fix ep config bug ( #2920 )
2025-07-18 19:12:56 +08:00
xiaoxiaohehe001
a42fc3f40b
[Feature] Support 45tVL EP FP8 Infer. ( #2909 )
...
* support_mm_ep_fp8
* support_mm_ep
2025-07-18 17:57:15 +08:00
Jiang-Jia-Jun
fbe3547c95
[Feature] Support include_stop_str_in_output in chat/completion ( #2910 )
...
* [Feature] Support include_stop_str_in_output in chat/completion
* Add ci test for include_stop_str_in_output
* Update version of openai
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-18 16:59:18 +08:00
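The include_stop_str_in_output option above controls whether the matched stop string is kept in the returned text. A minimal sketch of that behavior, with a hypothetical helper name (not the actual FastDeploy function):

```python
def apply_stop(text, stop_strs, include_stop_str_in_output=False):
    """Sketch of stop-string handling: truncate at the earliest stop string,
    keeping the stop string itself only when requested. Hypothetical helper."""
    earliest = None
    for s in stop_strs:
        idx = text.find(s)
        if idx != -1 and (earliest is None or idx < earliest[0]):
            earliest = (idx, s)   # remember the leftmost match
    if earliest is None:
        return text               # no stop string found; return unchanged
    idx, s = earliest
    return text[: idx + len(s)] if include_stop_str_in_output else text[:idx]
```

With the flag off, generation stops before the stop string; with it on, the stop string is appended to the visible output.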
gaoziyuan
6efad14b95
support vl ori_vacab_size ( #2900 )
2025-07-18 16:26:14 +08:00
周周周
d306944f4f
remove cum_offsets from get_block_shape_and_split_kv_block ( #2913 )
...
* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove cum_offsets from get_block_shape_and_split_kv_block
* remove cum_offsets from get_block_shape_and_split_kv_block
2025-07-18 16:13:32 +08:00
RAM
cd52dc0f65
[Executor] Fix set capture sizes bug ( #2902 )
2025-07-18 15:12:19 +08:00
周周周
1339e56282
[XPU] Remove padding_offsets from get_padding_offset.cu ( #2911 )
2025-07-18 14:16:44 +08:00
YuanRisheng
0eb5dc18d3
[BugFix]Fix sample rejection ( #2908 )
...
* fix config
* fix rejection
2025-07-18 13:44:30 +08:00
sg263
e679567d59
[Trace]fix opentelemetry can not work in uvicorn ( #2906 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
* fix opentelemetry can not work in uvicorn
* move conf to env
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 23:16:45 +08:00
ltd0924
4b14dca1d6
[LLM] delete fixed slots ( #2893 )
2025-07-17 19:19:54 +08:00
周周周
ddb10ac509
[Inference, rename] remove padding_offsets from atten, use batch_id_per_token ( #2880 )
...
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
freeliuzc
d49f8fb30a
[Feature][MTP] Support cacheKV transfer in per_chunk mode ( #2890 )
...
* support chunk_prefill both normal and speculative_decoding(mtp)
* optimize pd-disaggregation config
* fix bug
2025-07-17 17:58:08 +08:00
ming1753
67180c1ff9
[Bug Fix] fix bug of prompt penalty ( #2888 )
2025-07-17 17:21:37 +08:00
Xintong Yu
273efba76f
[Fix] remove misleading variables ( #2841 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 16:49:14 +08:00
Jiang-Jia-Jun
31cab9f87b
Update test_openai.py
2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun
d3dfa1446c
Update test_openai.py
2025-07-17 16:07:07 +08:00
ltd0924
b630031414
[LLM] fix several bugs ( #2878 )
2025-07-17 14:21:05 +08:00
LokeZhou
f50c25178b
[MM_PROCESS] add _extract_labels ( #2879 )
2025-07-17 14:20:01 +08:00
Yuanle Liu
dbb9e2506b
Fix rollout_model init ( #2881 )
2025-07-16 22:36:21 -07:00
ming1753
1f15ca21e4
[Feature] support prompt repetition_penalty ( #2806 )
2025-07-17 12:05:52 +08:00
GoldPancake
42d4001400
[Features] Add speculative metrics ( #2857 )
2025-07-17 11:08:55 +08:00
sg263
52aca233e8
[Trace] fix annotation when add opentelemetry ( #2869 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 10:29:16 +08:00
ltd0924
9c25dcca0b
[LLM] Update Multinode Deployment ( #2830 )
...
* [LLM] fix multinode bugs
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] fix ci bugs
* Update fastdeploy/engine/args_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [LLM] update random port
* [LLM] update random port
* [LLM] fix ci bugs
* fix ci bugs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:42:54 +08:00
ltd0924
d245d1ca6c
[LLM] support send batch data and aggregate data ( #2860 )
...
* [LLM] support send batch data and aggregate data
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] update
2025-07-16 23:42:20 +08:00
Yuanle Liu
63d6e7ce06
fix and refine vl ( #2866 )
...
* refine vl config
* delete attn_sep
* fix vl accuracy
2025-07-16 05:59:28 -07:00
周周周
aa76085d1f
[Attention] remove cum_offsets from atten, and use cu_seqlens_q ( #2870 )
...
2025-07-16 20:10:57 +08:00
sg263
42b80182e0
[Trace] add opentelemetry ( #2852 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-16 15:33:25 +08:00
Yuanle Liu
dda4a9f848
rl update ( #2861 )
2025-07-16 00:33:10 -07:00
yangjianfengo1
a83a3eea5f
Read FLAGS_max_partition_size from an environment variable ( #2854 )
2025-07-16 14:14:21 +08:00
xiaoxiaohehe001
0d0340392f
[Fix] Fix mm ep weight init. ( #2855 )
...
* fix_45t_mm
* Update load_weight_utils.py
* Update load_weight_utils.py
2025-07-16 12:02:39 +08:00