Commit Graph

32 Commits

Author SHA1 Message Date
littledgg
f37d00e856 [Model] Provide clearer error for missing KV cache quantization scales (#3007) 2025-07-24 20:15:00 +08:00
xiaoxiaohehe001
2c0ff068e2 [Fix] fix mm ep empty run (#2999) 2025-07-24 14:15:55 +08:00
Nyakku Shigure
48e6a0ca26 [SOT] Mark dynamic dims by type annotations (#2771)
* [SOT] Mark dynamic dims by type annotations

* fix conflict of forward_meta

* mark more attn backend

* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS

* auto infer implicit 0 dim dynamic dim

* revert manual marked dims

* revert missing update

* auto infer can use unsafe code in warmup stage

* check -> type_match

* fix codestyle

* restore blank line

* empty commit

* add need_warmup nonlocal;

* add doc for resolver

* add missing type hints

* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
lifulll
2c6a9e887e native top_p_sampling (#2901) 2025-07-22 14:09:59 +08:00
zhink
0262ef7eb3 custom all reduce support cuda graph (#2938)
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag

* rename communication_op to communication
2025-07-21 22:52:03 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
xiaoxiaohehe001
a42fc3f40b [Feature] Support 45tVL EP FP8 Infer. (#2909)
* support_mm_ep_fp8

* support_mm_ep
2025-07-18 17:57:15 +08:00
gaoziyuan
6efad14b95 support vl ori_vacab_size (#2900) 2025-07-18 16:26:14 +08:00
Yuanle Liu
dbb9e2506b Fix rollout_model init (#2881) 2025-07-16 22:36:21 -07:00
Yuanle Liu
63d6e7ce06 fix and refine vl (#2866)
* refine vl config

* delete attn_sep

* fix vl accuracy
2025-07-16 05:59:28 -07:00
Yuanle Liu
dda4a9f848 rl update (#2861) 2025-07-16 00:33:10 -07:00
Yuanle Liu
61b3997b85 refactor rl get_name_mappings_to_training (#2847)
* refactor rl get_name_mappings_to_training

* fix tp>1

* change variable name(ffn1->up_gate_proj/ffn2->down_proj)

* change variable name(linear_weight->weight/linear_bias->bias)

* add rl names mapping for vl

* fix ernie 0.3B error

* fix develop code

* fix
2025-07-15 07:31:42 -07:00
YuanRisheng
4c7b8bc458 Simplify the Config code (#2770)
* simplify the code

* fix vl

* delete config

* fix

* perfect code

* fix ci

* fix xpu

* fix xpu

* fix server

* resolve conflict

* fix mtp

* resolve conflict

* fix xpu

* fix xpu

* fix vl

* fix log

* fix qwen moe

* fix qwen moe

* fix qwen moe
2025-07-14 19:50:05 +08:00
bukejiyu
bad53c6b6e [vl]remove duplicated load logic (#2744)
2025-07-13 07:36:26 +08:00
gaoziyuan
749b2e9c89 support qwen3moe name_mapping (#2820) 2025-07-12 12:05:54 +08:00
zhink
c08561c13a [Feature] support tensor-parallel-size>num_key_value_heads for qwen3 (#2799) 2025-07-11 15:09:43 +08:00
littledgg
59071268b6 [Executor] Move forward_meta.py to fastdeploy/model_executor (#2774)
* Use PEP 563 in attention.py and fix conflict

* merge commit

* Change what was left out last time
2025-07-10 20:36:51 +08:00
lizexu123
8c660a0dfb [BugFix] fix RMSNorm rms_norm_esp (#2797)
* fix rms

* add vl

* fix

* add vl

* fix

* fix
2025-07-10 20:02:24 +08:00
K11OntheBoat
24f934f1f9 [BugFix] Fix low prediction accuracy of deepseekv3 (#2798) 2025-07-10 16:16:44 +08:00
Ryan
b0f525955c [SOT] Remove breakgraph in post processing && fix datatype (#2780) 2025-07-10 11:26:00 +08:00
lifulll
1f28bdf994 dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 18:56:27 +08:00
Ryan
c4718fd693 Enable SOT D2St in Multimodal Model (#2735) 2025-07-09 12:26:18 +08:00
Ryan
f72c4de539 [SOT] Make custom_op dy&st unified (#2733)
* make_custom_op dy&st unified

* add instance judgement
2025-07-08 19:21:44 +08:00
lizexu123
525be243e7 [Bug fix] Fixed the garbled text issues in Qwen3-8B (#2737)
* fix qwen3.py

* update

* update lm_head tie_word_embeddings

* update tie_word_embeddings

* fix

* fix tie_word_embedding not in config.json

---------

Co-authored-by: lizexu <lizexu@baidu.com>
2025-07-07 23:15:27 -07:00
gaoziyuan
26d5d737dd [Feature] support qwen2 some func (#2740)
* add rl qwen model support

* fix

* fix
2025-07-08 12:03:04 +08:00
Ryan
fefbd65cf8 [SOT] Remove BreakGraph with paddle.maximum (#2731)
* rm if with clip

* clip -> maximum

* int64 -> int32
2025-07-08 11:44:25 +08:00
liddk1121
1b54a2831e Adapt for iluvatar gpu (#2684) 2025-07-07 16:53:14 +08:00
GoldPancake
e7fa57ebae Extract eh_proj Layer from ParallelLMHead for MTP to Avoid Weight Transposition Issue (#2707)
* fix mtp eh_proj layer

* fix mtp update_cfg function

* fix stringdoc

* simplify class name
2025-07-04 14:15:04 +08:00
Yuanle Liu
240bdac2a4 [feat] support fa3 backend for pd disaggregated (#2695)
* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* delete use_fast_ffn
2025-07-03 22:33:27 +08:00
Jiang-Jia-Jun
05c670e593 [Sync] Update to latest code (#2679)
* [Sync] Update to latest code

* Add new code files

* Add new code files

* update code

* Try to fix build.sh

* Try to fix build.sh

* Update code

* Update requirements.txt

* Update code

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00