FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-05 16:48:03 +08:00

Author	SHA1	Message	Date
yangjianfengo1	8e1b35a09b	[Fix bug] Fix w4afp8 nblock at 256 and add a mask parameter to fa3 append attn (#3771) * fix w4afp8 * add centralized config * codestyle * fix fa3 append attn	2025-09-02 19:17:01 +08:00
co63oc	d6369b4d51	fix typos (#3684)	2025-09-01 17:50:17 +08:00
Liumengyuan	e93d4cfcdd	Add with_output version AppendAttention (#3302) * get use_output from fd_config * add clear TODO description * add mask_offset para to align with develop * fix bug * fix use_output logic * fix sot bug	2025-08-28 17:10:18 +08:00
xiaoxiaohehe001	ad319a87cc	support fa3 rope3d (#3622)	2025-08-27 11:31:29 +08:00
yangjianfengo1	3a15e0c53e	[Fix Bug] Fix bug in fa3 centralized-deployment support (#3235) * fix fa3 centralized-deployment bug * add qknorm parameter	2025-08-06 16:24:27 +08:00
yangjianfengo1	64d7a3194d	Support fa3 in centralized deployment (#3112)	2025-08-01 18:03:36 +08:00
RAM	d850660872	[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989) * reset decoder_block_shape_q buffer * refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch * update decode_max_tile_size * fix pre-commit * update block_multihead_attn_backend * update flash attn backend * update MLA Attention * update XPU Attention * update gcu, iluvatar model runner * Update MTP * fix MTP bug	2025-07-31 00:09:31 +08:00
YuanRisheng	6ccc10ad47	Unify server-side and model-side Config (Part1) (#3018) * move cache config * fix mtp	2025-07-28 10:51:52 +08:00
chen	332154f504	[feature] Support FA2 (#3009)	2025-07-25 14:09:00 +08:00
lizhenyun01	29c3292f02	support c4 attn && fix cache	2025-07-24 12:00:52 +08:00
chen	172e69fe17	FA3 fix bug (#2987)	2025-07-23 19:07:43 +08:00
lizhenyun01	e51f018577	support chunk_prefill in fa3	2025-07-23 12:19:20 +08:00
Nyakku Shigure	48e6a0ca26	[SOT] Mark dynamic dims by type annotations (#2771) * [SOT] Mark dynamic dims by type annotations * fix conflict of forward_meta * mark more attn backend * fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS * auto infer implicit 0 dim dynamic dim * revert manual marked dims * revert missing update * auto infer can use unsafe code in warmup stage * check -> type_match * fix codestyle * restore blank line * empty commit * add need_warmup nonlocal; * add doc for resolver * add missing type hints * unquote "ForwardMeta"	2025-07-22 00:23:52 -07:00
Zero Rains	25698d56d1	polish code with new pre-commit rule (#2923)	2025-07-19 23:19:27 +08:00
周周周	d306944f4f	remove cum_offsets from get_block_shape_and_split_kv_block (#2913) * remove padding_offsets from get_padding_offset.cu * remove padding_offsets from get_padding_offset.cu * remove padding_offsets from get_padding_offset.cu * remove cum_offsets from get_block_shape_and_split_kv_block * remove cum_offsets from get_block_shape_and_split_kv_block	2025-07-18 16:13:32 +08:00
freeliuzc	d49f8fb30a	[Feature][MTP] Support cacheKV transfer in per_chunk mode (#2890) * support chunk_prefill both normal and speculative_decoding(mtp) * optimize pd-disaggregation config * fix bug	2025-07-17 17:58:08 +08:00
YuanRisheng	4c7b8bc458	Simplify the Config code (#2770) * simplify the code * fix vl * delete config * fix * perfect code * fix ci * fix xpu * fix xpu * fix server * resolve conflict * fix mtp * resolve conflict * fix xpu * fix xpu * fix vl * fix log * fix qwen moe * fix qwen moe * fix qwen moe	2025-07-14 19:50:05 +08:00
littledgg	59071268b6	[Executor] Move forward_meta.py to fastdeploy/model_executor (#2774) * Use PEP 563 in attention.py and fix conflict * merge commit * Change what was left out last time	2025-07-10 20:36:51 +08:00
RichardWooSJTU	fee544e808	fix ep prefill (#2762)	2025-07-09 14:03:05 +08:00
RichardWooSJTU	6610aa29d0	Revert "[Bug fix] fix attention rank init (#2743)" (#2761) This reverts commit `e8bbe7244b`.	2025-07-09 10:38:12 +08:00
RichardWooSJTU	e8bbe7244b	[Bug fix] fix attention rank init (#2743) * fix attention rank init * fix attention rank init	2025-07-08 17:19:49 +08:00
gaoziyuan	26d5d737dd	[Feature] support some qwen2 functions (#2740) * add rl qwen model support * fix * fix	2025-07-08 12:03:04 +08:00
Yuanle Liu	240bdac2a4	[feat] support fa3 backend for pd disaggregated (#2695) * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * delete use_fast_ffn	2025-07-03 22:33:27 +08:00

23 Commits