a7392a0ff9  AIbin  2025-09-11 10:46:09 +08:00
【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886)
* support MLA chunk_size auto search & cuda_graph
3d0aaa5923  Jundong Liu  2025-09-08 13:12:24 +08:00
[Executor] Experimental feature: support Prefill in cudagraph (#3459)
* Support prefill in Cudagraph
* Refactor GetBlockShapeAndSplitKVBlock Kernel (V2 through V2.5)
* Solve problem about encoder_num_blocks_x_cpu
* Add early-exit mechanism for attention kernel
* Fix test case about append-attention
* Update test code, add annotations to related tensors
* Move get_input_length_list
* Fix test_code
* Add annotations about early-exit for attention kernel
* Address review comments
* Fix MTP
Co-authored-by: RAM <gstian5555@outlook.com>
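The entry above extends CUDA Graph execution to the prefill phase. For background, the general capture-and-replay pattern looks roughly like the sketch below, written against PyTorch's public CUDA Graph API on a stand-in module rather than this repository's executor; the module, shapes, and variable names are illustrative only.

    import torch

    # Stand-in for a prefill computation; module and shapes are illustrative only.
    model = torch.nn.Linear(64, 64).cuda().eval()
    static_input = torch.zeros(8, 64, device="cuda")

    # Warm up on a side stream before capture, as CUDA Graphs require.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side), torch.no_grad():
        for _ in range(3):
            model(static_input)
    torch.cuda.current_stream().wait_stream(side)

    # Capture the computation once into a graph...
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph), torch.no_grad():
        static_output = model(static_input)

    # ...then replay it: copy fresh data into the static input buffer and
    # relaunch the whole captured kernel sequence with a single call.
    static_input.copy_(torch.randn(8, 64, device="cuda"))
    graph.replay()

Replaying a captured graph avoids per-kernel launch overhead, which is why covering prefill (and not only decode) with the captured region is attractive.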
						 
				 
			
				
					
						
							
							
e93d4cfcdd  Liumengyuan  2025-08-28 17:10:18 +08:00
Add with_output version AppendAttention (#3302)
* get use_output from fd_config
* add clear TODO description
* add mask_offset param to align with develop
* fix bug
* fix use_output logic
* fix sot bug
ad319a87cc  xiaoxiaohehe001  2025-08-27 11:31:29 +08:00
support fa3 rope3d (#3622)
1e06b9fa6d  lzy  2025-08-14 03:40:55 -07:00
make append_attn support mask_offset (#3138)
* make append_attn support mask_offset
* add unittest
7ce00e597c  Yuan Xiaolan  2025-08-05 16:46:14 +08:00
support qk norm (#3145)
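The commit title only says "support qk norm". A common form of query/key normalization applies an RMS norm to each query and key head vector before the attention scores are computed; the NumPy sketch below shows that idea under this assumption, with function and argument names chosen for illustration rather than taken from this repository's API.

    import numpy as np

    def rms_norm(x, weight, eps=1e-6):
        # Normalize over the head dimension, then apply a learned per-channel scale.
        rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
        return (x / rms) * weight

    def qk_norm_scores(q, k, q_weight, k_weight):
        # q, k: [batch, heads, seq, head_dim]; q_weight, k_weight: [head_dim]
        q = rms_norm(q, q_weight)
        k = rms_norm(k, k_weight)
        scale = 1.0 / np.sqrt(q.shape[-1])
        return np.einsum("bhqd,bhkd->bhqk", q, k) * scale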
						 
				 
			
				
					
						
							
							
d850660872  RAM  2025-07-31 00:09:31 +08:00
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)
* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu, iluvatar model runner
* Update MTP
* fix MTP bug
e51f018577  lizhenyun01  2025-07-23 12:19:20 +08:00
support chunk_prefill in fa3
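Chunked prefill generally means processing a long prompt in fixed-size pieces, appending each piece's keys and values to the cache so that later chunks can attend to everything before them. A rough single-head NumPy sketch of that flow follows; it is not the FA3 kernel itself, and all names are illustrative.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def chunked_prefill(q, k, v, chunk_size):
        # q, k, v: [seq_len, head_dim] for a single head.
        k_cache, v_cache, outputs = [], [], []
        for start in range(0, q.shape[0], chunk_size):
            end = min(start + chunk_size, q.shape[0])
            # Append this chunk's keys/values to the cache first.
            k_cache.append(k[start:end])
            v_cache.append(v[start:end])
            keys = np.concatenate(k_cache)
            values = np.concatenate(v_cache)
            # Attend over everything cached so far, masking future positions
            # inside the current chunk (causal attention).
            scores = q[start:end] @ keys.T / np.sqrt(q.shape[-1])
            rows = np.arange(start, end)[:, None]
            cols = np.arange(keys.shape[0])[None, :]
            scores = np.where(cols <= rows, scores, -np.inf)
            outputs.append(softmax(scores) @ values)
        return np.concatenate(outputs)

Processing the prompt in chunks bounds the peak activation size per step, which is the usual motivation for chunked prefill in serving systems.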
						 
				 
			
				
					
						
							
							
2c6a9e887e  lifulll  2025-07-22 14:09:59 +08:00
native top_p_sampling (#2901)
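For reference, top-p (nucleus) sampling keeps the smallest set of highest-probability tokens whose cumulative probability reaches p, renormalizes over that set, and samples from it. A minimal NumPy sketch of the algorithm, independent of this repository's native kernel:

    import numpy as np

    def top_p_sample(logits, top_p=0.9, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        # Softmax over the vocabulary.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Sort descending and keep the shortest prefix whose mass reaches top_p.
        order = np.argsort(-probs)
        cumulative = np.cumsum(probs[order])
        cutoff = int(np.searchsorted(cumulative, top_p)) + 1
        kept = order[:cutoff]
        # Renormalize over the kept tokens and sample one token id.
        return int(rng.choice(kept, p=probs[kept] / probs[kept].sum()))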
						 
				 
			
				
					
						
							
							
25698d56d1  Zero Rains  2025-07-19 23:19:27 +08:00
polish code with new pre-commit rule (#2923)
d306944f4f  周周周  2025-07-18 16:13:32 +08:00
remove cum_offsets from get_block_shape_and_split_kv_block (#2913)
* remove padding_offsets from get_padding_offset.cu
* remove cum_offsets from get_block_shape_and_split_kv_block
ddb10ac509  周周周  2025-07-17 18:41:31 +08:00
[Inference, rename] remove padding_offsets from atten, use batch_id_per_token (#2880)
* remove padding_offsets from atten
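In a packed (unpadded) token layout, per-token padding offsets can be replaced by a simple mapping from each token to the request it belongs to. The small NumPy illustration below shows what a batch_id_per_token style mapping contains; the array name follows the commit title, and the values are made up.

    import numpy as np

    # Number of valid tokens for each request in the batch.
    seq_lens = np.array([3, 1, 4])

    # With tokens packed back-to-back (no padding), each token only needs to
    # know which request produced it:
    batch_id_per_token = np.repeat(np.arange(len(seq_lens)), seq_lens)
    print(batch_id_per_token)   # [0 0 0 1 2 2 2 2]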
						 
				 
			
				
					
						
							
							
d49f8fb30a  freeliuzc  2025-07-17 17:58:08 +08:00
[Feature][MTP] Support cacheKV transfer in per_chunk mode (#2890)
* support chunk_prefill for both normal and speculative_decoding (mtp)
* optimize pd-disaggregation config
* fix bug
aa76085d1f  周周周  2025-07-16 20:10:57 +08:00
[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
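cu_seqlens_q refers to the cumulative-sequence-lengths convention commonly used by varlen attention kernels: an exclusive prefix sum over per-request query lengths, so each request's tokens are addressed by a [start, end) pair instead of per-token offsets. A short NumPy illustration with made-up values:

    import numpy as np

    seq_lens_q = np.array([3, 1, 4])               # query tokens per request

    # cu_seqlens_q[i] is where request i starts in the packed token buffer,
    # cu_seqlens_q[i + 1] is where it ends.
    cu_seqlens_q = np.concatenate(([0], np.cumsum(seq_lens_q)))
    print(cu_seqlens_q)                            # [0 3 4 8]

    i = 2
    start, end = cu_seqlens_q[i], cu_seqlens_q[i + 1]
    print(start, end)                              # request 2 owns packed tokens 4..7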
						 
				 
			
				
					
						
							
							
b0f525955c  Ryan  2025-07-10 11:26:00 +08:00
[SOT] Remove breakgraph in post processing && fix datatype (#2780)
1f28bdf994  lifulll  2025-07-09 18:56:27 +08:00
dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
240bdac2a4  Yuanle Liu  2025-07-03 22:33:27 +08:00
[feat] support fa3 backend for pd disaggregated (#2695)
* support fa3 backend run in pd disaggregated
* delete use_fast_ffn
92c2cfa2e7  Jiang-Jia-Jun  2025-06-29 23:29:37 +00:00
Sync v2.0 version of code to github repo
684703fd72  jiangjiajun  2025-06-09 19:20:15 +08:00
[LLM] First commit the llm deployment code