FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Author	SHA1	Message	Date
lizhenyun01	aba4fc657f	[Feature] support flash_mask_attention backend (#5134 ) * [Feature] suppert flash_mask_attention backend * fix unittest * clean code	2025-11-28 10:12:16 +08:00
ddchenhao66	e70e2279ce	[PD Disaggregation][XPU] Add XPU support for PD disaggregation (#5113 ) * [XPU] xpu support PD disaggregation * [XPU] fix the issue of cache KV transfer process startup failure on non-zero XPU cards * [XPU] xpu support PD disaggregation in v1 scheduler --------- Co-authored-by: ddchenhao66 <dhaochen163.com>	2025-11-21 14:09:01 +08:00
周周周	6fa34102e8	[Others]get_block_shape_and_split_kv_block clean code (#5123 )	2025-11-20 16:40:04 +08:00
Haonan Luo	1b9f351d21	Support GPT-OSS-BF16 (#4240 ) * [Feature] AppendAtten support sinks & HEAD_DIM=64 * fix bug * fix bug * fix bug * fix bug * [Feature] support gpt-oss * fix bug * add mask * support-gpt-oss * support-gpt-oss * fix long seq * support wint8 * support wint8 * support wint8 * update test * change sliding windows init pos --------- Co-authored-by: ming1753 <ideaminghp@163.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>	2025-10-20 14:44:58 +08:00
Ryan	49cea8fb1c	[SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp (#3694 ) * rm inplace info && to(gpu) * update append_attention * unpin paddle version * add full_cuda_graph=False * add blank line --------- Co-authored-by: SigureMo <sigure.qaq@gmail.com>	2025-10-17 10:57:55 +08:00
Sunny-bot1	a751d977bc	[Optimization] Fuse get_max_len and get_kv_max_len (#4369 ) Some checks failed CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details * opt split_q_block * fuse max_lens and max kv_len	2025-10-13 20:35:00 +08:00
AIbin	a7392a0ff9	【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886 ) * support MLA chunk_size auto search & cuda_graph	2025-09-11 10:46:09 +08:00
Jundong Liu	3d0aaa5923	[Excutor] Experiment Feature-Support Prefill in cudagraph (#3459 ) * Support prefill in Cudagraph * Refactor GetBlockShapeAndSplitKVBlock Kernel V2 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5 * Solve problem about encoder_num_blocks_x_cpu * Add early-exit mechanism for attention kernel * fix test case about append-attention * Update testcode, Add annotations to related tensors * move get_input_length_list * solve test_code * Add annotations about early-exit for attention kernel * Add annotations about early-exit for attention kernel2 * solve comment * solve mtp --------- Co-authored-by: RAM <gstian5555@outlook.com>	2025-09-08 13:12:24 +08:00
Liumengyuan	e93d4cfcdd	Add with_output version AppendAttention (#3302 ) * get use_output from fd_config * add clear TODO description * add mask_offset para to align with develop * fix bug * fix use_output logic * fix sot bug	2025-08-28 17:10:18 +08:00
xiaoxiaohehe001	ad319a87cc	support fa3 rope3d (#3622 )	2025-08-27 11:31:29 +08:00
lzy	1e06b9fa6d	make append_attn supports mask_offset (#3138 ) * make append_attn supports mask_offset * add unittest	2025-08-14 03:40:55 -07:00
Yuan Xiaolan	7ce00e597c	support qk norm (#3145 )	2025-08-05 16:46:14 +08:00
RAM	d850660872	[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989 ) * reset decoder_block_shape_q buffer * refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch * update decode_max_tile_size * fix pre-commit * update block_multihead_attn_backend * update flas attn backend * update MLA Attention * update XPU Attention * update gcu,iluvatar model runner * Update MTP * fix MTP bug	2025-07-31 00:09:31 +08:00
lizhenyun01	e51f018577	support chunk_prefill in fa3	2025-07-23 12:19:20 +08:00
lifulll	2c6a9e887e	native top_p_sampling (#2901 )	2025-07-22 14:09:59 +08:00
Zero Rains	25698d56d1	polish code with new pre-commit rule (#2923 )	2025-07-19 23:19:27 +08:00
周周周	d306944f4f	remove cum_offsets from get_block_shape_and_split_kv_block (#2913 ) * remove padding_offsets from get_padding_offset.cu * remove padding_offsets from get_padding_offset.cu * remove padding_offsets from get_padding_offset.cu * remove cum_offsets from get_block_shape_and_split_kv_block * remove cum_offsets from get_block_shape_and_split_kv_block	2025-07-18 16:13:32 +08:00
周周周	ddb10ac509	[Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880 ) * remove padding_offsets from atten	2025-07-17 18:41:31 +08:00
freeliuzc	d49f8fb30a	[Feature][MTP] Support cacheKV transfer in per_chunk mode (#2890 ) * support chunk_prefill both normal and speculative_decoding(mtp) * optimize pd-disaggregation config * fix bug	2025-07-17 17:58:08 +08:00
周周周	aa76085d1f	[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details [Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)	2025-07-16 20:10:57 +08:00
Ryan	b0f525955c	[SOT] Remove breakgraph in post processing && fix datatype (#2780 )	2025-07-10 11:26:00 +08:00
lifulll	1f28bdf994	dcu adapter ernie45t (#2756 ) Co-authored-by: lifu <lifu@sugon.com> Co-authored-by: yongqiangma <xing.wo@163.com>	2025-07-09 18:56:27 +08:00
Yuanle Liu	240bdac2a4	[feat] support fa3 backend for pd disaggregated (#2695 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * delete use_fast_ffn	2025-07-03 22:33:27 +08:00
Jiang-Jia-Jun	92c2cfa2e7	Sync v2.0 version of code to github repo	2025-06-29 23:29:37 +00:00
jiangjiajun	684703fd72	[LLM] First commit the llm deployment code	2025-06-09 19:20:15 +08:00

25 Commits