FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Author	SHA1	Message	Date
K11OntheBoat	2e1680838f	[PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251 ) * Support deepseekv3 cache transfer for PD deploy * clean some log info --------- Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>	2025-12-02 14:11:50 +08:00
lizhenyun01	aba4fc657f	[Feature] support flash_mask_attention backend (#5134 ) * [Feature] suppert flash_mask_attention backend * fix unittest * clean code	2025-11-28 10:12:16 +08:00
ddchenhao66	e70e2279ce	[PD Disaggregation][XPU] Add XPU support for PD disaggregation (#5113 ) * [XPU] xpu support PD disaggregation * [XPU] fix the issue of cache KV transfer process startup failure on non-zero XPU cards * [XPU] xpu support PD disaggregation in v1 scheduler --------- Co-authored-by: ddchenhao66 <dhaochen163.com>	2025-11-21 14:09:01 +08:00
Yonghua Li	43097a512a	[BugFix] [PD Disaggregation] fix v1 scheduler prefill node profile run & ipc transfer protocol (#5132 ) * [fix] fix v1 scheduler profile run for append attention in prefill node * [fix] skip send_signal if kv signal not inited for gpu and xpu * [fix] extend fix to flash_attn & mla_attn * [fix] fix v1 pd run in ipc transfer protocol * [ci] add test for v1 pd profile run using ipc transfer protocol * [style] fix code style check * [style] fix code style again * [fix] fix profile run * [update] remove --num-gpu-blocks-override in example script * [chore] rename forward_meta is_profiling to is_dummy_or_profile_run	2025-11-20 21:39:22 +08:00
周周周	385fe6dade	[Others] clean code (#5133 )	2025-11-20 18:44:08 +08:00
周周周	6fa34102e8	[Others]get_block_shape_and_split_kv_block clean code (#5123 )	2025-11-20 16:40:04 +08:00
ltd0924	5bf48de999	[KVCache] support unified cache backend (#4903 ) * [Feature] support unified cache backend * fix * fix * fix * fix * Update metax_model_runner.py * fix * update * Update test_moba_attention_backend.py --------- Co-authored-by: ltd0924 <luotingdan@baidu.com>	2025-11-12 14:54:52 +08:00
周周周	da6b4c10e5	[ATTENTION] make buffer alloc as a function (#4945 )	2025-11-11 19:17:08 +08:00
lzy	3e9dda39ab	supports pd partn (#4615 ) * supports pd partn * fix codestype	2025-11-04 16:36:35 +08:00
freeliuzc	855a2a609a	fix attn_params (#4787 )	2025-11-04 13:01:38 +08:00
Yuan Xiaolan	8690cf8569	fix Cfp8 for RL load (#4144 )	2025-11-03 17:51:51 +08:00
freeliuzc	11398790d3	[Speculative Decoding][MTP]Support attn mask offset (#4641 ) * [MTP]Merge support attn (#4591) * support mask_offset in speculate decoding * fix dummpy run output * add unit test * fix unit test import * support attn_mask_offset in mtp mode * add update_attn_mask op * fix unit test && fix code-style	2025-11-03 10:08:01 +08:00
Lucas	0a0c74e717	[XPU] Support PaddleOCR-VL model for XPU (#4529 ) * [XPU] support PaddleOCR-VL in XPU * [XPU] fix PaddleOCR-VL pos_emb_type	2025-10-28 20:35:04 +08:00
freeliuzc	c63361fd1d	[Speculative Decoding][MTP]Support mtp in epdptp mode (#4614 ) * support mtp many features * support mtp reshard in rl mode * fix function * support mtp ep * support mtp in hybird-dp-tp mode * default open scheduler_v1 in mtp	2025-10-28 16:02:47 +08:00
Haonan Luo	1b9f351d21	Support GPT-OSS-BF16 (#4240 ) * [Feature] AppendAtten support sinks & HEAD_DIM=64 * fix bug * fix bug * fix bug * fix bug * [Feature] support gpt-oss * fix bug * add mask * support-gpt-oss * support-gpt-oss * fix long seq * support wint8 * support wint8 * support wint8 * update test * change sliding windows init pos --------- Co-authored-by: ming1753 <ideaminghp@163.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>	2025-10-20 14:44:58 +08:00
yzwu	4b661512ca	[Iluvatar GPU] Adapt VL model (#4313 )	2025-10-17 16:13:38 +08:00
Sunny-bot1	930f7b781c	[Optimization] Put get_block_shape_and_split_kv_block in cuda graph for append attention backend (#4443 ) * get block in cuda graph * fix sot	2025-10-17 10:59:56 +08:00
Ryan	49cea8fb1c	[SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp (#3694 ) * rm inplace info && to(gpu) * update append_attention * unpin paddle version * add full_cuda_graph=False * add blank line --------- Co-authored-by: SigureMo <sigure.qaq@gmail.com>	2025-10-17 10:57:55 +08:00
YuanRisheng	0355235fb9	[FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig (#4400 ) * delete some attr in parallel config * delete comment --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>	2025-10-16 20:00:37 +08:00
Lucas	bdc0207277	[XPU] fix VL multi-batch accuracy issue (#4394 )	2025-10-15 17:27:43 +08:00
Sunny-bot1	a751d977bc	[Optimization] Fuse get_max_len and get_kv_max_len (#4369 ) * opt split_q_block * fuse max_lens and max kv_len	2025-10-13 20:35:00 +08:00
YuanRisheng	a2ec2c4152	[FDConfig]Remove max_model_len in FDConfig (#4350 ) * modify max_model_len * fix unittest * fix unittest --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>	2025-10-11 14:04:17 +08:00
yinwei	20c7b741f4	[XPU] Support W4A8C8-TP4-300B Model (#4068 ) * support w4a8 * delete ep block attn * delete moe_topk_select * update note * update * delte useless info * update * add some note * fix some format * update scale info * add ans baseline --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-10-10 15:41:32 +08:00
Lucas	87179cb744	[XPU] support XPU VL model inference (#4030 ) * [XPU] support XPU VL model inference * fix image op import and device check * rebase develop * fix perf	2025-09-25 14:34:15 +08:00
yangjianfengo1	4325b737e7	【FIX】Change the name of sparse attn from moba to plas (#4006 ) (#4076 ) * 【FIX】Change the name of sparse attn from moba to plas (#4006) * update docs * 【docs】 update readme (#4000) * update docs * update readme * update docs * 【FIX】Change the name of sparse attn from moba to plas (#3845) * update docs * update docs * update docs * update docs * rename moba to plas * code style * update ci * code style * update ci * code style --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * fix max_num_seqs * fix test load attn --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-09-23 10:26:40 +08:00
yzwu	504461b6b5	[Iluvatar GPU] Optimize attention performance and fix moe load ckpt error (#3651 )	2025-09-22 21:13:59 +08:00
YuanRisheng	2e9e53ff7e	[FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config (#4116 ) * remove max_num_batched_tokens in parallel config * remove max_num_seqs * update test case * fix test * fix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-09-17 10:43:35 +08:00
co63oc	8466219ec8	fix typos (#3840 ) * fix typos * ci --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-09-12 11:04:38 +08:00
AIbin	a7392a0ff9	【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886 ) * support MLA chunk_size auto search & cuda_graph	2025-09-11 10:46:09 +08:00
YuanRisheng	b3fac5bde1	[V1 Loader] Ernie kv cache quant support v1 loader (#3899 ) * support c8 for ernie * add unittest * support vl * fix c8	2025-09-09 05:25:08 -07:00
Jiang-Jia-Jun	c60adf4281	Revert "【FIX】Change the name of sparse attn from moba to plas (#3845 )" (#4001 ) This reverts commit `e31c8f7336`.	2025-09-09 11:08:23 +08:00
yangjianfengo1	e31c8f7336	【FIX】Change the name of sparse attn from moba to plas (#3845 ) * update docs * update docs * update docs * update docs * rename moba to plas * code style * update ci * code style * update ci	2025-09-09 10:56:50 +08:00
Jundong Liu	3d0aaa5923	[Excutor] Experiment Feature-Support Prefill in cudagraph (#3459 ) * Support prefill in Cudagraph * Refactor GetBlockShapeAndSplitKVBlock Kernel V2 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5 * Solve problem about encoder_num_blocks_x_cpu * Add early-exit mechanism for attention kernel * fix test case about append-attention * Update testcode, Add annotations to related tensors * move get_input_length_list * solve test_code * Add annotations about early-exit for attention kernel * Add annotations about early-exit for attention kernel2 * solve comment * solve mtp --------- Co-authored-by: RAM <gstian5555@outlook.com>	2025-09-08 13:12:24 +08:00
lzy	af49b81ffd	supports dynamic Cfp8 (#3767 ) * supports dynamic Cfp8 * add unittest	2025-09-07 20:41:29 -07:00
yangjianfengo1	8e1b35a09b	【Fix bug】 fix the w4afp8 nblock at 256 and add a mask parameter to fa3 append attn (#3771 ) * fix w4afp8 * add centralized configuration * codestyle * fix fa3 append attn	2025-09-02 19:17:01 +08:00
co63oc	d6369b4d51	fix typos (#3684 )	2025-09-01 17:50:17 +08:00
lizhenyun01	bed09ae8f8	fix mask_offset in append_attn (#3745 ) * fix mask_offset in append_attn * fix test	2025-08-31 15:03:16 +08:00
chen	7568b20098	check (#3720 )	2025-08-30 16:04:20 +08:00
yangjianfengo1	3754a9906d	[Feature] block sparse attention (#3668 ) * support sparse attn * fix bug * code style * fix moba attn get kv shape * fix A100 compilation * codestyle * code style * code style * code style * fix conflict * add unit test * code style * increase eblite load time * fix bug * for ci * for ci * for ci * for ci * support mlp block size 128 * add unit tests for small operators * fix mlp unit test * move environment variables into config * fix rollout config * fix GPU memory * add test server * add test server * fix mlp: use full attn for the last layer	2025-08-29 19:46:30 +08:00
lifulll	72094d4d82	enable dcu ci (#3402 )	2025-08-29 10:23:08 +08:00
Yuanle Liu	4957908275	add input_processor plugin (#3657 ) * add input_processor plugin * update * update * update * update * update * update * update * update * update * update * update	2025-08-28 22:53:57 +08:00
Liumengyuan	e93d4cfcdd	Add with_output version AppendAttention (#3302 ) * get use_output from fd_config * add clear TODO description * add mask_offset para to align with develop * fix bug * fix use_output logic * fix sot bug	2025-08-28 17:10:18 +08:00
Jiang-Jia-Jun	c694fa2879	Revert "[Feature] block sparse attention (#3209 )" (#3647 ) This reverts commit `646a0c2fd8`.	2025-08-27 17:35:04 +08:00
xiaoxiaohehe001	ad319a87cc	support fa3 rope3d (#3622 )	2025-08-27 11:31:29 +08:00
yangjianfengo1	646a0c2fd8	[Feature] block sparse attention (#3209 ) * support sparse attn * fix bug * code style * fix moba attn get kv shape * fix A100 compilation * codestyle * code style * code style * code style * fix conflict * add unit test * code style * increase eblite load time * fix bug * for ci * for ci * for ci * for ci * support mlp block size 128 * add unit tests for small operators * fix mlp unit test * move environment variables into config * fix rollout config	2025-08-26 07:16:04 -07:00
Ryan	bcdfc1d6b9	Add custom op declaration for `all_reduce` (#3473 ) * add custom op declaration * roll back try except	2025-08-20 20:29:58 +08:00
AIbin	beec24fd89	【Inference Optimize】DeepSeek-v3 model inference performance optimization (#3455 ) * DSK_OPT_01 * update FA3	2025-08-19 10:42:42 +08:00
lzy	1e06b9fa6d	make append_attn supports mask_offset (#3138 ) * make append_attn supports mask_offset * add unittest	2025-08-14 03:40:55 -07:00
Kane2011	b4fef2cf29	[MetaxGPU] Support FastDeploy on metax gpu (#3241 ) * [MetaxGPU] Support FastDeploy on metax gpu * Update metax_worker.py 1. change worker log; 2. remove custom allreduce, adapt it later; 3. remove cuda graph; * Update __init__.py 1. remove metax's key work comment * Update __init__.py 1. remove metax's key word comment; 2. add fused_moe_kernel_paddle import --------- Co-authored-by: yongqiangma <xing.wo@163.com>	2025-08-13 11:11:54 +08:00
yzwu	fbdd6b0663	[Iluvatar GPU] Optimze attention and moe performance (#3234 )	2025-08-08 10:51:24 +08:00

83 Commits