K11OntheBoat
2e1680838f
[PD Disaggregation] Support PD deployment of DeepSeekv3. ( #5251 )
...
* Support deepseekv3 cache transfer for PD deploy
* clean some log info
---------
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-12-02 14:11:50 +08:00
lizhenyun01
aba4fc657f
[Feature] support flash_mask_attention backend ( #5134 )
...
* [Feature] suppert flash_mask_attention backend
* fix unittest
* clean code
2025-11-28 10:12:16 +08:00
ddchenhao66
e70e2279ce
[PD Disaggregation][XPU] Add XPU support for PD disaggregation ( #5113 )
...
* [XPU] xpu support PD disaggregation
* [XPU] fix the issue of cache KV transfer process startup failure on non-zero XPU cards
* [XPU] xpu support PD disaggregation in v1 scheduler
---------
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-11-21 14:09:01 +08:00
Yonghua Li
43097a512a
[BugFix] [PD Disaggregation] fix v1 scheduler prefill node profile run & ipc transfer protocol ( #5132 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* [fix] fix v1 scheduler profile run for append attention in prefill node
* [fix] skip send_signal if kv signal not inited for gpu and xpu
* [fix] extend fix to flash_attn & mla_attn
* [fix] fix v1 pd run in ipc transfer protocol
* [ci] add test for v1 pd profile run using ipc transfer protocol
* [style] fix code style check
* [style] fix code style again
* [fix] fix profile run
* [update] remove --num-gpu-blocks-override in example script
* [chore] rename forward_meta is_profiling to is_dummy_or_profile_run
2025-11-20 21:39:22 +08:00
周周周
385fe6dade
[Others] clean code ( #5133 )
2025-11-20 18:44:08 +08:00
周周周
6fa34102e8
[Others]get_block_shape_and_split_kv_block clean code ( #5123 )
2025-11-20 16:40:04 +08:00
ltd0924
5bf48de999
[KVCache] support unified cache backend ( #4903 )
...
* [Feature] support unified cache backend
* fix
* fix
* fix
* fix
* Update metax_model_runner.py
* fix
* update
* Update test_moba_attention_backend.py
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-11-12 14:54:52 +08:00
周周周
da6b4c10e5
[ATTENTION] make buffer alloc as a function ( #4945 )
2025-11-11 19:17:08 +08:00
lzy
3e9dda39ab
supports pd partn ( #4615 )
...
* supports pd partn
* fix codestype
2025-11-04 16:36:35 +08:00
freeliuzc
855a2a609a
fix attn_params ( #4787 )
2025-11-04 13:01:38 +08:00
Yuan Xiaolan
8690cf8569
fix Cfp8 for RL load ( #4144 )
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
2025-11-03 17:51:51 +08:00
freeliuzc
11398790d3
[Speculative Decoding][MTP]Support attn mask offset ( #4641 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* [MTP]Merge support attn (#4591 )
* support mask_offset in speculate decoding
* fix dummpy run output
* add unit test
* fix unit test import
* support attn_mask_offset in mtp mode
* add update_attn_mask op
* fix unit test && fix code-style
2025-11-03 10:08:01 +08:00
Lucas
0a0c74e717
[XPU] Support PaddleOCR-VL model for XPU ( #4529 )
...
* [XPU] support PaddleOCR-VL in XPU
* [XPU] fix PaddleOCR-VL pos_emb_type
2025-10-28 20:35:04 +08:00
freeliuzc
c63361fd1d
[Speculative Decoding][MTP]Support mtp in epdptp mode ( #4614 )
...
* support mtp many features
* support mtp reshard in rl mode
* fix function
* support mtp ep
* support mtp in hybird-dp-tp mode
* default open scheduler_v1 in mtp
2025-10-28 16:02:47 +08:00
Haonan Luo
1b9f351d21
Support GPT-OSS-BF16 ( #4240 )
...
* [Feature] AppendAtten support sinks & HEAD_DIM=64
* fix bug
* fix bug
* fix bug
* fix bug
* [Feature] support gpt-oss
* fix bug
* add mask
* support-gpt-oss
* support-gpt-oss
* fix long seq
* support wint8
* support wint8
* support wint8
* update test
* change sliding windows init pos
---------
Co-authored-by: ming1753 <ideaminghp@163.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
2025-10-20 14:44:58 +08:00
yzwu
4b661512ca
[Iluvatar GPU] Adapt VL model ( #4313 )
2025-10-17 16:13:38 +08:00
Sunny-bot1
930f7b781c
[Optimization] Put get_block_shape_and_split_kv_block in cuda graph for append attention backend ( #4443 )
...
* get block in cuda graph
* fix sot
2025-10-17 10:59:56 +08:00
Ryan
49cea8fb1c
[SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp ( #3694 )
...
* rm inplace info && to(gpu)
* update append_attention
* unpin paddle version
* add full_cuda_graph=False
* add blank line
---------
Co-authored-by: SigureMo <sigure.qaq@gmail.com >
2025-10-17 10:57:55 +08:00
YuanRisheng
0355235fb9
[FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig ( #4400 )
...
* delete some attr in parallel config
* delete comment
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com >
2025-10-16 20:00:37 +08:00
Lucas
bdc0207277
[XPU] fix VL multi-batch accuracy issue ( #4394 )
2025-10-15 17:27:43 +08:00
Sunny-bot1
a751d977bc
[Optimization] Fuse get_max_len and get_kv_max_len ( #4369 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* opt split_q_block
* fuse max_lens and max kv_len
2025-10-13 20:35:00 +08:00
YuanRisheng
a2ec2c4152
[FDConfig]Remove max_model_len in FDConfig ( #4350 )
...
* modify max_model_len
* fix unittest
* fix unittest
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com >
2025-10-11 14:04:17 +08:00
yinwei
20c7b741f4
[XPU] Support W4A8C8-TP4-300B Model ( #4068 )
...
* support w4a8
* delete ep block attn
* delete moe_topk_select
* update note
* update
* delte useless info
* update
* add some note
* fix some format
* update scale info
* add ans baseline
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-10-10 15:41:32 +08:00
Lucas
87179cb744
[XPU] support XPU VL model inference ( #4030 )
...
* [XPU] support XPU VL model inference
* fix image op import and device check
* rebase develop
* fix perf
2025-09-25 14:34:15 +08:00
yangjianfengo1
4325b737e7
【FIX】Change the name of sparse attn from moba to plas ( #4006 ) ( #4076 )
...
* 【FIX】Change the name of sparse attn from moba to plas (#4006 )
* 更新文档
* 【docs】 update readme (#4000 )
* 更新文档
* update readme
* update docs
* 【FIX】Change the name of sparse attn from moba to plas (#3845 )
* 更新文档
* 更新文档
* 更新文档
* 更新文档
* 修改moba为plas
* code style
* update ci
* code style
* update ci
* code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* fix max_num_seqs
* fix test load attn
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-23 10:26:40 +08:00
yzwu
504461b6b5
[Iluvatar GPU] Optimize attention performance and fix moe load ckpt error ( #3651 )
2025-09-22 21:13:59 +08:00
YuanRisheng
2e9e53ff7e
[FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config ( #4116 )
...
* remove max_num_batched_tokens in parallel config
* remove max_num_seqs
* update test case
* fix test
* fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-17 10:43:35 +08:00
co63oc
8466219ec8
fix typos ( #3840 )
...
* fix typos
* ci
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-12 11:04:38 +08:00
AIbin
a7392a0ff9
【Inference Optimize】DeepSeek-V3-model MLA Optimize ( #3886 )
...
* support MLA chunk_size auto search & cuda_graph
2025-09-11 10:46:09 +08:00
YuanRisheng
b3fac5bde1
[V1 Loader] Ernie kv cache quant support v1 loader ( #3899 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* support c8 for ernie
* add unittest
* support vl
* fix c8
2025-09-09 05:25:08 -07:00
Jiang-Jia-Jun
c60adf4281
Revert "【FIX】Change the name of sparse attn from moba to plas ( #3845 )" ( #4001 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
This reverts commit e31c8f7336 .
2025-09-09 11:08:23 +08:00
yangjianfengo1
e31c8f7336
【FIX】Change the name of sparse attn from moba to plas ( #3845 )
...
* 更新文档
* 更新文档
* 更新文档
* 更新文档
* 修改moba为plas
* code style
* update ci
* code style
* update ci
2025-09-09 10:56:50 +08:00
Jundong Liu
3d0aaa5923
[Excutor] Experiment Feature-Support Prefill in cudagraph ( #3459 )
...
* Support prefill in Cudagraph
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5
* Solve problem about encoder_num_blocks_x_cpu
* Add early-exit mechanism for attention kernel
* fix test case about append-attention
* Update testcode, Add annotations to related tensors
* move get_input_length_list
* solve test_code
* Add annotations about early-exit for attention kernel
* Add annotations about early-exit for attention kernel2
* solve comment
* solve mtp
---------
Co-authored-by: RAM <gstian5555@outlook.com >
2025-09-08 13:12:24 +08:00
lzy
af49b81ffd
supports dynamic Cfp8 ( #3767 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* supports dynamic Cfp8
* add unittest
2025-09-07 20:41:29 -07:00
yangjianfengo1
8e1b35a09b
【Fix bug] w4afp8 的nblock固定为256,并且fa3的append attn 增加mask参数 ( #3771 )
...
* fix w4afp8
* 增加集中式配置
* codestyle
* fix fa3 append attn
2025-09-02 19:17:01 +08:00
co63oc
d6369b4d51
fix typos ( #3684 )
2025-09-01 17:50:17 +08:00
lizhenyun01
bed09ae8f8
fix mask_offset in append_attn ( #3745 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix mask_offset in append_attn
* fix test
2025-08-31 15:03:16 +08:00
chen
7568b20098
check ( #3720 )
2025-08-30 16:04:20 +08:00
yangjianfengo1
3754a9906d
[Feature] block sparse attention ( #3668 )
...
* 支持稀疏attn
* fix bug
* code style
* fix moba attn get kv shape
* 修复a100编译
* codestyle
* code style
* code style
* code style
* fix conflict
* 增加单侧
* code style
* 增加eblite 加载时间
* fix bug
* for ci
* for ci
* for ci
* for ci
* 支持mlp block size 128
* 增加小算子单测
* fix 单测 mlp
* 将环境变量加入到config里面
* fix rollout config
* 修复显存
* add test server
* add test server
* fix mlp 最后一层使用full attn
2025-08-29 19:46:30 +08:00
lifulll
72094d4d82
enable dcu ci ( #3402 )
2025-08-29 10:23:08 +08:00
Yuanle Liu
4957908275
add input_processor plugin ( #3657 )
...
* add input_processor plugin
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
2025-08-28 22:53:57 +08:00
Liumengyuan
e93d4cfcdd
Add with_output version AppendAttention ( #3302 )
...
* get use_output from fd_config
* add clear TODO description
* add mask_offset para to align with develop
* fix bug
* fix use_output logic
* fix sot bug
2025-08-28 17:10:18 +08:00
Jiang-Jia-Jun
c694fa2879
Revert "[Feature] block sparse attention ( #3209 )" ( #3647 )
...
This reverts commit 646a0c2fd8 .
2025-08-27 17:35:04 +08:00
xiaoxiaohehe001
ad319a87cc
support fa3 rope3d ( #3622 )
2025-08-27 11:31:29 +08:00
yangjianfengo1
646a0c2fd8
[Feature] block sparse attention ( #3209 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* 支持稀疏attn
* fix bug
* code style
* fix moba attn get kv shape
* 修复a100编译
* codestyle
* code style
* code style
* code style
* fix conflict
* 增加单侧
* code style
* 增加eblite 加载时间
* fix bug
* for ci
* for ci
* for ci
* for ci
* 支持mlp block size 128
* 增加小算子单测
* fix 单测 mlp
* 将环境变量加入到config里面
* fix rollout config
2025-08-26 07:16:04 -07:00
Ryan
bcdfc1d6b9
Add custom op declaration for all_reduce ( #3473 )
...
* add custom op declaration
* roll back try except
2025-08-20 20:29:58 +08:00
AIbin
beec24fd89
【Inference Optimize】DeepSeek-v3 model inference performance optimization ( #3455 )
...
* DSK_OPT_01
* update FA3
2025-08-19 10:42:42 +08:00
lzy
1e06b9fa6d
make append_attn supports mask_offset ( #3138 )
...
* make append_attn supports mask_offset
* add unittest
2025-08-14 03:40:55 -07:00
Kane2011
b4fef2cf29
[MetaxGPU] Support FastDeploy on metax gpu ( #3241 )
...
* [MetaxGPU] Support FastDeploy on metax gpu
* Update metax_worker.py
1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;
* Update __init__.py
1. remove metax's key work comment
* Update __init__.py
1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import
---------
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-08-13 11:11:54 +08:00
yzwu
fbdd6b0663
[Iluvatar GPU] Optimze attention and moe performance ( #3234 )
2025-08-08 10:51:24 +08:00