qw86972190
6048ea37bd
[XPU]add enable_logprob ( #5279 )
...
* [XPU]Update document
* [XPU]Update documentation
* [XPU]add enable_logprob
* Fix code style issues
* “doc”
* “docs”
* “doc”
* Fix code style via pre-commit
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1498354.gajl.baidu.com >
2025-12-02 15:32:28 +08:00
K11OntheBoat
2e1680838f
[PD Disaggregation] Support PD deployment of DeepSeekv3. ( #5251 )
...
* Support deepseekv3 cache transfer for PD deploy
* clean some log info
---------
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-12-02 14:11:50 +08:00
chen
aa35ce449d
[Optimization] EP empty_input_forward Remove Communication ( #5254 )
2025-12-01 21:10:40 +08:00
cmcamdy
3149aed750
fix_gather_next_token ( #5311 )
2025-12-01 18:00:30 +08:00
K11OntheBoat
7bafcf1df3
[OP]Remove extra H2D in DeepGemm ( #5262 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-11-28 14:23:44 +08:00
周周周
95243f012c
[Others] add PADDLE_ENFORCE ( #5288 )
2025-11-28 14:23:35 +08:00
lizhenyun01
aba4fc657f
[Feature] support flash_mask_attention backend ( #5134 )
...
* [Feature] support flash_mask_attention backend
* fix unittest
* clean code
2025-11-28 10:12:16 +08:00
cmcamdy
5a67a6d960
[XPU] support kernel for mtp(base) ( #4748 )
...
* [XPU] support kernel for mtp(base)
* [XPU] support kernel for mtp(base)
* format
* format
* format
* fix gather next token
* fix step && add test
* fix
* mv pre/post process
* add adjust batch / gather next token for mtp
* fix code style
* fix mtp kernel name
* fix mtp kernel test
* mv xpu pre/post process
* mv xpu pre/post process
2025-11-27 15:05:44 +08:00
GoldPancake
cfc5b0ccf9
[BugFix] fix mtp logprob bugs in chunk prefill ( #5244 )
...
* fix mtp logprob bugs in chunk prefill
* fix
* fix
2025-11-27 11:31:29 +08:00
freeliuzc
ba915e03e1
[BugFix]Fix attention mask bug in D-Node of PD-split mode ( #5245 )
2025-11-26 17:56:28 +08:00
xiaoxiaohehe001
61fc368066
[Fix] fix eplb noaux ( #5239 )
...
* fix eplb noaux
* fix eplb noaux
2025-11-26 17:50:51 +08:00
zccjjj
ea3bc5b4ca
[XPU] Fix the error in MoeExpertFFN operator when valid_token_num=0 ( #5196 )
2025-11-25 10:07:20 +08:00
megemini
c06cfe2447
【Hackathon 9th No.109】[CppExtension] Add the fastdeploy_ops directory to package_data to support modern packaging ( #5156 )
...
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: SigureMo <sigure.qaq@gmail.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-11-22 01:32:06 +08:00
kevin
c068a4f642
[Feature] dyc8 support prefixcache ( #5125 )
...
* dyc8 support prefixcache
* fix cache_trans test case
* update code
2025-11-21 19:46:26 +08:00
freeliuzc
2d1dade5e2
[Speculative Decoding][MTP] Support static CacheKV C8 quantization and optimize memory usage ( #5155 )
...
* support static cachekv c8 quantization in mtp mode
* optimize memory allocation
2025-11-21 15:10:13 +08:00
xiaoxiaohehe001
6ca2651995
[Feature] Support noaux for eplb ( #5143 )
...
* support noaux eplb
* noaux_eplb
* noaux_eplb
* noaux_eplb
2025-11-21 14:10:32 +08:00
ddchenhao66
e70e2279ce
[PD Disaggregation][XPU] Add XPU support for PD disaggregation ( #5113 )
...
* [XPU] xpu support PD disaggregation
* [XPU] fix the issue of cache KV transfer process startup failure on non-zero XPU cards
* [XPU] xpu support PD disaggregation in v1 scheduler
---------
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-11-21 14:09:01 +08:00
Yonghua Li
43097a512a
[BugFix] [PD Disaggregation] fix v1 scheduler prefill node profile run & ipc transfer protocol ( #5132 )
...
* [fix] fix v1 scheduler profile run for append attention in prefill node
* [fix] skip send_signal if kv signal not inited for gpu and xpu
* [fix] extend fix to flash_attn & mla_attn
* [fix] fix v1 pd run in ipc transfer protocol
* [ci] add test for v1 pd profile run using ipc transfer protocol
* [style] fix code style check
* [style] fix code style again
* [fix] fix profile run
* [update] remove --num-gpu-blocks-override in example script
* [chore] rename forward_meta is_profiling to is_dummy_or_profile_run
2025-11-20 21:39:22 +08:00
Jundong Liu
147b2e5eb0
[BugFix] Fix zero workspace returned by CUB size query under CUDA Graph in MoE dispatch ( #5087 )
...
* fix bug about CubKeyValueSorter::run
* pre-commit and add comment
* pre-commit
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* fix precommit
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-11-20 20:00:29 +08:00
周周周
385fe6dade
[Others] clean code ( #5133 )
2025-11-20 18:44:08 +08:00
周周周
6fa34102e8
[Others]get_block_shape_and_split_kv_block clean code ( #5123 )
2025-11-20 16:40:04 +08:00
Neil Zhu
0edda75a56
[Metax] optimize cutlass moe and flash attention backend ( #5128 )
2025-11-20 16:12:35 +08:00
freeliuzc
f1e36ff2f7
[Speculative Decoding][MTP]Support stop_seqs and pd-split mode ( #5029 )
...
* support multi_stop_seqs in speculative decoding
* support mtp tp with ep split
* fix custom op register
* fix spec stop_seqs params
2025-11-20 15:26:01 +08:00
chen
9ff418db73
check METAX_GPU ( #5114 )
2025-11-19 16:02:21 +08:00
megemini
3c8c0f0d6c
【Hackathon 9th No.109】[CppExtension] [XPU] Support build Custom OP in setuptools 80+ -part ( #5106 )
...
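* [CppExtension] Add compatibility support for modern Python packaging methods
* [CppExtension] Remove the error-exit logic from the build script
* [CppExtension] Remove the modern Python packaging compatibility code, keeping only the legacy packaging method
* [CppExtension] Restore modern Python packaging compatibility support and optimize the directory detection logic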
2025-11-19 13:33:39 +08:00
lizhenyun01
d11235333e
format flash_mask_attn
2025-11-18 17:18:12 +08:00
lizhenyun01
cd2c4df64a
format flash_mask_attn
2025-11-18 17:18:12 +08:00
yzwu
d5d0602859
[Iluvatar][CI] disable compiling cudaLaunch API ( #5100 )
2025-11-18 14:15:31 +08:00
chen
d58c1db8a0
[Feature][OP] Append Attn Support CUDA-PDL ( #5072 )
2025-11-17 20:47:33 +08:00
周周周
b23e684b67
revert group size 3 ( #5079 )
2025-11-17 18:54:13 +08:00
Sunny-bot1
8a4ddb29df
Revert "[BugFix] Revert skip capture ( #5023 )" ( #5080 )
2025-11-17 16:14:55 +08:00
yangjianfengo1
3afb717995
【Fix】fix deepep dispatch ( #5036 )
...
* fix dispatch
* fix dispatch
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-11-17 10:34:01 +08:00
Sunny-bot1
249feca65a
[BugFix] Revert skip capture ( #5023 )
...
* Revert "[BugFix][Metax] Fix metax compile issue in get_block_shape_and_split_kv_block (#5000 )"
This reverts commit 05da8e34c0 .
* Revert "skip DtoH capture (#4988 )"
This reverts commit 5b24013d46 .
2025-11-13 23:52:51 -08:00
周周周
c0a4393d72
[ATTENTION] unittest ( #4962 )
2025-11-14 13:45:53 +08:00
carryyu
6c3d1da62f
fix conflicts
2025-11-13 20:30:29 +08:00
yangjianfengo1
ae7bee8122
【New Feature】W4afp8 supports per group quantization ( #4987 )
...
* w4afp8 supports per group
* code style
* fix transpose
* revert fast hadamard
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com >
2025-11-13 19:17:27 +08:00
Sunny-bot1
05da8e34c0
[BugFix][Metax] Fix metax compile issue in get_block_shape_and_split_kv_block ( #5000 )
...
* fix metax compile
* fix
2025-11-13 00:55:06 -08:00
Sunny-bot1
5b24013d46
skip DtoH capture ( #4988 )
2025-11-13 10:57:44 +08:00
Lucas
da7863ae85
[XPU] fix text_image_gather_scatter when image_token_num == token_num && text_token_num == 1 ( #4882 )
2025-11-12 17:13:22 +08:00
xiaozude
c45b3ccb52
[Metax] optimize flash mla ( #4915 )
2025-11-12 16:43:46 +08:00
yzwu
3707af7a4f
[Iluvatar] add vl into ci and support v1 loader ( #4774 )
2025-11-11 10:50:17 +08:00
Neil Zhu
6de1ce3b25
[Metax] support ERNIE-4.5-VL-28B ( #4820 )
2025-11-07 04:55:49 -08:00
ming1753
cba185f1fe
[Feature] Optim PaddleOCR-VL ( #4873 )
...
* [Feature] Optim PaddleOCR-VL
* fix bug
2025-11-07 14:56:44 +08:00
YuBaoku
819b2dbbae
Revert "【New Feature】W4afp8 supports per group quantization ( #4272 )" ( #4854 )
...
This reverts commit 93fcf7e4ec .
2025-11-06 17:48:28 +08:00
yangjianfengo1
93fcf7e4ec
【New Feature】W4afp8 supports per group quantization ( #4272 )
...
* w4afp8 supports per group
* code style
* accuracy work complete
* revert append attn utils
* ffn1 dynamic quantization
* ffn2 supports dynamic quantization
* code style
* code style
* update unit tests
* update unit tests
* fix bug
* Implement conditional parameter creation for layers
Add parameter creation for up_gate_proj_in_scale when ep_size > 1.
* code style
* fix conflict
* code style
* code style
* fix w4aint8 accuracy
* fix ci
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-11-05 21:00:23 +08:00
周周周
937eb3c6ed
[get_padding_offset.] clean get_padding_offset.cu ( #4777 )
...
[get_padding_offset.] clean get_padding_offset.cu (#4777 )
2025-11-05 10:47:40 +08:00
xiaozude
74722308f2
[Metax] adapt cutlass moe and fix mla attention ( #4602 )
...
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-11-05 10:03:49 +08:00
ddchenhao66
bffa08b74b
[XPU] fix thinking bug where output only contains reasoning_content ( #4761 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-11-04 14:32:35 +08:00
Neil Zhu
c95d0740ec
[Metax] adapt cutlass moe for ernie-vl ( #4685 )
2025-11-03 17:44:27 +08:00
freeliuzc
11398790d3
[Speculative Decoding][MTP]Support attn mask offset ( #4641 )
...
* [MTP]Merge support attn (#4591 )
* support mask_offset in speculative decoding
* fix dummy run output
* add unit test
* fix unit test import
* support attn_mask_offset in mtp mode
* add update_attn_mask op
* fix unit test && fix code-style
2025-11-03 10:08:01 +08:00