RuohengMa
2c3c983b96
[XPU] modify speculate_verify ( #5522 )
2025-12-23 14:50:30 +08:00
lizexu123
6d323769dd
fix w4afp8 ( #5634 )
2025-12-22 13:39:41 +08:00
chen
a32cb54d0b
[BugFix] Fix custom_all_reduce overflow ( #5662 )
...
* check
* check
* code style
2025-12-19 18:24:21 +08:00
lizan1999
ec6811f648
support token num = 0 ( #5635 )
...
Co-authored-by: lizan1999 <lizan03@baidu.com>
Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-19 10:20:38 +08:00
yzwu
ac013803f3
[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode ( #5555 )
2025-12-18 02:14:25 -08:00
lizan1999
e1a9b282eb
fix bug for EP+MTP ( #5605 )
...
Co-authored-by: lizan1999 <lizan03@baidu.com>
2025-12-18 14:34:54 +08:00
zhupengyang
8735cb5045
[XPU] refactor moe ffn ( #5501 )
...
- remove BKCL_DISPATCH_ALL_GATHER
- support sparse mode
- support moe quant_method
2025-12-18 14:14:05 +08:00
Yuanle Liu
cdc0004894
Revert "[Feature] add ue8m0 for per_token_quant_fp8 ( #5563 )" ( #5611 )
...
This reverts commit 73e1d6aa90.
2025-12-17 13:59:06 +08:00
Yuanle Liu
867803ae10
[BugFix] fix speculate_limit_thinking_content_length ( #5590 )
...
* fix speculate_limit_thinking_content_length
* update
2025-12-16 04:31:45 -08:00
chen
27ef3610b5
support glm fa3 ( #5586 )
2025-12-16 19:33:27 +08:00
fxyfxy777
73e1d6aa90
[Feature] add ue8m0 for per_token_quant_fp8 ( #5563 )
...
* ue8m0
* add default arg
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-16 18:40:12 +08:00
Echo-Nie
50100f98d7
[Feature] Support fusedmoe on Blackwell ( #5325 )
...
* update sm100
* fix
* fix style
2025-12-16 11:58:50 +08:00
freeliuzc
532f9ba227
[BugFix][Speculative Decoding] (Spent many days to solve) Fix write qknorm cache bug in speculative decoding ( #5491 )
...
* [liuzichang spent 10 days] fix write qknorm cache bug
* fix 'fix cachekv bug'
2025-12-15 18:27:11 +08:00
ddchenhao66
9f70f4310e
[PD Disaggregation][XPU] update_inputs_v1 operator supports PD ( #5550 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-15 15:39:38 +08:00
chen
a389bb7c5c
[Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache ( #5486 )
2025-12-12 17:10:17 +08:00
RuohengMa
12c76f8137
[XPU] add speculate_get_logits ( #5497 )
...
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_get_logits
* delete context
* add ptr check
---------
Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-12 15:38:30 +08:00
Lucas
888c4b992d
[XPU] refactor of block_attn param 'pos_emb_type' ( #5511 )
2025-12-12 14:30:09 +08:00
Juncai
d67388a479
[PD Disaggregation] Distinguish the pipelines for sending kv signal in different prefill ( #5514 )
...
* Distinguish the pipelines for sending kv signal in different prefill
* up
2025-12-12 14:05:36 +08:00
cmcamdy
3c1f7b85a4
[XPU] support get hidden state for mix ( #5513 )
...
* fix get hidden states
* fix code style
* fix code style
2025-12-12 10:31:20 +08:00
FocusLuo
c3aaa7e441
[BugFix] Fixed build script issue on Intel HPU platforms ( #5455 )
...
* [INTEL HPU] Fixed build script issue for non-gpu platforms
Signed-off-by: Luo, Focus <focus.luo@intel.com>
* [INTEL HPU] PR CI HPU will not use fixed version of fastdeploy_intel_hpu
Signed-off-by: Luo, Focus <focus.luo@intel.com>
---------
Signed-off-by: Luo, Focus <focus.luo@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-11 16:36:37 +08:00
Neil Zhu
4403a21d4b
[Metax] refactor cutlass moe and optimize flash attention ( #5361 )
...
* [Metax] refactor moe and flash attention backend
---------
Co-authored-by: zhangchenyi_dl <16219492+zhangchenyidl@user.noreply.gitee.com>
2025-12-10 17:15:17 +08:00
Copilot
e38709b499
[BugFix] Fix limit_thinking early return logic in CUDA kernels ( #5471 )
...
* Initial plan
* [BugFix] Fix limit_thinking bug - change AND to OR in condition checks
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
* Update Chinese comments to reflect OR logic instead of AND
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
2025-12-10 11:03:19 +08:00
lzy
99f607eef5
[Others] Maintain the mtp branch temporarily. ( #5446 )
2025-12-09 19:17:53 +08:00
lizexu123
95eab9f9ee
[Feature] support stop_token_ids ( #5399 )
...
* support stop_token_ids
* fix
* delete Chinese
* support both
* delete print
2025-12-09 17:49:12 +08:00
xiaozude
df67379bc3
[Metax] modify wrapSize to WARP_SIZE ( #5442 )
2025-12-09 01:44:02 -08:00
周周周
31410415db
FA3 support qwen3 ( #5441 )
2025-12-09 16:16:16 +08:00
RuohengMa
8178e3fc6a
[XPU] add speculate_step_system_cache ( #5397 )
...
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_step_system_cache
---------
Co-authored-by: cmcamdy <1027740945@qq.com>
2025-12-09 14:40:11 +08:00
K11OntheBoat
8d99bac532
Remove CUDA ERROR 9 of inputs of get_padding_offset kernel ( #5440 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-09 14:17:30 +08:00
周周周
2aea8a3a60
[Others] Remove useless code ( #5404 )
2025-12-08 13:59:46 +08:00
GoldPancake
8545b705ed
fix top_p_candidates ( #5400 )
...
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
2025-12-05 20:01:05 +08:00
Lucas
8f2b85362d
[XPU] support moe_expert_ffn TGEMM selection ( #5375 )
2025-12-05 17:49:40 +08:00
Lucas
3aed8d257d
[XPU] redirect xvllm/xtdk/xhpc downloading log ( #5388 )
2025-12-05 17:34:17 +08:00
cmcamdy
86b6430582
fix split_rope_cache_kv_encoder in mix mtp ( #5384 )
2025-12-05 14:33:17 +08:00
Lucas
7b0b6e470a
[XPU] support XDNN downloading function ( #5365 )
2025-12-05 11:16:45 +08:00
Nyakku Shigure
f88c159de1
[BugFix] Exit if neither modern nor legacy wheel dir is found ( #5367 )
...
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-04 16:45:48 +08:00
Yonghua Li
f4119d51b4
[PD Disaggregation] support DP via v1 router and decouple DP and EP ( #5197 )
...
* [fix] support DP via v1 router and decouple DP and EP
* [fix] fix scripts
* [fix] reset model path
* [fix] dp use get_output_ep, fix router port type, update scripts
* [merge] merge with latest code
* [chore] remove some debug log
* [fix] fix code style check
* [fix] fix test_multi_api_server for log_dir name
* [chore] reduce logs
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-04 15:38:43 +08:00
周周周
a36d60aa18
[FIX BUG] fix bug in TP in permute_x_fp8_kernel ( #5350 )
...
* commit
* commit
* commit
* commit
* commit
* commit
2025-12-03 05:17:37 -08:00
Sunny-bot1
d5a9b75b4e
fix cutlass ep ( #5337 )
2025-12-03 14:06:01 +08:00
lzy
c71a44c7e5
supports mtp split_kv_attn ( #5343 )
2025-12-03 12:40:16 +08:00
Sunny-bot1
3629db4129
[Quantization] Support w4afp8 MoE dynamic quantization ( #5282 )
...
* support dynamic activation quant for w4afp8
* support dynamic w4afp8
* add test
* fix
* fix
---------
Co-authored-by: zhoutianzi666 <17801055074@163.com>
2025-12-02 18:56:16 +08:00
周周周
fb7f951612
[UNITTEST] add test ( #5305 )
2025-12-02 17:59:01 +08:00
qw86972190
6048ea37bd
[XPU]add enable_logprob ( #5279 )
...
* [XPU]Update document
* [XPU]Update documentation
* [XPU]add enable_logprob
* Fix code style issues
* “doc”
* “docs”
* “doc”
* Fix code style via pre-commit
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1498354.gajl.baidu.com>
2025-12-02 15:32:28 +08:00
K11OntheBoat
2e1680838f
[PD Disaggregation] Support PD deployment of DeepSeekv3. ( #5251 )
...
* Support deepseekv3 cache transfer for PD deploy
* clean some log info
---------
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-02 14:11:50 +08:00
chen
aa35ce449d
[Optimization] EP empty_input_forward Remove Communication ( #5254 )
2025-12-01 21:10:40 +08:00
cmcamdy
3149aed750
fix_gather_next_token ( #5311 )
2025-12-01 18:00:30 +08:00
K11OntheBoat
7bafcf1df3
[OP]Remove extra H2D in DeepGemm ( #5262 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-11-28 14:23:44 +08:00
周周周
95243f012c
[Others] add PADDLE_ENFORCE ( #5288 )
2025-11-28 14:23:35 +08:00
lizhenyun01
aba4fc657f
[Feature] support flash_mask_attention backend ( #5134 )
...
* [Feature] support flash_mask_attention backend
* fix unittest
* clean code
2025-11-28 10:12:16 +08:00
cmcamdy
5a67a6d960
[XPU] support kernel for mtp(base) ( #4748 )
...
* [XPU] support kernel for mtp(base)
* [XPU] support kernel for mtp(base)
* format
* format
* format
* fix gather next token
* fix step && add test
* fix
* mv pre/post process
* add adjust batch / gather next token for mtp
* fix code style
* fix mtp kernel name
* fix mtp kernel test
* mv xpu pre/post process
* mv xpu pre/post process
2025-11-27 15:05:44 +08:00
GoldPancake
cfc5b0ccf9
[BugFix] fix mtp logprob bugs in chunk prefill ( #5244 )
...
* fix mtp logprob bugs in chunk prefill
* fix
* fix
2025-11-27 11:31:29 +08:00