Commit Graph

277 Commits

Author SHA1 Message Date
lizan1999
ec6811f648 support token num = 0 (#5635)
Co-authored-by: lizan1999 <lizan03@baidu.com>
Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-19 10:20:38 +08:00
yzwu
ac013803f3 [Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555) 2025-12-18 02:14:25 -08:00
lizan1999
e1a9b282eb fix bug for EP+MTP (#5605)
Co-authored-by: lizan1999 <lizan03@baidu.com>
2025-12-18 14:34:54 +08:00
zhupengyang
8735cb5045 [XPU] refactor moe ffn (#5501)
- remove BKCL_DISPATCH_ALL_GATHER
- support sparse mode
- support moe quant_method
2025-12-18 14:14:05 +08:00
Yuanle Liu
cdc0004894 Revert "[Feature] add ue8m0 for per_token_quant_fp8 (#5563)" (#5611)
This reverts commit 73e1d6aa90.
2025-12-17 13:59:06 +08:00
Yuanle Liu
867803ae10 [BugFix] fix speculate_limit_thinking_content_length (#5590)
* fix speculate_limit_thinking_content_length

* update
2025-12-16 04:31:45 -08:00
chen
27ef3610b5 support glm fa3 (#5586) 2025-12-16 19:33:27 +08:00
fxyfxy777
73e1d6aa90 [Feature] add ue8m0 for per_token_quant_fp8 (#5563)
* ue8m0

* add default arg

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-16 18:40:12 +08:00
Echo-Nie
50100f98d7 [Feature] Support fusedmoe on Blackwell (#5325)
* update sm100

* fix

* fix style
2025-12-16 11:58:50 +08:00
freeliuzc
532f9ba227 [BugFix][Speculative Decoding](Spent many days to solve) Fix write qknorm cache bug in speculative decoding (#5491)
* [liuzichang spent 10 days] fix write qknorm cache bug

* fix 'fix cachekv bug'
2025-12-15 18:27:11 +08:00
ddchenhao66
9f70f4310e [PD Disaggregation][XPU] update_inputs_v1 operator supports PD (#5550)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-15 15:39:38 +08:00
chen
a389bb7c5c [Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache (#5486) 2025-12-12 17:10:17 +08:00
RuohengMa
12c76f8137 [XPU] add speculate_get_logits (#5497)
* [XPU] add speculate_step_system_cache

* [XPU] add speculate_step_system_cache

* [XPU] add speculate_get_logits

* delete context

* add ptr check

---------

Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-12 15:38:30 +08:00
Lucas
888c4b992d [XPU] refactor of block_attn param 'pos_emb_type' (#5511) 2025-12-12 14:30:09 +08:00
Juncai
d67388a479 [PD Disaggregation] Distinguish the pipelines for sending kv signal in different prefill (#5514)
* Distinguish the pipelines for sending kv signal in different prefill

* up
2025-12-12 14:05:36 +08:00
cmcamdy
3c1f7b85a4 [XPU] support get hidden state for mix (#5513)
* fix get hidden states

* fix code style

* fix code style
2025-12-12 10:31:20 +08:00
FocusLuo
c3aaa7e441 [BugFix] Fixed build script issue on Intel HPU platforms (#5455)
* [INTEL HPU]  Fixed build script issue for non-gpu platforms

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* [INTEL HPU] PR CI HPU will not use fixed version of fastdeploy_intel_hpu

Signed-off-by: Luo, Focus <focus.luo@intel.com>

---------

Signed-off-by: Luo, Focus <focus.luo@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-11 16:36:37 +08:00
Neil Zhu
4403a21d4b [Metax] refactor cutlass moe and optimize flash attention (#5361)
* [Metax] refactor moe and flash attention backend
---------

Co-authored-by: zhangchenyi_dl <16219492+zhangchenyidl@user.noreply.gitee.com>
2025-12-10 17:15:17 +08:00
Copilot
e38709b499 [BugFix] Fix limit_thinking early return logic in CUDA kernels (#5471)
* Initial plan

* [BugFix] Fix limit_thinking bug - change AND to OR in condition checks

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* Update Chinese comments to reflect OR logic instead of AND

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
2025-12-10 11:03:19 +08:00
lzy
99f607eef5 [Others] Maintain the mtp branch temporarily. (#5446) 2025-12-09 19:17:53 +08:00
lizexu123
95eab9f9ee [Feature] support stop_token_ids (#5399)
* support stop_token_ids

* fix

* delete chinese

* support both

* delete print
2025-12-09 17:49:12 +08:00
xiaozude
df67379bc3 [Metax] modify wrapSize to WARP_SIZE (#5442) 2025-12-09 01:44:02 -08:00
周周周
31410415db FA3 support qwen3 (#5441) 2025-12-09 16:16:16 +08:00
RuohengMa
8178e3fc6a [XPU] add speculate_step_system_cache (#5397)
* [XPU] add speculate_step_system_cache

* [XPU] add speculate_step_system_cache

---------

Co-authored-by: cmcamdy <1027740945@qq.com>
2025-12-09 14:40:11 +08:00
K11OntheBoat
8d99bac532 Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-09 14:17:30 +08:00
周周周
2aea8a3a60 [Others] Remove useless code (#5404) 2025-12-08 13:59:46 +08:00
GoldPancake
8545b705ed fix top_p_candidates (#5400)
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
2025-12-05 20:01:05 +08:00
Lucas
8f2b85362d [XPU] support moe_expert_ffn TGEMM selection (#5375) 2025-12-05 17:49:40 +08:00
Lucas
3aed8d257d [XPU] redirect xvllm/xtdk/xhpc downloading log (#5388) 2025-12-05 17:34:17 +08:00
cmcamdy
86b6430582 fix split_rope_cache_kv_encoder in mix mtp (#5384) 2025-12-05 14:33:17 +08:00
Lucas
7b0b6e470a [XPU] support XDNN downloading function (#5365) 2025-12-05 11:16:45 +08:00
Nyakku Shigure
f88c159de1 [BugFix] Exit if neither modern nor legacy wheel dir not found (#5367)
---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-04 16:45:48 +08:00
Yonghua Li
f4119d51b4 [PD Disaggregation] support DP via v1 router and decouple DP and EP (#5197)
* [fix] support DP via v1 router and decouple DP and EP

* [fix] fix scripts

* [fix] reset model path

* [fix] dp use get_output_ep, fix router port type, update scripts

* [merge] merge with latest code

* [chore] remove some debug log

* [fix] fix code style check

* [fix] fix test_multi_api_server for log_dir name

* [chore] reduce logs

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-04 15:38:43 +08:00
周周周
a36d60aa18 [FIX BUG] fix bug in TP in permute_x_fp8_kernel (#5350)
* commit

* commit

* commit

* commit

* commit

* commit
2025-12-03 05:17:37 -08:00
Sunny-bot1
d5a9b75b4e fix cutlass ep (#5337) 2025-12-03 14:06:01 +08:00
lzy
c71a44c7e5 supports mtp split_kv_attn (#5343) 2025-12-03 12:40:16 +08:00
Sunny-bot1
3629db4129 [Quantization] Support w4afp8 MoE dynamic quantization (#5282)
* support dynamic activation quant for w4afp8

* support dynamic w4afp8

* add test

* fix

* fix

---------

Co-authored-by: zhoutianzi666 <17801055074@163.com>
2025-12-02 18:56:16 +08:00
周周周
fb7f951612 [UNITEST] add test (#5305) 2025-12-02 17:59:01 +08:00
qw86972190
6048ea37bd [XPU]add enable_logprob (#5279)
* [XPU]Update document

* [XPU]Update documentation

* [XPU]add enable_logprob

* Fix code style issues

* "doc"

* "docs"

* "doc"

* Fix code style via pre-commit

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1498354.gajl.baidu.com>
2025-12-02 15:32:28 +08:00
K11OntheBoat
2e1680838f [PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251)
* Support deepseekv3 cache transfer for PD deploy

* clean some log info

---------

Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-02 14:11:50 +08:00
chen
aa35ce449d [Optimization] EP empty_input_forward Remove Communication (#5254) 2025-12-01 21:10:40 +08:00
cmcamdy
3149aed750 fix_gather_next_token (#5311) 2025-12-01 18:00:30 +08:00
K11OntheBoat
7bafcf1df3 [OP]Remove extra H2D in DeepGemm (#5262)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-11-28 14:23:44 +08:00
周周周
95243f012c [Others] add PADDLE_ENFORCE (#5288) 2025-11-28 14:23:35 +08:00
lizhenyun01
aba4fc657f [Feature] support flash_mask_attention backend (#5134)
* [Feature] support flash_mask_attention backend

* fix unittest

* clean code
2025-11-28 10:12:16 +08:00
cmcamdy
5a67a6d960 [XPU] support kernel for mtp(base) (#4748)
* [XPU] support kernel for mtp(base)

* [XPU] support kernel for mtp(base)

* format

* format

* format

* fix gather next token

* fix step && add test

* fix

* mv pre/post process

* add adjust batch / gather next token for mtp

* fix code style

* fix mtp kernel name

* fix mtp kernel test

* mv xpu pre/post process

* mv xpu pre/post process
2025-11-27 15:05:44 +08:00
GoldPancake
cfc5b0ccf9 [BugFix] fix mtp logprob bugs in chunk prefill (#5244)
* fix mtp logprob bugs in chunk prefill

* fix

* fix
2025-11-27 11:31:29 +08:00
freeliuzc
ba915e03e1 [BugFix]Fix attention mask bug in D-Node of PD-split mode (#5245) 2025-11-26 17:56:28 +08:00
xiaoxiaohehe001
61fc368066 [Fix] fix eplb noaux (#5239)
* fix eplb noaux

* fix eplb noaux
2025-11-26 17:50:51 +08:00
zccjjj
ea3bc5b4ca [XPU] Fix the error in MoeExpertFFN operator when valid_token_num=0 (#5196) 2025-11-25 10:07:20 +08:00