chenjian
4c76171b57
[Optimize][Cherry-pick] Robust stability for PD deployment #5338 ( #5395 )
...
* [Optimize] Robust stability for PD deployment
---------
Co-authored-by: Kaipeng Deng <dengkaipeng@baidu.com>
2025-12-15 18:58:09 +08:00
freeliuzc
6715196924
fix attention bug in spec decoding ( #5480 )
2025-12-10 12:56:13 +08:00
kevin
9b5b08cb72
[Cherry-Pick][BugFix] Fix async download( #5349 ) ( #5347 )
...
* fix mm to_dict bug
* pd support async download
* update code
* update test case
* update log
* Revert "update log"
This reverts commit 6e883150cd.
* update code
* fix mtp bug
2025-12-05 18:59:36 +08:00
lzy
04b2c43806
[Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM ( #5316 )
2025-12-02 13:03:55 +08:00
kevin
f1e1f5da57
fix mm to_dict bug ( #5299 )
2025-11-29 20:49:36 +08:00
Yuanle Liu
b99064432e
Update load_weight_utils.py ( #5285 )
2025-11-28 13:39:59 +08:00
lizhenyun01
fd1313cdb4
[Cherry-Pick][Feature] support flash_mask_attention backend( #5134 ) ( #5256 )
...
* [Feature] support flash_mask_attention backend
* fix unittest
* clean code
2025-11-28 10:13:00 +08:00
Yuanle Liu
9b0c65ba57
Add method to disable sequence parallel MoE if needed ( #5268 )
2025-11-27 16:28:24 +08:00
kevin
69b4d058ad
cp_fix_bug ( #5253 )
2025-11-27 15:15:49 +08:00
SunLei
3d74a4baf6
[Cherry-Pick] MTP split draft_tokens into standalone post-processing path( #5205 ) ( #5231 )
...
* refactor(mtp): split draft_tokens into standalone post-processing path for MTP + logprobs
* Restore Request.__repr__ implementation
* ci
* add envs
* fix unittest
2025-11-27 11:23:38 +08:00
freeliuzc
bdcc952eeb
fix pd-split first step bug ( #5246 )
2025-11-26 18:02:32 +08:00
xiaoxiaohehe001
710753377f
[Cherry-Pick] Fix eplb noaux( #5239 ) ( #5240 )
...
* fix eplb noaux
* fix eplb noaux
2025-11-26 17:51:10 +08:00
kevin
e0c7ebff29
[BugFix][Cherry Pick] fix ds type bug ( #5220 )
...
* fix ds type bug
* update code
2025-11-25 20:37:09 +08:00
freeliuzc
a11d17cee9
[Speculative Decoding][Cherry Pick]Update extract_mtp_weight script and optimize config ( #5213 )
...
* update extract_mtp_model
* modify config usage
2025-11-25 14:42:55 +08:00
freeliuzc
e581b7d7d9
fix kernel output extract ( #5212 )
2025-11-25 14:25:20 +08:00
chenjian
09b47c7111
[Bug fix] Send first token in D instance ( #5199 )
...
* [Bug fix] Send first token in D instance
* fix
2025-11-24 23:42:20 +08:00
Yuanle Liu
f69e0839f7
dummy import fd ( #5192 )
2025-11-24 20:23:07 +08:00
kevin
8e4e3ff510
[Feature] support eplb in api_server ( #4782 )
...
* support eplb in api_server
* update code
* add eplb test case
* update eplb
* support tp+dp eplb
* update test case
* update code
* update code
* fix bug
* update copilot review
* update test case name
2025-11-24 20:22:29 +08:00
xiaozude
d5bd64336a
[Metax] support ENABLE_V1_KVCACHE_SCHEDULER ( #5163 )
2025-11-24 19:19:49 +08:00
xiaoxiaohehe001
e150a418d4
support moe offline quant ( #5142 )
2025-11-24 18:59:18 +08:00
Juncai
af03da5127
[BugFix] fix release block ids ( #5184 )
...
* fix release block ids
* up
2025-11-24 16:48:09 +08:00
xiaoxiaohehe001
95f3c8c641
[Fix] Fix eplb bug and support fp8 load weight ( #5178 )
...
* fix eplb part2
* fix eplb part2
* fix eplb part2
2025-11-24 15:31:37 +08:00
kevin
cceaba1c8d
[Feature] remove to_numpy ( #5162 )
...
* remove to_numpy
* update code
* update name
* update code
* update code
* update code
2025-11-21 21:54:26 +08:00
kevin
c068a4f642
[Feature] dyc8 support prefixcache ( #5125 )
...
* dyc8 support prefixcache
* fix cache_trans test case
* update code
2025-11-21 19:46:26 +08:00
GoldPancake
ab3a2e45ff
fix mtp reschedule ( #5165 )
2025-11-21 19:31:35 +08:00
chenjian
3ea1b44a58
[Optimization] Improve perf for fd response token with internal adapter ( #4992 )
...
* [Optimize] Improve perf for fd response token with internal adapter
* fix
* fix bug
* fix ci
* fix ci
* fix ci
* fix ci
2025-11-21 19:02:03 +08:00
Yuanle Liu
5bcf79d780
[BugFix] fix num of rdma_comm_ports check ( #5168 )
...
* fix num of rdma_comm_ports check
* update
* update
* update
2025-11-21 18:31:14 +08:00
Jiang-Jia-Jun
d2298dcb0c
[Polish] Simplify __repr__ method in Request class ( #5153 )
...
Remove detailed string representation for Request class.
2025-11-21 17:21:06 +08:00
xiaoxiaohehe001
6471dade4a
[Fix] Fix noaux ep test ( #5161 )
...
* support noaux eplb
* noaux_eplb
* noaux_eplb
* noaux_eplb
* noaux_eplb
2025-11-21 16:36:41 +08:00
Juncai
f9b0545a7f
[PD Disaggregation] [Refine] Refine splitwise deployment ( #5151 )
...
* Refine splitwise deployment
* up
2025-11-21 15:30:24 +08:00
freeliuzc
2d1dade5e2
[Speculative Decoding][MTP] Support static CacheKV C8 quantization and optimize memory usage ( #5155 )
...
* support static cachekv c8 quantization in mtp mode
* optimize memory allocation
2025-11-21 15:10:13 +08:00
lizhenyun01
3c36283d7d
[ENV] support AK SK ENDPOINT while getting the multi_modal's feature ( #5159 )
2025-11-20 23:07:57 -08:00
bukejiyu
34f59d9800
[RL]Fix missing is_distributed attribute ( #5150 )
...
* fix
* update
2025-11-21 14:14:25 +08:00
xiaoxiaohehe001
6ca2651995
[Feature] Support noaux for eplb ( #5143 )
...
* support noaux eplb
* noaux_eplb
* noaux_eplb
* noaux_eplb
2025-11-21 14:10:32 +08:00
ddchenhao66
e70e2279ce
[PD Disaggregation][XPU] Add XPU support for PD disaggregation ( #5113 )
...
* [XPU] xpu support PD disaggregation
* [XPU] fix the issue of cache KV transfer process startup failure on non-zero XPU cards
* [XPU] xpu support PD disaggregation in v1 scheduler
---------
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-11-21 14:09:01 +08:00
kevin
7454480e07
[Feature] support bos download retry ( #5137 )
...
* support bos download retry
* update code
* update code
2025-11-21 10:18:32 +08:00
Yonghua Li
43097a512a
[BugFix] [PD Disaggregation] fix v1 scheduler prefill node profile run & ipc transfer protocol ( #5132 )
...
* [fix] fix v1 scheduler profile run for append attention in prefill node
* [fix] skip send_signal if kv signal not inited for gpu and xpu
* [fix] extend fix to flash_attn & mla_attn
* [fix] fix v1 pd run in ipc transfer protocol
* [ci] add test for v1 pd profile run using ipc transfer protocol
* [style] fix code style check
* [style] fix code style again
* [fix] fix profile run
* [update] remove --num-gpu-blocks-override in example script
* [chore] rename forward_meta is_profiling to is_dummy_or_profile_run
2025-11-20 21:39:22 +08:00
Juncai
01c30f6b87
Fix schedule error in splitwise deployment ( #5149 )
2025-11-20 21:18:10 +08:00
Ryan
0857099191
mv import ( #5146 )
2025-11-20 19:25:56 +08:00
周周周
385fe6dade
[Others] clean code ( #5133 )
2025-11-20 18:44:08 +08:00
Yuanle Liu
7ac25935c7
[Optimization] default compile rdma, reduce cudagraph buffer size in mm, fix some config bug ( #5121 )
...
* default compile rdma, reduce cudagraph buffer size in mm, fix some config logic
* update
* update
* fix bug
* enhance rdma compile
* fix
2025-11-20 17:19:47 +08:00
周周周
6fa34102e8
[Others]get_block_shape_and_split_kv_block clean code ( #5123 )
2025-11-20 16:40:04 +08:00
yangjianfengo1
af715db763
[Scheduler] Support chunk prefill for video input ( #5107 )
...
* add video chunk prefill
* add vit_merge=True for test_tokenizer_client.py
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-11-20 16:29:13 +08:00
Neil Zhu
0edda75a56
[Metax] optimize cutlass moe and flash attention backend ( #5128 )
2025-11-20 16:12:35 +08:00
freeliuzc
f1e36ff2f7
[Speculative Decoding][MTP]Support stop_seqs and pd-split mode ( #5029 )
...
* support multi_stop_seqs in speculative decoding
* support mtp tp with ep split
* fix custom op register
* fix spec stop_seqs params
2025-11-20 15:26:01 +08:00
kevin
109d48e456
[Feature] support async download features ( #5003 )
...
* support async download features
* add test case
* update code
2025-11-19 22:23:36 +08:00
Sunny-bot1
bde97e09f7
support dynamic activation quant for w4afp8 ( #5117 )
2025-11-19 21:11:16 +08:00
LiqinruiG
a5cd7c9039
[BugFix] rollback max_tokens and min_tokens when continue to infer ( #5082 )
...
* [BugFix] rollback max_tokens and min_tokens when continue to infer
* [BugFix] rollback max_tokens and min_tokens when continue to infer
* [fix] add more logger info: max_tokens
---------
Co-authored-by: liqinrui <liqinrui@baidu.com>
2025-11-19 18:43:42 +08:00
Sunny-bot1
43f0c7557e
[Feature] Add an unquantized option for MoE and Dense quant type ( #4813 )
2025-11-19 16:24:03 +08:00
bukejiyu
a82f25ea7b
[RL]Resolve shape mismatch problems in RL-related modules ( #5032 )
...
* RL fix
* update
2025-11-19 11:12:48 +08:00