Commit Graph

  • be0c960260 [BugFix] dynamic cache kv block_wise_fp8 not need create layer.cache_k_scale (#5362) Yuanle Liu 2025-12-03 21:32:59 +08:00
  • a36d60aa18 [FIX BUG] fix bug in TP in permute_x_fp8_kernel (#5350) 周周周 2025-12-03 21:17:37 +08:00
  • 5f8d4aedea [Feature] support audio tts (#5333) ming1753 2025-12-03 21:06:48 +08:00
  • 74ba637b6b remove close prefix cache (#5363) kevin 2025-12-03 20:59:32 +08:00
  • 83dbc4e5dd [Feature] Guided Decoding add LLguidance backend (#5124) Daci 2025-12-03 20:23:57 +08:00
  • 5c2247c3f0 [Feature] Support async download chunk video features (#5297) Dangweichong 2025-12-03 19:39:45 +08:00
  • 4e8096bd0d [XPU] xpu support mm prefix cache (#5356) ddchenhao66 2025-12-03 19:07:34 +08:00
  • a4bb3e9960 [bugfix]remove metrics middleware (#5332) xiaolei373 2025-12-03 17:07:45 +08:00
  • f458cc5ba4 [Optimization]1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5353) lzy 2025-12-03 16:42:10 +08:00
  • 04d35ace5e [CE]add wint4 ep (#5355) tianlef 2025-12-03 15:17:47 +08:00
  • d5a9b75b4e fix cutlass ep (#5337) Sunny-bot1 2025-12-03 14:06:01 +08:00
  • cae2c1ccf5 supports mtp split_kv_attn (#5344) lzy 2025-12-03 13:33:26 +08:00
  • 690bcb8e50 [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5315) lzy 2025-12-03 13:33:15 +08:00
  • 17c88f429f fix skip_quant (#5342) Yuanle Liu 2025-12-03 13:20:51 +08:00
  • f6544c0b1b [CI] Add RD in env CI. (#5345) Longzhi Wang 2025-12-03 13:18:17 +08:00
  • c71a44c7e5 supports mtp split_kv_attn (#5343) lzy 2025-12-03 12:40:16 +08:00
  • dfeabee123 [CI] Allow occasional distributed worker exit_code (#5341) YuBaoku 2025-12-03 10:56:59 +08:00
  • 0eb799a324 Update installation requirements for Kunlunxin XPU Jiang-Jia-Jun 2025-12-03 10:04:29 +08:00
  • 335ae0f4a4 Update installation requirements for Kunlunxin XPU Jiang-Jia-Jun 2025-12-03 10:04:17 +08:00
  • 21f138f68b [CI] Add env ci (#5331) Longzhi Wang 2025-12-02 19:31:25 +08:00
  • 3e2c13d8c5 [CI] Disable queue state assertion temporarily (#5329) YuBaoku 2025-12-02 18:57:29 +08:00
  • 3629db4129 [Quantization] Support w4afp8 MoE dynamic quantization (#5282) Sunny-bot1 2025-12-02 18:56:16 +08:00
  • 429dd2b1db [Intel HPU] add example benchmark scripts for hpu (#5304) fmiao2372 2025-12-02 18:00:01 +08:00
  • fb7f951612 [UNITEST] add test (#5305) 周周周 2025-12-02 17:59:01 +08:00
  • 8e0f4dfd0c [XPU] [CI] Xpu Ci Refactor (#5252) Jiaxin Sui 2025-12-02 17:15:51 +08:00
  • 6b03f4fb53 [CI] Update part of test_docker to paddle_dev (#5278) YuBaoku 2025-12-02 16:52:07 +08:00
  • 69e003abcb [CI] Fix return_code check in test_chunked_moe.py (#5326) YuBaoku 2025-12-02 15:41:26 +08:00
  • 6048ea37bd [XPU]add enable_logprob (#5279) qw86972190 2025-12-02 15:32:28 +08:00
  • c563eca791 [Feature] support reward model (#5301) lizexu123 2025-12-02 14:55:31 +08:00
  • 2e1680838f [PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251) K11OntheBoat 2025-12-02 14:11:50 +08:00
  • 117980dd4e [LogProbs]Enable prompt logprobs output and modify data transmission method for the online interface. (#5089) qwes5s5 2025-12-02 13:49:51 +08:00
  • af39819fcd Revert "[CI] 【Hackathon 9th Sprint No.18】NO.18 supplement unit tests for functional modules (#5064)" (#5290) YuanRisheng 2025-12-02 13:43:36 +08:00
  • ded7765dec Revert "[CI] 【Hackathon 9th Sprint No.41】NO.41 supplement unit tests for functional modules (#5062)" (#5291) YuanRisheng 2025-12-02 13:43:13 +08:00
  • 04b2c43806 [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5316) lzy 2025-12-02 13:03:55 +08:00
  • 68533ebd95 [CI] disable test_chunked_moe.py in unit_test (#5322) YuBaoku 2025-12-02 10:39:50 +08:00
  • 84e2f6aa75 [CI]add clear to run-batch ci (#5307) xiaolei373 2025-12-01 21:18:19 +08:00
  • aa35ce449d [Optimization] EP empty_input_forward Remove Communication (#5254) chen 2025-12-01 21:10:40 +08:00
  • b0113cb0fc [XPU][CI] Change XPU CI Base Value (#5318) Jiaxin Sui 2025-12-01 21:02:09 +08:00
  • 3149aed750 fix_gather_next_token (#5311) cmcamdy 2025-12-01 18:00:30 +08:00
  • 0925d44f18 [PD Disaggregation] support different tp_size for prefill and decode (#5296) Juncai 2025-12-01 17:50:20 +08:00
  • 54119cf07e [CI] Remove need approve by yuanlehome (#5310) Yuanle Liu 2025-12-01 17:44:43 +08:00
  • cbbe6b892c [CI] Update build_docker to paddle_manylinux (#5226) YuBaoku 2025-12-01 16:25:44 +08:00
  • b467e9dadc [XPU][CI]Change W4A8 Case Base Value (#5309) Jiaxin Sui 2025-12-01 15:25:33 +08:00
  • add524d80c [Feature] support chunked moe (#4575) Longzhi Wang 2025-12-01 15:17:18 +08:00
  • 6f42c37359 [Deterministic] Move paddle version batch invariant pkg to Fastdeploy (#4763) Jundong Liu 2025-12-01 11:25:48 +08:00
  • 70ec1e17c1 [Features] add audio request & fix embedding bug (#5201) ming1753 2025-12-01 11:12:17 +08:00
  • 9f4977eb74 [xpu] support mtp for xpu(mix) (#5274) cmcamdy 2025-12-01 11:03:14 +08:00
  • f1e1f5da57 fix mm to_dict bug (#5299) kevin 2025-11-29 20:49:36 +08:00
  • 8aec3acc8c fix mm type bug (#5300) kevin 2025-11-29 20:48:14 +08:00
  • 048ca60013 fix aksk check bug kevin 2025-11-28 14:42:49 +08:00
  • 090a066e4a [APIServer] add_prompt_ids_test (#5283) Divano 2025-11-29 08:31:31 +08:00
  • 2c7683d551 [Intel HPU] change MoE weights and scales from list to tensor and add… (#5289) fmiao2372 2025-11-28 19:17:05 +08:00
  • 5b49142988 update (#5298) Zhang Yulong 2025-11-28 18:29:16 +08:00
  • 4e392e8337 [BugFix]fix v1 loader lm head fp32 (#5270) (#5287) release/2.3 chen 2025-11-28 17:52:25 +08:00
  • a535050b11 [FDConfig] remove engine client args, use fd_config instead (#5217) Yonghua Li 2025-11-28 17:20:54 +08:00
  • 73886204d4 [Others] clean code (#5235) 周周周 2025-11-28 15:40:49 +08:00
  • 2d69d91ab8 add aksk check (#5273) kevin 2025-11-28 14:28:24 +08:00
  • 7bafcf1df3 [OP]Remove extra H2D in DeepGemm (#5262) K11OntheBoat 2025-11-28 14:23:44 +08:00
  • 95243f012c [Others] add PADDLE_ENFORCE (#5288) 周周周 2025-11-28 14:23:35 +08:00
  • 1539fd6056 [BugFix]Set default OMP_NUM_THREADS=3 and fix extra GPU memory usage in DeepSeek (#5219) bukejiyu 2025-11-28 14:22:04 +08:00
  • 7dc06cac6e [BugFix] race condition [is_fetching] causing multiple fetch requests (#5238) Daci 2025-11-28 13:41:36 +08:00
  • b99064432e Update load_weight_utils.py (#5285) Yuanle Liu 2025-11-28 13:39:59 +08:00
  • 35479b691f [BugFix] fix tsp o_proj bias add (#5284) Yuanle Liu 2025-11-28 13:39:55 +08:00
  • 1a559c973f Revert "[CI] 【Hackathon 9th Sprint No.33】NO.33 supplement unit tests for functional modules (#5056)" (#5286) Juncai 2025-11-28 10:48:35 +08:00
  • fc88eebc32 [CI][XPU] add pd disaggregation (#5179) ddchenhao66 2025-11-28 10:44:27 +08:00
  • b52e1bd281 [Cherry-Pick][Feature] dy-c8 prefix caching (#4918) kevin 2025-11-28 10:37:49 +08:00
  • bab01e9f85 [Cherry-pick][XPU][CI] Set pip index URL to Tsinghua mirror (#5277) (#5280) Jiaxin Sui 2025-11-28 10:14:14 +08:00
  • 89ed1a9e84 [Cherry-pick][XPU][CI] Set pip index URL to Tsinghua mirror (#5277) (#5281) Jiaxin Sui 2025-11-28 10:13:41 +08:00
  • fd1313cdb4 [Cherry-Pick][Feature] support flash_mask_attention backend(#5134) (#5256) lizhenyun01 2025-11-28 10:13:00 +08:00
  • aba4fc657f [Feature] support flash_mask_attention backend (#5134) lizhenyun01 2025-11-28 10:12:16 +08:00
  • b935101008 Create test_prompt_ids.py Divano 2025-11-28 10:11:51 +08:00
  • 07cb11e51d [XPU][CI] Set pip index URL to Tsinghua mirror (#5277) Jiaxin Sui 2025-11-27 22:12:20 +08:00
  • 6a6bf4ea24 [CI] Fix test streaming with stop str (#5275) YuBaoku 2025-11-27 20:51:39 +08:00
  • 35f85baf09 [BugFix]fix v1 loader lm head fp32 (#5270) chen 2025-11-27 20:12:56 +08:00
  • b52ec268f7 [CI]fix run batch unit test (#4628) xiaolei373 2025-11-27 19:38:04 +08:00
  • 1372d6d01d [CI] disable test_engine_client.py unit test (#5272) YuBaoku 2025-11-27 17:37:54 +08:00
  • 9b0c65ba57 Add method to disable sequence parallel MoE if needed (#5268) Yuanle Liu 2025-11-27 16:28:24 +08:00
  • f637ba708c [Cherry-Pick] MTP split draft_tokens into standalone post-processing path(#5205) (#5232) SunLei 2025-11-27 15:30:00 +08:00
  • 69b4d058ad cp_fix_bug (#5253) kevin 2025-11-27 15:15:49 +08:00
  • 051b82b4c8 [Docs] add qwen25-vl docs (#5243) CSWYF3634076 2025-11-27 15:05:57 +08:00
  • 5a67a6d960 [XPU] support kernel for mtp(base) (#4748) cmcamdy 2025-11-27 15:05:44 +08:00
  • e63d715fc3 [BugFix][Metrics] Fix Prometheus Multiprocess Metrics Issues and Add ZMQ Communication Metrics (#5185) fl0w2o48 2025-11-27 15:05:09 +08:00
  • ce9a49f6bf [PD Disaggregation] Add unittest for splitwise deployment with using rdma (#5189) Juncai 2025-11-27 14:27:17 +08:00
  • 373b5c3807 [CI] 【Hackathon 9th Sprint No.41】NO.41 supplement unit tests for functional modules (#5062) xunyoyo 2025-11-27 14:24:19 +08:00
  • ef5aa5c03b [BugFix] fix cuda-python requirement (#5261) Yuanle Liu 2025-11-27 13:58:18 +08:00
  • 84c7fa49a5 [CI]【Hackathon 9th Sprint No.50】NO.50 supplement unit tests for functional module fastdeploy/entrypoints/engine_client.py (#5045) essos 2025-11-27 12:43:00 +08:00
  • bbcd92c8a0 [BugFix] fix mtp logprob bugs in chunk prefill (#5234) GoldPancake 2025-11-27 11:32:01 +08:00
  • cfc5b0ccf9 [BugFix] fix mtp logprob bugs in chunk prefill (#5244) GoldPancake 2025-11-27 11:31:29 +08:00
  • 3d74a4baf6 [Cherry-Pick] MTP split draft_tokens into standalone post-processing path(#5205) (#5231) SunLei 2025-11-27 11:23:38 +08:00
  • c424e08dc5 [Speculative Decoding] split draft_tokens into standalone post-processing path (#5205) SunLei 2025-11-27 11:22:41 +08:00
  • a12eaf9171 [CI] 【Hackathon 9th Sprint No.33】NO.33 supplement unit tests for functional modules (#5056) xunyoyo 2025-11-27 11:05:50 +08:00
  • bce3739a57 [BugFix] fix v1_loader for wint8 rl (#5224) chen 2025-11-26 21:19:54 +08:00
  • cb56d46694 [Optimization] Refine row parallel bias and nranks and moe all_reduce (#5247) Yuanle Liu 2025-11-26 21:09:09 +08:00
  • bf30f45738 [BugFix] fix vl performance bug (#5181) kevin 2025-11-26 21:06:52 +08:00
  • 209970836e [BugFix] BF16 MoE Cutlass Backend Support EP (#5242) chen 2025-11-26 19:16:22 +08:00
  • bdcc952eeb fix pd-split first step bug (#5246) freeliuzc 2025-11-26 18:02:32 +08:00
  • ba915e03e1 [BugFix]Fix attention mask bug in D-Node of PD-split mode (#5245) freeliuzc 2025-11-26 17:56:28 +08:00
  • 710753377f [Cherry-Pick] Fix eplb noaux(#5239) (#5240) xiaoxiaohehe001 2025-11-26 17:51:10 +08:00
  • 61fc368066 [Fix] fix eplb noaux (#5239) xiaoxiaohehe001 2025-11-26 17:50:51 +08:00
  • bc118c3d2d fix prompt_token_ids is None in request dict (#5241) kxz2002 2025-11-26 17:10:45 +08:00