Commit Graph

  • a2ab1f4462 [BugFix] fix mix splitwise pickle load error (#5488) Daci 2025-12-10 19:05:50 +08:00
  • 4403a21d4b [Metax] refactor cutlass moe and optimize flash attention (#5361) Neil Zhu 2025-12-10 17:15:17 +08:00
  • fbc9bce1e9 [Feature]Optimization of Thinking Pattern Framework (#4302) luukunn 2025-12-10 16:17:06 +08:00
  • 1bffac866b [PD Disaggregation] Decode does not cache requests for preallocating resource in default (#5453) Juncai 2025-12-10 15:54:16 +08:00
  • 7c72383efa [BugFix] fix decode time sleep bug (#5461) ming1753 2025-12-10 15:48:48 +08:00
  • 9e15191cce [BugFix] fix audio end bug (#5464) ming1753 2025-12-10 13:37:26 +08:00
  • 6715196924 fix attention bug in spec decoding (#5480) freeliuzc 2025-12-10 12:56:13 +08:00
  • c5c43e3b3d fix attention bug in spec decoding (#5481) freeliuzc 2025-12-10 12:55:13 +08:00
  • 1776d410d0 fix limit_thinking bug (#5469) Yuanle Liu 2025-12-10 11:56:35 +08:00
  • c5973c2087 fix limit_thinking bug (#5477) Yuanle Liu 2025-12-10 11:50:13 +08:00
  • 83a9ef51d7 [Others] add assert and only count the actual load in cuda_graph (#5445) 周周周 2025-12-10 11:22:54 +08:00
  • e38709b499 [BugFix] Fix limit_thinking early return logic in CUDA kernels (#5471) Copilot 2025-12-10 11:03:19 +08:00
  • 53460935ec fix attention bug in spec decoding (#5460) freeliuzc 2025-12-10 10:56:37 +08:00
  • 419b416376 [BugFix] [RL] remove shutdown_process_group/restart_process_group for RL (#5433) Yonghua Li 2025-12-09 20:32:37 +08:00
  • f08fb25cfe [Others] Maintain the mtp branch temporarily. (#5447) lzy 2025-12-09 19:41:33 +08:00
  • e9174f25e8 commit (#5452) 周周周 2025-12-09 19:36:58 +08:00
  • 1b1bfab341 [CI] Add unittest (#5328) Echo-Nie 2025-12-09 19:19:42 +08:00
  • 99f607eef5 [Others] Maintain the mtp branch temporarily. (#5446) lzy 2025-12-09 19:17:53 +08:00
  • 95eab9f9ee [Feature] support stop_token_ids (#5399) lizexu123 2025-12-09 17:49:12 +08:00
  • df67379bc3 [Metax] modify wrapSize to WARP_SIZE (#5442) xiaozude 2025-12-09 17:44:02 +08:00
  • e397c4fba6 [Others] remove add_bias option (#5425) Haonan Luo 2025-12-09 17:39:35 +08:00
  • 1f63000ef9 allow 0-dim tensor into ar (#5451) 周周周 2025-12-09 16:53:35 +08:00
  • f7c6b8c4ec modify approve (#5443) YuanRisheng 2025-12-09 16:52:10 +08:00
  • b491dcd23c [Optimization] compute real max_logprobs in batch (#5430) (#5448) chen 2025-12-09 16:48:06 +08:00
  • b0cf2c4b7a [Feature] Support prefill batch inference for pooling models. (#5436) lizexu123 2025-12-09 16:21:00 +08:00
  • 31410415db FA3 support qwen3 (#5441) 周周周 2025-12-09 16:16:16 +08:00
  • 2c55bbc3f8 support dynamic load for normal (#5437) gaoziyuan 2025-12-09 15:07:19 +08:00
  • 83ea9646f9 [PD Disaggregation] Unify the disaggregation info and the pd communication (#5438) Juncai 2025-12-09 14:44:59 +08:00
  • 8178e3fc6a [XPU] add speculate_step_system_cache (#5397) RuohengMa 2025-12-09 14:40:11 +08:00
  • e1c4a12e34 [Graph Optimization][CINN] Use CINN in PaddleOCR-VL ViT part (#5223) Nyakku Shigure 2025-12-09 14:37:00 +08:00
  • 8d99bac532 Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440) K11OntheBoat 2025-12-09 14:17:30 +08:00
  • 76649b45c1 [Optimization] compute real max_logprobs in batch (#5430) chen 2025-12-09 14:15:05 +08:00
  • f7e832efaf [BugFix] fix mm cudagraph (#5266) kevin 2025-12-09 11:51:00 +08:00
  • c06a6234b9 [Metax] optimize mla attention (#5258) xiaozude 2025-12-09 11:18:19 +08:00
  • 4b9e2c5c8e [BugFix] 0 not into cuda graph to save memory (#5426) (#5432) 周周周 2025-12-09 11:08:55 +08:00
  • 5d9b5e4a5b [Engine] [Feature] Refactor async_llm:cross-process with EngineService,based on zmq communication (#4868) zhouchong 2025-12-09 10:53:40 +08:00
  • 2f208db4e9 [Feature] Multimodal Model P / D Separation (#5323) Daci 2025-12-09 10:47:42 +08:00
  • a8ffc22032 [BugFix] fix init RequestOutput (#5419) Juncai 2025-12-09 10:20:22 +08:00
  • 02df3c5097 FD registers to the Router only once. (#5431) Juncai 2025-12-08 22:07:11 +08:00
  • 5fb93d84f5 [Feature] [Benchmark]: add ZMQ-based FMQ implementation and benchmark tools (#5418) SunLei 2025-12-08 22:04:49 +08:00
  • 364197c4b5 support w4afp8 mtp (#5429) Sunny-bot1 2025-12-08 20:24:00 +08:00
  • 31436a35e4 [Cherry-Pick] [BugFix] [RL] remove shutdown_process_group/restart_process_group for RL (#5433) (#5434) Yonghua Li 2025-12-08 19:13:06 +08:00
  • 438c9f785a [BugFix] 0 not into cuda graph to save memory (#5426) 周周周 2025-12-08 16:47:44 +08:00
  • b3031d2324 fix mask (#5385) freeliuzc 2025-12-08 15:24:09 +08:00
  • d1bd40d44c [CI][Hackathon 9th Sprint Example No. 16] Add unit tests for the fastdeploy/input/ernie4_5_vl_processor/process.py module (#5264) kesmeey 2025-12-08 14:30:15 +08:00
  • 33e4f88e45 [BugFix] fix can not enter into cuda graph (#5422) 周周周 2025-12-08 14:20:52 +08:00
  • 2aea8a3a60 [Others] Remove useless code (#5404) 周周周 2025-12-08 13:59:46 +08:00
  • d4c16aa63e [BugFix][Cherry-Pick] fix can not enter into cuda graph (#5423) 周周周 2025-12-08 13:12:27 +08:00
  • 3066a0c34b Update FASTDEPLOY_VERSION to 2.4.0-dev Jiang-Jia-Jun 2025-12-08 11:21:46 +08:00
  • 1dceb1c48c Update setup.py Jiang-Jia-Jun 2025-12-08 11:21:26 +08:00
  • 80efe98f8d [PD Disaggregation] Add timestamp for analyzing splitwise deployment (#5317) Juncai 2025-12-08 10:08:44 +08:00
  • 7926add37c [Cherry-Pick][Loader][BugFix] Fix some parameters place on CPU in PaddleOCR-VL (#5413) (#5414) Nyakku Shigure 2025-12-08 10:01:20 +08:00
  • 0c66163dfd [Loader][BugFix] Fix some parameters place on CPU in PaddleOCR-VL (#5413) Nyakku Shigure 2025-12-08 10:01:00 +08:00
  • 707d1a1fc9 [New][RL] Support Rollout Routing Replay (#5405) (#5408) RAM 2025-12-08 10:00:35 +08:00
  • 7eea23f238 cp pr5373 pr5379 pr5410 (#5411) bukejiyu 2025-12-06 00:47:01 +08:00
  • c3a8a16f4c fix deepseek (#5410) bukejiyu 2025-12-06 00:45:48 +08:00
  • f6eb4dcc40 bf16 deepseek (#5379) bukejiyu 2025-12-05 22:23:30 +08:00
  • b2908b8e82 [New][RL] Support Rollout Routing Replay (#5405) RAM 2025-12-05 22:06:26 +08:00
  • 6961130e04 [Cherry-Pick] [BugFix] fix scheduler hang when input length is very close to max_model_len (#5394) release/2.2 Yonghua Li 2025-12-05 21:51:59 +08:00
  • c45e064f3d Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402) Jiang-Jia-Jun 2025-12-05 20:19:39 +08:00
  • 94c57e4175 [BugFix]remove _execute_empty_input (#5396) 周周周 2025-12-05 20:19:01 +08:00
  • d4979347ca [Bug fix] Fix the multi-input accuracy issue in the pooling model. (#5374) lizexu123 2025-12-05 20:18:17 +08:00
  • 96d2d4877b [RL] Support Rollout Routing Replay (#5321) RAM 2025-12-05 20:01:33 +08:00
  • 8545b705ed fix top_p_candidates (#5400) GoldPancake 2025-12-05 20:01:05 +08:00
  • bae3475926 [BugFix]Fix plugin loading logic and logging messages (#4909) wyw 2025-12-05 19:25:01 +08:00
  • 36b6506abe [Cherry-Pick][BugFix] Fix mtp dy-c8 bug(#5390) (#5389) kevin 2025-12-05 19:03:35 +08:00
  • db936ab3e4 fix mtp prefix_cache dy-c8 bug (#5390) kevin 2025-12-05 19:03:19 +08:00
  • 9b5b08cb72 [Cherry-Pick][BugFix] Fix async download(#5349) (#5347) kevin 2025-12-05 18:59:36 +08:00
  • c9d7f9e7c3 [BugFix] fix async download bug (#5349) kevin 2025-12-05 18:59:12 +08:00
  • 5b900667e3 [XPU] support ep4tp1+v1 loader (#5398) zccjjj 2025-12-05 18:51:15 +08:00
  • 35846909c7 [fix] fix scheduler hang when input length is very close to max_model_len (#5393) Yonghua Li 2025-12-05 18:23:42 +08:00
  • a8f8791668 [Optimization] Qwen2.5-VL support multi-batch prefill (#5269) Ayakouji 2025-12-05 18:22:39 +08:00
  • 8f2b85362d [XPU] support moe_expert_ffn TGEMM selection (#5375) Lucas 2025-12-05 17:49:40 +08:00
  • 3aed8d257d [XPU] redirect xvllm/xtdk/xhpc downloading log (#5388) Lucas 2025-12-05 17:34:17 +08:00
  • c83dc58105 [Feature] support Two batch overlap, mainly used in Prefill (#5078) 周周周 2025-12-05 14:58:50 +08:00
  • 1aefbef0b3 fix trace log (#5386) qwes5s5 2025-12-05 14:45:52 +08:00
  • d436640735 [BugFix] Fix flash_attn_backend lizhenyun01 2025-12-05 12:00:08 +08:00
  • 86b6430582 fix split_rope_cache_kv_encoder in mix mtp (#5384) cmcamdy 2025-12-05 14:33:17 +08:00
  • b5a7abe624 [XPU] [CI] Change Paddle Version to Nightly (#5346) Jiaxin Sui 2025-12-05 13:01:29 +08:00
  • ebe613ccc8 [Intel HPU] fix bug about RP 5138 (#5380) fmiao2372 2025-12-05 11:33:29 +08:00
  • 7b0b6e470a [XPU] support XDNN downloading function (#5365) Lucas 2025-12-05 11:16:45 +08:00
  • dd2e9a14c7 [BugFix] Compatible with asynchronous functions (#5378) ming1753 2025-12-05 11:05:21 +08:00
  • e927c65742 [XPU] [Optimization] [EP] EP communication optimization. (#5145) zccjjj 2025-12-05 10:03:45 +08:00
  • 620d1da1c9 deepseek torch (#5373) bukejiyu 2025-12-04 23:26:53 +08:00
  • 1b5fd79d6b [CI] disable test_schedule_output.py in unit_test (#5377) YuBaoku 2025-12-04 23:18:23 +08:00
  • 7f4fff4d1e fix get_request from scheduler (#5369) Juncai 2025-12-04 21:59:10 +08:00
  • 3878a99b69 [Feature] Support cache kv cache for output tokens (#4535) chenjian 2025-12-04 20:53:08 +08:00
  • b6f8069b36 [fix] update check_model_weights_status loop (#5249) Yonghua Li 2025-12-04 19:43:01 +08:00
  • 41c63f6056 remove fastsafetensors (#5371) Yuanle Liu 2025-12-04 19:22:04 +08:00
  • b7e1e6c953 [CE]change yaml name xiegegege 2025-12-04 19:14:11 +08:00
  • f88c159de1 [BugFix] Exit if neither modern nor legacy wheel dir is found (#5367) Nyakku Shigure 2025-12-04 16:45:48 +08:00
  • fbed0ef851 [Cherry-Pick][RL] Support Rollout Routing Replay (#5166) RAM 2025-12-04 16:35:30 +08:00
  • 3697110599 [Docs] update FAQ with logprobs MQ limits and deprecation (#5368) SunLei 2025-12-04 15:57:04 +08:00
  • f4119d51b4 [PD Disaggregation] support DP via v1 router and decouple DP and EP (#5197) Yonghua Li 2025-12-04 15:38:43 +08:00
  • 5cd17fd662 [Models] Add forward_meta to moe models' forward function (#5138) Longzhi Wang 2025-12-04 13:26:58 +08:00
  • f5bdb36e9b Reduce timeout in unittest (#5366) Juncai 2025-12-04 13:19:02 +08:00
  • 209006e6a6 [Intel HPU] fix memory fragmentation issue due to warmup process and fix moe all_reduce issue (#5357) fmiao2372 2025-12-04 11:29:41 +08:00
  • 946025480e [Bug fix] fix pooling models (#5358) lizexu123 2025-12-04 11:06:30 +08:00
  • a52aea073c fix logprobs (#5335) qwes5s5 2025-12-04 10:38:51 +08:00
  • 96ff402d44 [Optimization] Remove version constraints for setuptools, uvicorn, triton and safetensors, del fastsafetensors (#5330) Echo-Nie 2025-12-04 10:07:31 +08:00