Commit Graph

  • 0355235fb9 [FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig (#4400) YuanRisheng 2025-10-16 20:00:37 +08:00
  • b87e2c6184 [CUDAGraph]Add support for custom all-reduce operators under SOT mode (#4386) Ryan 2025-10-16 19:31:19 +08:00
  • 26ff2f8683 [XPU] refine fused moe (#4219) zhupengyang 2025-10-16 19:04:07 +08:00
  • 3bbe99eae7 [Intel HPU] Enable dist sampler on intel hpu platform (#4445) Jianyu Li 2025-10-16 19:02:27 +08:00
  • 4251ac5e95 【Fix】 remove text_after_process & raw_prediction (#4421) LiqinruiG 2025-10-16 19:00:18 +08:00
  • 8f77adc381 Add data dictionary for API response processing (#4454) Zhang Yulong 2025-10-16 17:23:11 +08:00
  • 6adfbe07ad 【Hackathon 9th No.86】autogen MultiQueryDecoderAttention template_instantiation -part (#4383) Zhenghai Zhang 2025-10-16 17:08:19 +08:00
  • f72be7a2c8 [BUG] fix ep bug (#4275) kevin 2025-10-16 16:46:40 +08:00
  • 5abf59715d perf: optimize ZMQ communication with async queue and single-threaded… (#4444) SunLei 2025-10-16 15:46:26 +08:00
  • 98f8c3703a Add filtering for failed requests in benchmark outputs (#4448) Zhang Yulong 2025-10-16 14:57:47 +08:00
  • cfd93c0966 fix: image token output (#4399) guozhuangzhuang 2025-10-16 14:51:32 +08:00
  • 9dc3968c13 [benchmark] Fix benchmark duration calculation logic (#4446) Zhang Yulong 2025-10-16 14:36:29 +08:00
  • a5063b96c8 [XPU] moe support VL 0-dim input (#4408) Lucas 2025-10-16 14:01:01 +08:00
  • 83f97d1196 support speculate_limit_thinking_content_length_v2 (#4428) Yuanle Liu 2025-10-16 13:23:16 +08:00
  • fd5dd1a0f1 [Bugfix]fix ep clear buffer perf (#4389) gaoziyuan 2025-10-16 13:05:39 +08:00
  • 670aaa3f83 [Bug fix] Fix pd for x1 thinking (#4433) chenjian 2025-10-16 12:03:45 +08:00
  • 8e392f0ea6 [XPU] support prefix cache (#4423) ddchenhao66 2025-10-16 11:27:41 +08:00
  • 4178c110d2 [Bug Fix] fix outdated doc and disable mm model prefix caching (#4425) ApplEOFDiscord 2025-10-16 11:10:33 +08:00
  • 5bde20b0c9 [BugFix] fix config bugs (#4370) ltd0924 2025-10-16 10:25:21 +08:00
  • 0982dfb705 Cherry-Pick PR4033 (#4082) YUNSHEN XIE 2025-10-15 22:09:02 +08:00
  • 7f94f063ff Update benchmark_serving.py (#4438) Zhang Yulong 2025-10-15 20:36:19 +08:00
  • b4b579a7ed Feature:Add support for Pooling Model Embedding and provide an OpenAI-compatible API. (#4344) SunLei 2025-10-15 19:42:59 +08:00
  • b3225f9a87 [Bug fix] Fix x1 thinking label bug (#4434) chenjian 2025-10-15 19:03:41 +08:00
  • 744287e1a9 fix param (#4419) freeliuzc 2025-10-15 18:44:24 +08:00
  • 74ae214f1a fix ep perf (#4381) gaoziyuan 2025-10-15 18:38:20 +08:00
  • fbdb056de0 [BUGFIX] clear request #4286 (#4402) ltd0924 2025-10-15 17:43:28 +08:00
  • bdc0207277 [XPU] fix VL multi-batch accuracy issue (#4394) Lucas 2025-10-15 17:27:43 +08:00
  • d8841b7b40 [BugFix] fix workers=1 (#4364) ltd0924 2025-10-15 17:06:25 +08:00
  • bcaa98ff9c V1 loader default (#4251) bukejiyu 2025-10-15 16:49:17 +08:00
  • 6c15945e4d fix_fa3 (#4429) xiaoxiaohehe001 2025-10-15 16:19:39 +08:00
  • e98c1c2f47 Disable gcu ci (#4427) tianshuo78520a 2025-10-15 16:06:25 +08:00
  • c3499875bd [MTP]support mtp chunk_prefill_v1 (#4365) freeliuzc 2025-10-15 15:33:59 +08:00
  • 55064b8c57 [CI] Fix download instability issues (#4424) YuBaoku 2025-10-15 15:11:14 +08:00
  • adeee84dd6 fix block_wise_fp8_v1_loader_moe_shape (#4385) chen 2025-10-15 14:23:38 +08:00
  • 6938df9c23 【Fix CI Bug】Fix ci bug (#4413) AIbin 2025-10-15 14:19:04 +08:00
  • 4efd073a41 fix block_wise_fp8_v1_loader_moe_shape (#4384) chen 2025-10-15 14:08:53 +08:00
  • 582aebd48b [MTP]support mtp chunk_prefill_v1 (#4366) freeliuzc 2025-10-15 13:21:32 +08:00
  • ffe7af8a97 [fix] fix requests & block metrics (#4404) 李泳桦 2025-10-15 11:49:24 +08:00
  • abb62624b8 [fix] Fixed the issue of excessive/redundant spans being returned for streaming requests. (#4375) qwes5s5 2025-10-15 11:47:47 +08:00
  • 28d1b6cd97 [BugFix] fix multinode bugs (#4377) ltd0924 2025-10-15 11:43:39 +08:00
  • d6f775e33b [XPU] fix ep (#4393) zhupengyang 2025-10-15 11:41:05 +08:00
  • 6d0cc0dd9c [Optimization] Optimize split_q_block kernel (#4367) Sunny-bot1 2025-10-15 11:28:00 +08:00
  • e0946ae128 [fix] fix requests & block metrics (#4325) 李泳桦 2025-10-15 11:19:20 +08:00
  • c4f866c457 update benchmark tools (#4416) Zhang Yulong 2025-10-15 11:15:25 +08:00
  • 4b647d17de [CI] Fix partial instability issues (#4418) YuBaoku 2025-10-15 11:13:43 +08:00
  • dd425b89ed [BugFix] fix cache port and zmq close bugs (#4371) ltd0924 2025-10-15 10:29:30 +08:00
  • bc7193f21d Add 4-in-1 video selection (#4372) yangjianfengo1 2025-10-15 09:58:24 +08:00
  • fa9a3eef4f Update token_processor.py (#4395) ltd0924 2025-10-15 09:43:28 +08:00
  • c1a2e78b18 add install docs (#4414) yangjianfengo1 2025-10-14 20:17:29 +08:00
  • bf12bee887 [MTP][Cfp8]supports spec dynamic cfp8 (#4290) (#4392) freeliuzc 2025-10-14 19:36:41 +08:00
  • 7f85f00a7d fix offline inference doc (#4412) ApplEOFDiscord 2025-10-14 19:25:21 +08:00
  • 14eb8b4f8b add x1 a3b quantization (#4397) tianlef 2025-10-14 15:04:06 +08:00
  • 73c8e0849f 【Hackathon 9th No.67】add speculate_verify (#4326) co63oc 2025-10-14 14:13:17 +08:00
  • 7b04a1298c [CI] fix diff_error temporarily (#4391) YuBaoku 2025-10-14 11:50:01 +08:00
  • 6f53b67f6c [CI] fix diff_error temporarily (#4390) YuBaoku 2025-10-14 11:13:40 +08:00
  • a751d977bc [Optimization] Fuse get_max_len and get_kv_max_len (#4369) Sunny-bot1 2025-10-13 20:35:00 +08:00
  • 425205b03c [Doc] fix the port conflict issue in the usage example (#4379) YuBaoku 2025-10-13 20:17:06 +08:00
  • 0b7a5778ab [Executor]CUDAGraph support Speculate Decode (#4258) Jundong Liu 2025-10-13 15:21:41 +08:00
  • 2d641078c3 【Hackathon 9th No.20】add unit tests for masked_per_token_quant (#4111) ooo oo 2025-10-13 14:51:11 +08:00
  • 584d116889 [Doc] fix document navigation link paths (#4368) yyssys 2025-10-13 11:01:35 +08:00
  • 263f0735c0 fix bug for support enable thinking and not in a batch (#4331) chenjian 2025-10-13 10:39:10 +08:00
  • 07db281647 [Cherry-Pick][BugFix]fix the bug for prefilled_step_idx signal of cache_messager in cudagraph and PD (#4252) Zero Rains 2025-10-13 10:18:53 +08:00
  • 8d629568f2 [MTP]fix speculate-decoding in dpep mode (#4351) freeliuzc 2025-10-11 17:16:57 +08:00
  • a21e16ee5f [XPU] fix XPU CI bug (#4358) plusNew001 2025-10-11 14:48:14 +08:00
  • a2ec2c4152 [FDConfig]Remove max_model_len in FDConfig (#4350) YuanRisheng 2025-10-11 14:04:17 +08:00
  • 365601ea5a [MTP]support more branchs in topp kernel (#4352) freeliuzc 2025-10-11 11:33:52 +08:00
  • 0c4c28d799 Update rollout_model.py (#4349) gaoziyuan 2025-10-11 11:30:05 +08:00
  • 5035dd82ed [MTP]support more branchs in topp kernel (#4353) freeliuzc 2025-10-11 11:27:35 +08:00
  • 836ba294fc Remove unused import in engine_client.py (#3961) Jiang-Jia-Jun 2025-10-11 10:50:03 +08:00
  • b463a41a06 Update rollout_model.py (#4348) gaoziyuan 2025-10-11 10:48:09 +08:00
  • 3f535b45a2 [Feature] support prefix cache in DP (#4359) ltd0924 2025-10-11 10:12:12 +08:00
  • 28aa18bfc1 [Feature] support prefix cache + dp (#4356) ltd0924 2025-10-11 10:02:08 +08:00
  • 368049673b Add DeepSeek model end-to-end CI (#4360) AIbin 2025-10-11 08:33:37 +08:00
  • 533896fd63 fix paddle_peak_increase size (#4355) AIbin 2025-10-10 21:31:38 +08:00
  • f7eaca3971 【Bug Fix】mla enables tensorcore by default (#4354) AIbin 2025-10-10 20:45:16 +08:00
  • 245931f53d add release images build job (#4265) YUNSHEN XIE 2025-10-10 16:35:49 +08:00
  • b489943261 Update rollout_model.py (#4347) gaoziyuan 2025-10-10 16:21:05 +08:00
  • 6fd3e72da1 [FastDeploy Cli] Bench Command eval and throughput (#4239) qwes5s5 2025-10-10 16:17:44 +08:00
  • 3aa04fbf21 [MTP][Cfp8]supports spec dynamic cfp8 (#4290) lzy 2025-10-10 16:08:10 +08:00
  • 20c7b741f4 [XPU] Support W4A8C8-TP4-300B Model (#4068) yinwei 2025-10-10 15:41:32 +08:00
  • c46d5e48f8 【Hackathon 9th No.86】autogen MultiQueryAppendC8Attention template_instantiation -part (#4330) Zhenghai Zhang 2025-10-10 15:07:48 +08:00
  • c4ebaf8a07 【Inference Optimize】MLA Tensor-Core is enabled by default (#4335) AIbin 2025-10-10 10:54:56 +08:00
  • 3c9eedd562 Simplify CUDAGraph creation logic (#4298) Yuanle Liu 2025-10-10 10:46:16 +08:00
  • fd5fd0bdd7 Remove redundant inplace outputs for append_attention (#4341) Nyakku Shigure 2025-10-10 10:45:26 +08:00
  • 5f80862578 Remove redundant inplace outputs for append_attention (#4340) Nyakku Shigure 2025-10-10 10:21:27 +08:00
  • aa27b03bc0 [Executor]CUDAGraph support Speculate Decode (#3769) RAM 2025-10-09 21:18:29 +08:00
  • 7b1689f437 schedule_bugfix (#4336) AIbin 2025-10-09 20:40:10 +08:00
  • 3cb4b4d7d4 [Doc] Update xpu fastdeploy version to 2.2.1 (#4338) yyssys 2025-10-09 20:14:07 +08:00
  • b650867fff Update docs (#4339) yangjianfengo1 2025-10-09 20:10:58 +08:00
  • 48fd5d757d Support MLA_CACHE & Fix V1_Schedule Bug (#4318) AIbin 2025-10-09 12:11:25 +08:00
  • 791b101195 revert worker process ipc signal suffix (#4323) RichardWooSJTU 2025-09-30 18:56:41 +08:00
  • af3872215e [BugFix]remove redundant includes (#4312) fangfangssj 2025-09-30 17:54:19 +08:00
  • d14aadf70e [FIx] CI Approve fix (#4316) Zero Rains 2025-09-30 02:38:24 +08:00
  • e42dc8c694 [BUGFIX] clear request (#4320) v2.2.1 ltd0924 2025-09-29 20:37:58 +08:00
  • 63a03ee152 [feature]2.2 custom_allreduce support cudagraph recapture (#4307) chen 2025-09-29 18:14:21 +08:00
  • 81959c7d88 [NewFeature]custom_allreduce support cudagraph recapture (#4305) chen 2025-09-29 15:56:54 +08:00
  • 7c919070f7 [Metax] support cutlass moe & optimize flash attention (#4208) xiaozude 2025-09-29 11:22:43 +08:00
  • 9cc2c99539 initial commit (#4304) kxz2002 2025-09-29 11:21:57 +08:00
  • 2b2b645296 Fix bugs of splitwise_complete_prefilled_step IPCsignal clear (#4309) K11OntheBoat 2025-09-29 11:21:22 +08:00
  • 3740e33fea 【Feature】ResourceManagerV1 support need block num notifying (#4220) RichardWooSJTU 2025-09-29 11:11:51 +08:00