Commit Graph

  • 0f75b62de2 [BugFix] Fix profile run in pd-disaggregated deployment (#4584) 李泳桦 2025-10-31 14:42:00 +08:00
  • 9cf4005e62 [Cherry-pick] Fix profile run in pd-disaggregated deployment (#4693) 李泳桦 2025-10-31 14:41:35 +08:00
  • 7847b44172 [Graph Optimization] Refactor default capture list (#4631) RAM 2025-10-31 14:18:27 +08:00
  • eef85e4ff0 [BugFix] update eb5 video chunk (#4705) yangjianfengo1 2025-10-31 14:08:23 +08:00
  • c785c2dab1 [Scheduler] update v1 prefill batch (#4563) kevin 2025-10-31 14:03:18 +08:00
  • 64e875b460 [Scheduler] update v1 prefill batch (#4611) kevin 2025-10-31 14:03:01 +08:00
  • dde7ba3f9e [CI]add_tokenizer_cli_unitest (#4620) xiaolei373 2025-10-31 13:57:51 +08:00
  • 412097c1b8 benchmark tool: support specifying response_format in constrained decoding scenarios (#4718) ophilia-lee 2025-10-31 12:26:24 +08:00
  • 10de7a3b82 add flops and bandwidth to test_ffn.py (#4704) 周周周 2025-10-31 12:13:59 +08:00
  • 9b18f0b55d cache scale load (#4624) Sunny-bot1 2025-10-31 11:58:33 +08:00
  • 3f15e6fa15 load cache scale (#4623) Sunny-bot1 2025-10-31 11:57:57 +08:00
  • 1f3ce65b58 [Feature] support mtp distribution equivalence verification (#4699) GoldPancake 2025-10-31 11:45:04 +08:00
  • 28de91b50f [Graph Optimization] SOT+CUDAGraph support ERNIE4.5T VL 28B / 424B (#4645) Ryan 2025-10-31 11:38:43 +08:00
  • 937bcfc6ed [XPU] [CI] Lock xvllm version (#4715) plusNew001 2025-10-31 11:32:38 +08:00
  • b61a272385 [BugFix] fix unittest of get_save_output_v1 (#4701) Longzhi Wang 2025-10-31 11:23:49 +08:00
  • 802dfa6524 fix --logprobs-mode raw_logits (#4681) (#4712) chen 2025-10-31 10:50:31 +08:00
  • a2870ed4a9 [Feature] Unify the registration name recognition for tool_parser and reasoning_parser to “-” (#4668) kxz2002 2025-10-31 10:45:27 +08:00
  • 82bd7e5db4 [BugFix] Fix finish reason in _create_chat_completion_choice (#4582) kxz2002 2025-10-31 10:42:19 +08:00
  • ea866e4b34 [XPU] [CI] Add Vl case (#4649) plusNew001 2025-10-31 10:38:09 +08:00
  • 2e7b7a42c2 [XPU] xpu currently disable prefix cache for VL model (#4694) ddchenhao66 2025-10-31 10:37:41 +08:00
  • b87384aa70 [XPU] xpu currently disable prefix cache for VL model (#4695) ddchenhao66 2025-10-31 10:36:39 +08:00
  • b73a78155f fix --logprobs-mode raw_logits (#4681) chen 2025-10-30 19:53:42 +08:00
  • 40b87065cc fix docs bug (#4703) ming1753 2025-10-30 19:53:40 +08:00
  • 35286ce31a fix total_block_num init error in worker_process (#4687) zhouchong 2025-10-30 19:53:09 +08:00
  • 7dc9d9885e [BugFix] fix offline llm chat "enable_thinking" is always "False" (#4686) kxz2002 2025-10-30 19:45:41 +08:00
  • 0089287534 [noauxtc_kernel] remove useless code (#4643) 周周周 2025-10-30 18:59:04 +08:00
  • ec7746bd55 Update multi-node_deployment.md Jiang-Jia-Jun 2025-10-30 16:40:30 +08:00
  • ca52cadd74 Update multi-node_deployment.md Jiang-Jia-Jun 2025-10-30 16:40:08 +08:00
  • 8b9c9463cd add real gate_correction_bias weight to mock un-balanced dispatch (#4676) 周周周 2025-10-30 15:13:21 +08:00
  • f1de348cbf Update common_engine.py Jiang-Jia-Jun 2025-10-30 14:05:04 +08:00
  • 71135d58a0 Change log level from info to debug for response Jiang-Jia-Jun 2025-10-30 14:02:50 +08:00
  • 9defdaed6b [BugFix] Fix PaddleOCRVL bug (#4678) ming1753 2025-10-30 13:49:08 +08:00
  • 05c1167c74 fix mtp logprob bugs (#4663) GoldPancake 2025-10-30 13:47:23 +08:00
  • d7d0112bbf [CI] Add test for paddleocr_vl (#4627) Haonan Luo 2025-10-30 13:40:04 +08:00
  • cd3b7cc392 [Graph Optimization] Add the CUDAGraph usage switch for Draft Model (#4601) RAM 2025-10-30 11:44:50 +08:00
  • cfdd1600a5 update doc (#4675) ApplEOFDiscord 2025-10-30 11:19:04 +08:00
  • 40cfed5bc9 [Feature] support eb5 video chunk (#4671) Dangweichong 2025-10-30 11:01:32 +08:00
  • fddda50cb9 Add ut for speculative sampler (#4650) GoldPancake 2025-10-30 10:37:49 +08:00
  • 1712e1351b 【Hackathon 9th No.86】autogen MoeFastHardamardImplWrapper template_instantiation (#4592) Zhenghai Zhang 2025-10-30 10:28:36 +08:00
  • e25c067f70 [OP] Add InferShape&InferDtype for per_token_quant_padding (#4667) Ryan 2025-10-30 10:28:26 +08:00
  • 52a6e0be41 [Cherry-Pick] add mm token usage (#4648) ApplEOFDiscord 2025-10-30 09:58:07 +08:00
  • 50be19a88a [EP] fix several bugs in data parallel (#4657) ltd0924 2025-10-30 09:50:49 +08:00
  • 895ca7694e [Feature] add a new reasoning parser (#4571) (#4664) kxz2002 2025-10-30 09:49:53 +08:00
  • dab04ab413 add noaux_tc to unitest fused_moe (#4656) 周周周 2025-10-29 21:50:25 +08:00
  • fd5015263d Increase pytest timeout for XPU test (#4665) plusNew001 2025-10-29 19:57:14 +08:00
  • c30bfb294f [Feature] add a new reasoning parser (#4571) kxz2002 2025-10-29 18:16:50 +08:00
  • 19df1aec2b [Docs] add Qwen25vl yaml (#4662) xjkmfa 2025-10-29 17:39:40 +08:00
  • df72033adb [XPU] fix pos_emb_type bug (#4639) Lucas 2025-10-29 17:14:47 +08:00
  • 8f40dfa9bf [XPU] fix pos_emb_type bug (#4638) Lucas 2025-10-29 17:14:32 +08:00
  • 7b275efc59 [Docs] Add PaddleOCR-VL-0.9B best practices (#4661) ming1753 2025-10-29 16:58:38 +08:00
  • d68345cb7e [Docs] Add PaddleOCR-VL-0.9B best practices (#4658) ming1753 2025-10-29 16:48:54 +08:00
  • 21bb2d69d1 [XPU] Update the return value of TextImageGatherScatter (#4646) ddchenhao66 2025-10-29 16:17:23 +08:00
  • c92eeed45d [XPU] Update the return value of TextImageGatherScatter (#4636) ddchenhao66 2025-10-29 16:17:01 +08:00
  • 1081cad4a0 set default value as false gongshaotian 2025-10-28 19:10:45 +08:00
  • fa85956c6f add draft model using cudagraph switch gongshaotian 2025-10-27 14:56:34 +08:00
  • 006c7e5a0d [Feature] Support attention dp balance for mixed deployment (#4594) lizhenyun01 2025-10-29 15:23:51 +08:00
  • 14f8cddaf1 [Feature] add mm token usage (#4570) ApplEOFDiscord 2025-10-29 14:37:12 +08:00
  • fc5cd1adb1 [BugFix] Fix graph opt test case (#4634) RAM 2025-10-29 13:28:04 +08:00
  • a0d5426ab6 fix ci test case (#4635) RAM 2025-10-29 13:26:38 +08:00
  • 0dde936e93 [BugFix] fix total_block_num init error in worker_process (#4553) RichardWooSJTU 2025-10-29 11:42:12 +08:00
  • 14e7d88ea4 [feature] support reward api (#4518) xiaolei373 2025-10-29 00:20:28 +08:00
  • a012e3608b [Feature] support logits processors (#4515) 李泳桦 2025-10-29 00:08:53 +08:00
  • 24b9505971 add einops dependency (#4633) zhang-prog 2025-10-28 22:17:13 +08:00
  • 20756cd2bb fix import jit.marker.unified (#4622) Yuanle Liu 2025-10-28 22:11:03 +08:00
  • 561b9f38d3 [BugFix] fix paddleocr prefix cache bug (#4625) ming1753 2025-10-28 21:38:12 +08:00
  • fff5fb5e39 [Graph Optimization] Refactor default capture list (#4617) RAM 2025-10-28 21:31:02 +08:00
  • 0a0c74e717 [XPU] Support PaddleOCR-VL model for XPU (#4529) Lucas 2025-10-28 20:35:04 +08:00
  • 2a9ed72533 feat: add support for API usage with multimodal models (#4548) SunLei 2025-10-28 20:23:46 +08:00
  • e1ac90d787 [CI] Revert test_rollout_model directory change (#4626) YuBaoku 2025-10-28 20:14:00 +08:00
  • 567f61072c [CI][BugFix] fix port conflicts in concurrent ci test and add more unit test on async_llm (#4616) zhouchong 2025-10-28 19:04:24 +08:00
  • cd6d1f633c [XPU]add xpu ci w4a8 case (#4501) yyssys 2025-10-28 19:02:29 +08:00
  • 07956a87b3 [Graph Optimization] Fix IR graph dependency error exposed after enabling SOT by updating the return value of TextImageGatherScatter (#4610) Ryan 2025-10-28 18:31:23 +08:00
  • 4d2f478d53 [BugFix] fix TPDP mix parallel infer (#4583) lizhenyun01 2025-10-28 16:58:20 +08:00
  • c63361fd1d [Speculative Decoding][MTP]Support mtp in epdptp mode (#4614) freeliuzc 2025-10-28 16:02:47 +08:00
  • b4014834a9 Extend sleep time to 10 seconds in switch_service (#4618) Zhang Yulong 2025-10-28 15:19:21 +08:00
  • 86d5006a57 [Graph Optimization][Speculative Decoding] Update yaml and fix typo (#4612) RAM 2025-10-28 11:43:26 +08:00
  • b2c6c41447 [CI] Relocate server test cases from ci_use directory to e2e (#4608) YuBaoku 2025-10-28 11:37:30 +08:00
  • 31180a6a13 fix_run_batch_unittest (#4613) xiaolei373 2025-10-28 10:38:06 +08:00
  • 0b196d82f3 [docs] add cli uasge to docs (#4569) xiaolei373 2025-10-28 10:35:11 +08:00
  • 6426414a0f [Feature] EngineWorkerQueue anonymous port (#4597) Daci 2025-10-28 10:22:37 +08:00
  • 7681375a19 [BugFix] PaddleOCR-VL fix FD_DEBUG type and support v1 loader (#4605) ming1753 2025-10-28 09:47:47 +08:00
  • c9be8762b6 [MTP]Merge support attn (#4591) freeliuzc 2025-10-27 21:13:08 +08:00
  • 6dcf5a3e87 fix: resolve decode bug in offline stream output (#4603) zhouchong 2025-10-27 20:17:10 +08:00
  • 3729e910a6 remove dev sync in prefill (#4598) 周周周 2025-10-27 19:54:43 +08:00
  • 64d1aa973b [Unitest]Add unitest of Attention Layer (#4494) K11OntheBoat 2025-10-27 19:18:50 +08:00
  • 70aa7423f8 benchmark tool: adapt to the SGLang framework (#4607) ophilia-lee 2025-10-27 18:52:56 +08:00
  • c91c5040c4 [XPU] update kunlun doc about supported models (#4586) ddchenhao66 2025-10-27 18:31:51 +08:00
  • 25a983ba9c 1.fix the bug of draft model with ep 2.fix sampler bug (#4589) RAM 2025-10-27 17:47:34 +08:00
  • 8aab4e367f [Feature] mm support prefix cache (#4134) kevin 2025-10-27 17:39:51 +08:00
  • a4fb3d4ff0 [CI] Fix path error of /re-run (#4606) YuBaoku 2025-10-27 17:03:11 +08:00
  • 5c63a089f6 [Feature] Support logprobs_mode (#4567) chen 2025-10-27 14:27:48 +08:00
  • 2cf0b0b715 [Feature] support mtp distribution equivalence verification (#4566) GoldPancake 2025-10-27 11:01:28 +08:00
  • acd331780c [V1 loader] Qwen25 VL support v1 loader and torch style safetensors load (#4388) CSWYF3634076 2025-10-27 10:54:15 +08:00
  • 5c6105f4a2 [XPU] bind some OPs for VL model with pybind (#4522) Lucas 2025-10-27 10:50:08 +08:00
  • cdc40cdc2a [Others] api server exits when worker process is dead (#3271) 李泳桦 2025-10-27 10:23:48 +08:00
  • ebae69b1f8 [CI] Add /re-run command in PR comments to restart failed CI workflows (#4593) YuBaoku 2025-10-27 10:18:56 +08:00
  • 83b720804b Clean up ports after processing results (#4587) Zhang Yulong 2025-10-27 10:13:24 +08:00
  • 1a21e6c529 support mtp draft model with ep (#4581) RAM 2025-10-27 09:34:54 +08:00
  • dc1a9c7287 perf: Optimize task queue communication from engine to worker (#4531) SunLei 2025-10-25 22:45:38 +08:00
  • 327fa4c255 [DataProcessor] add reasoning_tokens into usage info (#4520) kxz2002 2025-10-25 16:57:58 +08:00