Commit Graph

  • 70633c6641 [fix] fix gpu_caches key (#4311) 李泳桦 2025-09-28 21:32:57 +08:00
  • 1282ebe1b1 add_cli_tokenizer (#4278) xiaolei373 2025-09-28 20:47:35 +08:00
  • 18f9c41370 Update paddleformers version to 0.1.6 (#4310) Yuanle Liu 2025-09-28 20:17:18 +08:00
  • 6265f4385f [feat] support prefix cache clearing when /clear_load_weight is called (#4008) 李泳桦 2025-09-28 19:42:53 +08:00
  • 59313ed7f9 [XPU] fix VL thinking mode (#4266) Lucas 2025-09-28 17:37:37 +08:00
  • c35a21a99a [Feature] support fd return decode response (#4300) ltd0924 2025-09-28 16:11:50 +08:00
  • aa1cc09c5b fix machete pre quant (#4295) Sunny-bot1 2025-09-28 16:11:09 +08:00
  • c8985727a6 support mtp in hybrid-dp-tp mode (#4299) freeliuzc 2025-09-28 15:58:45 +08:00
  • 7b6cb72ab2 Fix wrong batch size of thinking_mask (#4296) K11OntheBoat 2025-09-28 14:56:42 +08:00
  • 3cef851468 [Bug fix] Fix bug for running ep (#4245) chenjian 2025-09-28 14:56:18 +08:00
  • 31e32b5821 [fix]remove reasoning_max_tokens=max_tokens*0.8 in sampling_params (#4294) luukunn 2025-09-28 14:44:54 +08:00
  • 17e00d9f5d fix reasoning_max_tokens (#4277) luukunn 2025-09-28 14:05:29 +08:00
  • 076c30cb0f fix top_p_candidates and support separate setting of sampling params for mtp (#4189) GoldPancake 2025-09-28 11:41:20 +08:00
  • f8c6a354a1 [BUGFIX] clear request (#4286) ltd0924 2025-09-27 14:08:48 +08:00
  • aa045aa84f fix typos (#4274) Zhenghai Zhang 2025-09-27 09:25:43 +08:00
  • 79c2c52756 deepgemm pre-compile tool support mixed parallel (#4282) GoldPancake 2025-09-26 18:43:39 +08:00
  • 5c6e859681 increase ccache size (#4255) YUNSHEN XIE 2025-09-26 17:40:07 +08:00
  • f40d7c6d65 [Docs]When XPU starts the service, the model loader uses the default version (#4292) yyssys 2025-09-26 15:58:12 +08:00
  • b176cba474 support mtp in ep64 (#4280) freeliuzc 2025-09-26 15:38:03 +08:00
  • 331c4d2a74 Set approve checking for config.py, worker, model and cudagraph (#4276) Zero Rains 2025-09-26 14:50:54 +08:00
  • 838de53de8 Add speculative decoding approval check (#4284) GoldPancake 2025-09-26 14:47:45 +08:00
  • 55124f8491 Add cli run batch (#4237) xiaolei373 2025-09-26 14:27:25 +08:00
  • 8a964329f4 add glm benchmark yaml (#4289) tianlef 2025-09-26 14:23:29 +08:00
  • 67e693b18b fix ernie vl distributed attr. (#4215) Zhong Hui 2025-09-26 14:18:49 +08:00
  • 12a3587cca [Supplements and upgrades]Improvement of X1 parsers (#4172) zhuzixuan 2025-09-26 13:37:37 +08:00
  • dd2e844ea3 [CI] fix base_test error temporarily (#4283) YuBaoku 2025-09-26 11:24:55 +08:00
  • 4ec00df2b0 [Feature] add config api (#4254) memoryCoderC 2025-09-26 11:21:02 +08:00
  • dcf633c4d9 delete default value reasoning_max_tokens (#4250) Yuanle Liu 2025-09-26 10:42:27 +08:00
  • 83d41d23b0 initial commit (#4248) kxz2002 2025-09-25 21:42:05 +08:00
  • c415885a94 [Docs]Add ENABLE_V1_KVCACHE_SCHEDULER=0 to docs (#4268) yyssys 2025-09-25 20:09:03 +08:00
  • 213f15ef55 fix ernie vl distributed attr. (#4259) Zhong Hui 2025-09-25 20:06:29 +08:00
  • 4515ad21e9 Support limit thinking lengths (#4069) K11OntheBoat 2025-09-25 19:55:56 +08:00
  • b272ca9f83 [Bug fix] Fix bug for supporting max think len (#4267) chenjian 2025-09-25 19:08:38 +08:00
  • 0c6f1932c5 delete_moe_phase_in_parallel_config (#4264) Yuanle Liu 2025-09-25 17:14:37 +08:00
  • aebe12a58d [fix]update apply_chat_template (#4249) luukunn 2025-09-25 16:41:56 +08:00
  • 87179cb744 [XPU] support XPU VL model inference (#4030) Lucas 2025-09-25 14:34:15 +08:00
  • e36eccfdad 【Hackathon 9th No.21、23】add unit tests for fused_hadamard_quant_fp8, moe_fused_hadamard_quant_fp8 (#4094) ooo oo 2025-09-25 12:15:00 +08:00
  • db653644ad Fix fetch request exceed max block num (#4257) chenjian 2025-09-25 00:57:45 +08:00
  • 4aa057f28d fix fetch (#4253) chenjian 2025-09-24 22:37:47 +08:00
  • bab779011c [CudaGraph] support cudagraph use shared pool (#4199) lizhenyun01 2025-09-24 21:32:04 +08:00
  • b433a93d9a fix the bug for prefilled_step_idx signal of cache_messager in cudagraph and PD (#4235) Zero Rains 2025-09-24 19:46:52 +08:00
  • 870364b547 [CUDAGraph]CUDA Graph support unique memory pool (#4230) RAM 2025-09-24 19:45:22 +08:00
  • 5ff10c8ced [Model] Qwen2.5VL support --use-cudagraph and unit testing (#4087) CSWYF3634076 2025-09-24 19:45:01 +08:00
  • 18f4977aec [fix]update apply_chat_template (#4137) luukunn 2025-09-24 18:56:32 +08:00
  • 05b7800d80 Support limit thinking lengths (#4244) K11OntheBoat 2025-09-24 17:30:53 +08:00
  • dc600010de [Fix] X1 reasoning parser, skip parsing of \n around special tokens (#4241) zhuzixuan 2025-09-24 17:04:59 +08:00
  • e2b68b33c9 fix mtp in rl (#4234) freeliuzc 2025-09-24 16:59:24 +08:00
  • 7c1fd19f0f [OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238) chen 2025-09-24 16:39:51 +08:00
  • 8a506500f3 [BugFix] Fix EP MoE prefill function (#4101) Sunny-bot1 2025-09-24 15:31:41 +08:00
  • 8b0ce8e3ab [Feature] add cli command serve (#4226) memoryCoderC 2025-09-24 14:50:45 +08:00
  • 9566ae8827 [Bug Fix] disable prefix caching in mm model (#4167) ApplEOFDiscord 2025-09-24 14:43:46 +08:00
  • e8318b7477 [BugFix] fix qwen3-embedding model tp>1 (#4223) lizexu123 2025-09-24 14:13:26 +08:00
  • 3161014e49 [BugFix]fix v1 loader moe bf16, and support dynamic_load_weight create quant param (#4229) chen 2025-09-24 14:12:05 +08:00
  • 44010cee13 [FIX] Fix CUDA error(700): 'cudaErrorIllegalAddress' in CascadeAppendWriteCacheKVQKV cache_kernel(). Continue when batch_id_per_token[token_idx] is default value -1. (#4218) Yohanna 2025-09-24 14:08:49 +08:00
  • 8fdb950e9f include_stop_str_in_output=False not return eos text (#4231) chen 2025-09-24 14:07:30 +08:00
  • f1b5392e20 [Intel HPU] Support intel hpu platform (#4161) fmiao2372 2025-09-24 12:27:50 +08:00
  • a1c5d930bb 【Hackathon 9th No.24】add rebuild_padding (#4107) co63oc 2025-09-24 12:08:17 +08:00
  • 1aab1c8d06 [BugFix] fix clear data (#4227) ltd0924 2025-09-24 11:23:44 +08:00
  • b455fd39f3 register_model_class compatible with plugins (#4236) Yuanle Liu 2025-09-24 11:17:12 +08:00
  • d6e59447f5 [XPU] Enable XPU V1 mode based on environment variable (#4213) yyssys 2025-09-24 10:29:48 +08:00
  • ec99474e71 [Test]add glm45_air logprob test and rollout model (#4175) chen 2025-09-23 21:06:07 +08:00
  • 94b6e7a341 [MTP][RL]support rl reshard wenxin-tools-145 (#4173) freeliuzc 2025-09-23 20:40:26 +08:00
  • 12043fc476 fix bug for trigger preempted (#4228) chenjian 2025-09-23 20:34:51 +08:00
  • a460462d2a fix ernie vl distributed attr. (#4217) Zhong Hui 2025-09-23 19:37:38 +08:00
  • cb8d87b945 [fix] fix clearing caches synchronization and add more logs (#4212) 李泳桦 2025-09-23 19:36:38 +08:00
  • 62d1c48363 [v1 loader]code style (#4204) bukejiyu 2025-09-23 19:36:00 +08:00
  • 1a6283424e Fix noaux_tc cuda Error 700 in CUDAGraph (#4174) chen 2025-09-23 18:41:33 +08:00
  • acecd5bebe fix for pd decode not enough block (#4224) chenjian 2025-09-23 18:01:25 +08:00
  • de4feff147 [Feature]CP support data clear (#4214) ltd0924 2025-09-23 16:53:39 +08:00
  • 389c5dd3a2 Each module should have its own plugins_loaded (#4149) Yuanle Liu 2025-09-23 15:44:46 +08:00
  • 361104508e support reasoning_max_tokens (#4207) Yuanle Liu 2025-09-23 15:44:41 +08:00
  • c96a535a5d [Feature] support qwen3-embedding model load (#4202) lizexu123 2025-09-23 15:14:35 +08:00
  • 0bfffdbc14 [CI] remove test_common_model (#4196) YuBaoku 2025-09-23 14:23:05 +08:00
  • 9082f625ba [xpu] use cpu barrier (#4181) zhupengyang 2025-09-23 12:19:03 +08:00
  • 813befadfa Update run_ci_xpu.sh to lock xvllm version (#4210) plusNew001 2025-09-23 11:20:08 +08:00
  • c32aae901f [XPU] update XPU CI (#4209) plusNew001 2025-09-23 10:28:49 +08:00
  • 4325b737e7 【FIX】Change the name of sparse attn from moba to plas (#4006) (#4076) yangjianfengo1 2025-09-23 10:26:40 +08:00
  • 2c34a557f4 [XPU]change xpu ci model (#4117) plusNew001 2025-09-23 10:21:17 +08:00
  • 83720da79f [Feature] support clear data (#3601) ltd0924 2025-09-23 10:20:02 +08:00
  • f38b174a75 Fix noaux_tc cuda Error 700 in CUDAGraph and Add wfp8apf8 moe quant method (#4115) chen 2025-09-22 21:27:37 +08:00
  • 772f0156f3 Remove useless code (#4195) Jiang-Jia-Jun 2025-09-22 21:18:19 +08:00
  • 504461b6b5 [Iluvatar GPU] Optimize attention performance and fix moe load ckpt error (#3651) yzwu 2025-09-22 21:13:59 +08:00
  • 6b47773bd6 [fix]Modify follow-up push parameters and Modify the verification method for thinking length (#4177) luukunn 2025-09-22 21:12:05 +08:00
  • 5532e8a323 [FD CLI] Add bench cli (#4160) Zhang Yulong 2025-09-22 20:37:30 +08:00
  • 5e1f13bd3b add test_set_value_by_flags_and_idx.py (#4186) Echo-Nie 2025-09-22 20:21:34 +08:00
  • f489c9f8ef [Feature] support adapter (#4180) ltd0924 2025-09-22 19:32:24 +08:00
  • 0358329946 [fix] initialize available_gpu_block_num with max_gpu_block_num (#4193) 李泳桦 2025-09-22 18:56:00 +08:00
  • c5671d7c09 [MTP][Unit Test]add test_top_p_candidates (#4046) co63oc 2025-09-22 17:06:38 +08:00
  • 918ccdb123 [Feature] Support pd ep deployment with yiyan adapter (#4029) chenjian 2025-09-22 16:41:38 +08:00
  • 9845f0d010 【Hackathon 9th No.30】add test_tritonmoe_preprocess (#3891) Echo-Nie 2025-09-22 15:31:32 +08:00
  • be98f6e950 supports internode_ll_two_stage (#4143) lzy 2025-09-22 14:55:06 +08:00
  • 01f6934162 [Executor] Adjust signal sending order in RL training (#3773) (#4066) (#4178) RAM 2025-09-22 14:31:36 +08:00
  • c4830ef24c fix typos (#4176) co63oc 2025-09-22 14:27:17 +08:00
  • 0b62648924 test xly ci Divano 2025-09-22 14:13:00 +08:00
  • c86945ef49 [Feature] support pool (#3827) lizexu123 2025-09-22 14:09:09 +08:00
  • 7bdc6f41e5 fix glm all_reduce tp group (#4188) chen 2025-09-22 10:57:13 +08:00
  • da74a5f0b3 fix glm all_reduce tp group (#4187) chen 2025-09-22 10:56:55 +08:00
  • 718f32a6b0 fix nul (#4191) co63oc 2025-09-22 10:55:33 +08:00
  • 5c33be5a7d [TEST] init first commit (#4192) Lucas 2025-09-22 10:51:27 +08:00
  • 9f1882d9a8 fa3_rope (#4190) xiaoxiaohehe001 2025-09-21 22:04:59 +08:00
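A listing in this shape (abbreviated hash, subject with PR number, author, author date) can be reproduced with `git log`. The exact flags used to generate the graph above are not recorded, so the command below is a best-guess sketch:

```shell
# Sketch: print one line per commit in the format used above.
# Format codes: %h = abbreviated hash, %s = subject line,
# %an = author name, %ad = author date (formatted by --date).
git log --abbrev=10 \
        --pretty=format:'  • %h %s %an %ad' \
        --date=format:'%Y-%m-%d %H:%M:%S %z'
```

The `(#NNNN)` PR references are part of each commit subject (added by the merge tooling), not something `git log` synthesizes.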