lizexu123 
							
						 
					 
					
						
						
							
						
						9b22b8d2c3 
					 
					
						
						
							
							delete max-len (#2959)
						
						
						
						
					 
					
						2025-07-23 15:11:39 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						ad202272ed 
					 
					
						
						
							
							[Infer] Improve the performance of block_wise_fp8 in triton_moe_backend (#2942)
						
						
						
						
					 
					
						2025-07-23 13:02:50 +08:00 
						 
				 
			
				
					
						
							
							
								lizhenyun01 
							
						 
					 
					
						
						
							
						
						e51f018577 
					 
					
						
						
							
							support chunk_prefill in fa3  
						
						
						
						
					 
					
						2025-07-23 12:19:20 +08:00 
						 
				 
			
				
					
						
							
							
								Ryan 
							
						 
					 
					
						
						
							
						
						95b5af24db 
					 
					
						
						
							
							[SOT] Add sot warmup (NVIDIA GPU Only) (#2929)
						
				
			 
		
		
	 
 
	 
						
						* add sot warmup
* fix code style
* change batch_size list
* add param to config
* rm free_list settings && set sot_warmup_sizes
* finish debug with dynamic dims by type annotations
* add profile_run guard
* rm sth useless 
						
						
					 
					
						2025-07-22 21:36:14 +08:00 
						 
				 
			
				
					
						
							
							
								K11OntheBoat 
							
						 
					 
					
						
						
							
						
						93bb68aa71 
					 
					
						
						
							
							[Feature] Marlin MoE backend supports DeepseekV3 (#2962)
						
						
						
						
						Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
						
						
					 
					
						2025-07-22 18:11:15 +08:00 
						 
				 
			
				
					
						
							
							
								Nyakku Shigure 
							
						 
					 
					
						
						
							
						
						48e6a0ca26 
					 
					
						
						
							
							[SOT] Mark dynamic dims by type annotations (#2771)
						
				
			 
		
		
	 
 
	 
						
						* [SOT] Mark dynamic dims by type annotations
* fix conflict of forward_meta
* mark more attn backend
* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS
* auto infer implicit 0 dim dynamic dim
* revert manual marked dims
* revert missing update
* auto infer can use unsafe code in warmup stage
* check -> type_match
* fix codestyle
* restore blank line
* empty commit
* add need_warmup nonlocal;
* add doc for resolver
* add missing type hints
* unquote "ForwardMeta" 
						
						
					 
					
						2025-07-22 00:23:52 -07:00 
						 
				 
			
				
					
						
							
							
								lifulll 
							
						 
					 
					
						
						
							
						
						2c6a9e887e 
					 
					
						
						
							
							native top_p_sampling (#2901)
						
						
						
						
					 
					
						2025-07-22 14:09:59 +08:00 
						 
				 
			
				
					
						
							
							
								K11OntheBoat 
							
						 
					 
					
						
						
							
						
						8020927f50 
					 
					
						
						
							
							[BugFix] Rename attention params of deepseekv3 (#2939)
						
						
						
						
						Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
						
						
					 
					
						2025-07-22 14:01:30 +08:00 
						 
				 
			
				
					
						
							
							
								zhink 
							
						 
					 
					
						
						
							
						
						0262ef7eb3 
					 
					
						
						
							
							custom all reduce support cuda graph (#2938)
						
				
			 
		
		
	 
 
	 
						
						* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication 
						
						
					 
					
						2025-07-21 22:52:03 +08:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						ff4569f135 
					 
					
						
						
							
							remove some code in ep.py (#2947)
						
						
						
						
					 
					
						2025-07-21 22:44:57 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						2f74e93d7e 
					 
					
						
						
							
							use dist.all_reduce(min) to sync num_blocks_local (#2933)
						
						
						
						
						* pre-commit all files check
* reduce min num_blocks_local
* fix nranks=1
* pre-commit when commit-msg 
						
						
					 
					
						2025-07-21 01:23:36 -07:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						67990e0572 
					 
					
						
						
							
							[Feature] support min_p_sampling (#2872)
						
				
			 
		
		
	 
 
	 
						
						* Fastdeploy support min_p
* add test_min_p
* fix
* min_p_sampling
* update
* delete vl_gpu_model_runner.py
* fix
* Align usage of min_p with vLLM
* fix
* modified unit test
* fix test_min_sampling
* pre-commit all files
* fix
* fix
* fix
* fix xpu_model_runner.py 
						
						
					 
					
						2025-07-20 23:17:59 -07:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						8c5407d9e4 
					 
					
						
						
							
							remove cum_offsets from ForwardMeta (#2925)
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-07-19 23:57:27 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						25698d56d1 
					 
					
						
						
							
							polish code with new pre-commit rule (#2923)
						
						
						
						
					 
					
						2025-07-19 23:19:27 +08:00 
						 
				 
			
				
					
						
							
							
								ming1753 
							
						 
					 
					
						
						
							
						
						5328daa333 
					 
					
						
						
							
							[Bug Fix] fix ep config bug (#2920)
						
						
						
						
					 
					
						2025-07-18 19:12:56 +08:00 
						 
				 
			
				
					
						
							
							
								xiaoxiaohehe001 
							
						 
					 
					
						
						
							
						
						a42fc3f40b 
					 
					
						
						
							
							[Feature] Support 45tVL EP FP8 Infer. (#2909)
						
						
						
						
						* support_mm_ep_fp8
* support_mm_ep 
						
						
					 
					
						2025-07-18 17:57:15 +08:00 
						 
				 
			
				
					
						
							
							
								gaoziyuan 
							
						 
					 
					
						
						
							
						
						6efad14b95 
					 
					
						
						
							
							support vl ori_vacab_size (#2900)
						
						
						
						
					 
					
						2025-07-18 16:26:14 +08:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						d306944f4f 
					 
					
						
						
							
							remove cum_offsets from get_block_shape_and_split_kv_block (#2913)
						
						
						
						
						* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove cum_offsets from get_block_shape_and_split_kv_block
* remove cum_offsets from get_block_shape_and_split_kv_block 
						
						
					 
					
						2025-07-18 16:13:32 +08:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						ddb10ac509 
					 
					
						
						
							
							[Inference, rename] remove padding_offsets from atten, use batch_id_per_token (#2880)
						
						
						
						
						* remove padding_offsets from atten 
						
						
					 
					
						2025-07-17 18:41:31 +08:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						d49f8fb30a 
					 
					
						
						
							
							[Feature][MTP] Support cacheKV transfer in per_chunk mode (#2890)
						
						
						
						
						* support chunk_prefill both normal and speculative_decoding(mtp)
* optimize pd-disaggregation config
* fix bug 
						
						
					 
					
						2025-07-17 17:58:08 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						dbb9e2506b 
					 
					
						
						
							
							Fix rollout_model init (#2881)
						
						
						
						
					 
					
						2025-07-16 22:36:21 -07:00 
						 
				 
			
				
					
						
							
							
								ming1753 
							
						 
					 
					
						
						
							
						
						1f15ca21e4 
					 
					
						
						
							
							[Feature] support prompt repetition_penalty (#2806)
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-07-17 12:05:52 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						63d6e7ce06 
					 
					
						
						
							
							fix and refine vl (#2866)
						
						
						
						
						* refine vl config
* delete attn_sep
* fix vl accuracy 
						
						
					 
					
						2025-07-16 05:59:28 -07:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						aa76085d1f 
					 
					
						
						
							
							[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
						
				
			 
		
		
	 
 
	 
						
						
						
					 
					
						2025-07-16 20:10:57 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						dda4a9f848 
					 
					
						
						
							
							rl update (#2861)
						
						
						
						
					 
					
						2025-07-16 00:33:10 -07:00 
						 
				 
			
				
					
						
							
							
								xiaoxiaohehe001 
							
						 
					 
					
						
						
							
						
						0d0340392f 
					 
					
						
						
							
							[Fix] Fix mm ep weight init. (#2855)
						
				
			 
		
		
	 
 
	 
						
						* fix_45t_mm
* Update load_weight_utils.py
* Update load_weight_utils.py 
						
						
					 
					
						2025-07-16 12:02:39 +08:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						2d1184aefe 
					 
					
						
						
							
							[Fix] fix expert_parallel bug in decoder stage (#2848)
						
						
						
						
					 
					
						2025-07-16 11:08:18 +08:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						0fad10b35a 
					 
					
						
						
							
							[Executor] CUDA Graph support padding batch (#2844)
						
						
						
						
						* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support set graph optimization config from YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug 
						
						
					 
					
						2025-07-15 19:49:01 -07:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						61b3997b85 
					 
					
						
						
							
							refactor rl get_name_mappings_to_training (#2847)
						
				
			 
		
		
	 
 
	 
						
						* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix 
						
						
					 
					
						2025-07-15 07:31:42 -07:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						e7bcbbab52 
					 
					
						
						
							
							Merge vl execution path into normal execution path (#2829)
						
						
						
						
						* merge vl model into gpu_model runner
Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6
* fix chinese
Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a
* fix the parse parameter
Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe
* fix the bug in online_inference
Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559
* polish code
Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c 
						
						
					 
					
						2025-07-15 22:20:03 +08:00 
						 
				 
			
				
					
						
							
							
								AIbin 
							
						 
					 
					
						
						
							
						
						fd91da7b41 
					 
					
						
						
							
							[Inference Optimize] Support wint2 triton kernel about triton_utils_v2 (#2842)
						
						
						
						
						* update supported_models doc 
						
						
					 
					
						2025-07-15 14:35:40 +08:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						7cdd8d290d 
					 
					
						
						
							
							[MTP] optimize mtp infer speed (#2840)
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-07-14 19:50:22 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						4c7b8bc458 
					 
					
						
						
							
							Simplify the Config code (#2770)
						
						
						
						
						* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe 
						
						
					 
					
						2025-07-14 19:50:05 +08:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						7f64d408a9 
					 
					
						
						
							
							[MTP] support expert-parallel in mtp (#2835)
						
						
						
						
					 
					
						2025-07-14 14:28:50 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						bad53c6b6e 
					 
					
						
						
							
							[vl] remove duplicated load logic (#2744)
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-07-13 07:36:26 +08:00 
						 
				 
			
				
					
						
							
							
								gaoziyuan 
							
						 
					 
					
						
						
							
						
						749b2e9c89 
					 
					
						
						
							
							support qwen3moe name_mapping (#2820)
						
						
						
						
					 
					
						2025-07-12 12:05:54 +08:00 
						 
				 
			
				
					
						
							
							
								zhink 
							
						 
					 
					
						
						
							
						
						c08561c13a 
					 
					
						
						
							
							[Feature] support tensor-parallel-size>num_key_value_heads for qwen3 (#2799)
						
						
						
						
					 
					
						2025-07-11 15:09:43 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						240d6236bc 
					 
					
						
						
							
							[Fix] fix top_k_top_p sampling (#2801)
						
				
			 
		
		
	 
 
	 
						
						* fix topk-topp
* update
* add base_non_truncated 
						
						
					 
					
						2025-07-10 22:35:10 +08:00 
						 
				 
			
				
					
						
							
							
								littledgg 
							
						 
					 
					
						
						
							
						
						59071268b6 
					 
					
						
						
							
							[Executor] Move forward_meta.py to fastdeploy/model_executor (#2774)
						
						
						
						
						* Use PEP 563 in attention.py and fix conflict
* merge commit
* Change what was left out last time 
						
						
					 
					
						2025-07-10 20:36:51 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						8c660a0dfb 
					 
					
						
						
							
							[BugFix] fix RMSNorm rms_norm_esp (#2797)
						
						
						
						
						* fix rms
* add vl
* fix
* add vl
* fix
* fix 
						
						
					 
					
						2025-07-10 20:02:24 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						d33105baeb 
					 
					
						
						
							
							[Feature] Online Chat API Support Return logprobs (#2777)
						
						
						
						
						* online chat support logprobs
* check xpu
* check vl_gpu_model_runner and xpu_model_runner
* get_worker() check platform 
						
						
					 
					
						2025-07-10 16:33:40 +08:00 
						 
				 
			
				
					
						
							
							
								K11OntheBoat 
							
						 
					 
					
						
						
							
						
						24f934f1f9 
					 
					
						
						
							
							[BugFix] Fix low prediction accuracy of deepseekv3 (#2798)
						
						
						
						
					 
					
						2025-07-10 16:16:44 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						1e2319cbef 
					 
					
						
						
							
							Rename top_p_sampling to top_k_top_p_sampling (#2791)
						
						
						
						
					 
					
						2025-07-10 00:09:25 -07:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						e45050cae3 
					 
					
						
						
							
							[Feature] support top_k_top_p sampling (#2753)
						
						
						
						
						* support top_k_top_p sampling
* fix
* add api param
* add api para
* fix
* fix
* fix
* fix
* fix
* fix
* fix 
						
						
					 
					
						2025-07-09 20:58:58 -07:00 
						 
				 
			
				
					
						
							
							
								Ryan 
							
						 
					 
					
						
						
							
						
						b0f525955c 
					 
					
						
						
							
							[SOT] Remove breakgraph in post processing && fix datatype (#2780)
						
						
						
						
					 
					
						2025-07-10 11:26:00 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						888780ffde 
					 
					
						
						
							
							[Feature] block_wise_fp8 support triton_moe_backend (#2767)
						
						
						
						
					 
					
						2025-07-09 19:22:47 +08:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						e3768c5a83 
					 
					
						
						
							
							[Executor] Fix bug of logger.debug (#2778)
						
						
						
						
					 
					
						2025-07-09 04:13:43 -07:00 
						 
				 
			
				
					
						
							
							
								lifulll 
							
						 
					 
					
						
						
							
						
						1f28bdf994 
					 
					
						
						
							
							dcu adapter ernie45t (#2756)
						
						
						
						
						Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
						
						
					 
					
						2025-07-09 18:56:27 +08:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						03a74995b8 
					 
					
						
						
							
							Clear dead code and add supplementary notes (#2757)
						
				
			 
		
		
	 
 
	 
						
						* 1.supplementary notes 2.delete dead code
* fix bug of forward meta
* Global modification of forward meta
* fix vl model_runner bug 
						
						
					 
					
						2025-07-09 16:17:34 +08:00 
						 
				 
			
				
					
						
							
							
								yulangz 
							
						 
					 
					
						
						
							
						
						be21ef5047 
					 
					
						
						
							
							[XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B (#2765)
						
						
						
						
						* fix no quant xpu moe
* change dir of xpu moe weight only 
						
						
					 
					
						2025-07-09 15:57:51 +08:00