Zero Rains 
							
						 
					 
					
						
						
							
						
						0fb37ab7e4 
					 
					
						
						
							
							update flake8 version to support pre-commit in python3.12 ( #3000 )  
						
						... 
						
						
						
						* update flake8 version to support pre-commit in python3.12
* polish code 
						
						
					 
					
						2025-07-24 01:43:31 -07:00 
						 
				 
			
				
					
						
							
							
								lizhenyun01 
							
						 
					 
					
						
						
							
						
						29c3292f02 
					 
					
						
						
							
							support c4 attn && fix cache  
						
						
						
						
					 
					
						2025-07-24 12:00:52 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						832d25334a 
					 
					
						
						
							
							[Code Simplification] fix init_distributed_environment() ( #2982 )  
						
						
						
						
					 
					
						2025-07-24 11:43:28 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						bfeb664ab8 
					 
					
						
						
							
							update ( #2978 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-07-24 00:16:42 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						172e69fe17 
					 
					
						
						
							
							FA3 fix bug ( #2987 )  
						
						
						
						
					 
					
						2025-07-23 19:07:43 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						ad202272ed 
					 
					
						
						
							
							[Infer] Improve block_wise_fp8 performance of triton_moe_backend ( #2942 )  
						
						
						
						
					 
					
						2025-07-23 13:02:50 +08:00 
						 
				 
			
				
					
						
							
							
								lizhenyun01 
							
						 
					 
					
						
						
							
						
						e51f018577 
					 
					
						
						
							
							support chunk_prefill in fa3  
						
						
						
						
					 
					
						2025-07-23 12:19:20 +08:00 
						 
				 
			
				
					
						
							
							
								K11OntheBoat 
							
						 
					 
					
						
						
							
						
						93bb68aa71 
					 
					
						
						
							
							[Feature] Marlin MoE backend supports DeepseekV3 ( #2962 )  
						
						... 
						
						
						
						Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com> 
						
						
					 
					
						2025-07-22 18:11:15 +08:00 
						 
				 
			
				
					
						
							
							
								Nyakku Shigure 
							
						 
					 
					
						
						
							
						
						48e6a0ca26 
					 
					
						
						
							
							[SOT] Mark dynamic dims by type annotations ( #2771 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* [SOT] Mark dynamic dims by type annotations
* fix conflict of forward_meta
* mark more attn backend
* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS
* auto infer implicit 0 dim dynamic dim
* revert manual marked dims
* revert missing update
* auto infer can use unsafe code in warmup stage
* check -> type_match
* fix codestyle
* restore blank line
* empty commit
* add need_warmup nonlocal;
* add doc for resolver
* add missing type hints
* unquote "ForwardMeta" 
						
						
					 
					
						2025-07-22 00:23:52 -07:00 
						 
				 
			
				
					
						
							
							
								lifulll 
							
						 
					 
					
						
						
							
						
						2c6a9e887e 
					 
					
						
						
							
							native top_p_sampling ( #2901 )  
						
						
						
						
					 
					
						2025-07-22 14:09:59 +08:00 
						 
				 
			
				
					
						
							
							
								K11OntheBoat 
							
						 
					 
					
						
						
							
						
						8020927f50 
					 
					
						
						
							
							[BugFix] Rename attention params of deepseekv3 ( #2939 )  
						
						... 
						
						
						
						Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com> 
						
						
					 
					
						2025-07-22 14:01:30 +08:00 
						 
				 
			
				
					
						
							
							
								zhink 
							
						 
					 
					
						
						
							
						
						0262ef7eb3 
					 
					
						
						
							
							custom all reduce support cuda graph ( #2938 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication 
						
						
					 
					
						2025-07-21 22:52:03 +08:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						ff4569f135 
					 
					
						
						
							
							remove some code in ep.py ( #2947 )  
						
						
						
						
					 
					
						2025-07-21 22:44:57 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						67990e0572 
					 
					
						
						
							
							[Feature] support min_p_sampling ( #2872 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* FastDeploy supports min_p
* add test_min_p
* fix
* min_p_sampling
* update
* delete vl_gpu_model_runner.py
* fix
* Align usage of min_p with vLLM
* fix
* modified unit test
* fix test_min_sampling
* pre-commit all files
* fix
* fix
* fix
* fix xpu_model_runner.py 
						
						
					 
					
						2025-07-20 23:17:59 -07:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						25698d56d1 
					 
					
						
						
							
							polish code with new pre-commit rule ( #2923 )  
						
						
						
						
					 
					
						2025-07-19 23:19:27 +08:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						d306944f4f 
					 
					
						
						
							
							remove cum_offsets from get_block_shape_and_split_kv_block ( #2913 )  
						
						... 
						
						
						
						* remove padding_offsets from get_padding_offset.cu
* remove cum_offsets from get_block_shape_and_split_kv_block 
						
						
					 
					
						2025-07-18 16:13:32 +08:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						ddb10ac509 
					 
					
						
						
							
							[Inference, rename] remove padding_offsets from atten use batch_id_per_token ( #2880 )  
						
						... 
						
						
						
						* remove padding_offsets from atten 
						
						
					 
					
						2025-07-17 18:41:31 +08:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						d49f8fb30a 
					 
					
						
						
							
							[Feature][MTP] Support cacheKV transfer in per_chunk mode ( #2890 )  
						
						... 
						
						
						
						* support chunk_prefill both normal and speculative_decoding(mtp)
* optimize pd-disaggregation config
* fix bug 
						
						
					 
					
						2025-07-17 17:58:08 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						dbb9e2506b 
					 
					
						
						
							
							Fix rollout_model init ( #2881 )  
						
						
						
						
					 
					
						2025-07-16 22:36:21 -07:00 
						 
				 
			
				
					
						
							
							
								ming1753 
							
						 
					 
					
						
						
							
						
						1f15ca21e4 
					 
					
						
						
							
							[Feature] support prompt repetition_penalty ( #2806 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-07-17 12:05:52 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						63d6e7ce06 
					 
					
						
						
							
							fix and refine vl ( #2866 )  
						
						... 
						
						
						
						* refine vl config
* delete attn_sep
* fix vl accuracy 
						
						
					 
					
						2025-07-16 05:59:28 -07:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						aa76085d1f 
					 
					
						
						
							
							[Attention] remove cum_offsets from atten, and use cu_seqlens_q ( #2870 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870 ) 
						
						
					 
					
						2025-07-16 20:10:57 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						dda4a9f848 
					 
					
						
						
							
							rl update ( #2861 )  
						
						
						
						
					 
					
						2025-07-16 00:33:10 -07:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						2d1184aefe 
					 
					
						
						
							
							[Fix] fix expert_parallel bug in decoder stage ( #2848 )  
						
						
						
						
					 
					
						2025-07-16 11:08:18 +08:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						0fad10b35a 
					 
					
						
						
							
							[Executor] CUDA Graph support padding batch ( #2844 )  
						
						... 
						
						
						
						* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and support user-defined capture sizes
* Do not insert max_num_seqs when the user specifies a capture list
* Support set graph optimization config from YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug 
						
						
					 
					
						2025-07-15 19:49:01 -07:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						61b3997b85 
					 
					
						
						
							
							refactor rl get_name_mappings_to_training ( #2847 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix 
						
						
					 
					
						2025-07-15 07:31:42 -07:00 
						 
				 
			
				
					
						
							
							
								AIbin 
							
						 
					 
					
						
						
							
						
						fd91da7b41 
					 
					
						
						
							
							[Inference Optimize] Support wint2 triton kernel about triton_utils_v2 ( #2842 )  
						
						... 
						
						
						
						* update supported_models doc 
						
						
					 
					
						2025-07-15 14:35:40 +08:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						7cdd8d290d 
					 
					
						
						
							
							[MTP] optimize mtp infer speed ( #2840 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-07-14 19:50:22 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						4c7b8bc458 
					 
					
						
						
							
							Simplify the Config code ( #2770 )  
						
						... 
						
						
						
						* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe 
						
						
					 
					
						2025-07-14 19:50:05 +08:00 
						 
				 
			
				
					
						
							
							
								zhink 
							
						 
					 
					
						
						
							
						
						c08561c13a 
					 
					
						
						
							
							[Feature] support tensor-parallel-size>num_key_value_heads for qwen3 ( #2799 )  
						
						
						
						
					 
					
						2025-07-11 15:09:43 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						240d6236bc 
					 
					
						
						
							
							[Fix]fix top_k_top_p sampling ( #2801 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* fix topk-topp
* update
* add base_non_truncated 
						
						
					 
					
						2025-07-10 22:35:10 +08:00 
						 
				 
			
				
					
						
							
							
								littledgg 
							
						 
					 
					
						
						
							
						
						59071268b6 
					 
					
						
						
							
							[Executor] Move forward_meta.py to fastdeploy/model_executor ( #2774 )  
						
						... 
						
						
						
						* Use PEP 563 in attention.py and fix conflict
* merge commit
* Change what was left out last time 
						
						
					 
					
						2025-07-10 20:36:51 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						d33105baeb 
					 
					
						
						
							
							[Feature] Online Chat API Support Return logprobs ( #2777 )  
						
						... 
						
						
						
						* online chat support logprobs
* check xpu
* check vl_gpu_model_runner and xpu_model_runner
* get_worker() check platform 
						
						
					 
					
						2025-07-10 16:33:40 +08:00 
						 
				 
			
				
					
						
							
							
								K11OntheBoat 
							
						 
					 
					
						
						
							
						
						24f934f1f9 
					 
					
						
						
							
							[BugFix] Fix low prediction accuracy of deepseekv3 ( #2798 )  
						
						
						
						
					 
					
						2025-07-10 16:16:44 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						1e2319cbef 
					 
					
						
						
							
							Rename top_p_sampling to top_k_top_p_sampling ( #2791 )  
						
						
						
						
					 
					
						2025-07-10 00:09:25 -07:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						e45050cae3 
					 
					
						
						
							
							[Feature] support top_k_top_p sampling ( #2753 )  
						
						... 
						
						
						
						* support top_k_top_p sampling
* fix
* add api param
* add api para
* fix
* fix
* fix
* fix
* fix
* fix
* fix 
						
						
					 
					
						2025-07-09 20:58:58 -07:00 
						 
				 
			
				
					
						
							
							
								Ryan 
							
						 
					 
					
						
						
							
						
						b0f525955c 
					 
					
						
						
							
							[SOT] Remove breakgraph in post processing && fix datatype ( #2780 )  
						
						
						
						
					 
					
						2025-07-10 11:26:00 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						888780ffde 
					 
					
						
						
							
							[Feature] block_wise_fp8 support triton_moe_backend ( #2767 )  
						
						
						
						
					 
					
						2025-07-09 19:22:47 +08:00 
						 
				 
			
				
					
						
							
							
								lifulll 
							
						 
					 
					
						
						
							
						
						1f28bdf994 
					 
					
						
						
							
							dcu adapter ernie45t ( #2756 )  
						
						... 
						
						
						
						Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com> 
						
						
					 
					
						2025-07-09 18:56:27 +08:00 
						 
				 
			
				
					
						
							
							
								yulangz 
							
						 
					 
					
						
						
							
						
						be21ef5047 
					 
					
						
						
							
							[XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B ( #2765 )  
						
						... 
						
						
						
						* fix no quant xpu moe
* change dir of xpu moe weight only 
						
						
					 
					
						2025-07-09 15:57:51 +08:00 
						 
				 
			
				
					
						
							
							
								RichardWooSJTU 
							
						 
					 
					
						
						
							
						
						fee544e808 
					 
					
						
						
							
							fix ep prefill ( #2762 )  
						
						
						
						
					 
					
						2025-07-09 14:03:05 +08:00 
						 
				 
			
				
					
						
							
							
								GoldPancake 
							
						 
					 
					
						
						
							
						
						f7cad30a38 
					 
					
						
						
							
							[Feature] Add speculative decoding simulation benchmark. ( #2751 )  
						
						... 
						
						
						
						* Add speculative decoding simulation benchmark
* Fix the name of the parameter 
						
						
					 
					
						2025-07-09 12:08:43 +08:00 
						 
				 
			
				
					
						
							
							
								RichardWooSJTU 
							
						 
					 
					
						
						
							
						
						6610aa29d0 
					 
					
						
						
							
							Revert "[Bug fix] fix attention rank init ( #2743 )" ( #2761 )  
						
						... 
						
						
						
						This reverts commit e8bbe7244b 
						
						
					 
					
						2025-07-09 10:38:12 +08:00 
						 
				 
			
				
					
						
							
							
								RichardWooSJTU 
							
						 
					 
					
						
						
							
						
						e8bbe7244b 
					 
					
						
						
							
							[Bug fix] fix attention rank init ( #2743 )  
						
						... 
						
						
						
						* fix attention rank init
* fix attention rank init 
						
						
					 
					
						2025-07-08 17:19:49 +08:00 
						 
				 
			
				
					
						
							
							
								EnflameGCU 
							
						 
					 
					
						
						
							
						
						d0f4d6ba3a 
					 
					
						
						
							
							[GCU] Support gcu platform ( #2702 )  
						
						... 
						
						
						
						baseline: e7fa57ebae <xing.wo@163.com> 
						
						
					 
					
						2025-07-08 13:00:52 +08:00 
						 
				 
			
				
					
						
							
							
								gaoziyuan 
							
						 
					 
					
						
						
							
						
						26d5d737dd 
					 
					
						
						
							
							[Feature] Support some qwen2 functions ( #2740 )  
						
						... 
						
						
						
						* add rl qwen model support
* fix
* fix 
						
						
					 
					
						2025-07-08 12:03:04 +08:00 
						 
				 
			
				
					
						
							
							
								ming1753 
							
						 
					 
					
						
						
							
						
						1eb8ea7328 
					 
					
						
						
							
							[Bug fix] fix compile bug when sm < 89 ( #2738 )  
						
						
						
						
					 
					
						2025-07-08 11:24:52 +08:00 
						 
				 
			
				
					
						
							
							
								ming1753 
							
						 
					 
					
						
						
							
						
						ef6649a577 
					 
					
						
						
							
							[Optimize] Optimize tensorwise fp8 performance ( #2729 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* [Optimize] Optimize tensorwise fp8 performance 
						
						
					 
					
						2025-07-07 20:06:28 +08:00 
						 
				 
			
				
					
						
							
							
								liddk1121 
							
						 
					 
					
						
						
							
						
						1b54a2831e 
					 
					
						
						
							
							Adapt for iluvatar gpu ( #2684 )  
						
						
						
						
					 
					
						2025-07-07 16:53:14 +08:00 
						 
				 
			
				
					
						
							
							
								GoldPancake 
							
						 
					 
					
						
						
							
						
						e7fa57ebae 
					 
					
						
						
							
							Extract eh_proj Layer from ParallelLMHead for MTP to Avoid Weight Transposition Issue ( #2707 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* fix mtp eh_proj layer
* fix mtp update_cfg function
* fix docstring
* simplify class name 
						
						
					 
					
						2025-07-04 14:15:04 +08:00