a7392a0ff9  AIbin  2025-09-11 10:46:09 +08:00
【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886)
* support MLA chunk_size auto search & cuda_graph
3d0aaa5923  Jundong Liu  2025-09-08 13:12:24 +08:00
[Executor] Experimental feature: support Prefill in cudagraph (#3459)
* Support prefill in Cudagraph
* Refactor GetBlockShapeAndSplitKVBlock Kernel (V2 through V2.5)
* Solve problem about encoder_num_blocks_x_cpu
* Add early-exit mechanism for attention kernel
* Fix test case about append-attention
* Update test code, add annotations to related tensors
* Move get_input_length_list
* Fix test_code
* Add annotations about early-exit for attention kernel
* Address review comments
* Fix MTP
Co-authored-by: RAM <gstian5555@outlook.com>
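The entry above extends CUDA Graph execution to the prefill phase. For background, the general capture-and-replay pattern looks roughly like the sketch below, written against PyTorch's public CUDA Graph API on a stand-in module rather than this repository's executor; the module, shapes, and variable names are illustrative only.

    import torch

    # Stand-in for a prefill computation; module and shapes are illustrative only.
    model = torch.nn.Linear(64, 64).cuda().eval()
    static_input = torch.zeros(8, 64, device="cuda")

    # Warm up on a side stream before capture, as CUDA Graphs require.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side), torch.no_grad():
        for _ in range(3):
            model(static_input)
    torch.cuda.current_stream().wait_stream(side)

    # Capture the computation once into a graph...
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph), torch.no_grad():
        static_output = model(static_input)

    # ...then replay it: copy fresh data into the static input buffer and
    # relaunch the whole captured kernel sequence with a single call.
    static_input.copy_(torch.randn(8, 64, device="cuda"))
    graph.replay()

Replaying a captured graph avoids per-kernel launch overhead, which is why covering prefill (and not only decode) with the captured region is attractive.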
						 
				 
			
				
					
						
							
							
e93d4cfcdd  Liumengyuan  2025-08-28 17:10:18 +08:00
Add with_output version AppendAttention (#3302)
* get use_output from fd_config
* add clear TODO description
* add mask_offset param to align with develop
* fix bug
* fix use_output logic
* fix sot bug
ad319a87cc  xiaoxiaohehe001  2025-08-27 11:31:29 +08:00
support fa3 rope3d (#3622)
1e06b9fa6d  lzy  2025-08-14 03:40:55 -07:00
make append_attn support mask_offset (#3138)
* make append_attn support mask_offset
* add unittest
7ce00e597c  Yuan Xiaolan  2025-08-05 16:46:14 +08:00
support qk norm (#3145)
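The commit title only says "support qk norm". A common form of query/key normalization applies an RMS norm to each query and key head vector before the attention scores are computed; the NumPy sketch below shows that idea under this assumption, with function and argument names chosen for illustration rather than taken from this repository's API.

    import numpy as np

    def rms_norm(x, weight, eps=1e-6):
        # Normalize over the head dimension, then apply a learned per-channel scale.
        rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
        return (x / rms) * weight

    def qk_norm_scores(q, k, q_weight, k_weight):
        # q, k: [batch, heads, seq, head_dim]; q_weight, k_weight: [head_dim]
        q = rms_norm(q, q_weight)
        k = rms_norm(k, k_weight)
        scale = 1.0 / np.sqrt(q.shape[-1])
        return np.einsum("bhqd,bhkd->bhqk", q, k) * scale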
						 
				 
			
				
					
						
							
							
d850660872  RAM  2025-07-31 00:09:31 +08:00
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)
* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu, iluvatar model runner
* Update MTP
* fix MTP bug
e51f018577  lizhenyun01  2025-07-23 12:19:20 +08:00
support chunk_prefill in fa3
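Chunked prefill generally means processing a long prompt in fixed-size pieces, appending each piece's keys and values to the cache so that later chunks can attend to everything before them. A rough single-head NumPy sketch of that flow follows; it is not the FA3 kernel itself, and all names are illustrative.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def chunked_prefill(q, k, v, chunk_size):
        # q, k, v: [seq_len, head_dim] for a single head.
        k_cache, v_cache, outputs = [], [], []
        for start in range(0, q.shape[0], chunk_size):
            end = min(start + chunk_size, q.shape[0])
            # Append this chunk's keys/values to the cache first.
            k_cache.append(k[start:end])
            v_cache.append(v[start:end])
            keys = np.concatenate(k_cache)
            values = np.concatenate(v_cache)
            # Attend over everything cached so far, masking future positions
            # inside the current chunk (causal attention).
            scores = q[start:end] @ keys.T / np.sqrt(q.shape[-1])
            rows = np.arange(start, end)[:, None]
            cols = np.arange(keys.shape[0])[None, :]
            scores = np.where(cols <= rows, scores, -np.inf)
            outputs.append(softmax(scores) @ values)
        return np.concatenate(outputs)

Processing the prompt in chunks bounds the peak activation size per step, which is the usual motivation for chunked prefill in serving systems.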
						 
				 
			
				
					
						
							
							
2c6a9e887e  lifulll  2025-07-22 14:09:59 +08:00
native top_p_sampling (#2901)
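For reference, top-p (nucleus) sampling keeps the smallest set of highest-probability tokens whose cumulative probability reaches p, renormalizes over that set, and samples from it. A minimal NumPy sketch of the algorithm, independent of this repository's native kernel:

    import numpy as np

    def top_p_sample(logits, top_p=0.9, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        # Softmax over the vocabulary.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Sort descending and keep the shortest prefix whose mass reaches top_p.
        order = np.argsort(-probs)
        cumulative = np.cumsum(probs[order])
        cutoff = int(np.searchsorted(cumulative, top_p)) + 1
        kept = order[:cutoff]
        # Renormalize over the kept tokens and sample one token id.
        return int(rng.choice(kept, p=probs[kept] / probs[kept].sum()))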
						 
				 
			
				
					
						
							
							
25698d56d1  Zero Rains  2025-07-19 23:19:27 +08:00
polish code with new pre-commit rule (#2923)
d306944f4f  周周周  2025-07-18 16:13:32 +08:00
remove cum_offsets from get_block_shape_and_split_kv_block (#2913)
* remove padding_offsets from get_padding_offset.cu
* remove cum_offsets from get_block_shape_and_split_kv_block
ddb10ac509  周周周  2025-07-17 18:41:31 +08:00
[Inference, rename] remove padding_offsets from atten, use batch_id_per_token (#2880)
* remove padding_offsets from atten
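In a packed (unpadded) token layout, per-token padding offsets can be replaced by a simple mapping from each token to the request it belongs to. The small NumPy illustration below shows what a batch_id_per_token style mapping contains; the array name follows the commit title, and the values are made up.

    import numpy as np

    # Number of valid tokens for each request in the batch.
    seq_lens = np.array([3, 1, 4])

    # With tokens packed back-to-back (no padding), each token only needs to
    # know which request produced it:
    batch_id_per_token = np.repeat(np.arange(len(seq_lens)), seq_lens)
    print(batch_id_per_token)   # [0 0 0 1 2 2 2 2]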
						 
				 
			
				
					
						
							
							
d49f8fb30a  freeliuzc  2025-07-17 17:58:08 +08:00
[Feature][MTP] Support cacheKV transfer in per_chunk mode (#2890)
* support chunk_prefill for both normal and speculative_decoding (mtp)
* optimize pd-disaggregation config
* fix bug
aa76085d1f  周周周  2025-07-16 20:10:57 +08:00
[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
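cu_seqlens_q refers to the cumulative-sequence-lengths convention commonly used by varlen attention kernels: an exclusive prefix sum over per-request query lengths, so each request's tokens are addressed by a [start, end) pair instead of per-token offsets. A short NumPy illustration with made-up values:

    import numpy as np

    seq_lens_q = np.array([3, 1, 4])               # query tokens per request

    # cu_seqlens_q[i] is where request i starts in the packed token buffer,
    # cu_seqlens_q[i + 1] is where it ends.
    cu_seqlens_q = np.concatenate(([0], np.cumsum(seq_lens_q)))
    print(cu_seqlens_q)                            # [0 3 4 8]

    i = 2
    start, end = cu_seqlens_q[i], cu_seqlens_q[i + 1]
    print(start, end)                              # request 2 owns packed tokens 4..7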
						 
				 
			
				
					
						
							
							
b0f525955c  Ryan  2025-07-10 11:26:00 +08:00
[SOT] Remove breakgraph in post processing && fix datatype (#2780)
1f28bdf994  lifulll  2025-07-09 18:56:27 +08:00
dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
240bdac2a4  Yuanle Liu  2025-07-03 22:33:27 +08:00
[feat] support fa3 backend for pd disaggregated (#2695)
* support fa3 backend run in pd disaggregated
* delete use_fast_ffn
92c2cfa2e7  Jiang-Jia-Jun  2025-06-29 23:29:37 +00:00
Sync v2.0 version of code to github repo
684703fd72  jiangjiajun  2025-06-09 19:20:15 +08:00
[LLM] First commit the llm deployment code