RAM
d850660872
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)

* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug

2025-07-31 00:09:31 +08:00
						 
				 
			
				
					
						
							
							
Yuanle Liu
2f74e93d7e
use dist.all_reduce(min) to sync num_blocks_local (#2933)

* pre-commit all files check
* reduce min num_blocks_local
* fix nranks=1
* pre-commit when commit-msg

2025-07-21 01:23:36 -07:00
						 
				 
			
				
					
						
							
							
周周周
8c5407d9e4
remove cum_offsets from ForwardMeta (#2925)

2025-07-19 23:57:27 +08:00
						 
				 
			
				
					
						
							
							
Zero Rains
25698d56d1
polish code with new pre-commit rule (#2923)

2025-07-19 23:19:27 +08:00
						 
				 
			
				
					
						
							
							
周周周
ddb10ac509
[Inference, rename] remove padding_offsets from attention, use batch_id_per_token (#2880)

* remove padding_offsets from attention

2025-07-17 18:41:31 +08:00
						 
				 
			
				
					
						
							
							
RAM
0fad10b35a
[Executor] CUDA Graph support padding batch (#2844)

* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and support user-defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support setting the graph optimization config from a YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug

2025-07-15 19:49:01 -07:00
						 
				 
			
				
					
						
							
							
littledgg
59071268b6
[Executor] Move forward_meta.py to fastdeploy/model_executor (#2774)

* Use PEP 563 in attention.py and fix conflict
* merge commit
* Change what was left out last time

2025-07-10 20:36:51 +08:00