commit a7392a0ff9
Author: AIbin
Date:   2025-09-11 10:46:09 +08:00

    【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886)

    * support MLA chunk_size auto search & cuda_graph
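The chunk_size auto search above amounts to a small autotuning loop. A minimal sketch of the idea, assuming a benchmarkable forward callable; `run_mla_forward` and the candidate list are illustrative placeholders, not FastDeploy's actual API:

```python
import time

CANDIDATE_CHUNK_SIZES = [64, 128, 256, 512]   # assumed search space

def auto_search_chunk_size(run_mla_forward, warmup=3, iters=10):
    """Benchmark each candidate chunk size and return the fastest."""
    best_size, best_time = None, float("inf")
    for chunk_size in CANDIDATE_CHUNK_SIZES:
        for _ in range(warmup):                   # warm up kernels/caches
            run_mla_forward(chunk_size)
        start = time.perf_counter()
        for _ in range(iters):
            run_mla_forward(chunk_size)
        elapsed = (time.perf_counter() - start) / iters
        if elapsed < best_time:
            best_size, best_time = chunk_size, elapsed
    return best_size
```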
						 
				 
			
				
					
						
							
							
commit 3d0aaa5923
Author: Jundong Liu
Date:   2025-09-08 13:12:24 +08:00

    [Executor] Experimental feature: support prefill in cudagraph (#3459)

    * Support prefill in Cudagraph
    * Refactor GetBlockShapeAndSplitKVBlock Kernel V2
    * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1
    * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2
    * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3
    * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4
    * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5
    * Solve problem about encoder_num_blocks_x_cpu
    * Add early-exit mechanism for attention kernel
    * Fix test case about append-attention
    * Update test code; add annotations to related tensors
    * Move get_input_length_list
    * Fix test code
    * Add annotations about early-exit for attention kernel
    * Add annotations about early-exit for attention kernel (2)
    * Address review comments
    * Fix MTP

    Co-authored-by: RAM <gstian5555@outlook.com>
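GetBlockShapeAndSplitKVBlock, refactored repeatedly here and in #2989 below, deals with tiling attention work over variable-length sequences. A hedged sketch of what such a helper computes, under assumed semantics; the name, tile size, and return values are illustrative rather than the real kernel signature:

```python
import math

def get_block_shape_and_split_kv_blocks(seq_lens_kv, tile_size=64):
    """Per-sequence KV tile counts plus a flat (batch_id, tile_id) schedule."""
    tiles_per_seq = [math.ceil(n / tile_size) for n in seq_lens_kv]
    schedule = [(bid, t)                  # one entry per launched tile
                for bid, num in enumerate(tiles_per_seq)
                for t in range(num)]
    return tiles_per_seq, schedule

# e.g. seq_lens_kv=[100, 0, 130] -> tiles_per_seq=[2, 0, 3]; the empty
# sequence launches no tiles, the case an early-exit check can skip.
```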
						 
				 
			
				
					
						
							
							
commit 72094d4d82
Author: lifulll
Date:   2025-08-29 10:23:08 +08:00

    enable dcu ci (#3402)

commit 4957908275
Author: Yuanle Liu
Date:   2025-08-28 22:53:57 +08:00

    add input_processor plugin (#3657)

    * add input_processor plugin
    * update (repeated 11 times)
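A plugin hook like this is commonly a registry plus a decorator. A minimal sketch under that assumption; the registry and decorator names are hypothetical, not FastDeploy's actual plugin API:

```python
from typing import Callable, Dict

INPUT_PROCESSOR_REGISTRY: Dict[str, Callable[[str], str]] = {}

def register_input_processor(name: str):
    """Register a preprocessing callable under a plugin name."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        INPUT_PROCESSOR_REGISTRY[name] = fn
        return fn
    return wrap

@register_input_processor("lowercase")
def lowercase_processor(prompt: str) -> str:
    # Example plugin: normalize the prompt before tokenization.
    return prompt.lower()

def process_input(prompt: str, plugin: str) -> str:
    return INPUT_PROCESSOR_REGISTRY[plugin](prompt)

print(process_input("Hello FastDeploy", "lowercase"))  # hello fastdeploy
```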
						 
				 
			
				
					
						
							
							
commit ea4a3b479c
Author: Jundong Liu
Date:   2025-08-18 16:14:09 +08:00

    [Executor] Increase buffer size to prevent address corruption; add forward metadata debug tool (#3404)

    * Fix undersized buffer allocation and add a tool for printing ForwardMeta
    * Fix mistake
    * Make CPU tensor in CPUPlace
    * Add test about forward_meta_str and add unit-test requirement

    Co-authored-by: RAM <gstian5555@outlook.com>
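A forward-metadata debug tool of this kind typically renders each field of the metadata object as a readable line. A hedged sketch, assuming Paddle tensors (which expose .shape, .dtype, and .place); the field handling is illustrative, not the repo's forward_meta_str implementation:

```python
def forward_meta_to_str(meta) -> str:
    """Render each ForwardMeta field as name, shape, dtype, and place."""
    lines = []
    for name, value in vars(meta).items():
        if hasattr(value, "shape"):       # tensor-like field
            lines.append(f"{name}: shape={tuple(value.shape)} "
                         f"dtype={value.dtype} place={value.place}")
        else:                             # plain scalar or flag
            lines.append(f"{name}: {value!r}")
    return "\n".join(lines)
```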
						 
				 
			
				
					
						
							
							
commit b4fef2cf29
Author: Kane2011
Date:   2025-08-13 11:11:54 +08:00

    [MetaxGPU] Support FastDeploy on metax gpu (#3241)

    * [MetaxGPU] Support FastDeploy on metax gpu
    * Update metax_worker.py
      1. change worker log;
      2. remove custom allreduce, adapt it later;
      3. remove cuda graph;
    * Update __init__.py
      1. remove metax's keyword comment
    * Update __init__.py
      1. remove metax's keyword comment;
      2. add fused_moe_kernel_paddle import

    Co-authored-by: yongqiangma <xing.wo@163.com>

commit d850660872
Author: RAM
Date:   2025-07-31 00:09:31 +08:00

    [Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)

    * reset decoder_block_shape_q buffer
    * refactor GetBlockShapeAndSplitKVBlock kernel and cudagraph padding batch
    * update decode_max_tile_size
    * fix pre-commit
    * update block_multihead_attn_backend
    * update flash attn backend
    * update MLA Attention
    * update XPU Attention
    * update gcu, iluvatar model runner
    * update MTP
    * fix MTP bug

commit 2f74e93d7e
Author: Yuanle Liu
Date:   2025-07-21 01:23:36 -07:00

    use dist.all_reduce(min) to sync num_blocks_local (#2933)

    * pre-commit all files check
    * reduce min num_blocks_local
    * fix nranks=1
    * pre-commit when commit-msg
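Each rank presumably derives num_blocks_local from its own free memory, so ranks can disagree; reducing with MIN makes every rank adopt a block count all of them can satisfy. A sketch using Paddle's collective API; the wrapper function itself is illustrative:

```python
import paddle
import paddle.distributed as dist

def sync_num_blocks_local(num_blocks_local: int) -> int:
    """All ranks adopt the smallest locally computed block count."""
    if dist.get_world_size() == 1:        # nranks == 1: nothing to sync
        return num_blocks_local
    t = paddle.to_tensor([num_blocks_local], dtype="int64")
    dist.all_reduce(t, op=dist.ReduceOp.MIN)
    return int(t.item())
```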
						 
				 
			
				
					
						
							
							
commit 8c5407d9e4
Author: 周周周
Date:   2025-07-19 23:57:27 +08:00

    remove cum_offsets from ForwardMeta (#2925)

commit 25698d56d1
Author: Zero Rains
Date:   2025-07-19 23:19:27 +08:00

    polish code with new pre-commit rule (#2923)

commit ddb10ac509
Author: 周周周
Date:   2025-07-17 18:41:31 +08:00

    [Inference, rename] remove padding_offsets from attention; use batch_id_per_token (#2880)

    * remove padding_offsets from attention
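batch_id_per_token maps each flattened token position directly to its batch index, replacing the offset arithmetic that padding_offsets and cum_offsets (removed in #2925 above) used to support. A minimal sketch of both, with illustrative helper names:

```python
from itertools import accumulate

def build_batch_id_per_token(seq_lens):
    """seq_lens=[3, 1, 2] -> [0, 0, 0, 1, 2, 2]: token index -> batch id."""
    return [bid for bid, n in enumerate(seq_lens) for _ in range(n)]

def build_cu_seqlens(seq_lens):
    """The cumulative offsets the old bookkeeping derived: [0, 3, 4, 6]."""
    return [0] + list(accumulate(seq_lens))
```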
						 
				 
			
				
					
						
							
							
commit 0fad10b35a
Author: RAM
Date:   2025-07-15 19:49:01 -07:00

    [Executor] CUDA Graph support padding batch (#2844)

    * cuda graph support padding batch
    * Integrate the startup parameters for the graph optimization backend and support user-defined capture sizes
    * Do not insert max_num_seqs when the user specifies a capture list
    * Support setting graph optimization config from a YAML file
    * update cuda graph ci
    * fix ci bug
    * fix ci bug
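Padding the batch lets a runtime batch of any size replay one of a fixed set of captured CUDA graphs. A minimal sketch of the size-selection step, assuming a sorted capture list; the list itself is a hypothetical example:

```python
import bisect

CAPTURE_SIZES = [1, 2, 4, 8, 16, 32]   # assumed user-defined capture list

def padded_batch_size(batch_size: int):
    """Smallest captured size >= batch_size, or None for eager fallback."""
    i = bisect.bisect_left(CAPTURE_SIZES, batch_size)
    return CAPTURE_SIZES[i] if i < len(CAPTURE_SIZES) else None

assert padded_batch_size(3) == 4      # a 3-sequence batch replays the size-4 graph
assert padded_batch_size(33) is None  # beyond the largest graph: run eagerly
```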
						 
				 
			
				
					
						
							
							
commit 59071268b6
Author: littledgg
Date:   2025-07-10 20:36:51 +08:00

    [Executor] Move forward_meta.py to fastdeploy/model_executor (#2774)

    * Use PEP 563 in attention.py and fix conflict
    * merge commit
    * Change what was left out last time
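PEP 563 (from __future__ import annotations) defers annotation evaluation, which lets attention.py name ForwardMeta in type hints without importing it at runtime, avoiding a circular import after the move. A minimal illustration; the guarded-import pattern shown is a common idiom, assumed rather than quoted from the repo:

```python
from __future__ import annotations    # PEP 563: lazy annotation evaluation

from typing import TYPE_CHECKING

if TYPE_CHECKING:                     # evaluated only by type checkers
    from fastdeploy.model_executor.forward_meta import ForwardMeta

class AttentionBackend:
    def forward(self, forward_meta: ForwardMeta):
        # The annotation is stored as a string and never evaluated at
        # runtime, so no circular import occurs between this module
        # and forward_meta.py.
        ...
```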