lizexu123 
							
						 
					 
					
						
						
							
						
						afff4d37ea 
					 
					
						
						
							
							[Feature] support seed parameter ( #3161 )  
						
						... 
						
						
						
						* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to defualt
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix 
						
						
					 
					
						2025-08-06 15:20:47 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						b01cfd6007 
					 
					
						
						
							
							[BugFix] support real batch_size ( #3109 )  
						
						... 
						
						
						
						* support real bsz
* fix
* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py
* add event_loop_ep
* fix
* Add comments
* fix
* support mtp real_batch_size
* fix
* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer
* fix
* fix VL real_seq_lens_this_time
* fix
* fix mtp
* fix
* fix mtp
* fix xpu
* fix 
						
						
					 
					
						2025-08-05 16:33:54 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						72ef5a9c93 
					 
					
						
						
							
							[FIX]fix bad_words when sending requests consecutively ( #3197 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* fix bad_words
* fix log
* fix log 
						
						
					 
					
						2025-08-04 05:59:41 -07:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						d850660872 
					 
					
						
						
							
							[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel ( #2989 )  
						
						... 
						
						
						
						* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flas attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug 
						
						
					 
					
						2025-07-31 00:09:31 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						db698bda01 
					 
					
						
						
							
							qwen loader ( #3057 )  
						
						
						
						
					 
					
						2025-07-30 19:09:38 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						74aa31d15b 
					 
					
						
						
							
							[Feature] support bad_words ( #3055 )  
						
						... 
						
						
						
						* support bad_words
* support online infer bad_words
* update
* add CI test
* update
* update
* update
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com > 
						
						
					 
					
						2025-07-30 09:31:29 +08:00 
						 
				 
			
				
					
						
							
							
								Ryan 
							
						 
					 
					
						
						
							
						
						73cfe1fd37 
					 
					
						
						
							
							[SOT]  Extend SOT warmup support to new hardware ( #3032 )  
						
						... 
						
						
						
						* add new hardware
* add_sot_warmup4new_hardware
* fix conflict
* rm Optional 
						
						
					 
					
						2025-07-29 22:45:20 +08:00 
						 
				 
			
				
					
						
							
							
								begin2023 
							
						 
					 
					
						
						
							
						
						dd877f38b1 
					 
					
						
						
							
							[Perf] Remove unnecessary operations in non-cuda_graph ( #3010 )  
						
						... 
						
						
						
						* [Perf] Remove unnecessary operations in non-cuda_graph
* fix code logic
* use suggestion comment
* reduce function call
* reduce function call
* reduce function call
* reduce function call 
						
						
					 
					
						2025-07-27 20:38:29 -07:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						6ccc10ad47 
					 
					
						
						
							
							Unify server-side and model-side Config (Part1) ( #3018 )  
						
						... 
						
						
						
						* move cache config
* fix mtp 
						
						
					 
					
						2025-07-28 10:51:52 +08:00 
						 
				 
			
				
					
						
							
							
								EnflameGCU 
							
						 
					 
					
						
						
							
						
						8c167e130c 
					 
					
						
						
							
							[GCU] Update post_process ( #3012 )  
						
						
						
						
					 
					
						2025-07-25 11:03:03 +08:00 
						 
				 
			
				
					
						
							
							
								EnflameGCU 
							
						 
					 
					
						
						
							
						
						c40df1802e 
					 
					
						
						
							
							[GCU] Update to develop ( #2988 )  
						
						
						
						
					 
					
						2025-07-24 19:30:52 +08:00 
						 
				 
			
				
					
						
							
							
								ltd0924 
							
						 
					 
					
						
						
							
						
						3792345c3a 
					 
					
						
						
							
							[LLM] update function name ( #2985 )  
						
						... 
						
						
						
						* [LLM] update function name 
						
						
					 
					
						2025-07-24 15:03:40 +08:00 
						 
				 
			
				
					
						
							
							
								lizhenyun01 
							
						 
					 
					
						
						
							
						
						29c3292f02 
					 
					
						
						
							
							support c4 attn && fix cache  
						
						
						
						
					 
					
						2025-07-24 12:00:52 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						9b22b8d2c3 
					 
					
						
						
							
							delete max-len ( #2959 )  
						
						
						
						
					 
					
						2025-07-23 15:11:39 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						89a485b69f 
					 
					
						
						
							
							[Feature] Support using prefix-caching + cudagraph for inference ( #2924 )  
						
						... 
						
						
						
						* fix the bug in cudagraph+prefix-caching but still have some bug with profile
Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397
* add the signal to make sure cache manager launched
* fix judge condition
* reomove useless control
* update control stream
* update
* fix xpu
* change the do_profile flag
* update
* add new threads to init cache_manager
---------
Co-authored-by: RAM <gstian5555@outlook.com > 
						
						
					 
					
						2025-07-22 00:59:45 -07:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						25698d56d1 
					 
					
						
						
							
							polish code with new pre-commit rule ( #2923 )  
						
						
						
						
					 
					
						2025-07-19 23:19:27 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						101ad33332 
					 
					
						
						
							
							[BugFix] Fix Configs ( #2849 )  
						
						... 
						
						
						
						* fix config
* fix config 
						
						
					 
					
						2025-07-15 19:50:36 -07:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						0fad10b35a 
					 
					
						
						
							
							[Executor] CUDA Graph support padding batch ( #2844 )  
						
						... 
						
						
						
						* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user - defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support set graph optimization config from YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug 
						
						
					 
					
						2025-07-15 19:49:01 -07:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						4c7b8bc458 
					 
					
						
						
							
							Simplify the Config code ( #2770 )  
						
						... 
						
						
						
						* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe 
						
						
					 
					
						2025-07-14 19:50:05 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						f6ad26fc08 
					 
					
						
						
							
							fix topp default value ( #2814 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-07-11 17:10:21 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						240d6236bc 
					 
					
						
						
							
							[Fix]fix top_k_top_p sampling ( #2801 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* fix topk-topp
* update
* add base_non_truncated 
						
						
					 
					
						2025-07-10 22:35:10 +08:00 
						 
				 
			
				
					
						
							
							
								littledgg 
							
						 
					 
					
						
						
							
						
						59071268b6 
					 
					
						
						
							
							[Executor] Move forward_meta.py to fastdeploy/model_executor ( #2774 )  
						
						... 
						
						
						
						* Use PEP 563 in attention.py and fix conflict
* merge commit
* Change what was left out last time 
						
						
					 
					
						2025-07-10 20:36:51 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						d33105baeb 
					 
					
						
						
							
							[Feature] Online Chat API Support Return logprobs ( #2777 )  
						
						... 
						
						
						
						* online chat support logprobs
* check xpu
* check vl_gpu_model_runner and xpu_model_runner
* get_worker() check platform 
						
						
					 
					
						2025-07-10 16:33:40 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						e45050cae3 
					 
					
						
						
							
							[Feature] support top_k_top_p sampling ( #2753 )  
						
						... 
						
						
						
						* support top_k_top_p sampling
* fix
* add api param
* add api para
* fix
* fix
* fix
* fix
* fix
* fix
* fix 
						
						
					 
					
						2025-07-09 20:58:58 -07:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						03a74995b8 
					 
					
						
						
							
							Clear dead code And supplementary notes ( #2757 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* 1.supplementary notes 2.delete dead code
* fix bug of forward meta
* Global modification of forward meta
* fix vl model_runner bug 
						
						
					 
					
						2025-07-09 16:17:34 +08:00 
						 
				 
			
				
					
						
							
							
								EnflameGCU 
							
						 
					 
					
						
						
							
						
						d0f4d6ba3a 
					 
					
						
						
							
							[GCU] Support gcu platform ( #2702 )  
						
						... 
						
						
						
						baseline: e7fa57ebaexing.wo@163.com > 
						
						
					 
					
						2025-07-08 13:00:52 +08:00