yzwu 
							
						 
					 
					
						
						
							
						
						fbdd6b0663 
					 
					
						
						
							
							[Iluvatar GPU] Optimze attention and moe performance ( #3234 )  
						
						
						
						
					 
					
						2025-08-08 10:51:24 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						afff4d37ea 
					 
					
						
						
							
							[Feature] support seed parameter ( #3161 )  
						
						... 
						
						
						
						* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to defualt
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix 
						
						
					 
					
						2025-08-06 15:20:47 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						b01cfd6007 
					 
					
						
						
							
							[BugFix] support real batch_size ( #3109 )  
						
						... 
						
						
						
						* support real bsz
* fix
* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py
* add event_loop_ep
* fix
* Add comments
* fix
* support mtp real_batch_size
* fix
* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer
* fix
* fix VL real_seq_lens_this_time
* fix
* fix mtp
* fix
* fix mtp
* fix xpu
* fix 
						
						
					 
					
						2025-08-05 16:33:54 +08:00 
						 
				 
			
				
					
						
							
							
								lizhenyun01 
							
						 
					 
					
						
						
							
						
						fe540f6caa 
					 
					
						
						
							
							[plugin] Custom model_runner/model support ( #3186 )  
						
						... 
						
						
						
						* support custom model&&model_runner
* fix merge
* add test && update doc
* fix codestyle
* fix unittest
* load model in rl 
						
						
					 
					
						2025-08-04 18:52:39 -07:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						72ef5a9c93 
					 
					
						
						
							
							[FIX]fix bad_words when sending requests consecutively ( #3197 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* fix bad_words
* fix log
* fix log 
						
						
					 
					
						2025-08-04 05:59:41 -07:00 
						 
				 
			
				
					
						
							
							
								Longzhi Wang 
							
						 
					 
					
						
						
							
						
						01d7586661 
					 
					
						
						
							
							[Bug fix] Fix cudagraph when use ep. ( #3130 )  
						
						... 
						
						
						
						* fix cudagraph when use ep
* fix typo
* reduce full length to adapt large bsz such 128/256 
						
						
					 
					
						2025-08-04 18:06:18 +08:00 
						 
				 
			
				
					
						
							
							
								gaoziyuan 
							
						 
					 
					
						
						
							
						
						4021d66ea5 
					 
					
						
						
							
							【Feature】add fd plugins && rm model_classes ( #3123 )  
						
						... 
						
						
						
						* add fd plugins && rm model_classed
* fix reviews
* add docs
* fix
* fix unitest ci 
						
						
					 
					
						2025-08-03 19:53:20 -07:00 
						 
				 
			
				
					
						
							
							
								yinwei 
							
						 
					 
					
						
						
							
						
						3a4db15765 
					 
					
						
						
							
							Fix out-of-memory issue during single-XPU deployment ( #3133 )  
						
						
						
						
					 
					
						2025-08-01 17:12:03 +08:00 
						 
				 
			
				
					
						
							
							
								SunLei 
							
						 
					 
					
						
						
							
						
						dade19d7a4 
					 
					
						
						
							
							[Feature] General support for logprobs ( #2974 )  
						
						... 
						
						
						
						* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com > 
						
						
					 
					
						2025-07-31 20:25:56 +08:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						d850660872 
					 
					
						
						
							
							[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel ( #2989 )  
						
						... 
						
						
						
						* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flas attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug 
						
						
					 
					
						2025-07-31 00:09:31 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						7dfdd157ac 
					 
					
						
						
							
							[BugFix]Fix ep size ( #3092 )  
						
						... 
						
						
						
						* fix ep
* fix num_layer 
						
						
					 
					
						2025-07-30 21:03:12 +08:00 
						 
				 
			
				
					
						
							
							
								ltd0924 
							
						 
					 
					
						
						
							
						
						d17886de19 
					 
					
						
						
							
							[Feature] support ep in mixed mode ( #3001 )  
						
						... 
						
						
						
						* [LLM] support ep
* Update worker_process.py
* Update expert_service.py
* Update worker_process.py
* format files 
						
						
					 
					
						2025-07-30 20:43:39 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						db698bda01 
					 
					
						
						
							
							qwen loader ( #3057 )  
						
						
						
						
					 
					
						2025-07-30 19:09:38 +08:00 
						 
				 
			
				
					
						
							
							
								ming1753 
							
						 
					 
					
						
						
							
						
						5acde4eb43 
					 
					
						
						
							
							[Feature] Multimodal Scheduler V1 ( #3019 )  
						
						... 
						
						
						
						* [Feature] Support multimodal scheduler v1
* remove debug log
* fix bug
* fix format
* modify code
* fix bug
* fix bug
* fix bug
* modify code 
						
						
					 
					
						2025-07-30 16:05:55 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						74aa31d15b 
					 
					
						
						
							
							[Feature] support bad_words ( #3055 )  
						
						... 
						
						
						
						* support bad_words
* support online infer bad_words
* update
* add CI test
* update
* update
* update
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com > 
						
						
					 
					
						2025-07-30 09:31:29 +08:00 
						 
				 
			
				
					
						
							
							
								Ryan 
							
						 
					 
					
						
						
							
						
						73cfe1fd37 
					 
					
						
						
							
							[SOT]  Extend SOT warmup support to new hardware ( #3032 )  
						
						... 
						
						
						
						* add new hardware
* add_sot_warmup4new_hardware
* fix conflict
* rm Optional 
						
						
					 
					
						2025-07-29 22:45:20 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						b2f9a42d87 
					 
					
						
						
							
							[Feature] Support repetition early stop ( #3024 )  
						
						... 
						
						
						
						* support repetition early stop and support user to set the parameter
* remove log
* fix codestyle
* add the early_stop_config to rollout_config
* update config and EarlyStopper class
* fix the bug for triton
* modify the stop method
* update description
* modify the usage for stop_flags
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com > 
						
						
					 
					
						2025-07-29 22:42:54 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						502ee92a0a 
					 
					
						
						
							
							Unify server-side and model-side Config (Part3)  ( #3047 )  
						
						... 
						
						
						
						* merge model config
* fix arch
* fix rl 
						
						
					 
					
						2025-07-29 17:07:44 +08:00 
						 
				 
			
				
					
						
							
							
								JYChen 
							
						 
					 
					
						
						
							
						
						dafe02a7b9 
					 
					
						
						
							
							[stop sequence] support stop sequence ( #3025 )  
						
						... 
						
						
						
						* stop seqs in multi-ends
* unittest for gpu stop op
* kernel tid==0 
						
						
					 
					
						2025-07-29 14:17:37 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						1a815b7a2a 
					 
					
						
						
							
							Fix Speculative Config bug ( #3049 )  
						
						... 
						
						
						
						* fix speculative bug
* fix rl 
						
						
					 
					
						2025-07-29 10:50:48 +08:00 
						 
				 
			
				
					
						
							
							
								yinwei 
							
						 
					 
					
						
						
							
						
						f2a528f9ae 
					 
					
						
						
							
							[XPU] Support kvblock centralized management ( #3017 )  
						
						
						
						
					 
					
						2025-07-29 10:40:55 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						bddf403576 
					 
					
						
						
							
							Unify server-side and model-side Config (Part2) ( #3035 )  
						
						... 
						
						
						
						* merge speculative and graph opt conifg
* add attr 
						
						
					 
					
						2025-07-28 15:31:48 +08:00 
						 
				 
			
				
					
						
							
							
								begin2023 
							
						 
					 
					
						
						
							
						
						dd877f38b1 
					 
					
						
						
							
							[Perf] Remove unnecessary operations in non-cuda_graph ( #3010 )  
						
						... 
						
						
						
						* [Perf] Remove unnecessary operations in non-cuda_graph
* fix code logic
* use suggestion comment
* reduce function call
* reduce function call
* reduce function call
* reduce function call 
						
						
					 
					
						2025-07-27 20:38:29 -07:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						6ccc10ad47 
					 
					
						
						
							
							Unify server-side and model-side Config (Part1) ( #3018 )  
						
						... 
						
						
						
						* move cache config
* fix mtp 
						
						
					 
					
						2025-07-28 10:51:52 +08:00 
						 
				 
			
				
					
						
							
							
								Longzhi Wang 
							
						 
					 
					
						
						
							
						
						0700c90caa 
					 
					
						
						
							
							[Feat] support mixed ep ( #2969 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* Support mixed ep
* fix comment
* fix comment
* update mixep
* fix conflict
* fix typo
* update
* fix typo
* fix code style
* fix conflict 
						
						
					 
					
						2025-07-25 15:29:30 +08:00 
						 
				 
			
				
					
						
							
							
								EnflameGCU 
							
						 
					 
					
						
						
							
						
						8c167e130c 
					 
					
						
						
							
							[GCU] Update post_process ( #3012 )  
						
						
						
						
					 
					
						2025-07-25 11:03:03 +08:00 
						 
				 
			
				
					
						
							
							
								xiaoxiaohehe001 
							
						 
					 
					
						
						
							
						
						2970b00dfa 
					 
					
						
						
							
							[Feature] Support_eplb ( #2997 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* [Feature] support_eplb
* [Feature] support_eplb
* [Fix] fix mm ep 
						
						
					 
					
						2025-07-24 20:22:45 +08:00 
						 
				 
			
				
					
						
							
							
								EnflameGCU 
							
						 
					 
					
						
						
							
						
						c40df1802e 
					 
					
						
						
							
							[GCU] Update to develop ( #2988 )  
						
						
						
						
					 
					
						2025-07-24 19:30:52 +08:00 
						 
				 
			
				
					
						
							
							
								ltd0924 
							
						 
					 
					
						
						
							
						
						f935d6f862 
					 
					
						
						
							
							[BugFix] fix multinode deployment ( #2977 )  
						
						
						
						
					 
					
						2025-07-24 15:04:04 +08:00 
						 
				 
			
				
					
						
							
							
								ltd0924 
							
						 
					 
					
						
						
							
						
						3792345c3a 
					 
					
						
						
							
							[LLM] update function name ( #2985 )  
						
						... 
						
						
						
						* [LLM] update function name 
						
						
					 
					
						2025-07-24 15:03:40 +08:00 
						 
				 
			
				
					
						
							
							
								lizhenyun01 
							
						 
					 
					
						
						
							
						
						29c3292f02 
					 
					
						
						
							
							support c4 attn && fix cache  
						
						
						
						
					 
					
						2025-07-24 12:00:52 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						832d25334a 
					 
					
						
						
							
							[Code Simplification] fix init_distributed_environment() ( #2982 )  
						
						
						
						
					 
					
						2025-07-24 11:43:28 +08:00 
						 
				 
			
				
					
						
							
							
								chenjian 
							
						 
					 
					
						
						
							
						
						85a78d695d 
					 
					
						
						
							
							[Feature] Support block scheduler v1 for FD ( #2928 )  
						
						... 
						
						
						
						* Support FD block scheduler v1
* Support FD block scheduler v1
* Support FD block scheduler v1
* Fix according to copilot review
* Fix according to review
* Remove is_dummy
* Fix bug when real_bsz=1
* Fix infer first token cost time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com > 
						
						
					 
					
						2025-07-23 20:31:31 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						ca0f71bd39 
					 
					
						
						
							
							polish code for prefill restrictions ( #2991 )  
						
						
						
						
					 
					
						2025-07-23 05:10:14 -07:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						850c9d98d4 
					 
					
						
						
							
							[BugFix] Add prefill restrictions for chunked_prefill+VL ( #2983 )  
						
						
						
						
					 
					
						2025-07-23 01:45:57 -07:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						9b22b8d2c3 
					 
					
						
						
							
							delete max-len ( #2959 )  
						
						
						
						
					 
					
						2025-07-23 15:11:39 +08:00 
						 
				 
			
				
					
						
							
							
								Ryan 
							
						 
					 
					
						
						
							
						
						95b5af24db 
					 
					
						
						
							
							[SOT] Add sot warmup (NVIDIA GPU Only) ( #2929 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* add sot warmup
* fix code style
* change batch_size list
* add param to config
* rm free_list settings && set sot_warmup_sizes
* finish debug with dynamic dims by type annotations
* add profile_run guard
* rm sth useless 
						
						
					 
					
						2025-07-22 21:36:14 +08:00 
						 
				 
			
				
					
						
							
							
								GoldPancake 
							
						 
					 
					
						
						
							
						
						dc67c10a7e 
					 
					
						
						
							
							[Feature][MTP]Support multi-step MTP ( #2952 )  
						
						
						
						
					 
					
						2025-07-22 16:26:29 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						89a485b69f 
					 
					
						
						
							
							[Feature] Support using prefix-caching + cudagraph for inference ( #2924 )  
						
						... 
						
						
						
						* fix the bug in cudagraph+prefix-caching but still have some bug with profile
Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397
* add the signal to make sure cache manager launched
* fix judge condition
* reomove useless control
* update control stream
* update
* fix xpu
* change the do_profile flag
* update
* add new threads to init cache_manager
---------
Co-authored-by: RAM <gstian5555@outlook.com > 
						
						
					 
					
						2025-07-22 00:59:45 -07:00 
						 
				 
			
				
					
						
							
							
								Nyakku Shigure 
							
						 
					 
					
						
						
							
						
						48e6a0ca26 
					 
					
						
						
							
							[SOT] Mark dynamic dims by type annotations ( #2771 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* [SOT] Mark dynamic dims by type annotations
* fix conflict of forward_meta
* mark more attn backend
* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS
* auto infer implicit 0 dim dynamic dim
* revert manual marked dims
* revert missing update
* auto infer can use unsafe code in warmup stage
* check -> type_match
* fix codestyle
* restore blank line
* empty commit
* add need_warmup nonlocal;
* add doc for resolver
* add missing type hints
* unquote "ForwardMeta" 
						
						
					 
					
						2025-07-22 00:23:52 -07:00 
						 
				 
			
				
					
						
							
							
								zhink 
							
						 
					 
					
						
						
							
						
						0262ef7eb3 
					 
					
						
						
							
							custom all reduce support cuda graph ( #2938 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication 
						
						
					 
					
						2025-07-21 22:52:03 +08:00 
						 
				 
			
				
					
						
							
							
								littledgg 
							
						 
					 
					
						
						
							
						
						2845bde964 
					 
					
						
						
							
							[Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph  ( #2936 )  
						
						... 
						
						
						
						* [Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph
* Fix: Apply black formatting 
						
						
					 
					
						2025-07-21 16:25:51 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						2f74e93d7e 
					 
					
						
						
							
							use dist.all_reduce(min) to sync num_blocks_local ( #2933 )  
						
						... 
						
						
						
						* pre-commit all files check
* reduce min num_blocks_local
* fix nranks=1
* pre-commit when commit-msg 
						
						
					 
					
						2025-07-21 01:23:36 -07:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						67990e0572 
					 
					
						
						
							
							[Feature] support min_p_sampling ( #2872 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* Fastdeploy support min_p
* add test_min_p
* fix
* min_p_sampling
* update
* delete vl_gpu_model_runner.py
* fix
* Align usage of min_p with vLLM
* fix
* modified unit test
* fix test_min_sampling
* pre-commit all files
* fix
* fix
* fix
* fix xpu_model_runner.py 
						
						
					 
					
						2025-07-20 23:17:59 -07:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						8c5407d9e4 
					 
					
						
						
							
							remove cum_offsets from ForwardMeta ( #2925 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-07-19 23:57:27 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						25698d56d1 
					 
					
						
						
							
							polish code with new pre-commit rule ( #2923 )  
						
						
						
						
					 
					
						2025-07-19 23:19:27 +08:00 
						 
				 
			
				
					
						
							
							
								ming1753 
							
						 
					 
					
						
						
							
						
						5328daa333 
					 
					
						
						
							
							[Bug Fix] fix ep config bug ( #2920 )  
						
						
						
						
					 
					
						2025-07-18 19:12:56 +08:00 
						 
				 
			
				
					
						
							
							
								gaoziyuan 
							
						 
					 
					
						
						
							
						
						6efad14b95 
					 
					
						
						
							
							support vl ori_vacab_size ( #2900 )  
						
						
						
						
					 
					
						2025-07-18 16:26:14 +08:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						1339e56282 
					 
					
						
						
							
							[XPU] Remove padding_offsets from get_padding_offset.cu ( #2911 )  
						
						
						
						
					 
					
						2025-07-18 14:16:44 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						0eb5dc18d3 
					 
					
						
						
							
							[BugFix]Fix sample rejection ( #2908 )  
						
						... 
						
						
						
						* fix config
* fix rejection 
						
						
					 
					
						2025-07-18 13:44:30 +08:00