RAM 
							
						 
					 
					
						
						
							
						
						920df5be5a 
					 
					
						
						
							
							[Graph Optimization][Speculative Decoding] Fix the bug of CUDAGraph + MTP + EP  ( #4430 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						* Fix MTP dummy run bug
* Target Model and Draft Model using the same flag
* avoid moe bug in cudagraph padding
* In mtp, replace use_cudagraph with step_use_cudagraph
						
						
					 
					
						2025-10-17 14:22:05 +08:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						c3499875bd 
					 
					
						
						
							
							[MTP]support mtp chunk_prefill_v1 ( #4365 )  
						
						... 
						
						
						
						* support mtp chunk_prefill_v1
* fix mtp chunkprefill output
* fix mtp chunkprefill output, fix unit test
* fix save_output
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
						
						
					 
					
						2025-10-15 15:33:59 +08:00 
						 
				 
			
				
					
						
							
							
								Jundong Liu 
							
						 
					 
					
						
						
							
						
						0b7a5778ab 
					 
					
						
						
							
							[Executor]CUDAGraph support Speculate Decode ( #4258 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						* [Executor]CUDAGraph support Speculate Decode
* fix problem
* solve problem
* fix
* fast compile
* CUDAGraph + mtp support eb5(only target model)
* Revert "fast compile"
This reverts commit 3cfe8373ed.
Co-authored-by: gongshaotian <gstian5555@outlook.com>
						
						
					 
					
						2025-10-13 15:21:41 +08:00 
						 
				 
			
				
					
						
							
							
								ltd0924 
							
						 
					 
					
						
						
							
						
						f75697c2d1 
					 
					
						
						
							
							[Feature] support clear data ( #4185 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						* fix
* fix
* fix
* [Feature] support clear data
* update
* fix
* fix
* fix
* fix 
						
						
					 
					
						2025-09-21 20:41:27 +08:00 
						 
				 
			
				
					
						
							
							
								Yuan Xiaolan 
							
						 
					 
					
						
						
							
						
						25aa2d94aa 
					 
					
						
						
							
							cp dynamic Cfp8  ( #4120 )  
						
						... 
						
						
						
						* supports dynamic Cfp8
* add unittest
* fix dynamic Cfp8 computing error
* fix Cfp8 for RL load
---------
Co-authored-by: carryyu <569782149@qq.com>
						
						
					 
					
						2025-09-17 11:55:47 +08:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						63d24b2210 
					 
					
						
						
							
							[Executor] Adjust signal sending order in RL training ( #3773 ) ( #4066 )  
						
						... 
						
						
						
						* Adjust processing order
* fix bug
* fix update_parameters bug
* refine code 
						
						
					 
					
						2025-09-11 15:41:32 +08:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						2f473ba966 
					 
					
						
						
							
							[Feature][MTP]Support MTP for rl-model ( #4009 )  
						
						... 
						
						
						
						* qk norm for speculate decode C16
* support mtp in v1_scheduler mode
* support mtp rope_3d
* support mtp features
* add unit test && del some log
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: xiaoxiaohehe001 <hiteezsf@163.com>
						
						
					 
					
						2025-09-10 13:34:37 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						d43549953c 
					 
					
						
						
							
							[Cherry-Pick][Bug Fix]fix the bug for real size 0 in cudagraph ( #3888 )  
						
						... 
						
						
						
						* fix the bug for real size 0 in cudagraph
* fix cache_messager
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
						
						
					 
					
						2025-09-08 14:06:10 +08:00 
						 
				 
			
				
					
						
							
							
								chenjian 
							
						 
					 
					
						
						
							
						
						8d77c1cb51 
					 
					
						
						
							
							[Optimize] optimize prefix cache in release22 ( #3889 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						* optimize prefix cache in release22
* optimize prefix cache in release22
* fix worker
* fix
* fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
						
						
					 
					
						2025-09-06 09:52:01 +08:00 
						 
				 
			
				
					
						
							
							
								chenjian 
							
						 
					 
					
						
						
							
						
						41cd3e24c9 
					 
					
						
						
							
							[Feature] Enable prefix caching as default ( #3816 )  
						
						... 
						
						
						
						* [Feature] Enable prefix caching as default
* [Feature] Enable prefix caching as default
* Set prefix caching as default
* skip dynamic load
* fix kill bug
* fix kill bug
* fix kill bug
* fix ci
* fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
						
						
					 
					
						2025-09-06 09:51:34 +08:00 
						 
				 
			
				
					
						
							
							
								chenjian 
							
						 
					 
					
						
						
							
						
						a0c03510c0 
					 
					
						
						
							
							[Bug fix] Fix prompt token ids dtype in v1 ( #3861 )  
						
						
						
						
					 
					
						2025-09-04 11:02:37 +08:00 
						 
				 
			
				
					
						
							
							
								chenjian 
							
						 
					 
					
						
						
							
						
						fb1e0d6a87 
					 
					
						
						
							
							[Feature] Set scheduler v1 as default ( #3812 )  
						
						... 
						
						
						
						* [Feature] Set scheduler v1 as default
* [Feature] Set scheduler v1 as default
* [Feature] Set scheduler v1 as default
* [Feature] Set scheduler v1 as default
* [Feature] Set scheduler v1 as default
* [Feature] Set scheduler v1 as default 
						
						
					 
					
						2025-09-04 11:02:10 +08:00 
						 
				 
			
				
					
						
							
							
								zhouchong 
							
						 
					 
					
						
						
							
						
						ccd52b5596 
					 
					
						
						
							
							[Model]support qwen2_5_vl ( #3557 )  
						
						... 
						
						
						
						* adapt qwen_2_5_vl model
* adapt qwen_2_5_vl VIT model
* adapt qwen2_5_vl images_embeds
* adapt qwen2_5_vl 3D rope
* adapt qwen2_5_vl 3D rope v2
* adapt qwen2_5_vl processor
* adapt qwen2_5_vl bypass resampler_model
* adapt qwen2_5_vl bypass part of the ernie logic
* adapt qwen2_5_vl bypass part of the ernie logic v2
* adapt qwen2_5_vl weight loading and naming changes
* adapt qwen2_5_vl make think_end_id optional
* adapt qwen2_5_vl distinguish extract_vision_features across multiple models
* fix:adapt qwen2_5_vl model
* adapt qwen2_5_vl norm
* adapt qwen2_5_vl processor update
* adapt qwen2_5_vl image and video success
* adapt qwen2_5_vl partial code cleanup
* adapt qwen2_5_vl support multi-GPU
* adapt qwen2_5_vl on latest develop
* adapt qwen2_5_vl RL
* adapt qwen2_5_vl code cleanup
* support noex rope3d
* adapt qwen2_5_vl add init.py
* adapt qwen2_5_vl add init.py v2
* adapt qwen2_5_vl remove space
* adapt qwen2_5_vl remove space v2
* adapt qwen2_5_vl pre-commit
* adapt qwen2_5_vl update
* adapt qwen2_5_vl pre-commit v2
* adapt qwen2_5_vl modify comments
* adapt qwen2_5_vl fix indentation
* adapt qwen2_5_vl fix indentation v2
---------
Co-authored-by: wangyafeng <wangyafeng@baidu.com>
Co-authored-by: xiaoxiaohehe001 <49090790+xiaoxiaohehe001@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <58356743+CSWYF3634076@users.noreply.github.com>
						
						
					 
					
						2025-08-29 18:28:39 +08:00 
						 
				 
			
				
					
						
							
							
								lifulll 
							
						 
					 
					
						
						
							
						
						72094d4d82 
					 
					
						
						
							
							enable dcu ci ( #3402 )  
						
						
						
						
					 
					
						2025-08-29 10:23:08 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						e37e86b3b8 
					 
					
						
						
							
							[V1 Loader]support param create and load for wint2 and xpu backend ( #3581 )  
						
						... 
						
						
						
						* support wint2 backend
* [V1 Loader]support param create and load for wint2 and xpu backend
* update weight shape name
* update
* update
* update baseline.txt
* update model name
* update baseline.txt
* fix codestyle
* remove debug code
						
						
					 
					
						2025-08-28 09:49:36 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						b28a0343a6 
					 
					
						
						
							
							fix ENABLE_V1_KVCACHE_SCHEDULER ( #3625 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
						
						
					 
					
						2025-08-27 21:21:29 +08:00 
						 
				 
			
				
					
						
							
							
								李泳桦 
							
						 
					 
					
						
						
							
						
						b2afdf4fc6 
					 
					
						
						
							
							[fix] qwen output inconsistency when top_p=0 ( #3634 )  
						
						... 
						
						
						
						* [fix] qwen output inconsistency when top_p=0
* [fix] remove decode pre_id code 
						
						
					 
					
						2025-08-27 17:16:23 +08:00 
						 
				 
			
				
					
						
							
							
								gaoziyuan 
							
						 
					 
					
						
						
							
						
						82e64b13e1 
					 
					
						
						
							
							[NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop ( #3598 )  
						
						... 
						
						
						
						* [Feature] update ep
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix queue ports idx
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* Update engine.py
* fix ci
* fix some bug in mixed ep
* add server fix and op fix
* rm some log
* fix code style
* ltd fix
* fix
* fix
* fix some bug
* fix bug
* fix bug
* fix style
* Update config.py
* Update splitwise_connector.py
* Update cache_messager.py
* Update __init__.py
* merge and fix
* Update engine.py
* Update common_engine.py
* Update run_ci_xpu.sh
* Update ernie_processor.py
* Update ernie_processor.py
---------
Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
						
						
					 
					
						2025-08-26 19:59:02 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						cbce94a00e 
					 
					
						
						
							
							rename ernie_xxx to ernie4_5_xxx ( #3621 )  
						
						... 
						
						
						
						* rename ernie_xxx to ernie4_5_xxx
* ci fix 
						
						
					 
					
						2025-08-26 19:29:27 +08:00 
						 
				 
			
				
					
						
							
							
								lzy 
							
						 
					 
					
						
						
							
						
						d339df2e90 
					 
					
						
						
							
							Supports DP+TP+EP hybrid parallel deployment strategy ( #3489 )  
						
						... 
						
						
						
						* Support DP+TP+EP hybrid parallel deployment strategy
* Support DP+TP+EP hybrid parallel deployment strategy
* fix conflict
* add moe_tp_ep function split_allgather_out
* del tp_group in moe_cutlass_backend
* for ci
* fix parallel_config for ci
* del log 
						
						
					 
					
						2025-08-26 00:04:01 -07:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						c68c3c4b8b 
					 
					
						
						
							
							[Feature] bad words support v1 scheduler and specify token ids ( #3608 )
						
						... 
						
						
						
						* support bad_words_token_ids
* docs
* fix test
* fix
* bad words support kvcache v1 and token ids
* fix 
						
						
					 
					
						2025-08-25 20:14:51 -07:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						2fa173e327 
					 
					
						
						
							
							[Executor] CUDAGraph support RL training ( #3265 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						* add clear graph opt backend
* cuda graph support rl
* add branch
* 1. fix dynamic_weight_manager bug 2. add clear API for CausalLM
* open test case
* fix typo
* update mkdocs.yaml
* [Docs]Update mkdocs.yml
* update test case
* use unittest in graph test case 
						
						
					 
					
						2025-08-25 20:59:30 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						9cab3f47ff 
					 
					
						
						
							
							[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing ( #3552 )  
						
						... 
						
						
						
						* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
* delete some code
* code check
* code check and add doc
* fix tokenizer.decoder(-1), return 'Invalid Token'
* add ci for temp_scaled and top_p logprobs
* check test
* check seq len time shape
* logprob clip inf
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
						
						
					 
					
						2025-08-25 14:11:49 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						a053ab889b 
					 
					
						
						
							
							[BugFix] fix num_running_requests in cuda_graph ( #3457 )  
						
						... 
						
						
						
						* fix cuda_graph
* add note
---------
Co-authored-by: RAM <gstian5555@outlook.com>
						
						
					 
					
						2025-08-19 10:47:22 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						32b39620bc 
					 
					
						
						
							
							[Code Simplification] remove cum_offsets ( #3410 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-08-18 20:21:25 +08:00 
						 
				 
			
				
					
						
							
							
								Jundong Liu 
							
						 
					 
					
						
						
							
						
						ea4a3b479c 
					 
					
						
						
							
							[Executor] Increase buffer size to prevent address corruption; add forward metadata debug tool ( #3404 )
						
						... 
						
						
						
						* Fix insufficient buffer allocation size; add a tool for printing forward metadata
* fix mistake
* Make CPU tensor in CPUPlace
* Add test about forward_meta_str and Add unitest_requirement
---------
Co-authored-by: RAM <gstian5555@outlook.com>
						
						
					 
					
						2025-08-18 16:14:09 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						f0f00a6025 
					 
					
						
						
							
							[OPs] Universal optimization and Fix early_stop cuda 700 ( #3375 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						* delete nonzero
* delete setup_ops_base.py
* check if
* check gcp infer_seed.cpu()
* fix repetition_early_stopper_kernel cuda 700 
						
						
					 
					
						2025-08-14 22:40:44 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						7b596d0877 
					 
					
						
						
							
							[BugFix] fix real_bsz in ep ( #3366 )  
						
						... 
						
						
						
						* Your commit message here
* fix ep
* delete cuda_graph 
						
						
					 
					
						2025-08-14 17:31:19 +08:00 
						 
				 
			
				
					
						
							
							
								Jiang-Jia-Jun 
							
						 
					 
					
						
						
							
						
						c56c99837a 
					 
					
						
						
							
							Revert "[BugFix] num_seqs ( #3291 )" ( #3316 )  
						
						... 
						
						
						
						This reverts commit e0aeac58e1 
						
						
					 
					
						2025-08-11 16:16:51 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						9571c458f0 
					 
					
						
						
							
							enhance eos_tokens ( #3274 )  
						
						... 
						
						
						
						* enhance eos_tokens
* update
* update 
						
						
					 
					
						2025-08-11 14:47:52 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						e0aeac58e1 
					 
					
						
						
							
							[BugFix] num_seqs ( #3291 )  
						
						... 
						
						
						
						* fix num_seqs
* merge develop 
						
						
					 
					
						2025-08-11 13:38:55 +08:00 
						 
				 
			
				
					
						
							
							
								chenjian 
							
						 
					 
					
						
						
							
						
						c011cb8b16 
					 
					
						
						
							
							[Bug Fix] Fix scheduler bug in develop ( #3292 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						* Fix scheduler bug in develop
* Fix scheduler bug in develop
* Fix scheduler bug in develop 
						
						
					 
					
						2025-08-10 13:55:38 +08:00 
						 
				 
			
				
					
						
							
							
								yzwu 
							
						 
					 
					
						
						
							
						
						fbdd6b0663 
					 
					
						
						
							
							[Iluvatar GPU] Optimize attention and moe performance ( #3234 )
						
						
						
						
					 
					
						2025-08-08 10:51:24 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						afff4d37ea 
					 
					
						
						
							
							[Feature] support seed parameter ( #3161 )  
						
						... 
						
						
						
						* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to default
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix 
						
						
					 
					
						2025-08-06 15:20:47 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						b01cfd6007 
					 
					
						
						
							
							[BugFix] support real batch_size ( #3109 )  
						
						... 
						
						
						
						* support real bsz
* fix
* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py
* add event_loop_ep
* fix
* Add comments
* fix
* support mtp real_batch_size
* fix
* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer
* fix
* fix VL real_seq_lens_this_time
* fix
* fix mtp
* fix
* fix mtp
* fix xpu
* fix 
						
						
					 
					
						2025-08-05 16:33:54 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						72ef5a9c93 
					 
					
						
						
							
							[FIX]fix bad_words when sending requests consecutively ( #3197 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						* fix bad_words
* fix log
* fix log 
						
						
					 
					
						2025-08-04 05:59:41 -07:00 
						 
				 
			
				
					
						
							
							
								Longzhi Wang 
							
						 
					 
					
						
						
							
						
						01d7586661 
					 
					
						
						
							
							[Bug fix] Fix cudagraph when use ep. ( #3130 )  
						
						... 
						
						
						
						* fix cudagraph when use ep
* fix typo
* reduce full length to adapt to large bsz such as 128/256
						
						
					 
					
						2025-08-04 18:06:18 +08:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						d850660872 
					 
					
						
						
							
							[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel ( #2989 )  
						
						... 
						
						
						
						* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug 
						
						
					 
					
						2025-07-31 00:09:31 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						db698bda01 
					 
					
						
						
							
							qwen loader ( #3057 )  
						
						
						
						
					 
					
						2025-07-30 19:09:38 +08:00 
						 
				 
			
				
					
						
							
							
								ming1753 
							
						 
					 
					
						
						
							
						
						5acde4eb43 
					 
					
						
						
							
							[Feature] Multimodal Scheduler V1 ( #3019 )  
						
						... 
						
						
						
						* [Feature] Support multimodal scheduler v1
* remove debug log
* fix bug
* fix format
* modify code
* fix bug
* fix bug
* fix bug
* modify code 
						
						
					 
					
						2025-07-30 16:05:55 +08:00 
						 
				 
			
				
					
						
							
							
								Sunny-bot1 
							
						 
					 
					
						
						
							
						
						74aa31d15b 
					 
					
						
						
							
							[Feature] support bad_words ( #3055 )  
						
						... 
						
						
						
						* support bad_words
* support online infer bad_words
* update
* add CI test
* update
* update
* update
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
						
						
					 
					
						2025-07-30 09:31:29 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						b2f9a42d87 
					 
					
						
						
							
							[Feature] Support repetition early stop ( #3024 )  
						
						... 
						
						
						
						* support repetition early stop and support user to set the parameter
* remove log
* fix codestyle
* add the early_stop_config to rollout_config
* update config and EarlyStopper class
* fix the bug for triton
* modify the stop method
* update description
* modify the usage for stop_flags
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
						
						
					 
					
						2025-07-29 22:42:54 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						502ee92a0a 
					 
					
						
						
							
							Unify server-side and model-side Config (Part3)  ( #3047 )  
						
						... 
						
						
						
						* merge model config
* fix arch
* fix rl 
						
						
					 
					
						2025-07-29 17:07:44 +08:00 
						 
				 
			
				
					
						
							
							
								JYChen 
							
						 
					 
					
						
						
							
						
						dafe02a7b9 
					 
					
						
						
							
							[stop sequence] support stop sequence ( #3025 )  
						
						... 
						
						
						
						* stop seqs in multi-ends
* unittest for gpu stop op
* kernel tid==0 
						
						
					 
					
						2025-07-29 14:17:37 +08:00 
						 
				 
			
				
					
						
							
							
								begin2023 
							
						 
					 
					
						
						
							
						
						dd877f38b1 
					 
					
						
						
							
							[Perf] Remove unnecessary operations in non-cuda_graph ( #3010 )  
						
						... 
						
						
						
						* [Perf] Remove unnecessary operations in non-cuda_graph
* fix code logic
* use suggestion comment
* reduce function call
* reduce function call
* reduce function call
* reduce function call 
						
						
					 
					
						2025-07-27 20:38:29 -07:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						6ccc10ad47 
					 
					
						
						
							
							Unify server-side and model-side Config (Part1) ( #3018 )  
						
						... 
						
						
						
						* move cache config
* fix mtp 
						
						
					 
					
						2025-07-28 10:51:52 +08:00 
						 
				 
			
				
					
						
							
							
								Longzhi Wang 
							
						 
					 
					
						
						
							
						
						0700c90caa 
					 
					
						
						
							
							[Feat] support mixed ep ( #2969 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						* Support mixed ep
* fix comment
* fix comment
* update mixep
* fix conflict
* fix typo
* update
* fix typo
* fix code style
* fix conflict 
						
						
					 
					
						2025-07-25 15:29:30 +08:00 
						 
				 
			
				
					
						
							
							
								ltd0924 
							
						 
					 
					
						
						
							
						
						3792345c3a 
					 
					
						
						
							
							[LLM] update function name ( #2985 )  
						
						... 
						
						
						
						* [LLM] update function name 
						
						
					 
					
						2025-07-24 15:03:40 +08:00 
						 
				 
			
				
					
						
							
							
								lizhenyun01 
							
						 
					 
					
						
						
							
						
						29c3292f02 
					 
					
						
						
							
							support c4 attn && fix cache  
						
						
						
						
					 
					
						2025-07-24 12:00:52 +08:00 
						 
				 
			
				
					
						
							
							
								chenjian 
							
						 
					 
					
						
						
							
						
						85a78d695d 
					 
					
						
						
							
							[Feature] Support block scheduler v1 for FD ( #2928 )  
						
						... 
						
						
						
						* Support FD block scheduler v1
* Support FD block scheduler v1
* Support FD block scheduler v1
* Fix according to copilot review
* Fix according to review
* Remove is_dummy
* Fix bug when real_bsz=1
* Fix infer first token cost time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
						
						
					 
					
						2025-07-23 20:31:31 +08:00