chen 
							
						 
					 
					
						
						
							
						
						ce9c0917c5 
					 
					
						
						
							
							[Precision] Support lm_head layer running in float32 ( #3597 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* support lm_head fp32 bf16 fp16
* support lm_head fp32 bf16 fp16
* add doc and check code
* lm_head_fp32 specify lm_head as fp32
* code check
* check doc 
						
						
					 
					
						2025-08-27 11:34:53 +08:00 
						 
				 
			
				
					
						
							
							
								xiaoxiaohehe001 
							
						 
					 
					
						
						
							
						
						ad319a87cc 
					 
					
						
						
							
							support fa3 rope3d ( #3622 )  
						
						
						
						
					 
					
						2025-08-27 11:31:29 +08:00 
						 
				 
			
				
					
						
							
							
								yangjianfengo1 
							
						 
					 
					
						
						
							
						
						646a0c2fd8 
					 
					
						
						
							
							[Feature] block sparse attention ( #3209 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* 支持稀疏attn
* fix bug
* code style
* fix moba attn get kv shape
* 修复a100编译
* codestyle
* code style
* code style
* code style
* fix conflict
* 增加单侧
* code style
* 增加eblite 加载时间
* fix bug
* for ci
* for ci
* for ci
* for ci
* 支持mlp block size 128
* 增加小算子单测
* fix 单测 mlp
* 将环境变量加入到config里面
* fix rollout config 
						
						
					 
					
						2025-08-26 07:16:04 -07:00 
						 
				 
			
				
					
						
							
							
								gaoziyuan 
							
						 
					 
					
						
						
							
						
						82e64b13e1 
					 
					
						
						
							
							[NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop ( #3598 )  
						
						... 
						
						
						
						* [Feature] update ep
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix queue ports idx
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* Update engine.py
* fix ci
* fix some bug in mixed ep
* add server fix and op fix
* rm some log
* fix code style
* ltd fix
* fix
* fix
* fix some bug
* fix bug
* fix bug
* fix style
* Update config.py
* Update splitwise_connector.py
* Update cache_messager.py
* Update __init__.py
* merge and fix
* Update engine.py
* Update common_engine.py
* Update run_ci_xpu.sh
* Update ernie_processor.py
* Update ernie_processor.py
---------
Co-authored-by: ltd0924 <ltd0924@sina.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com > 
						
						
					 
					
						2025-08-26 19:59:02 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						cbce94a00e 
					 
					
						
						
							
							rename ernie_xxx to ernie4_5_xxx ( #3621 )  
						
						... 
						
						
						
						* rename ernie_xxx to ernie4_5_xxx
* ci fix 
						
						
					 
					
						2025-08-26 19:29:27 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						642480f5f6 
					 
					
						
						
							
							[CI] Standard unittest ( #3606 )  
						
						... 
						
						
						
						* standard unittest
* fix bugs
* fix script 
						
						
					 
					
						2025-08-26 19:03:11 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						3200a80de3 
					 
					
						
						
							
							[v1 loader]support fp8 ( #3593 )  
						
						... 
						
						
						
						* support fp8
* update ci 
						
						
					 
					
						2025-08-26 02:42:46 -07:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						00898603c8 
					 
					
						
						
							
							[CUDAGraph]Add debug func ( #3616 )  
						
						... 
						
						
						
						* add print dot files
* refine code 
						
						
					 
					
						2025-08-26 16:43:48 +08:00 
						 
				 
			
				
					
						
							
							
								xiaoxiaohehe001 
							
						 
					 
					
						
						
							
						
						9afa236e39 
					 
					
						
						
							
							[NewFeatures] support eplb ( #3547 )  
						
						... 
						
						
						
						* [NewFeatures] support eplb
* fix eplb 
						
						
					 
					
						2025-08-26 16:19:30 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						56e2d7e668 
					 
					
						
						
							
							adaptive rms_norm's dtype ( #3617 )  
						
						... 
						
						
						
						* adaptive rms_norm's dtype
* adaptive rms_norm's dtype
* add approve coverage
---------
Co-authored-by: liuyuanle <liuyuanle@baidu.com > 
						
						
					 
					
						2025-08-26 15:29:15 +08:00 
						 
				 
			
				
					
						
							
							
								lzy 
							
						 
					 
					
						
						
							
						
						d339df2e90 
					 
					
						
						
							
							Supports DP+TP+EP hybrid parallel deployment strategy ( #3489 )  
						
						... 
						
						
						
						* Support DP+TP+EP hybrid parallel deployment strategy
* Support DP+TP+EP hybrid parallel deployment strategy
* fix conflict
* add moe_tp_ep function split_allgather_out
* del tp_group in moe_cutlass_backend
* for ci
* fix parallel_config for ci
* del log 
						
						
					 
					
						2025-08-26 00:04:01 -07:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						52eda7fdb3 
					 
					
						
						
							
							[Feature][MTP]support new speculative decoding method named hybrid mtp with ngram  ( #3610 )  
						
						
						
						
					 
					
						2025-08-26 14:29:22 +08:00 
						 
				 
			
				
					
						
							
							
								AIbin 
							
						 
					 
					
						
						
							
						
						0a0d2959b9 
					 
					
						
						
							
							qkv_a_proj horizontal fusion ( #3591 )  
						
						... 
						
						
						
						Support DSK qkv_a_proj horizontal fusion under V0 Loder 
						
						
					 
					
						2025-08-26 14:25:57 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						c43a4bec00 
					 
					
						
						
							
							[Features] support hugging face qwen3 dense and qwen2 model  ( #3574 )  
						
						... 
						
						
						
						* support qwen2 and qwen3 hugging face
* fix moe
* defualt_v1 loader
* hugging_face_format deprecated
* modify hugging_face_foramt to model_format
* model_format auto
* fix environemt
* fix bug
* fix qwen3-0.6 bug
* model_format is str
* fix 
						
						
					 
					
						2025-08-26 10:54:53 +08:00 
						 
				 
			
				
					
						
							
							
								RAM 
							
						 
					 
					
						
						
							
						
						2fa173e327 
					 
					
						
						
							
							[Executor] CUDAGraph support RL training ( #3265 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / publish_pre_check (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / print_publish_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Base Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Accuracy Tests (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* add clear graph opt backend
* cuda graph support rl
* add branch
* 1.fix dynamic_weight_manager bug 2.add clear api for CasualLM
* open test case
* fix typo
* update mkdocs.yaml
* [Docs]Update mkdocs.yml
* update test case
* use unittest in graph test case 
						
						
					 
					
						2025-08-25 20:59:30 +08:00 
						 
				 
			
				
					
						
							
							
								Kane2011 
							
						 
					 
					
						
						
							
						
						2ae7ab28d2 
					 
					
						
						
							
							[MetaxGPU] adapt to the latest fastdeploy on metax gpu ( #3492 )  
						
						
						
						
					 
					
						2025-08-25 17:44:20 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						9cab3f47ff 
					 
					
						
						
							
							[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing ( #3552 )  
						
						... 
						
						
						
						* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
* delete some code
* code check
* code check and add doc
* fix tokenizer.decoder(-1), return 'Invalid Token'
* add ci for temp_scaled and top_p logprobs
* check test
* check seq len time shape
* logprob clip inf
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com > 
						
						
					 
					
						2025-08-25 14:11:49 +08:00 
						 
				 
			
				
					
						
							
							
								Yuan Xiaolan 
							
						 
					 
					
						
						
							
						
						9205c88da1 
					 
					
						
						
							
							support w4afp8 EP inference ( #3044 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-08-25 11:27:45 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						bdbac0aa3d 
					 
					
						
						
							
							support qwen2 weight only ( #3571 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / publish_pre_check (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / print_publish_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Base Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Accuracy Tests (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-08-24 11:14:34 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						77514e3e1e 
					 
					
						
						
							
							[V1 Loader] support weight_only ( #3413 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / publish_pre_check (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / print_publish_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Base Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Accuracy Tests (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* support wint4/wint8
* delete smoe case
* update ci
* print log 
						
						
					 
					
						2025-08-23 13:13:41 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						79f0dbbb55 
					 
					
						
						
							
							[V1 Loader] Support qwen2(bf16) ( #3502 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* support qwen2(bf16)
* merge bias_loader and weight_loader 
						
						
					 
					
						2025-08-23 01:08:23 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						85fbf5455a 
					 
					
						
						
							
							[V1 Loader]Ernie VL support loader v1 ( #3494 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* ernie vl support new loader
* add unittest
* fix test 
						
						
					 
					
						2025-08-22 11:16:57 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						c389a4013c 
					 
					
						
						
							
							Unify server-side and model-side Config(Part-5) ( #3497 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / publish_pre_check (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / print_publish_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Base Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Accuracy Tests (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* move config
* fix xpu
* fix
* fix vl
* fix vl
* fix unitest
* fix args
* add unitest
* fix test 
						
						
					 
					
						2025-08-21 19:00:21 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						30b3f2dc07 
					 
					
						
						
							
							[BugFix][V1 Loader] fix the bug in creat weight for block_wise_fp8 ( #3486 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-08-20 05:52:54 -07:00 
						 
				 
			
				
					
						
							
							
								Ryan 
							
						 
					 
					
						
						
							
						
						bcdfc1d6b9 
					 
					
						
						
							
							Add custom op declaration for all_reduce ( #3473 )  
						
						... 
						
						
						
						* add custom op declaration
* roll back try except 
						
						
					 
					
						2025-08-20 20:29:58 +08:00 
						 
				 
			
				
					
						
							
							
								kevin 
							
						 
					 
					
						
						
							
						
						67298cf4c0 
					 
					
						
						
							
							add error traceback info ( #3419 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* add error traceback info
* update error msg
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com > 
						
						
					 
					
						2025-08-19 19:32:04 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						fef447e350 
					 
					
						
						
							
							[V1 Loader] Support MOE parameters create and load for DeepGemm and marlin backend ( #3447 )  
						
						... 
						
						
						
						* support deepgemm backend
* support marlin backend
* remove print
* fix process_prequanted_weights 
						
						
					 
					
						2025-08-19 14:15:53 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						8b12c80f90 
					 
					
						
						
							
							[FixBug] compute early stopping with real batch size ( #3418 )  
						
						... 
						
						
						
						* [FixBug] compute early stopping with real batch size
* update
* fix test_sampler 
						
						
					 
					
						2025-08-18 22:09:21 -07:00 
						 
				 
			
				
					
						
							
							
								AIbin 
							
						 
					 
					
						
						
							
						
						beec24fd89 
					 
					
						
						
							
							【Inference Optimize】DeepSeek-v3 model inference performance optimization ( #3455 )  
						
						... 
						
						
						
						* DSK_OPT_01
* update FA3 
						
						
					 
					
						2025-08-19 10:42:42 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						32b39620bc 
					 
					
						
						
							
							[Code Simplification] remove cum_offsets ( #3410 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / publish_pre_check (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / print_publish_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Base Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Accuracy Tests (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-08-18 20:21:25 +08:00 
						 
				 
			
				
					
						
							
							
								Jundong Liu 
							
						 
					 
					
						
						
							
						
						70ee910cd5 
					 
					
						
						
							
							[Excutor] Change cudagraph hashkey from batch size to num_tokens  ( #3454 )  
						
						
						
						
					 
					
						2025-08-18 16:16:48 +08:00 
						 
				 
			
				
					
						
							
							
								Jundong Liu 
							
						 
					 
					
						
						
							
						
						ea4a3b479c 
					 
					
						
						
							
							[Excutor] Increase buffer size to prevent address corruption; add forward metadata debug tool ( #3404 )  
						
						... 
						
						
						
						* 修复buffer申请不够大,增加打印forwardmetadata的工具
* fix mistake
* Make CPU tensor in CPUPlace
* Add test about forward_meta_str and Add unitest_requirement
---------
Co-authored-by: RAM <gstian5555@outlook.com > 
						
						
					 
					
						2025-08-18 16:14:09 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						5585cf7aa5 
					 
					
						
						
							
							fix mtp_rej_topp input ( #3450 )  
						
						
						
						
					 
					
						2025-08-18 16:12:42 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						e88f5552db 
					 
					
						
						
							
							fix cpu __ini__.py ( #3448 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-08-17 12:38:54 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						f0f00a6025 
					 
					
						
						
							
							[OPs] Universal optimization and Fix early_stop cuda 700 ( #3375 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* delete nonzero
* delete setup_ops_base.py
* check if
* check gcp infer_seed.cpu()
* fix repetition_early_stopper_kernel cuda 700 
						
						
					 
					
						2025-08-14 22:40:44 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						09c979f3dd 
					 
					
						
						
							
							[V1 Loader] Support Ernie text(moe and dense) ( #3110 )  
						
						... 
						
						
						
						* new loader support 0.3B
* fix weight
* support parallel load
* support parallel load
* fix slice
* support moe
* delete code
* perfect code
* perfect code 
						
						
					 
					
						2025-08-14 20:25:28 +08:00 
						 
				 
			
				
					
						
							
							
								lzy 
							
						 
					 
					
						
						
							
						
						1e06b9fa6d 
					 
					
						
						
							
							make append_attn supports mask_offset ( #3138 )  
						
						... 
						
						
						
						* make append_attn supports mask_offset
* add unittest 
						
						
					 
					
						2025-08-14 03:40:55 -07:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						7b596d0877 
					 
					
						
						
							
							[BugFix] fix real_bsz in ep ( #3366 )  
						
						... 
						
						
						
						* Your commit message here
* fix ep
* delete cuda_graph 
						
						
					 
					
						2025-08-14 17:31:19 +08:00 
						 
				 
			
				
					
						
							
							
								Jiang-Jia-Jun 
							
						 
					 
					
						
						
							
						
						666ab65a51 
					 
					
						
						
							
							[Polish Code] Remove useless notes  
						
						
						
						
					 
					
						2025-08-14 14:04:52 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						be94bdd0b0 
					 
					
						
						
							
							[Loader V1] modify layername for DeepSeekV3 ( #3336 )  
						
						... 
						
						
						
						Co-authored-by: Yuanle Liu <yuanlehome@163.com >
Co-authored-by: YUNSHEN XIE <1084314248@qq.com > 
						
						
					 
					
						2025-08-13 15:47:06 +08:00 
						 
				 
			
				
					
						
							
							
								EnflameGCU 
							
						 
					 
					
						
						
							
						
						d1a92e3e17 
					 
					
						
						
							
							[GCU] Enable gcu CI ( #3190 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* [GCU] Update to the latest version
* [GCU] Enable CI 
						
						
					 
					
						2025-08-13 11:48:24 +08:00 
						 
				 
			
				
					
						
							
							
								yzwu 
							
						 
					 
					
						
						
							
						
						ce9180241e 
					 
					
						
						
							
							[Iluvatar GPU] Modify the names of some variables ( #3273 )  
						
						
						
						
					 
					
						2025-08-13 11:38:02 +08:00 
						 
				 
			
				
					
						
							
							
								Kane2011 
							
						 
					 
					
						
						
							
						
						b4fef2cf29 
					 
					
						
						
							
							[MetaxGPU] Support FastDeploy on metax gpu  ( #3241 )  
						
						... 
						
						
						
						* [MetaxGPU] Support FastDeploy on metax gpu
* Update metax_worker.py
1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;
* Update __init__.py
1. remove metax's key work comment
* Update __init__.py
1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import
---------
Co-authored-by: yongqiangma <xing.wo@163.com > 
						
						
					 
					
						2025-08-13 11:11:54 +08:00 
						 
				 
			
				
					
						
							
							
								RichardWooSJTU 
							
						 
					 
					
						
						
							
						
						283da92bfa 
					 
					
						
						
							
							fix ep lm head ( #3244 )  
						
						... 
						
						
						
						Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com > 
						
						
					 
					
						2025-08-12 15:38:28 +08:00 
						 
				 
			
				
					
						
							
							
								Jiang-Jia-Jun 
							
						 
					 
					
						
						
							
						
						c56c99837a 
					 
					
						
						
							
							Revert "[BugFix] num_seqs ( #3291 )" ( #3316 )  
						
						... 
						
						
						
						This reverts commit e0aeac58e1 
						
						
					 
					
						2025-08-11 16:16:51 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						9571c458f0 
					 
					
						
						
							
							enhance eos_tokens ( #3274 )  
						
						... 
						
						
						
						* enhance eos_tokens
* update
* update 
						
						
					 
					
						2025-08-11 14:47:52 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						42af0b4b64 
					 
					
						
						
							
							[V1 Loader] Support DeepSeekV3(bf16) ( #3294 )  
						
						... 
						
						
						
						* Support new loader for DeepSeekV3(bf16)
* update paddle version
* remove useless attr 
						
						
					 
					
						2025-08-11 13:39:28 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						e0aeac58e1 
					 
					
						
						
							
							[BugFix] num_seqs ( #3291 )  
						
						... 
						
						
						
						* fix num_seqs
* merge develop 
						
						
					 
					
						2025-08-11 13:38:55 +08:00 
						 
				 
			
				
					
						
							
							
								gaoziyuan 
							
						 
					 
					
						
						
							
						
						a799d14df1 
					 
					
						
						
							
							[Bugfix] Fix model accuracy in some ops ( #3231 )  
						
						... 
						
						
						
						* fix noaux_tc op
* fix
* update
* fix qk norm
* fix linear for prequant loader
* test
* fix
* fix
* rm some print
* fix noaux_tc op
* test
* Fix the confused enable_early_stop when only set early_stop_config (#3214 )
* fix the confused early_stop_config when only set early_stop_config
* pre-commit
* write a general method
* Add ci case for min token and max token (#3229 )
Co-authored-by: xujing43 <xujing43@baidu.com >
* add some evil cases (#3240 )
* add repitation early stop cases
* add repitation early stop cases
* add bad cases
* add bad cases
* add evil cases
* qwen3_moe (#3084 )
* [Feature] support seed parameter (#3161 )
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to defualt
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
* 【Fix Bug】 修复 fa3 支持集中式bug (#3235 )
* fix fa3 集中式bug
* 增加qknorm参数
* fix qk norm
* fix
* update
* fix linear for prequant loader
* fix
* fix
* rm some print
* fix
* fix moe init weight&scale
* fix moe init weight&scale
---------
Co-authored-by: bukejiyu <395822456@qq.com >
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
Co-authored-by: Zero Rains <linjunlu@zerorains.top >
Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com >
Co-authored-by: xujing43 <xujing43@baidu.com >
Co-authored-by: Divano <dddivano@outlook.com >
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com >
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com >
Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com >
Co-authored-by: qingqing01 <dangqingqing@baidu.com > 
						
						
					 
					
						2025-08-08 17:30:37 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						ce1f353c70 
					 
					
						
						
							
							Move create_parameters to __init__ in FuseMOE for CultassBackend and TritonBackend ( #3148 )  
						
						... 
						
						
						
						* w4a8 bug
* fix w4a8 bug
* remove code
* modify the triton backend
* fix ep
* fix the bug with tensor_wise_fp8 in triton backend
* fix the RL
* fix bug by merge
* fix the bug in w4a8
* fix the tensor_wise_fp8 bug
* fix RL 
						
						
					 
					
						2025-08-08 15:55:47 +08:00