bukejiyu 
							
						 
					 
					
						
						
							
						
						29ed617f0f 
					 
					
						
						
							
							[v1 loader]qwen Offline fp8 ( #4036 )  
						
						... 
						
						
						
						* support offline fp8
* update ut
* update ut
* update ut
* fix
* update
* update 
						
						
					 
					
						2025-09-15 13:44:11 +08:00 
						 
				 
			
				
					
						
							
							
								AIbin 
							
						 
					 
					
						
						
							
						
						41aee08982 
					 
					
						
						
							
							【Inference Optimize】Update MergedReplicatedLinear for DSK qkv_a_proj_with_mqa. ( #3673 )  
						
						... 
						
						
						
						* support MergedReplicatedLinear
* update MergedReplicatedLinear to support DSK_wint4 V1_load
* update model name
* update linear class
* fix
* fix v0 moe_bias load
---------
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com > 
						
						
					 
					
						2025-09-04 21:16:05 -07:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						b6a4115369 
					 
					
						
						
							
							[v1loader]Reduce EB300B model loading time ( #3700 )  
						
						... 
						
						
						
						* speed up eb45
* update 
						
						
					 
					
						2025-09-02 19:13:57 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						455205f991 
					 
					
						
						
							
							[Features] support hugging face qwen3 moe ( #3649 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* split ut
* qwen3-30B-A3B
* fix
* add test
* add test_torch_model.py
* fix test_torch_model.py
* delete print
* fix moe
* delete init.py
* fix
* fix
---------
Co-authored-by: bukejiyu <395822456@qq.com >
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com > 
						
						
					 
					
						2025-08-30 15:26:05 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						0b51b9c35b 
					 
					
						
						
							
							fix qwen3 235B tp 8 ( #3697 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-08-28 23:46:25 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						4957908275 
					 
					
						
						
							
							add input_processor plugin ( #3657 )  
						
						... 
						
						
						
						* add input_processor plugin
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update 
						
						
					 
					
						2025-08-28 22:53:57 +08:00 
						 
				 
			
				
					
						
							
							
								lzy 
							
						 
					 
					
						
						
							
						
						d339df2e90 
					 
					
						
						
							
							Supports DP+TP+EP hybrid parallel deployment strategy ( #3489 )  
						
						... 
						
						
						
						* Support DP+TP+EP hybrid parallel deployment strategy
* Support DP+TP+EP hybrid parallel deployment strategy
* fix conflict
* add moe_tp_ep function split_allgather_out
* del tp_group in moe_cutlass_backend
* for ci
* fix parallel_config for ci
* del log 
						
						
					 
					
						2025-08-26 00:04:01 -07:00 
						 
				 
			
				
					
						
							
							
								AIbin 
							
						 
					 
					
						
						
							
						
						0a0d2959b9 
					 
					
						
						
							
							qkv_a_proj horizontal fusion ( #3591 )  
						
						... 
						
						
						
						Support DSK qkv_a_proj horizontal fusion under V0 Loder 
						
						
					 
					
						2025-08-26 14:25:57 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						c43a4bec00 
					 
					
						
						
							
							[Features] support hugging face qwen3 dense and qwen2 model  ( #3574 )  
						
						... 
						
						
						
						* support qwen2 and qwen3 hugging face
* fix moe
* defualt_v1 loader
* hugging_face_format deprecated
* modify hugging_face_foramt to model_format
* model_format auto
* fix environemt
* fix bug
* fix qwen3-0.6 bug
* model_format is str
* fix 
						
						
					 
					
						2025-08-26 10:54:53 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						bdbac0aa3d 
					 
					
						
						
							
							support qwen2 weight only ( #3571 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / publish_pre_check (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / print_publish_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Base Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Accuracy Tests (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-08-24 11:14:34 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						77514e3e1e 
					 
					
						
						
							
							[V1 Loader] support weight_only ( #3413 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / publish_pre_check (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / print_publish_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Base Tests (push) Has been cancelled 
				
			 
		
			
				
	Publish Job / Run Accuracy Tests (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* support wint4/wint8
* delete smoe case
* update ci
* print log 
						
						
					 
					
						2025-08-23 13:13:41 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						79f0dbbb55 
					 
					
						
						
							
							[V1 Loader] Support qwen2(bf16) ( #3502 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	CE Compile Job / ce_job_pre_check (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / FD-Clone-Linux (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / Show Code Archive Output (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8090 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / BUILD_SM8689 (push) Has been cancelled 
				
			 
		
			
				
	CE Compile Job / CE_UPLOAD (push) Has been cancelled 
				
			 
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* support qwen2(bf16)
* merge bias_loader and weight_loader 
						
						
					 
					
						2025-08-23 01:08:23 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						09c979f3dd 
					 
					
						
						
							
							[V1 Loader] Support Ernie text(moe and dense) ( #3110 )  
						
						... 
						
						
						
						* new loader support 0.3B
* fix weight
* support parallel load
* support parallel load
* fix slice
* support moe
* delete code
* perfect code
* perfect code 
						
						
					 
					
						2025-08-14 20:25:28 +08:00 
						 
				 
			
				
					
						
							
							
								Kane2011 
							
						 
					 
					
						
						
							
						
						b4fef2cf29 
					 
					
						
						
							
							[MetaxGPU] Support FastDeploy on metax gpu  ( #3241 )  
						
						... 
						
						
						
						* [MetaxGPU] Support FastDeploy on metax gpu
* Update metax_worker.py
1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;
* Update __init__.py
1. remove metax's key work comment
* Update __init__.py
1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import
---------
Co-authored-by: yongqiangma <xing.wo@163.com > 
						
						
					 
					
						2025-08-13 11:11:54 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						42af0b4b64 
					 
					
						
						
							
							[V1 Loader] Support DeepSeekV3(bf16) ( #3294 )  
						
						... 
						
						
						
						* Support new loader for DeepSeekV3(bf16)
* update paddle version
* remove useless attr 
						
						
					 
					
						2025-08-11 13:39:28 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						b76b17fc1b 
					 
					
						
						
							
							qwen3 0.3B fix ( #3255 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-08-08 11:35:40 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						37569cca86 
					 
					
						
						
							
							[feat]add fast_weights_iterator ( #3258 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* add fast_weights_iterator
* update
* update 
						
						
					 
					
						2025-08-07 22:36:46 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						9408e667a5 
					 
					
						
						
							
							[bugfix]fix blockwisefp8 and all_reduce ( #3243 )  
						
						... 
						
						
						
						* fix
* update
* fix linear for prequant loader 
						
						
					 
					
						2025-08-06 23:54:33 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						20839abccf 
					 
					
						
						
							
							qwen3_moe ( #3084 )  
						
						
						
						
					 
					
						2025-08-06 14:45:27 +08:00 
						 
				 
			
				
					
						
							
							
								bukejiyu 
							
						 
					 
					
						
						
							
						
						db698bda01 
					 
					
						
						
							
							qwen loader ( #3057 )  
						
						
						
						
					 
					
						2025-07-30 19:09:38 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						832d25334a 
					 
					
						
						
							
							[Code Simplification] fix init_distributed_environment() ( #2982 )  
						
						
						
						
					 
					
						2025-07-24 11:43:28 +08:00 
						 
				 
			
				
					
						
							
							
								lifulll 
							
						 
					 
					
						
						
							
						
						2c6a9e887e 
					 
					
						
						
							
							native top_p_sampling ( #2901 )  
						
						
						
						
					 
					
						2025-07-22 14:09:59 +08:00 
						 
				 
			
				
					
						
							
							
								zhink 
							
						 
					 
					
						
						
							
						
						0262ef7eb3 
					 
					
						
						
							
							custom all reduce support cuda graph ( #2938 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication 
						
						
					 
					
						2025-07-21 22:52:03 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						25698d56d1 
					 
					
						
						
							
							polish code with new pre-commit rule ( #2923 )  
						
						
						
						
					 
					
						2025-07-19 23:19:27 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						61b3997b85 
					 
					
						
						
							
							refactor rl get_name_mappings_to_training ( #2847 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix 
						
						
					 
					
						2025-07-15 07:31:42 -07:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						4c7b8bc458 
					 
					
						
						
							
							Simplify the Config code ( #2770 )  
						
						... 
						
						
						
						* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe 
						
						
					 
					
						2025-07-14 19:50:05 +08:00 
						 
				 
			
				
					
						
							
							
								zhink 
							
						 
					 
					
						
						
							
						
						c08561c13a 
					 
					
						
						
							
							[Feature] support tensor-parallel-size>num_key_value_heads for qwen3 ( #2799 )  
						
						
						
						
					 
					
						2025-07-11 15:09:43 +08:00 
						 
				 
			
				
					
						
							
							
								EnflameGCU 
							
						 
					 
					
						
						
							
						
						d0f4d6ba3a 
					 
					
						
						
							
							[GCU] Support gcu platform ( #2702 )  
						
						... 
						
						
						
						baseline: e7fa57ebaexing.wo@163.com > 
						
						
					 
					
						2025-07-08 13:00:52 +08:00 
						 
				 
			
				
					
						
							
							
								gaoziyuan 
							
						 
					 
					
						
						
							
						
						26d5d737dd 
					 
					
						
						
							
							【Fearture】support qwen2 some func ( #2740 )  
						
						... 
						
						
						
						* add rl qwen model support
* fix
* fix 
						
						
					 
					
						2025-07-08 12:03:04 +08:00 
						 
				 
			
				
					
						
							
							
								liddk1121 
							
						 
					 
					
						
						
							
						
						1b54a2831e 
					 
					
						
						
							
							Adapt for iluvatar gpu ( #2684 )  
						
						
						
						
					 
					
						2025-07-07 16:53:14 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						240bdac2a4 
					 
					
						
						
							
							[feat] support fa3 backend for pd disaggregated ( #2695 )  
						
						... 
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
	Deploy GitHub Pages / deploy (push) Has been cancelled 
				
			 
		
		
	 
 
	 
						
						* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* delete use_fast_ffn 
						
						
					 
					
						2025-07-03 22:33:27 +08:00 
						 
				 
			
				
					
						
							
							
								Jiang-Jia-Jun 
							
						 
					 
					
						
						
							
						
						05c670e593 
					 
					
						
						
							
							[Sync] Update to latest code ( #2679 )  
						
						... 
						
						
						
						* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com > 
						
						
					 
					
						2025-07-03 15:43:53 +08:00 
						 
				 
			
				
					
						
							
							
								Jiang-Jia-Jun 
							
						 
					 
					
						
						
							
						
						92c2cfa2e7 
					 
					
						
						
							
							Sync v2.0 version of code to github repo  
						
						
						
						
					 
					
						2025-06-29 23:29:37 +00:00 
						 
				 
			
				
					
						
							
							
								jiangjiajun 
							
						 
					 
					
						
						
							
						
						684703fd72 
					 
					
						
						
							
							[LLM] First commit the llm deployment code  
						
						
						
						
					 
					
						2025-06-09 19:20:15 +08:00