bukejiyu · 77514e3e1e · 2025-08-23 13:13:41 +08:00
[V1 Loader] support weight_only (#3413)
* support wint4/wint8
* delete smoe case
* update ci
* print log

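The wint4/wint8 support above is weight-only quantization: weights are stored as low-bit integers with per-channel scales and dequantized at matmul time. A minimal NumPy sketch of the wint8 idea (illustrative only; names and shapes are made up, not FastDeploy's actual kernels):

    # Illustrative sketch of per-channel int8 weight-only quantization ("wint8").
    import numpy as np

    def quantize_weight_int8(w):
        """Quantize a [in_features, out_features] weight, one scale per column."""
        scale = np.abs(w).max(axis=0) / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale.astype(np.float32)

    def matmul_wint8(x, q, scale):
        """Activations stay in float; weights are dequantized on the fly."""
        return (x @ q.astype(np.float32)) * scale

    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 32)).astype(np.float32)
    x = rng.standard_normal((4, 64)).astype(np.float32)
    q, s = quantize_weight_int8(w)
    print(np.max(np.abs(x @ w - matmul_wint8(x, q, s))))  # small quantization error
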
						 
				 
			
				
					
						
							
							
YuanRisheng · 85fbf5455a · 2025-08-22 11:16:57 +08:00
[V1 Loader]Ernie VL support loader v1 (#3494)
* ernie vl support new loader
* add unittest
* fix test

Zero Rains · 30b3f2dc07 · 2025-08-20 05:52:54 -07:00
[BugFix][V1 Loader] fix the bug in creat weight for block_wise_fp8 (#3486)

Zero Rains · fef447e350 · 2025-08-19 14:15:53 +08:00
[V1 Loader] Support MOE parameters create and load for DeepGemm and marlin backend (#3447)
* support deepgemm backend
* support marlin backend
* remove print
* fix process_prequanted_weights

YuanRisheng · 09c979f3dd · 2025-08-14 20:25:28 +08:00
[V1 Loader] Support Ernie text(moe and dense) (#3110)
* new loader support 0.3B
* fix weight
* support parallel load
* support parallel load
* fix slice
* support moe
* delete code
* perfect code
* perfect code

Kane2011 · b4fef2cf29 · 2025-08-13 11:11:54 +08:00
[MetaxGPU] Support FastDeploy on metax gpu (#3241)
* [MetaxGPU] Support FastDeploy on metax gpu
* Update metax_worker.py
  1. change worker log;
  2. remove custom allreduce, adapt it later;
  3. remove cuda graph;
* Update __init__.py
  1. remove metax's key work comment
* Update __init__.py
  1. remove metax's key word comment;
  2. add fused_moe_kernel_paddle import
---------
Co-authored-by: yongqiangma <xing.wo@163.com>

Zero Rains · 42af0b4b64 · 2025-08-11 13:39:28 +08:00
[V1 Loader] Support DeepSeekV3(bf16) (#3294)
* Support new loader for DeepSeekV3(bf16)
* update paddle version
* remove useless attr

gaoziyuan · a799d14df1 · 2025-08-08 17:30:37 +08:00
[Bugfix] Fix model accuracy in some ops (#3231)
* fix noaux_tc op
* fix
* update
* fix qk norm
* fix linear for prequant loader
* test
* fix
* fix
* rm some print
* fix noaux_tc op
* test
* Fix the confused enable_early_stop when only set early_stop_config (#3214)
* fix the confused early_stop_config when only set early_stop_config
* pre-commit
* write a general method
* Add ci case for min token and max token (#3229)
Co-authored-by: xujing43 <xujing43@baidu.com>
* add some evil cases (#3240)
* add repitation early stop cases
* add repitation early stop cases
* add bad cases
* add bad cases
* add evil cases
* qwen3_moe (#3084)
* [Feature] support seed parameter (#3161)
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to defualt
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
* 【Fix Bug】 fix the fa3 centralized-deployment bug (#3235)
* fix fa3 centralized-deployment bug
* add qknorm parameter
* fix qk norm
* fix
* update
* fix linear for prequant loader
* fix
* fix
* rm some print
* fix
* fix moe init weight&scale
* fix moe init weight&scale
---------
Co-authored-by: bukejiyu <395822456@qq.com>
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: Zero Rains <linjunlu@zerorains.top>
Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com>
Co-authored-by: xujing43 <xujing43@baidu.com>
Co-authored-by: Divano <dddivano@outlook.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com>
Co-authored-by: qingqing01 <dangqingqing@baidu.com>

Zero Rains · ce1f353c70 · 2025-08-08 15:55:47 +08:00
Move create_parameters to __init__ in FuseMOE for CultassBackend and TritonBackend (#3148)
* w4a8 bug
* fix w4a8 bug
* remove code
* modify the triton backend
* fix ep
* fix the bug with tensor_wise_fp8 in triton backend
* fix the RL
* fix bug by merge
* fix the bug in w4a8
* fix the tensor_wise_fp8 bug
* fix RL

bukejiyu · 20839abccf · 2025-08-06 14:45:27 +08:00
qwen3_moe (#3084)

Yuan Xiaolan · af543b7f0f · 2025-08-05 16:43:07 +08:00
revise get_moe_scores (#3164)

RichardWooSJTU · f5c64a074c · 2025-08-05 15:40:11 +08:00
[EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization (#3182)
* Add support for mixed-ep across multi nodes
* code refine
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>

Yuan Xiaolan · 1f8289e106 · 2025-08-04 20:06:15 +08:00
fix expertwise_scale (#3181)

Yuan Xiaolan · 5f56d289a7 · 2025-07-31 19:58:05 +08:00
fix is_permuted (#3098)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

Yuan Xiaolan · 35935da9e5 · 2025-07-30 14:34:12 +08:00
support W4A8 EPLB (#3075)

Yuan Xiaolan · 3214fb5393 · 2025-07-29 21:54:37 +08:00
support model loading for w4a8 offline quant (#3064)
Support loading offline-quantized weights for W4A8 EP.

Longzhi Wang · be0a0f2bb2 · 2025-07-29 17:17:24 +08:00
fix arguement error in ep when pd (#3060)

YuanRisheng · 502ee92a0a · 2025-07-29 17:07:44 +08:00
Unify server-side and model-side Config (Part3) (#3047)
* merge model config
* fix arch
* fix rl

Longzhi Wang · 907d561523 · 2025-07-29 15:06:49 +08:00
fix ep when paddle version mismatch (#3056)

Yuan Xiaolan · b1d787a272 · 2025-07-28 18:17:59 +08:00
[fix] w4a8 model loading and hadamard config (#3013)

AIbin · ec52d39e68 · 2025-07-28 16:31:56 +08:00
【Inference Optimize】Update wint2 weight n-dim reorder (#3042)

Longzhi Wang · 247010d298 · 2025-07-28 11:03:29 +08:00
fix arguement error (#3030)

Longzhi Wang · 0700c90caa · 2025-07-25 15:29:30 +08:00
[Feat] support mixed ep (#2969)
* Support mixed ep
* fix comment
* fix comment
* update mixep
* fix conflict
* fix typo
* update
* fix typo
* fix code style
* fix conflict

xiaoxiaohehe001 · 2970b00dfa · 2025-07-24 20:22:45 +08:00
[Feature] Support_eplb (#2997)
* [Feature] support_eplb
* [Feature] support_eplb
* [Fix] fix mm ep

Zero Rains · 0fb37ab7e4 · 2025-07-24 01:43:31 -07:00
update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12
* polish code

bukejiyu · bfeb664ab8 · 2025-07-24 00:16:42 +08:00
update (#2978)

chen · ad202272ed · 2025-07-23 13:02:50 +08:00
【Infer】Improve the performance block_wise_fp8 of triton_moe_backend (#2942)

K11OntheBoat · 93bb68aa71 · 2025-07-22 18:11:15 +08:00
[Feature] Marlin MoE backend supports DeepseekV3 (#2962)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>

lifulll · 2c6a9e887e · 2025-07-22 14:09:59 +08:00
native top_p_sampling (#2901)

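Top-p (nucleus) sampling keeps the smallest set of tokens whose cumulative probability reaches p and samples from that renormalized set. A generic NumPy sketch of the operation, illustrative only rather than the FastDeploy implementation:

    # Generic top-p (nucleus) sampling in NumPy.
    import numpy as np

    def top_p_sample(logits, p=0.9, rng=None):
        rng = rng or np.random.default_rng()
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        order = np.argsort(-probs)              # tokens from most to least likely
        cum = np.cumsum(probs[order])
        cutoff = np.searchsorted(cum, p) + 1    # smallest prefix with mass >= p
        keep = order[:cutoff]
        kept = probs[keep] / probs[keep].sum()  # renormalize the nucleus
        return int(rng.choice(keep, p=kept))

    print(top_p_sample(np.array([2.0, 1.0, 0.5, -1.0]), p=0.8))
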
						 
				 
			
				
					
						
							
							
zhink · 0262ef7eb3 · 2025-07-21 22:52:03 +08:00
custom all reduce support cuda graph (#2938)
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication

周周周 · ff4569f135 · 2025-07-21 22:44:57 +08:00
remove some code in ep.py (#2947)

Zero Rains · 25698d56d1 · 2025-07-19 23:19:27 +08:00
polish code with new pre-commit rule (#2923)

Yuanle Liu · dda4a9f848 · 2025-07-16 00:33:10 -07:00
rl update (#2861)

freeliuzc · 2d1184aefe · 2025-07-16 11:08:18 +08:00
[Fix] fix expert_parallel bug in decoder stage (#2848)

Yuanle Liu · 61b3997b85 · 2025-07-15 07:31:42 -07:00
refactor rl get_name_mappings_to_training (#2847)
* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix

AIbin · fd91da7b41 · 2025-07-15 14:35:40 +08:00
【Inference Optimize】Support wint2 triton kernel about triton_utils_v2 (#2842)
* update supported_models doc

YuanRisheng · 4c7b8bc458 · 2025-07-14 19:50:05 +08:00
Simplify the Config code (#2770)
* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe

chen · 888780ffde · 2025-07-09 19:22:47 +08:00
[Feature] block_wise_fp8 support triton_moe_backend (#2767)

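Block-wise FP8 scales weights per tile rather than per tensor, so each tile fits the FP8 E4M3 range (maximum normal value 448). A rough NumPy illustration assuming a 128x128 tile size; this is a sketch of the scheme, not the Triton MoE kernel itself:

    # Block-wise quantization with one scale per 128x128 tile, sized for FP8 E4M3.
    import numpy as np

    FP8_E4M3_MAX = 448.0
    BLOCK = 128

    def quantize_block_wise(w, block=BLOCK):
        rows = (w.shape[0] + block - 1) // block
        cols = (w.shape[1] + block - 1) // block
        scales = np.empty((rows, cols), dtype=np.float32)
        q = np.empty_like(w)                   # stands in for an FP8 tensor
        for i in range(rows):
            for j in range(cols):
                tile = w[i * block:(i + 1) * block, j * block:(j + 1) * block]
                s = float(np.abs(tile).max()) / FP8_E4M3_MAX
                s = s if s > 0 else 1.0
                scales[i, j] = s
                q[i * block:(i + 1) * block, j * block:(j + 1) * block] = tile / s
        return q, scales                       # dequantize tile (i, j) as q_tile * scales[i, j]

    w = np.random.default_rng(0).standard_normal((256, 384)).astype(np.float32)
    q, s = quantize_block_wise(w)
    print(q.shape, s.shape)                    # (256, 384) (2, 3)
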
						 
				 
			
				
					
						
							
							
lifulll · 1f28bdf994 · 2025-07-09 18:56:27 +08:00
dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>

yulangz · be21ef5047 · 2025-07-09 15:57:51 +08:00
[XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B (#2765)
* fix no quant xpu moe
* change dir of xpu moe weight only

RichardWooSJTU · fee544e808 · 2025-07-09 14:03:05 +08:00
fix ep prefill (#2762)

EnflameGCU · d0f4d6ba3a · 2025-07-08 13:00:52 +08:00
[GCU] Support gcu platform (#2702)
baseline: e7fa57ebae
Co-authored-by: yongqiangma <xing.wo@163.com>

gaoziyuan · 26d5d737dd · 2025-07-08 12:03:04 +08:00
【Fearture】support qwen2 some func (#2740)
* add rl qwen model support
* fix
* fix

ming1753 · 1eb8ea7328 · 2025-07-08 11:24:52 +08:00
[Bug fix] fix complie bug when sm < 89 (#2738)

ming1753 · ef6649a577 · 2025-07-07 20:06:28 +08:00
[Optimize] Optimize tensorwise fp8 performance (#2729)
* [Optimize] Optimize tensorwise fp8 performance

liddk1121 · 1b54a2831e · 2025-07-07 16:53:14 +08:00
Adapt for iluvatar gpu (#2684)

Jiang-Jia-Jun · 05c670e593 · 2025-07-03 15:43:53 +08:00
[Sync] Update to latest code (#2679)
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>

AIbin · a197dcd729 · 2025-07-01 18:29:11 +08:00
【Inference Optimize】Support ERNIE-4_5-300B-A47B-2BITS-Paddle model TP2/TP4 Inference (#2666)
* Support TP2&TP4 Wint
* Support TP2&TP4 Wint2 Inference

Jiang-Jia-Jun · 92c2cfa2e7 · 2025-06-29 23:29:37 +00:00
Sync v2.0 version of code to github repo

jiangjiajun · 684703fd72 · 2025-06-09 19:20:15 +08:00
[LLM] First commit the llm deployment code