lifulll 
							
						 
					 
					
						
						
							
						
						72094d4d82 
					 
					
						
						
							
							enable dcu ci ( #3402 )  
						
						
						
						
					 
					
						2025-08-29 10:23:08 +08:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						52eda7fdb3 
					 
					
						
						
							
							[Feature][MTP]support new speculative decoding method named hybrid mtp with ngram  ( #3610 )  
						
						
						
						
					 
					
						2025-08-26 14:29:22 +08:00 
						 
				 
			
				
					
						
							
							
								Kane2011 
							
						 
					 
					
						
						
							
						
						2ae7ab28d2 
					 
					
						
						
							
							[MetaxGPU] adapt to the latest fastdeploy on metax gpu ( #3492 )  
						
						
						
						
					 
					
						2025-08-25 17:44:20 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						32b39620bc 
					 
					
						
						
							
							[Code Simplification] remove cum_offsets ( #3410 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-08-18 20:21:25 +08:00 
						 
				 
			
				
					
						
							
							
								Kane2011 
							
						 
					 
					
						
						
							
						
						b4fef2cf29 
					 
					
						
						
							
							[MetaxGPU] Support FastDeploy on metax gpu  ( #3241 )  
						
						
						
						
						* [MetaxGPU] Support FastDeploy on metax gpu
* Update metax_worker.py
1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;
* Update __init__.py
1. remove metax's key word comment
* Update __init__.py
1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import
---------
Co-authored-by: yongqiangma <xing.wo@163.com>
						
						
					 
					
						2025-08-13 11:11:54 +08:00 
						 
				 
			
				
					
						
							
							
								Yuanle Liu 
							
						 
					 
					
						
						
							
						
						9571c458f0 
					 
					
						
						
							
							enhance eos_tokens ( #3274 )  
						
						
						
						
						* enhance eos_tokens
* update
* update 
						
						
					 
					
						2025-08-11 14:47:52 +08:00 
						 
				 
			
				
					
						
							
							
								yzwu 
							
						 
					 
					
						
						
							
						
						fbdd6b0663 
					 
					
						
						
							
							[Iluvatar GPU] Optimize attention and moe performance ( #3234 )
						
						
						
						
					 
					
						2025-08-08 10:51:24 +08:00 
						 
				 
			
				
					
						
							
							
								JYChen 
							
						 
					 
					
						
						
							
						
						dafe02a7b9 
					 
					
						
						
							
							[stop sequence] support stop sequence ( #3025 )  
						
						
						
						
						* stop seqs in multi-ends
* unittest for gpu stop op
* kernel tid==0 
						
						
					 
					
						2025-07-29 14:17:37 +08:00 
						 
				 
			
				
					
						
							
							
								YuanRisheng 
							
						 
					 
					
						
						
							
						
						bddf403576 
					 
					
						
						
							
							Unify server-side and model-side Config (Part2) ( #3035 )  
						
						
						
						
						* merge speculative and graph opt config
* add attr 
						
						
					 
					
						2025-07-28 15:31:48 +08:00 
						 
				 
			
				
					
						
							
							
								chenjian 
							
						 
					 
					
						
						
							
						
						85a78d695d 
					 
					
						
						
							
							[Feature] Support block scheduler v1 for FD ( #2928 )  
						
						
						
						
						* Support FD block scheduler v1
* Support FD block scheduler v1
* Support FD block scheduler v1
* Fix according to copilot review
* Fix according to review
* Remove is_dummy
* Fix bug when real_bsz=1
* Fix infer first token cost time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
						
						
					 
					
						2025-07-23 20:31:31 +08:00 
						 
				 
			
				
					
						
							
							
								lizexu123 
							
						 
					 
					
						
						
							
						
						9b22b8d2c3 
					 
					
						
						
							
							delete max-len ( #2959 )  
						
						
						
						
					 
					
						2025-07-23 15:11:39 +08:00 
						 
				 
			
				
					
						
							
							
								lifulll 
							
						 
					 
					
						
						
							
						
						2c6a9e887e 
					 
					
						
						
							
							native top_p_sampling ( #2901 )  
						
						
						
						
					 
					
						2025-07-22 14:09:59 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						25698d56d1 
					 
					
						
						
							
							polish code with new pre-commit rule ( #2923 )  
						
						
						
						
					 
					
						2025-07-19 23:19:27 +08:00 
						 
				 
			
				
					
						
							
							
								周周周 
							
						 
					 
					
						
						
							
						
						ddb10ac509 
					 
					
						
						
							
							[Inference, rename] remove padding_offsets from atten use batch_id_per_token ( #2880 )  
						
						
						
						
						* remove padding_offsets from atten 
						
						
					 
					
						2025-07-17 18:41:31 +08:00 
						 
				 
			
				
					
						
							
							
								Zero Rains 
							
						 
					 
					
						
						
							
						
						e7bcbbab52 
					 
					
						
						
							
							Merge vl execution path into normal execution path ( #2829 )  
						
						
						
						
						* merge vl model into gpu_model runner
Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6
* fix chinese
Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a
* fix the parse parameter
Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe
* fix the bug in online_inference
Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559
* polish code
Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c 
						
						
					 
					
						2025-07-15 22:20:03 +08:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						7cdd8d290d 
					 
					
						
						
							
							[MTP] optimize mtp infer speed ( #2840 )  
						
						
	
		
			
	 
	
	
		
	
	
		
			
				
				
			 
		
		
	 
 
	 
						
						
					 
					
						2025-07-14 19:50:22 +08:00 
						 
				 
			
				
					
						
							
							
								freeliuzc 
							
						 
					 
					
						
						
							
						
						7f64d408a9 
					 
					
						
						
							
							[MTP] support expert-parallel in mtp ( #2835 )
						
						
						
						
					 
					
						2025-07-14 14:28:50 +08:00 
						 
				 
			
				
					
						
							
							
								chen 
							
						 
					 
					
						
						
							
						
						d33105baeb 
					 
					
						
						
							
							[Feature] Online Chat API Support Return logprobs ( #2777 )  
						
						
						
						
						* online chat support logprobs
* check xpu
* check vl_gpu_model_runner and xpu_model_runner
* get_worker() check platform 
						
						
					 
					
						2025-07-10 16:33:40 +08:00 
						 
				 
			
				
					
						
							
							
								lifulll 
							
						 
					 
					
						
						
							
						
						1f28bdf994 
					 
					
						
						
							
							dcu adapter ernie45t ( #2756 )  
						
						
						
						
						Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
						
						
					 
					
						2025-07-09 18:56:27 +08:00 
						 
				 
			
				
					
						
							
							
								EnflameGCU 
							
						 
					 
					
						
						
							
						
						d0f4d6ba3a 
					 
					
						
						
							
							[GCU] Support gcu platform ( #2702 )  
						
						
						
						
						baseline: e7fa57ebae
Co-authored-by: yongqiangma <xing.wo@163.com>
						
						
					 
					
						2025-07-08 13:00:52 +08:00 
						 
				 
			
				
					
						
							
							
								liddk1121 
							
						 
					 
					
						
						
							
						
						1b54a2831e 
					 
					
						
						
							
							Adapt for iluvatar gpu ( #2684 )  
						
						
						
						
					 
					
						2025-07-07 16:53:14 +08:00 
						 
				 
			
				
					
						
							
							
								Jiang-Jia-Jun 
							
						 
					 
					
						
						
							
						
						05c670e593 
					 
					
						
						
							
							[Sync] Update to latest code ( #2679 )  
						
						
						
						
						* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
						
						
					 
					
						2025-07-03 15:43:53 +08:00 
						 
				 
			
				
					
						
							
							
								Jiang-Jia-Jun 
							
						 
					 
					
						
						
							
						
						92c2cfa2e7 
					 
					
						
						
							
							Sync v2.0 version of code to github repo  
						
						
						
						
					 
					
						2025-06-29 23:29:37 +00:00 
						 
				 
			
				
					
						
							
							
								jiangjiajun 
							
						 
					 
					
						
						
							
						
						684703fd72 
					 
					
						
						
							
							[LLM] First commit the llm deployment code  
						
						
						
						
					 
					
						2025-06-09 19:20:15 +08:00