Ayakouji
453487d5b0
[Feat] ernie4_5_vl_moe support CudaGraph (#3226)

* delete dynamic control flow for decode
* code-style
* fix scatter/gather typos and use the input stream instead of the default stream
* support 0-Size Tensor
* update runner and model
* using static mem address as input
* fix mem leak
* refine code
* update mm_buffer
* fix typo
* fix buffersize
* fix unk token
* refine code
* refine
* support other arch
* open cudagraph in vlci
* fix
* update
* update
* update
* fix cmd
* update
---------
Co-authored-by: aquagull <hongyuh@qq.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>

2025-09-10 13:11:57 +08:00

Jundong Liu
3d0aaa5923
[Executor] Experiment Feature-Support Prefill in cudagraph (#3459)

* Support prefill in Cudagraph
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5
* Solve problem about encoder_num_blocks_x_cpu
* Add early-exit mechanism for attention kernel
* fix test case about append-attention
* Update testcode, Add annotations to related tensors
* move get_input_length_list
* solve test_code
* Add annotations about early-exit for attention kernel
* Add annotations about early-exit for attention kernel2
* solve comment
* solve mtp
---------
Co-authored-by: RAM <gstian5555@outlook.com>

2025-09-08 13:12:24 +08:00

Yuan Xiaolan
2cf55168ca
load hadamard_block_size from config (#3797)

2025-09-05 17:07:58 +08:00

freeliuzc
88d44a2c93
support mtp in v1_scheduler mode (#3695)

2025-09-04 17:39:59 +08:00

co63oc
5441538173
rename fused_get_rope.cu (#3752)

* rename fused_get_rope.cu
* fix
* fix typos
* fix
* fix

2025-09-03 10:54:34 +08:00

co63oc
d6369b4d51
fix typos (#3684)

2025-09-01 17:50:17 +08:00

Sunny-bot1
fe5d09f9ee
[FIX]Fix Machete compile via ENABLE_MACHETE (#3727)

* add ENABLE_MACHETE
* fix
* revert
* update
* pre_commit
* fix
* fix
---------
Co-authored-by: Ayakouji <yuhongh@qq.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: aquagull <hongyuh@qq.com>

2025-08-30 17:50:17 +08:00

yangjianfengo1
3754a9906d
[Feature] block sparse attention (#3668)

* support sparse attn
* fix bug
* code style
* fix moba attn get kv shape
* fix A100 compilation
* codestyle
* code style
* code style
* code style
* fix conflict
* add unit tests
* code style
* add eblite loading time
* fix bug
* for ci
* for ci
* for ci
* for ci
* support mlp block size 128
* add unit tests for small operators
* fix mlp unit test
* move environment variables into the config
* fix rollout config
* fix GPU memory usage
* add test server
* add test server
* fix mlp: use full attn in the last layer

2025-08-29 19:46:30 +08:00

Liumengyuan
e93d4cfcdd
Add with_output version AppendAttention (#3302)

* get use_output from fd_config
* add clear TODO description
* add mask_offset para to align with develop
* fix bug
* fix use_output logic
* fix sot bug

2025-08-28 17:10:18 +08:00

yangjianfengo1
e81046fdad
[New Feature] Support w4afp8 in centralized deployment (#3644)

* support tp w4afp8
* code style

2025-08-28 10:53:24 +08:00

Sunny-bot1
479c8b85d3
[Optimize]support machete weight only gemm (#3561)

* support machete weight only gemm
* add generate
* update
* fix
* change file location
* add sm_version limit
* fix
* fix
* fix ci
* fix coverage
* fix xpu

2025-08-28 09:49:58 +08:00

Jiang-Jia-Jun
c694fa2879
Revert "[Feature] block sparse attention (#3209)" (#3647)

This reverts commit 646a0c2fd8

2025-08-27 17:35:04 +08:00

xiaoxiaohehe001
ad319a87cc
support fa3 rope3d (#3622)

2025-08-27 11:31:29 +08:00

yangjianfengo1
646a0c2fd8
[Feature] block sparse attention (#3209)

* support sparse attn
* fix bug
* code style
* fix moba attn get kv shape
* fix A100 compilation
* codestyle
* code style
* code style
* code style
* fix conflict
* add unit tests
* code style
* add eblite loading time
* fix bug
* for ci
* for ci
* for ci
* for ci
* support mlp block size 128
* add unit tests for small operators
* fix mlp unit test
* move environment variables into the config
* fix rollout config

2025-08-26 07:16:04 -07:00

freeliuzc
52eda7fdb3
[Feature][MTP]support new speculative decoding method named hybrid mtp with ngram (#3610)

2025-08-26 14:29:22 +08:00

Yuan Xiaolan
9205c88da1
support w4afp8 EP inference (#3044)

2025-08-25 11:27:45 +08:00

Ryan
bcdfc1d6b9
Add custom op declaration for all_reduce (#3473)

* add custom op declaration
* roll back try except

2025-08-20 20:29:58 +08:00

freeliuzc
a12d0bc549
[Feature][MTP]update multi-draft-token strategy (#3369)

* update multi-draft-token strategy
* fix format
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

2025-08-18 13:59:56 +08:00

chen
f0f00a6025
[OPs] Universal optimization and Fix early_stop cuda 700 (#3375)

* delete nonzero
* delete setup_ops_base.py
* check if
* check gcp infer_seed.cpu()
* fix repetition_early_stopper_kernel cuda 700

2025-08-14 22:40:44 +08:00

lzy
1e06b9fa6d
make append_attn supports mask_offset (#3138)

* make append_attn supports mask_offset
* add unittest

2025-08-14 03:40:55 -07:00

Yuan Xiaolan
7ce00e597c
support qk norm (#3145)

2025-08-05 16:46:14 +08:00

yangjianfengo1
64d7a3194d
Support fa3 in centralized deployment (#3112)

2025-08-01 18:03:36 +08:00

Ryan
94264bbf60
[Code Simplification] Refactor Post-processing in VL Model Forward Method (#2937)

* rm sth useless
* refactor model forward
* mv bool index to kernel

2025-08-01 17:28:07 +08:00

RAM
d850660872
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)

* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu, iluvatar model runner
* Update MTP
* fix MTP bug

2025-07-31 00:09:31 +08:00

JYChen
dafe02a7b9
[stop sequence] support stop sequence (#3025)

* stop seqs in multi-ends
* unittest for gpu stop op
* kernel tid==0

2025-07-29 14:17:37 +08:00

xiaoxiaohehe001
2970b00dfa
[Feature] Support_eplb (#2997)

* [Feature] support_eplb
* [Feature] support_eplb
* [Fix] fix mm ep

2025-07-24 20:22:45 +08:00

chenjian
85a78d695d
[Feature] Support block scheduler v1 for FD (#2928)

* Support FD block scheduler v1
* Support FD block scheduler v1
* Support FD block scheduler v1
* Fix according to copilot review
* Fix according to review
* Remove is_dummy
* Fix bug when real_bsz=1
* Fix infer first token cost time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

2025-07-23 20:31:31 +08:00

GoldPancake
9b84d51e25
[MTP Fix] Fix code and register cpp operators (#2965)

2025-07-22 19:36:24 +08:00

周周周
d306944f4f
remove cum_offsets from get_block_shape_and_split_kv_block (#2913)

* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove cum_offsets from get_block_shape_and_split_kv_block
* remove cum_offsets from get_block_shape_and_split_kv_block

2025-07-18 16:13:32 +08:00

周周周
ddb10ac509
[Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)

* remove padding_offsets from atten

2025-07-17 18:41:31 +08:00

周周周
aa76085d1f
[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)

[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)

2025-07-16 20:10:57 +08:00

Yuanle Liu
61b3997b85
refactor rl get_name_mappings_to_training (#2847)

* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix

2025-07-15 07:31:42 -07:00

freeliuzc
7cdd8d290d
[MTP] optimize mtp infer speed (#2840)

2025-07-14 19:50:22 +08:00

zhink
b89180f1cd
[Feature] support custom all-reduce (#2758)

* [Feature] support custom all-reduce
* add vllm adapted

2025-07-09 16:00:27 +08:00

RichardWooSJTU
fee544e808
fix ep prefill (#2762)

2025-07-09 14:03:05 +08:00

ming1753
1eb8ea7328
[Bug fix] fix compile bug when sm < 89 (#2738)

2025-07-08 11:24:52 +08:00

ming1753
ef6649a577
[Optimize] Optimize tensorwise fp8 performance (#2729)

* [Optimize] Optimize tensorwise fp8 performance

2025-07-07 20:06:28 +08:00

Jiang-Jia-Jun
05c670e593
[Sync] Update to latest code (#2679)

* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>

2025-07-03 15:43:53 +08:00

Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo

2025-06-29 23:29:37 +00:00