Add with_output version of AppendAttention (#3302)

* get use_output from fd_config

* add clear TODO description

* add mask_offset parameter to align with the develop branch

* fix bug

* fix use_output logic

* fix SOT bug
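The messages above describe reading a `use_output` setting from `fd_config` to choose between the existing AppendAttention call and a new with_output variant. Below is a minimal sketch of that general out-parameter pattern. It assumes the with_output variant writes into a caller-provided buffer instead of returning a newly allocated result. Every name in it (`Config`, `attention_returning`, `attention_with_output`, `run_attention`) is hypothetical, and the arithmetic is a placeholder, not the real attention kernel.

```python
# Hypothetical sketch of a "with_output" (out-parameter) variant selected by a config flag.
# None of these names come from the actual codebase; the math is a stand-in.
from dataclasses import dataclass
from typing import List


@dataclass
class Config:
    # Hypothetical flag standing in for the use_output setting read from fd_config.
    use_output: bool = False


def attention_returning(q: List[float], scale: float) -> List[List[float]]:
    """Allocates a fresh result and returns it wrapped in a list (caller indexes [0])."""
    out = [x * scale for x in q]  # placeholder for the real attention computation
    return [out]


def attention_with_output(q: List[float], scale: float, out: List[float]) -> None:
    """Writes results into a caller-provided buffer and returns nothing."""
    if len(out) != len(q):
        raise ValueError("output buffer has the wrong length")
    for i, x in enumerate(q):
        out[i] = x * scale  # in-place write; no new result list is allocated here


def run_attention(cfg: Config, q: List[float], scale: float) -> List[float]:
    """Dispatches to one of the two variants based on the (hypothetical) use_output flag."""
    if cfg.use_output:
        out = [0.0] * len(q)  # caller owns and preallocates the output buffer
        attention_with_output(q, scale, out)
        return out
    return attention_returning(q, scale)[0]
```

Both paths produce the same values. The difference is who allocates the result. An out-parameter variant lets the caller reuse a buffer it already holds, which can matter when a graph-capture or tracing mechanism needs stable output storage.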
Liumengyuan
2025-08-28 17:10:18 +08:00
committed by GitHub
parent 94ded434bd
commit e93d4cfcdd
8 changed files with 1366 additions and 96 deletions


@@ -378,7 +378,7 @@ class FlashAttentionBackend(AttentionBackend):
self.speculate_max_draft_token_num + 1,
self.causal,
self.speculative_method is not None,
)[0]
)
if metadata.max_len_tensor_cpu[1] > 0:
merge_prefill_decode_output(