[Optimization] support mm prefill batch (#5313)

* support mm prefill batch

* update code

* update code

* update code

* update code

* fix encoder cache bug

* update code

* update code

* fix bug

* fix paddle ocr bug

* fix xpu bug

* update code
kevin
2025-12-11 22:21:14 +08:00
committed by GitHub
parent 7116982995
commit 954a145d57
14 changed files with 769 additions and 296 deletions
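
Note on the offset arithmetic in the first hunk: `request.image_start` and `request.image_end` are computed as `np.sum(np.prod(grid_thw[:n], axis=1))`. The sketch below illustrates that expression in isolation. It assumes each row of `grid_thw` is a `(t, h, w)` grid for one image or video, which is inferred from the name and not confirmed by this commit; `patch_offset` and the sample values are hypothetical.

```python
import numpy as np


def patch_offset(grid_thw: np.ndarray, num_items: int) -> int:
    """Sum of per-item grid sizes over the first `num_items` rows.

    Hypothetical helper: mirrors np.sum(np.prod(grid_thw[:n], axis=1))
    from the diff. Assumes each row is (t, h, w) for one multimodal item.
    """
    # The product across a row is that item's grid size (t * h * w);
    # summing the first `num_items` products gives a cumulative offset.
    return int(np.sum(np.prod(grid_thw[:num_items], axis=1)))


# Illustrative values only, not taken from the repository.
grid_thw = np.array([[1, 4, 6], [1, 2, 2], [2, 3, 3]])

image_start = patch_offset(grid_thw, 1)  # items before index 1
image_end = patch_offset(grid_thw, 3)    # items before index 3
print(image_start, image_end)
```

With an empty prefix (`num_items == 0`) the expression evaluates to 0, so a request that has not consumed any image yet starts at offset 0.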

@@ -484,9 +484,9 @@ class ResourceManagerV1(ResourceManager):
 request.image_start = np.sum(np.prod(grid_thw[: request.num_image_start], axis=1))
 request.image_end = np.sum(np.prod(grid_thw[: request.num_image_end], axis=1))
-cur_mm_hashes = inputs["mm_hashes"][request.num_image_start : request.num_image_end]
-cur_mm_positions = inputs["mm_positions"][request.num_image_start : request.num_image_end]
 if self.encoder_cache:
+    cur_mm_hashes = inputs["mm_hashes"][request.num_image_start : request.num_image_end]
+    cur_mm_positions = inputs["mm_positions"][request.num_image_start : request.num_image_end]
     request.evict_mm_hashes = self.encoder_cache.apply_cache(cur_mm_hashes, cur_mm_positions)
 # Compatible with scenarios without images and videos.
@@ -655,7 +655,7 @@ class ResourceManagerV1(ResourceManager):
 request = self.waiting[0]
 if (
-    not envs.FD_ENABLE_MAX_PREFILL
+    self.config.model_config.disable_mm_prefill_batch()
     and self._is_mm_request(request)
     and self.exist_mm_prefill(scheduled_reqs)
 ) or (paddle.is_compiled_with_xpu() and self.exist_prefill(scheduled_reqs)):