[Feature] Support pd ep deployment with yiyan adapter (#4029)

* [Feature] Support mixed deployment with yiyan adapter in release2.2 * fix metrics * add unit test * add unit test * add unit test * Support pd ep deployment with yiyan adapter * Support pd ep deployment with yiyan adapter * refactor cache messager * support scheduler v1 in PD * suppport pd v1 + chunk prefill * suppport pd v1 + chunk prefill * add eplb * support eplb * support eplb * support eplb * support v1 * fix * fix * fix bug * remove eplb support * support prefix cache in P * fix bug * fix bug * support one stop in V1 * fix bug * fix ci * fix ci * fix * fix * fix * fix * fix --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-10-06 09:07:10 +08:00 · 2025-09-22 16:41:38 +08:00
parent 9845f0d010
commit 918ccdb123
22 changed files with 1838 additions and 343 deletions
--- a/fastdeploy/engine/resource_manager.py
+++ b/fastdeploy/engine/resource_manager.py
@@ -328,8 +328,8 @@ class ResourceManager:
        Delete cached data from the task's prompt token ids based on the cached length.
        """
        if cached_len == len(task.prompt_token_ids):
-            task.prompt_token_ids = task.prompt_token_ids[cached_len - 1 :]
-            task.seq_lens_decoder = cached_len - 1
+            task.prompt_token_ids = task.prompt_token_ids[cached_len - self.cfg.block_size :]
+            task.seq_lens_decoder = cached_len - self.cfg.block_size
        else:
            task.prompt_token_ids = task.prompt_token_ids[cached_len:]
            task.seq_lens_decoder = cached_len