[Feature] Support pd ep deployment with yiyan adapter (#4029)

* [Feature] Support mixed deployment with yiyan adapter in release2.2
* fix metrics
* add unit test
* add unit test
* add unit test
* Support pd ep deployment with yiyan adapter
* Support pd ep deployment with yiyan adapter
* refactor cache messager
* support scheduler v1 in PD
* support pd v1 + chunk prefill
* support pd v1 + chunk prefill
* add eplb
* support eplb
* support eplb
* support eplb
* support v1
* fix
* fix
* fix bug
* remove eplb support
* support prefix cache in P
* fix bug
* fix bug
* support one stop in V1
* fix bug
* fix ci
* fix ci
* fix
* fix
* fix
* fix
* fix

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-10-18 14:40:44 +08:00 · 2025-09-22 16:41:38 +08:00
parent 9845f0d010
commit 918ccdb123
22 changed files with 1838 additions and 343 deletions
--- a/fastdeploy/cache_manager/transfer_factory/rdma_cache_transfer.py
+++ b/fastdeploy/cache_manager/transfer_factory/rdma_cache_transfer.py
@@ -61,18 +61,12 @@ class RDMACommManager:
        Connect to remote gpu and write cache.
        """
        assert self.splitwise_role == "prefill", "only prefill can call this method"
-        addr = f"{ip}:{port!s}"
-        if addr in self.connected_rdma:
-            return True
        ret = self.messager.is_connected(ip, str(port))
        if ret:
-            self.connected_rdma.add(addr)
            return True

        ret = self.messager.connect(ip, str(port))
        logger.info(f"connect to remote rdma address {ip}:{port} status is {ret}")
-        if ret == 0:
-            self.connected_rdma.add(addr)
        return ret == 0

    def write_cache(self, ip, port, local_block_ids, remote_block_ids, layer_idx):
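
Review note on the hunk above: the change removes the local `connected_rdma` address cache, so every call now asks the messager whether the link is live before deciding to connect. The commit message does not say why; one plausible reading is that a locally cached address can go stale if the remote side drops the link. The sketch below illustrates only that control flow. `FakeMessager`, its `drop` method, and the free function `connect` are hypothetical stand-ins for illustration, not FastDeploy APIs.

```python
class FakeMessager:
    """Hypothetical stand-in for the RDMA messager (not the real class)."""

    def __init__(self):
        self._links = set()      # links the fake transport considers live
        self.connect_calls = 0   # counts real connection attempts

    def is_connected(self, ip, port):
        return (ip, port) in self._links

    def connect(self, ip, port):
        self.connect_calls += 1
        self._links.add((ip, port))
        return 0  # 0 means success, mirroring the `ret == 0` check in the diff

    def drop(self, ip, port):
        # Simulates the remote side going away (assumed failure mode).
        self._links.discard((ip, port))


def connect(messager, ip, port):
    """Post-change logic: no local address cache, the messager is the
    single source of truth for connection state."""
    if messager.is_connected(ip, str(port)):
        return True
    return messager.connect(ip, str(port)) == 0


m = FakeMessager()
connect(m, "10.0.0.1", 8000)   # first call opens the link
connect(m, "10.0.0.1", 8000)   # second call reuses it, no new attempt
m.drop("10.0.0.1", "8000")     # link is lost
connect(m, "10.0.0.1", 8000)   # reconnects instead of trusting a stale cache
```

With the removed cache, the last call would have returned `True` without reconnecting, because the address was still recorded locally.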