[Feature] Support disabling prefix cache for mm models (#4459)

* [Feature] support prefix cache in DP

* fix

* Update common_engine.py

* Update common_engine.py

* Update common_engine.py

* Update common_engine.py

* [BugFix] fix workers more than 1

* fix

* Update api_server.py

* fix

* Update api_server.py

* fix

* [Feature] Support disabling prefix cache for mm models

* Update api_server.py

* Update engine_client.py

* Update engine_client.py

* add test

* Update test_chat.py

* fix

* fix

* Update test_chat.py

* Update test_chat.py

---------

Co-authored-by: ltd0924 <luotingdan@baidu.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
ltd0924
2025-10-21 15:37:59 +08:00
committed by GitHub
parent 2b53c4d684
commit fb76cdfb4f
4 changed files with 47 additions and 4 deletions
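
The architecture check added in the diff can be sketched in isolation as follows. This is a minimal illustration, not the code from the commit: `FakeModelConfig` and `resolve_prefix_caching` are hypothetical names introduced here, and how `api_server.py` and `engine_client.py` actually consume the check is not shown on this page. Only the set `DISABLE_PREFIX_CACHE_MM_MODEL`, the `_architecture` attribute, and the function `is_mm_model_disable_prefix_cache` come from the diff.

```python
from dataclasses import dataclass

# Same contents as the set introduced by the commit.
DISABLE_PREFIX_CACHE_MM_MODEL: set[str] = {
    "Ernie5ForCausalLM",
}


@dataclass
class FakeModelConfig:
    """Hypothetical stand-in for the real model config.

    Only the `_architecture` attribute is taken from the diff.
    """

    _architecture: str


def is_mm_model_disable_prefix_cache(model_config) -> bool:
    """Return True if the architecture is listed in DISABLE_PREFIX_CACHE_MM_MODEL."""
    return model_config._architecture in DISABLE_PREFIX_CACHE_MM_MODEL


def resolve_prefix_caching(requested: bool, model_config) -> bool:
    """Hypothetical helper: keep prefix caching on only when it was
    requested and the model is not in the disable list."""
    return requested and not is_mm_model_disable_prefix_cache(model_config)


if __name__ == "__main__":
    for arch in ("Ernie5ForCausalLM", "SomeOtherArch"):
        cfg = FakeModelConfig(arch)
        print(arch, resolve_prefix_caching(True, cfg))
```

Keeping the disabled architectures in a module-level set means supporting another model is a one-line change, and the membership test stays constant time.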

@@ -21,6 +21,18 @@ from fastdeploy.utils import get_logger
logger = get_logger("prefix_cache_manager", "prefix_cache_manager.log")


DISABLE_PREFIX_CACHE_MM_MODEL: set[str] = {
    "Ernie5ForCausalLM",
}


def is_mm_model_disable_prefix_cache(model_config):
    """
    check if the model architecture is in DISABLE_PREFIX_CACHE_MM_MODEL
    """
    return model_config._architecture in DISABLE_PREFIX_CACHE_MM_MODEL


class CacheStatus(Enum):
    """
    cache status enum class