[BugFix] fix too many open files problem (#3256)

* Update cache_messager.py
* fix too many open files problem
* fix too many open files problem
* fix too many open files problem
* fix ci bugs
* Update api_server.py
* add parameter
* format
* format
* format
* format
* Update parameters.md
* Update parameters.md
* Update serving_completion.py
* Update serving_chat.py
* Update envs.py

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-12-24 13:28:13 +08:00 · 2025-08-08 20:10:11 +08:00
parent 22255a65aa
commit 31d4fcb425
8 changed files with 182 additions and 24 deletions
--- a/docs/parameters.md
+++ b/docs/parameters.md
@@ -8,6 +8,8 @@ When using FastDeploy to deploy models (including offline inference and service
 |:--------------|:----|:-----------|
 | ```port``` | `int` | Only required for service deployment, HTTP service port number, default: 8000 |
 | ```metrics_port``` | `int` | Only required for service deployment, metrics monitoring port number, default: 8001 |
+| ```max_waiting_time``` | `int` | Only required for service deployment, maximum time a service request waits for a connection to be established, default: -1 (no wait time limit) |
+| ```max_concurrency``` | `int` | Only required for service deployment, maximum number of connections the service actually establishes, default: 512 |
 | ```engine_worker_queue_port``` | `int` | FastDeploy internal engine communication port, default: 8002 |
 | ```cache_queue_port``` | `int` | FastDeploy internal KVCache process communication port, default: 8003 |
 | ```max_model_len``` | `int` | Default maximum supported context length for inference, default: 2048 |
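One plausible reading of the two new parameters is that `max_concurrency` caps how many requests hold a connection at once, while `max_waiting_time` bounds how long a new request waits for a free slot before giving up. The sketch below illustrates that reading with an `asyncio.Semaphore`. It is an assumption-based illustration, not FastDeploy's implementation: `ConnectionGate`, `handle_request`, and `demo` are hypothetical names, and treating `max_waiting_time` as seconds is an assumption, since the table does not state a unit.

```python
import asyncio
from typing import List, Optional

# Hypothetical defaults mirroring the documented parameters.
MAX_CONCURRENCY = 512   # max_concurrency: cap on simultaneous connections
MAX_WAITING_TIME = -1   # max_waiting_time: -1 means no limit (unit assumed: seconds)


class ConnectionGate:
    """Hypothetical gate that limits concurrent requests with a semaphore."""

    def __init__(self, max_concurrency: int = MAX_CONCURRENCY,
                 max_waiting_time: float = MAX_WAITING_TIME) -> None:
        self._sem = asyncio.Semaphore(max_concurrency)
        # Translate the -1 sentinel into "wait forever" for asyncio.wait_for.
        self._timeout: Optional[float] = (
            None if max_waiting_time < 0 else max_waiting_time
        )

    async def acquire(self) -> bool:
        """Return True if a slot was obtained, False if the wait timed out."""
        try:
            await asyncio.wait_for(self._sem.acquire(), timeout=self._timeout)
            return True
        except asyncio.TimeoutError:
            return False

    def release(self) -> None:
        self._sem.release()


async def handle_request(gate: ConnectionGate, work_seconds: float) -> str:
    if not await gate.acquire():
        return "rejected"  # a real server might return an error response here
    try:
        await asyncio.sleep(work_seconds)  # stand-in for model inference
        return "ok"
    finally:
        gate.release()  # always free the slot, even if processing fails


async def demo() -> List[str]:
    # One slot and a 0.05 s wait budget: the second concurrent request times out.
    gate = ConnectionGate(max_concurrency=1, max_waiting_time=0.05)
    return await asyncio.gather(
        handle_request(gate, 0.2),
        handle_request(gate, 0.2),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['ok', 'rejected']
```

With one slot and a 0.05 s wait budget, the second of two concurrent requests gives up while the first still holds the slot; with `max_waiting_time=-1` it would instead wait until the slot is released. The gate is created inside the coroutine so the semaphore is bound to the running event loop.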