[BugFix] fix too many open files problem (#3256)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled

* Update cache_messager.py

* fix too many open files problem

* fix too many open files problem

* fix too many open files problem

* fix ci bugs

* Update api_server.py

* add parameter

* format

* format

* format

* format

* Update parameters.md

* Update parameters.md

* Update serving_completion.py

* Update serving_chat.py

* Update envs.py

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
This commit is contained in:
ltd0924
2025-08-08 20:10:11 +08:00
committed by GitHub
parent 22255a65aa
commit 31d4fcb425
8 changed files with 182 additions and 24 deletions

View File

@@ -8,6 +8,8 @@ When using FastDeploy to deploy models (including offline inference and service
|:--------------|:----|:-----------|
| ```port``` | `int` | Only required for service deployment, HTTP service port number, default: 8000 |
| ```metrics_port``` | `int` | Only required for service deployment, metrics monitoring port number, default: 8001 |
| ```max_waiting_time``` | `int` | Only required for service deployment, maximum wait time for establishing a connection upon service request. Default: -1 (indicates no wait time limit).|
| ```max_concurrency``` | `int` | Only required for service deployment, the actual number of connections established by the service, default 512 |
| ```engine_worker_queue_port``` | `int` | FastDeploy internal engine communication port, default: 8002 |
| ```cache_queue_port``` | `int` | FastDeploy internal KVCache process communication port, default: 8003 |
| ```max_model_len``` | `int` | Default maximum supported context length for inference, default: 2048 |