Graceful shutdown (#3785)

* feat(log): add_request_and_response_log
* Graceful shutdown: add a shutdown timeout parameter to the API server
2025-10-06 17:17:14 +08:00 · 2025-09-04 19:33:50 +08:00
parent 88d44a2c93
commit ed97cf8396
5 changed files with 151 additions and 0 deletions
--- a/docs/best_practices/graceful_shutdown_service.md
+++ b/docs/best_practices/graceful_shutdown_service.md
@@ -0,0 +1,71 @@
 # Graceful Service Node Shutdown Solution
 ## 1. Core Objective
 Achieve graceful shutdown of service nodes, ensuring no in-flight user requests are lost during service termination while maintaining overall cluster availability.
 ## 2. Solution Overview
 This solution combines an **Nginx reverse proxy**, the **Gunicorn server**, the **Uvicorn server**, and **FastAPI**, which work together to achieve this objective.
 ![graceful_shutdown](images/graceful_shutdown.png)
 ## 3. Component Introduction
 ### 1. Nginx: Traffic Entry Point and Load Balancer
 - **Functions**:
  - Acts as a reverse proxy, receiving all external client requests and distributing them to upstream Gunicorn worker nodes according to load balancing policies.
  - Actively monitors backend node health status through health check mechanisms.
  - Allows a problematic node to be removed from the service pool immediately through a configuration change, redirecting traffic to the remaining nodes.
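Removing a node from the pool is an edit to its `server` line inside an Nginx `upstream` block: the `down` parameter takes the server out of load balancing without deleting its entry. The sketch below renders such a block from Python so the change can be scripted. The upstream name `fastdeploy_backend` and the node addresses are hypothetical, and the resulting text still has to be written into the Nginx configuration and applied with `nginx -s reload`.

```python
from __future__ import annotations


def render_upstream(name: str, servers: list[str], down: set[str]) -> str:
    """Render an Nginx upstream block; servers listed in `down` get the `down` flag."""
    lines = [f"upstream {name} {{"]
    for addr in servers:
        # `down` marks the server as unavailable, so Nginx stops routing
        # new requests to it while the entry stays in the config.
        flag = " down" if addr in down else ""
        lines.append(f"    server {addr}{flag};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical pool of three API server nodes; the second is being drained.
nodes = ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"]
print(render_upstream("fastdeploy_backend", nodes, down={"10.0.0.2:8000"}))
```

Generating the block rather than editing it by hand keeps the drain step repeatable: the same function can mark a node `down` before shutdown and drop the flag once the node is back.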
 ### 2. Gunicorn: WSGI HTTP Server (Process Manager)
 - **Functions**:
  - Serves as the master process, managing multiple Uvicorn worker child processes.
  - Receives external signals (e.g., `SIGTERM`) and coordinates the graceful shutdown process for all child processes.
  - Supervises worker processes and automatically restarts them when they terminate abnormally, keeping the service robust.
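The restart-on-crash idea can be illustrated with a toy supervisor loop. This is not Gunicorn's implementation (its master forks workers and manages them with its own signal handling); it only shows the pattern of respawning a worker that exits abnormally, using `subprocess` and a hypothetical restart cap so the demo terminates.

```python
import subprocess
import sys


def supervise(cmd: list, max_restarts: int) -> int:
    """Run `cmd`, restarting it whenever it exits with a non-zero code.

    Returns the number of restarts performed. Stops on a clean exit (code 0)
    or once `max_restarts` has been reached.
    """
    restarts = 0
    proc = subprocess.Popen(cmd)
    while True:
        code = proc.wait()  # block until the worker exits
        if code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        print(f"worker exited with code {code}, restarting ({restarts}/{max_restarts})")
        proc = subprocess.Popen(cmd)


# A hypothetical "worker" that always crashes, so it is restarted until the cap.
crashing_worker = [sys.executable, "-c", "import sys; sys.exit(1)"]
restart_count = supervise(crashing_worker, max_restarts=3)
print("restarts:", restart_count)
```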
 ### 3. Uvicorn: ASGI Server (Worker Process)
 - **Functions**:
  - Runs as a Gunicorn-managed worker and handles the actual HTTP requests.
  - Runs the FastAPI application instance, processing specific business logic.
  - Implements the ASGI protocol, supporting asynchronous request processing for high performance.
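Gunicorn reads its settings from a Python configuration file (passed with `-c`), so running a FastAPI app under Uvicorn workers can be expressed as a few assignments. The bind address, worker count, timeouts, and the `app:app` module path below are placeholder assumptions. `uvicorn.workers.UvicornWorker` is the worker class shipped with Uvicorn; recent Uvicorn releases point users to the separate `uvicorn-worker` package (`uvicorn_worker.UvicornWorker`) instead.

```python
# gunicorn.conf.py: a minimal sketch; every value is a placeholder.
# Start with: gunicorn -c gunicorn.conf.py app:app   (module path is hypothetical)

bind = "0.0.0.0:8000"  # address that Nginx forwards traffic to
workers = 4            # number of Uvicorn worker processes
worker_class = "uvicorn.workers.UvicornWorker"  # run the ASGI (FastAPI) app

# After the master receives SIGTERM, workers get up to this many seconds
# to finish in-flight requests before they are force-killed.
graceful_timeout = 30

# Workers that stay silent longer than this are killed and restarted.
timeout = 120
```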
 ---
 ## Advantages
 1. **Nginx**:
   - Can quickly isolate faulty nodes, ensuring overall service availability.
   - Applies configuration changes without downtime via `nginx -s reload`, so the switch is transparent to users.
 2. **Gunicorn** (Compared to Uvicorn's native multi-worker mode):
   - **Mature Process Management**: Built-in logic for spawning, reaping, and managing worker processes, so none of it has to be implemented by hand.
   - **Process Daemon Capability**: The Gunicorn Master automatically forks new Workers if they crash, whereas in Uvicorn's `--workers` mode, any crashed process is not restarted and requires an external daemon.
   - **Rich Configuration**: Offers numerous parameters for adjusting timeouts, number of workers, restart policies, etc.
 3. **Uvicorn**:
   - High performance: uses uvloop and httptools when they are installed (e.g., via `uvicorn[standard]`).
   - Natively supports graceful shutdown: upon receiving a shutdown signal, it stops accepting new connections and waits for existing requests to complete before exiting.
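The drain-then-exit behavior described for Uvicorn can be modeled with plain `asyncio`: stop admitting new work, give in-flight tasks a bounded grace period, then cancel whatever is left. This is a simplified stand-in, not Uvicorn's code; `handle_request`, the request durations, and the 0.5-second grace period are illustrative assumptions, with the grace period playing the role of `timeout_graceful_shutdown`.

```python
import asyncio


async def handle_request(duration: float) -> str:
    """Hypothetical request handler that takes `duration` seconds."""
    await asyncio.sleep(duration)
    return f"done after {duration}s"


async def graceful_shutdown(in_flight: set, grace_period: float):
    """Wait up to `grace_period` seconds for in-flight tasks, then cancel the rest."""
    if not in_flight:  # asyncio.wait() rejects an empty set
        return set(), set()
    done, pending = await asyncio.wait(in_flight, timeout=grace_period)
    for task in pending:
        task.cancel()  # grace period exceeded: abandon the unfinished request
    # Let the cancellations take effect before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return done, pending


async def main():
    # Three requests are in progress when the shutdown signal arrives;
    # the 2-second one cannot finish within the 0.5-second grace period.
    in_flight = {asyncio.create_task(handle_request(d)) for d in (0.1, 0.2, 2.0)}
    # A real server would stop accepting new connections at this point.
    done, pending = await graceful_shutdown(in_flight, grace_period=0.5)
    print(f"completed={len(done)} cancelled={len(pending)}")  # completed=2 cancelled=1
    return done, pending


done, pending = asyncio.run(main())
```

The grace period is the trade-off knob: too short and slow requests are cut off, too long and a stuck request delays the node's exit indefinitely.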
 ---
 ## Graceful Shutdown Procedure
 When a specific node needs to be taken offline, the steps are as follows:
 1. **Nginx Monitors Node Health Status**:
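   - Monitors the node's health by periodically sending health-check requests to it. (Active health checks via Nginx's `health_check` directive are an NGINX Plus feature; open-source Nginx relies on passive checks configured with `max_fails` and `fail_timeout`.)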
 2. **Removal from Load Balancing**:
   - Modify the Nginx configuration to mark the target node as `down` and reload the Nginx configuration.
   - Subsequently, all new requests will no longer be sent to the target node.
 3. **Gunicorn Server**:
   - Listens for stop signals; upon receiving one (e.g., `SIGTERM`), it relays the signal to all Uvicorn child processes.
 4. **Sending the Stop Signal**:
   - Send a `SIGTERM` signal to the Uvicorn process on the target node, triggering Uvicorn's graceful shutdown process.
 5. **Waiting for Request Processing**:
   - Wait slightly longer than `timeout_graceful_shutdown` (set through the API server's `--timeout-graceful-shutdown` argument) before forcibly terminating the service, so the node has enough time to finish every request it has already received.
 6. **Shutdown Completion**:
   - The node has now processed all remaining requests and exited safely.
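Steps 4 and 5 above amount to "send `SIGTERM`, wait slightly longer than the graceful-shutdown timeout, then force-kill if the process is still alive". The POSIX-only sketch below shows that sequence. It assumes the process to stop is a child started with `subprocess`; in practice the target would be the server process on the node (for example the Gunicorn master from step 3), signalled by PID with `os.kill`. The 5-second default margin and the demo children are arbitrary assumptions.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, graceful_timeout: float, margin: float = 5.0) -> int:
    """Send SIGTERM, wait slightly longer than `graceful_timeout`, then SIGKILL.

    Returns the exit code (on POSIX, a negative signal number if killed by a signal).
    """
    proc.send_signal(signal.SIGTERM)  # ask the server to start its graceful shutdown
    try:
        return proc.wait(timeout=graceful_timeout + margin)
    except subprocess.TimeoutExpired:
        proc.kill()  # grace period exhausted: force termination (SIGKILL on POSIX)
        return proc.wait()


def start_child(on_sigterm: str) -> subprocess.Popen:
    """Start a hypothetical Python child that installs a SIGTERM handler, then idles."""
    code = (
        "import signal, sys, time\n"
        f"signal.signal(signal.SIGTERM, {on_sigterm})\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # wait until the handler is installed to avoid a race
    return proc


rc_cooperative = stop_gracefully(start_child("lambda *_: sys.exit(0)"), 0.5, margin=0.5)
rc_stubborn = stop_gracefully(start_child("signal.SIG_IGN"), 0.5, margin=0.5)
print(rc_cooperative)  # 0: exited on its own after SIGTERM
print(rc_stubborn)     # -9 on POSIX: ignored SIGTERM and was force-killed
```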
--- a/docs/best_practices/images/graceful_shutdown.png
+++ b/docs/best_practices/images/graceful_shutdown.png
--- a/docs/zh/best_practices/graceful_shutdown_service.md
+++ b/docs/zh/best_practices/graceful_shutdown_service.md
@@ -0,0 +1,71 @@
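 # Graceful Service Node Shutdown Solution
 ## 1. Core Objective
 Achieve graceful shutdown of service nodes, ensuring that no in-flight user requests are lost when a service is stopped, without affecting the availability of the cluster as a whole.
 ## 2. Solution Overview
 This solution achieves the objective by combining an **Nginx reverse proxy**, the **Gunicorn server**, the **Uvicorn server**, and **FastAPI**, which work together.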
 ![graceful_shutdown](images/graceful_shutdown.png)
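 ## 3. Component Introduction
 ### 1. Nginx: Traffic Entry Point and Load Balancer
 - **Functions**:
  - Acts as a reverse proxy, receiving all external client requests and distributing them to the upstream Gunicorn worker nodes according to the load-balancing policy.
  - Actively monitors the health of backend nodes through health checks.
  - Can remove a problematic node from the service pool immediately through a configuration change, switching traffic away from it.
 ### 2. Gunicorn: WSGI HTTP Server (Process Manager)
 - **Functions**:
  - Runs as the master process and manages multiple Uvicorn worker processes.
  - Receives external signals (e.g., `SIGTERM`) and coordinates the graceful shutdown of all child processes.
  - Supervises worker processes and restarts them automatically when they exit abnormally, keeping the service robust.
 ### 3. Uvicorn: ASGI Server (Worker Process)
 - **Functions**:
  - Runs as a Gunicorn-managed worker and handles the actual HTTP requests.
  - Runs the FastAPI application instance, which implements the business logic.
  - Implements the ASGI protocol, supporting high-performance asynchronous request processing.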
 ---
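 ## Advantages
 1. **Nginx**:
   - Quickly isolates faulty nodes, preserving overall service availability.
   - Applies configuration updates without downtime via `nginx -s reload`, transparently to users.
 2. **Gunicorn** (compared with Uvicorn's native multi-worker mode):
   - **Mature process management**: Built-in logic for spawning, reaping, and managing processes, so none of it has to be implemented by hand.
   - **Process supervision**: The Gunicorn master automatically forks a new worker after a worker exits abnormally, whereas in Uvicorn's `--workers` mode a crashed process is not restarted and an external supervisor is required.
   - **Rich configuration**: Provides many parameters for tuning timeouts, the number of workers, restart policies, and more.
 3. **Uvicorn**:
   - Very fast, built on uvloop and httptools.
   - Natively supports graceful shutdown: after receiving a shutdown signal, it stops accepting new connections and waits for existing requests to finish before exiting.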
 ---
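 ## Graceful Shutdown Procedure
 To take a specific node offline, follow these steps:
 1. **Nginx monitors node health**:
   - Monitors the node's health by periodically sending health-check requests to it.
 2. **Remove the node from load balancing**:
   - Modify the Nginx configuration to mark the node as `down`, then reload the Nginx configuration.
   - From then on, no new requests are sent to the target node.
 3. **Gunicorn server**:
   - Listens for stop signals; when it receives one (e.g., `SIGTERM`), it forwards the signal to all Uvicorn child processes.
 4. **Send the stop signal**:
   - Send `SIGTERM` to the Uvicorn process on the target node to trigger Uvicorn's graceful shutdown.
 5. **Wait for requests to finish**:
   - Wait slightly longer than `timeout_graceful_shutdown` before forcibly terminating the service, giving the node enough time to finish processing every request it has already received.
 6. **Shutdown complete**:
   - The node has now finished all outstanding requests and exited safely.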
--- a/docs/zh/best_practices/images/graceful_shutdown.png
+++ b/docs/zh/best_practices/images/graceful_shutdown.png
--- a/fastdeploy/entrypoints/openai/api_server.py
+++ b/fastdeploy/entrypoints/openai/api_server.py
@@ -77,9 +77,17 @@ parser.add_argument(
    help="max waiting time for a connection; -1 means no waiting time limit",
 )
 parser.add_argument("--max-concurrency", default=512, type=int, help="max concurrency")
 parser.add_argument(
    "--enable-mm-output", action="store_true", help="Enable 'multimodal_content' field in response output. "
 )
 parser.add_argument(
    "--timeout-graceful-shutdown",
    default=None,
    type=int,
    help="timeout for graceful shutdown in seconds (passed to uvicorn); if unset, wait for in-flight requests without a time limit",
 )
 parser = EngineArgs.add_cli_args(parser)
 args = parser.parse_args()
@@ -431,6 +439,7 @@ def launch_api_server() -> None:
            workers=args.workers,
            log_config=UVICORN_CONFIG,
            log_level="info",
            timeout_graceful_shutdown=args.timeout_graceful_shutdown,
        )
    except Exception as e:
        api_server_logger.error(f"launch sync http server error, {e}, {str(traceback.format_exc())}")
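When choosing a default for `--timeout-graceful-shutdown`, it matters how Uvicorn uses the value. Uvicorn appears to bound its shutdown wait by passing `timeout_graceful_shutdown` to `asyncio.wait_for` (based on reading `uvicorn.Server.shutdown`; verify against the Uvicorn version you deploy), and `asyncio.wait_for` treats `None` and `0` very differently: `None` waits without limit, while `0` times out immediately and cancels the pending work. The stdlib-only check below demonstrates this; it uses `default=None` for the flag, and `drain` is a hypothetical stand-in for in-flight requests.

```python
import argparse
import asyncio

parser = argparse.ArgumentParser()
parser.add_argument(
    "--timeout-graceful-shutdown",
    default=None,  # unset -> no time limit in asyncio.wait_for
    type=int,
    help="timeout for graceful shutdown in seconds (passed to uvicorn)",
)


async def drain() -> str:
    """Hypothetical stand-in for waiting on in-flight requests."""
    await asyncio.sleep(0.05)
    return "drained"


async def shutdown_with(timeout):
    """Bound the drain with wait_for, the way a server might bound its shutdown."""
    try:
        return await asyncio.wait_for(drain(), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"


default_timeout = parser.parse_args([]).timeout_graceful_shutdown  # flag omitted
print(default_timeout)                           # None
print(asyncio.run(shutdown_with(default_timeout)))  # drained
print(asyncio.run(shutdown_with(0)))             # timed out
```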