From 998968f1e8615cf8da1960c86d2bc807f037191c Mon Sep 17 00:00:00 2001
From: Jiang-Jia-Jun
Date: Wed, 30 Jul 2025 22:35:01 +0800
Subject: [PATCH] [Doc] Update parameters of serving

---
 docs/online_serving/README.md    | 1 +
 docs/zh/online_serving/README.md | 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/online_serving/README.md b/docs/online_serving/README.md
index 691434eed..536637110 100644
--- a/docs/online_serving/README.md
+++ b/docs/online_serving/README.md
@@ -94,6 +94,7 @@ The differences in request parameters between FastDeploy and the OpenAI protocol
 - `enable_thinking`: Optional[bool] = True (whether to enable reasoning for models that support deep thinking)
 - `repetition_penalty`: Optional[float] = None (coefficient for directly penalizing repeated token generation (>1 penalizes repetition, <1 encourages repetition))
 - `return_token_ids`: Optional[bool] = False: (whether to return token ids as a list)
+ - `include_stop_str_in_output`: Optional[bool] = False: (whether to include the matched stop strings in the output text)
 
 > Note: For multimodal models, since the reasoning chain is enabled by default, resulting in overly long outputs, `max_tokens` can be set to the model's maximum output length or the default value can be used.
 
diff --git a/docs/zh/online_serving/README.md b/docs/zh/online_serving/README.md
index a2d4f98d2..a0b92289f 100644
--- a/docs/zh/online_serving/README.md
+++ b/docs/zh/online_serving/README.md
@@ -93,6 +93,7 @@ FastDeploy 与 OpenAI 协议的请求参数差异如下，其余请求参数会
 - `enable_thinking`: Optional[bool] = True 支持深度思考的模型是否打开思考
 - `repetition_penalty`: Optional[float] = None: 直接对重复生成的token进行惩罚的系数（>1时惩罚重复，<1时鼓励重复）
 - `return_token_ids`: Optional[bool] = False: 是否返回 token id 列表
+ - `include_stop_str_in_output`: Optional[bool] = False: 是否在输出文本中包含结束符
 
 > 注: 若为多模态模型 由于思考链默认打开导致输出过长，max tokens 可以设置为模型最长输出，或使用默认值。
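Like the other FastDeploy-specific fields documented in this README, the new `include_stop_str_in_output` parameter travels as an extra key in the OpenAI-style request body. A minimal sketch of such a payload (the model name and stop string here are placeholders, not taken from the patch):

```python
import json

# An OpenAI-protocol chat request carrying FastDeploy's extra fields.
# include_stop_str_in_output (added by this patch) keeps matched stop
# strings in the returned text instead of trimming them off.
payload = {
    "model": "my-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    "stop": ["</answer>"],               # placeholder stop string
    "include_stop_str_in_output": True,  # parameter added by this patch
    "return_token_ids": False,
}

# The JSON body that would be POSTed to the /v1/chat/completions endpoint.
body = json.dumps(payload)
print(body)
```

Since the parameter defaults to `False`, existing clients that omit it keep the old trimming behavior.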