mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2025-11-03 02:53:26 +08:00
modify reasoning_output docs (#2696)
This commit is contained in:
@@ -1,19 +1,19 @@
|
||||
# Chain-of-Thought Content
|
||||
# Reasoning Outputs
|
||||
|
||||
The reasoning model returns a `reasoning_content` field in the output, representing the chain-of-thought content—the reasoning steps that lead to the final conclusion.
|
||||
Reasoning models return an additional `reasoning_content` field in their output, which contains the reasoning steps that led to the final conclusion.
|
||||
|
||||
## Currently Supported Chain-of-Thought Models
|
||||
| Model Name | Parser Name | Chain-of-Thought Enabled by Default |
|
||||
|----------------|----------------|-------------------------------------|
|
||||
| ernie-45-vl | ernie-45-vl | ✓ |
|
||||
| ernie-lite-vl | ernie-45-vl | ✓ |
|
||||
## Supported Models
|
||||
| Model Name | Parser Name | Eable_thinking by Default |
|
||||
|----------------|----------------|---------------------------|
|
||||
| baidu/ERNIE-4.5-VL-424B-A47B-Paddle | ernie-45-vl | ✓ |
|
||||
| baidu/ERNIE-4.5-VL-28B-A3B-Paddle | ernie-45-vl | ✓ |
|
||||
|
||||
The reasoning model requires a specified parser to interpret the reasoning content. The reasoning mode can be disabled by setting the `enable_thinking=False` parameter.
|
||||
The reasoning model requires a specified parser to extract reasoning content. The reasoning mode can be disabled by setting the `enable_thinking=False` parameter.
|
||||
|
||||
Interfaces that support toggling the reasoning mode:
|
||||
1. `/v1/chat/completions` request in OpenAI services.
|
||||
2. `/v1/chat/completions` request in the OpenAI Python client.
|
||||
3. `llm.chat` request in Offline interfaces.
|
||||
1. `/v1/chat/completions` requests in OpenAI services.
|
||||
2. `/v1/chat/completions` requests in the OpenAI Python client.
|
||||
3. `llm.chat` requests in Offline interfaces.
|
||||
|
||||
For reasoning models, the length of the reasoning content can be controlled via `reasoning_max_tokens`. Add `metadata={"reasoning_max_tokens": 1024}` to the request.
|
||||
|
||||
@@ -21,10 +21,15 @@ For reasoning models, the length of the reasoning content can be controlled via
|
||||
When launching the model service, specify the parser name using the `--reasoning-parser` argument.
|
||||
This parser will process the model's output and extract the `reasoning_content` field.
|
||||
```bash
|
||||
python -m fastdeploy.entrypoints.openai.api_server --model /root/merge_llm_model --enable-mm --tensor-parallel-size=8 --port 8192 --quantization wint4 --reasoning-parser=ernie-45-vl
|
||||
python -m fastdeploy.entrypoints.openai.api_server \
|
||||
--model /path/to/your/model \
|
||||
--enable-mm \
|
||||
--tensor-parallel-size 8 \
|
||||
--port 8192 \
|
||||
--quantization wint4 \
|
||||
--reasoning-parser ernie-45-vl
|
||||
```
|
||||
|
||||
Next, send a `chat completion` request to the model:
|
||||
Next, make a request to the model that should return the reasoning content in the response.
|
||||
```bash
|
||||
curl -X POST "http://0.0.0.0:8192/v1/chat/completions" \
|
||||
-H "Content-Type: application/json" \
|
||||
@@ -40,8 +45,8 @@ curl -X POST "http://0.0.0.0:8192/v1/chat/completions" \
|
||||
```
|
||||
The `reasoning_content` field contains the reasoning steps to reach the final conclusion, while the `content` field holds the conclusion itself.
|
||||
|
||||
### Streaming Sessions
|
||||
In streaming sessions, the `reasoning_content` field can be retrieved from the `delta` in `chat completion response chunks`.
|
||||
### Streaming chat completions
|
||||
Streaming chat completions are also supported for reasoning models. The `reasoning_content` field is available in the `delta` field in `chat completion response chunks`
|
||||
```python
|
||||
from openai import OpenAI
|
||||
# Set OpenAI's API key and API base to use vLLM's API server.
|
||||
|
||||
@@ -5,8 +5,8 @@
|
||||
##目前支持思考链的模型
|
||||
| 模型名称 | 解析器名称 | 默认开启思考链 |
|
||||
|---------------|-------------|---------|
|
||||
| ernie-45-vl | ernie-45-vl | ✓ |
|
||||
| ernie-lite-vl | ernie-45-vl | ✓ |
|
||||
| baidu/ERNIE-4.5-VL-424B-A47B-Paddle | ernie-45-vl | ✓ |
|
||||
| baidu/ERNIE-4.5-VL-28B-A3B-Paddle | ernie-45-vl | ✓ |
|
||||
|
||||
思考模型需要指定解析器,以便于对思考内容进行解析. 通过`enable_thinking=False` 参数可以关闭模型思考模式.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user