# Tool Calling

This document describes how to configure the FastDeploy server to use a tool parser, and how to invoke tools from the client.

## Tool Call Parsers for ERNIE Series Models

| Model Name | Parser Name |
|---------------|-------------|
| baidu/ERNIE-4.5-21B-A3B-Thinking | ernie-x1 |
| baidu/ERNIE-4.5-VL-28B-A3B-Thinking | ernie-45-vl-thinking |

## Quickstart
### Starting FastDeploy with Tool Calling Enabled

Launch the server with tool calling enabled. This example uses ERNIE-4.5-21B-A3B, with the ernie-x1 reasoning parser and the ernie-x1 tool-call parser from the fastdeploy directory to extract the model's reasoning content, response content, and tool-call information:

```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model /models/ERNIE-4.5-21B-A3B \
    --port 8000 \
    --reasoning-parser ernie-x1 \
    --tool-call-parser ernie-x1
```

### Example of Triggering Tool Calling

Send a request that includes the tool definition to trigger the model to use the available tool:

```bash
curl -X POST http://0.0.0.0:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {
                "role": "user",
                "content": "What is the weather in Beijing?"
            }
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get the current weather in a given location",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "City name, for example: Beijing"
                            },
                            "unit": {
                                "type": "string",
                                "enum": ["c", "f"],
                                "description": "Temperature units: c = Celsius, f = Fahrenheit"
                            }
                        },
                        "required": ["location", "unit"],
                        "additionalProperties": false
                    },
                    "strict": true
                }
            }
        ],
        "stream": false
    }'
```

The example output is shown below. The model's reasoning (`reasoning_content`) and tool-call information (`tool_calls`) were parsed successfully, the response content (`content`) is empty, and `finish_reason` is `tool_calls`:

```json
{
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "",
                "multimodal_content": null,
                "reasoning_content": "User wants to ... ",
                "tool_calls": [
                    {
                        "id": "chatcmpl-tool-bc90641c67e44dbfb981a79bc986fbe5",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"location\": \"北京\", \"unit\": \"c\"}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ]
}
```
## Parallel Tool Calls
If the model generates parallel tool calls, FastDeploy returns them as a list:

```python
tool_calls=[
    {"id": "...", "function": {...}},
    {"id": "...", "function": {...}}
]
```
## Requests Containing Tools in the Conversation History

If tool-call information exists in previous turns, you can construct the follow-up request as follows:

```bash
curl -X POST "http://0.0.0.0:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {
                "role": "user",
                "content": "Hello, what is the weather in Beijing?"
            },
            {
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "call_1",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": {
                                "location": "Beijing",
                                "unit": "c"
                            }
                        }
                    }
                ],
                "thoughts": "Users need to check today's weather in Beijing."
            },
            {
                "role": "tool",
                "tool_call_id": "call_1",
                "content": {
                    "type": "text",
                    "text": "{\"location\": \"北京\",\"temperature\": \"23\",\"weather\": \"晴\",\"unit\": \"c\"}"
                }
            }
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Determine weather in my location",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA"
                            },
                            "unit": {
                                "type": "string",
                                "enum": ["c", "f"]
                            }
                        },
                        "additionalProperties": false,
                        "required": ["location", "unit"]
                    },
                    "strict": true
                }
            }
        ],
        "stream": false
    }'
```

The parsed model output is as follows, containing the reasoning content `reasoning_content` and the response content `content`, with `finish_reason` set to `stop`:

```json
{
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Today's weather in Beijing is sunny with a temperature of 23 degrees Celsius.",
                "reasoning_content": "User wants to ...",
                "tool_calls": null
            },
            "finish_reason": "stop"
        }
    ]
}
```
## Writing a Custom Tool Parser
FastDeploy supports custom tool parser plugins. You can refer to the built-in parsers under `fastdeploy/entrypoints/openai/tool_parser` when creating your own `tool parser`.

A custom parser should implement:
```python
# import the required packages

# register the tool parser with the ToolParserManager
@ToolParserManager.register_module("my-parser")
class MyParser(ToolParser):
    def __init__(self, tokenizer: AnyTokenizer):
        super().__init__(tokenizer)

    # implement tool-call parsing for non-streaming calls
    def extract_tool_calls(self, model_output: str, request: ChatCompletionRequest) -> ExtractedToolCallInformation:
        return ExtractedToolCallInformation(tools_called=False, tool_calls=[], content=model_output)

    # implement tool-call parsing for streaming calls
    def extract_tool_calls_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
        request: ChatCompletionRequest,
    ) -> DeltaMessage | None:
        # build and return a DeltaMessage for the newly parsed fragment, or None
        return delta
```
Enable it via:

```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model <model path> \
    --tool-parser-plugin <absolute path of the plugin file> \
    --tool-call-parser my-parser
```