mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

zhuzixuan 8a9e7b53af

[Docs]Supplement the English and Chinese user documentation for Tool calling (#4895 )

2025-11-08 20:05:14 +08:00

Tool_Calling

This document describes how to configure the FastDeploy server to use a tool parser and how to invoke tools from the client.

Quickstart

Starting FastDeploy with Tool Calling Enabled

Launch the server with tool calling enabled. This example uses ERNIE-4.5-21B-A3B together with the ernie-x1 reasoning parser and the ernie-x1 tool-call parser that ship with FastDeploy. These parsers extract the model's reasoning content, response content, and tool-call information:

python -m fastdeploy.entrypoints.openai.api_server \
    --model /models/ERNIE-4.5-21B-A3B \
    --port 8000 \
    --reasoning-parser ernie-x1 \
    --tool-call-parser ernie-x1
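If you start the server from a Python script instead of a shell, you can pass the flags as an argument list and avoid shell line continuations entirely. The sketch below uses only the module path and flags from the command above. `build_server_cmd` is a hypothetical helper and is not part of FastDeploy.

```python
import shlex
import sys


def build_server_cmd(model: str, port: int = 8000,
                     reasoning_parser: str = "ernie-x1",
                     tool_call_parser: str = "ernie-x1") -> list[str]:
    """Assemble the api_server command shown above as an argv list (hypothetical helper)."""
    return [
        sys.executable, "-m", "fastdeploy.entrypoints.openai.api_server",
        "--model", model,
        "--port", str(port),
        "--reasoning-parser", reasoning_parser,
        "--tool-call-parser", tool_call_parser,
    ]


if __name__ == "__main__":
    cmd = build_server_cmd("/models/ERNIE-4.5-21B-A3B")
    print(shlex.join(cmd))  # show the equivalent shell command
    # To actually start the server: import subprocess; subprocess.run(cmd, check=True)
```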

Example of triggering tool calling

Send a request that includes a tool definition so that the model can decide to call the tool:

curl -X POST http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in Beijing?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City name, for example: Beijing"
              },
              "unit": {
                "type": "string",
                "enum": ["c", "f"],
                "description": "Temperature units: c = Celsius, f = Fahrenheit"
              }
            },
            "required": ["location", "unit"],
            "additionalProperties": false
          },
          "strict": true
        }
      }
    ],
    "stream": false
  }'
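A Python client can build the same request using only the standard library. In this sketch, `build_request` and `GET_WEATHER_TOOL` are illustrative names. The URL is copied from the curl example, and the request is only constructed, not sent. Building the body with `json.dumps` also avoids shell-quoting pitfalls, such as an apostrophe inside a single-quoted `-d` argument.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port to your deployment.
URL = "http://0.0.0.0:8000/v1/chat/completions"

# Same tool definition as in the curl example.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, for example: Beijing"},
                "unit": {"type": "string", "enum": ["c", "f"],
                         "description": "Temperature units: c = Celsius, f = Fahrenheit"},
            },
            "required": ["location", "unit"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}


def build_request(question: str, url: str = URL) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    payload = {
        "messages": [{"role": "user", "content": question}],
        "tools": [GET_WEATHER_TOOL],
        "stream": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Against a running server:
# with urllib.request.urlopen(build_request("What's the weather in Beijing?")) as resp:
#     body = json.load(resp)
```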

The example output is shown below. The model's reasoning (reasoning_content) and the tool-call information (tool_calls) were parsed successfully, content is empty, and finish_reason is tool_calls:

{
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "",
                "multimodal_content": null,
                "reasoning_content": "User wants to ... ",
                "tool_calls": [
                    {
                        "id": "chatcmpl-tool-bc90641c67e44dbfb981a79bc986fbe5",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"location\": \"北京\", \"unit\": \"c\"}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ]
}
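The `arguments` field is a JSON-encoded string, not an object, so a client must decode it before calling the tool. The following is a minimal sketch. It assumes the response has already been loaded into a dict, for example with `json.load`. `read_tool_calls` is a hypothetical client-side helper.

```python
import json


def read_tool_calls(response: dict) -> list[tuple[str, str, dict]]:
    """Return (id, name, decoded arguments) for each tool call in the first choice."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []  # tool_calls may be absent or null
    return [
        (call["id"], call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]
```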

Parallel Tool Calls

If the model emits several tool calls in a single turn (parallel tool calls), FastDeploy returns all of them in the tool_calls list:

tool_calls=[
  {"id": "...", "function": {...}},
  {"id": "...", "function": {...}}
]
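A client typically executes each entry and replies with one `tool` message per call, matched by `tool_call_id`. The sketch below assumes a hypothetical local `get_weather` function and a hypothetical `TOOLS` registry that maps tool names to functions. The shape of the `content` field mirrors the conversation-history example in this document.

```python
import json
from typing import Any, Callable


# Hypothetical local tool implementation; replace with a real weather lookup.
def get_weather(location: str, unit: str) -> dict[str, Any]:
    return {"location": location, "temperature": "23", "unit": unit}


# Registry mapping tool names (as declared in "tools") to local functions.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute every tool call and return one role="tool" message per call."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        fn = TOOLS.get(name)
        output = fn(**args) if fn else {"error": f"unknown tool: {name}"}
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the call that requested it
            "content": {"type": "text", "text": json.dumps(output, ensure_ascii=False)},
        })
    return results
```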

Requests with tool calls in the conversation history

If earlier turns contain tool calls, include the assistant message with its tool_calls and the corresponding tool message (linked by tool_call_id) in messages, as in the following request:

curl -X POST "http://0.0.0.0:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {
      "role": "user",
      "content": "Hello, what is the weather in Beijing?"
    },
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "call_1",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": {
              "location": "Beijing",
              "unit": "c"
            }
          }
        }
      ],
      "thoughts": "The user wants to check the weather in Beijing today."
    },
    {
      "role": "tool",
      "tool_call_id": "call_1",
      "content": {
        "type": "text",
        "text": "{\"location\": \"北京\",\"temperature\": \"23\",\"weather\": \"晴\",\"unit\": \"c\"}"
      }
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Determine weather in my location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": [
                "c",
                "f"
              ]
            }
          },
          "additionalProperties": false,
          "required": [
            "location",
            "unit"
          ]
        },
        "strict": true
      }
    }
  ],
  "stream": false
}'

The parsed model output is shown below. It contains the reasoning (reasoning_content) and the response (content), and finish_reason is stop:

{
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Today's weather in Beijing is sunny with a temperature of 23 degrees Celsius.",
                "reasoning_content": "User wants to ...",
                "tool_calls": null
            },
            "finish_reason": "stop"
        }
    ]
}
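To build a follow-up request programmatically, append the assistant turn that contains `tool_calls` to the earlier messages, then append the tool results. `build_followup_messages` below is a hypothetical helper. It passes `tool_calls` back exactly as the server returned them, with `arguments` as a JSON string. The request above instead spells `arguments` out as an object, so use whichever form your deployment accepts. The helper also checks that every tool result answers a call from the assistant turn.

```python
import copy


def build_followup_messages(history: list[dict], assistant_message: dict,
                            tool_messages: list[dict]) -> list[dict]:
    """Return history + the assistant tool-call turn + the tool results."""
    tool_calls = assistant_message["tool_calls"]
    call_ids = {call["id"] for call in tool_calls}
    for msg in tool_messages:
        # every tool result must answer a call made in the assistant turn
        if msg["tool_call_id"] not in call_ids:
            raise ValueError(f"no tool call with id {msg['tool_call_id']!r}")
    assistant_turn = {"role": "assistant", "tool_calls": copy.deepcopy(tool_calls)}
    return [*history, assistant_turn, *tool_messages]
```

The resulting list goes into the "messages" field of the next request, together with the same "tools" definitions.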

Writing a Custom Tool Parser

FastDeploy supports custom tool parser plugins. Use the built-in parsers under fastdeploy/entrypoints/openai/tool_parser as references when writing your own.

A custom parser should implement:

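# import the required packages: ToolParser, ToolParserManager, AnyTokenizer,
# ChatCompletionRequest, DeltaMessage and ExtractedToolCallInformation
# (see the built-in parsers in fastdeploy/entrypoints/openai/tool_parser for the exact module paths)
from collections.abc import Sequence


# register the tool parser to ToolParserManager under the name "my-parser"
@ToolParserManager.register_module("my-parser")
class MyToolParser(ToolParser):  # subclass the ToolParser base class
    def __init__(self, tokenizer: AnyTokenizer):
        super().__init__(tokenizer)

    # implement the tool call parsing for non-streaming calls
    def extract_tool_calls(self, model_output: str, request: ChatCompletionRequest) -> ExtractedToolCallInformation:
        # placeholder: report no tool calls and return the raw output as content
        return ExtractedToolCallInformation(tools_called=False, tool_calls=[], content=model_output)

    # implement the tool call parsing for streaming calls
    def extract_tool_calls_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
        request: ChatCompletionRequest,
    ) -> DeltaMessage | None:
        # placeholder: pass the newly generated text through unchanged as content
        return DeltaMessage(content=delta_text)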
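The placeholder methods above do no parsing. How a model marks a tool call depends on the model and is not specified here. Purely for illustration, the sketch below assumes each call is emitted as a JSON object inside `<tool_call>...</tool_call>` tags. It shows the kind of splitting that `extract_tool_calls` could perform. `split_tool_calls` is hypothetical and returns plain dicts rather than FastDeploy types.

```python
import json
import re

# Hypothetical format, assumed for illustration only:
#   <tool_call>{"name": "...", "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def split_tool_calls(model_output: str) -> tuple[str, list[dict]]:
    """Split raw model text into (remaining content, parsed tool calls)."""
    calls = []
    for block in TOOL_CALL_RE.findall(model_output):
        try:
            obj = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip a malformed block instead of failing the whole response
        if not isinstance(obj, dict) or "name" not in obj:
            continue
        calls.append({
            "name": obj["name"],
            # OpenAI-style responses carry arguments as a JSON string
            "arguments": json.dumps(obj.get("arguments", {}), ensure_ascii=False),
        })
    content = TOOL_CALL_RE.sub("", model_output).strip()
    return content, calls
```

Inside `extract_tool_calls`, the returned content and calls would then be wrapped in an `ExtractedToolCallInformation`, with `tools_called` set when at least one call was found.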

Enable the plugin when starting the server:

python -m fastdeploy.entrypoints.openai.api_server \
    --model <model path> \
    --tool-parser-plugin <absolute path of the plugin file> \
    --tool-call-parser my-parser
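In streaming mode the parser is called repeatedly. The parameter names suggest that `current_text` is `previous_text` with `delta_text` appended. Under that assumption, one simple strategy is to compute the text visible outside tool-call blocks for both snapshots and emit only the difference. The sketch assumes hypothetical `<tool_call>...</tool_call>` tags. It holds back a trailing fragment such as `<tool` until it is clear whether a tag is starting. It handles only plain content and leaves out streaming of the tool-call fields themselves.

```python
# Hypothetical tags, assumed for illustration only.
START, END = "<tool_call>", "</tool_call>"


def visible_text(text: str) -> str:
    """Return the part of text that lies outside tool-call blocks.

    An unclosed block is hidden entirely, and a trailing fragment that could
    still turn into START (for example "<tool") is held back.
    """
    parts = []
    pos = 0
    while True:
        start = text.find(START, pos)
        if start == -1:
            tail = text[pos:]
            # hold back the longest suffix of tail that is a proper prefix of START
            for k in range(min(len(tail), len(START) - 1), 0, -1):
                if START.startswith(tail[-k:]):
                    tail = tail[:-k]
                    break
            parts.append(tail)
            return "".join(parts)
        parts.append(text[pos:start])
        end = text.find(END, start + len(START))
        if end == -1:
            return "".join(parts)  # inside an unfinished tool call: hide the rest
        pos = end + len(END)


def content_delta(previous_text: str, current_text: str) -> str:
    """Plain content newly visible in current_text (assumes it extends previous_text)."""
    before = visible_text(previous_text)
    after = visible_text(current_text)
    return after[len(before):]
```

Because held-back and hidden text is never emitted early, the visible text of an earlier snapshot is always a prefix of the visible text of a later one, so the slice in `content_delta` is safe.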
