[Feature] mm and thinking model support structured output (#2749)

* mm support structured output

* update code

* update code

* update format

* update code

* update code

* add enable_thinking default

* update code

* add structured_outputs test case

* add ci install xgrammar

* add ci timeout time

* update test for structured_outputs

* update code

* add error traceback info

* update error msg

* update structured output code

* update code

* update code

* update config

* update torch version

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
kevin
2025-09-02 16:21:09 +08:00
committed by GitHub
parent 0e4df5a6f4
commit 1908465542
17 changed files with 1168 additions and 83 deletions


@@ -330,3 +330,67 @@ ParsedChatCompletionMessage[Info](content='{"addr": "上海市浦东新区世纪
地址: 上海市浦东新区世纪大道1号
高度: 468
```
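### Offline Inference
Offline inference lets you restrict the model's output format by specifying constraints in advance. In `FastDeploy`, these constraints are set through the `GuidedDecodingParams` class in `SamplingParams`. `GuidedDecodingParams` supports the following constraint types, which are used the same way as in online inference: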
```python
json: Optional[Union[str, dict]] = None
regex: Optional[str] = None
choice: Optional[List[str]] = None
grammar: Optional[str] = None
json_object: Optional[bool] = None
structural_tag: Optional[str] = None
```
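To make two of these constraint types more concrete, the following standard-library sketch checks a finished string against a `regex` or `choice` constraint after the fact. It does not call FastDeploy: the helper names `satisfies_regex` and `satisfies_choice` are hypothetical, and it assumes that each constraint applies to the entire output. Guided decoding enforces the constraint during generation rather than by checking afterwards.

```python
import re
from typing import List


def satisfies_regex(text: str, pattern: str) -> bool:
    """Return True if the whole text matches the pattern (assumed semantics)."""
    return re.fullmatch(pattern, text) is not None


def satisfies_choice(text: str, choices: List[str]) -> bool:
    """Return True if the text is exactly one of the allowed choices."""
    return text in choices


# Illustrative values only; they are not taken from the FastDeploy docs.
print(satisfies_regex("2025-09-02", r"\d{4}-\d{2}-\d{2}"))
print(satisfies_choice("Positive", ["Positive", "Negative"]))
```

A check like this can serve as a sanity test on generated text, for example when comparing guided decoding backends.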
The following example shows how to use offline inference to generate structured JSON:
```python
from fastdeploy import LLM, SamplingParams
from fastdeploy.engine.sampling_params import GuidedDecodingParams
from pydantic import BaseModel
from enum import Enum


class BookType(str, Enum):
    romance = "Romance"
    historical = "Historical"
    adventure = "Adventure"
    mystery = "Mystery"
    dystopian = "Dystopian"


class BookDescription(BaseModel):
    author: str
    title: str
    genre: BookType


# Constrained decoding parameters
guided_decoding_params = GuidedDecodingParams(json=BookDescription.model_json_schema())

# Sampling parameters
sampling_params = SamplingParams(
    top_p=0.95,
    max_tokens=6400,
    guided_decoding=guided_decoding_params,
)

# Load model
llm = LLM(model="ERNIE-4.5-0.3B", tensor_parallel_size=1, max_model_len=8192, guided_decoding_backend="auto")
outputs = llm.generate(
    # Prompt (Chinese): "Generate a JSON describing a Chinese book, including the author, title, and genre."
    prompts="生成一个JSON描述一本中国的著作要包含作者、标题和书籍类型。",
    sampling_params=sampling_params,
)

# Output results
for output in outputs:
    print(output.outputs.text)
```
Output:
```
{"author": "曹雪芹", "title": "红楼梦", "genre": "Historical"}
```
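As a rough follow-up, the generated text can be parsed and checked against the expected shape before it is used downstream. The sketch below uses only the standard library. The helper `check_book` and the constants `ALLOWED_GENRES` and `REQUIRED_FIELDS` are hypothetical names that mirror the `BookDescription` model above, and they are not part of FastDeploy.

```python
import json

# Mirrors the BookType enum and BookDescription fields from the example above.
ALLOWED_GENRES = {"Romance", "Historical", "Adventure", "Mystery", "Dystopian"}
REQUIRED_FIELDS = ("author", "title", "genre")


def check_book(text: str) -> dict:
    """Parse generated text and verify it has the BookDescription shape."""
    book = json.loads(text)
    if not isinstance(book, dict):
        raise ValueError("expected a JSON object")
    for field in REQUIRED_FIELDS:
        if not isinstance(book.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    if book["genre"] not in ALLOWED_GENRES:
        raise ValueError(f"unexpected genre: {book['genre']}")
    return book


# The sample string is the output shown above.
sample = '{"author": "曹雪芹", "title": "红楼梦", "genre": "Historical"}'
book = check_book(sample)
print(book["title"], book["genre"])
```

In a project that already depends on pydantic, `BookDescription.model_validate_json(text)` would perform a similar check using the model itself.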