[Feature] mm and thinking model support structured output (#2749)
* mm support structured output
* update code
* update code
* update format
* update code
* update code
* add enable_thinking default
* update code
* add structured_outputs test case
* add ci install xgrammar
* add ci timeout time
* update test for structured_outputs
* update code
* add error traceback info
* update error msg
* update structured output code
* update code
* update code
* update config
* update torch version

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
@@ -330,3 +330,67 @@ ParsedChatCompletionMessage[Info](content='{"addr": "上海市浦东新区世纪
地址: 上海市浦东新区世纪大道1号
高度: 468
```
### Offline Inference

Offline inference lets you constrain the model's output format by specifying the constraints up front. In `FastDeploy`, constraints are specified through the `GuidedDecodingParams` class in `SamplingParams`. `GuidedDecodingParams` supports the following constraint types; usage mirrors the online-inference examples:
```python
json: Optional[Union[str, dict]] = None
regex: Optional[str] = None
choice: Optional[List[str]] = None
grammar: Optional[str] = None
json_object: Optional[bool] = None
structural_tag: Optional[str] = None
```
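The `regex` and `choice` constraints are plain Python values: a regular-expression string the decoded text must match, and a list of permitted output strings. A minimal stdlib sketch of what such constraints admit (the date pattern and value lists here are invented for illustration, not taken from FastDeploy):

```python
import re

# Hypothetical regex constraint: force output into an ISO-like date shape.
# The value is an ordinary Python regex string, which is what a
# regex constraint would receive.
date_pattern = r"\d{4}-\d{2}-\d{2}"

# Hypothetical choice constraint: restrict output to a fixed set of strings.
choices = ["Romance", "Historical", "Adventure", "Mystery", "Dystopian"]

# An output satisfying the regex constraint matches the pattern in full.
assert re.fullmatch(date_pattern, "2025-06-30") is not None
assert re.fullmatch(date_pattern, "June 30, 2025") is None

# An output satisfying the choice constraint is one of the listed strings.
assert "Historical" in choices
assert "Biography" not in choices

print("constraint checks passed")
```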
The following example shows how to use offline inference to generate structured JSON:
```python
from fastdeploy import LLM, SamplingParams
from fastdeploy.engine.sampling_params import GuidedDecodingParams
from pydantic import BaseModel
from enum import Enum


class BookType(str, Enum):
    romance = "Romance"
    historical = "Historical"
    adventure = "Adventure"
    mystery = "Mystery"
    dystopian = "Dystopian"


class BookDescription(BaseModel):
    author: str
    title: str
    genre: BookType


# Constrained decoding parameters
guided_decoding_params = GuidedDecodingParams(json=BookDescription.model_json_schema())

# Sampling parameters
sampling_params = SamplingParams(
    top_p=0.95,
    max_tokens=6400,
    guided_decoding=guided_decoding_params,
)

# Load model
llm = LLM(model="ERNIE-4.5-0.3B", tensor_parallel_size=1, max_model_len=8192, guided_decoding_backend="auto")

outputs = llm.generate(
    prompts="生成一个JSON,描述一本中国的著作,要包含作者、标题和书籍类型。",
    sampling_params=sampling_params,
)

# Output results
for output in outputs:
    print(output.outputs.text)
```
Output:
```
{"author": "曹雪芹", "title": "红楼梦", "genre": "Historical"}
```
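Because decoding was constrained to `BookDescription.model_json_schema()`, the emitted text is guaranteed to parse as JSON with exactly the schema's fields. A minimal stdlib check of the output shown above (using only `json`, so it runs without FastDeploy or the model weights):

```python
import json

# The structured output produced by the constrained-decoding example above.
output_text = '{"author": "曹雪芹", "title": "红楼梦", "genre": "Historical"}'

book = json.loads(output_text)

# The JSON-schema constraint guarantees exactly these fields...
assert set(book) == {"author", "title", "genre"}
# ...and that genre is one of the BookType enum values.
assert book["genre"] in {"Romance", "Historical", "Adventure", "Mystery", "Dystopian"}

print(book["title"], "-", book["genre"])
```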