[Feature] Guided Decoding add LLguidance backend (#5124)

* llguidance

* add requirements_guided_decoding.txt and doc

* fix test_guidance_*.py

* fix test_guidance_*.py && mv

* fix llguidance choice

* test_guidance_*

* rm lazy loader

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Author: Daci
Date: 2025-12-03 20:23:57 +08:00
Committed by: GitHub
Parent: 4e8096bd0d
Commit: 83dbc4e5dd
14 changed files with 1307 additions and 8 deletions


@@ -7,6 +7,7 @@
Structured Outputs refer to predefined format constraints that force large language models to generate content strictly following specified structures. This feature significantly improves output controllability and is suitable for scenarios requiring precise format outputs (such as API calls, data parsing, code generation, etc.), while supporting dynamic grammar extensions to balance flexibility and standardization.
FastDeploy supports using the [XGrammar](https://xgrammar.mlc.ai/docs/) backend to generate structured outputs.
FastDeploy supports using the [LLguidance](https://github.com/guidance-ai/llguidance) backend to generate structured outputs.
Supported output formats:

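To make the backend choice concrete, here is a minimal client-side sketch of requesting JSON-schema constrained output from a FastDeploy OpenAI-compatible endpoint. It is illustrative only: the base URL, port, model name, and the assumption that the server honors the OpenAI `response_format` JSON-schema convention are placeholders, not something this commit shows.

```python
# Hypothetical client sketch: ask a FastDeploy OpenAI-compatible server for
# output constrained to a JSON schema. The server is assumed to have been
# launched with guided_decoding_backend set to "guidance" (or "xgrammar").
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8180/v1", api_key="EMPTY")  # placeholder endpoint

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["name", "population"],
}

resp = client.chat.completions.create(
    model="default",  # placeholder model name
    messages=[{"role": "user", "content": "Describe France as JSON with name and population."}],
    # Standard OpenAI json_schema response format; whether FastDeploy accepts the
    # schema this way is an assumption, not something this diff confirms.
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "country_info", "schema": schema},
    },
)
print(resp.choices[0].message.content)
```

Whichever backend is selected, the grammar engine restricts sampling at each step so that only tokens consistent with the requested structure can be emitted.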

@@ -44,7 +44,7 @@ When using FastDeploy to deploy models (including offline inference and service
| ```disable_sequence_parallel_moe``` | `bool` | Disable sequence parallel moe, default: False |
| ```splitwise_role``` | `str` | Whether to enable splitwise inference, default value: mixed, supported parameters: ["mixed", "decode", "prefill"] |
| ```innode_prefill_ports``` | `str` | Internal engine startup ports for prefill instances (only required for single-machine PD separation), default: None |
| ```guided_decoding_backend``` | `str` | Specify the guided decoding backend to use, supports `auto`, `xgrammar`, `off`, default: `off` |
| ```guided_decoding_backend``` | `str` | Specify the guided decoding backend to use, supports `auto`, `xgrammar`, `guidance`, `off`, default: `off` |
| ```guided_decoding_disable_any_whitespace``` | `bool` | Whether to disable whitespace generation during guided decoding, default: False |
| ```speculative_config``` | `dict[str]` | Speculative decoding configuration, only supports standard format JSON string, default: None |
| ```dynamic_load_weight``` | `int` | Whether to enable dynamic weight loading, default: 0 |

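The only change in this hunk is the new `guidance` value for ```guided_decoding_backend```. As a rough sketch of how the switch might be used for offline inference, the snippet below assumes FastDeploy's Python `LLM` entry point forwards these table parameters as keyword arguments; the import path, model, and prompt are placeholders rather than something this commit defines.

```python
# Hypothetical offline-inference sketch: select the LLguidance backend via the
# parameters listed in the table above. Assumes `from fastdeploy import LLM,
# SamplingParams` and that engine kwargs mirror the documented parameter names.
from fastdeploy import LLM, SamplingParams

llm = LLM(
    model="baidu/ERNIE-4.5-0.3B-Paddle",           # placeholder model path
    guided_decoding_backend="guidance",             # new value added by this PR
    guided_decoding_disable_any_whitespace=False,   # keep whitespace generation enabled
)

outputs = llm.generate(
    ["List three prime numbers as a JSON array."],
    SamplingParams(temperature=0, max_tokens=128),
)
for out in outputs:
    print(out)
```

For service deployment the same two parameters would instead be passed when the server is started, which is what the table above documents.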

@@ -7,6 +7,7 @@
Structured Outputs refer to predefined format constraints that make large language models generate content strictly following a specified structure. This feature significantly improves the controllability of generated results and is suitable for scenarios requiring precisely formatted output (such as API calls, data parsing, code generation, etc.), while also supporting dynamic grammar extensions to balance flexibility and standardization.
FastDeploy supports using the [XGrammar](https://xgrammar.mlc.ai/docs/) backend to generate structured outputs.
FastDeploy supports using the [LLguidance](https://github.com/guidance-ai/llguidance) backend to generate structured outputs.
Supported output formats:


@@ -42,7 +42,7 @@
| ```disable_sequence_parallel_moe``` | `bool` | Disable the sequence-parallel optimization used with TP+EP, default: False |
| ```splitwise_role``` | `str` | Whether to enable splitwise inference, default value: mixed, supported parameters: ["mixed", "decode", "prefill"] |
| ```innode_prefill_ports``` | `str` | Internal engine startup ports for prefill instances (only required for single-machine PD separation), default: None |
| ```guided_decoding_backend``` | `str` | Specify the guided decoding backend to use, supports `auto`, `xgrammar`, `off`, default: `off` |
| ```guided_decoding_backend``` | `str` | Specify the guided decoding backend to use, supports `auto`, `xgrammar`, `guidance`, `off`, default: `off` |
| ```guided_decoding_disable_any_whitespace``` | `bool` | Whether to disable whitespace generation during guided decoding, default: False |
| ```speculative_config``` | `dict[str]` | Speculative decoding configuration, only supports standard-format JSON strings, default: None |
| ```dynamic_load_weight``` | `int` | Whether to enable dynamic weight loading, default: 0 |