Update docs for thinking model support

This commit is contained in:
Jiang-Jia-Jun
2025-09-09 10:08:05 +08:00
parent 12326b60e1
commit 1dc1397ef6
6 changed files with 7 additions and 3 deletions

View File

@@ -26,6 +26,7 @@ English | [简体中文](README_CN.md)
# FastDeploy : Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
## News
**[2025-09] 🔥 FastDeploy v2.2 is newly released! It now offers compatibility with models in the HuggingFace ecosystem, has further optimized performance, and newly adds support for [baidu/ERNIE-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)!
**[2025-08] 🔥 Released FastDeploy v2.1:** A brand-new KV Cache scheduling strategy has been introduced, and expanded support for PD separation and CUDA Graph across more models. Enhanced hardware support has been added for platforms like Kunlun and Hygon, along with comprehensive optimizations to improve the performance of both the service and inference engine.
**[2025-07] The FastDeploy 2.0 Inference Deployment Challenge is now live!** Complete the inference deployment task for the ERNIE 4.5 series open-source models to win official FastDeploy 2.0 merch and generous prizes! 🎁 You're welcome to try it out and share your feedback! 📌[Sign up here](https://www.wjx.top/vm/meSsp3L.aspx#) 📌[Event details](https://github.com/PaddlePaddle/FastDeploy/discussions/2728)

View File

@@ -26,7 +26,8 @@
# FastDeploy :基于飞桨的大语言模型与视觉语言模型推理部署工具包
## 最新活动
**[2025-08] 🔥 FastDeploy v2.1 全新发布:** 全新的KV Cache调度策略更多模型支持PD分离和CUDA Graph昆仑、海光等更多硬件支持增强全方面优化服务和推理引擎的性能。
**[2025-09] 🔥 FastDeploy v2.2 全新发布: HuggingFace生态模型兼容性能进一步优化更新增对[baidu/ERNIE-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)支持!
**[2025-08] FastDeploy v2.1 发布:全新的KV Cache调度策略更多模型支持PD分离和CUDA Graph昆仑、海光等更多硬件支持增强全方面优化服务和推理引擎的性能。
**[2025-07] 《FastDeploy2.0推理部署实测》专题活动已上线!** 完成文心4.5系列开源模型的推理部署等任务即可获得骨瓷马克杯等FastDeploy2.0官方周边及丰富奖金!🎁 欢迎大家体验反馈~ 📌[报名地址](https://www.wjx.top/vm/meSsp3L.aspx#) 📌[活动详情](https://github.com/PaddlePaddle/FastDeploy/discussions/2728)

View File

@@ -17,6 +17,7 @@
|ERNIE-4.5-300B-A47B-Base|BF16/WINT4/WINT8|✅|✅|✅|⛔|✅|128K|
|ERNIE-4.5-VL-424B-A47B|BF16/WINT4/WINT8|🚧|✅|🚧|⛔|🚧|128K|
|ERNIE-4.5-VL-28B-A3B|BF16/WINT4/WINT8|⛔|✅|🚧|⛔|🚧|128K|
|ERNIE-4.5-21B-A3B-Thinking|BF16/WINT4/WINT8/FP8|⛔|✅|✅|✅|✅|128K|
|ERNIE-4.5-21B-A3B|BF16/WINT4/WINT8/FP8|⛔|✅|✅|✅|✅|128K|
|ERNIE-4.5-21B-A3B-Base|BF16/WINT4/WINT8/FP8|⛔|✅|✅|⛔|✅|128K|
|ERNIE-4.5-0.3B|BF16/WINT8/FP8|⛔|✅|✅|⛔|✅|128K|

View File

@@ -33,7 +33,7 @@ These models accept text input.
|Models|DataType|Example HF Model|
|-|-|-|
|⭐ERNIE|BF16\WINT4\WINT8\W4A8C8\WINT2\FP8|baidu/ERNIE-4.5-VL-424B-A47B-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-Paddle<br>&emsp;[quick start](./get_started/ernie-4.5.md) &emsp; [best practice](./best_practices/ERNIE-4.5-300B-A47B-Paddle.md);<br>baidu/ERNIE-4.5-300B-A47B-2Bits-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-FP8-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-Base-Paddle;<br>[baidu/ERNIE-4.5-21B-A3B-Paddle](./best_practices/ERNIE-4.5-21B-A3B-Paddle.md);<br>baidu/ERNIE-4.5-21B-A3B-Base-Paddle;<br>baidu/ERNIE-4.5-0.3B-Paddle<br>&emsp;[quick start](./get_started/quick_start.md) &emsp; [best practice](./best_practices/ERNIE-4.5-0.3B-Paddle.md);<br>baidu/ERNIE-4.5-0.3B-Base-Paddle, etc.|
|⭐ERNIE|BF16\WINT4\WINT8\W4A8C8\WINT2\FP8|baidu/ERNIE-4.5-VL-424B-A47B-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-Paddle<br>&emsp;[quick start](./get_started/ernie-4.5.md) &emsp; [best practice](./best_practices/ERNIE-4.5-300B-A47B-Paddle.md);<br>baidu/ERNIE-4.5-300B-A47B-2Bits-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-FP8-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-Base-Paddle;<br>[baidu/ERNIE-4.5-21B-A3B-Paddle](./best_practices/ERNIE-4.5-21B-A3B-Paddle.md);<br>baidu/ERNIE-4.5-21B-A3B-Base-Paddle;<br>baidu/ERNIE-4.5-21B-A3B-Thinking;<br>baidu/ERNIE-4.5-0.3B-Paddle<br>&emsp;[quick start](./get_started/quick_start.md) &emsp; [best practice](./best_practices/ERNIE-4.5-0.3B-Paddle.md);<br>baidu/ERNIE-4.5-0.3B-Base-Paddle, etc.|
|⭐QWEN3-MOE|BF16/WINT4/WINT8/FP8|Qwen/Qwen3-235B-A22B;<br>Qwen/Qwen3-30B-A3B, etc.|
|⭐QWEN3|BF16/WINT8/FP8|Qwen/qwen3-32B;<br>Qwen/qwen3-14B;<br>Qwen/qwen3-8B;<br>Qwen/qwen3-4B;<br>Qwen/qwen3-1.7B;<br>[Qwen/qwen3-0.6B](./get_started/quick_start_qwen.md), etc.|
|⭐QWEN2.5|BF16/WINT8/FP8|Qwen/qwen2.5-72B;<br>Qwen/qwen2.5-32B;<br>Qwen/qwen2.5-14B;<br>Qwen/qwen2.5-7B;<br>Qwen/qwen2.5-3B;<br>Qwen/qwen2.5-1.5B;<br>Qwen/qwen2.5-0.5B, etc.|

View File

@@ -18,6 +18,7 @@
|ERNIE-4.5-VL-424B-A47B|BF16/WINT4/WINT8|🚧|✅|🚧|⛔|🚧|128K|
|ERNIE-4.5-VL-28B-A3B|BF16/WINT4/WINT8|⛔|✅|🚧|⛔|🚧|128K|
|ERNIE-4.5-21B-A3B|BF16/WINT4/WINT8/FP8|⛔|✅|✅|✅|✅|128K|
|ERNIE-4.5-21B-A3B-Thinking|BF16/WINT4/WINT8/FP8|⛔|✅|✅|✅|✅|128K|
|ERNIE-4.5-21B-A3B-Base|BF16/WINT4/WINT8/FP8|⛔|✅|✅|⛔|✅|128K|
|ERNIE-4.5-0.3B|BF16/WINT8/FP8|⛔|✅|✅|⛔|✅|128K|
|QWEN3-MOE|BF16/WINT4/WINT8/FP8|⛔|✅|✅|🚧|✅|128K|

View File

@@ -31,7 +31,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
|模型|DataType|模型案例|
|-|-|-|
|⭐ERNIE|BF16\WINT4\WINT8\W4A8C8\WINT2\FP8|baidu/ERNIE-4.5-VL-424B-A47B-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-Paddle<br>&emsp;[快速部署](./get_started/ernie-4.5.md) &emsp; [最佳实践](./best_practices/ERNIE-4.5-300B-A47B-Paddle.md);<br>baidu/ERNIE-4.5-300B-A47B-2Bits-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-FP8-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-Base-Paddle;<br>[baidu/ERNIE-4.5-21B-A3B-Paddle](./best_practices/ERNIE-4.5-21B-A3B-Paddle.md);<br>baidu/ERNIE-4.5-21B-A3B-Base-Paddle;<br>baidu/ERNIE-4.5-0.3B-Paddle<br>&emsp;[快速部署](./get_started/quick_start.md) &emsp; [最佳实践](./best_practices/ERNIE-4.5-0.3B-Paddle.md);<br>baidu/ERNIE-4.5-0.3B-Base-Paddle, etc.|
|⭐ERNIE|BF16\WINT4\WINT8\W4A8C8\WINT2\FP8|baidu/ERNIE-4.5-VL-424B-A47B-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-Paddle<br>&emsp;[快速部署](./get_started/ernie-4.5.md) &emsp; [最佳实践](./best_practices/ERNIE-4.5-300B-A47B-Paddle.md);<br>baidu/ERNIE-4.5-300B-A47B-2Bits-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-FP8-Paddle;<br>baidu/ERNIE-4.5-300B-A47B-Base-Paddle;<br>[baidu/ERNIE-4.5-21B-A3B-Paddle](./best_practices/ERNIE-4.5-21B-A3B-Paddle.md);<br>baidu/ERNIE-4.5-21B-A3B-Base-Paddle;<br>baidu/ERNIE-4.5-21B-A3B-Thinking;<br>baidu/ERNIE-4.5-0.3B-Paddle<br>&emsp;[快速部署](./get_started/quick_start.md) &emsp; [最佳实践](./best_practices/ERNIE-4.5-0.3B-Paddle.md);<br>baidu/ERNIE-4.5-0.3B-Base-Paddle, etc.|
|⭐QWEN3-MOE|BF16/WINT4/WINT8/FP8|Qwen/Qwen3-235B-A22B;<br>Qwen/Qwen3-30B-A3B, etc.|
|⭐QWEN3|BF16/WINT8/FP8|Qwen/qwen3-32B;<br>Qwen/qwen3-14B;<br>Qwen/qwen3-8B;<br>Qwen/qwen3-4B;<br>Qwen/qwen3-1.7B;<br>[Qwen/qwen3-0.6B](./get_started/quick_start_qwen.md), etc.|
|⭐QWEN2.5|BF16/WINT8/FP8|Qwen/qwen2.5-72B;<br>Qwen/qwen2.5-32B;<br>Qwen/qwen2.5-14B;<br>Qwen/qwen2.5-7B;<br>Qwen/qwen2.5-3B;<br>Qwen/qwen2.5-1.5B;<br>Qwen/qwen2.5-0.5B, etc.|