Update documentation (#3996)

yangjianfengo1
2025-09-09 10:23:51 +08:00
committed by GitHub
parent 934071578a
commit 14df2c59da
4 changed files with 65 additions and 64 deletions

README.md

@@ -26,6 +26,8 @@ English | [简体中文](README_CN.md)
 # FastDeploy : Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
 ## News
+**[2025-09] 🔥 FastDeploy v2.2 is newly released!** It now offers compatibility with models in the HuggingFace ecosystem, has further optimized performance, and newly adds support for [baidu/ERNIE-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)!
 **[2025-08] 🔥 Released FastDeploy v2.1:** A brand-new KV Cache scheduling strategy has been introduced, and expanded support for PD separation and CUDA Graph across more models. Enhanced hardware support has been added for platforms like Kunlun and Hygon, along with comprehensive optimizations to improve the performance of both the service and inference engine.
 **[2025-07] The FastDeploy 2.0 Inference Deployment Challenge is now live!** Complete the inference deployment task for the ERNIE 4.5 series open-source models to win official FastDeploy 2.0 merch and generous prizes! 🎁 You're welcome to try it out and share your feedback! 📌[Sign up here](https://www.wjx.top/vm/meSsp3L.aspx#) 📌[Event details](https://github.com/PaddlePaddle/FastDeploy/discussions/2728)
@@ -57,8 +59,9 @@ FastDeploy supports inference deployment on **NVIDIA GPUs**, **Kunlunxin XPUs**,
 - [Iluvatar GPU](./docs/get_started/installation/iluvatar_gpu.md)
 - [Enflame GCU](./docs/get_started/installation/Enflame_gcu.md)
 - [Hygon DCU](./docs/get_started/installation/hygon_dcu.md)
+- [MetaX GPU](./docs/get_started/installation/metax_gpu.md.md)
-**Note:** We are actively working on expanding hardware support. Additional hardware platforms including Ascend NPU and MetaX GPU are currently under development and testing. Stay tuned for updates!
+**Note:** We are actively working on expanding hardware support. Additional hardware platforms including Ascend NPU are currently under development and testing. Stay tuned for updates!
 ## Get Started
@@ -68,20 +71,12 @@ Learn how to use FastDeploy through our documentation:
 - [ERNIE-4.5-VL Multimodal Model Deployment](./docs/get_started/ernie-4.5-vl.md)
 - [Offline Inference Development](./docs/offline_inference.md)
 - [Online Service Deployment](./docs/online_serving/README.md)
-- [Full Supported Models List](./docs/supported_models.md)
 - [Best Practices](./docs/best_practices/README.md)
 ## Supported Models
-| Model | Data Type | PD Disaggregation | Chunked Prefill | Prefix Caching | MTP | CUDA Graph | Maximum Context Length |
-|:--- | :------- | :---------- | :-------- | :-------- | :----- | :----- | :----- |
-|ERNIE-4.5-300B-A47B | BF16/WINT4/WINT8/W4A8C8/WINT2/FP8 | ✅| ✅ | ✅|✅| ✅ |128K |
-|ERNIE-4.5-300B-A47B-Base| BF16/WINT4/WINT8 | ✅| ✅ | ✅|❌| ✅ | 128K |
-|ERNIE-4.5-VL-424B-A47B | BF16/WINT4/WINT8 | WIP | ✅ | WIP | ❌ | WIP |128K |
-|ERNIE-4.5-VL-28B-A3B | BF16/WINT4/WINT8 | ❌ | ✅ | WIP | ❌ | WIP |128K |
-|ERNIE-4.5-21B-A3B | BF16/WINT4/WINT8/FP8 | ❌ | ✅ | ✅ | ✅ | ✅|128K |
-|ERNIE-4.5-21B-A3B-Base | BF16/WINT4/WINT8/FP8 | ✅ | ✅ | ✅ | ❌ | ✅|128K |
-|ERNIE-4.5-0.3B | BF16/WINT8/FP8 | ✅ | ✅ | ✅ | ❌ | ✅| 128K |
+Learn how to download models, enable using the torch format, and more:
+- [Full Supported Models List](./docs/supported_models.md)
 ## Advanced Usage

README_CN.md

@@ -26,7 +26,9 @@
 # FastDeploy :基于飞桨的大语言模型与视觉语言模型推理部署工具包
 ## 最新活动
-**[2025-08] 🔥 FastDeploy v2.1 全新发布:** 全新的KV Cache调度策略,更多模型支持PD分离和CUDA Graph,昆仑、海光等更多硬件支持增强,全方面优化服务和推理引擎的性能。
+**[2025-09] 🔥 FastDeploy v2.2 全新发布**:HuggingFace生态模型兼容,性能进一步优化,更新增对[baidu/ERNIE-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)支持!
+**[2025-08] FastDeploy v2.1 发布**:全新的KV Cache调度策略,更多模型支持PD分离和CUDA Graph,昆仑、海光等更多硬件支持增强,全方面优化服务和推理引擎的性能。
 **[2025-07] 《FastDeploy2.0推理部署实测》专题活动已上线!** 完成文心4.5系列开源模型的推理部署等任务即可获得骨瓷马克杯等FastDeploy2.0官方周边及丰富奖金!🎁 欢迎大家体验反馈~ 📌[报名地址](https://www.wjx.top/vm/meSsp3L.aspx#) 📌[活动详情](https://github.com/PaddlePaddle/FastDeploy/discussions/2728)
@@ -55,8 +57,9 @@ FastDeploy 支持在**英伟达(NVIDIA)GPU**、**昆仑芯(Kunlunxin)XPU
 - [天数 CoreX](./docs/zh/get_started/installation/iluvatar_gpu.md)
 - [燧原 S60](./docs/zh/get_started/installation/Enflame_gcu.md)
 - [海光 DCU](./docs/zh/get_started/installation/hygon_dcu.md)
+- [沐曦 GPU](./docs/zh/get_started/installation/metax_gpu.md.md)
-**注意:** 我们正在积极拓展硬件支持范围。目前包括昇腾(Ascend)NPU 和 沐曦(MetaX)GPU 在内的其他硬件平台正在开发测试中。敬请关注更新!
+**注意:** 我们正在积极拓展硬件支持范围。目前包括昇腾(Ascend)NPU 在内的其他硬件平台正在开发测试中。敬请关注更新!
 ## 入门指南
@@ -66,20 +69,12 @@ FastDeploy 支持在**英伟达(NVIDIA)GPU**、**昆仑芯(Kunlunxin)XPU
 - [ERNIE-4.5-VL 部署](./docs/zh/get_started/ernie-4.5-vl.md)
 - [离线推理](./docs/zh/offline_inference.md)
 - [在线服务](./docs/zh/online_serving/README.md)
-- [模型支持列表](./docs/zh/supported_models.md)
 - [最佳实践](./docs/zh/best_practices/README.md)
 ## 支持模型列表
-| Model | Data Type | PD Disaggregation | Chunked Prefill | Prefix Caching | MTP | CUDA Graph | Maximum Context Length |
-|:--- | :------- | :---------- | :-------- | :-------- | :----- | :----- | :----- |
-|ERNIE-4.5-300B-A47B | BF16/WINT4/WINT8/W4A8C8/WINT2/FP8 | ✅| ✅ | ✅|✅| ✅ |128K |
-|ERNIE-4.5-300B-A47B-Base| BF16/WINT4/WINT8 | ✅| ✅ | ✅|❌| ✅ | 128K |
-|ERNIE-4.5-VL-424B-A47B | BF16/WINT4/WINT8 | WIP | ✅ | WIP | ❌ | WIP |128K |
-|ERNIE-4.5-VL-28B-A3B | BF16/WINT4/WINT8 | ❌ | ✅ | WIP | ❌ | WIP |128K |
-|ERNIE-4.5-21B-A3B | BF16/WINT4/WINT8/FP8 | ❌ | ✅ | ✅ | ✅ | ✅|128K |
-|ERNIE-4.5-21B-A3B-Base | BF16/WINT4/WINT8/FP8 | ✅ | ✅ | ✅ | ❌ | ✅|128K |
-|ERNIE-4.5-0.3B | BF16/WINT8/FP8 | ✅ | ✅ | ✅ | ❌ | ✅| 128K |
+通过我们的文档了解如何下载模型、如何支持torch格式等:
+- [模型支持列表](./docs/zh/supported_models.md)
 ## 进阶用法

Dockerfile

@@ -1,6 +1,6 @@
-FROM ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.1.0
-ARG PADDLE_VERSION=3.1.1
-ARG FD_VERSION=2.1.0
+FROM ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.2.0
+ARG PADDLE_VERSION=3.2.0
+ARG FD_VERSION=2.2.0
 ENV DEBIAN_FRONTEND=noninteractive
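Because the bumped versions are declared as `ARG`s, they can also be overridden at `docker build` time instead of editing the Dockerfile. A minimal sketch, assuming the v2.2 defaults above (the `fastdeploy:2.2.0` tag name is illustrative):

```shell
# Assemble the build command with the ARG overrides made explicit;
# echo it so the pinned versions are visible in one place.
PADDLE_VERSION=3.2.0
FD_VERSION=2.2.0
cmd="docker build --build-arg PADDLE_VERSION=${PADDLE_VERSION} --build-arg FD_VERSION=${FD_VERSION} -t fastdeploy:${FD_VERSION} ."
echo "${cmd}"
```

Running the echoed command would rebuild the image with the same versions this commit pins as defaults.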

mkdocs.yml

@@ -2,11 +2,13 @@ site_name: 'FastDeploy : Large Language Model Deployement'
 repo_url: https://github.com/PaddlePaddle/FastDeploy
 repo_name: FastDeploy
+copyright: Copyright © 2025 Maintained by FastDeploy
 theme:
   name: material
   highlightjs: true
-  icon:
-    repo: fontawesome/brands/github
+  favicon: assets/images/favicon.ico
+  logo: assets/images/logo.jpg
   palette:
     - media: "(prefers-color-scheme: light)" # 浅色
       scheme: default
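For context, the resulting `theme` section looks roughly like the sketch below, assuming the standard Material for MkDocs convention that `favicon` and `logo` paths resolve relative to the `docs` directory:

```yaml
theme:
  name: material
  favicon: assets/images/favicon.ico  # replaces the icon/repo pair removed above
  logo: assets/images/logo.jpg        # site logo shown in the header
```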
@@ -50,14 +52,17 @@ plugins:
       HYGON DCU: 海光 DCU
       Enflame S60: 燧原 S60
       Iluvatar CoreX: 天数 CoreX
+      Metax C550: 沐曦 C550
       Quick Deployment For ERNIE-4.5-0.3B: ERNIE-4.5-0.3B快速部署
       Quick Deployment for ERNIE-4.5-VL-28B-A3B: ERNIE-4.5-VL-28B-A3B快速部署
       ERNIE-4.5-300B-A47B: ERNIE-4.5-300B-A47B快速部署
       ERNIE-4.5-VL-424B-A47B: ERNIE-4.5-VL-424B-A47B快速部署
+      Quick Deployment For QWEN: Qwen3-0.6b快速部署
       Online Serving: 在线服务
       OpenAI-Compitable API Server: 兼容 OpenAI 协议的服务化部署
       Monitor Metrics: 监控Metrics
       Scheduler: 调度器
+      Graceful Shutdown: 服务优雅关闭
       Offline Inference: 离线推理
       Best Practices: 最佳实践
       ERNIE-4.5-0.3B: ERNIE-4.5-0.3B
@@ -83,6 +88,8 @@ plugins:
       Sampling: 采样策略
       MultiNode Deployment: 多机部署
       Graph Optimization: 图优化
+      Data Parallelism: 数据并行
+      PLAS: PLAS
       Supported Models: 支持模型列表
       Benchmark: 基准测试
       Usage: 用法
@@ -91,23 +98,26 @@ plugins:
       Environment Variables: 环境变量
 nav:
-- 'FastDeploy': index.md
-- 'Quick Start':
+- FastDeploy: index.md
+- Quick Start:
   - Installation:
-    - 'Nvidia GPU': get_started/installation/nvidia_gpu.md
-    - 'KunlunXin XPU': get_started/installation/kunlunxin_xpu.md
-    - 'HYGON DCU': get_started/installation/hygon_dcu.md
-    - 'Enflame S60': get_started/installation/Enflame_gcu.md
-    - 'Iluvatar CoreX': get_started/installation/iluvatar_gpu.md
-  - 'Quick Deployment For ERNIE-4.5-0.3B': get_started/quick_start.md
-  - 'Quick Deployment for ERNIE-4.5-VL-28B-A3B': get_started/quick_start_vl.md
-  - 'ERNIE-4.5-300B-A47B': get_started/ernie-4.5.md
-  - 'ERNIE-4.5-VL-424B-A47B': get_started/ernie-4.5-vl.md
-- 'Online Serving':
-  - 'OpenAI-Compitable API Server': online_serving/README.md
-  - 'Monitor Metrics': online_serving/metrics.md
-  - 'Scheduler': online_serving/scheduler.md
-- 'Offline Inference': offline_inference.md
+    - Nvidia GPU: get_started/installation/nvidia_gpu.md
+    - KunlunXin XPU: get_started/installation/kunlunxin_xpu.md
+    - HYGON DCU: get_started/installation/hygon_dcu.md
+    - Enflame S60: get_started/installation/Enflame_gcu.md
+    - Iluvatar CoreX: get_started/installation/iluvatar_gpu.md
+    - Metax C550: get_started/installation/metax_gpu.md
+  - Quick Deployment For ERNIE-4.5-0.3B: get_started/quick_start.md
+  - Quick Deployment for ERNIE-4.5-VL-28B-A3B: get_started/quick_start_vl.md
+  - ERNIE-4.5-300B-A47B: get_started/ernie-4.5.md
+  - ERNIE-4.5-VL-424B-A47B: get_started/ernie-4.5-vl.md
+  - Quick Deployment For QWEN: get_started/quick_start_qwen.md
+- Online Serving:
+  - OpenAI-Compitable API Server: online_serving/README.md
+  - Monitor Metrics: online_serving/metrics.md
+  - Scheduler: online_serving/scheduler.md
+  - Graceful Shutdown: online_serving/graceful_shutdown_service.md
+- Offline Inference: offline_inference.md
 - Best Practices:
   - ERNIE-4.5-0.3B: best_practices/ERNIE-4.5-0.3B-Paddle.md
   - ERNIE-4.5-21B-A3B: best_practices/ERNIE-4.5-21B-A3B-Paddle.md
@@ -116,26 +126,27 @@ nav:
   - ERNIE-4.5-VL-424B-A47B: best_practices/ERNIE-4.5-VL-424B-A47B-Paddle.md
   - FAQ: best_practices/FAQ.md
 - Quantization:
-  - 'Overview': quantization/README.md
-  - 'Online Quantization': quantization/online_quantization.md
-  - 'WINT2 Quantization': quantization/wint2.md
+  - Overview: quantization/README.md
+  - Online Quantization: quantization/online_quantization.md
+  - WINT2 Quantization: quantization/wint2.md
 - Features:
-  - 'Prefix Caching': features/prefix_caching.md
-  - 'Disaggregation': features/disaggregated.md
-  - 'Chunked Prefill': features/chunked_prefill.md
-  - 'Load Balance': features/load_balance.md
-  - 'Speculative Decoding': features/speculative_decoding.md
-  - 'Structured Outputs': features/structured_outputs.md
-  - 'Reasoning Output': features/reasoning_output.md
-  - 'Early Stop': features/early_stop.md
-  - 'Plugins': features/plugins.md
-  - 'Sampling': features/sampling.md
-  - 'MultiNode Deployment': features/multi-node_deployment.md
-  - 'Graph Optimization': features/graph_optimization.md
-- 'Supported Models': supported_models.md
+  - Prefix Caching: features/prefix_caching.md
+  - Disaggregation: features/disaggregated.md
+  - Chunked Prefill: features/chunked_prefill.md
+  - Load Balance: features/load_balance.md
+  - Speculative Decoding: features/speculative_decoding.md
+  - Structured Outputs: features/structured_outputs.md
+  - Reasoning Output: features/reasoning_output.md
+  - Early Stop: features/early_stop.md
+  - Plugins: features/plugins.md
+  - Sampling: features/sampling.md
+  - MultiNode Deployment: features/multi-node_deployment.md
+  - Graph Optimization: features/graph_optimization.md
+  - Data Parallelism: features/data_parallel_service.md
+  - PLAS: features/plas_attention.md
+- Supported Models: supported_models.md
 - Benchmark: benchmark.md
 - Usage:
-  - 'Log Description': usage/log.md
-  - 'Code Overview': usage/code_overview.md
-  - 'Environment Variables': usage/environment_variables.md
+  - Log Description: usage/log.md
+  - Code Overview: usage/code_overview.md
+  - Environment Variables: usage/environment_variables.md
+  - 'FastDeploy Unit Test Guide': usage/fastdeploy_unit_test_guide.md