diff --git a/README.md b/README.md
index f18ca3da9..0c20629ff 100644
--- a/README.md
+++ b/README.md
@@ -23,13 +23,11 @@ English | [简体中文](README_CN.md)

 --------------------------------------------------------------------------------
-# FastDeploy 2.1: Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
+# FastDeploy: Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

 ## News

 **[2025-08] 🔥 Released FastDeploy v2.1:** A brand-new KV Cache scheduling strategy has been introduced, and support for PD separation and CUDA Graph has been expanded across more models. Enhanced hardware support has been added for platforms like Kunlun and Hygon, along with comprehensive optimizations to improve the performance of both the service and the inference engine.

-**[2025-07] 《FastDeploy2.0推理部署实测》专题活动已上线!** 完成文心4.5系列开源模型的推理部署等任务,即可获得骨瓷马克杯等FastDeploy2.0官方周边及丰富奖金!🎁 欢迎大家体验反馈~ 📌[报名地址](https://www.wjx.top/vm/meSsp3L.aspx#) 📌[活动详情](https://github.com/PaddlePaddle/FastDeploy/discussions/2728)
-
 **[2025-07] The FastDeploy 2.0 Inference Deployment Challenge is now live!** Complete the inference deployment tasks for the ERNIE 4.5 series open-source models to win official FastDeploy 2.0 merch and generous prizes! 🎁 You're welcome to try it out and share your feedback! 📌[Sign up here](https://www.wjx.top/vm/meSsp3L.aspx#) 📌[Event details](https://github.com/PaddlePaddle/FastDeploy/discussions/2728)

 **[2025-06] 🔥 Released FastDeploy v2.0:** Supports inference and deployment for ERNIE 4.5. Furthermore, we open-source an industrial-grade PD disaggregation solution with context caching and dynamic role switching for effective resource utilization, further enhancing inference performance for MoE models.
@@ -52,14 +50,15 @@ English | [简体中文](README_CN.md)
 ## Installation

-FastDeploy supports inference deployment on **NVIDIA GPUs**, **Kunlunxin XPUs**, **Iluvatar GPUs**, **Enflame GCUs**, and other hardware. For detailed installation instructions:
+FastDeploy supports inference deployment on **NVIDIA GPUs**, **Kunlunxin XPUs**, **Iluvatar GPUs**, **Enflame GCUs**, **Hygon DCUs**, and other hardware. For detailed installation instructions:

 - [NVIDIA GPU](./docs/get_started/installation/nvidia_gpu.md)
 - [Kunlunxin XPU](./docs/get_started/installation/kunlunxin_xpu.md)
 - [Iluvatar GPU](./docs/get_started/installation/iluvatar_gpu.md)
 - [Enflame GCU](./docs/get_started/installation/Enflame_gcu.md)
+- [Hygon DCU](./docs/get_started/installation/hygon_dcu.md)

-**Note:** We are actively working on expanding hardware support. Additional hardware platforms including Ascend NPU, Hygon DCU, and MetaX GPU are currently under development and testing. Stay tuned for updates!
+**Note:** We are actively working on expanding hardware support. Additional hardware platforms including Ascend NPU and MetaX GPU are currently under development and testing. Stay tuned for updates!

 ## Get Started
diff --git a/README_CN.md b/README_CN.md
index 7afca5fb8..6cebc527a 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -23,7 +23,7 @@

 --------------------------------------------------------------------------------
-# FastDeploy 2.1:基于飞桨的大语言模型与视觉语言模型推理部署工具包
+# FastDeploy:基于飞桨的大语言模型与视觉语言模型推理部署工具包

 ## 最新活动

 **[2025-08] 🔥 FastDeploy v2.1 全新发布:** 全新的KV Cache调度策略,更多模型支持PD分离和CUDA Graph,昆仑、海光等更多硬件支持增强,全方面优化服务和推理引擎的性能。
@@ -48,14 +48,15 @@
 ## 安装

-FastDeploy 支持在**英伟达(NVIDIA)GPU**、**昆仑芯(Kunlunxin)XPU**、**天数(Iluvatar)GPU**、**燧原(Enflame)GCU** 以及其他硬件上进行推理部署。详细安装说明如下:
+FastDeploy 支持在**英伟达(NVIDIA)GPU**、**昆仑芯(Kunlunxin)XPU**、**天数(Iluvatar)GPU**、**燧原(Enflame)GCU**、**海光(Hygon)DCU** 以及其他硬件上进行推理部署。详细安装说明如下:

 - [英伟达 GPU](./docs/zh/get_started/installation/nvidia_gpu.md)
 - [昆仑芯 XPU](./docs/zh/get_started/installation/kunlunxin_xpu.md)
 - [天数 CoreX](./docs/zh/get_started/installation/iluvatar_gpu.md)
 - [燧原 S60](./docs/zh/get_started/installation/Enflame_gcu.md)
+- [海光 DCU](./docs/zh/get_started/installation/hygon_dcu.md)

-**注意:** 我们正在积极拓展硬件支持范围。目前,包括昇腾(Ascend)NPU、海光(Hygon)DCU 和摩尔线程(MetaX)GPU 在内的其他硬件平台正在开发测试中。敬请关注更新!
+**注意:** 我们正在积极拓展硬件支持范围。目前,包括昇腾(Ascend)NPU 和沐曦(MetaX)GPU 在内的其他硬件平台正在开发测试中。敬请关注更新!

 ## 入门指南
diff --git a/dockerfiles/Dockerfile.gpu b/dockerfiles/Dockerfile.gpu
index 9e1d97834..6a31156ff 100644
--- a/dockerfiles/Dockerfile.gpu
+++ b/dockerfiles/Dockerfile.gpu
@@ -1,4 +1,4 @@
-FROM ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.0.0
+FROM ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.1.0

 ARG PADDLE_VERSION=3.1.1
 ARG FD_VERSION=2.1.0
diff --git a/docs/get_started/installation/nvidia_gpu.md b/docs/get_started/installation/nvidia_gpu.md
index 8cf69d5d9..381a2de0a 100644
--- a/docs/get_started/installation/nvidia_gpu.md
+++ b/docs/get_started/installation/nvidia_gpu.md
@@ -13,7 +13,7 @@ The following installation methods are available when your environment meets the
 **Notice**: The pre-built image only supports SM80/90 GPUs (e.g. H800/A800); if you are deploying on SM86/89 GPUs (L40/4090/L20), please reinstall ```fastdeploy-gpu``` after you create the container.
 ```shell
-docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.0.0
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.1.0
 ```

 ## 2. Pre-built Pip Installation
diff --git a/docs/index.md b/docs/index.md
index 1149811ac..b1e3c336f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -13,12 +13,12 @@

 | Model | Data Type | PD Disaggregation | Chunked Prefill | Prefix Caching | MTP | CUDA Graph | Maximum Context Length |
 |:--- | :------- | :---------- | :-------- | :-------- | :----- | :----- | :----- |
-|ERNIE-4.5-300B-A47B | BF16/WINT4/WINT8/W4A8C8/WINT2/FP8 | ✅| ✅ | ✅|✅(WINT4)| WIP |128K |
-|ERNIE-4.5-300B-A47B-Base| BF16/WINT4/WINT8 | ✅| ✅ | ✅|✅(WINT4)| WIP | 128K |
+|ERNIE-4.5-300B-A47B | BF16/WINT4/WINT8/W4A8C8/WINT2/FP8 | ✅| ✅ | ✅|✅| WIP |128K |
+|ERNIE-4.5-300B-A47B-Base| BF16/WINT4/WINT8 | ✅| ✅ | ✅|❌| WIP | 128K |
 |ERNIE-4.5-VL-424B-A47B | BF16/WINT4/WINT8 | WIP | ✅ | WIP | ❌ | WIP |128K |
 |ERNIE-4.5-VL-28B-A3B | BF16/WINT4/WINT8 | ❌ | ✅ | WIP | ❌ | WIP |128K |
-|ERNIE-4.5-21B-A3B | BF16/WINT4/WINT8/FP8 | ❌ | ✅ | ✅ | WIP | ✅|128K |
-|ERNIE-4.5-21B-A3B-Base | BF16/WINT4/WINT8/FP8 | ❌ | ✅ | ✅ | WIP | ✅|128K |
+|ERNIE-4.5-21B-A3B | BF16/WINT4/WINT8/FP8 | ❌ | ✅ | ✅ | ✅ | ✅|128K |
+|ERNIE-4.5-21B-A3B-Base | BF16/WINT4/WINT8/FP8 | ❌ | ✅ | ✅ | ❌ | ✅|128K |
 |ERNIE-4.5-0.3B | BF16/WINT8/FP8 | ❌ | ✅ | ✅ | ❌ | ✅| 128K |

 ## Documentation
diff --git a/docs/zh/get_started/installation/nvidia_gpu.md b/docs/zh/get_started/installation/nvidia_gpu.md
index 5744a1892..a370a4589 100644
--- a/docs/zh/get_started/installation/nvidia_gpu.md
+++ b/docs/zh/get_started/installation/nvidia_gpu.md
@@ -15,7 +15,7 @@
 **注意**: 如下镜像仅支持SM 80/90架构GPU(A800/H800等),如果你是在L20/L40/4090等SM 86/89架构的GPU上部署,请在创建容器后,卸载```fastdeploy-gpu```再重新安装如下文档指定支持86/89架构的`fastdeploy-gpu`包。

 ``` shell
-docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.0.0
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.1.0
 ```

 ## 2. 预编译Pip安装
diff --git a/docs/zh/index.md b/docs/zh/index.md
index 312b3aed9..73bf10fa9 100644
--- a/docs/zh/index.md
+++ b/docs/zh/index.md
@@ -13,12 +13,12 @@

 | Model | Data Type | PD Disaggregation | Chunked Prefill | Prefix Caching | MTP | CUDA Graph | Maximum Context Length |
 |:--- | :------- | :---------- | :-------- | :-------- | :----- | :----- | :----- |
-|ERNIE-4.5-300B-A47B | BF16/WINT4/WINT8/W4A8C8/WINT2/FP8 | ✅| ✅ | ✅|✅(WINT4)| WIP |128K |
-|ERNIE-4.5-300B-A47B-Base| BF16/WINT4/WINT8 | ✅| ✅ | ✅|✅(WINT4)| WIP | 128K |
+|ERNIE-4.5-300B-A47B | BF16/WINT4/WINT8/W4A8C8/WINT2/FP8 | ✅| ✅ | ✅|✅| WIP |128K |
+|ERNIE-4.5-300B-A47B-Base| BF16/WINT4/WINT8 | ✅| ✅ | ✅|❌| WIP | 128K |
 |ERNIE-4.5-VL-424B-A47B | BF16/WINT4/WINT8 | WIP | ✅ | WIP | ❌ | WIP |128K |
 |ERNIE-4.5-VL-28B-A3B | BF16/WINT4/WINT8 | ❌ | ✅ | WIP | ❌ | WIP |128K |
-|ERNIE-4.5-21B-A3B | BF16/WINT4/WINT8/FP8 | ❌ | ✅ | ✅ | WIP | ✅|128K |
-|ERNIE-4.5-21B-A3B-Base | BF16/WINT4/WINT8/FP8 | ❌ | ✅ | ✅ | WIP | ✅|128K |
+|ERNIE-4.5-21B-A3B | BF16/WINT4/WINT8/FP8 | ❌ | ✅ | ✅ | ✅ | ✅|128K |
+|ERNIE-4.5-21B-A3B-Base | BF16/WINT4/WINT8/FP8 | ❌ | ✅ | ✅ | ❌ | ✅|128K |
 |ERNIE-4.5-0.3B | BF16/WINT8/FP8 | ❌ | ✅ | ✅ | ❌ | ✅| 128K |

 ## 文档说明
diff --git a/mkdocs.yml b/mkdocs.yml
index ce24a3ff5..443659f6d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,4 +1,4 @@
-site_name: 'FastDeploy 2.0: Large Language Model Deployement'
+site_name: 'FastDeploy: Large Language Model Deployment'
 repo_url: https://github.com/PaddlePaddle/FastDeploy
 repo_name: FastDeploy
@@ -34,14 +34,14 @@ plugins:
   - locale: en
     default: true
     name: English
-    site_name: 'FastDeploy 2.0: Large Language Model Deployement'
+    site_name: 'FastDeploy: Large Language Model Deployment'
     build: true
   - locale: zh
     name: 简体中文
     site_name: 飞桨大语言模型推理部署工具包
-    link: /zh/
+    link: /./zh/
     nav_translations:
-      FastDeploy 2.0: FastDeploy 2.0
+      FastDeploy: FastDeploy
       Quick Start: 快速入门
       Installation: 安装
       Nvidia GPU: 英伟达 GPU
@@ -58,7 +58,7 @@ plugins:
       Monitor Metrics: 监控Metrics
       Scheduler: 调度器
       Offline Inference: 离线推理
-      Optimal Deployment: 最佳实践
+      Best Practices: 最佳实践
       ERNIE-4.5-0.3B: ERNIE-4.5-0.3B
       ERNIE-4.5-21B-A3B: ERNIE-4.5-21B-A3B
       ERNIE-4.5-300B-A47B: ERNIE-4.5-300B-A47B
@@ -89,7 +89,7 @@ plugins:
       Environment Variables: 环境变量

 nav:
-  - 'FastDeploy 2.0': index.md
+  - 'FastDeploy': index.md
   - 'Quick Start':
     - Installation:
       - 'Nvidia GPU': get_started/installation/nvidia_gpu.md
@@ -106,7 +106,7 @@ nav:
       - 'Monitor Metrics': online_serving/metrics.md
       - 'Scheduler': online_serving/scheduler.md
       - 'Offline Inference': offline_inference.md
-      - Optimal Deployment:
+      - Best Practices:
         - ERNIE-4.5-0.3B: best_practices/ERNIE-4.5-0.3B-Paddle.md
         - ERNIE-4.5-21B-A3B: best_practices/ERNIE-4.5-21B-A3B-Paddle.md
         - ERNIE-4.5-300B-A47B: best_practices/ERNIE-4.5-300B-A47B-Paddle.md
@@ -135,4 +135,3 @@ nav:
   - 'Log Description': usage/log.md
   - 'Code Overview': usage/code_overview.md
   - 'Environment Variables': usage/environment_variables.md
-
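
As a quick smoke test of the retag in this patch, the updated image can be pulled by script. The image URI comes from the diff above; the guarded pull and the commented-out SM86/89 reinstall step are illustrative assumptions based on the docs' notice, not commands taken from this patch:

```shell
# Image tag bumped by this patch (2.0.0 -> 2.1.0).
IMAGE=ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.1.0

# Pull only when docker is present; registry access is assumed.
if command -v docker >/dev/null 2>&1; then
    docker pull "$IMAGE" || echo "pull failed: registry access required"
fi

# Per the docs' notice, on SM86/89 GPUs (L20/L40/4090) the fastdeploy-gpu
# package must be reinstalled inside the container (hypothetical sketch):
#   docker run --gpus all -it "$IMAGE" /bin/bash
#   pip uninstall -y fastdeploy-gpu && pip install fastdeploy-gpu
```

Pinning the tag in a variable like this keeps the Dockerfile, README, and docs in sync when the next version bump lands.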