update doc (#3990)

Co-authored-by: root <root@tjdm-inf-sci-k8s-hzz2-h12ni8-0214.tjdm.baidu.com>
2025-12-24 13:28:13 +08:00 · 2025-09-08 21:04:26 +08:00
parent d00faeec69
commit 08b3153661
7 changed files with 12 additions and 50 deletions
--- a/README.md
+++ b/README.md
@@ -73,7 +73,7 @@ Learn how to use FastDeploy through our documentation:

 ## Supported Models

-Learn how to download models, enable support for Torch weights, calculate minimum resource requirements, and more:
+Learn how to download models, enable support for models in torch format, and more:
 - [Full Supported Models List](./docs/supported_models.md)

 ## Advanced Usage
--- a/README_CN.md
+++ b/README_CN.md
@@ -71,7 +71,7 @@ FastDeploy supports deployment on **NVIDIA GPU**, **Kunlunxin XPU

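 ## Supported Models
 
-Learn from our documentation how to download models, how to support Torch weights, how to calculate minimum deployment resources, and more:
+Learn from our documentation how to download models, how to support the torch format, and more:
 - [Supported Models List](./docs/zh/supported_models.md)
 
 ## Advanced Usage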
--- a/docs/get_started/installation/nvidia_gpu.md
+++ b/docs/get_started/installation/nvidia_gpu.md
@@ -13,14 +13,14 @@ The following installation methods are available when your environment meets the
 **Notice**: The pre-built image only supports SM80/90 GPUs (e.g., H800/A800). If you are deploying on SM86/89 GPUs (e.g., L40/4090/L20), reinstall ```fastdeploy-gpu``` after creating the container.

 ```shell
-docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.1.1
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.2.0
 ```

 ## 2. Pre-built Pip Installation

 First install paddlepaddle-gpu. For detailed instructions, refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/linux-pip_en.html)
 ```shell
-python -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
 ```

 Then install fastdeploy. **Do not install from PyPI**. Use the following methods instead:
@@ -58,7 +58,7 @@ docker build -f dockerfiles/Dockerfile.gpu -t fastdeploy:gpu .

 First install paddlepaddle-gpu. For detailed instructions, refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/linux-pip_en.html)
 ```shell
-python -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
 ```

 Then clone the source code and build:
--- a/docs/supported_models.md
+++ b/docs/supported_models.md
@@ -49,25 +49,4 @@ These models accept multi-modal inputs (e.g., images and text).
 | ERNIE-VL  |BF16/WINT4/WINT8| baidu/ERNIE-4.5-VL-424B-A47B-Paddle<br>&emsp;[quick start](./get_started/ernie-4.5-vl.md) &emsp; [best practice](./best_practices/ERNIE-4.5-VL-424B-A47B-Paddle.md) ;<br>baidu/ERNIE-4.5-VL-28B-A3B-Paddle<br>&emsp;[quick start](./get_started/quick_start_vl.md) &emsp; [best practice](./best_practices/ERNIE-4.5-VL-28B-A3B-Paddle.md) ;|
 | QWEN-VL  |BF16/WINT4/FP8| Qwen/Qwen2.5-VL-72B-Instruct;<br>Qwen/Qwen2.5-VL-32B-Instruct;<br>Qwen/Qwen2.5-VL-7B-Instruct;<br>Qwen/Qwen2.5-VL-3B-Instruct|

-## Minimum Resource Deployment Instruction
-
-There is no universal formula for minimum deployment resources; it depends on both context length and quantization method. We recommend estimating the required GPU memory using the following formula:
-```
-Required GPU Memory = Number of Parameters × Quantization Byte factor
-```
-> (The factor list is provided below.)
-
-And the final number of GPUs depends on:
-```
-Number of GPUs = Total Memory Requirement ÷ Memory per GPU
-```
-
-| Quantization Method | Bytes per Parameter factor |
-| :---      | :---      |
-|BF16       |2          |
-|FP8        |1          |
-|WINT8      |1          |
-|WINT4      |0.5        |
-|W4A8C8     |0.5        |
-
 More models are being supported. You can submit requests for new model support via [Github Issues](https://github.com/PaddlePaddle/FastDeploy/issues).
--- a/docs/zh/get_started/installation/nvidia_gpu.md
+++ b/docs/zh/get_started/installation/nvidia_gpu.md
@@ -15,7 +15,7 @@
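 **Notice**: The image below only supports SM 80/90 GPUs (e.g., A800/H800). If you are deploying on SM 86/89 GPUs such as L20/L40/4090, uninstall ```fastdeploy-gpu``` after creating the container, then reinstall the `fastdeploy-gpu` package for SM 86/89 as specified in the documentation below.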

 ``` shell
-docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.1.1
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.2.0
 ```

 ## 2. Pre-built Pip Installation
@@ -23,7 +23,7 @@ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12
 First install paddlepaddle-gpu. For detailed instructions, refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/linux-pip_en.html)

 ``` shell
-python -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
 ```

 Then install fastdeploy. **Do not install from PyPI**. Use the following methods instead:
@@ -64,7 +64,7 @@ docker build -f dockerfiles/Dockerfile.gpu -t fastdeploy:gpu .
 First install paddlepaddle-gpu. For detailed instructions, refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/)

 ``` shell
-python -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
 ```

 Then clone the source code and build:
--- a/docs/zh/get_started/quick_start_qwen.md
+++ b/docs/zh/get_started/quick_start_qwen.md
@@ -7,13 +7,12 @@
 - CUDNN >= 9.5
 - Linux X86_64
 - Python >= 3.10
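- The hardware meets the minimum requirements for running the model; see the [Supported Models documentation](supported_models.md)
 
 To enable quick deployment on a wide range of hardware, this document uses the ```Qwen3-0.6b``` model as an example, which can be deployed on most hardware.
 
 For how to install FastDeploy, refer to the [Installation Guide](./installation/README.md).
 ## 1. Launch the Service
-After installing FastDeploy, run the following command in a terminal to start the service. For launch command options, see [Parameters](parameters.md)
+After installing FastDeploy, run the following command in a terminal to start the service. For launch command options, see [Parameters](../parameters.md)
 
 > ⚠️ **Note:**
 > When using HuggingFace models (torch format), you need to enable `--load_choices "default_v1"`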
@@ -30,14 +29,14 @@ python -m fastdeploy.entrypoints.openai.api_server \
       --load_choices "default_v1"
 ```

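->💡 Note: If no subdirectory matching the path given by ```--model``` exists under the current directory, FastDeploy will look up a preset model on AIStudio by the specified model name (e.g., ```Qwen/Qwen3-0.6B```) and, if one exists, download it automatically. The default download path is ```~/xx```. For details on automatic model download and its configuration, see [Model Download](supported_models.md).
+>💡 Note: If no subdirectory matching the path given by ```--model``` exists under the current directory, FastDeploy will look up a preset model on AIStudio by the specified model name (e.g., ```Qwen/Qwen3-0.6B```) and, if one exists, download it automatically. The default download path is ```~/xx```. For details on automatic model download and its configuration, see [Model Download](../supported_models.md).
 ```--max-model-len``` is the maximum number of tokens supported by the deployed service.
 ```--max-num-seqs``` is the maximum number of sequences the deployed service can process concurrently.
 
 **Related Documents**
 
- [Service Deployment Configuration](online_serving/README.md)
- [Service Monitoring Metrics](online_serving/metrics.md)
+- [Service Deployment Configuration](../online_serving/README.md)
+- [Service Monitoring Metrics](../online_serving/metrics.md)
 
 ## 2. Sending Requests to the Service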

@@ -92,6 +91,3 @@ for chunk in response:
        print(chunk.choices[0].delta.content, end='')
 print('\n')
 ```
-📌
-⚙️
-✕
--- a/docs/zh/supported_models.md
+++ b/docs/zh/supported_models.md
@@ -47,17 +47,4 @@ python -m fastdeploy.entrypoints.openai.api_server \
 | ERNIE-VL  |BF16/WINT4/WINT8| baidu/ERNIE-4.5-VL-424B-A47B-Paddle<br>&emsp;[Quick Start](./get_started/ernie-4.5-vl.md) &emsp; [Best Practice](./best_practices/ERNIE-4.5-VL-424B-A47B-Paddle.md) ;<br>baidu/ERNIE-4.5-VL-28B-A3B-Paddle<br>&emsp;[Quick Start](./get_started/quick_start_vl.md) &emsp; [Best Practice](./best_practices/ERNIE-4.5-VL-28B-A3B-Paddle.md) ;|
 | QWEN-VL  |BF16/WINT4/FP8| Qwen/Qwen2.5-VL-72B-Instruct;<br>Qwen/Qwen2.5-VL-32B-Instruct;<br>Qwen/Qwen2.5-VL-7B-Instruct;<br>Qwen/Qwen2.5-VL-3B-Instruct|

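-## Minimum Resource Deployment Instructions
-
-There is no universal formula for minimum deployment resources; it depends on the context length and the quantization method.
-We recommend estimating required GPU memory = number of parameters × quantization byte factor (factors listed below); the final number of GPUs depends on total memory requirement ÷ memory per GPU.
-
-|Quantization Method |Bytes per Parameter Factor |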
-| :---      | :---      |
-|BF16       |2          |
-|FP8        |1          |
-|WINT8      |1          |
-|WINT4      |0.5        |
-|W4A8C8     |0.5        |
-
 More models are being supported. You can submit requests for new model support via [Github Issues](https://github.com/PaddlePaddle/FastDeploy/issues).