diff --git a/.new_docs/cn/quantize.md b/.new_docs/cn/quantize.md
index eb626c6e6..5411cac29 100644
--- a/.new_docs/cn/quantize.md
+++ b/.new_docs/cn/quantize.md
@@ -1,11 +1,67 @@
 [English](../en/quantize.md) | 简体中文
 # 量化加速
+量化是一种流行的模型压缩方法,量化后的模型拥有更小的体积和更快的推理速度.
+FastDeploy基于PaddleSlim, 集成了一键模型量化的工具, 同时, FastDeploy支持部署量化后的模型, 帮助用户实现推理加速.
-简要介绍量化加速的原理。
-目前量化支持在哪些硬件及后端的使用
+## FastDeploy 多个引擎和硬件支持量化模型部署
+当前, FastDeploy中多个推理后端可以在不同硬件上支持量化模型的部署. 支持情况如下:
+
+| 硬件/推理后端 | ONNX Runtime | Paddle Inference | TensorRT |
+| :-----------| :-------- | :--------------- | :------- |
+| CPU | 支持 | 支持 |  |
+| GPU |  |  | 支持 |
+
+
+## 模型量化
+
+### 量化方法
+基于PaddleSlim, 目前FastDeploy提供的量化方法有量化蒸馏训练和离线量化, 量化蒸馏训练通过模型训练来获得量化模型, 离线量化不需要模型训练即可完成模型的量化. FastDeploy 对两种方式产出的量化模型均能部署.
+
+两种方法的主要对比如下表所示:
+| 量化方法 | 量化过程耗时 | 量化模型精度 | 模型体积 | 推理速度 |
+| :-----------| :--------| :-------| :------- | :------- |
+| 离线量化 | 无需训练,耗时短 | 比量化蒸馏训练稍低 | 两者一致 | 两者一致 |
+| 量化蒸馏训练 | 需要训练,耗时稍高 | 较未量化模型有少量损失 | 两者一致 | 两者一致 |
+
+### 用户使用FastDeploy一键模型量化工具来量化模型
+FastDeploy基于PaddleSlim, 为用户提供了一键模型量化的工具,请参考如下文档进行模型量化.
+- [FastDeploy 一键模型量化](../../tools/quantization/)
+当用户获得产出的量化模型之后,即可以使用FastDeploy来部署量化模型.
+
 ## 量化示例
+目前, FastDeploy已支持部署的量化模型如下表所示:
-这里一个表格,展示目前支持的量化列表(跳转到相应的example下去),精度、性能
+
+### YOLO 系列
+| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP | 量化方式 |
+| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |
+| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | TensorRT | GPU | 14.13 | 11.22 | 1.26 | 37.6 | 36.6 | 量化蒸馏训练 |
+| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | ONNX Runtime | CPU | 183.68 | 100.39 | 1.83 | 37.6 | 33.1 | 量化蒸馏训练 |
+| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | Paddle Inference | CPU | 226.36 | 152.27 | 1.48 | 37.6 | 36.8 | 量化蒸馏训练 |
+| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | TensorRT | GPU | 12.89 | 8.92 | 1.45 | 42.5 | 40.6 | 量化蒸馏训练 |
+| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | ONNX Runtime | CPU | 345.85 | 131.81 | 2.60 | 42.5 | 36.1 | 量化蒸馏训练 |
+| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | Paddle Inference | CPU | 366.41 | 131.70 | 2.78 | 42.5 | 41.2 | 量化蒸馏训练 |
+| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | TensorRT | GPU | 30.43 | 15.40 | 1.98 | 51.1 | 50.8 | 量化蒸馏训练 |
+| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | ONNX Runtime | CPU | 971.27 | 471.88 | 2.06 | 51.1 | 42.5 | 量化蒸馏训练 |
+| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | Paddle Inference | CPU | 1015.70 | 562.41 | 1.82 | 51.1 | 46.3 | 量化蒸馏训练 |
+
+上表中的数据, 为模型量化前后, 在FastDeploy部署的端到端推理性能.
+- 测试数据为COCO2017验证集中的图片.
+- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒.
+- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1.
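
上表中各模型的完整部署代码见对应链接的示例目录。作为补充,下面给出一个简化的 Python 部署示意(以 YOLOv5s 量化模型为例;模型目录名 `yolov5s_quant` 与测试图片路径均为示例假设,实际用法请以各示例目录下的 `infer.py` 为准),展示如何按照上文的支持情况为量化模型选择推理后端:

```python
import cv2
import fastdeploy as fd
from fastdeploy import ModelFormat

# 按部署硬件选择推理后端: GPU 上使用 TensorRT, CPU 上使用 ONNX Runtime 或 Paddle Inference
option = fd.RuntimeOption()
use_gpu = False
if use_gpu:
    option.use_gpu(0)
    option.use_trt_backend()
else:
    option.use_ort_backend()  # CPU 上也可改用 option.use_paddle_backend()

# 量化模型为 Paddle 格式, 目录名 yolov5s_quant 仅为示例
model = fd.vision.detection.YOLOv5(
    "yolov5s_quant/model.pdmodel",
    "yolov5s_quant/model.pdiparams",
    runtime_option=option,
    model_format=ModelFormat.PADDLE)

im = cv2.imread("000000014439.jpg")
print(model.predict(im))
```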
+
+
+### PaddleClas系列
+| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 Top1 | INT8 Top1 |量化方式 |
+| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |
+| [ResNet50_vd](../../examples/vision/classification/paddleclas/quantize/) | ONNX Runtime | CPU | 86.87 | 59.32 | 1.46 | 79.12 | 78.87 | 离线量化 |
+| [ResNet50_vd](../../examples/vision/classification/paddleclas/quantize/) | TensorRT | GPU | 7.85 | 5.42 | 1.45 | 79.12 | 79.06 | 离线量化 |
+| [MobileNetV1_ssld](../../examples/vision/classification/paddleclas/quantize/) | ONNX Runtime | CPU | 40.32 | 16.87 | 2.39 | 77.89 | 75.09 | 离线量化 |
+| [MobileNetV1_ssld](../../examples/vision/classification/paddleclas/quantize/) | TensorRT | GPU | 5.10 | 3.35 | 1.52 | 77.89 | 76.86 | 离线量化 |
+
+上表中的数据, 为模型量化前后, 在FastDeploy部署的端到端推理性能.
+- 测试数据为ImageNet-2012验证集中的图片.
+- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒.
+- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1.
diff --git a/examples/vision/classification/paddleclas/quantize/README.md b/examples/vision/classification/paddleclas/quantize/README.md
new file mode 100644
index 000000000..3a100a823
--- /dev/null
+++ b/examples/vision/classification/paddleclas/quantize/README.md
@@ -0,0 +1,27 @@
+# PaddleClas 量化模型部署
+FastDeploy已支持部署量化模型,并提供一键模型量化的工具.
+用户可以使用一键模型量化工具,自行对模型量化后部署, 也可以直接下载FastDeploy提供的量化模型进行部署.
+
+## FastDeploy一键模型量化工具
+FastDeploy 提供了一键量化工具, 能够简单地通过输入一个配置文件, 对模型进行量化.
+详细教程请见: [一键模型量化工具](../../../../../tools/quantization/)
+注意: 推理量化后的分类模型仍然需要FP32模型文件夹下的inference_cls.yaml文件, 自行量化的模型文件夹内不包含此yaml文件, 用户从FP32模型文件夹下复制此yaml文件到量化后的模型文件夹内即可.
+
+## 下载量化完成的PaddleClas模型
+用户也可以直接下载下表中的量化模型进行部署.
+| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 Top1 | INT8 Top1 |量化方式 |
+| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |
+| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | ONNX Runtime | CPU | 86.87 | 59.32 | 1.46 | 79.12 | 78.87 | 离线量化 |
+| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | TensorRT | GPU | 7.85 | 5.42 | 1.45 | 79.12 | 79.06 | 离线量化 |
+| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | ONNX Runtime | CPU | 40.32 | 16.87 | 2.39 | 77.89 | 75.09 | 离线量化 |
+| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | TensorRT | GPU | 5.10 | 3.35 | 1.52 | 77.89 | 76.86 | 离线量化 |
+
+上表中的数据, 为模型量化前后, 在FastDeploy部署的端到端推理性能.
+- 测试图片为ImageNet-2012验证集中的图片.
+- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒.
+- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1.
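
在查阅下方的详细部署文档之前,这里先给出一个简化的 Python 部署示意(以解压后的 `resnet50_vd_ptq` 量化模型目录为例,测试图片路径为假设的示例,完整可运行代码请参考下方的 Python 部署文档):

```python
import cv2
import fastdeploy as fd

# 在 CPU 上使用 ONNX Runtime 推理量化模型;
# 若在 GPU 上使用 TensorRT, 可改为 option.use_gpu(0)、option.use_trt_backend(),
# 并通过 option.set_trt_input_shape("inputs", min_shape=[1, 3, 224, 224]) 设置输入 shape
option = fd.RuntimeOption()
option.use_ort_backend()

model_dir = "resnet50_vd_ptq"  # 量化模型目录, 此处仅为示例
model = fd.vision.classification.PaddleClasModel(
    model_dir + "/inference.pdmodel",
    model_dir + "/inference.pdiparams",
    model_dir + "/inference_cls.yaml",  # 量化模型仍需 FP32 模型目录下的此配置文件
    runtime_option=option)

im = cv2.imread("ILSVRC2012_val_00000010.jpeg")
print(model.predict(im))
```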
+
+## 详细部署文档
+
+- [Python部署](python)
+- [C++部署](cpp)
diff --git a/examples/vision/classification/paddleclas/quantize/cpp/CMakeLists.txt b/examples/vision/classification/paddleclas/quantize/cpp/CMakeLists.txt
new file mode 100644
index 000000000..fea1a2888
--- /dev/null
+++ b/examples/vision/classification/paddleclas/quantize/cpp/CMakeLists.txt
@@ -0,0 +1,14 @@
+PROJECT(infer_demo C CXX)
+CMAKE_MINIMUM_REQUIRED (VERSION 3.12)
+
+# 指定下载解压后的fastdeploy库路径
+option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")
+
+include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)
+
+# 添加FastDeploy依赖头文件
+include_directories(${FASTDEPLOY_INCS})
+
+add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc)
+# 添加FastDeploy库依赖
+target_link_libraries(infer_demo ${FASTDEPLOY_LIBS})
diff --git a/examples/vision/classification/paddleclas/quantize/cpp/README.md b/examples/vision/classification/paddleclas/quantize/cpp/README.md
new file mode 100644
index 000000000..2c6c9b73e
--- /dev/null
+++ b/examples/vision/classification/paddleclas/quantize/cpp/README.md
@@ -0,0 +1,33 @@
+# PaddleClas 量化模型 C++部署示例
+本目录下提供的`infer.cc`,可以帮助用户快速完成PaddleClas量化模型在CPU/GPU上的部署推理加速.
+
+## 部署准备
+### FastDeploy环境准备
+- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md)
+- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start)
+
+### 量化模型准备
+- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署.
+- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署.(注意: 推理量化后的分类模型仍然需要FP32模型文件夹下的inference_cls.yaml文件, 自行量化的模型文件夹内不包含此yaml文件, 用户从FP32模型文件夹下复制此yaml文件到量化后的模型文件夹内即可.)
+
+## 以量化后的ResNet50_Vd模型为例, 进行部署
+在本目录执行如下命令即可完成编译,以及量化模型部署.
+```bash
+mkdir build
+cd build
+wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz
+tar xvf fastdeploy-linux-x64-0.2.0.tgz
+cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0
+make -j
+
+#下载FastDeploy提供的ResNet50_Vd量化模型文件和测试图片
+wget https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar
+tar -xvf resnet50_vd_ptq.tar
+wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg
+
+
+# 在CPU上使用ONNX Runtime推理量化模型
+./infer_demo resnet50_vd_ptq ILSVRC2012_val_00000010.jpeg 0
+# 在GPU上使用TensorRT推理量化模型
+./infer_demo resnet50_vd_ptq ILSVRC2012_val_00000010.jpeg 1
+```
diff --git a/examples/vision/classification/paddleclas/quantize/cpp/infer.cc b/examples/vision/classification/paddleclas/quantize/cpp/infer.cc
new file mode 100644
index 000000000..ed4f05a24
--- /dev/null
+++ b/examples/vision/classification/paddleclas/quantize/cpp/infer.cc
@@ -0,0 +1,76 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+ +#include "fastdeploy/vision.h" +#ifdef WIN32 +const char sep = '\\'; +#else +const char sep = '/'; +#endif + +void InitAndInfer(const std::string& model_dir, const std::string& image_file, + const fastdeploy::RuntimeOption& option) { + auto model_file = model_dir + sep + "inference.pdmodel"; + auto params_file = model_dir + sep + "inference.pdiparams"; + auto config_file = model_dir + sep + "inference_cls.yaml"; + + auto model = fastdeploy::vision::classification::PaddleClasModel( + model_file, params_file, config_file, option); + + assert(model.Initialized()); + + auto im = cv::imread(image_file); + auto im_bak = im.clone(); + + fastdeploy::vision::ClassifyResult res; + if (!model.Predict(&im, &res)) { + std::cerr << "Failed to predict." << std::endl; + return; + } + + std::cout << res.Str() << std::endl; + +} + +int main(int argc, char* argv[]) { + if (argc < 4) { + std::cout << "Usage: infer_demo path/to/quant_model " + "path/to/image " + "run_option, " + "e.g ./infer_demo ./ResNet50_vd_quant ./test.jpeg 0" + << std::endl; + std::cout << "The data type of run_option is int, 0: run on cpu with ORT " + "backend; 1: run " + "on gpu with TensorRT backend. " + << std::endl; + return -1; + } + + fastdeploy::RuntimeOption option; + int flag = std::atoi(argv[3]); + + if (flag == 0) { + option.UseCpu(); + option.UseOrtBackend(); + } else if (flag == 1) { + option.UseGpu(); + option.UseTrtBackend(); + option.SetTrtInputShape("inputs",{1, 3, 224, 224}); + } + + std::string model_dir = argv[1]; + std::string test_image = argv[2]; + InitAndInfer(model_dir, test_image, option); + return 0; +} diff --git a/examples/vision/classification/paddleclas/quantize/python/README.md b/examples/vision/classification/paddleclas/quantize/python/README.md new file mode 100644 index 000000000..88730f5df --- /dev/null +++ b/examples/vision/classification/paddleclas/quantize/python/README.md @@ -0,0 +1,29 @@ +# PaddleClas 量化模型 Python部署示例 +本目录下提供的`infer.py`,可以帮助用户快速完成PaddleClas量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署.(注意: 推理量化后的分类模型仍然需要FP32模型文件夹下的inference_cls.yaml文件, 自行量化的模型文件夹内不包含此yaml文件, 用户从FP32模型文件夹下复制此yaml文件到量化后的模型文件夹内即可.) 
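
对于上面提到的复制 inference_cls.yaml 的步骤,可以参考如下示意(其中 FP32 模型目录 `ResNet50_vd_infer` 与自行量化产出的目录 `resnet50_vd_quant_model` 均为假设的示例路径,请替换为实际路径):

```python
import shutil

# 将 FP32 模型目录下的预处理配置文件复制到自行量化产出的模型目录中
shutil.copy("ResNet50_vd_infer/inference_cls.yaml",
            "resnet50_vd_quant_model/inference_cls.yaml")
```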
+ + +## 以量化后的ResNet50_Vd模型为例, 进行部署 +```bash +#下载部署示例代码 +git clone https://github.com/PaddlePaddle/FastDeploy.git +cd examples/vision/classification/paddleclas/quantize/python + +#下载FastDeloy提供的ResNet50_Vd量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar +tar -xvf resnet50_vd_ptq.tar +wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg + +# 在CPU上使用Paddle-Inference推理量化模型 +python infer.py --model resnet50_vd_ptq --image ILSVRC2012_val_00000010.jpeg --device cpu --backend ort +# 在GPU上使用TensorRT推理量化模型 +python infer.py --model resnet50_vd_ptq --image ILSVRC2012_val_00000010.jpeg --device gpu --backend trt +``` diff --git a/examples/vision/classification/paddleclas/quantize/python/infer.py b/examples/vision/classification/paddleclas/quantize/python/infer.py new file mode 100644 index 000000000..0a4df1768 --- /dev/null +++ b/examples/vision/classification/paddleclas/quantize/python/infer.py @@ -0,0 +1,77 @@ +import fastdeploy as fd +import cv2 +import os +import time + + +def parse_arguments(): + import argparse + import ast + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="Path of paddleclas model.") + parser.add_argument( + "--image", required=True, help="Path of test image file.") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Type of inference device, support 'cpu' or 'gpu'.") + parser.add_argument( + "--backend", + type=str, + default="default", + help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu" + ) + parser.add_argument( + "--device_id", + type=int, + default=0, + help="Define which GPU card used to run model.") + parser.add_argument( + "--cpu_thread_num", + type=int, + default=9, + help="Number of threads while inference on CPU.") + return parser.parse_args() + + +def build_option(args): + option = fd.RuntimeOption() + if args.device.lower() == "gpu": + option.use_gpu(0) + + option.set_cpu_thread_num(args.cpu_thread_num) + + if args.backend.lower() == "trt": + assert args.device.lower( + ) == "gpu", "TensorRT backend require inferences on device GPU." + option.use_trt_backend() + option.set_trt_input_shape("inputs", min_shape=[1, 3, 224, 224]) + elif args.backend.lower() == "ort": + option.use_ort_backend() + elif args.backend.lower() == "paddle": + option.use_paddle_backend() + elif args.backend.lower() == "openvino": + assert args.device.lower( + ) == "cpu", "OpenVINO backend require inference on device CPU." + option.use_openvino_backend() + return option + + +args = parse_arguments() + +# 配置runtime,加载模型 +runtime_option = build_option(args) + +model_file = os.path.join(args.model, "inference.pdmodel") +params_file = os.path.join(args.model, "inference.pdiparams") +config_file = os.path.join(args.model, "inference_cls.yaml") + +model = fd.vision.classification.PaddleClasModel( + model_file, params_file, config_file, runtime_option=runtime_option) + +# 预测图片检测结果 +im = cv2.imread(args.image) +result = model.predict(im.copy()) +print(result) diff --git a/examples/vision/detection/paddledetection/quantize/README.md b/examples/vision/detection/paddledetection/quantize/README.md new file mode 100644 index 000000000..f3e87e70d --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/README.md @@ -0,0 +1,24 @@ +# PaddleDetection 量化模型部署 +FastDeploy已支持部署量化模型,并提供一键模型量化的工具. +用户可以使用一键模型量化工具,自行对模型量化后部署, 也可以直接下载FastDeploy提供的量化模型进行部署. 
+ +## FastDeploy一键模型量化工具 +FastDeploy 提供了一键量化工具, 能够简单地通过输入一个配置文件, 对模型进行量化. +详细教程请见: [一键模型量化工具](../../../../../tools/quantization/) + +## 下载量化完成的PP-YOLOE-l模型 +用户也可以直接下载下表中的量化模型进行部署. +| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP |量化方式 | +| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- | +| [ppyoloe_crn_l_300e_coco](https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar ) | TensorRT | GPU | 43.83 | 31.57 | 1.39 | 51.4 | 50.7 | 量化蒸馏训练 | +| [ppyoloe_crn_l_300e_coco](https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar ) | ONNX Runtime | CPU | 1085.18 | 475.55 | 2.29 |51.4 | 50.0 |量化蒸馏训练 | + +上表中的数据, 为模型量化前后,在FastDeploy部署的端到端推理性能. +- 测试图片为COCO val2017中的图片. +- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒. +- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1. + +## 详细部署文档 + +- [Python部署](python) +- [C++部署](cpp) diff --git a/examples/vision/detection/paddledetection/quantize/cpp/CMakeLists.txt b/examples/vision/detection/paddledetection/quantize/cpp/CMakeLists.txt new file mode 100644 index 000000000..bd245c9ac --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/cpp/CMakeLists.txt @@ -0,0 +1,13 @@ +PROJECT(infer_demo C CXX) +CMAKE_MINIMUM_REQUIRED (VERSION 3.10) + +# 指定下载解压后的fastdeploy库路径 +option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.") + +include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake) + +# 添加FastDeploy依赖头文件 +include_directories(${FASTDEPLOY_INCS}) + +add_executable(infer_ppyoloe_demo ${PROJECT_SOURCE_DIR}/infer_ppyoloe.cc) +target_link_libraries(infer_ppyoloe_demo ${FASTDEPLOY_LIBS}) diff --git a/examples/vision/detection/paddledetection/quantize/cpp/README.md b/examples/vision/detection/paddledetection/quantize/cpp/README.md new file mode 100644 index 000000000..43ccbd33d --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/cpp/README.md @@ -0,0 +1,33 @@ +# PP-YOLOE-l量化模型 C++部署示例 + +本目录下提供的`infer_ppyoloe.cc`,可以帮助用户快速完成PP-YOLOE-l量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署.(注意: 推理量化后的分类模型仍然需要FP32模型文件夹下的infer_cfg.yml文件, 自行量化的模型文件夹内不包含此yaml文件, 用户从FP32模型文件夹下复制此yaml文件到量化后的模型文件夹内即可.) + +## 以量化后的PP-YOLOE-l模型为例, 进行部署 +在本目录执行如下命令即可完成编译,以及量化模型部署. +```bash +mkdir build +cd build +wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz +tar xvf fastdeploy-linux-x64-0.2.0.tgz +cmake .. 
-DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0 +make -j + +#下载FastDeloy提供的ppyoloe_crn_l_300e_coco量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar +tar -xvf ppyoloe_crn_l_300e_coco_qat.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + +# 在CPU上使用ONNX Runtime推理量化模型 +./infer_ppyoloe_demo ppyoloe_crn_l_300e_coco_qat 000000014439.jpg 0 +# 在GPU上使用TensorRT推理量化模型 +./infer_ppyoloe_demo ppyoloe_crn_l_300e_coco_qat 000000014439.jpg 1 +``` diff --git a/examples/vision/detection/paddledetection/quantize/cpp/infer_ppyoloe.cc b/examples/vision/detection/paddledetection/quantize/cpp/infer_ppyoloe.cc new file mode 100644 index 000000000..9ed06b575 --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/cpp/infer_ppyoloe.cc @@ -0,0 +1,80 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "fastdeploy/vision.h" +#ifdef WIN32 +const char sep = '\\'; +#else +const char sep = '/'; +#endif + +void InitAndInfer(const std::string& model_dir, const std::string& image_file, + const fastdeploy::RuntimeOption& option) { + auto model_file = model_dir + sep + "model.pdmodel"; + auto params_file = model_dir + sep + "model.pdiparams"; + auto config_file = model_dir + sep + "infer_cfg.yml"; + + auto model = fastdeploy::vision::detection::PPYOLOE(model_file, params_file, + config_file, option); + assert(model.Initialized()); + + auto im = cv::imread(image_file); + auto im_bak = im.clone(); + + fastdeploy::vision::DetectionResult res; + if (!model.Predict(&im, &res)) { + std::cerr << "Failed to predict." << std::endl; + return; + } + + std::cout << res.Str() << std::endl; + + auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res, 0.5); + cv::imwrite("vis_result.jpg", vis_im); + std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl; + +} + +int main(int argc, char* argv[]) { + if (argc < 4) { + std::cout << "Usage: infer_demo path/to/quant_model " + "path/to/image " + "run_option, " + "e.g ./infer_demo ./PPYOLOE_L_quant ./test.jpeg 0" + << std::endl; + std::cout << "The data type of run_option is int, 0: run on cpu with ORT " + "backend; 1: run " + "on gpu with TensorRT backend. 
" + << std::endl; + return -1; + } + + fastdeploy::RuntimeOption option; + int flag = std::atoi(argv[3]); + + if (flag == 0) { + option.UseCpu(); + option.UseOrtBackend(); + } else if (flag == 1) { + option.UseGpu(); + option.UseTrtBackend(); + option.SetTrtInputShape("inputs",{1, 3, 640, 640}); + option.SetTrtInputShape("scale_factor",{1,2}); + } + + std::string model_dir = argv[1]; + std::string test_image = argv[2]; + InitAndInfer(model_dir, test_image, option); + return 0; +} diff --git a/examples/vision/detection/paddledetection/quantize/python/README.md b/examples/vision/detection/paddledetection/quantize/python/README.md new file mode 100644 index 000000000..9df40f570 --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/python/README.md @@ -0,0 +1,29 @@ +# PP-YOLOE-l量化模型 Python部署示例 +本目录下提供的`infer.py`,可以帮助用户快速完成PP-YOLOE量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署.(注意: 推理量化后的分类模型仍然需要FP32模型文件夹下的infer_cfg.yml文件, 自行量化的模型文件夹内不包含此yaml文件, 用户从FP32模型文件夹下复制此yaml文件到量化后的模型文件夹内即可.) + + +## 以量化后的PP-YOLOE-l模型为例, 进行部署 +```bash +#下载部署示例代码 +git clone https://github.com/PaddlePaddle/FastDeploy.git +cd /examples/vision/detection/paddledetection/quantize/python + +#下载FastDeloy提供的ppyoloe_crn_l_300e_coco量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar +tar -xvf ppyoloe_crn_l_300e_coco_qat.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + +# 在CPU上使用ONNX Runtime推理量化模型 +python infer_ppyoloe.py --model ppyoloe_crn_l_300e_coco_qat --image 000000014439.jpg --device cpu --backend ort +# 在GPU上使用TensorRT推理量化模型 +python infer_ppyoloe.py --model ppyoloe_crn_l_300e_coco_qat --image 000000014439.jpg --device gpu --backend trt +``` diff --git a/examples/vision/detection/paddledetection/quantize/python/infer_ppyoloe.py b/examples/vision/detection/paddledetection/quantize/python/infer_ppyoloe.py new file mode 100644 index 000000000..85f3c9d55 --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/python/infer_ppyoloe.py @@ -0,0 +1,82 @@ +import fastdeploy as fd +import cv2 +import os + + +def parse_arguments(): + import argparse + import ast + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="Path of PPYOLOE model.") + parser.add_argument( + "--image", required=True, help="Path of test image file.") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Type of inference device, support 'cpu' or 'gpu'.") + parser.add_argument( + "--backend", + type=str, + default="default", + help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu" + ) + parser.add_argument( + "--device_id", + type=int, + default=0, + help="Define which GPU card used to run model.") + parser.add_argument( + "--cpu_thread_num", + type=int, + default=9, + help="Number of threads while inference on CPU.") + return parser.parse_args() + + +def build_option(args): + option = fd.RuntimeOption() + if args.device.lower() == "gpu": + option.use_gpu(0) + + option.set_cpu_thread_num(args.cpu_thread_num) + + if args.backend.lower() == "trt": + assert args.device.lower( + ) == "gpu", "TensorRT backend require 
inferences on device GPU." + option.use_trt_backend() + option.set_trt_cache_file(os.path.join(args.model, "model.trt")) + option.set_trt_input_shape("image", min_shape=[1, 3, 640, 640]) + option.set_trt_input_shape("scale_factor", min_shape=[1, 2]) + elif args.backend.lower() == "ort": + option.use_ort_backend() + elif args.backend.lower() == "paddle": + option.use_paddle_backend() + elif args.backend.lower() == "openvino": + assert args.device.lower( + ) == "cpu", "OpenVINO backend require inference on device CPU." + option.use_openvino_backend() + return option + + +args = parse_arguments() + +model_file = os.path.join(args.model, "model.pdmodel") +params_file = os.path.join(args.model, "model.pdiparams") +config_file = os.path.join(args.model, "infer_cfg.yml") + +# 配置runtime,加载模型 +runtime_option = build_option(args) +model = fd.vision.detection.PPYOLOE( + model_file, params_file, config_file, runtime_option=runtime_option) + +# 预测图片检测结果 +im = cv2.imread(args.image) +result = model.predict(im.copy()) +print(result) + +# 预测结果可视化 +vis_im = fd.vision.vis_detection(im, result, score_threshold=0.5) +cv2.imwrite("visualized_result.jpg", vis_im) +print("Visualized result save in ./visualized_result.jpg") diff --git a/examples/vision/detection/yolov5/quantize/README.md b/examples/vision/detection/yolov5/quantize/README.md new file mode 100644 index 000000000..16dff9e84 --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/README.md @@ -0,0 +1,24 @@ +# YOLOv5量化模型部署 +FastDeploy已支持部署量化模型,并提供一键模型量化的工具. +用户可以使用一键模型量化工具,自行对模型量化后部署, 也可以直接下载FastDeploy提供的量化模型进行部署. + +## FastDeploy一键模型量化工具 +FastDeploy 提供了一键量化工具, 能够简单地通过输入一个配置文件, 对模型进行量化. +详细教程请见: [一键模型量化工具](../../../../../tools/quantization/) + +## 下载量化完成的YOLOv5s模型 +用户也可以直接下载下表中的量化模型进行部署. +| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP |量化方式 | +| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- | +| [YOLOv5s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar) | TensorRT | GPU | 14.13 | 11.22 | 1.26 | 37.6 | 36.6 | 量化蒸馏训练 | +| [YOLOv5s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar) | Paddle Inference | CPU | 226.36 | 152.27 | 1.48 |37.6 | 36.8 |量化蒸馏训练 | + +上表中的数据, 为模型量化前后,在FastDeploy部署的端到端推理性能. +- 测试图片为COCO val2017中的图片. +- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒. +- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1. + +## 详细部署文档 + +- [Python部署](python) +- [C++部署](cpp) diff --git a/examples/vision/detection/yolov5/quantize/cpp/CMakeLists.txt b/examples/vision/detection/yolov5/quantize/cpp/CMakeLists.txt new file mode 100644 index 000000000..fea1a2888 --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/cpp/CMakeLists.txt @@ -0,0 +1,14 @@ +PROJECT(infer_demo C CXX) +CMAKE_MINIMUM_REQUIRED (VERSION 3.12) + +# 指定下载解压后的fastdeploy库路径 +option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.") + +include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake) + +# 添加FastDeploy依赖头文件 +include_directories(${FASTDEPLOY_INCS}) + +add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc) +# 添加FastDeploy库依赖 +target_link_libraries(infer_demo ${FASTDEPLOY_LIBS}) diff --git a/examples/vision/detection/yolov5/quantize/cpp/README.md b/examples/vision/detection/yolov5/quantize/cpp/README.md new file mode 100644 index 000000000..2a6733768 --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/cpp/README.md @@ -0,0 +1,34 @@ +# YOLOv5量化模型 C++部署示例 + +本目录下提供的`infer.cc`,可以帮助用户快速完成YOLOv5s量化模型在CPU/GPU上的部署推理加速. 
+ +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + +## 以量化后的YOLOv5s模型为例, 进行部署 +在本目录执行如下命令即可完成编译,以及量化模型部署. +```bash +mkdir build +cd build +wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz +tar xvf fastdeploy-linux-x64-0.2.0.tgz +cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0 +make -j + +#下载FastDeloy提供的yolov5s量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar +tar -xvf yolov5s_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + + +# 在CPU上使用Paddle-Inference推理量化模型 +./infer_demo yolov5s_quant 000000014439.jpg 0 +# 在GPU上使用TensorRT推理量化模型 +./infer_demo yolov5s_quant 000000014439.jpg 1 +``` diff --git a/examples/vision/detection/yolov5/quantize/cpp/infer.cc b/examples/vision/detection/yolov5/quantize/cpp/infer.cc new file mode 100644 index 000000000..88a9e15fc --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/cpp/infer.cc @@ -0,0 +1,77 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "fastdeploy/vision.h" +#ifdef WIN32 +const char sep = '\\'; +#else +const char sep = '/'; +#endif + +void InitAndInfer(const std::string& model_dir, const std::string& image_file, + const fastdeploy::RuntimeOption& option) { + auto model_file = model_dir + sep + "model.pdmodel"; + auto params_file = model_dir + sep + "model.pdiparams"; + + auto model = fastdeploy::vision::detection::YOLOv5( + model_file, params_file, option, fastdeploy::ModelFormat::PADDLE); + assert(model.Initialized()); + + auto im = cv::imread(image_file); + auto im_bak = im.clone(); + + fastdeploy::vision::DetectionResult res; + if (!model.Predict(&im, &res)) { + std::cerr << "Failed to predict." << std::endl; + return; + } + + std::cout << res.Str() << std::endl; + + auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res); + cv::imwrite("vis_result.jpg", vis_im); + std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl; +} + +int main(int argc, char* argv[]) { + if (argc < 4) { + std::cout << "Usage: infer_demo path/to/quant_model " + "path/to/image " + "run_option, " + "e.g ./infer_demo ./yolov5s_quant ./000000014439.jpg 0" + << std::endl; + std::cout << "The data type of run_option is int, 0: run on cpu with ORT " + "backend; 1: run " + "on cpu with Paddle backend ; 2: run with gpu and use " + "TensorRT backend." 
+ << std::endl; + return -1; + } + + fastdeploy::RuntimeOption option; + int flag = std::atoi(argv[3]); + + if (flag == 0) { + option.UseCpu(); + option.UsePaddleBackend(); + } else if (flag == 1) { + option.UseGpu(); + option.UseTrtBackend(); + } + + std::string model_dir = argv[1]; + std::string test_image = argv[2]; + InitAndInfer(model_dir, test_image, option); + return 0; +} diff --git a/examples/vision/detection/yolov5/quantize/python/README.md b/examples/vision/detection/yolov5/quantize/python/README.md new file mode 100644 index 000000000..adc9eeba5 --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/python/README.md @@ -0,0 +1,29 @@ +# YOLOv5s量化模型 Python部署示例 +本目录下提供的`infer.py`,可以帮助用户快速完成YOLOv5量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + + +## 以量化后的YOLOv5s模型为例, 进行部署 +```bash +#下载部署示例代码 +git clone https://github.com/PaddlePaddle/FastDeploy.git +cd examples/vision/detection/yolov5/quantize/python + +#下载FastDeloy提供的yolov5s量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar +tar -xvf yolov5s_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + +# 在CPU上使用Paddle-Inference推理量化模型 +python infer.py --model yolov5s_quant --image 000000014439.jpg --device cpu --backend paddle +# 在GPU上使用TensorRT推理量化模型 +python infer.py --model yolov5s_quant --image 000000014439.jpg --device gpu --backend trt +``` diff --git a/examples/vision/detection/yolov5/quantize/python/infer.py b/examples/vision/detection/yolov5/quantize/python/infer.py new file mode 100644 index 000000000..aa56ef18b --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/python/infer.py @@ -0,0 +1,81 @@ +import fastdeploy as fd +import cv2 +import os +from fastdeploy import ModelFormat + + +def parse_arguments(): + import argparse + import ast + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="Path of yolov5 onnx model.") + parser.add_argument( + "--image", required=True, help="Path of test image file.") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Type of inference device, support 'cpu' or 'gpu'.") + parser.add_argument( + "--backend", + type=str, + default="default", + help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu" + ) + parser.add_argument( + "--device_id", + type=int, + default=0, + help="Define which GPU card used to run model.") + parser.add_argument( + "--cpu_thread_num", + type=int, + default=9, + help="Number of threads while inference on CPU.") + return parser.parse_args() + + +def build_option(args): + option = fd.RuntimeOption() + if args.device.lower() == "gpu": + option.use_gpu(0) + + option.set_cpu_thread_num(args.cpu_thread_num) + + if args.backend.lower() == "trt": + assert args.device.lower( + ) == "gpu", "TensorRT backend require inference on device GPU." + option.use_trt_backend() + elif args.backend.lower() == "ort": + option.use_ort_backend() + elif args.backend.lower() == "paddle": + option.use_paddle_backend() + elif args.backend.lower() == "openvino": + assert args.device.lower( + ) == "cpu", "OpenVINO backend require inference on device CPU." 
+ option.use_openvino_backend() + return option + + +args = parse_arguments() + +model_file = os.path.join(args.model, "model.pdmodel") +params_file = os.path.join(args.model, "model.pdiparams") +# 配置runtime,加载模型 +runtime_option = build_option(args) +model = fd.vision.detection.YOLOv5( + model_file, + params_file, + runtime_option=runtime_option, + model_format=ModelFormat.PADDLE) + +# 预测图片检测结果 +im = cv2.imread(args.image) +result = model.predict(im.copy()) +print(result) + +# 预测结果可视化 +vis_im = fd.vision.vis_detection(im, result) +cv2.imwrite("visualized_result.jpg", vis_im) +print("Visualized result save in ./visualized_result.jpg") diff --git a/examples/vision/detection/yolov6/quantize/README.md b/examples/vision/detection/yolov6/quantize/README.md new file mode 100644 index 000000000..594d59e5c --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/README.md @@ -0,0 +1,25 @@ +# YOLOv6量化模型部署 +FastDeploy已支持部署量化模型,并提供一键模型量化的工具. +用户可以使用一键模型量化工具,自行对模型量化后部署, 也可以直接下载FastDeploy提供的量化模型进行部署. + +## FastDeploy一键模型量化工具 +FastDeploy 提供了一键量化工具, 能够简单地通过输入一个配置文件, 对模型进行量化. +详细教程请见: [一键模型量化工具](../../../../../tools/quantization/) + +## 下载量化完成的YOLOv6s模型 +用户也可以直接下载下表中的量化模型进行部署. + +| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP | 量化方式 | +| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- | ------ | +| [YOLOv6s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar) | TensorRT | GPU | 12.89 | 8.92 | 1.45 | 42.5 | 40.6| 量化蒸馏训练 | +| [YOLOv6s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar) | Paddle Inference | CPU | 366.41 | 131.70 | 2.78 |42.5| 41.2|量化蒸馏训练 | + +上表中的数据, 为模型量化前后,在FastDeploy部署的端到端推理性能. +- 测试图片为COCO val2017中的图片. +- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒. +- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1. + +## 详细部署文档 + +- [Python部署](python) +- [C++部署](cpp) diff --git a/examples/vision/detection/yolov6/quantize/cpp/CMakeLists.txt b/examples/vision/detection/yolov6/quantize/cpp/CMakeLists.txt new file mode 100644 index 000000000..fea1a2888 --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/cpp/CMakeLists.txt @@ -0,0 +1,14 @@ +PROJECT(infer_demo C CXX) +CMAKE_MINIMUM_REQUIRED (VERSION 3.12) + +# 指定下载解压后的fastdeploy库路径 +option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.") + +include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake) + +# 添加FastDeploy依赖头文件 +include_directories(${FASTDEPLOY_INCS}) + +add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc) +# 添加FastDeploy库依赖 +target_link_libraries(infer_demo ${FASTDEPLOY_LIBS}) diff --git a/examples/vision/detection/yolov6/quantize/cpp/README.md b/examples/vision/detection/yolov6/quantize/cpp/README.md new file mode 100644 index 000000000..5713abcfb --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/cpp/README.md @@ -0,0 +1,34 @@ +# YOLOv6量化模型 C++部署示例 + +本目录下提供的`infer.cc`,可以帮助用户快速完成YOLOv6s量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + +## 以量化后的YOLOv6s模型为例, 进行部署 +在本目录执行如下命令即可完成编译,以及量化模型部署. +```bash +mkdir build +cd build +wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz +tar xvf fastdeploy-linux-x64-0.2.0.tgz +cmake .. 
-DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0 +make -j + +#下载FastDeloy提供的yolov6s量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar +tar -xvf yolov6s_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + + +# 在CPU上使用Paddle-Inference推理量化模型 +./infer_demo yolov6s_quant 000000014439.jpg 0 +# 在GPU上使用TensorRT推理量化模型 +./infer_demo yolov6s_quant 000000014439.jpg 1 +``` diff --git a/examples/vision/detection/yolov6/quantize/cpp/infer.cc b/examples/vision/detection/yolov6/quantize/cpp/infer.cc new file mode 100644 index 000000000..f7a9d2c16 --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/cpp/infer.cc @@ -0,0 +1,77 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "fastdeploy/vision.h" +#ifdef WIN32 +const char sep = '\\'; +#else +const char sep = '/'; +#endif + +void InitAndInfer(const std::string& model_dir, const std::string& image_file, + const fastdeploy::RuntimeOption& option) { + auto model_file = model_dir + sep + "model.pdmodel"; + auto params_file = model_dir + sep + "model.pdiparams"; + + auto model = fastdeploy::vision::detection::YOLOv6( + model_file, params_file, option, fastdeploy::ModelFormat::PADDLE); + assert(model.Initialized()); + + auto im = cv::imread(image_file); + auto im_bak = im.clone(); + + fastdeploy::vision::DetectionResult res; + if (!model.Predict(&im, &res)) { + std::cerr << "Failed to predict." << std::endl; + return; + } + + std::cout << res.Str() << std::endl; + + auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res); + cv::imwrite("vis_result.jpg", vis_im); + std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl; +} + +int main(int argc, char* argv[]) { + if (argc < 4) { + std::cout << "Usage: infer_demo path/to/quant_model " + "path/to/image " + "run_option, " + "e.g ./infer_demo ./yolov6s_quant ./000000014439.jpg 0" + << std::endl; + std::cout << "The data type of run_option is int, 0: run on cpu with ORT " + "backend; 1: run " + "on cpu with Paddle backend ; 2: run with gpu and use " + "TensorRT backend." + << std::endl; + return -1; + } + + fastdeploy::RuntimeOption option; + int flag = std::atoi(argv[3]); + + if (flag == 0) { + option.UseCpu(); + option.UsePaddleBackend(); + } else if (flag == 1) { + option.UseGpu(); + option.UseTrtBackend(); + } + + std::string model_dir = argv[1]; + std::string test_image = argv[2]; + InitAndInfer(model_dir, test_image, option); + return 0; +} diff --git a/examples/vision/detection/yolov6/quantize/python/README.md b/examples/vision/detection/yolov6/quantize/python/README.md new file mode 100644 index 000000000..48af7a7f6 --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/python/README.md @@ -0,0 +1,28 @@ +# YOLOv6量化模型 Python部署示例 +本目录下提供的`infer.py`,可以帮助用户快速完成YOLOv6量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. 
FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + +## 以量化后的YOLOv6s模型为例, 进行部署 +```bash +#下载部署示例代码 +git clone https://github.com/PaddlePaddle/FastDeploy.git +cd examples/slim/yolov6/python + +#下载FastDeloy提供的yolov6s量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar +tar -xvf yolov6s_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + +# 在CPU上使用Paddle-Inference推理量化模型 +python infer.py --model yolov6s_quant --image 000000014439.jpg --device cpu --backend paddle +# 在GPU上使用TensorRT推理量化模型 +python infer.py --model yolov6s_quant --image 000000014439.jpg --device gpu --backend trt +``` diff --git a/examples/vision/detection/yolov6/quantize/python/infer.py b/examples/vision/detection/yolov6/quantize/python/infer.py new file mode 100644 index 000000000..ec0602272 --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/python/infer.py @@ -0,0 +1,81 @@ +import fastdeploy as fd +import cv2 +import os +from fastdeploy import ModelFormat + + +def parse_arguments(): + import argparse + import ast + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="Path of yolov6 onnx model.") + parser.add_argument( + "--image", required=True, help="Path of test image file.") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Type of inference device, support 'cpu' or 'gpu'.") + parser.add_argument( + "--backend", + type=str, + default="default", + help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu" + ) + parser.add_argument( + "--device_id", + type=int, + default=0, + help="Define which GPU card used to run model.") + parser.add_argument( + "--cpu_thread_num", + type=int, + default=9, + help="Number of threads while inference on CPU.") + return parser.parse_args() + + +def build_option(args): + option = fd.RuntimeOption() + if args.device.lower() == "gpu": + option.use_gpu(0) + + option.set_cpu_thread_num(args.cpu_thread_num) + + if args.backend.lower() == "trt": + assert args.device.lower( + ) == "gpu", "TensorRT backend require inference on device GPU." + option.use_trt_backend() + elif args.backend.lower() == "ort": + option.use_ort_backend() + elif args.backend.lower() == "paddle": + option.use_paddle_backend() + elif args.backend.lower() == "openvino": + assert args.device.lower( + ) == "cpu", "OpenVINO backend require inference on device CPU." 
+ option.use_openvino_backend() + return option + + +args = parse_arguments() + +model_file = os.path.join(args.model, "model.pdmodel") +params_file = os.path.join(args.model, "model.pdiparams") +# 配置runtime,加载模型 +runtime_option = build_option(args) +model = fd.vision.detection.YOLOv6( + model_file, + params_file, + runtime_option=runtime_option, + model_format=ModelFormat.PADDLE) + +# 预测图片检测结果 +im = cv2.imread(args.image) +result = model.predict(im.copy()) +print(result) + +# 预测结果可视化 +vis_im = fd.vision.vis_detection(im, result) +cv2.imwrite("visualized_result.jpg", vis_im) +print("Visualized result save in ./visualized_result.jpg") diff --git a/examples/vision/detection/yolov7/quantize/README.md b/examples/vision/detection/yolov7/quantize/README.md new file mode 100644 index 000000000..6d29ea3f3 --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/README.md @@ -0,0 +1,25 @@ +# YOLOv7量化模型部署 +FastDeploy已支持部署量化模型,并提供一键模型量化的工具. +用户可以使用一键模型量化工具,自行对模型量化后部署, 也可以直接下载FastDeploy提供的量化模型进行部署. + +## FastDeploy一键模型量化工具 +FastDeploy 提供了一键量化工具, 能够简单地通过输入一个配置文件, 对模型进行量化. +详细教程请见: [一键模型量化工具](../../../../../tools/quantization/) + +## 下载量化完成的YOLOv7模型 +用户也可以直接下载下表中的量化模型进行部署. + +| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP | 量化方式 | +| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- | +| [YOLOv7](https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar) | TensorRT | GPU | 30.43 | 15.40 | 1.98 | 51.1| 50.8| 量化蒸馏训练 | +| [YOLOv7](https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar) | Paddle Inference | CPU | 1015.70 | 562.41 | 1.82 |51.1 | 46.3| 量化蒸馏训练 | + +上表中的数据, 为模型量化前后,在FastDeploy部署的端到端推理性能. +- 测试图片为COCO val2017中的图片. +- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒. +- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1. + +## 详细部署文档 + +- [Python部署](python) +- [C++部署](cpp) diff --git a/examples/vision/detection/yolov7/quantize/cpp/CMakeLists.txt b/examples/vision/detection/yolov7/quantize/cpp/CMakeLists.txt new file mode 100644 index 000000000..fea1a2888 --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/cpp/CMakeLists.txt @@ -0,0 +1,14 @@ +PROJECT(infer_demo C CXX) +CMAKE_MINIMUM_REQUIRED (VERSION 3.12) + +# 指定下载解压后的fastdeploy库路径 +option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.") + +include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake) + +# 添加FastDeploy依赖头文件 +include_directories(${FASTDEPLOY_INCS}) + +add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc) +# 添加FastDeploy库依赖 +target_link_libraries(infer_demo ${FASTDEPLOY_LIBS}) diff --git a/examples/vision/detection/yolov7/quantize/cpp/README.md b/examples/vision/detection/yolov7/quantize/cpp/README.md new file mode 100644 index 000000000..285454e6e --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/cpp/README.md @@ -0,0 +1,34 @@ +# YOLOv7量化模型 C++部署示例 + +本目录下提供的`infer.cc`,可以帮助用户快速完成YOLOv7量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + +## 以量化后的YOLOv7模型为例, 进行部署 +在本目录执行如下命令即可完成编译,以及量化模型部署. +```bash +mkdir build +cd build +wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz +tar xvf fastdeploy-linux-x64-0.2.0.tgz +cmake .. 
-DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0 +make -j + +#下载FastDeloy提供的yolov7量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar +tar -xvf yolov7_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + + +# 在CPU上使用Paddle-Inference推理量化模型 +./infer_demo yolov7_quant 000000014439.jpg 0 +# 在GPU上使用TensorRT推理量化模型 +./infer_demo yolov7_quant 000000014439.jpg 1 +``` diff --git a/examples/vision/detection/yolov7/quantize/cpp/infer.cc b/examples/vision/detection/yolov7/quantize/cpp/infer.cc new file mode 100644 index 000000000..45cba4b29 --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/cpp/infer.cc @@ -0,0 +1,77 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "fastdeploy/vision.h" +#ifdef WIN32 +const char sep = '\\'; +#else +const char sep = '/'; +#endif + +void InitAndInfer(const std::string& model_dir, const std::string& image_file, + const fastdeploy::RuntimeOption& option) { + auto model_file = model_dir + sep + "model.pdmodel"; + auto params_file = model_dir + sep + "model.pdiparams"; + + auto model = fastdeploy::vision::detection::YOLOv7( + model_file, params_file, option, fastdeploy::ModelFormat::PADDLE); + assert(model.Initialized()); + + auto im = cv::imread(image_file); + auto im_bak = im.clone(); + + fastdeploy::vision::DetectionResult res; + if (!model.Predict(&im, &res)) { + std::cerr << "Failed to predict." << std::endl; + return; + } + + std::cout << res.Str() << std::endl; + + auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res); + cv::imwrite("vis_result.jpg", vis_im); + std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl; +} + +int main(int argc, char* argv[]) { + if (argc < 4) { + std::cout << "Usage: infer_demo path/to/quant_model " + "path/to/image " + "run_option, " + "e.g ./infer_demo ./yolov7s_quant ./000000014439.jpg 0" + << std::endl; + std::cout << "The data type of run_option is int, 0: run on cpu with ORT " + "backend; 1: run " + "on cpu with Paddle backend ; 2: run with gpu and use " + "TensorRT backend." + << std::endl; + return -1; + } + + fastdeploy::RuntimeOption option; + int flag = std::atoi(argv[3]); + + if (flag == 0) { + option.UseCpu(); + option.UsePaddleBackend(); + } else if (flag == 1) { + option.UseGpu(); + option.UseTrtBackend(); + } + + std::string model_dir = argv[1]; + std::string test_image = argv[2]; + InitAndInfer(model_dir, test_image, option); + return 0; +} diff --git a/examples/vision/detection/yolov7/quantize/python/README.md b/examples/vision/detection/yolov7/quantize/python/README.md new file mode 100644 index 000000000..0cf007038 --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/python/README.md @@ -0,0 +1,28 @@ +# YOLOv7量化模型 Python部署示例 +本目录下提供的`infer.py`,可以帮助用户快速完成YOLOv7量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. 
FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + +## 以量化后的YOLOv7模型为例, 进行部署 +```bash +#下载部署示例代码 +git clone https://github.com/PaddlePaddle/FastDeploy.git +cd examples/vision/detection/yolov7/quantize/python + +#下载FastDeloy提供的yolov7量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar +tar -xvf yolov7_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + +# 在CPU上使用Paddle-Inference推理量化模型 +python infer.py --model yolov7_quant --image 000000014439.jpg --device cpu --backend paddle +# 在GPU上使用TensorRT推理量化模型 +python infer.py --model yolov7_quant --image 000000014439.jpg --device gpu --backend trt +``` diff --git a/examples/vision/detection/yolov7/quantize/python/infer.py b/examples/vision/detection/yolov7/quantize/python/infer.py new file mode 100644 index 000000000..3c42679e7 --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/python/infer.py @@ -0,0 +1,81 @@ +import fastdeploy as fd +import cv2 +import os +from fastdeploy import ModelFormat + + +def parse_arguments(): + import argparse + import ast + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="Path of yolov7 onnx model.") + parser.add_argument( + "--image", required=True, help="Path of test image file.") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Type of inference device, support 'cpu' or 'gpu'.") + parser.add_argument( + "--backend", + type=str, + default="default", + help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu" + ) + parser.add_argument( + "--device_id", + type=int, + default=0, + help="Define which GPU card used to run model.") + parser.add_argument( + "--cpu_thread_num", + type=int, + default=9, + help="Number of threads while inference on CPU.") + return parser.parse_args() + + +def build_option(args): + option = fd.RuntimeOption() + if args.device.lower() == "gpu": + option.use_gpu(0) + + option.set_cpu_thread_num(args.cpu_thread_num) + + if args.backend.lower() == "trt": + assert args.device.lower( + ) == "gpu", "TensorRT backend require inference on device GPU." + option.use_trt_backend() + elif args.backend.lower() == "ort": + option.use_ort_backend() + elif args.backend.lower() == "paddle": + option.use_paddle_backend() + elif args.backend.lower() == "openvino": + assert args.device.lower( + ) == "cpu", "OpenVINO backend require inference on device CPU." 
+ option.use_openvino_backend() + return option + + +args = parse_arguments() + +model_file = os.path.join(args.model, "model.pdmodel") +params_file = os.path.join(args.model, "model.pdiparams") +# 配置runtime,加载模型 +runtime_option = build_option(args) +model = fd.vision.detection.YOLOv7( + model_file, + params_file, + runtime_option=runtime_option, + model_format=ModelFormat.PADDLE) + +# 预测图片检测结果 +im = cv2.imread(args.image) +result = model.predict(im.copy()) +print(result) + +# 预测结果可视化 +vis_im = fd.vision.vis_detection(im, result) +cv2.imwrite("visualized_result.jpg", vis_im) +print("Visualized result save in ./visualized_result.jpg") diff --git a/fastdeploy/vision/detection/contrib/yolov7.cc b/fastdeploy/vision/detection/contrib/yolov7.cc index e776a8c6c..51e7a605c 100644 --- a/fastdeploy/vision/detection/contrib/yolov7.cc +++ b/fastdeploy/vision/detection/contrib/yolov7.cc @@ -64,12 +64,13 @@ YOLOv7::YOLOv7(const std::string& model_file, const std::string& params_file, valid_cpu_backends = {Backend::OPENVINO, Backend::ORT}; valid_gpu_backends = {Backend::ORT, Backend::TRT}; } else { - valid_cpu_backends = {Backend::PDINFER}; - valid_gpu_backends = {Backend::PDINFER}; + valid_cpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT}; + valid_gpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT}; } runtime_option = custom_option; runtime_option.model_format = model_format; runtime_option.model_file = model_file; + runtime_option.params_file = params_file; initialized = Initialize(); } diff --git a/tools/quantization/readme.md b/tools/quantization/README.md similarity index 51% rename from tools/quantization/readme.md rename to tools/quantization/README.md index 600f79441..2bf2da7f2 100644 --- a/tools/quantization/readme.md +++ b/tools/quantization/README.md @@ -1,5 +1,6 @@ # FastDeploy 一键模型量化 -FastDeploy 给用户提供了一键量化功能, 支持离线量化和量化蒸馏训练. 本文档已Yolov5s为例, 用户可参考如何安装并执行FastDeploy的一键量化功能. +FastDeploy基于PaddleSlim, 给用户提供了一键模型量化的工具, 支持离线量化和量化蒸馏训练. +本文档以Yolov5s为例, 供用户参考如何安装并执行FastDeploy的一键模型量化. ## 1.安装 @@ -24,7 +25,7 @@ python setup.py install ## 2.使用方式 -### 一键离线量化示例 +### 一键量化示例 #### 离线量化 @@ -34,7 +35,7 @@ python setup.py install ```shell # 下载yolov5.onnx -wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s.onnx +wget https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx # 下载数据集, 此Calibration数据集为COCO val2017中的前320张图片 wget https://bj.bcebos.com/paddlehub/fastdeploy/COCO_val_320.tar.gz @@ -42,20 +43,21 @@ tar -xvf COCO_val_320.tar.gz ``` ##### 2.使用fastdeploy_quant命令,执行一键模型量化: - +以下命令是对yolov5s模型进行量化, 用户若想量化其他模型, 替换config_path为configs文件夹下的其他模型配置文件即可. ```shell fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='PTQ' --save_dir='./yolov5s_ptq_model/' ``` ##### 3.参数说明 +目前用户只需要提供一个定制的模型config文件,并指定量化方法和量化后的模型保存路径即可完成量化. + | 参数 | 作用 | | -------------------- | ------------------------------------------------------------ | -| --config_path | 一键量化所需要的量化配置文件.[详解](./fdquant/configs/readme.md) | +| --config_path | 一键量化所需要的量化配置文件.[详解](./configs/README.md) | | --method | 量化方式选择, 离线量化选PTQ,量化蒸馏训练选QAT | | --save_dir | 产出的量化后模型路径, 该模型可直接在FastDeploy部署 | -注意:目前fastdeploy_quant暂时只支持YOLOv5,YOLOv6和YOLOv7模型的量化 #### 量化蒸馏训练 @@ -63,10 +65,11 @@ fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method=' ##### 1.准备待量化模型和训练数据集 FastDeploy目前的量化蒸馏训练,只支持无标注图片训练,训练过程中不支持评估模型精度. 数据集为真实预测场景下的图片,图片数量依据数据集大小来定,尽量覆盖所有部署场景. 此例中,我们为用户准备了COCO2017验证集中的前320张图片. +注: 如果用户想通过量化蒸馏训练的方法,获得精度更高的量化模型, 可以自行准备更多的数据, 以及训练更多的轮数. 
```shell # 下载yolov5.onnx -wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s.onnx +wget https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx # 下载数据集, 此Calibration数据集为COCO2017验证集中的前320张图片 wget https://bj.bcebos.com/paddlehub/fastdeploy/COCO_val_320.tar.gz @@ -74,47 +77,31 @@ tar -xvf COCO_val_320.tar.gz ``` ##### 2.使用fastdeploy_quant命令,执行一键模型量化: - +以下命令是对yolov5s模型进行量化, 用户若想量化其他模型, 替换config_path为configs文件夹下的其他模型配置文件即可. ```shell +# 执行命令默认为单卡训练,训练前请指定单卡GPU, 否则在训练过程中可能会卡住. export CUDA_VISIBLE_DEVICES=0 fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='QAT' --save_dir='./yolov5s_qat_model/' ``` ##### 3.参数说明 +目前用户只需要提供一个定制的模型config文件,并指定量化方法和量化后的模型保存路径即可完成量化. + | 参数 | 作用 | | -------------------- | ------------------------------------------------------------ | -| --config_path | 一键量化所需要的量化配置文件.[详解](./fdquant/configs/readme.md) | +| --config_path | 一键量化所需要的量化配置文件.[详解](./configs/README.md)| | --method | 量化方式选择, 离线量化选PTQ,量化蒸馏训练选QAT | | --save_dir | 产出的量化后模型路径, 该模型可直接在FastDeploy部署 | -注意:目前fastdeploy_quant暂时只支持YOLOv5,YOLOv6和YOLOv7模型的量化 - ## 3. FastDeploy 部署量化模型 -用户在获得量化模型之后,只需要简单地传入量化后的模型路径及相应参数,即可以使用FastDeploy进行部署. +用户在获得量化模型之后,即可以使用FastDeploy进行部署, 部署文档请参考: 具体请用户参考示例文档: -- [YOLOv5s 量化模型Python部署](../examples/slim/yolov5s/python/) -- [YOLOv5s 量化模型C++部署](../examples/slim/yolov5s/cpp/) -- [YOLOv6s 量化模型Python部署](../examples/slim/yolov6s/python/) -- [YOLOv6s 量化模型C++部署](../examples/slim/yolov6s/cpp/) -- [YOLOv7 量化模型Python部署](../examples/slim/yolov7/python/) -- [YOLOv7 量化模型C++部署](../examples/slim/yolov7/cpp/) +- [YOLOv5 量化模型部署](../../examples/vision/detection/yolov5/quantize/) -## 4.Benchmark -下表为模型量化前后,在FastDeploy部署的端到端推理性能. -- 测试图片为COCO val2017中的图片. -- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒. -- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1. +- [YOLOv6 量化模型部署](../../examples/vision/detection/yolov6/quantize/) -| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP | -| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- | -| YOLOv5s | TensorRT | GPU | 14.13 | 11.22 | 1.26 | 37.6 | 36.6 | -| YOLOv5s | ONNX Runtime | CPU | 183.68 | 100.39 | 1.83 | 37.6 | 33.1 | -| YOLOv5s | Paddle Inference | CPU | 226.36 | 152.27 | 1.48 |37.6 | 36.8 | -| YOLOv6s | TensorRT | GPU | 12.89 | 8.92 | 1.45 | 42.5 | 40.6| -| YOLOv6s | ONNX Runtime | CPU | 345.85 | 131.81 | 2.60 |42.5| 36.1| -| YOLOv6s | Paddle Inference | CPU | 366.41 | 131.70 | 2.78 |42.5| 41.2| -| YOLOv7 | TensorRT | GPU | 30.43 | 15.40 | 1.98 | 51.1| 50.8| -| YOLOv7 | ONNX Runtime | CPU | 971.27 | 471.88 | 2.06 | 51.1 | 42.5| -| YOLOv7 | Paddle Inference | CPU | 1015.70 | 562.41 | 1.82 |51.1 | 46.3| +- [YOLOv7 量化模型部署](../../examples/vision/detection/yolov7/quantize/) + +- [PadddleClas 量化模型部署](../../examples/vision/classification/paddleclas/quantize/) diff --git a/tools/quantization/configs/README.md b/tools/quantization/configs/README.md new file mode 100644 index 000000000..7bab2de34 --- /dev/null +++ b/tools/quantization/configs/README.md @@ -0,0 +1,51 @@ +# FastDeploy 量化配置文件说明 +FastDeploy 量化配置文件中,包含了全局配置,量化蒸馏训练配置,离线量化配置和训练配置. 
+用户除了直接使用FastDeploy提供在本目录的配置文件外,可以按需求自行修改相关配置文件 + +## 实例解读 + +``` +# 全局配置 +Global: + model_dir: ./yolov5s.onnx #输入模型的路径 + format: 'onnx' #输入模型的格式, paddle模型请选择'paddle' + model_filename: model.pdmodel #量化后转为paddle格式模型的模型名字 + params_filename: model.pdiparams #量化后转为paddle格式模型的参数名字 + image_path: ./COCO_val_320 #离线量化或者量化蒸馏训练使用的数据集路径 + arch: YOLOv5 #模型结构 + input_list: ['x2paddle_images'] #待量化的模型的输入名字 + preprocess: yolo_image_preprocess #模型量化时,对数据做的预处理函数, 用户可以在 ../fdquant/dataset.py 中修改或自行编写新的预处理函数 + +#量化蒸馏训练配置 +Distillation: + alpha: 1.0 #蒸馏loss所占权重 + loss: soft_label #蒸馏loss算法 + +Quantization: + onnx_format: true #是否采用ONNX量化标准格式, 要在FastDeploy上部署, 必须选true + use_pact: true #量化训练是否使用PACT方法 + activation_quantize_type: 'moving_average_abs_max' #激活量化方式 + quantize_op_types: #需要进行量化的OP + - conv2d + - depthwise_conv2d + +#离线量化配置 +PTQ: + calibration_method: 'avg' #离线量化的激活校准算法, 可选: avg, abs_max, hist, KL, mse, emd + skip_tensor_list: None #用户可指定跳过某些conv层,不进行量化 + +#训练参数配置 +TrainConfig: + train_iter: 3000 + learning_rate: 0.00001 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 + target_metric: 0.365 + +``` +## 更多详细配置方法 + +FastDeploy一键量化功能由PaddeSlim助力, 更详细的量化配置方法请参考: +[自动化压缩超参详细教程](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/hyperparameter_tutorial.md) diff --git a/tools/quantization/configs/detection/ppyoloe_l_quant.yaml b/tools/quantization/configs/detection/ppyoloe_l_quant.yaml new file mode 100644 index 000000000..43cbab4f9 --- /dev/null +++ b/tools/quantization/configs/detection/ppyoloe_l_quant.yaml @@ -0,0 +1,37 @@ +Global: + model_dir: ./ppyoloe_crn_l_300e_coco + format: paddle + model_filename: model.pdmodel + params_filename: model.pdiparams + image_path: ./COCO_val_320 + arch: PPYOLOE + input_list: ['image','scale_factor'] + preprocess: ppdet_image_preprocess + +Distillation: + alpha: 1.0 + loss: soft_label + +Quantization: + onnx_format: true + use_pact: true + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - depthwise_conv2d + + +PTQ: + calibration_method: 'avg' # option: avg, abs_max, hist, KL, mse + skip_tensor_list: None + +TrainConfig: + train_iter: 5000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00003 + T_max: 6000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 diff --git a/tools/quantization/configs/readme.md b/tools/quantization/configs/readme.md deleted file mode 100644 index 782584815..000000000 --- a/tools/quantization/configs/readme.md +++ /dev/null @@ -1,48 +0,0 @@ -# FastDeploy 量化配置文件说明 -FastDeploy 量化配置文件中,包含了全局配置,量化蒸馏训练配置,离线量化配置和训练配置. -用户除了直接使用FastDeploy提供在本目录的配置文件外,可以按需求自行修改相关配置文件 - -## 实例解读 - -``` -#全局信息 -Global: - model_dir: ./yolov7-tiny.onnx #输入模型路径 - format: 'onnx' #输入模型格式,选项为 onnx 或者 paddle - model_filename: model.pdmodel #paddle模型的模型文件名 - params_filename: model.pdiparams #paddle模型的参数文件名 - image_path: ./COCO_val_320 #PTQ所有的Calibration数据集或者量化训练所用的训练集 - arch: YOLOv7 #模型系列 - -#量化蒸馏训练中的蒸馏参数设置 -Distillation: - alpha: 1.0 - loss: soft_label - -#量化蒸馏训练中的量化参数设置 -Quantization: - onnx_format: true - activation_quantize_type: 'moving_average_abs_max' - quantize_op_types: - - conv2d - - depthwise_conv2d - -#离线量化参数配置 -PTQ: - calibration_method: 'avg' #Calibraion算法,可选为 avg, abs_max, hist, KL, mse - skip_tensor_list: None #不进行离线量化的tensor - - -#训练参数 -TrainConfig: - train_iter: 3000 - learning_rate: - type: CosineAnnealingDecay - learning_rate: 0.00003 - T_max: 8000 - optimizer_builder: - optimizer: - type: SGD - weight_decay: 0.00004 - -```