mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2025-10-06 00:57:33 +08:00
Add Examples to deploy quantized models (#342)
* Add PaddleOCR Support * Add PaddleOCR Support * Add PaddleOCRv3 Support * Add PaddleOCRv3 Support * Update README.md * Update README.md * Update README.md * Update README.md * Add PaddleOCRv3 Support * Add PaddleOCRv3 Supports * Add PaddleOCRv3 Suport * Fix Rec diff * Remove useless functions * Remove useless comments * Add PaddleOCRv2 Support * Add PaddleOCRv3 & PaddleOCRv2 Support * remove useless parameters * Add utils of sorting det boxes * Fix code naming convention * Fix code naming convention * Fix code naming convention * Fix bug in the Classify process * Imporve OCR Readme * Fix diff in Cls model * Update Model Download Link in Readme * Fix diff in PPOCRv2 * Improve OCR readme * Imporve OCR readme * Improve OCR readme * Improve OCR readme * Imporve OCR readme * Improve OCR readme * Fix conflict * Add readme for OCRResult * Improve OCR readme * Add OCRResult readme * Improve OCR readme * Improve OCR readme * Add Model Quantization Demo * Fix Model Quantization Readme * Fix Model Quantization Readme * Add the function to do PTQ quantization * Improve quant tools readme * Improve quant tool readme * Improve quant tool readme * Add PaddleInference-GPU for OCR Rec model * Add QAT method to fastdeploy-quantization tool * Remove examples/slim for now * Move configs folder * Add Quantization Support for Classification Model * Imporve ways of importing preprocess * Upload YOLO Benchmark on readme * Upload YOLO Benchmark on readme * Upload YOLO Benchmark on readme * Improve Quantization configs and readme * Add support for multi-inputs model * Add backends and params file for YOLOv7 * Add quantized model deployment support for YOLO series * Fix YOLOv5 quantize readme * Fix YOLO quantize readme * Fix YOLO quantize readme * Improve quantize YOLO readme * Improve quantize YOLO readme * Improve quantize YOLO readme * Improve quantize YOLO readme * Improve quantize YOLO readme * Fix bug, change Fronted to ModelFormat * Change Fronted to ModelFormat * Add examples to deploy quantized paddleclas models * Fix readme * Add quantize Readme * Add quantize Readme * Add quantize Readme * Modify readme of quantization tools * Modify readme of quantization tools * Improve quantization tools readme * Improve quantization readme * Improve PaddleClas quantized model deployment readme * Add PPYOLOE-l quantized deployment examples * Improve quantization tools readme
@@ -1,11 +1,67 @@
[English](../en/quantize.md) | Simplified Chinese

# Quantization Acceleration

Quantization is a popular model-compression technique: a quantized model has a smaller footprint and faster inference speed.

Building on PaddleSlim, FastDeploy integrates a one-click model quantization tool, and it can also deploy the resulting quantized models, helping users accelerate inference.

## Quantized Model Deployment Across FastDeploy Backends and Hardware

Several inference backends in FastDeploy currently support deploying quantized models on different hardware. The support matrix is:

| Hardware / Inference backend | ONNX Runtime | Paddle Inference | TensorRT |
| :--------------------------- | :----------- | :--------------- | :------- |
| CPU                          | Supported    | Supported        |          |
| GPU                          |              |                  | Supported |
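
In the Python API the backend is chosen on a `RuntimeOption` before the model is created. The snippet below is a minimal sketch that mirrors the backend selection used by the example scripts added in this commit; it assumes the FastDeploy Python wheel is installed:

```python
import fastdeploy as fd

# Quantized model on CPU: ONNX Runtime (Paddle Inference is the other CPU option)
cpu_option = fd.RuntimeOption()
cpu_option.use_ort_backend()        # or cpu_option.use_paddle_backend()
cpu_option.set_cpu_thread_num(1)    # the benchmarks below fix the CPU thread count to 1

# Quantized model on GPU: TensorRT
gpu_option = fd.RuntimeOption()
gpu_option.use_gpu(0)
gpu_option.use_trt_backend()
```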

## Model Quantization

### Quantization Methods

Based on PaddleSlim, FastDeploy currently offers two quantization methods: quantization-aware distillation training and post-training (offline) quantization. Quantization-aware distillation training obtains the quantized model through training, while post-training quantization requires no training at all. FastDeploy can deploy quantized models produced by either method.

The two methods compare as follows:

| Quantization method | Quantization time | Quantized model accuracy | Model size | Inference speed |
| :------------------ | :---------------- | :----------------------- | :--------- | :-------------- |
| Post-training quantization | No training needed, fast | Slightly lower than distillation training | Same for both | Same for both |
| Quantization-aware distillation training | Training required, takes longer | Small loss relative to the unquantized model | Same for both | Same for both |

### Quantizing a Model with the FastDeploy One-Click Quantization Tool

Based on PaddleSlim, FastDeploy provides a one-click model quantization tool; please follow the document below to quantize your model:

- [FastDeploy one-click model quantization](../../tools/quantization/)

Once the quantized model has been produced, it can be deployed with FastDeploy.
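
Deployment of a quantized model uses the same API as an FP32 model. A quick illustration (the directory name `yolov5s_quant` is the one used in the YOLOv5 example below; the Paddle Inference backend is one of the CPU options shown above):

```python
import os
import fastdeploy as fd
from fastdeploy import ModelFormat

model_dir = "yolov5s_quant"  # folder produced by the quantization tool or downloaded from the tables below
option = fd.RuntimeOption()
option.use_paddle_backend()  # run the quantized model on CPU with Paddle Inference

model = fd.vision.detection.YOLOv5(
    os.path.join(model_dir, "model.pdmodel"),
    os.path.join(model_dir, "model.pdiparams"),
    runtime_option=option,
    model_format=ModelFormat.PADDLE)
```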

## Quantization Examples

The quantized models currently supported by FastDeploy are listed below:

### YOLO Series

| Model | Inference backend | Deployment hardware | FP32 latency (ms) | INT8 latency (ms) | Speedup | FP32 mAP | INT8 mAP | Quantization method |
| ----- | ----------------- | ------------------- | ----------------- | ----------------- | ------- | -------- | -------- | ------------------- |
| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | TensorRT | GPU | 14.13 | 11.22 | 1.26 | 37.6 | 36.6 | Quantization-aware distillation training |
| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | ONNX Runtime | CPU | 183.68 | 100.39 | 1.83 | 37.6 | 33.1 | Quantization-aware distillation training |
| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | Paddle Inference | CPU | 226.36 | 152.27 | 1.48 | 37.6 | 36.8 | Quantization-aware distillation training |
| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | TensorRT | GPU | 12.89 | 8.92 | 1.45 | 42.5 | 40.6 | Quantization-aware distillation training |
| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | ONNX Runtime | CPU | 345.85 | 131.81 | 2.60 | 42.5 | 36.1 | Quantization-aware distillation training |
| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | Paddle Inference | CPU | 366.41 | 131.70 | 2.78 | 42.5 | 41.2 | Quantization-aware distillation training |
| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | TensorRT | GPU | 30.43 | 15.40 | 1.98 | 51.1 | 50.8 | Quantization-aware distillation training |
| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | ONNX Runtime | CPU | 971.27 | 471.88 | 2.06 | 51.1 | 42.5 | Quantization-aware distillation training |
| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | Paddle Inference | CPU | 1015.70 | 562.41 | 1.82 | 51.1 | 46.3 | Quantization-aware distillation training |

The numbers above are end-to-end inference performance of FastDeploy deployments before and after quantization.
- Test data are images from the COCO2017 validation set.
- Latency is the average end-to-end latency (including pre- and post-processing), in milliseconds.
- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT version 8.4.15; the CPU thread count is fixed to 1 in all tests.
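
The speedup column is simply the FP32 latency divided by the INT8 latency; for the first row (YOLOv5s, TensorRT, GPU):

```python
fp32_latency_ms = 14.13  # from the table above
int8_latency_ms = 11.22
print(round(fp32_latency_ms / int8_latency_ms, 2))  # 1.26, matching the speedup column
```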

### PaddleClas Series

| Model | Inference backend | Deployment hardware | FP32 latency (ms) | INT8 latency (ms) | Speedup | FP32 Top-1 | INT8 Top-1 | Quantization method |
| ----- | ----------------- | ------------------- | ----------------- | ----------------- | ------- | ---------- | ---------- | ------------------- |
| [ResNet50_vd](../../examples/vision/classification/paddleclas/quantize/) | ONNX Runtime | CPU | 86.87 | 59.32 | 1.46 | 79.12 | 78.87 | Post-training quantization |
| [ResNet50_vd](../../examples/vision/classification/paddleclas/quantize/) | TensorRT | GPU | 7.85 | 5.42 | 1.45 | 79.12 | 79.06 | Post-training quantization |
| [MobileNetV1_ssld](../../examples/vision/classification/paddleclas/quantize/) | ONNX Runtime | CPU | 40.32 | 16.87 | 2.39 | 77.89 | 75.09 | Post-training quantization |
| [MobileNetV1_ssld](../../examples/vision/classification/paddleclas/quantize/) | TensorRT | GPU | 5.10 | 3.35 | 1.52 | 77.89 | 76.86 | Post-training quantization |

The numbers above are end-to-end inference performance of FastDeploy deployments before and after quantization.
- Test data are images from the ImageNet-2012 validation set.
- Latency is the average end-to-end latency (including pre- and post-processing), in milliseconds.
- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT version 8.4.15; the CPU thread count is fixed to 1 in all tests.
examples/vision/classification/paddleclas/quantize/README.md (new file, 27 lines)

# PaddleClas Quantized Model Deployment

FastDeploy supports deploying quantized models and provides a one-click model quantization tool.
Users can quantize a model themselves with the one-click tool and then deploy it, or directly download the quantized models provided by FastDeploy.

## FastDeploy One-Click Model Quantization Tool

FastDeploy provides a one-click quantization tool that quantizes a model from nothing more than a configuration file.
For a detailed tutorial, see: [One-click model quantization tool](../../../../../tools/quantization/)

Note: inference with a quantized classification model still requires the inference_cls.yaml file from the FP32 model folder. A self-quantized model folder does not contain this yaml file, so copy it from the FP32 model folder into the quantized model folder.
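
A minimal sketch of that copy step (the folder names `ResNet50_vd_infer` and `resnet50_vd_quant` are illustrative, not part of this example):

```python
import shutil

# Copy the preprocessing config from the FP32 model folder into the quantized one
shutil.copy("ResNet50_vd_infer/inference_cls.yaml",
            "resnet50_vd_quant/inference_cls.yaml")
```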

## Download Quantized PaddleClas Models

Alternatively, users can directly download the quantized models in the table below for deployment.

| Model | Inference backend | Deployment hardware | FP32 latency (ms) | INT8 latency (ms) | Speedup | FP32 Top-1 | INT8 Top-1 | Quantization method |
| ----- | ----------------- | ------------------- | ----------------- | ----------------- | ------- | ---------- | ---------- | ------------------- |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | ONNX Runtime | CPU | 86.87 | 59.32 | 1.46 | 79.12 | 78.87 | Post-training quantization |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | TensorRT | GPU | 7.85 | 5.42 | 1.45 | 79.12 | 79.06 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | ONNX Runtime | CPU | 40.32 | 16.87 | 2.39 | 77.89 | 75.09 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | TensorRT | GPU | 5.10 | 3.35 | 1.52 | 77.89 | 76.86 | Post-training quantization |

The numbers above are end-to-end inference performance of FastDeploy deployments before and after quantization.
- Test images are from the ImageNet-2012 validation set.
- Latency is the average end-to-end latency (including pre- and post-processing), in milliseconds.
- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT version 8.4.15; the CPU thread count is fixed to 1 in all tests.

## Detailed Deployment Documents

- [Python deployment](python)
- [C++ deployment](cpp)
@@ -0,0 +1,14 @@

PROJECT(infer_demo C CXX)
CMAKE_MINIMUM_REQUIRED (VERSION 3.12)

# Path to the downloaded and extracted FastDeploy SDK
option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")

include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)

# Add the FastDeploy header directories
include_directories(${FASTDEPLOY_INCS})

add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc)
# Link against the FastDeploy libraries
target_link_libraries(infer_demo ${FASTDEPLOY_LIBS})
@@ -0,0 +1,33 @@

# PaddleClas Quantized Model C++ Deployment Example

`infer.cc` in this directory helps users quickly deploy PaddleClas quantized models on CPU/GPU with accelerated inference.

## Deployment Preparation

### FastDeploy Environment Preparation
- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../../../../docs/environment.md)
- 2. Install the FastDeploy Python wheel; see [FastDeploy Python installation](../../../../../../docs/quick_start)

### Quantized Model Preparation
- 1. Users can directly deploy the quantized models provided by FastDeploy.
- 2. Users can quantize a model themselves with FastDeploy's [one-click model quantization tool](../../../../../../tools/quantization/) and deploy the resulting quantized model. (Note: inference with a quantized classification model still requires the inference_cls.yaml file from the FP32 model folder; a self-quantized model folder does not contain this yaml file, so copy it from the FP32 model folder into the quantized model folder.)

## Deployment Example Using the Quantized ResNet50_Vd Model

Run the following commands in this directory to build and deploy the quantized model.
```bash
mkdir build
cd build
wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz
tar xvf fastdeploy-linux-x64-0.2.0.tgz
cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0
make -j

# Download the ResNet50_Vd quantized model files and a test image provided by FastDeploy
wget https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar
tar -xvf resnet50_vd_ptq.tar
wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg

# Run the quantized model on CPU with ONNX Runtime
./infer_demo resnet50_vd_ptq ILSVRC2012_val_00000010.jpeg 0
# Run the quantized model on GPU with TensorRT
./infer_demo resnet50_vd_ptq ILSVRC2012_val_00000010.jpeg 1
```
@@ -0,0 +1,76 @@

// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "fastdeploy/vision.h"
#ifdef WIN32
const char sep = '\\';
#else
const char sep = '/';
#endif

void InitAndInfer(const std::string& model_dir, const std::string& image_file,
                  const fastdeploy::RuntimeOption& option) {
  auto model_file = model_dir + sep + "inference.pdmodel";
  auto params_file = model_dir + sep + "inference.pdiparams";
  auto config_file = model_dir + sep + "inference_cls.yaml";

  auto model = fastdeploy::vision::classification::PaddleClasModel(
      model_file, params_file, config_file, option);

  assert(model.Initialized());

  auto im = cv::imread(image_file);
  auto im_bak = im.clone();

  fastdeploy::vision::ClassifyResult res;
  if (!model.Predict(&im, &res)) {
    std::cerr << "Failed to predict." << std::endl;
    return;
  }

  std::cout << res.Str() << std::endl;
}

int main(int argc, char* argv[]) {
  if (argc < 4) {
    std::cout << "Usage: infer_demo path/to/quant_model "
                 "path/to/image "
                 "run_option, "
                 "e.g ./infer_demo ./ResNet50_vd_quant ./test.jpeg 0"
              << std::endl;
    std::cout << "The data type of run_option is int, 0: run on cpu with ORT "
                 "backend; 1: run "
                 "on gpu with TensorRT backend. "
              << std::endl;
    return -1;
  }

  fastdeploy::RuntimeOption option;
  int flag = std::atoi(argv[3]);

  if (flag == 0) {
    option.UseCpu();
    option.UseOrtBackend();
  } else if (flag == 1) {
    option.UseGpu();
    option.UseTrtBackend();
    option.SetTrtInputShape("inputs", {1, 3, 224, 224});
  }

  std::string model_dir = argv[1];
  std::string test_image = argv[2];
  InitAndInfer(model_dir, test_image, option);
  return 0;
}
@@ -0,0 +1,29 @@

# PaddleClas Quantized Model Python Deployment Example

`infer.py` in this directory helps users quickly deploy PaddleClas quantized models on CPU/GPU with accelerated inference.

## Deployment Preparation

### FastDeploy Environment Preparation
- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../../../../docs/environment.md)
- 2. Install the FastDeploy Python wheel; see [FastDeploy Python installation](../../../../../../docs/quick_start)

### Quantized Model Preparation
- 1. Users can directly deploy the quantized models provided by FastDeploy.
- 2. Users can quantize a model themselves with FastDeploy's [one-click model quantization tool](../../../../../../tools/quantization/) and deploy the resulting quantized model. (Note: inference with a quantized classification model still requires the inference_cls.yaml file from the FP32 model folder; a self-quantized model folder does not contain this yaml file, so copy it from the FP32 model folder into the quantized model folder.)

## Deployment Example Using the Quantized ResNet50_Vd Model

```bash
# Download the deployment example code
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/vision/classification/paddleclas/quantize/python

# Download the ResNet50_Vd quantized model files and a test image provided by FastDeploy
wget https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar
tar -xvf resnet50_vd_ptq.tar
wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg

# Run the quantized model on CPU with ONNX Runtime
python infer.py --model resnet50_vd_ptq --image ILSVRC2012_val_00000010.jpeg --device cpu --backend ort
# Run the quantized model on GPU with TensorRT
python infer.py --model resnet50_vd_ptq --image ILSVRC2012_val_00000010.jpeg --device gpu --backend trt
```
@@ -0,0 +1,77 @@

import fastdeploy as fd
import cv2
import os
import time


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model", required=True, help="Path of paddleclas model.")
    parser.add_argument(
        "--image", required=True, help="Path of test image file.")
    parser.add_argument(
        "--device",
        type=str,
        default='cpu',
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default="default",
        help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu"
    )
    parser.add_argument(
        "--device_id",
        type=int,
        default=0,
        help="Define which GPU card used to run model.")
    parser.add_argument(
        "--cpu_thread_num",
        type=int,
        default=9,
        help="Number of threads while inference on CPU.")
    return parser.parse_args()


def build_option(args):
    option = fd.RuntimeOption()
    if args.device.lower() == "gpu":
        option.use_gpu(0)

    option.set_cpu_thread_num(args.cpu_thread_num)

    if args.backend.lower() == "trt":
        assert args.device.lower(
        ) == "gpu", "TensorRT backend require inferences on device GPU."
        option.use_trt_backend()
        option.set_trt_input_shape("inputs", min_shape=[1, 3, 224, 224])
    elif args.backend.lower() == "ort":
        option.use_ort_backend()
    elif args.backend.lower() == "paddle":
        option.use_paddle_backend()
    elif args.backend.lower() == "openvino":
        assert args.device.lower(
        ) == "cpu", "OpenVINO backend require inference on device CPU."
        option.use_openvino_backend()
    return option


args = parse_arguments()

# Configure the runtime and load the model
runtime_option = build_option(args)

model_file = os.path.join(args.model, "inference.pdmodel")
params_file = os.path.join(args.model, "inference.pdiparams")
config_file = os.path.join(args.model, "inference_cls.yaml")

model = fd.vision.classification.PaddleClasModel(
    model_file, params_file, config_file, runtime_option=runtime_option)

# Predict the classification result for the image
im = cv2.imread(args.image)
result = model.predict(im.copy())
print(result)
examples/vision/detection/paddledetection/quantize/README.md (new file, 24 lines)

# PaddleDetection Quantized Model Deployment

FastDeploy supports deploying quantized models and provides a one-click model quantization tool.
Users can quantize a model themselves with the one-click tool and then deploy it, or directly download the quantized models provided by FastDeploy.

## FastDeploy One-Click Model Quantization Tool

FastDeploy provides a one-click quantization tool that quantizes a model from nothing more than a configuration file.
For a detailed tutorial, see: [One-click model quantization tool](../../../../../tools/quantization/)

## Download the Quantized PP-YOLOE-l Model

Alternatively, users can directly download the quantized models in the table below for deployment.

| Model | Inference backend | Deployment hardware | FP32 latency (ms) | INT8 latency (ms) | Speedup | FP32 mAP | INT8 mAP | Quantization method |
| ----- | ----------------- | ------------------- | ----------------- | ----------------- | ------- | -------- | -------- | ------------------- |
| [ppyoloe_crn_l_300e_coco](https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar) | TensorRT | GPU | 43.83 | 31.57 | 1.39 | 51.4 | 50.7 | Quantization-aware distillation training |
| [ppyoloe_crn_l_300e_coco](https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar) | ONNX Runtime | CPU | 1085.18 | 475.55 | 2.29 | 51.4 | 50.0 | Quantization-aware distillation training |

The numbers above are end-to-end inference performance of FastDeploy deployments before and after quantization.
- Test images are from COCO val2017.
- Latency is the average end-to-end latency (including pre- and post-processing), in milliseconds.
- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT version 8.4.15; the CPU thread count is fixed to 1 in all tests.

## Detailed Deployment Documents

- [Python deployment](python)
- [C++ deployment](cpp)
@@ -0,0 +1,13 @@

PROJECT(infer_demo C CXX)
CMAKE_MINIMUM_REQUIRED (VERSION 3.10)

# Path to the downloaded and extracted FastDeploy SDK
option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")

include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)

# Add the FastDeploy header directories
include_directories(${FASTDEPLOY_INCS})

add_executable(infer_ppyoloe_demo ${PROJECT_SOURCE_DIR}/infer_ppyoloe.cc)
target_link_libraries(infer_ppyoloe_demo ${FASTDEPLOY_LIBS})
@@ -0,0 +1,33 @@

# PP-YOLOE-l Quantized Model C++ Deployment Example

`infer_ppyoloe.cc` in this directory helps users quickly deploy the quantized PP-YOLOE-l model on CPU/GPU with accelerated inference.

## Deployment Preparation

### FastDeploy Environment Preparation
- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../../../../docs/environment.md)
- 2. Install the FastDeploy Python wheel; see [FastDeploy Python installation](../../../../../../docs/quick_start)

### Quantized Model Preparation
- 1. Users can directly deploy the quantized models provided by FastDeploy.
- 2. Users can quantize a model themselves with FastDeploy's [one-click model quantization tool](../../../../../../tools/quantization/) and deploy the resulting quantized model. (Note: inference with a quantized detection model still requires the infer_cfg.yml file from the FP32 model folder; a self-quantized model folder does not contain this yaml file, so copy it from the FP32 model folder into the quantized model folder; see the sketch right after this list.)
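
A minimal sketch of that copy step (both folder names are hypothetical; use your own FP32 export and quantized output):

```python
import shutil

# Copy the inference config from the FP32 model folder into the quantized one
shutil.copy("ppyoloe_crn_l_300e_coco/infer_cfg.yml",
            "ppyoloe_crn_l_300e_coco_quant/infer_cfg.yml")
```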

## Deployment Example Using the Quantized PP-YOLOE-l Model

Run the following commands in this directory to build and deploy the quantized model.
```bash
mkdir build
cd build
wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz
tar xvf fastdeploy-linux-x64-0.2.0.tgz
cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0
make -j

# Download the ppyoloe_crn_l_300e_coco quantized model files and a test image provided by FastDeploy
wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar
tar -xvf ppyoloe_crn_l_300e_coco_qat.tar
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

# Run the quantized model on CPU with ONNX Runtime
./infer_ppyoloe_demo ppyoloe_crn_l_300e_coco_qat 000000014439.jpg 0
# Run the quantized model on GPU with TensorRT
./infer_ppyoloe_demo ppyoloe_crn_l_300e_coco_qat 000000014439.jpg 1
```
@@ -0,0 +1,80 @@

// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "fastdeploy/vision.h"
#ifdef WIN32
const char sep = '\\';
#else
const char sep = '/';
#endif

void InitAndInfer(const std::string& model_dir, const std::string& image_file,
                  const fastdeploy::RuntimeOption& option) {
  auto model_file = model_dir + sep + "model.pdmodel";
  auto params_file = model_dir + sep + "model.pdiparams";
  auto config_file = model_dir + sep + "infer_cfg.yml";

  auto model = fastdeploy::vision::detection::PPYOLOE(model_file, params_file,
                                                      config_file, option);
  assert(model.Initialized());

  auto im = cv::imread(image_file);
  auto im_bak = im.clone();

  fastdeploy::vision::DetectionResult res;
  if (!model.Predict(&im, &res)) {
    std::cerr << "Failed to predict." << std::endl;
    return;
  }

  std::cout << res.Str() << std::endl;

  auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res, 0.5);
  cv::imwrite("vis_result.jpg", vis_im);
  std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
}

int main(int argc, char* argv[]) {
  if (argc < 4) {
    std::cout << "Usage: infer_demo path/to/quant_model "
                 "path/to/image "
                 "run_option, "
                 "e.g ./infer_demo ./PPYOLOE_L_quant ./test.jpeg 0"
              << std::endl;
    std::cout << "The data type of run_option is int, 0: run on cpu with ORT "
                 "backend; 1: run "
                 "on gpu with TensorRT backend. "
              << std::endl;
    return -1;
  }

  fastdeploy::RuntimeOption option;
  int flag = std::atoi(argv[3]);

  if (flag == 0) {
    option.UseCpu();
    option.UseOrtBackend();
  } else if (flag == 1) {
    option.UseGpu();
    option.UseTrtBackend();
    option.SetTrtInputShape("inputs", {1, 3, 640, 640});
    option.SetTrtInputShape("scale_factor", {1, 2});
  }

  std::string model_dir = argv[1];
  std::string test_image = argv[2];
  InitAndInfer(model_dir, test_image, option);
  return 0;
}
@@ -0,0 +1,29 @@

# PP-YOLOE-l Quantized Model Python Deployment Example

`infer_ppyoloe.py` in this directory helps users quickly deploy the quantized PP-YOLOE model on CPU/GPU with accelerated inference.

## Deployment Preparation

### FastDeploy Environment Preparation
- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../../../../docs/environment.md)
- 2. Install the FastDeploy Python wheel; see [FastDeploy Python installation](../../../../../../docs/quick_start)

### Quantized Model Preparation
- 1. Users can directly deploy the quantized models provided by FastDeploy.
- 2. Users can quantize a model themselves with FastDeploy's [one-click model quantization tool](../../../../../../tools/quantization/) and deploy the resulting quantized model. (Note: inference with a quantized detection model still requires the infer_cfg.yml file from the FP32 model folder; a self-quantized model folder does not contain this yaml file, so copy it from the FP32 model folder into the quantized model folder.)

## Deployment Example Using the Quantized PP-YOLOE-l Model

```bash
# Download the deployment example code
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/vision/detection/paddledetection/quantize/python

# Download the ppyoloe_crn_l_300e_coco quantized model files and a test image provided by FastDeploy
wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar
tar -xvf ppyoloe_crn_l_300e_coco_qat.tar
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

# Run the quantized model on CPU with ONNX Runtime
python infer_ppyoloe.py --model ppyoloe_crn_l_300e_coco_qat --image 000000014439.jpg --device cpu --backend ort
# Run the quantized model on GPU with TensorRT
python infer_ppyoloe.py --model ppyoloe_crn_l_300e_coco_qat --image 000000014439.jpg --device gpu --backend trt
```
@@ -0,0 +1,82 @@

import fastdeploy as fd
import cv2
import os


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model", required=True, help="Path of PPYOLOE model.")
    parser.add_argument(
        "--image", required=True, help="Path of test image file.")
    parser.add_argument(
        "--device",
        type=str,
        default='cpu',
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default="default",
        help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu"
    )
    parser.add_argument(
        "--device_id",
        type=int,
        default=0,
        help="Define which GPU card used to run model.")
    parser.add_argument(
        "--cpu_thread_num",
        type=int,
        default=9,
        help="Number of threads while inference on CPU.")
    return parser.parse_args()


def build_option(args):
    option = fd.RuntimeOption()
    if args.device.lower() == "gpu":
        option.use_gpu(0)

    option.set_cpu_thread_num(args.cpu_thread_num)

    if args.backend.lower() == "trt":
        assert args.device.lower(
        ) == "gpu", "TensorRT backend require inferences on device GPU."
        option.use_trt_backend()
        option.set_trt_cache_file(os.path.join(args.model, "model.trt"))
        option.set_trt_input_shape("image", min_shape=[1, 3, 640, 640])
        option.set_trt_input_shape("scale_factor", min_shape=[1, 2])
    elif args.backend.lower() == "ort":
        option.use_ort_backend()
    elif args.backend.lower() == "paddle":
        option.use_paddle_backend()
    elif args.backend.lower() == "openvino":
        assert args.device.lower(
        ) == "cpu", "OpenVINO backend require inference on device CPU."
        option.use_openvino_backend()
    return option


args = parse_arguments()

model_file = os.path.join(args.model, "model.pdmodel")
params_file = os.path.join(args.model, "model.pdiparams")
config_file = os.path.join(args.model, "infer_cfg.yml")

# Configure the runtime and load the model
runtime_option = build_option(args)
model = fd.vision.detection.PPYOLOE(
    model_file, params_file, config_file, runtime_option=runtime_option)

# Predict the detection result for the image
im = cv2.imread(args.image)
result = model.predict(im.copy())
print(result)

# Visualize the prediction result
vis_im = fd.vision.vis_detection(im, result, score_threshold=0.5)
cv2.imwrite("visualized_result.jpg", vis_im)
print("Visualized result save in ./visualized_result.jpg")
examples/vision/detection/yolov5/quantize/README.md (new file, 24 lines)

# YOLOv5 Quantized Model Deployment

FastDeploy supports deploying quantized models and provides a one-click model quantization tool.
Users can quantize a model themselves with the one-click tool and then deploy it, or directly download the quantized models provided by FastDeploy.

## FastDeploy One-Click Model Quantization Tool

FastDeploy provides a one-click quantization tool that quantizes a model from nothing more than a configuration file.
For a detailed tutorial, see: [One-click model quantization tool](../../../../../tools/quantization/)

## Download the Quantized YOLOv5s Model

Alternatively, users can directly download the quantized models in the table below for deployment.

| Model | Inference backend | Deployment hardware | FP32 latency (ms) | INT8 latency (ms) | Speedup | FP32 mAP | INT8 mAP | Quantization method |
| ----- | ----------------- | ------------------- | ----------------- | ----------------- | ------- | -------- | -------- | ------------------- |
| [YOLOv5s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar) | TensorRT | GPU | 14.13 | 11.22 | 1.26 | 37.6 | 36.6 | Quantization-aware distillation training |
| [YOLOv5s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar) | Paddle Inference | CPU | 226.36 | 152.27 | 1.48 | 37.6 | 36.8 | Quantization-aware distillation training |

The numbers above are end-to-end inference performance of FastDeploy deployments before and after quantization.
- Test images are from COCO val2017.
- Latency is the average end-to-end latency (including pre- and post-processing), in milliseconds.
- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT version 8.4.15; the CPU thread count is fixed to 1 in all tests.

## Detailed Deployment Documents

- [Python deployment](python)
- [C++ deployment](cpp)
examples/vision/detection/yolov5/quantize/cpp/CMakeLists.txt (new file, 14 lines)

PROJECT(infer_demo C CXX)
CMAKE_MINIMUM_REQUIRED (VERSION 3.12)

# Path to the downloaded and extracted FastDeploy SDK
option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")

include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)

# Add the FastDeploy header directories
include_directories(${FASTDEPLOY_INCS})

add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc)
# Link against the FastDeploy libraries
target_link_libraries(infer_demo ${FASTDEPLOY_LIBS})
examples/vision/detection/yolov5/quantize/cpp/README.md (new file, 34 lines)

# YOLOv5 Quantized Model C++ Deployment Example

`infer.cc` in this directory helps users quickly deploy the quantized YOLOv5s model on CPU/GPU with accelerated inference.

## Deployment Preparation

### FastDeploy Environment Preparation
- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../../../../docs/environment.md)
- 2. Install the FastDeploy Python wheel; see [FastDeploy Python installation](../../../../../../docs/quick_start)

### Quantized Model Preparation
- 1. Users can directly deploy the quantized models provided by FastDeploy.
- 2. Users can quantize a model themselves with FastDeploy's [one-click model quantization tool](../../../../../../tools/quantization/) and deploy the resulting quantized model.

## Deployment Example Using the Quantized YOLOv5s Model

Run the following commands in this directory to build and deploy the quantized model.
```bash
mkdir build
cd build
wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz
tar xvf fastdeploy-linux-x64-0.2.0.tgz
cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0
make -j

# Download the yolov5s quantized model files and a test image provided by FastDeploy
wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar
tar -xvf yolov5s_quant.tar
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

# Run the quantized model on CPU with Paddle Inference
./infer_demo yolov5s_quant 000000014439.jpg 0
# Run the quantized model on GPU with TensorRT
./infer_demo yolov5s_quant 000000014439.jpg 1
```
examples/vision/detection/yolov5/quantize/cpp/infer.cc (new file, 77 lines)

// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "fastdeploy/vision.h"
#ifdef WIN32
const char sep = '\\';
#else
const char sep = '/';
#endif

void InitAndInfer(const std::string& model_dir, const std::string& image_file,
                  const fastdeploy::RuntimeOption& option) {
  auto model_file = model_dir + sep + "model.pdmodel";
  auto params_file = model_dir + sep + "model.pdiparams";

  auto model = fastdeploy::vision::detection::YOLOv5(
      model_file, params_file, option, fastdeploy::ModelFormat::PADDLE);
  assert(model.Initialized());

  auto im = cv::imread(image_file);
  auto im_bak = im.clone();

  fastdeploy::vision::DetectionResult res;
  if (!model.Predict(&im, &res)) {
    std::cerr << "Failed to predict." << std::endl;
    return;
  }

  std::cout << res.Str() << std::endl;

  auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res);
  cv::imwrite("vis_result.jpg", vis_im);
  std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
}

int main(int argc, char* argv[]) {
  if (argc < 4) {
    std::cout << "Usage: infer_demo path/to/quant_model "
                 "path/to/image "
                 "run_option, "
                 "e.g ./infer_demo ./yolov5s_quant ./000000014439.jpg 0"
              << std::endl;
    std::cout << "The data type of run_option is int, 0: run on cpu with "
                 "Paddle Inference backend; 1: run on gpu with TensorRT backend."
              << std::endl;
    return -1;
  }

  fastdeploy::RuntimeOption option;
  int flag = std::atoi(argv[3]);

  if (flag == 0) {
    option.UseCpu();
    option.UsePaddleBackend();
  } else if (flag == 1) {
    option.UseGpu();
    option.UseTrtBackend();
  }

  std::string model_dir = argv[1];
  std::string test_image = argv[2];
  InitAndInfer(model_dir, test_image, option);
  return 0;
}
examples/vision/detection/yolov5/quantize/python/README.md (new file, 29 lines)

# YOLOv5s Quantized Model Python Deployment Example

`infer.py` in this directory helps users quickly deploy the quantized YOLOv5 model on CPU/GPU with accelerated inference.

## Deployment Preparation

### FastDeploy Environment Preparation
- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../../../../docs/environment.md)
- 2. Install the FastDeploy Python wheel; see [FastDeploy Python installation](../../../../../../docs/quick_start)

### Quantized Model Preparation
- 1. Users can directly deploy the quantized models provided by FastDeploy.
- 2. Users can quantize a model themselves with FastDeploy's [one-click model quantization tool](../../../../../../tools/quantization/) and deploy the resulting quantized model.

## Deployment Example Using the Quantized YOLOv5s Model

```bash
# Download the deployment example code
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/vision/detection/yolov5/quantize/python

# Download the yolov5s quantized model files and a test image provided by FastDeploy
wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar
tar -xvf yolov5s_quant.tar
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

# Run the quantized model on CPU with Paddle Inference
python infer.py --model yolov5s_quant --image 000000014439.jpg --device cpu --backend paddle
# Run the quantized model on GPU with TensorRT
python infer.py --model yolov5s_quant --image 000000014439.jpg --device gpu --backend trt
```
examples/vision/detection/yolov5/quantize/python/infer.py (new file, 81 lines)

import fastdeploy as fd
import cv2
import os
from fastdeploy import ModelFormat


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model", required=True, help="Path of yolov5 paddle model directory.")
    parser.add_argument(
        "--image", required=True, help="Path of test image file.")
    parser.add_argument(
        "--device",
        type=str,
        default='cpu',
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default="default",
        help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu"
    )
    parser.add_argument(
        "--device_id",
        type=int,
        default=0,
        help="Define which GPU card used to run model.")
    parser.add_argument(
        "--cpu_thread_num",
        type=int,
        default=9,
        help="Number of threads while inference on CPU.")
    return parser.parse_args()


def build_option(args):
    option = fd.RuntimeOption()
    if args.device.lower() == "gpu":
        option.use_gpu(0)

    option.set_cpu_thread_num(args.cpu_thread_num)

    if args.backend.lower() == "trt":
        assert args.device.lower(
        ) == "gpu", "TensorRT backend require inference on device GPU."
        option.use_trt_backend()
    elif args.backend.lower() == "ort":
        option.use_ort_backend()
    elif args.backend.lower() == "paddle":
        option.use_paddle_backend()
    elif args.backend.lower() == "openvino":
        assert args.device.lower(
        ) == "cpu", "OpenVINO backend require inference on device CPU."
        option.use_openvino_backend()
    return option


args = parse_arguments()

model_file = os.path.join(args.model, "model.pdmodel")
params_file = os.path.join(args.model, "model.pdiparams")
# Configure the runtime and load the model
runtime_option = build_option(args)
model = fd.vision.detection.YOLOv5(
    model_file,
    params_file,
    runtime_option=runtime_option,
    model_format=ModelFormat.PADDLE)

# Predict the detection result for the image
im = cv2.imread(args.image)
result = model.predict(im.copy())
print(result)

# Visualize the prediction result
vis_im = fd.vision.vis_detection(im, result)
cv2.imwrite("visualized_result.jpg", vis_im)
print("Visualized result save in ./visualized_result.jpg")
examples/vision/detection/yolov6/quantize/README.md (new file, 25 lines)

# YOLOv6 Quantized Model Deployment

FastDeploy supports deploying quantized models and provides a one-click model quantization tool.
Users can quantize a model themselves with the one-click tool and then deploy it, or directly download the quantized models provided by FastDeploy.

## FastDeploy One-Click Model Quantization Tool

FastDeploy provides a one-click quantization tool that quantizes a model from nothing more than a configuration file.
For a detailed tutorial, see: [One-click model quantization tool](../../../../../tools/quantization/)

## Download the Quantized YOLOv6s Model

Alternatively, users can directly download the quantized models in the table below for deployment.

| Model | Inference backend | Deployment hardware | FP32 latency (ms) | INT8 latency (ms) | Speedup | FP32 mAP | INT8 mAP | Quantization method |
| ----- | ----------------- | ------------------- | ----------------- | ----------------- | ------- | -------- | -------- | ------------------- |
| [YOLOv6s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar) | TensorRT | GPU | 12.89 | 8.92 | 1.45 | 42.5 | 40.6 | Quantization-aware distillation training |
| [YOLOv6s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar) | Paddle Inference | CPU | 366.41 | 131.70 | 2.78 | 42.5 | 41.2 | Quantization-aware distillation training |

The numbers above are end-to-end inference performance of FastDeploy deployments before and after quantization.
- Test images are from COCO val2017.
- Latency is the average end-to-end latency (including pre- and post-processing), in milliseconds.
- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT version 8.4.15; the CPU thread count is fixed to 1 in all tests.

## Detailed Deployment Documents

- [Python deployment](python)
- [C++ deployment](cpp)
examples/vision/detection/yolov6/quantize/cpp/CMakeLists.txt (new file, 14 lines)

PROJECT(infer_demo C CXX)
CMAKE_MINIMUM_REQUIRED (VERSION 3.12)

# Path to the downloaded and extracted FastDeploy SDK
option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")

include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)

# Add the FastDeploy header directories
include_directories(${FASTDEPLOY_INCS})

add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc)
# Link against the FastDeploy libraries
target_link_libraries(infer_demo ${FASTDEPLOY_LIBS})
examples/vision/detection/yolov6/quantize/cpp/README.md (new file, 34 lines)

# YOLOv6 Quantized Model C++ Deployment Example

`infer.cc` in this directory helps users quickly deploy the quantized YOLOv6s model on CPU/GPU with accelerated inference.

## Deployment Preparation

### FastDeploy Environment Preparation
- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../../../../docs/environment.md)
- 2. Install the FastDeploy Python wheel; see [FastDeploy Python installation](../../../../../../docs/quick_start)

### Quantized Model Preparation
- 1. Users can directly deploy the quantized models provided by FastDeploy.
- 2. Users can quantize a model themselves with FastDeploy's [one-click model quantization tool](../../../../../../tools/quantization/) and deploy the resulting quantized model.

## Deployment Example Using the Quantized YOLOv6s Model

Run the following commands in this directory to build and deploy the quantized model.
```bash
mkdir build
cd build
wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz
tar xvf fastdeploy-linux-x64-0.2.0.tgz
cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0
make -j

# Download the yolov6s quantized model files and a test image provided by FastDeploy
wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar
tar -xvf yolov6s_quant.tar
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

# Run the quantized model on CPU with Paddle Inference
./infer_demo yolov6s_quant 000000014439.jpg 0
# Run the quantized model on GPU with TensorRT
./infer_demo yolov6s_quant 000000014439.jpg 1
```
examples/vision/detection/yolov6/quantize/cpp/infer.cc (new file, 77 lines)

// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "fastdeploy/vision.h"
#ifdef WIN32
const char sep = '\\';
#else
const char sep = '/';
#endif

void InitAndInfer(const std::string& model_dir, const std::string& image_file,
                  const fastdeploy::RuntimeOption& option) {
  auto model_file = model_dir + sep + "model.pdmodel";
  auto params_file = model_dir + sep + "model.pdiparams";

  auto model = fastdeploy::vision::detection::YOLOv6(
      model_file, params_file, option, fastdeploy::ModelFormat::PADDLE);
  assert(model.Initialized());

  auto im = cv::imread(image_file);
  auto im_bak = im.clone();

  fastdeploy::vision::DetectionResult res;
  if (!model.Predict(&im, &res)) {
    std::cerr << "Failed to predict." << std::endl;
    return;
  }

  std::cout << res.Str() << std::endl;

  auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res);
  cv::imwrite("vis_result.jpg", vis_im);
  std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
}

int main(int argc, char* argv[]) {
  if (argc < 4) {
    std::cout << "Usage: infer_demo path/to/quant_model "
                 "path/to/image "
                 "run_option, "
                 "e.g ./infer_demo ./yolov6s_quant ./000000014439.jpg 0"
              << std::endl;
    std::cout << "The data type of run_option is int, 0: run on cpu with "
                 "Paddle Inference backend; 1: run on gpu with TensorRT backend."
              << std::endl;
    return -1;
  }

  fastdeploy::RuntimeOption option;
  int flag = std::atoi(argv[3]);

  if (flag == 0) {
    option.UseCpu();
    option.UsePaddleBackend();
  } else if (flag == 1) {
    option.UseGpu();
    option.UseTrtBackend();
  }

  std::string model_dir = argv[1];
  std::string test_image = argv[2];
  InitAndInfer(model_dir, test_image, option);
  return 0;
}
examples/vision/detection/yolov6/quantize/python/README.md (new file, 28 lines)

# YOLOv6 Quantized Model Python Deployment Example

`infer.py` in this directory helps users quickly deploy the quantized YOLOv6 model on CPU/GPU with accelerated inference.

## Deployment Preparation

### FastDeploy Environment Preparation
- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../../../../docs/environment.md)
- 2. Install the FastDeploy Python wheel; see [FastDeploy Python installation](../../../../../../docs/quick_start)

### Quantized Model Preparation
- 1. Users can directly deploy the quantized models provided by FastDeploy.
- 2. Users can quantize a model themselves with FastDeploy's [one-click model quantization tool](../../../../../../tools/quantization/) and deploy the resulting quantized model.

## Deployment Example Using the Quantized YOLOv6s Model

```bash
# Download the deployment example code
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/vision/detection/yolov6/quantize/python

# Download the yolov6s quantized model files and a test image provided by FastDeploy
wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar
tar -xvf yolov6s_quant.tar
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

# Run the quantized model on CPU with Paddle Inference
python infer.py --model yolov6s_quant --image 000000014439.jpg --device cpu --backend paddle
# Run the quantized model on GPU with TensorRT
python infer.py --model yolov6s_quant --image 000000014439.jpg --device gpu --backend trt
```
examples/vision/detection/yolov6/quantize/python/infer.py (new file, 81 lines)

import fastdeploy as fd
import cv2
import os
from fastdeploy import ModelFormat


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model", required=True, help="Path of yolov6 paddle model directory.")
    parser.add_argument(
        "--image", required=True, help="Path of test image file.")
    parser.add_argument(
        "--device",
        type=str,
        default='cpu',
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default="default",
        help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu"
    )
    parser.add_argument(
        "--device_id",
        type=int,
        default=0,
        help="Define which GPU card used to run model.")
    parser.add_argument(
        "--cpu_thread_num",
        type=int,
        default=9,
        help="Number of threads while inference on CPU.")
    return parser.parse_args()


def build_option(args):
    option = fd.RuntimeOption()
    if args.device.lower() == "gpu":
        option.use_gpu(0)

    option.set_cpu_thread_num(args.cpu_thread_num)

    if args.backend.lower() == "trt":
        assert args.device.lower(
        ) == "gpu", "TensorRT backend require inference on device GPU."
        option.use_trt_backend()
    elif args.backend.lower() == "ort":
        option.use_ort_backend()
    elif args.backend.lower() == "paddle":
        option.use_paddle_backend()
    elif args.backend.lower() == "openvino":
        assert args.device.lower(
        ) == "cpu", "OpenVINO backend require inference on device CPU."
        option.use_openvino_backend()
    return option


args = parse_arguments()

model_file = os.path.join(args.model, "model.pdmodel")
params_file = os.path.join(args.model, "model.pdiparams")
# Configure the runtime and load the model
runtime_option = build_option(args)
model = fd.vision.detection.YOLOv6(
    model_file,
    params_file,
    runtime_option=runtime_option,
    model_format=ModelFormat.PADDLE)

# Predict the detection result for the image
im = cv2.imread(args.image)
result = model.predict(im.copy())
print(result)

# Visualize the prediction result
vis_im = fd.vision.vis_detection(im, result)
cv2.imwrite("visualized_result.jpg", vis_im)
print("Visualized result save in ./visualized_result.jpg")
examples/vision/detection/yolov7/quantize/README.md (new file, 25 lines)

# YOLOv7 Quantized Model Deployment

FastDeploy supports deploying quantized models and provides a one-click model quantization tool.
Users can quantize a model themselves with the one-click tool and then deploy it, or directly download the quantized models provided by FastDeploy.

## FastDeploy One-Click Model Quantization Tool

FastDeploy provides a one-click quantization tool that quantizes a model from nothing more than a configuration file.
For a detailed tutorial, see: [One-click model quantization tool](../../../../../tools/quantization/)

## Download the Quantized YOLOv7 Model

Alternatively, users can directly download the quantized models in the table below for deployment.

| Model | Inference backend | Deployment hardware | FP32 latency (ms) | INT8 latency (ms) | Speedup | FP32 mAP | INT8 mAP | Quantization method |
| ----- | ----------------- | ------------------- | ----------------- | ----------------- | ------- | -------- | -------- | ------------------- |
| [YOLOv7](https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar) | TensorRT | GPU | 30.43 | 15.40 | 1.98 | 51.1 | 50.8 | Quantization-aware distillation training |
| [YOLOv7](https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar) | Paddle Inference | CPU | 1015.70 | 562.41 | 1.82 | 51.1 | 46.3 | Quantization-aware distillation training |

The numbers above are end-to-end inference performance of FastDeploy deployments before and after quantization.
- Test images are from COCO val2017.
- Latency is the average end-to-end latency (including pre- and post-processing), in milliseconds.
- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT version 8.4.15; the CPU thread count is fixed to 1 in all tests.

## Detailed Deployment Documents

- [Python deployment](python)
- [C++ deployment](cpp)
examples/vision/detection/yolov7/quantize/cpp/CMakeLists.txt (new file, 14 lines)

PROJECT(infer_demo C CXX)
CMAKE_MINIMUM_REQUIRED (VERSION 3.12)

# Path to the downloaded and extracted FastDeploy SDK
option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")

include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)

# Add the FastDeploy header directories
include_directories(${FASTDEPLOY_INCS})

add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc)
# Link against the FastDeploy libraries
target_link_libraries(infer_demo ${FASTDEPLOY_LIBS})
examples/vision/detection/yolov7/quantize/cpp/README.md (new file, 34 lines)
|
|||||||
|
# YOLOv7量化模型 C++部署示例
|
||||||
|
|
||||||
|
本目录下提供的`infer.cc`,可以帮助用户快速完成YOLOv7量化模型在CPU/GPU上的部署推理加速.
|
||||||
|
|
||||||
|
## 部署准备
|
||||||
|
### FastDeploy环境准备
|
||||||
|
- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md)
|
||||||
|
- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start)
|
||||||
|
|
||||||
|
### 量化模型准备
|
||||||
|
- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署.
|
||||||
|
- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署.
|
||||||
|
|
||||||
|
## 以量化后的YOLOv7模型为例, 进行部署
|
||||||
|
在本目录执行如下命令即可完成编译,以及量化模型部署.
|
||||||
|
```bash
|
||||||
|
mkdir build
|
||||||
|
cd build
|
||||||
|
wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz
|
||||||
|
tar xvf fastdeploy-linux-x64-0.2.0.tgz
|
||||||
|
cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0
|
||||||
|
make -j
|
||||||
|
|
||||||
|
#下载FastDeloy提供的yolov7量化模型文件和测试图片
|
||||||
|
wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar
|
||||||
|
tar -xvf yolov7_quant.tar
|
||||||
|
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
|
||||||
|
|
||||||
|
|
||||||
|
# 在CPU上使用Paddle-Inference推理量化模型
|
||||||
|
./infer_demo yolov7_quant 000000014439.jpg 0
|
||||||
|
# 在GPU上使用TensorRT推理量化模型
|
||||||
|
./infer_demo yolov7_quant 000000014439.jpg 1
|
||||||
|
```

examples/vision/detection/yolov7/quantize/cpp/infer.cc (new file)
@@ -0,0 +1,77 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "fastdeploy/vision.h"

// Platform-specific path separator used to join the model directory and file names
#ifdef WIN32
const char sep = '\\';
#else
const char sep = '/';
#endif

void InitAndInfer(const std::string& model_dir, const std::string& image_file,
                  const fastdeploy::RuntimeOption& option) {
  auto model_file = model_dir + sep + "model.pdmodel";
  auto params_file = model_dir + sep + "model.pdiparams";

  auto model = fastdeploy::vision::detection::YOLOv7(
      model_file, params_file, option, fastdeploy::ModelFormat::PADDLE);
  assert(model.Initialized());

  auto im = cv::imread(image_file);
  auto im_bak = im.clone();

  fastdeploy::vision::DetectionResult res;
  if (!model.Predict(&im, &res)) {
    std::cerr << "Failed to predict." << std::endl;
    return;
  }

  std::cout << res.Str() << std::endl;

  auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res);
  cv::imwrite("vis_result.jpg", vis_im);
  std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
}

int main(int argc, char* argv[]) {
  if (argc < 4) {
    std::cout << "Usage: infer_demo path/to/quant_model path/to/image "
                 "run_option, "
                 "e.g ./infer_demo ./yolov7_quant ./000000014439.jpg 0"
              << std::endl;
    std::cout << "The data type of run_option is int, 0: run on cpu with "
                 "Paddle backend; 1: run on gpu with TensorRT backend."
              << std::endl;
    return -1;
  }

  fastdeploy::RuntimeOption option;
  int flag = std::atoi(argv[3]);

  // run_option: 0 = CPU with Paddle Inference, 1 = GPU with TensorRT
  if (flag == 0) {
    option.UseCpu();
    option.UsePaddleBackend();
  } else if (flag == 1) {
    option.UseGpu();
    option.UseTrtBackend();
  }

  std::string model_dir = argv[1];
  std::string test_image = argv[2];
  InitAndInfer(model_dir, test_image, option);
  return 0;
}

examples/vision/detection/yolov7/quantize/python/README.md (new file)
@@ -0,0 +1,28 @@
# Python Deployment Example for the Quantized YOLOv7 Model

The `infer.py` provided in this directory helps users quickly deploy the quantized YOLOv7 model on CPU/GPU with accelerated inference.

## Deployment Preparation
### Prepare the FastDeploy Environment
- 1. The hardware and software environment meets the requirements; see [FastDeploy Environment Requirements](../../../../../../docs/environment.md)
- 2. Install the FastDeploy Python wheel; see [FastDeploy Python Installation](../../../../../../docs/quick_start)

### Prepare the Quantized Model
- 1. Users can directly deploy the quantized models provided by FastDeploy.
- 2. Users can quantize a model themselves with FastDeploy's [One-Click Model Quantization Tool](../../../../../../tools/quantization/) and deploy the resulting quantized model.

## Deployment Example Using the Quantized YOLOv7 Model
```bash
# Download the deployment example code
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/vision/detection/yolov7/quantize/python

# Download the quantized YOLOv7 model and a test image provided by FastDeploy
wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar
tar -xvf yolov7_quant.tar
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

# Run the quantized model on CPU with Paddle Inference
python infer.py --model yolov7_quant --image 000000014439.jpg --device cpu --backend paddle
# Run the quantized model on GPU with TensorRT
python infer.py --model yolov7_quant --image 000000014439.jpg --device gpu --backend trt
```

examples/vision/detection/yolov7/quantize/python/infer.py (new file)
@@ -0,0 +1,81 @@
import fastdeploy as fd
import cv2
import os
from fastdeploy import ModelFormat


def parse_arguments():
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model",
        required=True,
        help="Path of the quantized yolov7 paddle model directory.")
    parser.add_argument(
        "--image", required=True, help="Path of test image file.")
    parser.add_argument(
        "--device",
        type=str,
        default='cpu',
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default="default",
        help="Type of inference backend, support ort/trt/paddle/openvino; "
        "'default' lets FastDeploy pick a backend automatically.")
    parser.add_argument(
        "--device_id",
        type=int,
        default=0,
        help="Define which GPU card used to run model.")
    parser.add_argument(
        "--cpu_thread_num",
        type=int,
        default=9,
        help="Number of threads while inference on CPU.")
    return parser.parse_args()


def build_option(args):
    option = fd.RuntimeOption()
    if args.device.lower() == "gpu":
        option.use_gpu(args.device_id)

    option.set_cpu_thread_num(args.cpu_thread_num)

    if args.backend.lower() == "trt":
        assert args.device.lower(
        ) == "gpu", "TensorRT backend require inference on device GPU."
        option.use_trt_backend()
    elif args.backend.lower() == "ort":
        option.use_ort_backend()
    elif args.backend.lower() == "paddle":
        option.use_paddle_backend()
    elif args.backend.lower() == "openvino":
        assert args.device.lower(
        ) == "cpu", "OpenVINO backend require inference on device CPU."
        option.use_openvino_backend()
    return option


args = parse_arguments()

model_file = os.path.join(args.model, "model.pdmodel")
params_file = os.path.join(args.model, "model.pdiparams")

# Configure the runtime and load the quantized Paddle-format model
runtime_option = build_option(args)
model = fd.vision.detection.YOLOv7(
    model_file,
    params_file,
    runtime_option=runtime_option,
    model_format=ModelFormat.PADDLE)

# Predict the detection result for the test image
im = cv2.imread(args.image)
result = model.predict(im.copy())
print(result)

# Visualize the prediction
vis_im = fd.vision.vis_detection(im, result)
cv2.imwrite("visualized_result.jpg", vis_im)
print("Visualized result saved in ./visualized_result.jpg")
@@ -64,12 +64,13 @@ YOLOv7::YOLOv7(const std::string& model_file, const std::string& params_file,
     valid_cpu_backends = {Backend::OPENVINO, Backend::ORT};
     valid_gpu_backends = {Backend::ORT, Backend::TRT};
   } else {
-    valid_cpu_backends = {Backend::PDINFER};
-    valid_gpu_backends = {Backend::PDINFER};
+    valid_cpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT};
+    valid_gpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT};
   }
   runtime_option = custom_option;
   runtime_option.model_format = model_format;
   runtime_option.model_file = model_file;
+  runtime_option.params_file = params_file;
   initialized = Initialize();
 }
@@ -1,5 +1,6 @@
 # FastDeploy One-Click Model Quantization
-FastDeploy provides a one-click quantization feature that supports offline quantization and quantization distillation training. This document takes Yolov5s as an example of how to install and run FastDeploy's one-click quantization.
+FastDeploy, based on PaddleSlim, provides a one-click model quantization tool that supports offline quantization and quantization distillation training.
+This document takes Yolov5s as an example of how to install and run FastDeploy's one-click model quantization.
 
 ## 1. Installation
 
@@ -24,7 +25,7 @@ python setup.py install
 
 ## 2. Usage
 
-### One-Click Offline Quantization Example
+### One-Click Quantization Example
 
 #### Offline Quantization
 
@@ -34,7 +35,7 @@ python setup.py install
 
 ```shell
 # Download yolov5s.onnx
-wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s.onnx
+wget https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx
 
 # Download the dataset; this calibration dataset is the first 320 images of COCO val2017
 wget https://bj.bcebos.com/paddlehub/fastdeploy/COCO_val_320.tar.gz
@@ -42,20 +43,21 @@ tar -xvf COCO_val_320.tar.gz
 ```
 
 ##### 2. Run one-click model quantization with the fastdeploy_quant command:
+The command below quantizes the yolov5s model; to quantize another model, replace config_path with the corresponding model configuration file in the configs folder.
 ```shell
 fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='PTQ' --save_dir='./yolov5s_ptq_model/'
 ```
 
 ##### 3. Parameter description
 
+Currently, users only need to provide a customized model config file and specify the quantization method and the save path of the quantized model to complete quantization.
+
 | Parameter | Purpose |
 | -------------------- | ------------------------------------------------------------ |
-| --config_path | Quantization configuration file required for one-click quantization. [Details](./fdquant/configs/readme.md) |
+| --config_path | Quantization configuration file required for one-click quantization. [Details](./configs/README.md) |
 | --method | Quantization method: choose PTQ for offline quantization, QAT for quantization distillation training |
 | --save_dir | Output path of the quantized model, which can be deployed directly with FastDeploy |
 
-Note: fastdeploy_quant currently only supports quantizing YOLOv5, YOLOv6 and YOLOv7 models
 
 
 #### Quantization Distillation Training
@@ -63,10 +65,11 @@ fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='
 ##### 1. Prepare the model to be quantized and the training dataset
 FastDeploy's quantization distillation training currently only supports training on unlabeled images, and model accuracy cannot be evaluated during training.
 The dataset consists of images from the real prediction scenario; the number of images depends on the dataset size and should cover all deployment scenarios as much as possible. In this example, we prepared the first 320 images of the COCO2017 validation set for users.
+Note: if users want a more accurate quantized model from quantization distillation training, they can prepare more data and train for more iterations.
 
 ```shell
 # Download yolov5s.onnx
-wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s.onnx
+wget https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx
 
 # Download the dataset; this calibration dataset is the first 320 images of the COCO2017 validation set
 wget https://bj.bcebos.com/paddlehub/fastdeploy/COCO_val_320.tar.gz
@@ -74,47 +77,31 @@ tar -xvf COCO_val_320.tar.gz
 ```
 
 ##### 2. Run one-click model quantization with the fastdeploy_quant command:
+The command below quantizes the yolov5s model; to quantize another model, replace config_path with the corresponding model configuration file in the configs folder.
 ```shell
+# The command defaults to single-GPU training; specify a single GPU before training, otherwise training may hang.
 export CUDA_VISIBLE_DEVICES=0
 fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='QAT' --save_dir='./yolov5s_qat_model/'
 ```
 
 ##### 3. Parameter description
 
+Currently, users only need to provide a customized model config file and specify the quantization method and the save path of the quantized model to complete quantization.
+
 | Parameter | Purpose |
 | -------------------- | ------------------------------------------------------------ |
-| --config_path | Quantization configuration file required for one-click quantization. [Details](./fdquant/configs/readme.md) |
+| --config_path | Quantization configuration file required for one-click quantization. [Details](./configs/README.md) |
 | --method | Quantization method: choose PTQ for offline quantization, QAT for quantization distillation training |
 | --save_dir | Output path of the quantized model, which can be deployed directly with FastDeploy |
 
-Note: fastdeploy_quant currently only supports quantizing YOLOv5, YOLOv6 and YOLOv7 models
 
 
 ## 3. Deploying Quantized Models with FastDeploy
-Once users obtain a quantized model, they only need to pass in the quantized model path and the corresponding parameters to deploy it with FastDeploy.
+Once users obtain a quantized model, they can deploy it with FastDeploy; refer to the deployment documents below.
 Please refer to the example documents:
-- [YOLOv5s quantized model Python deployment](../examples/slim/yolov5s/python/)
-- [YOLOv5s quantized model C++ deployment](../examples/slim/yolov5s/cpp/)
-- [YOLOv6s quantized model Python deployment](../examples/slim/yolov6s/python/)
-- [YOLOv6s quantized model C++ deployment](../examples/slim/yolov6s/cpp/)
-- [YOLOv7 quantized model Python deployment](../examples/slim/yolov7/python/)
-- [YOLOv7 quantized model C++ deployment](../examples/slim/yolov7/cpp/)
-
-## 4. Benchmark
-The table below shows the end-to-end inference performance of FastDeploy deployments before and after model quantization.
-- Test images are taken from COCO val2017.
-- Latency is the average end-to-end inference latency (including pre- and post-processing), in milliseconds.
-- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT 8.4.15; the number of CPU threads is fixed to 1 in all tests.
-
-| Model | Inference Backend | Hardware | FP32 Latency (ms) | INT8 Latency (ms) | Speedup | FP32 mAP | INT8 mAP |
-| ------------------- | ----------------- | -------- | ----------------- | ----------------- | ------- | -------- | -------- |
-| YOLOv5s | TensorRT | GPU | 14.13 | 11.22 | 1.26 | 37.6 | 36.6 |
-| YOLOv5s | ONNX Runtime | CPU | 183.68 | 100.39 | 1.83 | 37.6 | 33.1 |
-| YOLOv5s | Paddle Inference | CPU | 226.36 | 152.27 | 1.48 | 37.6 | 36.8 |
-| YOLOv6s | TensorRT | GPU | 12.89 | 8.92 | 1.45 | 42.5 | 40.6 |
-| YOLOv6s | ONNX Runtime | CPU | 345.85 | 131.81 | 2.60 | 42.5 | 36.1 |
-| YOLOv6s | Paddle Inference | CPU | 366.41 | 131.70 | 2.78 | 42.5 | 41.2 |
-| YOLOv7 | TensorRT | GPU | 30.43 | 15.40 | 1.98 | 51.1 | 50.8 |
-| YOLOv7 | ONNX Runtime | CPU | 971.27 | 471.88 | 2.06 | 51.1 | 42.5 |
-| YOLOv7 | Paddle Inference | CPU | 1015.70 | 562.41 | 1.82 | 51.1 | 46.3 |
+- [YOLOv5 quantized model deployment](../../examples/vision/detection/yolov5/quantize/)
+- [YOLOv6 quantized model deployment](../../examples/vision/detection/yolov6/quantize/)
+- [YOLOv7 quantized model deployment](../../examples/vision/detection/yolov7/quantize/)
+- [PaddleClas quantized model deployment](../../examples/vision/classification/paddleclas/quantize/)

tools/quantization/configs/README.md (new file)
@@ -0,0 +1,51 @@
# FastDeploy Quantization Configuration File Description

A FastDeploy quantization configuration file contains the global configuration, the quantization distillation training configuration, the offline quantization configuration, and the training configuration.
Besides using the configuration files provided in this directory directly, users can modify them as needed.

## Annotated Example

```yaml
# Global configuration
Global:
  model_dir: ./yolov5s.onnx          # Path of the input model
  format: 'onnx'                     # Format of the input model; choose 'paddle' for Paddle models
  model_filename: model.pdmodel      # Model file name of the quantized model converted to Paddle format
  params_filename: model.pdiparams   # Parameter file name of the quantized model converted to Paddle format
  image_path: ./COCO_val_320         # Path of the dataset used for offline quantization or quantization distillation training
  arch: YOLOv5                       # Model architecture
  input_list: ['x2paddle_images']    # Input names of the model to be quantized
  preprocess: yolo_image_preprocess  # Preprocessing function applied to the data during quantization; users can modify it or write a new one in ../fdquant/dataset.py

# Quantization distillation training configuration
Distillation:
  alpha: 1.0                         # Weight of the distillation loss
  loss: soft_label                   # Distillation loss algorithm

Quantization:
  onnx_format: true                  # Whether to use the ONNX standard quantization format; must be true to deploy with FastDeploy
  use_pact: true                     # Whether quantization training uses the PACT method
  activation_quantize_type: 'moving_average_abs_max'  # Activation quantization method
  quantize_op_types:                 # Ops to be quantized
  - conv2d
  - depthwise_conv2d

# Offline quantization configuration
PTQ:
  calibration_method: 'avg'          # Activation calibration algorithm for offline quantization; options: avg, abs_max, hist, KL, mse, emd
  skip_tensor_list: None             # Users can specify conv layers to skip during quantization

# Training parameter configuration
TrainConfig:
  train_iter: 3000
  learning_rate: 0.00001
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
  target_metric: 0.365
```

## More Detailed Configuration Options

FastDeploy's one-click quantization is powered by PaddleSlim; for more detailed quantization configuration options, see:
[Detailed Tutorial on Auto Compression Hyperparameters](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/hyperparameter_tutorial.md)

tools/quantization/configs/detection/ppyoloe_l_quant.yaml (new file)
@@ -0,0 +1,37 @@
Global:
  model_dir: ./ppyoloe_crn_l_300e_coco
  format: paddle
  model_filename: model.pdmodel
  params_filename: model.pdiparams
  image_path: ./COCO_val_320
  arch: PPYOLOE
  input_list: ['image','scale_factor']
  preprocess: ppdet_image_preprocess

Distillation:
  alpha: 1.0
  loss: soft_label

Quantization:
  onnx_format: true
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
  - conv2d
  - depthwise_conv2d

PTQ:
  calibration_method: 'avg' # option: avg, abs_max, hist, KL, mse
  skip_tensor_list: None

TrainConfig:
  train_iter: 5000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
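
As with the other configs, this file is consumed by the fastdeploy_quant CLI described above; a sketch of a typical invocation (the save_dir name is a placeholder, not taken from the source):

```bash
# Offline quantization of PP-YOLOE-l using the config above; save_dir is an illustrative name
fastdeploy_quant --config_path=./configs/detection/ppyoloe_l_quant.yaml --method='PTQ' --save_dir='./ppyoloe_l_ptq_model/'
```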
@@ -1,48 +0,0 @@
-# FastDeploy Quantization Configuration File Description
-A FastDeploy quantization configuration file contains the global configuration, the quantization distillation training configuration, the offline quantization configuration, and the training configuration.
-Besides using the configuration files provided in this directory directly, users can modify them as needed.
-
-## Annotated Example
-
-```
-# Global information
-Global:
-  model_dir: ./yolov7-tiny.onnx      # Path of the input model
-  format: 'onnx'                     # Format of the input model; options are onnx or paddle
-  model_filename: model.pdmodel      # Model file name of the Paddle model
-  params_filename: model.pdiparams   # Parameter file name of the Paddle model
-  image_path: ./COCO_val_320         # Calibration dataset for PTQ, or the training set used for quantization training
-  arch: YOLOv7                       # Model family
-
-# Distillation settings for quantization distillation training
-Distillation:
-  alpha: 1.0
-  loss: soft_label
-
-# Quantization settings for quantization distillation training
-Quantization:
-  onnx_format: true
-  activation_quantize_type: 'moving_average_abs_max'
-  quantize_op_types:
-  - conv2d
-  - depthwise_conv2d
-
-# Offline quantization parameter configuration
-PTQ:
-  calibration_method: 'avg'          # Calibration algorithm; options: avg, abs_max, hist, KL, mse
-  skip_tensor_list: None             # Tensors excluded from offline quantization
-
-
-# Training parameters
-TrainConfig:
-  train_iter: 3000
-  learning_rate:
-    type: CosineAnnealingDecay
-    learning_rate: 0.00003
-    T_max: 8000
-  optimizer_builder:
-    optimizer:
-      type: SGD
-    weight_decay: 0.00004
-
-```
|
Reference in New Issue
Block a user