Add Quantization Function. (#256)

* Add PaddleOCR Support * Add PaddleOCR Support * Add PaddleOCRv3 Support * Add PaddleOCRv3 Support * Update README.md * Update README.md * Update README.md * Update README.md * Add PaddleOCRv3 Support * Add PaddleOCRv3 Supports * Add PaddleOCRv3 Suport * Fix Rec diff * Remove useless functions * Remove useless comments * Add PaddleOCRv2 Support * Add PaddleOCRv3 & PaddleOCRv2 Support * remove useless parameters * Add utils of sorting det boxes * Fix code naming convention * Fix code naming convention * Fix code naming convention * Fix bug in the Classify process * Imporve OCR Readme * Fix diff in Cls model * Update Model Download Link in Readme * Fix diff in PPOCRv2 * Improve OCR readme * Imporve OCR readme * Improve OCR readme * Improve OCR readme * Imporve OCR readme * Improve OCR readme * Fix conflict * Add readme for OCRResult * Improve OCR readme * Add OCRResult readme * Improve OCR readme * Improve OCR readme * Add Model Quantization Demo * Fix Model Quantization Readme * Fix Model Quantization Readme * Add the function to do PTQ quantization * Improve quant tools readme * Improve quant tool readme * Improve quant tool readme * Add PaddleInference-GPU for OCR Rec model * Add QAT method to fastdeploy-quantization tool * Remove examples/slim for now * Move configs folder * Add Quantization Support for Classification Model * Imporve ways of importing preprocess * Upload YOLO Benchmark on readme * Upload YOLO Benchmark on readme * Upload YOLO Benchmark on readme * Improve Quantization configs and readme * Add support for multi-inputs model
2025-10-05 08:37:06 +08:00 · 2022-10-08 15:45:28 +08:00
parent 8d47637541
commit 1efc0fa6b0
13 changed files with 736 additions and 0 deletions
--- a/tools/quantization/configs/classification/mobilenetv1_ssld_quant.yaml
+++ b/tools/quantization/configs/classification/mobilenetv1_ssld_quant.yaml
@@ -0,0 +1,49 @@
+Global:
+  model_dir: ./MobileNetV1_ssld_infer/
+  format: 'paddle'
+  model_filename: inference.pdmodel
+  params_filename: inference.pdiparams
+  image_path: ./ImageNet_val_640
+  arch: MobileNetV1
+  input_list: ['input']
+  preprocess: cls_image_preprocess
+
+
+Distillation:
+  alpha: 1.0
+  loss: l2
+  node:
+  - softmax_0.tmp_0
+
+
+Quantization:
+  use_pact: true
+  activation_bits: 8
+  is_full_quantize: false
+  onnx_format: True
+  activation_quantize_type: moving_average_abs_max
+  weight_quantize_type: channel_wise_abs_max
+  not_quant_pattern:
+  - skip_quant
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+  weight_bits: 8
+
+
+TrainConfig:
+  train_iter: 5000
+  learning_rate:
+    type: CosineAnnealingDecay
+    learning_rate: 0.015
+    T_max: 8000
+  optimizer_builder:
+    optimizer:
+      type: Momentum
+    weight_decay: 0.00002
+  origin_metric: 0.70898
+
+
+PTQ:
+  calibration_method: 'avg'   # option: avg, abs_max, hist, KL, mse
+  skip_tensor_list: None
--- a/tools/quantization/configs/classification/resnet50_vd_quant.yaml
+++ b/tools/quantization/configs/classification/resnet50_vd_quant.yaml
@@ -0,0 +1,47 @@
+Global:
+  model_dir: ./ResNet50_vd_infer/
+  format: 'paddle'
+  model_filename: inference.pdmodel
+  params_filename: inference.pdiparams
+  image_path: ./ImageNet_val_640
+  arch: ResNet50
+  input_list: ['input']
+  preprocess: cls_image_preprocess
+
+
+Distillation:
+  alpha: 1.0
+  loss: l2
+  node:
+  - softmax_0.tmp_0
+
+Quantization:
+  use_pact: true
+  activation_bits: 8
+  is_full_quantize: false
+  onnx_format: True
+  activation_quantize_type: moving_average_abs_max
+  weight_quantize_type: channel_wise_abs_max
+  not_quant_pattern:
+  - skip_quant
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+  weight_bits: 8
+
+TrainConfig:
+  train_iter: 5000
+  learning_rate:
+    type: CosineAnnealingDecay
+    learning_rate: 0.015
+    T_max: 8000
+  optimizer_builder:
+    optimizer:
+      type: Momentum
+    weight_decay: 0.00002
+  origin_metric: 0.7912
+
+
+PTQ:
+  calibration_method: 'avg'   # option: avg, abs_max, hist, KL, mse
+  skip_tensor_list: None
--- a/tools/quantization/configs/detection/yolov5s_quant.yaml
+++ b/tools/quantization/configs/detection/yolov5s_quant.yaml
@@ -0,0 +1,35 @@
+Global:
+  model_dir: ./yolov5s.onnx
+  format: 'onnx'
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+  image_path: ./COCO_val_320
+  arch: YOLOv5
+  input_list: ['x2paddle_images']
+  preprocess: yolo_image_preprocess
+
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+
+Quantization:
+  onnx_format: true
+  use_pact: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+
+
+PTQ:
+  calibration_method: 'avg'   # option: avg, abs_max, hist, KL, mse
+  skip_tensor_list: None
+
+TrainConfig:
+  train_iter: 3000
+  learning_rate: 0.00001
+  optimizer_builder:
+    optimizer:
+      type: SGD
+    weight_decay: 4.0e-05
+  target_metric: 0.365
--- a/tools/quantization/configs/detection/yolov6s_quant.yaml
+++ b/tools/quantization/configs/detection/yolov6s_quant.yaml
@@ -0,0 +1,35 @@
+Global:
+  model_dir: ./yolov6s.onnx
+  format: 'onnx'
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+  image_path: ./COCO_val_320
+  arch: YOLOv6
+  input_list: ['x2paddle_image_arrays']
+
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+
+Quantization:
+  onnx_format: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+
+
+PTQ:
+  calibration_method: 'avg'   # option: avg, abs_max, hist, KL, mse
+  skip_tensor_list:  ['conv2d_2.w_0', 'conv2d_15.w_0', 'conv2d_46.w_0', 'conv2d_11.w_0', 'conv2d_49.w_0']
+
+TrainConfig:
+  train_iter: 8000
+  learning_rate:
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 8000
+  optimizer_builder:
+    optimizer:
+      type: SGD
+    weight_decay: 0.00004
--- a/tools/quantization/configs/detection/yolov7_quant.yaml
+++ b/tools/quantization/configs/detection/yolov7_quant.yaml
@@ -0,0 +1,34 @@
+Global:
+  model_dir: ./yolov7.onnx
+  format: 'onnx'
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+  image_path: ./COCO_val_320
+  arch: YOLOv7
+  input_list: ['x2paddle_images']
+
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+
+Quantization:
+  onnx_format: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+
+PTQ:
+  calibration_method: 'avg'   # option: avg, abs_max, hist, KL, mse
+  skip_tensor_list: None
+
+TrainConfig:
+  train_iter: 3000
+  learning_rate:
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 8000
+  optimizer_builder:
+    optimizer:
+      type: SGD
+    weight_decay: 0.00004
--- a/tools/quantization/configs/readme.md
+++ b/tools/quantization/configs/readme.md
@@ -0,0 +1,48 @@
+# FastDeploy 量化配置文件说明
+FastDeploy 量化配置文件中，包含了全局配置，量化蒸馏训练配置，离线量化配置和训练配置.
+用户除了直接使用FastDeploy提供在本目录的配置文件外，可以按需求自行修改相关配置文件
+
+## 实例解读
+
+```
+#全局信息
+Global:
+  model_dir: ./yolov7-tiny.onnx     #输入模型路径
+  format: 'onnx'                    #输入模型格式，选项为 onnx 或者 paddle
+  model_filename: model.pdmodel     #paddle模型的模型文件名
+  params_filename: model.pdiparams  #paddle模型的参数文件名
+  image_path: ./COCO_val_320        #PTQ所有的Calibration数据集或者量化训练所用的训练集
+  arch: YOLOv7                      #模型系列
+
+#量化蒸馏训练中的蒸馏参数设置
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+
+#量化蒸馏训练中的量化参数设置
+Quantization:
+  onnx_format: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+
+#离线量化参数配置
+PTQ:
+  calibration_method: 'avg' #Calibraion算法，可选为 avg, abs_max, hist, KL, mse
+  skip_tensor_list: None    #不进行离线量化的tensor
+
+
+#训练参数
+TrainConfig:
+  train_iter: 3000  
+  learning_rate:
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 8000
+  optimizer_builder:
+    optimizer:
+      type: SGD
+    weight_decay: 0.00004
+
+```
--- a/tools/quantization/fdquant/init.py
+++ b/tools/quantization/fdquant/init.py
--- a/tools/quantization/fdquant/dataset.py
+++ b/tools/quantization/fdquant/dataset.py
@@ -0,0 +1,150 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import cv2
+import os
+import numpy as np
+import paddle
+
+
+def generate_scale(im, target_shape):
+    origin_shape = im.shape[:2]
+    im_size_min = np.min(origin_shape)
+    im_size_max = np.max(origin_shape)
+    target_size_min = np.min(target_shape)
+    target_size_max = np.max(target_shape)
+    im_scale = float(target_size_min) / float(im_size_min)
+    if np.round(im_scale * im_size_max) > target_size_max:
+        im_scale = float(target_size_max) / float(im_size_max)
+    im_scale_x = im_scale
+    im_scale_y = im_scale
+
+    return im_scale_y, im_scale_x
+
+
+def yolo_image_preprocess(img, target_shape=[640, 640]):
+    # Resize image
+    im_scale_y, im_scale_x = generate_scale(img, target_shape)
+    img = cv2.resize(
+        img,
+        None,
+        None,
+        fx=im_scale_x,
+        fy=im_scale_y,
+        interpolation=cv2.INTER_LINEAR)
+    # Pad
+    im_h, im_w = img.shape[:2]
+    h, w = target_shape[:]
+    if h != im_h or w != im_w:
+        canvas = np.ones((h, w, 3), dtype=np.float32)
+        canvas *= np.array([114.0, 114.0, 114.0], dtype=np.float32)
+        canvas[0:im_h, 0:im_w, :] = img.astype(np.float32)
+        img = canvas
+    img = np.transpose(img / 255, [2, 0, 1])
+
+    return img.astype(np.float32)
+
+
+def cls_resize_short(img, target_size):
+
+    img_h, img_w = img.shape[:2]
+    percent = float(target_size) / min(img_w, img_h)
+    w = int(round(img_w * percent))
+    h = int(round(img_h * percent))
+
+    return cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR)
+
+
+def crop_image(img, target_size, center):
+
+    height, width = img.shape[:2]
+    size = target_size
+
+    if center == True:
+        w_start = (width - size) // 2
+        h_start = (height - size) // 2
+    else:
+        w_start = np.random.randint(0, width - size + 1)
+        h_start = np.random.randint(0, height - size + 1)
+    w_end = w_start + size
+    h_end = h_start + size
+
+    return img[h_start:h_end, w_start:w_end, :]
+
+
+def cls_image_preprocess(img):
+
+    # resize
+    img = cls_resize_short(img, target_size=256)
+    # crop
+    img = crop_image(img, target_size=224, center=True)
+
+    #ToCHWImage & Normalize
+    img = np.transpose(img / 255, [2, 0, 1])
+
+    img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
+    img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
+    img -= img_mean
+    img /= img_std
+
+    return img.astype(np.float32)
+
+
+def ppdet_resize_no_keepratio(img, target_shape=[640, 640]):
+    im_shape = img.shape
+
+    resize_h, resize_w = target_shape
+    im_scale_y = resize_h / im_shape[0]
+    im_scale_x = resize_w / im_shape[1]
+
+    scale_factor = np.asarray([im_scale_y, im_scale_x], dtype=np.float32)
+    return cv2.resize(
+        img, None, None, fx=im_scale_x, fy=im_scale_y,
+        interpolation=2), scale_factor
+
+
+def ppdet_normliaze(img, is_scale=True):
+
+    mean = [0.485, 0.456, 0.406]
+    std = [0.229, 0.224, 0.225]
+    img = img.astype(np.float32, copy=False)
+
+    if is_scale:
+        scale = 1.0 / 255.0
+        img *= scale
+
+    mean = np.array(mean)[np.newaxis, np.newaxis, :]
+    std = np.array(std)[np.newaxis, np.newaxis, :]
+    img -= mean
+    img /= std
+    return img
+
+
+def hwc_to_chw(img):
+    img = img.transpose((2, 0, 1))
+    return img
+
+
+def ppdet_image_preprocess(img):
+
+    img, scale_factor = ppdet_resize_no_keepratio(img, target_shape=[640, 640])
+
+    img = np.transpose(img / 255, [2, 0, 1])
+
+    img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
+    img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
+    img -= img_mean
+    img /= img_std
+
+    return img.astype(np.float32), scale_factor
--- a/tools/quantization/fdquant/fdquant.py
+++ b/tools/quantization/fdquant/fdquant.py
@@ -0,0 +1,155 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+import numpy as np
+import time
+import argparse
+from tqdm import tqdm
+import paddle
+from paddleslim.common import load_config, load_onnx_model
+from paddleslim.auto_compression import AutoCompression
+from paddleslim.quant import quant_post_static
+from fdquant.dataset import *
+
+
+def argsparser():
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        '--config_path',
+        type=str,
+        default=None,
+        help="path of compression strategy config.",
+        required=True)
+    parser.add_argument(
+        '--method',
+        type=str,
+        default=None,
+        help="choose PTQ or QAT as quantization method",
+        required=True)
+    parser.add_argument(
+        '--save_dir',
+        type=str,
+        default='output',
+        help="directory to save compressed model.")
+    parser.add_argument(
+        '--devices',
+        type=str,
+        default='gpu',
+        help="which device used to compress.")
+
+    return parser
+
+
+def reader_wrapper(reader, input_list=None):
+    def gen():
+        for data_list in reader:
+            in_dict = {}
+            for data in data_list:
+                for i, input_name in enumerate(input_list):
+                    in_dict[input_name] = data[i]
+                yield in_dict
+
+    return gen
+
+
+def main():
+
+    time_s = time.time()
+
+    paddle.enable_static()
+    parser = argsparser()
+    FLAGS = parser.parse_args()
+
+    assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
+    paddle.set_device(FLAGS.devices)
+
+    global global_config
+    all_config = load_config(FLAGS.config_path)
+    assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
+    global_config = all_config["Global"]
+    input_list = global_config['input_list']
+
+    assert os.path.exists(global_config[
+        'image_path']), "image_path does not exist!"
+    paddle.vision.image.set_image_backend('cv2')
+    # transform could be customized.
+    train_dataset = paddle.vision.datasets.ImageFolder(
+        global_config['image_path'],
+        transform=eval(global_config['preprocess']))
+    train_loader = paddle.io.DataLoader(
+        train_dataset,
+        batch_size=1,
+        shuffle=True,
+        drop_last=True,
+        num_workers=0)
+    train_loader = reader_wrapper(train_loader, input_list=input_list)
+    eval_func = None
+
+    # ACT compression
+    if FLAGS.method == 'QAT':
+        ac = AutoCompression(
+            model_dir=global_config['model_dir'],
+            model_filename=global_config['model_filename'],
+            params_filename=global_config['params_filename'],
+            train_dataloader=train_loader,
+            save_dir=FLAGS.save_dir,
+            config=all_config,
+            eval_callback=eval_func)
+        ac.compress()
+
+    # PTQ compression
+    if FLAGS.method == 'PTQ':
+
+        # Read PTQ config
+        assert "PTQ" in all_config, f"Key 'PTQ' not found in config file. \n{all_config}"
+        ptq_config = all_config["PTQ"]
+
+        # Inititalize the executor
+        place = paddle.CUDAPlace(
+            0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
+        exe = paddle.static.Executor(place)
+
+        # Read ONNX or PADDLE format model
+        if global_config['format'] == 'onnx':
+            load_onnx_model(global_config["model_dir"])
+            inference_model_path = global_config["model_dir"].rstrip().rstrip(
+                '.onnx') + '_infer'
+        else:
+            inference_model_path = global_config["model_dir"].rstrip('/')
+
+        quant_post_static(
+            executor=exe,
+            model_dir=inference_model_path,
+            quantize_model_path=FLAGS.save_dir,
+            data_loader=train_loader,
+            model_filename=global_config["model_filename"],
+            params_filename=global_config["params_filename"],
+            batch_size=32,
+            batch_nums=10,
+            algo=ptq_config['calibration_method'],
+            hist_percent=0.999,
+            is_full_quantize=False,
+            bias_correction=False,
+            onnx_format=True,
+            skip_tensor_list=ptq_config['skip_tensor_list']
+            if 'skip_tensor_list' in ptq_config else None)
+
+    time_total = time.time() - time_s
+    print("Finish Compression, total time used is : ", time_total, "seconds.")
+
+
+if __name__ == '__main__':
+    main()
--- a/tools/quantization/readme.md
+++ b/tools/quantization/readme.md
@@ -0,0 +1,120 @@
+# FastDeploy 一键模型量化
+FastDeploy 给用户提供了一键量化功能, 支持离线量化和量化蒸馏训练. 本文档已Yolov5s为例, 用户可参考如何安装并执行FastDeploy的一键量化功能.
+
+## 1.安装
+
+### 环境依赖
+
+1.用户参考PaddlePaddle官网, 安装develop版本
+```
+https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html
+```
+
+2.安装paddleslim-develop版本
+```bash
+git clone https://github.com/PaddlePaddle/PaddleSlim.git & cd PaddleSlim
+python setup.py install
+```
+
+### FastDeploy-Quantization 安装方式
+用户在当前目录下，运行如下命令:
+```
+python setup.py install
+```
+
+## 2.使用方式
+
+### 一键离线量化示例
+
+#### 离线量化
+
+##### 1. 准备模型和Calibration数据集
+用户需要自行准备待量化模型与Calibration数据集.
+本例中用户可执行以下命令, 下载待量化的yolov5s.onnx模型和我们为用户准备的Calibration数据集示例.
+
+```shell
+# 下载yolov5.onnx
+wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s.onnx
+
+# 下载数据集, 此Calibration数据集为COCO val2017中的前320张图片
+wget https://bj.bcebos.com/paddlehub/fastdeploy/COCO_val_320.tar.gz
+tar -xvf COCO_val_320.tar.gz
+```
+
+##### 2.使用fastdeploy_quant命令，执行一键模型量化:
+
+```shell
+fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='PTQ' --save_dir='./yolov5s_ptq_model/'
+```
+
+##### 3.参数说明
+
+| 参数                 | 作用                                                         |
+| -------------------- | ------------------------------------------------------------ |
+| --config_path          | 一键量化所需要的量化配置文件.[详解](./fdquant/configs/readme.md)                        |
+| --method               | 量化方式选择, 离线量化选PTQ，量化蒸馏训练选QAT     |
+| --save_dir             | 产出的量化后模型路径, 该模型可直接在FastDeploy部署     |
+
+注意：目前fastdeploy_quant暂时只支持YOLOv5,YOLOv6和YOLOv7模型的量化
+
+
+#### 量化蒸馏训练
+
+##### 1.准备待量化模型和训练数据集
+FastDeploy目前的量化蒸馏训练，只支持无标注图片训练，训练过程中不支持评估模型精度.
+数据集为真实预测场景下的图片，图片数量依据数据集大小来定，尽量覆盖所有部署场景. 此例中，我们为用户准备了COCO2017验证集中的前320张图片.
+
+```shell
+# 下载yolov5.onnx
+wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s.onnx
+
+# 下载数据集, 此Calibration数据集为COCO2017验证集中的前320张图片
+wget https://bj.bcebos.com/paddlehub/fastdeploy/COCO_val_320.tar.gz
+tar -xvf COCO_val_320.tar.gz
+```
+
+##### 2.使用fastdeploy_quant命令，执行一键模型量化:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0
+fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='QAT' --save_dir='./yolov5s_qat_model/'
+```
+
+##### 3.参数说明
+
+| 参数                 | 作用                                                         |
+| -------------------- | ------------------------------------------------------------ |
+| --config_path          | 一键量化所需要的量化配置文件.[详解](./fdquant/configs/readme.md) |
+| --method               | 量化方式选择, 离线量化选PTQ，量化蒸馏训练选QAT     |
+| --save_dir             | 产出的量化后模型路径, 该模型可直接在FastDeploy部署     |
+
+注意：目前fastdeploy_quant暂时只支持YOLOv5,YOLOv6和YOLOv7模型的量化
+
+
+## 3. FastDeploy 部署量化模型
+用户在获得量化模型之后，只需要简单地传入量化后的模型路径及相应参数，即可以使用FastDeploy进行部署.
+具体请用户参考示例文档:
+- [YOLOv5s 量化模型Python部署](../examples/slim/yolov5s/python/)
+- [YOLOv5s 量化模型C++部署](../examples/slim/yolov5s/cpp/)
+- [YOLOv6s 量化模型Python部署](../examples/slim/yolov6s/python/)
+- [YOLOv6s 量化模型C++部署](../examples/slim/yolov6s/cpp/)
+- [YOLOv7 量化模型Python部署](../examples/slim/yolov7/python/)
+- [YOLOv7 量化模型C++部署](../examples/slim/yolov7/cpp/)
+
+## 4.Benchmark
+下表为模型量化前后，在FastDeploy部署的端到端推理性能.
+- 测试图片为COCO val2017中的图片.
+- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒.
+- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1.
+
+| 模型                 |推理后端            |部署硬件    | FP32推理时延    | INT8推理时延  | 加速比    | FP32 mAP | INT8 mAP |
+| ------------------- | -----------------|-----------|  --------     |--------      |--------      | --------- |-------- |
+| YOLOv5s             | TensorRT         |    GPU    |  14.13        |  11.22      |      1.26         | 37.6  | 36.6 |
+| YOLOv5s             | ONNX Runtime     |    CPU    |  183.68       |    100.39   |      1.83         | 37.6  | 33.1 |
+| YOLOv5s             | Paddle Inference  |    CPU    |      226.36   |   152.27     |      1.48         |37.6 | 36.8 |
+| YOLOv6s             | TensorRT         |    GPU    |       12.89        |   8.92          |  1.45             | 42.5 | 40.6|
+| YOLOv6s             | ONNX Runtime     |    CPU    |   345.85            |  131.81           |      2.60         |42.5| 36.1|
+| YOLOv6s             | Paddle Inference  |    CPU    |         366.41      |    131.70         |     2.78          |42.5| 41.2|
+| YOLOv7             | TensorRT          |    GPU    |     30.43          |      15.40       |       1.98        | 51.1| 50.8|
+| YOLOv7             | ONNX Runtime     |    CPU    |     971.27          |  471.88           |  2.06             | 51.1 | 42.5|
+| YOLOv7             | Paddle Inference  |    CPU    |          1015.70     |      562.41       |    1.82           |51.1 | 46.3|
--- a/tools/quantization/requirements.txt
+++ b/tools/quantization/requirements.txt
@@ -0,0 +1 @@
+paddleslim
--- a/tools/quantization/setup.py
+++ b/tools/quantization/setup.py
@@ -0,0 +1,25 @@
+import setuptools
+import fdquant
+
+long_description = "FDQuant is a toolkit for model quantization of FastDeploy.\n\n"
+long_description += "Usage: fastdeploy_quant --config_path=./yolov7_tiny_qat_dis.yaml --method='QAT' --save_dir='../v7_qat_outmodel/' \n"
+
+with open("requirements.txt") as fin:
+    REQUIRED_PACKAGES = fin.read()
+
+setuptools.setup(
+    name="fastdeploy-quantization",  # name of package
+    description="A toolkit for model quantization of FastDeploy.",
+    long_description=long_description,
+    long_description_content_type="text/plain",
+    packages=setuptools.find_packages(),
+    install_requires=REQUIRED_PACKAGES,
+    classifiers=[
+        "Programming Language :: Python :: 3",
+        "License :: OSI Approved :: Apache Software License",
+        "Operating System :: OS Independent",
+    ],
+    license='Apache 2.0',
+    entry_points={
+        'console_scripts': ['fastdeploy_quant=fdquant.fdquant:main', ]
+    })