diff --git a/.new_docs/cn/quantize.md b/.new_docs/cn/quantize.md
index eb626c6e6..5411cac29 100644
--- a/.new_docs/cn/quantize.md
+++ b/.new_docs/cn/quantize.md
@@ -1,11 +1,67 @@
 [English](../en/quantize.md) | 简体中文
 # 量化加速
+量化是一种流行的模型压缩方法,量化后的模型拥有更小的体积和更快的推理速度.
+FastDeploy基于PaddleSlim, 集成了一键模型量化的工具, 同时, FastDeploy支持部署量化后的模型, 帮助用户实现推理加速.
-简要介绍量化加速的原理。
-目前量化支持在哪些硬件及后端的使用
+## FastDeploy 多个引擎和硬件支持量化模型部署
+当前, FastDeploy中多个推理后端可以在不同硬件上支持量化模型的部署. 支持情况如下:
+
+| 硬件/推理后端 | ONNX Runtime | Paddle Inference | TensorRT |
+| :-----------| :-------- | :--------------- | :------- |
+| CPU | 支持 | 支持 |  |
+| GPU |  |  | 支持 |
+
+
+## 模型量化
+
+### 量化方法
+基于PaddleSlim, 目前FastDeploy提供的量化方法有量化蒸馏训练和离线量化, 量化蒸馏训练通过模型训练来获得量化模型, 离线量化不需要模型训练即可完成模型的量化. FastDeploy 对两种方式产出的量化模型均能部署.
+
+两种方法的主要对比如下表所示:
+| 量化方法 | 量化过程耗时 | 量化模型精度 | 模型体积 | 推理速度 |
+| :-----------| :--------| :-------| :------- | :------- |
+| 离线量化 | 无需训练,耗时短 | 比量化蒸馏训练稍低 | 两者一致 | 两者一致 |
+| 量化蒸馏训练 | 需要训练,耗时稍高 | 较未量化模型有少量损失 | 两者一致 | 两者一致 |
+
+### 用户使用FastDeploy一键模型量化工具来量化模型
+FastDeploy基于PaddleSlim, 为用户提供了一键模型量化的工具,请参考如下文档进行模型量化.
+- [FastDeploy 一键模型量化](../../tools/quantization/)
+当用户获得产出的量化模型之后,即可以使用FastDeploy来部署量化模型.
+
 ## 量化示例
+目前, FastDeploy已支持部署的量化模型如下表所示:
-这里一个表格,展示目前支持的量化列表(跳转到相应的example下去),精度、性能
+
+### YOLO 系列
+| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP | 量化方式 |
+| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |
+| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | TensorRT | GPU | 14.13 | 11.22 | 1.26 | 37.6 | 36.6 | 量化蒸馏训练 |
+| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | ONNX Runtime | CPU | 183.68 | 100.39 | 1.83 | 37.6 | 33.1 | 量化蒸馏训练 |
+| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | Paddle Inference | CPU | 226.36 | 152.27 | 1.48 | 37.6 | 36.8 | 量化蒸馏训练 |
+| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | TensorRT | GPU | 12.89 | 8.92 | 1.45 | 42.5 | 40.6 | 量化蒸馏训练 |
+| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | ONNX Runtime | CPU | 345.85 | 131.81 | 2.60 | 42.5 | 36.1 | 量化蒸馏训练 |
+| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | Paddle Inference | CPU | 366.41 | 131.70 | 2.78 | 42.5 | 41.2 | 量化蒸馏训练 |
+| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | TensorRT | GPU | 30.43 | 15.40 | 1.98 | 51.1 | 50.8 | 量化蒸馏训练 |
+| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | ONNX Runtime | CPU | 971.27 | 471.88 | 2.06 | 51.1 | 42.5 | 量化蒸馏训练 |
+| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | Paddle Inference | CPU | 1015.70 | 562.41 | 1.82 | 51.1 | 46.3 | 量化蒸馏训练 |
+
+上表中的数据, 为模型量化前后, 在FastDeploy部署的端到端推理性能.
+- 测试数据为COCO2017验证集中的图片.
+- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒.
+- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1.
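
上表中各模型的完整部署代码见对应链接的示例目录。作为补充,下面给出一个简化的 Python 部署示意(以 YOLOv5s 量化模型为例;模型目录名 `yolov5s_quant` 与测试图片路径均为示例假设,实际用法请以各示例目录下的 `infer.py` 为准),展示如何按照上文的支持情况为量化模型选择推理后端:

```python
import cv2
import fastdeploy as fd
from fastdeploy import ModelFormat

# 按部署硬件选择推理后端: GPU 上使用 TensorRT, CPU 上使用 ONNX Runtime 或 Paddle Inference
option = fd.RuntimeOption()
use_gpu = False
if use_gpu:
    option.use_gpu(0)
    option.use_trt_backend()
else:
    option.use_ort_backend()  # CPU 上也可改用 option.use_paddle_backend()

# 量化模型为 Paddle 格式, 目录名 yolov5s_quant 仅为示例
model = fd.vision.detection.YOLOv5(
    "yolov5s_quant/model.pdmodel",
    "yolov5s_quant/model.pdiparams",
    runtime_option=option,
    model_format=ModelFormat.PADDLE)

im = cv2.imread("000000014439.jpg")
print(model.predict(im))
```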
+
+
+### PaddleClas系列
+| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 Top1 | INT8 Top1 |量化方式 |
+| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |
+| [ResNet50_vd](../../examples/vision/classification/paddleclas/quantize/) | ONNX Runtime | CPU | 86.87 | 59.32 | 1.46 | 79.12 | 78.87 | 离线量化 |
+| [ResNet50_vd](../../examples/vision/classification/paddleclas/quantize/) | TensorRT | GPU | 7.85 | 5.42 | 1.45 | 79.12 | 79.06 | 离线量化 |
+| [MobileNetV1_ssld](../../examples/vision/classification/paddleclas/quantize/) | ONNX Runtime | CPU | 40.32 | 16.87 | 2.39 | 77.89 | 75.09 | 离线量化 |
+| [MobileNetV1_ssld](../../examples/vision/classification/paddleclas/quantize/) | TensorRT | GPU | 5.10 | 3.35 | 1.52 | 77.89 | 76.86 | 离线量化 |
+
+上表中的数据, 为模型量化前后, 在FastDeploy部署的端到端推理性能.
+- 测试数据为ImageNet-2012验证集中的图片.
+- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒.
+- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1.
diff --git a/examples/vision/classification/paddleclas/quantize/README.md b/examples/vision/classification/paddleclas/quantize/README.md
new file mode 100644
index 000000000..3a100a823
--- /dev/null
+++ b/examples/vision/classification/paddleclas/quantize/README.md
@@ -0,0 +1,27 @@
+# PaddleClas 量化模型部署
+FastDeploy已支持部署量化模型,并提供一键模型量化的工具.
+用户可以使用一键模型量化工具,自行对模型量化后部署, 也可以直接下载FastDeploy提供的量化模型进行部署.
+
+## FastDeploy一键模型量化工具
+FastDeploy 提供了一键量化工具, 能够简单地通过输入一个配置文件, 对模型进行量化.
+详细教程请见: [一键模型量化工具](../../../../../tools/quantization/)
+注意: 推理量化后的分类模型仍然需要FP32模型文件夹下的inference_cls.yaml文件, 自行量化的模型文件夹内不包含此yaml文件, 用户从FP32模型文件夹下复制此yaml文件到量化后的模型文件夹内即可.
+
+## 下载量化完成的PaddleClas模型
+用户也可以直接下载下表中的量化模型进行部署.
+| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 Top1 | INT8 Top1 |量化方式 |
+| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |
+| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | ONNX Runtime | CPU | 86.87 | 59.32 | 1.46 | 79.12 | 78.87 | 离线量化 |
+| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | TensorRT | GPU | 7.85 | 5.42 | 1.45 | 79.12 | 79.06 | 离线量化 |
+| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | ONNX Runtime | CPU | 40.32 | 16.87 | 2.39 | 77.89 | 75.09 | 离线量化 |
+| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | TensorRT | GPU | 5.10 | 3.35 | 1.52 | 77.89 | 76.86 | 离线量化 |
+
+上表中的数据, 为模型量化前后, 在FastDeploy部署的端到端推理性能.
+- 测试图片为ImageNet-2012验证集中的图片.
+- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒.
+- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1.
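
在查阅下方的详细部署文档之前,这里先给出一个简化的 Python 部署示意(以解压后的 `resnet50_vd_ptq` 量化模型目录为例,测试图片路径为假设的示例,完整可运行代码请参考下方的 Python 部署文档):

```python
import cv2
import fastdeploy as fd

# 在 CPU 上使用 ONNX Runtime 推理量化模型;
# 若在 GPU 上使用 TensorRT, 可改为 option.use_gpu(0)、option.use_trt_backend(),
# 并通过 option.set_trt_input_shape("inputs", min_shape=[1, 3, 224, 224]) 设置输入 shape
option = fd.RuntimeOption()
option.use_ort_backend()

model_dir = "resnet50_vd_ptq"  # 量化模型目录, 此处仅为示例
model = fd.vision.classification.PaddleClasModel(
    model_dir + "/inference.pdmodel",
    model_dir + "/inference.pdiparams",
    model_dir + "/inference_cls.yaml",  # 量化模型仍需 FP32 模型目录下的此配置文件
    runtime_option=option)

im = cv2.imread("ILSVRC2012_val_00000010.jpeg")
print(model.predict(im))
```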
+
+## 详细部署文档
+
+- [Python部署](python)
+- [C++部署](cpp)
diff --git a/examples/vision/classification/paddleclas/quantize/cpp/CMakeLists.txt b/examples/vision/classification/paddleclas/quantize/cpp/CMakeLists.txt
new file mode 100644
index 000000000..fea1a2888
--- /dev/null
+++ b/examples/vision/classification/paddleclas/quantize/cpp/CMakeLists.txt
@@ -0,0 +1,14 @@
+PROJECT(infer_demo C CXX)
+CMAKE_MINIMUM_REQUIRED (VERSION 3.12)
+
+# 指定下载解压后的fastdeploy库路径
+option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")
+
+include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)
+
+# 添加FastDeploy依赖头文件
+include_directories(${FASTDEPLOY_INCS})
+
+add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc)
+# 添加FastDeploy库依赖
+target_link_libraries(infer_demo ${FASTDEPLOY_LIBS})
diff --git a/examples/vision/classification/paddleclas/quantize/cpp/README.md b/examples/vision/classification/paddleclas/quantize/cpp/README.md
new file mode 100644
index 000000000..2c6c9b73e
--- /dev/null
+++ b/examples/vision/classification/paddleclas/quantize/cpp/README.md
@@ -0,0 +1,33 @@
+# PaddleClas 量化模型 C++部署示例
+本目录下提供的`infer.cc`,可以帮助用户快速完成PaddleClas量化模型在CPU/GPU上的部署推理加速.
+
+## 部署准备
+### FastDeploy环境准备
+- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md)
+- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start)
+
+### 量化模型准备
+- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署.
+- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署.(注意: 推理量化后的分类模型仍然需要FP32模型文件夹下的inference_cls.yaml文件, 自行量化的模型文件夹内不包含此yaml文件, 用户从FP32模型文件夹下复制此yaml文件到量化后的模型文件夹内即可.)
+
+## 以量化后的ResNet50_Vd模型为例, 进行部署
+在本目录执行如下命令即可完成编译,以及量化模型部署.
+```bash
+mkdir build
+cd build
+wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz
+tar xvf fastdeploy-linux-x64-0.2.0.tgz
+cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0
+make -j
+
+#下载FastDeploy提供的ResNet50_Vd量化模型文件和测试图片
+wget https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar
+tar -xvf resnet50_vd_ptq.tar
+wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg
+
+
+# 在CPU上使用ONNX Runtime推理量化模型
+./infer_demo resnet50_vd_ptq ILSVRC2012_val_00000010.jpeg 0
+# 在GPU上使用TensorRT推理量化模型
+./infer_demo resnet50_vd_ptq ILSVRC2012_val_00000010.jpeg 1
+```
diff --git a/examples/vision/classification/paddleclas/quantize/cpp/infer.cc b/examples/vision/classification/paddleclas/quantize/cpp/infer.cc
new file mode 100644
index 000000000..ed4f05a24
--- /dev/null
+++ b/examples/vision/classification/paddleclas/quantize/cpp/infer.cc
@@ -0,0 +1,76 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+ +#include "fastdeploy/vision.h" +#ifdef WIN32 +const char sep = '\\'; +#else +const char sep = '/'; +#endif + +void InitAndInfer(const std::string& model_dir, const std::string& image_file, + const fastdeploy::RuntimeOption& option) { + auto model_file = model_dir + sep + "inference.pdmodel"; + auto params_file = model_dir + sep + "inference.pdiparams"; + auto config_file = model_dir + sep + "inference_cls.yaml"; + + auto model = fastdeploy::vision::classification::PaddleClasModel( + model_file, params_file, config_file, option); + + assert(model.Initialized()); + + auto im = cv::imread(image_file); + auto im_bak = im.clone(); + + fastdeploy::vision::ClassifyResult res; + if (!model.Predict(&im, &res)) { + std::cerr << "Failed to predict." << std::endl; + return; + } + + std::cout << res.Str() << std::endl; + +} + +int main(int argc, char* argv[]) { + if (argc < 4) { + std::cout << "Usage: infer_demo path/to/quant_model " + "path/to/image " + "run_option, " + "e.g ./infer_demo ./ResNet50_vd_quant ./test.jpeg 0" + << std::endl; + std::cout << "The data type of run_option is int, 0: run on cpu with ORT " + "backend; 1: run " + "on gpu with TensorRT backend. " + << std::endl; + return -1; + } + + fastdeploy::RuntimeOption option; + int flag = std::atoi(argv[3]); + + if (flag == 0) { + option.UseCpu(); + option.UseOrtBackend(); + } else if (flag == 1) { + option.UseGpu(); + option.UseTrtBackend(); + option.SetTrtInputShape("inputs",{1, 3, 224, 224}); + } + + std::string model_dir = argv[1]; + std::string test_image = argv[2]; + InitAndInfer(model_dir, test_image, option); + return 0; +} diff --git a/examples/vision/classification/paddleclas/quantize/python/README.md b/examples/vision/classification/paddleclas/quantize/python/README.md new file mode 100644 index 000000000..88730f5df --- /dev/null +++ b/examples/vision/classification/paddleclas/quantize/python/README.md @@ -0,0 +1,29 @@ +# PaddleClas 量化模型 Python部署示例 +本目录下提供的`infer.py`,可以帮助用户快速完成PaddleClas量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署.(注意: 推理量化后的分类模型仍然需要FP32模型文件夹下的inference_cls.yaml文件, 自行量化的模型文件夹内不包含此yaml文件, 用户从FP32模型文件夹下复制此yaml文件到量化后的模型文件夹内即可.) 
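
对于上面提到的复制 inference_cls.yaml 的步骤,可以参考如下示意(其中 FP32 模型目录 `ResNet50_vd_infer` 与自行量化产出的目录 `resnet50_vd_quant_model` 均为假设的示例路径,请替换为实际路径):

```python
import shutil

# 将 FP32 模型目录下的预处理配置文件复制到自行量化产出的模型目录中
shutil.copy("ResNet50_vd_infer/inference_cls.yaml",
            "resnet50_vd_quant_model/inference_cls.yaml")
```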
+ + +## 以量化后的ResNet50_Vd模型为例, 进行部署 +```bash +#下载部署示例代码 +git clone https://github.com/PaddlePaddle/FastDeploy.git +cd examples/vision/classification/paddleclas/quantize/python + +#下载FastDeloy提供的ResNet50_Vd量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar +tar -xvf resnet50_vd_ptq.tar +wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg + +# 在CPU上使用Paddle-Inference推理量化模型 +python infer.py --model resnet50_vd_ptq --image ILSVRC2012_val_00000010.jpeg --device cpu --backend ort +# 在GPU上使用TensorRT推理量化模型 +python infer.py --model resnet50_vd_ptq --image ILSVRC2012_val_00000010.jpeg --device gpu --backend trt +``` diff --git a/examples/vision/classification/paddleclas/quantize/python/infer.py b/examples/vision/classification/paddleclas/quantize/python/infer.py new file mode 100644 index 000000000..0a4df1768 --- /dev/null +++ b/examples/vision/classification/paddleclas/quantize/python/infer.py @@ -0,0 +1,77 @@ +import fastdeploy as fd +import cv2 +import os +import time + + +def parse_arguments(): + import argparse + import ast + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="Path of paddleclas model.") + parser.add_argument( + "--image", required=True, help="Path of test image file.") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Type of inference device, support 'cpu' or 'gpu'.") + parser.add_argument( + "--backend", + type=str, + default="default", + help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu" + ) + parser.add_argument( + "--device_id", + type=int, + default=0, + help="Define which GPU card used to run model.") + parser.add_argument( + "--cpu_thread_num", + type=int, + default=9, + help="Number of threads while inference on CPU.") + return parser.parse_args() + + +def build_option(args): + option = fd.RuntimeOption() + if args.device.lower() == "gpu": + option.use_gpu(0) + + option.set_cpu_thread_num(args.cpu_thread_num) + + if args.backend.lower() == "trt": + assert args.device.lower( + ) == "gpu", "TensorRT backend require inferences on device GPU." + option.use_trt_backend() + option.set_trt_input_shape("inputs", min_shape=[1, 3, 224, 224]) + elif args.backend.lower() == "ort": + option.use_ort_backend() + elif args.backend.lower() == "paddle": + option.use_paddle_backend() + elif args.backend.lower() == "openvino": + assert args.device.lower( + ) == "cpu", "OpenVINO backend require inference on device CPU." + option.use_openvino_backend() + return option + + +args = parse_arguments() + +# 配置runtime,加载模型 +runtime_option = build_option(args) + +model_file = os.path.join(args.model, "inference.pdmodel") +params_file = os.path.join(args.model, "inference.pdiparams") +config_file = os.path.join(args.model, "inference_cls.yaml") + +model = fd.vision.classification.PaddleClasModel( + model_file, params_file, config_file, runtime_option=runtime_option) + +# 预测图片检测结果 +im = cv2.imread(args.image) +result = model.predict(im.copy()) +print(result) diff --git a/examples/vision/detection/paddledetection/quantize/README.md b/examples/vision/detection/paddledetection/quantize/README.md new file mode 100644 index 000000000..f3e87e70d --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/README.md @@ -0,0 +1,24 @@ +# PaddleDetection 量化模型部署 +FastDeploy已支持部署量化模型,并提供一键模型量化的工具. +用户可以使用一键模型量化工具,自行对模型量化后部署, 也可以直接下载FastDeploy提供的量化模型进行部署. 
+ +## FastDeploy一键模型量化工具 +FastDeploy 提供了一键量化工具, 能够简单地通过输入一个配置文件, 对模型进行量化. +详细教程请见: [一键模型量化工具](../../../../../tools/quantization/) + +## 下载量化完成的PP-YOLOE-l模型 +用户也可以直接下载下表中的量化模型进行部署. +| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP |量化方式 | +| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- | +| [ppyoloe_crn_l_300e_coco](https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar ) | TensorRT | GPU | 43.83 | 31.57 | 1.39 | 51.4 | 50.7 | 量化蒸馏训练 | +| [ppyoloe_crn_l_300e_coco](https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar ) | ONNX Runtime | CPU | 1085.18 | 475.55 | 2.29 |51.4 | 50.0 |量化蒸馏训练 | + +上表中的数据, 为模型量化前后,在FastDeploy部署的端到端推理性能. +- 测试图片为COCO val2017中的图片. +- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒. +- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1. + +## 详细部署文档 + +- [Python部署](python) +- [C++部署](cpp) diff --git a/examples/vision/detection/paddledetection/quantize/cpp/CMakeLists.txt b/examples/vision/detection/paddledetection/quantize/cpp/CMakeLists.txt new file mode 100644 index 000000000..bd245c9ac --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/cpp/CMakeLists.txt @@ -0,0 +1,13 @@ +PROJECT(infer_demo C CXX) +CMAKE_MINIMUM_REQUIRED (VERSION 3.10) + +# 指定下载解压后的fastdeploy库路径 +option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.") + +include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake) + +# 添加FastDeploy依赖头文件 +include_directories(${FASTDEPLOY_INCS}) + +add_executable(infer_ppyoloe_demo ${PROJECT_SOURCE_DIR}/infer_ppyoloe.cc) +target_link_libraries(infer_ppyoloe_demo ${FASTDEPLOY_LIBS}) diff --git a/examples/vision/detection/paddledetection/quantize/cpp/README.md b/examples/vision/detection/paddledetection/quantize/cpp/README.md new file mode 100644 index 000000000..43ccbd33d --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/cpp/README.md @@ -0,0 +1,33 @@ +# PP-YOLOE-l量化模型 C++部署示例 + +本目录下提供的`infer_ppyoloe.cc`,可以帮助用户快速完成PP-YOLOE-l量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署.(注意: 推理量化后的分类模型仍然需要FP32模型文件夹下的infer_cfg.yml文件, 自行量化的模型文件夹内不包含此yaml文件, 用户从FP32模型文件夹下复制此yaml文件到量化后的模型文件夹内即可.) + +## 以量化后的PP-YOLOE-l模型为例, 进行部署 +在本目录执行如下命令即可完成编译,以及量化模型部署. +```bash +mkdir build +cd build +wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz +tar xvf fastdeploy-linux-x64-0.2.0.tgz +cmake .. 
-DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0 +make -j + +#下载FastDeloy提供的ppyoloe_crn_l_300e_coco量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar +tar -xvf ppyoloe_crn_l_300e_coco_qat.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + +# 在CPU上使用ONNX Runtime推理量化模型 +./infer_ppyoloe_demo ppyoloe_crn_l_300e_coco_qat 000000014439.jpg 0 +# 在GPU上使用TensorRT推理量化模型 +./infer_ppyoloe_demo ppyoloe_crn_l_300e_coco_qat 000000014439.jpg 1 +``` diff --git a/examples/vision/detection/paddledetection/quantize/cpp/infer_ppyoloe.cc b/examples/vision/detection/paddledetection/quantize/cpp/infer_ppyoloe.cc new file mode 100644 index 000000000..9ed06b575 --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/cpp/infer_ppyoloe.cc @@ -0,0 +1,80 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "fastdeploy/vision.h" +#ifdef WIN32 +const char sep = '\\'; +#else +const char sep = '/'; +#endif + +void InitAndInfer(const std::string& model_dir, const std::string& image_file, + const fastdeploy::RuntimeOption& option) { + auto model_file = model_dir + sep + "model.pdmodel"; + auto params_file = model_dir + sep + "model.pdiparams"; + auto config_file = model_dir + sep + "infer_cfg.yml"; + + auto model = fastdeploy::vision::detection::PPYOLOE(model_file, params_file, + config_file, option); + assert(model.Initialized()); + + auto im = cv::imread(image_file); + auto im_bak = im.clone(); + + fastdeploy::vision::DetectionResult res; + if (!model.Predict(&im, &res)) { + std::cerr << "Failed to predict." << std::endl; + return; + } + + std::cout << res.Str() << std::endl; + + auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res, 0.5); + cv::imwrite("vis_result.jpg", vis_im); + std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl; + +} + +int main(int argc, char* argv[]) { + if (argc < 4) { + std::cout << "Usage: infer_demo path/to/quant_model " + "path/to/image " + "run_option, " + "e.g ./infer_demo ./PPYOLOE_L_quant ./test.jpeg 0" + << std::endl; + std::cout << "The data type of run_option is int, 0: run on cpu with ORT " + "backend; 1: run " + "on gpu with TensorRT backend. 
" + << std::endl; + return -1; + } + + fastdeploy::RuntimeOption option; + int flag = std::atoi(argv[3]); + + if (flag == 0) { + option.UseCpu(); + option.UseOrtBackend(); + } else if (flag == 1) { + option.UseGpu(); + option.UseTrtBackend(); + option.SetTrtInputShape("inputs",{1, 3, 640, 640}); + option.SetTrtInputShape("scale_factor",{1,2}); + } + + std::string model_dir = argv[1]; + std::string test_image = argv[2]; + InitAndInfer(model_dir, test_image, option); + return 0; +} diff --git a/examples/vision/detection/paddledetection/quantize/python/README.md b/examples/vision/detection/paddledetection/quantize/python/README.md new file mode 100644 index 000000000..9df40f570 --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/python/README.md @@ -0,0 +1,29 @@ +# PP-YOLOE-l量化模型 Python部署示例 +本目录下提供的`infer.py`,可以帮助用户快速完成PP-YOLOE量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署.(注意: 推理量化后的分类模型仍然需要FP32模型文件夹下的infer_cfg.yml文件, 自行量化的模型文件夹内不包含此yaml文件, 用户从FP32模型文件夹下复制此yaml文件到量化后的模型文件夹内即可.) + + +## 以量化后的PP-YOLOE-l模型为例, 进行部署 +```bash +#下载部署示例代码 +git clone https://github.com/PaddlePaddle/FastDeploy.git +cd /examples/vision/detection/paddledetection/quantize/python + +#下载FastDeloy提供的ppyoloe_crn_l_300e_coco量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco_qat.tar +tar -xvf ppyoloe_crn_l_300e_coco_qat.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + +# 在CPU上使用ONNX Runtime推理量化模型 +python infer_ppyoloe.py --model ppyoloe_crn_l_300e_coco_qat --image 000000014439.jpg --device cpu --backend ort +# 在GPU上使用TensorRT推理量化模型 +python infer_ppyoloe.py --model ppyoloe_crn_l_300e_coco_qat --image 000000014439.jpg --device gpu --backend trt +``` diff --git a/examples/vision/detection/paddledetection/quantize/python/infer_ppyoloe.py b/examples/vision/detection/paddledetection/quantize/python/infer_ppyoloe.py new file mode 100644 index 000000000..85f3c9d55 --- /dev/null +++ b/examples/vision/detection/paddledetection/quantize/python/infer_ppyoloe.py @@ -0,0 +1,82 @@ +import fastdeploy as fd +import cv2 +import os + + +def parse_arguments(): + import argparse + import ast + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="Path of PPYOLOE model.") + parser.add_argument( + "--image", required=True, help="Path of test image file.") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Type of inference device, support 'cpu' or 'gpu'.") + parser.add_argument( + "--backend", + type=str, + default="default", + help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu" + ) + parser.add_argument( + "--device_id", + type=int, + default=0, + help="Define which GPU card used to run model.") + parser.add_argument( + "--cpu_thread_num", + type=int, + default=9, + help="Number of threads while inference on CPU.") + return parser.parse_args() + + +def build_option(args): + option = fd.RuntimeOption() + if args.device.lower() == "gpu": + option.use_gpu(0) + + option.set_cpu_thread_num(args.cpu_thread_num) + + if args.backend.lower() == "trt": + assert args.device.lower( + ) == "gpu", "TensorRT backend require 
inferences on device GPU." + option.use_trt_backend() + option.set_trt_cache_file(os.path.join(args.model, "model.trt")) + option.set_trt_input_shape("image", min_shape=[1, 3, 640, 640]) + option.set_trt_input_shape("scale_factor", min_shape=[1, 2]) + elif args.backend.lower() == "ort": + option.use_ort_backend() + elif args.backend.lower() == "paddle": + option.use_paddle_backend() + elif args.backend.lower() == "openvino": + assert args.device.lower( + ) == "cpu", "OpenVINO backend require inference on device CPU." + option.use_openvino_backend() + return option + + +args = parse_arguments() + +model_file = os.path.join(args.model, "model.pdmodel") +params_file = os.path.join(args.model, "model.pdiparams") +config_file = os.path.join(args.model, "infer_cfg.yml") + +# 配置runtime,加载模型 +runtime_option = build_option(args) +model = fd.vision.detection.PPYOLOE( + model_file, params_file, config_file, runtime_option=runtime_option) + +# 预测图片检测结果 +im = cv2.imread(args.image) +result = model.predict(im.copy()) +print(result) + +# 预测结果可视化 +vis_im = fd.vision.vis_detection(im, result, score_threshold=0.5) +cv2.imwrite("visualized_result.jpg", vis_im) +print("Visualized result save in ./visualized_result.jpg") diff --git a/examples/vision/detection/yolov5/quantize/README.md b/examples/vision/detection/yolov5/quantize/README.md new file mode 100644 index 000000000..16dff9e84 --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/README.md @@ -0,0 +1,24 @@ +# YOLOv5量化模型部署 +FastDeploy已支持部署量化模型,并提供一键模型量化的工具. +用户可以使用一键模型量化工具,自行对模型量化后部署, 也可以直接下载FastDeploy提供的量化模型进行部署. + +## FastDeploy一键模型量化工具 +FastDeploy 提供了一键量化工具, 能够简单地通过输入一个配置文件, 对模型进行量化. +详细教程请见: [一键模型量化工具](../../../../../tools/quantization/) + +## 下载量化完成的YOLOv5s模型 +用户也可以直接下载下表中的量化模型进行部署. +| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP |量化方式 | +| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- | +| [YOLOv5s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar) | TensorRT | GPU | 14.13 | 11.22 | 1.26 | 37.6 | 36.6 | 量化蒸馏训练 | +| [YOLOv5s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar) | Paddle Inference | CPU | 226.36 | 152.27 | 1.48 |37.6 | 36.8 |量化蒸馏训练 | + +上表中的数据, 为模型量化前后,在FastDeploy部署的端到端推理性能. +- 测试图片为COCO val2017中的图片. +- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒. +- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1. + +## 详细部署文档 + +- [Python部署](python) +- [C++部署](cpp) diff --git a/examples/vision/detection/yolov5/quantize/cpp/CMakeLists.txt b/examples/vision/detection/yolov5/quantize/cpp/CMakeLists.txt new file mode 100644 index 000000000..fea1a2888 --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/cpp/CMakeLists.txt @@ -0,0 +1,14 @@ +PROJECT(infer_demo C CXX) +CMAKE_MINIMUM_REQUIRED (VERSION 3.12) + +# 指定下载解压后的fastdeploy库路径 +option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.") + +include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake) + +# 添加FastDeploy依赖头文件 +include_directories(${FASTDEPLOY_INCS}) + +add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc) +# 添加FastDeploy库依赖 +target_link_libraries(infer_demo ${FASTDEPLOY_LIBS}) diff --git a/examples/vision/detection/yolov5/quantize/cpp/README.md b/examples/vision/detection/yolov5/quantize/cpp/README.md new file mode 100644 index 000000000..2a6733768 --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/cpp/README.md @@ -0,0 +1,34 @@ +# YOLOv5量化模型 C++部署示例 + +本目录下提供的`infer.cc`,可以帮助用户快速完成YOLOv5s量化模型在CPU/GPU上的部署推理加速. 
+ +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + +## 以量化后的YOLOv5s模型为例, 进行部署 +在本目录执行如下命令即可完成编译,以及量化模型部署. +```bash +mkdir build +cd build +wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz +tar xvf fastdeploy-linux-x64-0.2.0.tgz +cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0 +make -j + +#下载FastDeloy提供的yolov5s量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar +tar -xvf yolov5s_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + + +# 在CPU上使用Paddle-Inference推理量化模型 +./infer_demo yolov5s_quant 000000014439.jpg 0 +# 在GPU上使用TensorRT推理量化模型 +./infer_demo yolov5s_quant 000000014439.jpg 1 +``` diff --git a/examples/vision/detection/yolov5/quantize/cpp/infer.cc b/examples/vision/detection/yolov5/quantize/cpp/infer.cc new file mode 100644 index 000000000..88a9e15fc --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/cpp/infer.cc @@ -0,0 +1,77 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "fastdeploy/vision.h" +#ifdef WIN32 +const char sep = '\\'; +#else +const char sep = '/'; +#endif + +void InitAndInfer(const std::string& model_dir, const std::string& image_file, + const fastdeploy::RuntimeOption& option) { + auto model_file = model_dir + sep + "model.pdmodel"; + auto params_file = model_dir + sep + "model.pdiparams"; + + auto model = fastdeploy::vision::detection::YOLOv5( + model_file, params_file, option, fastdeploy::ModelFormat::PADDLE); + assert(model.Initialized()); + + auto im = cv::imread(image_file); + auto im_bak = im.clone(); + + fastdeploy::vision::DetectionResult res; + if (!model.Predict(&im, &res)) { + std::cerr << "Failed to predict." << std::endl; + return; + } + + std::cout << res.Str() << std::endl; + + auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res); + cv::imwrite("vis_result.jpg", vis_im); + std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl; +} + +int main(int argc, char* argv[]) { + if (argc < 4) { + std::cout << "Usage: infer_demo path/to/quant_model " + "path/to/image " + "run_option, " + "e.g ./infer_demo ./yolov5s_quant ./000000014439.jpg 0" + << std::endl; + std::cout << "The data type of run_option is int, 0: run on cpu with ORT " + "backend; 1: run " + "on cpu with Paddle backend ; 2: run with gpu and use " + "TensorRT backend." 
+ << std::endl; + return -1; + } + + fastdeploy::RuntimeOption option; + int flag = std::atoi(argv[3]); + + if (flag == 0) { + option.UseCpu(); + option.UsePaddleBackend(); + } else if (flag == 1) { + option.UseGpu(); + option.UseTrtBackend(); + } + + std::string model_dir = argv[1]; + std::string test_image = argv[2]; + InitAndInfer(model_dir, test_image, option); + return 0; +} diff --git a/examples/vision/detection/yolov5/quantize/python/README.md b/examples/vision/detection/yolov5/quantize/python/README.md new file mode 100644 index 000000000..adc9eeba5 --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/python/README.md @@ -0,0 +1,29 @@ +# YOLOv5s量化模型 Python部署示例 +本目录下提供的`infer.py`,可以帮助用户快速完成YOLOv5量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + + +## 以量化后的YOLOv5s模型为例, 进行部署 +```bash +#下载部署示例代码 +git clone https://github.com/PaddlePaddle/FastDeploy.git +cd examples/vision/detection/yolov5/quantize/python + +#下载FastDeloy提供的yolov5s量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar +tar -xvf yolov5s_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + +# 在CPU上使用Paddle-Inference推理量化模型 +python infer.py --model yolov5s_quant --image 000000014439.jpg --device cpu --backend paddle +# 在GPU上使用TensorRT推理量化模型 +python infer.py --model yolov5s_quant --image 000000014439.jpg --device gpu --backend trt +``` diff --git a/examples/vision/detection/yolov5/quantize/python/infer.py b/examples/vision/detection/yolov5/quantize/python/infer.py new file mode 100644 index 000000000..aa56ef18b --- /dev/null +++ b/examples/vision/detection/yolov5/quantize/python/infer.py @@ -0,0 +1,81 @@ +import fastdeploy as fd +import cv2 +import os +from fastdeploy import ModelFormat + + +def parse_arguments(): + import argparse + import ast + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="Path of yolov5 onnx model.") + parser.add_argument( + "--image", required=True, help="Path of test image file.") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Type of inference device, support 'cpu' or 'gpu'.") + parser.add_argument( + "--backend", + type=str, + default="default", + help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu" + ) + parser.add_argument( + "--device_id", + type=int, + default=0, + help="Define which GPU card used to run model.") + parser.add_argument( + "--cpu_thread_num", + type=int, + default=9, + help="Number of threads while inference on CPU.") + return parser.parse_args() + + +def build_option(args): + option = fd.RuntimeOption() + if args.device.lower() == "gpu": + option.use_gpu(0) + + option.set_cpu_thread_num(args.cpu_thread_num) + + if args.backend.lower() == "trt": + assert args.device.lower( + ) == "gpu", "TensorRT backend require inference on device GPU." + option.use_trt_backend() + elif args.backend.lower() == "ort": + option.use_ort_backend() + elif args.backend.lower() == "paddle": + option.use_paddle_backend() + elif args.backend.lower() == "openvino": + assert args.device.lower( + ) == "cpu", "OpenVINO backend require inference on device CPU." 
+ option.use_openvino_backend() + return option + + +args = parse_arguments() + +model_file = os.path.join(args.model, "model.pdmodel") +params_file = os.path.join(args.model, "model.pdiparams") +# 配置runtime,加载模型 +runtime_option = build_option(args) +model = fd.vision.detection.YOLOv5( + model_file, + params_file, + runtime_option=runtime_option, + model_format=ModelFormat.PADDLE) + +# 预测图片检测结果 +im = cv2.imread(args.image) +result = model.predict(im.copy()) +print(result) + +# 预测结果可视化 +vis_im = fd.vision.vis_detection(im, result) +cv2.imwrite("visualized_result.jpg", vis_im) +print("Visualized result save in ./visualized_result.jpg") diff --git a/examples/vision/detection/yolov6/quantize/README.md b/examples/vision/detection/yolov6/quantize/README.md new file mode 100644 index 000000000..594d59e5c --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/README.md @@ -0,0 +1,25 @@ +# YOLOv6量化模型部署 +FastDeploy已支持部署量化模型,并提供一键模型量化的工具. +用户可以使用一键模型量化工具,自行对模型量化后部署, 也可以直接下载FastDeploy提供的量化模型进行部署. + +## FastDeploy一键模型量化工具 +FastDeploy 提供了一键量化工具, 能够简单地通过输入一个配置文件, 对模型进行量化. +详细教程请见: [一键模型量化工具](../../../../../tools/quantization/) + +## 下载量化完成的YOLOv6s模型 +用户也可以直接下载下表中的量化模型进行部署. + +| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP | 量化方式 | +| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- | ------ | +| [YOLOv6s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar) | TensorRT | GPU | 12.89 | 8.92 | 1.45 | 42.5 | 40.6| 量化蒸馏训练 | +| [YOLOv6s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar) | Paddle Inference | CPU | 366.41 | 131.70 | 2.78 |42.5| 41.2|量化蒸馏训练 | + +上表中的数据, 为模型量化前后,在FastDeploy部署的端到端推理性能. +- 测试图片为COCO val2017中的图片. +- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒. +- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1. + +## 详细部署文档 + +- [Python部署](python) +- [C++部署](cpp) diff --git a/examples/vision/detection/yolov6/quantize/cpp/CMakeLists.txt b/examples/vision/detection/yolov6/quantize/cpp/CMakeLists.txt new file mode 100644 index 000000000..fea1a2888 --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/cpp/CMakeLists.txt @@ -0,0 +1,14 @@ +PROJECT(infer_demo C CXX) +CMAKE_MINIMUM_REQUIRED (VERSION 3.12) + +# 指定下载解压后的fastdeploy库路径 +option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.") + +include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake) + +# 添加FastDeploy依赖头文件 +include_directories(${FASTDEPLOY_INCS}) + +add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc) +# 添加FastDeploy库依赖 +target_link_libraries(infer_demo ${FASTDEPLOY_LIBS}) diff --git a/examples/vision/detection/yolov6/quantize/cpp/README.md b/examples/vision/detection/yolov6/quantize/cpp/README.md new file mode 100644 index 000000000..5713abcfb --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/cpp/README.md @@ -0,0 +1,34 @@ +# YOLOv6量化模型 C++部署示例 + +本目录下提供的`infer.cc`,可以帮助用户快速完成YOLOv6s量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + +## 以量化后的YOLOv6s模型为例, 进行部署 +在本目录执行如下命令即可完成编译,以及量化模型部署. +```bash +mkdir build +cd build +wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz +tar xvf fastdeploy-linux-x64-0.2.0.tgz +cmake .. 
-DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0 +make -j + +#下载FastDeloy提供的yolov6s量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar +tar -xvf yolov6s_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + + +# 在CPU上使用Paddle-Inference推理量化模型 +./infer_demo yolov6s_quant 000000014439.jpg 0 +# 在GPU上使用TensorRT推理量化模型 +./infer_demo yolov6s_quant 000000014439.jpg 1 +``` diff --git a/examples/vision/detection/yolov6/quantize/cpp/infer.cc b/examples/vision/detection/yolov6/quantize/cpp/infer.cc new file mode 100644 index 000000000..f7a9d2c16 --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/cpp/infer.cc @@ -0,0 +1,77 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "fastdeploy/vision.h" +#ifdef WIN32 +const char sep = '\\'; +#else +const char sep = '/'; +#endif + +void InitAndInfer(const std::string& model_dir, const std::string& image_file, + const fastdeploy::RuntimeOption& option) { + auto model_file = model_dir + sep + "model.pdmodel"; + auto params_file = model_dir + sep + "model.pdiparams"; + + auto model = fastdeploy::vision::detection::YOLOv6( + model_file, params_file, option, fastdeploy::ModelFormat::PADDLE); + assert(model.Initialized()); + + auto im = cv::imread(image_file); + auto im_bak = im.clone(); + + fastdeploy::vision::DetectionResult res; + if (!model.Predict(&im, &res)) { + std::cerr << "Failed to predict." << std::endl; + return; + } + + std::cout << res.Str() << std::endl; + + auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res); + cv::imwrite("vis_result.jpg", vis_im); + std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl; +} + +int main(int argc, char* argv[]) { + if (argc < 4) { + std::cout << "Usage: infer_demo path/to/quant_model " + "path/to/image " + "run_option, " + "e.g ./infer_demo ./yolov6s_quant ./000000014439.jpg 0" + << std::endl; + std::cout << "The data type of run_option is int, 0: run on cpu with ORT " + "backend; 1: run " + "on cpu with Paddle backend ; 2: run with gpu and use " + "TensorRT backend." + << std::endl; + return -1; + } + + fastdeploy::RuntimeOption option; + int flag = std::atoi(argv[3]); + + if (flag == 0) { + option.UseCpu(); + option.UsePaddleBackend(); + } else if (flag == 1) { + option.UseGpu(); + option.UseTrtBackend(); + } + + std::string model_dir = argv[1]; + std::string test_image = argv[2]; + InitAndInfer(model_dir, test_image, option); + return 0; +} diff --git a/examples/vision/detection/yolov6/quantize/python/README.md b/examples/vision/detection/yolov6/quantize/python/README.md new file mode 100644 index 000000000..48af7a7f6 --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/python/README.md @@ -0,0 +1,28 @@ +# YOLOv6量化模型 Python部署示例 +本目录下提供的`infer.py`,可以帮助用户快速完成YOLOv6量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. 
FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + +## 以量化后的YOLOv6s模型为例, 进行部署 +```bash +#下载部署示例代码 +git clone https://github.com/PaddlePaddle/FastDeploy.git +cd examples/slim/yolov6/python + +#下载FastDeloy提供的yolov6s量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar +tar -xvf yolov6s_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + +# 在CPU上使用Paddle-Inference推理量化模型 +python infer.py --model yolov6s_quant --image 000000014439.jpg --device cpu --backend paddle +# 在GPU上使用TensorRT推理量化模型 +python infer.py --model yolov6s_quant --image 000000014439.jpg --device gpu --backend trt +``` diff --git a/examples/vision/detection/yolov6/quantize/python/infer.py b/examples/vision/detection/yolov6/quantize/python/infer.py new file mode 100644 index 000000000..ec0602272 --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/python/infer.py @@ -0,0 +1,81 @@ +import fastdeploy as fd +import cv2 +import os +from fastdeploy import ModelFormat + + +def parse_arguments(): + import argparse + import ast + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="Path of yolov6 onnx model.") + parser.add_argument( + "--image", required=True, help="Path of test image file.") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Type of inference device, support 'cpu' or 'gpu'.") + parser.add_argument( + "--backend", + type=str, + default="default", + help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu" + ) + parser.add_argument( + "--device_id", + type=int, + default=0, + help="Define which GPU card used to run model.") + parser.add_argument( + "--cpu_thread_num", + type=int, + default=9, + help="Number of threads while inference on CPU.") + return parser.parse_args() + + +def build_option(args): + option = fd.RuntimeOption() + if args.device.lower() == "gpu": + option.use_gpu(0) + + option.set_cpu_thread_num(args.cpu_thread_num) + + if args.backend.lower() == "trt": + assert args.device.lower( + ) == "gpu", "TensorRT backend require inference on device GPU." + option.use_trt_backend() + elif args.backend.lower() == "ort": + option.use_ort_backend() + elif args.backend.lower() == "paddle": + option.use_paddle_backend() + elif args.backend.lower() == "openvino": + assert args.device.lower( + ) == "cpu", "OpenVINO backend require inference on device CPU." 
+ option.use_openvino_backend() + return option + + +args = parse_arguments() + +model_file = os.path.join(args.model, "model.pdmodel") +params_file = os.path.join(args.model, "model.pdiparams") +# 配置runtime,加载模型 +runtime_option = build_option(args) +model = fd.vision.detection.YOLOv6( + model_file, + params_file, + runtime_option=runtime_option, + model_format=ModelFormat.PADDLE) + +# 预测图片检测结果 +im = cv2.imread(args.image) +result = model.predict(im.copy()) +print(result) + +# 预测结果可视化 +vis_im = fd.vision.vis_detection(im, result) +cv2.imwrite("visualized_result.jpg", vis_im) +print("Visualized result save in ./visualized_result.jpg") diff --git a/examples/vision/detection/yolov7/quantize/README.md b/examples/vision/detection/yolov7/quantize/README.md new file mode 100644 index 000000000..6d29ea3f3 --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/README.md @@ -0,0 +1,25 @@ +# YOLOv7量化模型部署 +FastDeploy已支持部署量化模型,并提供一键模型量化的工具. +用户可以使用一键模型量化工具,自行对模型量化后部署, 也可以直接下载FastDeploy提供的量化模型进行部署. + +## FastDeploy一键模型量化工具 +FastDeploy 提供了一键量化工具, 能够简单地通过输入一个配置文件, 对模型进行量化. +详细教程请见: [一键模型量化工具](../../../../../tools/quantization/) + +## 下载量化完成的YOLOv7模型 +用户也可以直接下载下表中的量化模型进行部署. + +| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP | 量化方式 | +| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- | +| [YOLOv7](https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar) | TensorRT | GPU | 30.43 | 15.40 | 1.98 | 51.1| 50.8| 量化蒸馏训练 | +| [YOLOv7](https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar) | Paddle Inference | CPU | 1015.70 | 562.41 | 1.82 |51.1 | 46.3| 量化蒸馏训练 | + +上表中的数据, 为模型量化前后,在FastDeploy部署的端到端推理性能. +- 测试图片为COCO val2017中的图片. +- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒. +- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1. + +## 详细部署文档 + +- [Python部署](python) +- [C++部署](cpp) diff --git a/examples/vision/detection/yolov7/quantize/cpp/CMakeLists.txt b/examples/vision/detection/yolov7/quantize/cpp/CMakeLists.txt new file mode 100644 index 000000000..fea1a2888 --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/cpp/CMakeLists.txt @@ -0,0 +1,14 @@ +PROJECT(infer_demo C CXX) +CMAKE_MINIMUM_REQUIRED (VERSION 3.12) + +# 指定下载解压后的fastdeploy库路径 +option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.") + +include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake) + +# 添加FastDeploy依赖头文件 +include_directories(${FASTDEPLOY_INCS}) + +add_executable(infer_demo ${PROJECT_SOURCE_DIR}/infer.cc) +# 添加FastDeploy库依赖 +target_link_libraries(infer_demo ${FASTDEPLOY_LIBS}) diff --git a/examples/vision/detection/yolov7/quantize/cpp/README.md b/examples/vision/detection/yolov7/quantize/cpp/README.md new file mode 100644 index 000000000..285454e6e --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/cpp/README.md @@ -0,0 +1,34 @@ +# YOLOv7量化模型 C++部署示例 + +本目录下提供的`infer.cc`,可以帮助用户快速完成YOLOv7量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + +## 以量化后的YOLOv7模型为例, 进行部署 +在本目录执行如下命令即可完成编译,以及量化模型部署. +```bash +mkdir build +cd build +wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.0.tgz +tar xvf fastdeploy-linux-x64-0.2.0.tgz +cmake .. 
-DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-0.2.0 +make -j + +#下载FastDeloy提供的yolov7量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar +tar -xvf yolov7_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + + +# 在CPU上使用Paddle-Inference推理量化模型 +./infer_demo yolov7_quant 000000014439.jpg 0 +# 在GPU上使用TensorRT推理量化模型 +./infer_demo yolov7_quant 000000014439.jpg 1 +``` diff --git a/examples/vision/detection/yolov7/quantize/cpp/infer.cc b/examples/vision/detection/yolov7/quantize/cpp/infer.cc new file mode 100644 index 000000000..45cba4b29 --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/cpp/infer.cc @@ -0,0 +1,77 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "fastdeploy/vision.h" +#ifdef WIN32 +const char sep = '\\'; +#else +const char sep = '/'; +#endif + +void InitAndInfer(const std::string& model_dir, const std::string& image_file, + const fastdeploy::RuntimeOption& option) { + auto model_file = model_dir + sep + "model.pdmodel"; + auto params_file = model_dir + sep + "model.pdiparams"; + + auto model = fastdeploy::vision::detection::YOLOv7( + model_file, params_file, option, fastdeploy::ModelFormat::PADDLE); + assert(model.Initialized()); + + auto im = cv::imread(image_file); + auto im_bak = im.clone(); + + fastdeploy::vision::DetectionResult res; + if (!model.Predict(&im, &res)) { + std::cerr << "Failed to predict." << std::endl; + return; + } + + std::cout << res.Str() << std::endl; + + auto vis_im = fastdeploy::vision::Visualize::VisDetection(im_bak, res); + cv::imwrite("vis_result.jpg", vis_im); + std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl; +} + +int main(int argc, char* argv[]) { + if (argc < 4) { + std::cout << "Usage: infer_demo path/to/quant_model " + "path/to/image " + "run_option, " + "e.g ./infer_demo ./yolov7s_quant ./000000014439.jpg 0" + << std::endl; + std::cout << "The data type of run_option is int, 0: run on cpu with ORT " + "backend; 1: run " + "on cpu with Paddle backend ; 2: run with gpu and use " + "TensorRT backend." + << std::endl; + return -1; + } + + fastdeploy::RuntimeOption option; + int flag = std::atoi(argv[3]); + + if (flag == 0) { + option.UseCpu(); + option.UsePaddleBackend(); + } else if (flag == 1) { + option.UseGpu(); + option.UseTrtBackend(); + } + + std::string model_dir = argv[1]; + std::string test_image = argv[2]; + InitAndInfer(model_dir, test_image, option); + return 0; +} diff --git a/examples/vision/detection/yolov7/quantize/python/README.md b/examples/vision/detection/yolov7/quantize/python/README.md new file mode 100644 index 000000000..0cf007038 --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/python/README.md @@ -0,0 +1,28 @@ +# YOLOv7量化模型 Python部署示例 +本目录下提供的`infer.py`,可以帮助用户快速完成YOLOv7量化模型在CPU/GPU上的部署推理加速. + +## 部署准备 +### FastDeploy环境准备 +- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../../docs/environment.md) +- 2. 
FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../../docs/quick_start) + +### 量化模型准备 +- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署. +- 2. 用户可以使用FastDeploy提供的[一键模型量化工具](../../../../../../tools/quantization/),自行进行模型量化, 并使用产出的量化模型进行部署. + +## 以量化后的YOLOv7模型为例, 进行部署 +```bash +#下载部署示例代码 +git clone https://github.com/PaddlePaddle/FastDeploy.git +cd examples/vision/detection/yolov7/quantize/python + +#下载FastDeloy提供的yolov7量化模型文件和测试图片 +wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar +tar -xvf yolov7_quant.tar +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + +# 在CPU上使用Paddle-Inference推理量化模型 +python infer.py --model yolov7_quant --image 000000014439.jpg --device cpu --backend paddle +# 在GPU上使用TensorRT推理量化模型 +python infer.py --model yolov7_quant --image 000000014439.jpg --device gpu --backend trt +``` diff --git a/examples/vision/detection/yolov7/quantize/python/infer.py b/examples/vision/detection/yolov7/quantize/python/infer.py new file mode 100644 index 000000000..3c42679e7 --- /dev/null +++ b/examples/vision/detection/yolov7/quantize/python/infer.py @@ -0,0 +1,81 @@ +import fastdeploy as fd +import cv2 +import os +from fastdeploy import ModelFormat + + +def parse_arguments(): + import argparse + import ast + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="Path of yolov7 onnx model.") + parser.add_argument( + "--image", required=True, help="Path of test image file.") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Type of inference device, support 'cpu' or 'gpu'.") + parser.add_argument( + "--backend", + type=str, + default="default", + help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu" + ) + parser.add_argument( + "--device_id", + type=int, + default=0, + help="Define which GPU card used to run model.") + parser.add_argument( + "--cpu_thread_num", + type=int, + default=9, + help="Number of threads while inference on CPU.") + return parser.parse_args() + + +def build_option(args): + option = fd.RuntimeOption() + if args.device.lower() == "gpu": + option.use_gpu(0) + + option.set_cpu_thread_num(args.cpu_thread_num) + + if args.backend.lower() == "trt": + assert args.device.lower( + ) == "gpu", "TensorRT backend require inference on device GPU." + option.use_trt_backend() + elif args.backend.lower() == "ort": + option.use_ort_backend() + elif args.backend.lower() == "paddle": + option.use_paddle_backend() + elif args.backend.lower() == "openvino": + assert args.device.lower( + ) == "cpu", "OpenVINO backend require inference on device CPU." 
+ option.use_openvino_backend() + return option + + +args = parse_arguments() + +model_file = os.path.join(args.model, "model.pdmodel") +params_file = os.path.join(args.model, "model.pdiparams") +# 配置runtime,加载模型 +runtime_option = build_option(args) +model = fd.vision.detection.YOLOv7( + model_file, + params_file, + runtime_option=runtime_option, + model_format=ModelFormat.PADDLE) + +# 预测图片检测结果 +im = cv2.imread(args.image) +result = model.predict(im.copy()) +print(result) + +# 预测结果可视化 +vis_im = fd.vision.vis_detection(im, result) +cv2.imwrite("visualized_result.jpg", vis_im) +print("Visualized result save in ./visualized_result.jpg") diff --git a/fastdeploy/vision/detection/contrib/yolov7.cc b/fastdeploy/vision/detection/contrib/yolov7.cc index e776a8c6c..51e7a605c 100644 --- a/fastdeploy/vision/detection/contrib/yolov7.cc +++ b/fastdeploy/vision/detection/contrib/yolov7.cc @@ -64,12 +64,13 @@ YOLOv7::YOLOv7(const std::string& model_file, const std::string& params_file, valid_cpu_backends = {Backend::OPENVINO, Backend::ORT}; valid_gpu_backends = {Backend::ORT, Backend::TRT}; } else { - valid_cpu_backends = {Backend::PDINFER}; - valid_gpu_backends = {Backend::PDINFER}; + valid_cpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT}; + valid_gpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT}; } runtime_option = custom_option; runtime_option.model_format = model_format; runtime_option.model_file = model_file; + runtime_option.params_file = params_file; initialized = Initialize(); } diff --git a/tools/quantization/readme.md b/tools/quantization/README.md similarity index 51% rename from tools/quantization/readme.md rename to tools/quantization/README.md index 600f79441..2bf2da7f2 100644 --- a/tools/quantization/readme.md +++ b/tools/quantization/README.md @@ -1,5 +1,6 @@ # FastDeploy 一键模型量化 -FastDeploy 给用户提供了一键量化功能, 支持离线量化和量化蒸馏训练. 本文档已Yolov5s为例, 用户可参考如何安装并执行FastDeploy的一键量化功能. +FastDeploy基于PaddleSlim, 给用户提供了一键模型量化的工具, 支持离线量化和量化蒸馏训练. +本文档以Yolov5s为例, 供用户参考如何安装并执行FastDeploy的一键模型量化. ## 1.安装 @@ -24,7 +25,7 @@ python setup.py install ## 2.使用方式 -### 一键离线量化示例 +### 一键量化示例 #### 离线量化 @@ -34,7 +35,7 @@ python setup.py install ```shell # 下载yolov5.onnx -wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s.onnx +wget https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx # 下载数据集, 此Calibration数据集为COCO val2017中的前320张图片 wget https://bj.bcebos.com/paddlehub/fastdeploy/COCO_val_320.tar.gz @@ -42,20 +43,21 @@ tar -xvf COCO_val_320.tar.gz ``` ##### 2.使用fastdeploy_quant命令,执行一键模型量化: - +以下命令是对yolov5s模型进行量化, 用户若想量化其他模型, 替换config_path为configs文件夹下的其他模型配置文件即可. ```shell fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='PTQ' --save_dir='./yolov5s_ptq_model/' ``` ##### 3.参数说明 +目前用户只需要提供一个定制的模型config文件,并指定量化方法和量化后的模型保存路径即可完成量化. + | 参数 | 作用 | | -------------------- | ------------------------------------------------------------ | -| --config_path | 一键量化所需要的量化配置文件.[详解](./fdquant/configs/readme.md) | +| --config_path | 一键量化所需要的量化配置文件.[详解](./configs/README.md) | | --method | 量化方式选择, 离线量化选PTQ,量化蒸馏训练选QAT | | --save_dir | 产出的量化后模型路径, 该模型可直接在FastDeploy部署 | -注意:目前fastdeploy_quant暂时只支持YOLOv5,YOLOv6和YOLOv7模型的量化 #### 量化蒸馏训练 @@ -63,10 +65,11 @@ fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method=' ##### 1.准备待量化模型和训练数据集 FastDeploy目前的量化蒸馏训练,只支持无标注图片训练,训练过程中不支持评估模型精度. 数据集为真实预测场景下的图片,图片数量依据数据集大小来定,尽量覆盖所有部署场景. 此例中,我们为用户准备了COCO2017验证集中的前320张图片. +注: 如果用户想通过量化蒸馏训练的方法,获得精度更高的量化模型, 可以自行准备更多的数据, 以及训练更多的轮数. 
```shell # 下载yolov5.onnx -wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s.onnx +wget https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx # 下载数据集, 此Calibration数据集为COCO2017验证集中的前320张图片 wget https://bj.bcebos.com/paddlehub/fastdeploy/COCO_val_320.tar.gz @@ -74,47 +77,31 @@ tar -xvf COCO_val_320.tar.gz ``` ##### 2.使用fastdeploy_quant命令,执行一键模型量化: - +以下命令是对yolov5s模型进行量化, 用户若想量化其他模型, 替换config_path为configs文件夹下的其他模型配置文件即可. ```shell +# 执行命令默认为单卡训练,训练前请指定单卡GPU, 否则在训练过程中可能会卡住. export CUDA_VISIBLE_DEVICES=0 fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='QAT' --save_dir='./yolov5s_qat_model/' ``` ##### 3.参数说明 +目前用户只需要提供一个定制的模型config文件,并指定量化方法和量化后的模型保存路径即可完成量化. + | 参数 | 作用 | | -------------------- | ------------------------------------------------------------ | -| --config_path | 一键量化所需要的量化配置文件.[详解](./fdquant/configs/readme.md) | +| --config_path | 一键量化所需要的量化配置文件.[详解](./configs/README.md)| | --method | 量化方式选择, 离线量化选PTQ,量化蒸馏训练选QAT | | --save_dir | 产出的量化后模型路径, 该模型可直接在FastDeploy部署 | -注意:目前fastdeploy_quant暂时只支持YOLOv5,YOLOv6和YOLOv7模型的量化 - ## 3. FastDeploy 部署量化模型 -用户在获得量化模型之后,只需要简单地传入量化后的模型路径及相应参数,即可以使用FastDeploy进行部署. +用户在获得量化模型之后,即可以使用FastDeploy进行部署, 部署文档请参考: 具体请用户参考示例文档: -- [YOLOv5s 量化模型Python部署](../examples/slim/yolov5s/python/) -- [YOLOv5s 量化模型C++部署](../examples/slim/yolov5s/cpp/) -- [YOLOv6s 量化模型Python部署](../examples/slim/yolov6s/python/) -- [YOLOv6s 量化模型C++部署](../examples/slim/yolov6s/cpp/) -- [YOLOv7 量化模型Python部署](../examples/slim/yolov7/python/) -- [YOLOv7 量化模型C++部署](../examples/slim/yolov7/cpp/) +- [YOLOv5 量化模型部署](../../examples/vision/detection/yolov5/quantize/) -## 4.Benchmark -下表为模型量化前后,在FastDeploy部署的端到端推理性能. -- 测试图片为COCO val2017中的图片. -- 推理时延为端到端推理(包含前后处理)的平均时延, 单位是毫秒. -- CPU为Intel(R) Xeon(R) Gold 6271C, GPU为Tesla T4, TensorRT版本8.4.15, 所有测试中固定CPU线程数为1. +- [YOLOv6 量化模型部署](../../examples/vision/detection/yolov6/quantize/) -| 模型 |推理后端 |部署硬件 | FP32推理时延 | INT8推理时延 | 加速比 | FP32 mAP | INT8 mAP | -| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- | -| YOLOv5s | TensorRT | GPU | 14.13 | 11.22 | 1.26 | 37.6 | 36.6 | -| YOLOv5s | ONNX Runtime | CPU | 183.68 | 100.39 | 1.83 | 37.6 | 33.1 | -| YOLOv5s | Paddle Inference | CPU | 226.36 | 152.27 | 1.48 |37.6 | 36.8 | -| YOLOv6s | TensorRT | GPU | 12.89 | 8.92 | 1.45 | 42.5 | 40.6| -| YOLOv6s | ONNX Runtime | CPU | 345.85 | 131.81 | 2.60 |42.5| 36.1| -| YOLOv6s | Paddle Inference | CPU | 366.41 | 131.70 | 2.78 |42.5| 41.2| -| YOLOv7 | TensorRT | GPU | 30.43 | 15.40 | 1.98 | 51.1| 50.8| -| YOLOv7 | ONNX Runtime | CPU | 971.27 | 471.88 | 2.06 | 51.1 | 42.5| -| YOLOv7 | Paddle Inference | CPU | 1015.70 | 562.41 | 1.82 |51.1 | 46.3| +- [YOLOv7 量化模型部署](../../examples/vision/detection/yolov7/quantize/) + +- [PadddleClas 量化模型部署](../../examples/vision/classification/paddleclas/quantize/) diff --git a/tools/quantization/configs/README.md b/tools/quantization/configs/README.md new file mode 100644 index 000000000..7bab2de34 --- /dev/null +++ b/tools/quantization/configs/README.md @@ -0,0 +1,51 @@ +# FastDeploy 量化配置文件说明 +FastDeploy 量化配置文件中,包含了全局配置,量化蒸馏训练配置,离线量化配置和训练配置. 
+用户除了直接使用FastDeploy提供在本目录的配置文件外,可以按需求自行修改相关配置文件 + +## 实例解读 + +``` +# 全局配置 +Global: + model_dir: ./yolov5s.onnx #输入模型的路径 + format: 'onnx' #输入模型的格式, paddle模型请选择'paddle' + model_filename: model.pdmodel #量化后转为paddle格式模型的模型名字 + params_filename: model.pdiparams #量化后转为paddle格式模型的参数名字 + image_path: ./COCO_val_320 #离线量化或者量化蒸馏训练使用的数据集路径 + arch: YOLOv5 #模型结构 + input_list: ['x2paddle_images'] #待量化的模型的输入名字 + preprocess: yolo_image_preprocess #模型量化时,对数据做的预处理函数, 用户可以在 ../fdquant/dataset.py 中修改或自行编写新的预处理函数 + +#量化蒸馏训练配置 +Distillation: + alpha: 1.0 #蒸馏loss所占权重 + loss: soft_label #蒸馏loss算法 + +Quantization: + onnx_format: true #是否采用ONNX量化标准格式, 要在FastDeploy上部署, 必须选true + use_pact: true #量化训练是否使用PACT方法 + activation_quantize_type: 'moving_average_abs_max' #激活量化方式 + quantize_op_types: #需要进行量化的OP + - conv2d + - depthwise_conv2d + +#离线量化配置 +PTQ: + calibration_method: 'avg' #离线量化的激活校准算法, 可选: avg, abs_max, hist, KL, mse, emd + skip_tensor_list: None #用户可指定跳过某些conv层,不进行量化 + +#训练参数配置 +TrainConfig: + train_iter: 3000 + learning_rate: 0.00001 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 + target_metric: 0.365 + +``` +## 更多详细配置方法 + +FastDeploy一键量化功能由PaddeSlim助力, 更详细的量化配置方法请参考: +[自动化压缩超参详细教程](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/hyperparameter_tutorial.md) diff --git a/tools/quantization/configs/detection/ppyoloe_l_quant.yaml b/tools/quantization/configs/detection/ppyoloe_l_quant.yaml new file mode 100644 index 000000000..43cbab4f9 --- /dev/null +++ b/tools/quantization/configs/detection/ppyoloe_l_quant.yaml @@ -0,0 +1,37 @@ +Global: + model_dir: ./ppyoloe_crn_l_300e_coco + format: paddle + model_filename: model.pdmodel + params_filename: model.pdiparams + image_path: ./COCO_val_320 + arch: PPYOLOE + input_list: ['image','scale_factor'] + preprocess: ppdet_image_preprocess + +Distillation: + alpha: 1.0 + loss: soft_label + +Quantization: + onnx_format: true + use_pact: true + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - depthwise_conv2d + + +PTQ: + calibration_method: 'avg' # option: avg, abs_max, hist, KL, mse + skip_tensor_list: None + +TrainConfig: + train_iter: 5000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00003 + T_max: 6000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 diff --git a/tools/quantization/configs/readme.md b/tools/quantization/configs/readme.md deleted file mode 100644 index 782584815..000000000 --- a/tools/quantization/configs/readme.md +++ /dev/null @@ -1,48 +0,0 @@ -# FastDeploy 量化配置文件说明 -FastDeploy 量化配置文件中,包含了全局配置,量化蒸馏训练配置,离线量化配置和训练配置. -用户除了直接使用FastDeploy提供在本目录的配置文件外,可以按需求自行修改相关配置文件 - -## 实例解读 - -``` -#全局信息 -Global: - model_dir: ./yolov7-tiny.onnx #输入模型路径 - format: 'onnx' #输入模型格式,选项为 onnx 或者 paddle - model_filename: model.pdmodel #paddle模型的模型文件名 - params_filename: model.pdiparams #paddle模型的参数文件名 - image_path: ./COCO_val_320 #PTQ所有的Calibration数据集或者量化训练所用的训练集 - arch: YOLOv7 #模型系列 - -#量化蒸馏训练中的蒸馏参数设置 -Distillation: - alpha: 1.0 - loss: soft_label - -#量化蒸馏训练中的量化参数设置 -Quantization: - onnx_format: true - activation_quantize_type: 'moving_average_abs_max' - quantize_op_types: - - conv2d - - depthwise_conv2d - -#离线量化参数配置 -PTQ: - calibration_method: 'avg' #Calibraion算法,可选为 avg, abs_max, hist, KL, mse - skip_tensor_list: None #不进行离线量化的tensor - - -#训练参数 -TrainConfig: - train_iter: 3000 - learning_rate: - type: CosineAnnealingDecay - learning_rate: 0.00003 - T_max: 8000 - optimizer_builder: - optimizer: - type: SGD - weight_decay: 0.00004 - -```