[Model] Add Solov2 For PaddleDetection (#1435)

* update solov2

* Repair note

* update solov2 postprocess

* update

* update solov2

* update solov2

* fixed bug

* fixed bug

* update solov2

* update solov2

* fix build android bug

* update docs

* update docs

* update docs

* update

* update

* update arch and docs

* update

* update

* update solov2 python

---------

Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com>
Zheng-Bicheng
2023-03-08 10:01:32 +08:00
committed by GitHub
parent 96a3698271
commit 0687d3b0ad
21 changed files with 840 additions and 474 deletions

View File

@@ -0,0 +1,21 @@
English | [简体中文](README_CN.md)
# PaddleDetection Model Deployment
FastDeploy supports the SOLOv2 model from [PaddleDetection release/2.6](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6).
Run the following commands to export the SOLOv2 static graph model.
```bash
# install PaddleDetection
git clone https://github.com/PaddlePaddle/PaddleDetection.git
cd PaddleDetection
python tools/export_model.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml --output_dir=./inference_model \
-o weights=https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_1x_coco.pdparams
```
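To sanity-check the exported model, a minimal Python sketch like the following can be used (it assumes the FastDeploy Python package is installed; `model_dir` is a hypothetical local path matching the export step above):
```python
import cv2
import fastdeploy as fd

# Directory produced by the export step above (hypothetical local path)
model_dir = "inference_model/solov2_r50_fpn_1x_coco"
model = fd.vision.detection.SOLOv2(
    model_dir + "/model.pdmodel",
    model_dir + "/model.pdiparams",
    model_dir + "/infer_cfg.yml")

im = cv2.imread("000000014439.jpg")  # any test image
result = model.predict(im)
print(result)
```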
## Detailed Deployment Documents
- [Python Deployment](python)
- [C++ Deployment](cpp)

View File

@@ -0,0 +1,20 @@
[English](README.md) | 简体中文
# PaddleDetection Model Deployment
FastDeploy supports the SOLOv2 model from [PaddleDetection release/2.6](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6).
Run the following commands to export the SOLOv2 static graph model.
```bash
# install PaddleDetection
git clone https://github.com/PaddlePaddle/PaddleDetection.git
cd PaddleDetection
python tools/export_model.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml --output_dir=./inference_model \
-o weights=https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_1x_coco.pdparams
```
## Detailed Deployment Documents
- [Python Deployment](python)
- [C++ Deployment](cpp)

View File

@@ -0,0 +1,11 @@
PROJECT(infer_demo C CXX)
CMAKE_MINIMUM_REQUIRED (VERSION 3.10)
option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")
include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)
include_directories(${FASTDEPLOY_INCS})
add_executable(infer_solov2_demo ${PROJECT_SOURCE_DIR}/infer_solov2.cc)
target_link_libraries(infer_solov2_demo ${FASTDEPLOY_LIBS})

View File

@@ -0,0 +1,28 @@
English | [简体中文](README_CN.md)
# PaddleDetection C++ Deployment Example
This directory provides `infer_xxx.cc` examples that quickly finish the deployment of PaddleDetection models, including SOLOv2, on CPU/GPU, as well as GPU deployment accelerated by TensorRT.
Before deployment, confirm the following two steps
- 1. Software and hardware should meet the requirements. Please refer to [FastDeploy Environment Requirements](../../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)
- 2. Download the precompiled deployment library and samples code according to your development environment. Refer to [FastDeploy Precompiled Library](../../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)
Taking inference on Linux as an example, execute the following commands in this directory to complete the compilation test. FastDeploy version 0.7.0 or above (x.x.x>=0.7.0) is required to support this model.
```bash
mkdir build
cd build
# Download the FastDeploy precompiled library. Users can choose the appropriate version from the `FastDeploy Precompiled Library` mentioned above
wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-x.x.x.tgz
tar xvf fastdeploy-linux-x64-x.x.x.tgz
cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-x.x.x
make -j
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
# CPU inference
./infer_solov2_demo ./solov2_r50_fpn_1x_coco 000000014439.jpg 0
# GPU inference
./infer_solov2_demo ./solov2_r50_fpn_1x_coco 000000014439.jpg 1
```

View File

@@ -0,0 +1,29 @@
[English](README.md) | 简体中文
# PaddleDetection C++ Deployment Example
This directory provides `infer_xxx.cc` examples that quickly finish the deployment of PaddleDetection models, including SOLOv2, on CPU/GPU, as well as GPU deployment accelerated by TensorRT.
Before deployment, confirm the following two steps
- 1. Software and hardware should meet the requirements. Please refer to [FastDeploy Environment Requirements](../../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
- 2. Download the precompiled deployment library and samples code according to your development environment. Refer to [FastDeploy Precompiled Library](../../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
Taking inference on Linux as an example, execute the following commands in this directory to complete the compilation test. FastDeploy version 0.7.0 or above (x.x.x>=0.7.0) is required to support this model.
```bash
mkdir build
cd build
# Download the FastDeploy precompiled library. Users can choose the appropriate version from the `FastDeploy Precompiled Library` mentioned above
wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-x.x.x.tgz
tar xvf fastdeploy-linux-x64-x.x.x.tgz
cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-x.x.x
make -j
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
# CPU inference
./infer_solov2_demo ./solov2_r50_fpn_1x_coco 000000014439.jpg 0
# GPU inference
./infer_solov2_demo ./solov2_r50_fpn_1x_coco 000000014439.jpg 1
```

View File

@@ -0,0 +1,96 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "fastdeploy/vision.h"
#ifdef WIN32
const char sep = '\\';
#else
const char sep = '/';
#endif
void CpuInfer(const std::string& model_dir, const std::string& image_file) {
auto model_file = model_dir + sep + "model.pdmodel";
auto params_file = model_dir + sep + "model.pdiparams";
auto config_file = model_dir + sep + "infer_cfg.yml";
auto option = fastdeploy::RuntimeOption();
option.UseCpu();
auto model = fastdeploy::vision::detection::SOLOv2(model_file, params_file,
config_file, option);
if (!model.Initialized()) {
std::cerr << "Failed to initialize." << std::endl;
return;
}
auto im = cv::imread(image_file);
fastdeploy::vision::DetectionResult res;
if (!model.Predict(im, &res)) {
std::cerr << "Failed to predict." << std::endl;
return;
}
std::cout << res.Str() << std::endl;
auto vis_im = fastdeploy::vision::VisDetection(im, res, 0.5);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
}
void GpuInfer(const std::string& model_dir, const std::string& image_file) {
auto model_file = model_dir + sep + "model.pdmodel";
auto params_file = model_dir + sep + "model.pdiparams";
auto config_file = model_dir + sep + "infer_cfg.yml";
auto option = fastdeploy::RuntimeOption();
option.UseGpu();
auto model = fastdeploy::vision::detection::SOLOv2(model_file, params_file,
config_file, option);
if (!model.Initialized()) {
std::cerr << "Failed to initialize." << std::endl;
return;
}
auto im = cv::imread(image_file);
fastdeploy::vision::DetectionResult res;
if (!model.Predict(im, &res)) {
std::cerr << "Failed to predict." << std::endl;
return;
}
std::cout << res.Str() << std::endl;
auto vis_im = fastdeploy::vision::VisDetection(im, res, 0.5);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
}
int main(int argc, char* argv[]) {
if (argc < 4) {
std::cout
<< "Usage: infer_demo path/to/model_dir path/to/image run_option, "
"e.g ./infer_model ./ppyolo_dirname ./test.jpeg 0"
<< std::endl;
std::cout << "The data type of run_option is int, 0: run with cpu; 1: run "
"with gpu; 2: run with kunlunxin."
<< std::endl;
return -1;
}
if (std::atoi(argv[3]) == 0) {
CpuInfer(argv[1], argv[2]);
} else if (std::atoi(argv[3]) == 1) {
GpuInfer(argv[1], argv[2]);
}
return 0;
}

View File

@@ -0,0 +1,96 @@
English | [简体中文](README_CN.md)
# PaddleDetection Python Deployment Example
Before deployment, confirm the following two steps.
- 1. Software and hardware should meet the requirements. Please refer to [FastDeploy Environment Requirements](../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)
- 2. Install the FastDeploy Python whl package. Refer to [FastDeploy Python Installation](../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)
This directory provides `infer_xxx.py` examples that quickly finish the deployment of PPYOLOE/PicoDet and other models on CPU/GPU, as well as GPU deployment accelerated by TensorRT. Run the following script to complete the deployment
```bash
# Download deployment example code
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/vision/detection/paddledetection/python/
# Download the PPYOLOE model file and test images
wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco.tgz
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
tar xvf ppyoloe_crn_l_300e_coco.tgz
# CPU inference
python infer_ppyoloe.py --model_dir ppyoloe_crn_l_300e_coco --image 000000014439.jpg --device cpu
# GPU inference
python infer_ppyoloe.py --model_dir ppyoloe_crn_l_300e_coco --image 000000014439.jpg --device gpu
# TensorRT inference on GPU (Note: running TensorRT inference for the first time serializes the model, which takes a while; please be patient)
python infer_ppyoloe.py --model_dir ppyoloe_crn_l_300e_coco --image 000000014439.jpg --device gpu --use_trt True
# Kunlunxin XPU Inference
python infer_ppyoloe.py --model_dir ppyoloe_crn_l_300e_coco --image 000000014439.jpg --device kunlunxin
# Huawei Ascend Inference
python infer_ppyoloe.py --model_dir ppyoloe_crn_l_300e_coco --image 000000014439.jpg --device ascend
```
The visualized result after running is as follows
<div align="center">
  <img src="https://user-images.githubusercontent.com/19339784/184326520-7075e907-10ed-4fad-93f8-52d0e35d4964.jpg" width="480" height="320" />
</div>
## PaddleDetection Python Interface
```python
fastdeploy.vision.detection.PPYOLOE(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PicoDet(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PaddleYOLOX(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.YOLOv3(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PPYOLO(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.FasterRCNN(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.MaskRCNN(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.SSD(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PaddleYOLOv5(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PaddleYOLOv6(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PaddleYOLOv7(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.RTMDet(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.CascadeRCNN(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PSSDet(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.RetinaNet(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PPYOLOESOD(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.FCOS(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.TTFNet(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.TOOD(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.GFL(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.SOLOv2(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
```
PaddleDetection model loading and initialization, where model_file and params_file are the exported Paddle deployment model files, and config_file is the deployment configuration YAML file exported by PaddleDetection at the same time
**Parameters**
> * **model_file**(str): Model file path
> * **params_file**(str): Parameter file path
> * **config_file**(str): Inference configuration yaml file path
> * **runtime_option**(RuntimeOption): Backend inference configuration. None by default. (use the default configuration)
> * **model_format**(ModelFormat): Model format. Paddle format by default
### predict Function
All PaddleDetection models, including PPYOLOE/PicoDet/PaddleYOLOX/YOLOv3/PPYOLO/FasterRCNN, provide the following member function for image detection
> ```python
> PPYOLOE.predict(image_data, conf_threshold=0.25, nms_iou_threshold=0.5)
> ```
>
> Model prediction interface. Input an image and output the detection result directly.
>
> **Parameter**
>
> > * **image_data**(np.ndarray): Input data in HWC layout with BGR format
> **Return**
>
> > Return `fastdeploy.vision.DetectionResult` structure. Refer to [Vision Model Prediction Results](../../../../../docs/api/vision_results/) for the description of the structure.
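A rough end-to-end sketch of the interface (using the PPYOLOE model and test image downloaded above; the `DetectionResult` fields follow the linked description):
```python
import cv2
import fastdeploy as fd

model = fd.vision.detection.PPYOLOE(
    "ppyoloe_crn_l_300e_coco/model.pdmodel",
    "ppyoloe_crn_l_300e_coco/model.pdiparams",
    "ppyoloe_crn_l_300e_coco/infer_cfg.yml")
im = cv2.imread("000000014439.jpg")
result = model.predict(im)
# DetectionResult carries parallel lists of boxes, scores and label ids
for box, score, label in zip(result.boxes, result.scores, result.label_ids):
    if score >= 0.5:
        print(label, score, box)  # box is [xmin, ymin, xmax, ymax]
```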
## Other Documents
- [PaddleDetection Model Description](..)
- [PaddleDetection C++ Deployment](../cpp)
- [Model Prediction Results](../../../../../docs/api/vision_results/)
- [How to switch the model inference backend engine](../../../../../docs/cn/faq/how_to_change_backend.md)

View File

@@ -0,0 +1,96 @@
[English](README.md) | 简体中文
# PaddleDetection Python Deployment Example
Before deployment, confirm the following two steps.
- 1. Software and hardware should meet the requirements. Please refer to [FastDeploy Environment Requirements](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
- 2. Install the FastDeploy Python whl package. Refer to [FastDeploy Python Installation](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
This directory provides `infer_xxx.py` examples that quickly finish the deployment of PPYOLOE/PicoDet and other models on CPU/GPU, as well as GPU deployment accelerated by TensorRT. Run the following script to complete the deployment
```bash
# Download deployment example code
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/vision/detection/paddledetection/python/
# Download the PPYOLOE model file and test images
wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco.tgz
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
tar xvf ppyoloe_crn_l_300e_coco.tgz
# CPU inference
python infer_ppyoloe.py --model_dir ppyoloe_crn_l_300e_coco --image 000000014439.jpg --device cpu
# GPU inference
python infer_ppyoloe.py --model_dir ppyoloe_crn_l_300e_coco --image 000000014439.jpg --device gpu
# TensorRT inference on GPU (Note: running TensorRT inference for the first time serializes the model, which takes a while; please be patient)
python infer_ppyoloe.py --model_dir ppyoloe_crn_l_300e_coco --image 000000014439.jpg --device gpu --use_trt True
# Kunlunxin XPU inference
python infer_ppyoloe.py --model_dir ppyoloe_crn_l_300e_coco --image 000000014439.jpg --device kunlunxin
# Huawei Ascend inference
python infer_ppyoloe.py --model_dir ppyoloe_crn_l_300e_coco --image 000000014439.jpg --device ascend
```
The visualized result after running is as follows
<div align="center">
  <img src="https://user-images.githubusercontent.com/19339784/184326520-7075e907-10ed-4fad-93f8-52d0e35d4964.jpg" width="480" height="320" />
</div>
## PaddleDetection Python Interface
```python
fastdeploy.vision.detection.PPYOLOE(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PicoDet(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PaddleYOLOX(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.YOLOv3(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PPYOLO(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.FasterRCNN(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.MaskRCNN(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.SSD(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PaddleYOLOv5(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PaddleYOLOv6(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PaddleYOLOv7(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.RTMDet(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.CascadeRCNN(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PSSDet(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.RetinaNet(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.PPYOLOESOD(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.FCOS(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.TTFNet(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.TOOD(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.GFL(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
fastdeploy.vision.detection.SOLOv2(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
```
PaddleDetection model loading and initialization, where model_file and params_file are the exported Paddle deployment model files, and config_file is the deployment configuration YAML file exported by PaddleDetection at the same time
**Parameters**
> * **model_file**(str): Model file path
> * **params_file**(str): Parameter file path
> * **config_file**(str): Inference configuration YAML file path
> * **runtime_option**(RuntimeOption): Backend inference configuration. None by default, i.e. the default configuration is used
> * **model_format**(ModelFormat): Model format. Paddle format by default
### predict Function
All PaddleDetection models, including PPYOLOE/PicoDet/PaddleYOLOX/YOLOv3/PPYOLO/FasterRCNN, provide the following member function for image detection
> ```python
> PPYOLOE.predict(image_data, conf_threshold=0.25, nms_iou_threshold=0.5)
> ```
>
> Model prediction interface. Input an image and output the detection result directly.
>
> **Parameters**
>
> > * **image_data**(np.ndarray): Input data in HWC layout with BGR format
> **Return**
>
> > Return a `fastdeploy.vision.DetectionResult` structure. Refer to [Vision Model Prediction Results](../../../../../docs/api/vision_results/) for its description
## Other Documents
- [PaddleDetection Model Description](..)
- [PaddleDetection C++ Deployment](../cpp)
- [Model Prediction Results](../../../../../docs/api/vision_results/)
- [How to switch the model inference backend engine](../../../../../docs/cn/faq/how_to_change_backend.md)

View File

@@ -0,0 +1,68 @@
import fastdeploy as fd
import cv2
import os
def parse_arguments():
import argparse
import ast
parser = argparse.ArgumentParser()
parser.add_argument(
"--model_dir",
default=None,
help="Path of PaddleDetection model directory")
parser.add_argument(
"--image", default=None, help="Path of test image file.")
parser.add_argument(
"--device",
type=str,
default='cpu',
help="Type of inference device, support 'kunlunxin', 'cpu' or 'gpu'.")
parser.add_argument(
"--use_trt",
type=ast.literal_eval,
default=False,
help="Wether to use tensorrt.")
return parser.parse_args()
def build_option(args):
option = fd.RuntimeOption()
if args.device.lower() == "gpu":
option.use_gpu()
if args.use_trt:
option.use_trt_backend()
return option
args = parse_arguments()
if args.model_dir is None:
model_dir = fd.download_model(name='picodet_l_320_coco_lcnet')
else:
model_dir = args.model_dir
model_file = os.path.join(model_dir, "model.pdmodel")
params_file = os.path.join(model_dir, "model.pdiparams")
config_file = os.path.join(model_dir, "infer_cfg.yml")
# Configure runtime and load the model
runtime_option = build_option(args)
model = fd.vision.detection.SOLOv2(
model_file, params_file, config_file, runtime_option=runtime_option)
# Predict the detection result of an image
if args.image is None:
image = fd.utils.get_detection_test_image()
else:
image = args.image
im = cv2.imread(image)
result = model.predict(im)
print(result)
# Visualize the prediction result
vis_im = fd.vision.vis_detection(im, result, score_threshold=0.5)
cv2.imwrite("visualized_result.jpg", vis_im)
print("Visualized result save in ./visualized_result.jpg")

View File

@@ -115,6 +115,12 @@ void PaddleTensorToFDTensor(std::unique_ptr<paddle_infer::Tensor>& tensor,
   } else if (fd_tensor->dtype == FDDataType::INT64) {
     tensor->CopyToCpu(static_cast<int64_t*>(fd_tensor->MutableData()));
     return;
+  } else if (fd_tensor->dtype == FDDataType::INT8) {
+    tensor->CopyToCpu(static_cast<int8_t*>(fd_tensor->MutableData()));
+    return;
+  } else if (fd_tensor->dtype == FDDataType::UINT8) {
+    tensor->CopyToCpu(static_cast<uint8_t*>(fd_tensor->MutableData()));
+    return;
   }
   FDASSERT(false, "Unexpected data type(%s) while infer with PaddleBackend.",
            Str(fd_tensor->dtype).c_str());

View File

@@ -13,7 +13,7 @@ PPDetBase::PPDetBase(const std::string& model_file,
                        const std::string& config_file,
                        const RuntimeOption& custom_option,
                        const ModelFormat& model_format)
-    : preprocessor_(config_file), postprocessor_(config_file) {
+    : preprocessor_(config_file), postprocessor_(preprocessor_.GetArch()) {
   runtime_option = custom_option;
   runtime_option.model_format = model_format;
   runtime_option.model_file = model_file;

View File

@@ -49,6 +49,30 @@ class FASTDEPLOY_DECL PicoDet : public PPDetBase {
virtual std::string ModelName() const { return "PicoDet"; } virtual std::string ModelName() const { return "PicoDet"; }
}; };
class FASTDEPLOY_DECL SOLOv2 : public PPDetBase {
public:
/** \brief Set path of model file and configuration file, and the configuration of runtime
*
* \param[in] model_file Path of model file, e.g picodet/model.pdmodel
* \param[in] params_file Path of parameter file, e.g picodet/model.pdiparams, if the model format is ONNX, this parameter will be ignored
* \param[in] config_file Path of configuration file for deployment, e.g picodet/infer_cfg.yml
* \param[in] custom_option RuntimeOption for inference, the default will use cpu, and choose the backend defined in `valid_cpu_backends`
* \param[in] model_format Model format of the loaded model, default is Paddle format
*/
SOLOv2(const std::string& model_file, const std::string& params_file,
const std::string& config_file,
const RuntimeOption& custom_option = RuntimeOption(),
const ModelFormat& model_format = ModelFormat::PADDLE)
: PPDetBase(model_file, params_file, config_file, custom_option,
model_format) {
valid_cpu_backends = { Backend::PDINFER};
valid_gpu_backends = {Backend::PDINFER, Backend::TRT};
initialized = Initialize();
}
virtual std::string ModelName() const { return "SOLOv2"; }
};
class FASTDEPLOY_DECL PPYOLOE : public PPDetBase { class FASTDEPLOY_DECL PPYOLOE : public PPDetBase {
public: public:
/** \brief Set path of model file and configuration file, and the configuration of runtime /** \brief Set path of model file and configuration file, and the configuration of runtime

View File

@@ -15,6 +15,7 @@
#include "fastdeploy/vision/detection/ppdet/postprocessor.h" #include "fastdeploy/vision/detection/ppdet/postprocessor.h"
#include "fastdeploy/vision/utils/utils.h" #include "fastdeploy/vision/utils/utils.h"
#include "yaml-cpp/yaml.h"
namespace fastdeploy { namespace fastdeploy {
namespace vision { namespace vision {
@@ -23,12 +24,6 @@ namespace detection {
bool PaddleDetPostprocessor::ProcessMask( bool PaddleDetPostprocessor::ProcessMask(
const FDTensor& tensor, std::vector<DetectionResult>* results) { const FDTensor& tensor, std::vector<DetectionResult>* results) {
auto shape = tensor.Shape(); auto shape = tensor.Shape();
if (tensor.Dtype() != FDDataType::INT32) {
FDERROR << "The data type of out mask tensor should be INT32, but now it's "
<< tensor.Dtype() << std::endl;
return false;
}
int64_t out_mask_h = shape[1];
int64_t out_mask_w = shape[2]; int64_t out_mask_w = shape[2];
int64_t out_mask_numel = shape[1] * shape[2]; int64_t out_mask_numel = shape[1] * shape[2];
const auto* data = reinterpret_cast<const uint8_t*>(tensor.CpuData()); const auto* data = reinterpret_cast<const uint8_t*>(tensor.CpuData());
@@ -63,12 +58,9 @@ bool PaddleDetPostprocessor::ProcessMask(
   return true;
 }
-bool PaddleDetPostprocessor::Run(const std::vector<FDTensor>& tensors,
-                                 std::vector<DetectionResult>* results) {
-  if (DecodeAndNMSApplied()) {
-    return ProcessUnDecodeResults(tensors, results);
-  }
+bool PaddleDetPostprocessor::ProcessWithNMS(
+    const std::vector<FDTensor>& tensors,
+    std::vector<DetectionResult>* results) {
   // Get number of boxes for each input image
   std::vector<int> num_boxes(tensors[1].shape[0]);
   int total_num_boxes = 0;
@@ -127,31 +119,53 @@ bool PaddleDetPostprocessor::Run(const std::vector<FDTensor>& tensors,
       offset += static_cast<int>(num_boxes[i] * 6);
     }
   }
-  // Only detection
-  if (tensors.size() <= 2) {
-    return true;
-  }
-  if (tensors[2].Shape()[0] != num_output_boxes) {
-    FDERROR << "The first dimension of output mask tensor:"
-            << tensors[2].Shape()[0]
-            << " is not equal to the first dimension of output boxes tensor:"
-            << num_output_boxes << "." << std::endl;
+  return true;
+}
+bool PaddleDetPostprocessor::ProcessWithoutNMS(
+    const std::vector<FDTensor>& tensors,
+    std::vector<DetectionResult>* results) {
+  int boxes_index = 0;
+  int scores_index = 1;
+  // Judge the index of the input Tensor
+  if (tensors[0].shape[1] == tensors[1].shape[2]) {
+    boxes_index = 0;
+    scores_index = 1;
+  } else if (tensors[0].shape[2] == tensors[1].shape[1]) {
+    boxes_index = 1;
+    scores_index = 0;
+  } else {
+    FDERROR << "The shape of boxes and scores should be [batch, boxes_num, "
+               "4], [batch, classes_num, boxes_num]"
+            << std::endl;
     return false;
   }
-  // process for maskrcnn
-  return ProcessMask(tensors[2], results);
-}
-bool PaddleDetPostprocessor::ProcessUnDecodeResults(
-    const std::vector<FDTensor>& tensors,
-    std::vector<DetectionResult>* results) {
-  results->resize(tensors[0].Shape()[0]);
-  // do decode and nms
-  ppdet_decoder_.DecodeAndNMS(tensors, results);
+  // do multi class nms
+  multi_class_nms_.Compute(
+      static_cast<const float*>(tensors[boxes_index].Data()),
+      static_cast<const float*>(tensors[scores_index].Data()),
+      tensors[boxes_index].shape, tensors[scores_index].shape);
+  auto num_boxes = multi_class_nms_.out_num_rois_data;
+  auto box_data =
+      static_cast<const float*>(multi_class_nms_.out_box_data.data());
+  // Get boxes for each input image
+  results->resize(num_boxes.size());
+  int offset = 0;
+  for (size_t i = 0; i < num_boxes.size(); ++i) {
+    const float* ptr = box_data + offset;
+    (*results)[i].Reserve(num_boxes[i]);
+    for (size_t j = 0; j < num_boxes[i]; ++j) {
+      (*results)[i].label_ids.push_back(
+          static_cast<int32_t>(round(ptr[j * 6])));
+      (*results)[i].scores.push_back(ptr[j * 6 + 1]);
+      (*results)[i].boxes.emplace_back(std::array<float, 4>(
+          {ptr[j * 6 + 2], ptr[j * 6 + 3], ptr[j * 6 + 4], ptr[j * 6 + 5]}));
+    }
+    offset += (num_boxes[i] * 6);
+  }
   // do scale
   if (GetScaleFactor()[0] != 0) {
@@ -166,6 +180,127 @@ bool PaddleDetPostprocessor::ProcessUnDecodeResults(
   }
   return true;
 }
+bool PaddleDetPostprocessor::ProcessSolov2(
+    const std::vector<FDTensor>& tensors,
+    std::vector<DetectionResult>* results) {
+  if (tensors.size() != 4) {
+    FDERROR << "The size of tensors for solov2 must be 4." << std::endl;
+    return false;
+  }
+  if (tensors[0].shape[0] != 1) {
+    FDERROR << "SOLOv2 temporarily only supports batch size is 1." << std::endl;
+    return false;
+  }
+  results->clear();
+  results->resize(1);
+  (*results)[0].contain_masks = true;
+  // tensor[0] means bbox data
+  const auto bbox_data = static_cast<const int*>(tensors[0].CpuData());
+  // tensor[1] means label data
+  const auto label_data_ = static_cast<const int64_t*>(tensors[1].CpuData());
+  // tensor[2] means score data
+  const auto score_data_ = static_cast<const float*>(tensors[2].CpuData());
+  // tensor[3] is mask data and its shape is the same as that of the image.
+  const auto mask_data_ = static_cast<const uint8_t*>(tensors[3].CpuData());
+  int rows = static_cast<int>(tensors[3].shape[1]);
+  int cols = static_cast<int>(tensors[3].shape[2]);
+  for (int bbox_id = 0; bbox_id < bbox_data[0]; ++bbox_id) {
+    if (score_data_[bbox_id] >= multi_class_nms_.score_threshold) {
+      DetectionResult& result_item = (*results)[0];
+      result_item.label_ids.emplace_back(label_data_[bbox_id]);
+      result_item.scores.emplace_back(score_data_[bbox_id]);
+      std::vector<int> global_mask;
+      for (int k = 0; k < rows * cols; ++k) {
+        global_mask.push_back(
+            static_cast<int>(mask_data_[k + bbox_id * rows * cols]));
+      }
+      // find minimize bounding box from mask
+      cv::Mat mask(rows, cols, CV_32SC1);
+      std::memcpy(mask.data, global_mask.data(),
+                  global_mask.size() * sizeof(int));
+      cv::Mat mask_fp;
+      mask.convertTo(mask_fp, CV_32FC1);
+      cv::Mat rowSum;
+      cv::Mat colSum;
+      std::vector<float> sum_of_row(rows);
+      std::vector<float> sum_of_col(cols);
+      cv::reduce(mask_fp, colSum, 0, cv::REDUCE_SUM, CV_32FC1);
+      cv::reduce(mask_fp, rowSum, 1, cv::REDUCE_SUM, CV_32FC1);
+      for (int row_id = 0; row_id < rows; ++row_id) {
+        sum_of_row[row_id] = rowSum.at<float>(row_id, 0);
+      }
+      for (int col_id = 0; col_id < cols; ++col_id) {
+        sum_of_col[col_id] = colSum.at<float>(0, col_id);
+      }
+      auto it = std::find_if(sum_of_row.begin(), sum_of_row.end(),
+                             [](int x) { return x > 0.5; });
+      float y1 = std::distance(sum_of_row.begin(), it);
+      auto it2 = std::find_if(sum_of_col.begin(), sum_of_col.end(),
+                              [](int x) { return x > 0.5; });
+      float x1 = std::distance(sum_of_col.begin(), it2);
+      auto rit = std::find_if(sum_of_row.rbegin(), sum_of_row.rend(),
+                              [](int x) { return x > 0.5; });
+      float y2 = std::distance(rit, sum_of_row.rend());
+      auto rit2 = std::find_if(sum_of_col.rbegin(), sum_of_col.rend(),
+                               [](int x) { return x > 0.5; });
+      float x2 = std::distance(rit2, sum_of_col.rend());
+      result_item.boxes.emplace_back(std::array<float, 4>({x1, y1, x2, y2}));
+    }
+  }
+  return true;
+}
+bool PaddleDetPostprocessor::Run(const std::vector<FDTensor>& tensors,
+                                 std::vector<DetectionResult>* results) {
+  if (arch_ == "SOLOv2") {
+    // process for SOLOv2
+    ProcessSolov2(tensors, results);
+    // The fourth output of solov2 is mask
+    return ProcessMask(tensors[3], results);
+  } else {
+    // Do process according to whether NMS exists.
+    if (with_nms_) {
+      if (!ProcessWithNMS(tensors, results)) {
+        return false;
+      }
+    } else {
+      if (!ProcessWithoutNMS(tensors, results)) {
+        return false;
+      }
+    }
+    // for only detection
+    if (tensors.size() <= 2) {
+      return true;
+    }
+    // for maskrcnn
+    if (tensors[2].Shape()[0] != tensors[0].Shape()[0]) {
+      FDERROR << "The first dimension of output mask tensor:"
+              << tensors[2].Shape()[0]
+              << " is not equal to the first dimension of output boxes tensor:"
+              << tensors[0].Shape()[0] << "." << std::endl;
+      return false;
+    }
+    // The third output of mask-rcnn is mask
+    return ProcessMask(tensors[2], results);
+  }
+}
 }  // namespace detection
 }  // namespace vision
 }  // namespace fastdeploy
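For reference, `ProcessSolov2` above recovers each instance's box from its mask by summing the mask along rows and columns and locating the first and last non-empty row and column. A minimal NumPy sketch of the same idea (a standalone illustration, not FastDeploy API):
```python
import numpy as np

def bbox_from_mask(mask: np.ndarray):
    """Smallest [x1, y1, x2, y2] box covering the non-zero region of a binary HxW mask."""
    rows = np.where(mask.sum(axis=1) > 0)[0]  # non-empty rows
    cols = np.where(mask.sum(axis=0) > 0)[0]  # non-empty columns
    if rows.size == 0 or cols.size == 0:
        return None  # empty mask
    # the +1 mirrors the reverse-iterator distance in the C++ code above
    return [float(cols[0]), float(rows[0]), float(cols[-1] + 1), float(rows[-1] + 1)]

print(bbox_from_mask(np.array([[0, 1, 0], [1, 1, 0], [0, 0, 0]])))  # [0.0, 0.0, 2.0, 2.0]
```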

View File

@@ -16,7 +16,6 @@
#include "fastdeploy/vision/common/processors/transform.h" #include "fastdeploy/vision/common/processors/transform.h"
#include "fastdeploy/vision/common/result.h" #include "fastdeploy/vision/common/result.h"
#include "fastdeploy/vision/detection/ppdet/multiclass_nms.h" #include "fastdeploy/vision/detection/ppdet/multiclass_nms.h"
#include "fastdeploy/vision/detection/ppdet/ppdet_decode.h"
namespace fastdeploy { namespace fastdeploy {
namespace vision { namespace vision {
@@ -25,14 +24,23 @@ namespace detection {
  */
 class FASTDEPLOY_DECL PaddleDetPostprocessor {
  public:
-  PaddleDetPostprocessor() = default;
+  PaddleDetPostprocessor() {
+    // There may be no NMS config in the yaml file,
+    // so we need to give a initial value to multi_class_nms_.
+    multi_class_nms_.SetNMSOption(NMSOption());
+  }
   /** \brief Create a preprocessor instance for PaddleDet serials model
    *
    * \param[in] config_file Path of configuration file for deployment, e.g ppyoloe/infer_cfg.yml
    */
-  explicit PaddleDetPostprocessor(const std::string& config_file)
-      : ppdet_decoder_(config_file) {}
+  explicit PaddleDetPostprocessor(const std::string& arch) {
+    // Used to differentiate models
+    arch_ = arch;
+    // There may be no NMS config in the yaml file,
+    // so we need to give a initial value to multi_class_nms_.
+    multi_class_nms_.SetNMSOption(NMSOption());
+  }
   /** \brief Process the result of runtime and fill to ClassifyResult structure
    *
@@ -45,26 +53,44 @@ class FASTDEPLOY_DECL PaddleDetPostprocessor {
   /// Apply box decoding and nms step for the outputs for the model.This is
   /// only available for those model exported without box decoding and nms.
-  void ApplyDecodeAndNMS(const NMSOption& option = NMSOption()) {
-    apply_decode_and_nms_ = true;
-    ppdet_decoder_.SetNMSOption(option);
+  void ApplyNMS() { with_nms_ = false; }
+  /// If you do not want to modify the Yaml configuration file,
+  /// you can use this function to set NMS parameters.
+  void SetNMSOption(const NMSOption& option) {
+    multi_class_nms_.SetNMSOption(option);
   }
   // Set scale_factor_ value.This is only available for those model exported
-  // without box decoding and nms.
+  // without nms.
   void SetScaleFactor(const std::vector<float>& scale_factor_value) {
     scale_factor_ = scale_factor_value;
   }
  private:
-  // for model without decode and nms.
-  bool apply_decode_and_nms_ = false;
-  bool DecodeAndNMSApplied() const { return apply_decode_and_nms_; }
-  bool ProcessUnDecodeResults(const std::vector<FDTensor>& tensors,
-                              std::vector<DetectionResult>* results);
-  PPDetDecode ppdet_decoder_;
   std::vector<float> scale_factor_{0.0, 0.0};
   std::vector<float> GetScaleFactor() { return scale_factor_; }
+  // for model without nms.
+  bool with_nms_ = true;
+  // Used to differentiate models
+  std::string arch_;
+  PaddleMultiClassNMS multi_class_nms_{};
+  // Process for General tensor without nms.
+  bool ProcessWithoutNMS(const std::vector<FDTensor>& tensors,
+                         std::vector<DetectionResult>* results);
+  // Process for General tensor with nms.
+  bool ProcessWithNMS(const std::vector<FDTensor>& tensors,
+                      std::vector<DetectionResult>* results);
+  // Process SOLOv2
+  bool ProcessSolov2(const std::vector<FDTensor>& tensors,
+                     std::vector<DetectionResult>* results);
   // Process mask tensor for MaskRCNN
   bool ProcessMask(const FDTensor& tensor,
                    std::vector<DetectionResult>* results);

View File

@@ -1,296 +0,0 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "ppdet_decode.h"
#include "fastdeploy/vision/utils/utils.h"
#include "yaml-cpp/yaml.h"
namespace fastdeploy {
namespace vision {
namespace detection {
PPDetDecode::PPDetDecode(const std::string& config_file) {
config_file_ = config_file;
ReadPostprocessConfigFromYaml();
}
/***************************************************************
* @name ReadPostprocessConfigFromYaml
* @brief Read decode config from yaml.
* @note read arch
* read fpn_stride
* read nms_threshold on NMS
* read score_threshold on NMS
* read target_size
***************************************************************/
bool PPDetDecode::ReadPostprocessConfigFromYaml() {
YAML::Node config;
try {
config = YAML::LoadFile(config_file_);
} catch (YAML::BadFile& e) {
FDERROR << "Failed to load yaml file " << config_file_
<< ", maybe you should check this file." << std::endl;
return false;
}
if (config["arch"].IsDefined()) {
arch_ = config["arch"].as<std::string>();
} else {
FDERROR << "Please set model arch,"
<< "support value : YOLO, SSD, RetinaNet, RCNN, Face." << std::endl;
return false;
}
if (config["fpn_stride"].IsDefined()) {
fpn_stride_ = config["fpn_stride"].as<std::vector<int>>();
}
if (config["NMS"].IsDefined()) {
for (const auto& op : config["NMS"]) {
if (config["background_label"].IsDefined()) {
multi_class_nms_.background_label =
op["background_label"].as<int64_t>();
}
if (config["keep_top_k"].IsDefined()) {
multi_class_nms_.keep_top_k = op["keep_top_k"].as<int64_t>();
}
if (config["nms_eta"].IsDefined()) {
multi_class_nms_.nms_eta = op["nms_eta"].as<float>();
}
if (config["nms_threshold"].IsDefined()) {
multi_class_nms_.nms_threshold = op["nms_threshold"].as<float>();
}
if (config["nms_top_k"].IsDefined()) {
multi_class_nms_.nms_top_k = op["nms_top_k"].as<int64_t>();
}
if (config["normalized"].IsDefined()) {
multi_class_nms_.normalized = op["normalized"].as<bool>();
}
if (config["score_threshold"].IsDefined()) {
multi_class_nms_.score_threshold = op["score_threshold"].as<float>();
}
}
}
if (config["Preprocess"].IsDefined()) {
for (const auto& op : config["Preprocess"]) {
std::string op_name = op["type"].as<std::string>();
if (op_name == "Resize") {
im_shape_ = op["target_size"].as<std::vector<float>>();
}
}
}
return true;
}
/***************************************************************
* @name DecodeAndNMS
* @brief Read batch and call different decode functions.
* @param tensors: model output tensor
* results: detection results
* @note Only support arch is Picodet.
***************************************************************/
bool PPDetDecode::DecodeAndNMS(const std::vector<FDTensor>& tensors,
std::vector<DetectionResult>* results) {
if (tensors.size() == 2) {
int boxes_index = 0;
int scores_index = 1;
if (tensors[0].shape[1] == tensors[1].shape[2]) {
boxes_index = 0;
scores_index = 1;
} else if (tensors[0].shape[2] == tensors[1].shape[1]) {
boxes_index = 1;
scores_index = 0;
} else {
FDERROR << "The shape of boxes and scores should be [batch, boxes_num, "
"4], [batch, classes_num, boxes_num]"
<< std::endl;
return false;
}
multi_class_nms_.Compute(
static_cast<const float*>(tensors[boxes_index].Data()),
static_cast<const float*>(tensors[scores_index].Data()),
tensors[boxes_index].shape, tensors[scores_index].shape);
auto num_boxes = multi_class_nms_.out_num_rois_data;
auto box_data =
static_cast<const float*>(multi_class_nms_.out_box_data.data());
// Get boxes for each input image
results->resize(num_boxes.size());
int offset = 0;
for (size_t i = 0; i < num_boxes.size(); ++i) {
const float* ptr = box_data + offset;
(*results)[i].Reserve(num_boxes[i]);
for (size_t j = 0; j < num_boxes[i]; ++j) {
(*results)[i].label_ids.push_back(
static_cast<int32_t>(round(ptr[j * 6])));
(*results)[i].scores.push_back(ptr[j * 6 + 1]);
(*results)[i].boxes.emplace_back(std::array<float, 4>(
{ptr[j * 6 + 2], ptr[j * 6 + 3], ptr[j * 6 + 4], ptr[j * 6 + 5]}));
}
offset += (num_boxes[i] * 6);
}
return true;
} else {
FDASSERT(tensors.size() == fpn_stride_.size() * 2,
"The size of output must be fpn_stride * 2.")
batchs_ = static_cast<int>(tensors[0].shape[0]);
if (arch_ == "PicoDet") {
int num_class, reg_max;
for (int i = 0; i < tensors.size(); i++) {
if (i == 0) {
num_class = static_cast<int>(tensors[i].Shape()[2]);
}
if (i == fpn_stride_.size()) {
reg_max = static_cast<int>(tensors[i].Shape()[2] / 4);
}
}
for (int i = 0; i < results->size(); ++i) {
PicoDetPostProcess(tensors, results, reg_max, num_class);
}
} else {
FDERROR << "ProcessUnDecodeResults only supported when arch is PicoDet."
<< std::endl;
return false;
}
return true;
}
}
/***************************************************************
* @name PicoDetPostProcess
* @brief Do decode and NMS for Picodet.
* @param outs: model output tensor
* results: detection results
* @note Only support PPYOLOE and Picodet.
***************************************************************/
bool PPDetDecode::PicoDetPostProcess(const std::vector<FDTensor>& outs,
std::vector<DetectionResult>* results,
int reg_max, int num_class) {
for (int batch = 0; batch < batchs_; ++batch) {
auto& result = (*results)[batch];
result.Clear();
for (int i = batch * batchs_ * fpn_stride_.size();
i < fpn_stride_.size() * (batch + 1); ++i) {
int feature_h =
std::ceil(im_shape_[0] / static_cast<float>(fpn_stride_[i]));
int feature_w =
std::ceil(im_shape_[1] / static_cast<float>(fpn_stride_[i]));
for (int idx = 0; idx < feature_h * feature_w; idx++) {
const auto* scores =
static_cast<const float*>(outs[i].Data()) + (idx * num_class);
int row = idx / feature_w;
int col = idx % feature_w;
float score = 0;
int cur_label = 0;
for (int label = 0; label < num_class; label++) {
if (scores[label] > score) {
score = scores[label];
cur_label = label;
}
}
if (score > multi_class_nms_.score_threshold) {
const auto* bbox_pred =
static_cast<const float*>(outs[i + fpn_stride_.size()].Data()) +
(idx * 4 * (reg_max));
DisPred2Bbox(bbox_pred, cur_label, score, col, row, fpn_stride_[i],
&result, reg_max, num_class);
}
}
}
fastdeploy::vision::utils::NMS(&result, multi_class_nms_.nms_threshold);
}
return results;
}
/***************************************************************
* @name FastExp
* @brief Do exp op
* @param x: input data
* @return float
***************************************************************/
float FastExp(float x) {
union {
uint32_t i;
float f;
} v{};
v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f);
return v.f;
}
/***************************************************************
* @name ActivationFunctionSoftmax
* @brief Do Softmax with reg_max.
* @param src: input data
* dst: output data
* @return float
***************************************************************/
int PPDetDecode::ActivationFunctionSoftmax(const float* src, float* dst,
int reg_max) {
const float alpha = *std::max_element(src, src + reg_max);
float denominator{0};
for (int i = 0; i < reg_max; ++i) {
dst[i] = FastExp(src[i] - alpha);
denominator += dst[i];
}
for (int i = 0; i < reg_max; ++i) {
dst[i] /= denominator;
}
return 0;
}
/***************************************************************
* @name DisPred2Bbox
* @brief Do Decode.
* @param dfl_det: detection data
* label: label id
* score: confidence
* x: col
* y: row
* stride: stride
* results: detection results
***************************************************************/
void PPDetDecode::DisPred2Bbox(const float*& dfl_det, int label, float score,
int x, int y, int stride,
fastdeploy::vision::DetectionResult* results,
int reg_max, int num_class) {
float ct_x = static_cast<float>(x + 0.5) * static_cast<float>(stride);
float ct_y = static_cast<float>(y + 0.5) * static_cast<float>(stride);
std::vector<float> dis_pred{0, 0, 0, 0};
for (int i = 0; i < 4; i++) {
float dis = 0;
auto* dis_after_sm = new float[reg_max];
ActivationFunctionSoftmax(dfl_det + i * (reg_max), dis_after_sm, reg_max);
for (int j = 0; j < reg_max; j++) {
dis += static_cast<float>(j) * dis_after_sm[j];
}
dis *= static_cast<float>(stride);
dis_pred[i] = dis;
delete[] dis_after_sm;
}
float xmin = (float)(std::max)(ct_x - dis_pred[0], .0f);
float ymin = (float)(std::max)(ct_y - dis_pred[1], .0f);
float xmax = (float)(std::min)(ct_x + dis_pred[2], (float)im_shape_[0]);
float ymax = (float)(std::min)(ct_y + dis_pred[3], (float)im_shape_[1]);
results->boxes.emplace_back(std::array<float, 4>{xmin, ymin, xmax, ymax});
results->label_ids.emplace_back(label);
results->scores.emplace_back(score);
}
} // namespace detection
} // namespace vision
} // namespace fastdeploy

View File

@@ -1,50 +0,0 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "fastdeploy/vision/common/processors/transform.h"
#include "fastdeploy/vision/common/result.h"
#include "fastdeploy/vision/detection/ppdet/multiclass_nms.h"
namespace fastdeploy {
namespace vision {
namespace detection {
class FASTDEPLOY_DECL PPDetDecode {
public:
PPDetDecode() = default;
explicit PPDetDecode(const std::string& config_file);
bool DecodeAndNMS(const std::vector<FDTensor>& tensors,
std::vector<DetectionResult>* results);
void SetNMSOption(const NMSOption& option = NMSOption()) {
multi_class_nms_.SetNMSOption(option);
}
private:
std::string config_file_;
std::string arch_;
std::vector<int> fpn_stride_{8, 16, 32, 64};
std::vector<float> im_shape_{416, 416};
int batchs_ = 1;
bool ReadPostprocessConfigFromYaml();
void DisPred2Bbox(const float*& dfl_det, int label, float score, int x, int y,
int stride, fastdeploy::vision::DetectionResult* results,
int reg_max, int num_class);
bool PicoDetPostProcess(const std::vector<FDTensor>& outs,
std::vector<DetectionResult>* results, int reg_max,
int num_class);
int ActivationFunctionSoftmax(const float* src, float* dst, int reg_max);
PaddleMultiClassNMS multi_class_nms_;
};
} // namespace detection
} // namespace vision
} // namespace fastdeploy

View File

@@ -73,14 +73,15 @@ void BindPPDet(pybind11::module& m) {
         }
         return results;
       })
-      .def(
-          "apply_decode_and_nms",
-          [](vision::detection::PaddleDetPostprocessor& self,
-             vision::detection::NMSOption option) {
-            self.ApplyDecodeAndNMS(option);
-          },
-          "A function which adds two numbers",
-          pybind11::arg("option") = vision::detection::NMSOption())
+      .def("set_nms_option",
+           [](vision::detection::PaddleDetPostprocessor& self,
+              vision::detection::NMSOption option) {
+             self.SetNMSOption(option);
+           })
+      .def("apply_nms",
+           [](vision::detection::PaddleDetPostprocessor& self) {
+             self.ApplyNMS();
+           })
       .def("run", [](vision::detection::PaddleDetPostprocessor& self,
                      std::vector<pybind11::array>& input_array) {
         std::vector<vision::DetectionResult> results;
@@ -123,9 +124,6 @@ void BindPPDet(pybind11::module& m) {
.def_property_readonly("postprocessor", .def_property_readonly("postprocessor",
&vision::detection::PPDetBase::GetPostprocessor); &vision::detection::PPDetBase::GetPostprocessor);
pybind11::class_<vision::detection::PPDetDecode>(m, "PPDetDecode")
.def(pybind11::init<std::string>());
pybind11::class_<vision::detection::PPYOLO, vision::detection::PPDetBase>( pybind11::class_<vision::detection::PPYOLO, vision::detection::PPDetBase>(
m, "PPYOLO") m, "PPYOLO")
.def(pybind11::init<std::string, std::string, std::string, RuntimeOption, .def(pybind11::init<std::string, std::string, std::string, RuntimeOption,
@@ -230,5 +228,10 @@ void BindPPDet(pybind11::module& m) {
"GFL") "GFL")
.def(pybind11::init<std::string, std::string, std::string, RuntimeOption, .def(pybind11::init<std::string, std::string, std::string, RuntimeOption,
ModelFormat>()); ModelFormat>());
pybind11::class_<vision::detection::SOLOv2, vision::detection::PPDetBase>(
m, "SOLOv2")
.def(pybind11::init<std::string, std::string, std::string, RuntimeOption,
ModelFormat>());
} }
} // namespace fastdeploy } // namespace fastdeploy
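On the Python side, these renamed bindings surface through the model's postprocessor property (bound above via `def_property_readonly`). A hedged usage sketch, assuming the Python wrapper forwards that property and hypothetical paths to a model exported without NMS:
```python
import fastdeploy as fd

# hypothetical paths to a PaddleDetection model exported without NMS
model = fd.vision.detection.PicoDet(
    "model_dir/model.pdmodel", "model_dir/model.pdiparams",
    "model_dir/infer_cfg.yml")

# Mark the outputs as lacking NMS so the postprocessor runs
# multiclass NMS itself (the apply_nms binding added above).
model.postprocessor.apply_nms()
```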

View File

@@ -40,6 +40,16 @@ bool PaddleDetPreprocessor::BuildPreprocessPipelineFromConfig() {
     return false;
   }
+  // read for postprocess
+  if (cfg["arch"].IsDefined()) {
+    arch_ = cfg["arch"].as<std::string>();
+  } else {
+    FDERROR << "Please set model arch,"
+            << "support value : SOLOv2, YOLO, SSD, RetinaNet, RCNN, Face."
+            << std::endl;
+    return false;
+  }
+  // read for preprocess
   processors_.push_back(std::make_shared<BGR2RGB>());
   bool has_permute = false;

View File

@@ -44,6 +44,10 @@ class FASTDEPLOY_DECL PaddleDetPreprocessor {
   /// This function will disable hwc2chw in preprocessing step.
   void DisablePermute();
+  std::string GetArch() {
+    return arch_;
+  }
  private:
   bool BuildPreprocessPipelineFromConfig();
   std::vector<std::shared_ptr<Processor>> processors_;
@@ -54,6 +58,8 @@ class FASTDEPLOY_DECL PaddleDetPreprocessor {
   bool disable_normalize_ = false;
   // read config file
   std::string config_file_;
+  // read arch_ for postprocess
+  std::string arch_;
 };
 }  // namespace detection

fastdeploy/vision/tracking/pptracking/model.cc Executable file → Normal file
View File

@@ -13,6 +13,7 @@
// limitations under the License. // limitations under the License.
#include "fastdeploy/vision/tracking/pptracking/model.h" #include "fastdeploy/vision/tracking/pptracking/model.h"
#include "fastdeploy/vision/tracking/pptracking/letter_box_resize.h" #include "fastdeploy/vision/tracking/pptracking/letter_box_resize.h"
#include "yaml-cpp/yaml.h" #include "yaml-cpp/yaml.h"
@@ -24,8 +25,8 @@ PPTracking::PPTracking(const std::string& model_file,
const std::string& params_file, const std::string& params_file,
const std::string& config_file, const std::string& config_file,
const RuntimeOption& custom_option, const RuntimeOption& custom_option,
const ModelFormat& model_format){ const ModelFormat& model_format) {
config_file_=config_file; config_file_ = config_file;
valid_cpu_backends = {Backend::PDINFER, Backend::ORT}; valid_cpu_backends = {Backend::PDINFER, Backend::ORT};
valid_gpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT}; valid_gpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT};
@@ -37,30 +38,29 @@ PPTracking::PPTracking(const std::string& model_file,
initialized = Initialize(); initialized = Initialize();
} }
bool PPTracking::BuildPreprocessPipelineFromConfig(){ bool PPTracking::BuildPreprocessPipelineFromConfig() {
processors_.clear(); processors_.clear();
YAML::Node cfg; YAML::Node cfg;
try { try {
cfg = YAML::LoadFile(config_file_); cfg = YAML::LoadFile(config_file_);
} catch (YAML::BadFile& e) { } catch (YAML::BadFile& e) {
FDERROR << "Failed to load yaml file " << config_file_ FDERROR << "Failed to load yaml file " << config_file_
<< ", maybe you should check this file." << std::endl; << ", maybe you should check this file." << std::endl;
return false; return false;
} }
// Get draw_threshold for visualization // Get draw_threshold for visualization
if (cfg["draw_threshold"].IsDefined()) { if (cfg["draw_threshold"].IsDefined()) {
draw_threshold_ = cfg["draw_threshold"].as<float>(); draw_threshold_ = cfg["draw_threshold"].as<float>();
} else { } else {
FDERROR << "Please set draw_threshold." << std::endl; FDERROR << "Please set draw_threshold." << std::endl;
return false; return false;
} }
// Get config for tracker // Get config for tracker
if (cfg["tracker"].IsDefined()) { if (cfg["tracker"].IsDefined()) {
if (cfg["tracker"]["conf_thres"].IsDefined()) { if (cfg["tracker"]["conf_thres"].IsDefined()) {
conf_thresh_ = cfg["tracker"]["conf_thres"].as<float>(); conf_thresh_ = cfg["tracker"]["conf_thres"].as<float>();
} } else {
else {
std::cerr << "Please set conf_thres in tracker." << std::endl; std::cerr << "Please set conf_thres in tracker." << std::endl;
return false; return false;
} }
@@ -86,48 +86,47 @@ bool PPTracking::BuildPreprocessPipelineFromConfig(){
int width = target_size[1]; int width = target_size[1];
int height = target_size[0]; int height = target_size[0];
processors_.push_back( processors_.push_back(
std::make_shared<Resize>(width, height, -1.0, -1.0, interp, false)); std::make_shared<Resize>(width, height, -1.0, -1.0, interp, false));
} else { } else {
int min_target_size = std::min(target_size[0], target_size[1]); int min_target_size = std::min(target_size[0], target_size[1]);
int max_target_size = std::max(target_size[0], target_size[1]); int max_target_size = std::max(target_size[0], target_size[1]);
std::vector<int> max_size; std::vector<int> max_size;
if (max_target_size > 0) { if (max_target_size > 0) {
max_size.push_back(max_target_size); max_size.push_back(max_target_size);
max_size.push_back(max_target_size); max_size.push_back(max_target_size);
} }
processors_.push_back(std::make_shared<ResizeByShort>( processors_.push_back(std::make_shared<ResizeByShort>(
min_target_size, interp, true, max_size)); min_target_size, interp, true, max_size));
} }
} } else if (op_name == "LetterBoxResize") {
else if(op_name == "LetterBoxResize"){
auto target_size = op["target_size"].as<std::vector<int>>(); auto target_size = op["target_size"].as<std::vector<int>>();
FDASSERT(target_size.size() == 2,"Require size of target_size be 2, but now it's %lu.", FDASSERT(target_size.size() == 2,
"Require size of target_size be 2, but now it's %lu.",
target_size.size()); target_size.size());
std::vector<float> color{127.0f,127.0f,127.0f}; std::vector<float> color{127.0f, 127.0f, 127.0f};
if (op["fill_value"].IsDefined()){ if (op["fill_value"].IsDefined()) {
color =op["fill_value"].as<std::vector<float>>(); color = op["fill_value"].as<std::vector<float>>();
} }
processors_.push_back(std::make_shared<LetterBoxResize>(target_size, color)); processors_.push_back(
} std::make_shared<LetterBoxResize>(target_size, color));
else if (op_name == "NormalizeImage") { } else if (op_name == "NormalizeImage") {
auto mean = op["mean"].as<std::vector<float>>(); auto mean = op["mean"].as<std::vector<float>>();
auto std = op["std"].as<std::vector<float>>(); auto std = op["std"].as<std::vector<float>>();
bool is_scale = true; bool is_scale = true;
if (op["is_scale"]) { if (op["is_scale"]) {
is_scale = op["is_scale"].as<bool>(); is_scale = op["is_scale"].as<bool>();
} }
std::string norm_type = "mean_std"; std::string norm_type = "mean_std";
if (op["norm_type"]) { if (op["norm_type"]) {
norm_type = op["norm_type"].as<std::string>(); norm_type = op["norm_type"].as<std::string>();
} }
if (norm_type != "mean_std") { if (norm_type != "mean_std") {
std::fill(mean.begin(), mean.end(), 0.0); std::fill(mean.begin(), mean.end(), 0.0);
std::fill(std.begin(), std.end(), 1.0); std::fill(std.begin(), std.end(), 1.0);
} }
processors_.push_back(std::make_shared<Normalize>(mean, std, is_scale)); processors_.push_back(std::make_shared<Normalize>(mean, std, is_scale));
} } else if (op_name == "Permute") {
else if (op_name == "Permute") {
// Do nothing, do permute as the last operation // Do nothing, do permute as the last operation
continue; continue;
// processors_.push_back(std::make_shared<HWC2CHW>()); // processors_.push_back(std::make_shared<HWC2CHW>());
@@ -136,11 +135,11 @@ bool PPTracking::BuildPreprocessPipelineFromConfig(){
      auto value = op["fill_value"].as<std::vector<float>>();
      processors_.push_back(std::make_shared<Cast>("float"));
      processors_.push_back(
          std::make_shared<PadToSize>(size[1], size[0], value));
    } else if (op_name == "PadStride") {
      auto stride = op["stride"].as<int>();
      processors_.push_back(
          std::make_shared<StridePad>(stride, std::vector<float>(3, 0)));
    } else {
      FDERROR << "Unexpected preprocess operator: " << op_name << "."
              << std::endl;
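Each branch above maps one entry of the exported `infer_cfg.yml` `Preprocess` list onto a FastDeploy processor. A quick way to see which branches a given export will exercise is to dump that list directly (a minimal sketch; the export path is a placeholder, and the key layout is assumed to follow PaddleDetection's standard export format):

```python
# Sketch: list the preprocess ops that BuildPreprocessPipelineFromConfig
# translates into processors. The model path below is an assumption.
import yaml

with open("inference_model/solov2_r50_fpn_1x_coco/infer_cfg.yml") as f:
    cfg = yaml.safe_load(f)

for op in cfg["Preprocess"]:
    params = {k: v for k, v in op.items() if k != "type"}
    print(op["type"], params)  # e.g. Resize, NormalizeImage, Permute, ...
```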
@@ -168,7 +167,7 @@ bool PPTracking::Initialize() {
  return true;
}

bool PPTracking::Predict(cv::Mat* img, MOTResult* result) {
  Mat mat(*img);
  std::vector<FDTensor> input_tensors;
@@ -189,9 +188,7 @@ bool PPTracking::Predict(cv::Mat *img, MOTResult *result) {
  return true;
}

bool PPTracking::Preprocess(Mat* mat, std::vector<FDTensor>* outputs) {
  int origin_w = mat->Width();
  int origin_h = mat->Height();
@@ -203,9 +200,9 @@ bool PPTracking::Preprocess(Mat* mat, std::vector<FDTensor>* outputs) {
    }
  }
  // LetterBoxResize(mat);
  // Normalize::Run(mat, mean_, scale_, is_scale_);
  // HWC2CHW::Run(mat);
  Cast::Run(mat, "float");
  outputs->resize(3);
@@ -226,8 +223,8 @@ bool PPTracking::Preprocess(Mat* mat, std::vector<FDTensor>* outputs) {
  return true;
}

void FilterDets(const float conf_thresh, const cv::Mat& dets,
                std::vector<int>* index) {
  for (int i = 0; i < dets.rows; ++i) {
    float score = *dets.ptr<float>(i, 4);
    if (score > conf_thresh) {
@@ -236,7 +233,8 @@ void FilterDets(const float conf_thresh,const cv::Mat& dets,std::vector<int>* in
  }
}
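For reference, the filtering rule above is simple enough to state in a few lines of NumPy (a sketch only; the row layout, with the confidence in column 4, matches the C++ code above, while the total column count is assumed):

```python
# NumPy sketch of the FilterDets step: keep the indices of detection rows
# whose confidence (column 4) exceeds conf_thresh.
import numpy as np

def filter_dets(conf_thresh, dets):
    return [i for i in range(dets.shape[0]) if dets[i, 4] > conf_thresh]

dets = np.array([[0, 0, 10, 10, 0.9, 0.0],
                 [5, 5, 20, 20, 0.3, 0.0]], dtype=np.float32)
print(filter_dets(0.5, dets))  # -> [0]
```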
bool PPTracking::Postprocess(std::vector<FDTensor>& infer_result,
                             MOTResult* result) {
  auto bbox_shape = infer_result[0].shape;
  auto bbox_data = static_cast<float*>(infer_result[0].Data());
@@ -252,15 +250,14 @@ bool PPTracking::Postprocess(std::vector<FDTensor>& infer_result, MOTResult *res
  FilterDets(conf_thresh_, dets, &valid);
  cv::Mat new_dets, new_emb;
  for (int i = 0; i < valid.size(); ++i) {
    new_dets.push_back(dets.row(valid[i]));
    new_emb.push_back(emb.row(valid[i]));
  }
  jdeTracker_->update(new_dets, new_emb, &tracks);
  if (tracks.size() == 0) {
    std::array<int, 4> box = {
        int(*dets.ptr<float>(0, 0)), int(*dets.ptr<float>(0, 1)),
        int(*dets.ptr<float>(0, 2)), int(*dets.ptr<float>(0, 3))};
    result->boxes.push_back(box);
    result->ids.push_back(1);
    result->scores.push_back(*dets.ptr<float>(0, 4));
@@ -275,8 +272,8 @@ bool PPTracking::Postprocess(std::vector<FDTensor>& infer_result, MOTResult *res
    bool vertical = w / h > 1.6;
    float area = w * h;
    if (area > min_box_area_ && !vertical) {
      std::array<int, 4> box = {int(titer->ltrb[0]), int(titer->ltrb[1]),
                                int(titer->ltrb[2]), int(titer->ltrb[3])};
      result->boxes.push_back(box);
      result->ids.push_back(titer->id);
      result->scores.push_back(titer->score);
@@ -286,34 +283,33 @@ bool PPTracking::Postprocess(std::vector<FDTensor>& infer_result, MOTResult *res
  }

  if (!is_record_trail_) return true;
  int nums = result->boxes.size();
  for (int i = 0; i < nums; i++) {
    float center_x = (result->boxes[i][0] + result->boxes[i][2]) / 2;
    float center_y = (result->boxes[i][1] + result->boxes[i][3]) / 2;
    int id = result->ids[i];
    recorder_->Add(id, {int(center_x), int(center_y)});
  }
  return true;
}

void PPTracking::BindRecorder(TrailRecorder* recorder) {
  recorder_ = recorder;
  is_record_trail_ = true;
}

void PPTracking::UnbindRecorder() {
  is_record_trail_ = false;
  std::map<int, std::vector<std::array<int, 2>>>::iterator iter;
  for (iter = recorder_->records.begin(); iter != recorder_->records.end();
       iter++) {
    iter->second.clear();
    iter->second.shrink_to_fit();
  }
  recorder_->records.clear();
  std::map<int, std::vector<std::array<int, 2>>>().swap(recorder_->records);
  recorder_ = nullptr;
}

}  // namespace tracking
}  // namespace vision
}  // namespace fastdeploy
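The tracking entry points above (`Predict`, `BindRecorder`, `UnbindRecorder`) are also exposed through the Python bindings. A minimal per-frame sketch, assuming an exported FairMOT model directory and a local video file (both paths are placeholders, and the Python class and result fields are taken to mirror the C++ API shown here):

```python
# Per-frame multi-object tracking sketch; paths are placeholders.
import cv2
import fastdeploy as fd

model = fd.vision.tracking.PPTracking(
    "fairmot_hrnetv2_w18_dlafpn_30e_576x320/model.pdmodel",
    "fairmot_hrnetv2_w18_dlafpn_30e_576x320/model.pdiparams",
    "fairmot_hrnetv2_w18_dlafpn_30e_576x320/infer_cfg.yml")

cap = cv2.VideoCapture("person.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = model.predict(frame)  # MOTResult with boxes, ids, scores
    print(len(result.boxes), "tracked objects in this frame")
cap.release()
```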
View File
@@ -73,7 +73,10 @@ class PaddleDetPostprocessor:
""" """
return self._postprocessor.run(runtime_results) return self._postprocessor.run(runtime_results)
def apply_decode_and_nms(self, nms_option=None): def apply_nms(self):
self.apply_nms()
def set_nms_option(self, nms_option=None):
"""This function will enable decode and nms in postprocess step. """This function will enable decode and nms in postprocess step.
""" """
if nms_option is None: if nms_option is None:
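A short sketch of how these two renamed hooks are meant to be called from user code; note that reaching the postprocessor as `model.postprocessor` is an assumption here, not a confirmed accessor, so adapt it to however the postprocessor is exposed in your setup:

```python
# Hypothetical usage of the renamed postprocessor hooks; the `postprocessor`
# attribute on the model object is an assumption.
import fastdeploy as fd

option = fd.RuntimeOption()
model = fd.vision.detection.PPYOLOE(
    "model.pdmodel", "model.pdiparams", "infer_cfg.yml", runtime_option=option)

model.postprocessor.apply_nms()       # enable NMS for exports without a built-in NMS op
model.postprocessor.set_nms_option()  # None falls back to the default NMS option
```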
@@ -340,6 +343,44 @@ class YOLOv3(PPYOLOE):
        return clone_model

class SOLOv2(PPYOLOE):
    def __init__(self,
                 model_file,
                 params_file,
                 config_file,
                 runtime_option=None,
                 model_format=ModelFormat.PADDLE):
        """Load a SOLOv2 model exported by PaddleDetection.

        :param model_file: (str)Path of model file, e.g. solov2/model.pdmodel
        :param params_file: (str)Path of parameters file, e.g. solov2/model.pdiparams; if the model_format is ModelFormat.ONNX, this param will be ignored and can be set as an empty string
        :param config_file: (str)Path of configuration file for deployment, e.g. solov2/infer_cfg.yml
        :param runtime_option: (fastdeploy.RuntimeOption)RuntimeOption for inference of this model; if it's None, the default backend on CPU will be used
        :param model_format: (fastdeploy.ModelFormat)Model format of the loaded model
        """
        super(PPYOLOE, self).__init__(runtime_option)

        assert model_format == ModelFormat.PADDLE, "SOLOv2 model only supports model format of ModelFormat.PADDLE now."
        self._model = C.vision.detection.SOLOv2(
            model_file, params_file, config_file, self._runtime_option,
            model_format)
        assert self.initialized, "SOLOv2 model initialize failed."

    def clone(self):
        """Clone SOLOv2 object

        :return: a new SOLOv2 object
        """

        class SOLOv2Clone(SOLOv2):
            def __init__(self, model):
                self._model = model

        clone_model = SOLOv2Clone(self._model.clone())
        return clone_model
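With the wrapper above in place, SOLOv2 inference from Python takes only a few lines. A minimal sketch, assuming the static-graph model exported by the command in the README and a local test image (both paths are placeholders):

```python
# Minimal SOLOv2 inference sketch; model and image paths are placeholders.
import cv2
import fastdeploy as fd

model = fd.vision.detection.SOLOv2(
    "inference_model/solov2_r50_fpn_1x_coco/model.pdmodel",
    "inference_model/solov2_r50_fpn_1x_coco/model.pdiparams",
    "inference_model/solov2_r50_fpn_1x_coco/infer_cfg.yml")

im = cv2.imread("000000014439.jpg")
result = model.predict(im)  # predict() is inherited from PPYOLOE
print(result)
```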
class MaskRCNN(PPYOLOE):
    def __init__(self,
                 model_file,