Mirror of https://github.com/PaddlePaddle/FastDeploy.git (synced 2025-10-05 00:33:03 +08:00)
Add PP-ModNet and PP-HumanMatting Support (#240)
* YOLOv7 support: first commit, pybind bindings, C++/Python examples and READMEs, API docs, release links, copyright headers, and moving internal helpers to private class members
* YOLOR support: first commit, examples and documents
* Repeated merges from develop, bringing in: Fix compile problem in different python version (#26); Add PaddleDetection/PPYOLOE model support (#22); add convert processor to vision (#27); Fix bug while the inference result is empty with YOLOv5 (#29) and add multi-label function for yolov5; yolov6/yolox C++ and Python demos; normalize with alpha and beta; CMakeLists, .gitignore, runtime_option.md and README fixes
* add is_dynamic for YOLO series (#22)
* modify ppmatting backend and docs; fix the PPMatting size problem
* fix LimitShort's log; modify the way of dealing with LimitShort; modify LimitShort function and ppmatting.cc
* add pphumanmatting and modnet series; docs of PPMatting series
* add explanation of newly added processors and fix processors
* modify ResizeByShort and ppmatting.cc; fix problem produced by ResizeByShort
* change resize_to_int_mult to limit_by_stride and delete resize_by_input_shape
* Update/Delete eigen.cmake; refine code
* add test file for ppmatting series
* add squeeze for fd_tensor and modify ppmatting.cc

Co-authored-by: Jason <jiangjiajun@baidu.com>
Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com>
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com>
Co-authored-by: huangjianhui <852142024@qq.com>
Co-authored-by: Jason <928090362@qq.com>
@@ -6,3 +6,5 @@ FastDeploy currently supports the deployment of the following matting models:
| :--- | :--- | :------- | :--- |
| [ZHKKKe/MODNet](./modnet) | MODNet series models | ONNX | [CommitID:28165a4](https://github.com/ZHKKKe/MODNet/commit/28165a4) |
| [PaddleSeg/PPMatting](./ppmatting) | PPMatting series models | Paddle | [Release/2.6](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/Matting) |
| [PaddleSeg/PPHumanMatting](./ppmatting) | PPHumanMatting series models | Paddle | [Release/2.6](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/Matting) |
| [PaddleSeg/ModNet](./ppmatting) | ModNet series models | Paddle | [Release/2.6](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/Matting) |
@@ -9,11 +9,13 @@
FastDeploy currently supports the deployment of the following models:

- [PPMatting series models](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/Matting)
- [PPHumanMatting series models](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/Matting)
- [ModNet series models](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/Matting)

## Export the Deployment Model

Before deployment, PPMatting needs to be exported as a deployment model; refer to [Export Model](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/Matting) for the export steps.
Before deployment, PPMatting needs to be exported as a deployment model; refer to [Export Model](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/Matting) for the export steps. (Tip: exporting PPMatting series and PPHumanMatting series models requires setting the `--input_shape` argument of the export script.)

## Download Pretrained Models

@@ -25,8 +27,12 @@

| Model | Parameter Size | Accuracy | Notes |
|:---------------------------------------------------------------- |:----- |:----- | :------ |
| [PPMatting-512](https://bj.bcebos.com/paddlehub/fastdeploy/PP-Matting-512.tgz) | 87MB | - |
| [PPMatting-1024](https://bj.bcebos.com/paddlehub/fastdeploy/PP-Matting-1024.tgz) | 87MB | - |
| [PPMatting-512](https://bj.bcebos.com/paddlehub/fastdeploy/PP-Matting-512.tgz) | 106MB | - |
| [PPMatting-1024](https://bj.bcebos.com/paddlehub/fastdeploy/PP-Matting-1024.tgz) | 106MB | - |
| [PPHumanMatting](https://bj.bcebos.com/paddlehub/fastdeploy/PPHumanMatting.tgz) | 247MB | - |
| [Modnet_ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/PPModnet_ResNet50_vd.tgz) | 355MB | - |
| [Modnet_MobileNetV2](https://bj.bcebos.com/paddlehub/fastdeploy/PPModnet_MobileNetV2.tgz) | 28MB | - |
| [Modnet_HRNet_w18](https://bj.bcebos.com/paddlehub/fastdeploy/PPModnet_HRNet_w18.tgz) | 51MB | - |
@@ -81,7 +81,7 @@ void GpuInfer(const std::string& model_dir, const std::string& image_file,
  cv::imwrite("visualized_result.jpg", vis_im_with_bg);
  cv::imwrite("visualized_result_fg.jpg", vis_im);
  std::cout << "Visualized result save in ./visualized_result_replaced_bg.jpg "
               "and ./visualized_result_fg.jpgg"
               "and ./visualized_result_fg.jpg"
            << std::endl;
}
@@ -85,6 +85,13 @@ void FDTensor::ExpandDim(int64_t axis) {
  shape.insert(shape.begin() + axis, 1);
}

void FDTensor::Squeeze(int64_t axis) {
  size_t ndim = shape.size();
  FDASSERT(axis >= 0 && axis < ndim,
           "The allowed 'axis' must be in range of [0, %lu)!", ndim);
  shape.erase(shape.begin() + axis);
}

void FDTensor::Allocate(const std::vector<int64_t>& new_shape,
                        const FDDataType& data_type,
                        const std::string& tensor_name,
@@ -71,6 +71,10 @@ struct FASTDEPLOY_DECL FDTensor {
  // at the `axis` position in the expanded Tensor shape.
  void ExpandDim(int64_t axis = 0);

  // Squeeze the shape of a Tensor. Erase the dimension at
  // the `axis` position of the Tensor shape.
  void Squeeze(int64_t axis = 0);

  // Initialize Tensor
  // Include setting attribute for tensor
  // and allocate cpu memory buffer
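A minimal usage sketch of the new Squeeze next to the existing ExpandDim (illustrative only, not part of the commit; it relies on FDTensor's public shape member, which the implementation above manipulates directly):

// Illustrative sketch: Squeeze/ExpandDim only edit the shape vector; they do
// not reallocate or move the underlying data.
fastdeploy::FDTensor t;
t.shape = {1, 224, 224, 3};  // e.g. an NHWC tensor with batch size 1
t.Squeeze(0);                // shape becomes {224, 224, 3}
t.ExpandDim(0);              // shape is back to {1, 224, 224, 3}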
fastdeploy/vision/common/processors/crop.cc (new file, 65 lines)
@@ -0,0 +1,65 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "fastdeploy/vision/common/processors/crop.h"

namespace fastdeploy {
namespace vision {

bool Crop::CpuRun(Mat* mat) {
  cv::Mat* im = mat->GetCpuMat();
  int height = static_cast<int>(im->rows);
  int width = static_cast<int>(im->cols);
  if (height < height_ + offset_h_ || width < width_ + offset_w_) {
    FDERROR << "[Crop] Cannot crop [" << height_ << ", " << width_
            << "] from the input image [" << height << ", " << width
            << "], with offset [" << offset_h_ << ", " << offset_w_ << "]."
            << std::endl;
    return false;
  }
  cv::Rect crop_roi(offset_w_, offset_h_, width_, height_);
  *im = (*im)(crop_roi);
  mat->SetWidth(width_);
  mat->SetHeight(height_);
  return true;
}

#ifdef ENABLE_OPENCV_CUDA
bool Crop::GpuRun(Mat* mat) {
  cv::cuda::GpuMat* im = mat->GetGpuMat();
  int height = static_cast<int>(im->rows);
  int width = static_cast<int>(im->cols);
  if (height < height_ + offset_h_ || width < width_ + offset_w_) {
    FDERROR << "[Crop] Cannot crop [" << height_ << ", " << width_
            << "] from the input image [" << height << ", " << width
            << "], with offset [" << offset_h_ << ", " << offset_w_ << "]."
            << std::endl;
    return false;
  }
  cv::Rect crop_roi(offset_w_, offset_h_, width_, height_);
  *im = (*im)(crop_roi);
  mat->SetWidth(width_);
  mat->SetHeight(height_);
  return true;
}
#endif

bool Crop::Run(Mat* mat, int offset_w, int offset_h, int width, int height,
               ProcLib lib) {
  auto c = Crop(offset_w, offset_h, width, height);
  return c(mat, lib);
}

}  // namespace vision
}  // namespace fastdeploy
fastdeploy/vision/common/processors/crop.h (new file, 47 lines)
@@ -0,0 +1,47 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#pragma once

#include "fastdeploy/vision/common/processors/base.h"

namespace fastdeploy {
namespace vision {

class Crop : public Processor {
 public:
  Crop(int offset_w, int offset_h, int width, int height) {
    offset_w_ = offset_w;
    offset_h_ = offset_h;
    width_ = width;
    height_ = height;
  }
  bool CpuRun(Mat* mat);
#ifdef ENABLE_OPENCV_CUDA
  bool GpuRun(Mat* mat);
#endif
  std::string Name() { return "Crop"; }

  static bool Run(Mat* mat, int offset_w, int offset_h, int width, int height,
                  ProcLib lib = ProcLib::OPENCV_CPU);

 private:
  int offset_w_;
  int offset_h_;
  int height_;
  int width_;
};

}  // namespace vision
}  // namespace fastdeploy
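A hedged usage sketch of the new Crop processor (illustrative only; it assumes fastdeploy::vision::Mat can be constructed from a cv::Mat, as it is used elsewhere in the vision module):

// Crop a 100x100 ROI whose top-left corner is at (offset_w, offset_h) = (10, 20).
// Run() logs an error and returns false if the ROI does not fit inside the image.
cv::Mat image = cv::imread("test.jpg");
fastdeploy::vision::Mat mat(image);
bool ok = fastdeploy::vision::Crop::Run(&mat, /*offset_w=*/10, /*offset_h=*/20,
                                        /*width=*/100, /*height=*/100);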
@@ -12,17 +12,17 @@
// See the License for the specific language governing permissions and
// limitations under the License.

#include "fastdeploy/vision/common/processors/resize_to_int_mult.h"
#include "fastdeploy/vision/common/processors/limit_by_stride.h"

namespace fastdeploy {
namespace vision {

bool ResizeToIntMult::CpuRun(Mat* mat) {
bool LimitByStride::CpuRun(Mat* mat) {
  cv::Mat* im = mat->GetCpuMat();
  int origin_w = im->cols;
  int origin_h = im->rows;
  int rw = origin_w - origin_w % mult_int_;
  int rh = origin_h - origin_h % mult_int_;
  int rw = origin_w - origin_w % stride_;
  int rh = origin_h - origin_h % stride_;
  if (rw != origin_w || rh != origin_h) {
    cv::resize(*im, *im, cv::Size(rw, rh), 0, 0, interp_);
    mat->SetWidth(im->cols);
@@ -32,13 +32,13 @@ bool ResizeToIntMult::CpuRun(Mat* mat) {
}

#ifdef ENABLE_OPENCV_CUDA
bool ResizeToIntMult::GpuRun(Mat* mat) {
bool LimitByStride::GpuRun(Mat* mat) {
  cv::cuda::GpuMat* im = mat->GetGpuMat();
  int origin_w = im->cols;
  int origin_h = im->rows;
  im->convertTo(*im, CV_32FC(im->channels()));
  int rw = origin_w - origin_w % mult_int_;
  int rh = origin_h - origin_h % mult_int_;
  int rw = origin_w - origin_w % stride_;
  int rh = origin_h - origin_h % stride_;
  if (rw != origin_w || rh != origin_h) {
    cv::cuda::resize(*im, *im, cv::Size(rw, rh), 0, 0, interp_);
    mat->SetWidth(im->cols);
@@ -48,8 +48,8 @@ bool ResizeToIntMult::GpuRun(Mat* mat) {
}
#endif

bool ResizeToIntMult::Run(Mat* mat, int mult_int, int interp, ProcLib lib) {
  auto r = ResizeToIntMult(mult_int, interp);
bool LimitByStride::Run(Mat* mat, int stride, int interp, ProcLib lib) {
  auto r = LimitByStride(stride, interp);
  return r(mat, lib);
}
}  // namespace vision
@@ -19,24 +19,27 @@
namespace fastdeploy {
namespace vision {

class ResizeToIntMult : public Processor {
class LimitByStride : public Processor {
 public:
  explicit ResizeToIntMult(int mult_int = 32, int interp = 1) {
    mult_int_ = mult_int;
  explicit LimitByStride(int stride = 32, int interp = 1) {
    stride_ = stride;
    interp_ = interp;
  }

  // Resize Mat* mat to make the size divisible by stride_.

  bool CpuRun(Mat* mat);
#ifdef ENABLE_OPENCV_CUDA
  bool GpuRun(Mat* mat);
#endif
  std::string Name() { return "ResizeToIntMult"; }
  std::string Name() { return "LimitByStride"; }

  static bool Run(Mat* mat, int mult_int = 32, int interp = 1,
  static bool Run(Mat* mat, int stride = 32, int interp = 1,
                  ProcLib lib = ProcLib::OPENCV_CPU);

 private:
  int interp_;
  int mult_int_;
  int stride_;
};
}  // namespace vision
}  // namespace fastdeploy
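An illustrative sketch of LimitByStride (the renamed ResizeToIntMult), assuming mat is a fastdeploy::vision::Mat wrapping a 1000x750 image:

// rw = 1000 - 1000 % 32 = 992 and rh = 750 - 750 % 32 = 736, so the image is
// resized to 992x736; an image whose sides are already multiples of the stride
// is left untouched.
fastdeploy::vision::LimitByStride::Run(&mat, /*stride=*/32);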
fastdeploy/vision/common/processors/limit_long.cc (new file, 70 lines)
@@ -0,0 +1,70 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "fastdeploy/vision/common/processors/limit_long.h"

namespace fastdeploy {
namespace vision {

bool LimitLong::CpuRun(Mat* mat) {
  cv::Mat* im = mat->GetCpuMat();
  int origin_w = im->cols;
  int origin_h = im->rows;
  int im_size_max = std::max(origin_w, origin_h);
  int target = im_size_max;
  if (max_long_ > 0 && im_size_max > max_long_) {
    target = max_long_;
  } else if (min_long_ > 0 && im_size_max < min_long_) {
    target = min_long_;
  }
  if (target != im_size_max) {
    double scale =
        static_cast<double>(target) / static_cast<double>(im_size_max);
    cv::resize(*im, *im, cv::Size(), scale, scale, interp_);
    mat->SetWidth(im->cols);
    mat->SetHeight(im->rows);
  }
  return true;
}

#ifdef ENABLE_OPENCV_CUDA
bool LimitLong::GpuRun(Mat* mat) {
  cv::cuda::GpuMat* im = mat->GetGpuMat();
  int origin_w = im->cols;
  int origin_h = im->rows;
  im->convertTo(*im, CV_32FC(im->channels()));
  int im_size_max = std::max(origin_w, origin_h);
  int target = im_size_max;
  if (max_long_ > 0 && im_size_max > max_long_) {
    target = max_long_;
  } else if (min_long_ > 0 && im_size_max < min_long_) {
    target = min_long_;
  }
  if (target != im_size_max) {
    double scale =
        static_cast<double>(target) / static_cast<double>(im_size_max);
    cv::cuda::resize(*im, *im, cv::Size(), scale, scale, interp_);
    mat->SetWidth(im->cols);
    mat->SetHeight(im->rows);
  }
  return true;
}
#endif

bool LimitLong::Run(Mat* mat, int max_long, int min_long, ProcLib lib) {
  auto l = LimitLong(max_long, min_long);
  return l(mat, lib);
}
}  // namespace vision
}  // namespace fastdeploy
fastdeploy/vision/common/processors/limit_long.h (new file, 51 lines)
@@ -0,0 +1,51 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#pragma once

#include "fastdeploy/vision/common/processors/base.h"

namespace fastdeploy {
namespace vision {

class LimitLong : public Processor {
 public:
  explicit LimitLong(int max_long = -1, int min_long = -1, int interp = 1) {
    max_long_ = max_long;
    min_long_ = min_long;
    interp_ = interp;
  }

  // Limit the long edge of the image.
  // If the long edge is larger than max_long_, resize the long edge
  // to max_long_, while scaling the short edge proportionally.
  // If the long edge is smaller than min_long_, resize the long edge
  // to min_long_, while scaling the short edge proportionally.
  bool CpuRun(Mat* mat);
#ifdef ENABLE_OPENCV_CUDA
  bool GpuRun(Mat* mat);
#endif
  std::string Name() { return "LimitLong"; }

  static bool Run(Mat* mat, int max_long = -1, int min_long = -1,
                  ProcLib lib = ProcLib::OPENCV_CPU);
  int GetMaxLong() const { return max_long_; }

 private:
  int max_long_;
  int min_long_;
  int interp_;
};
}  // namespace vision
}  // namespace fastdeploy
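An illustrative sketch of LimitLong, assuming mat wraps a 4000x3000 image:

// The long edge is 4000 > max_long, so scale = 2048 / 4000 = 0.512 and the
// image is resized to roughly 2048x1536; images already within the limit are
// left untouched.
fastdeploy::vision::LimitLong::Run(&mat, /*max_long=*/2048);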
@@ -28,9 +28,11 @@ bool LimitShort::CpuRun(Mat* mat) {
  } else if (min_short_ > 0 && im_size_min < min_short_) {
    target = min_short_;
  }
  double scale = -1.f;
  if (target != im_size_min) {
    double scale =
        static_cast<double>(target) / static_cast<double>(im_size_min);
    scale = static_cast<double>(target) / static_cast<double>(im_size_min);
  }
  if (scale > 0) {
    cv::resize(*im, *im, cv::Size(), scale, scale, interp_);
    mat->SetWidth(im->cols);
    mat->SetHeight(im->rows);
@@ -51,9 +53,11 @@ bool LimitShort::GpuRun(Mat* mat) {
  } else if (min_short_ > 0 && im_size_min < min_short_) {
    target = min_short_;
  }
  double scale = -1.f;
  if (target != im_size_min) {
    double scale =
        static_cast<double>(target) / static_cast<double>(im_size_min);
    scale = static_cast<double>(target) / static_cast<double>(im_size_min);
  }
  if (scale > 0) {
    cv::cuda::resize(*im, *im, cv::Size(), scale, scale, interp_);
    mat->SetWidth(im->cols);
    mat->SetHeight(im->rows);
@@ -26,6 +26,12 @@ class LimitShort : public Processor {
    min_short_ = min_short;
    interp_ = interp;
  }

  // Limit the short edge of the image.
  // If the short edge is larger than max_short_, resize the short edge
  // to max_short_, while scaling the long edge proportionally.
  // If the short edge is smaller than min_short_, resize the short edge
  // to min_short_, while scaling the long edge proportionally.
  bool CpuRun(Mat* mat);
#ifdef ENABLE_OPENCV_CUDA
  bool GpuRun(Mat* mat);
@@ -34,7 +40,7 @@ class LimitShort : public Processor {

  static bool Run(Mat* mat, int max_short = -1, int min_short = -1,
                  ProcLib lib = ProcLib::OPENCV_CPU);
  int GetMaxShort() { return max_short_; }
  int GetMaxShort() const { return max_short_; }

 private:
  int max_short_;
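An illustrative sketch of LimitShort under the updated implementation, assuming mat wraps a 4000x3000 image:

// The short edge is 3000 > max_short, so scale = 2048 / 3000 ≈ 0.68 and the
// image is resized to roughly 2731x2048; for a 400x300 input the short edge
// 300 < min_short would instead be scaled up by 512 / 300.
fastdeploy::vision::LimitShort::Run(&mat, /*max_short=*/2048, /*min_short=*/512);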
@@ -18,6 +18,9 @@ namespace fastdeploy {
namespace vision {

bool PadToSize::CpuRun(Mat* mat) {
  if (width_ == -1 || height_ == -1) {
    return true;
  }
  if (mat->layout != Layout::HWC) {
    FDERROR << "PadToSize: The input data must be Layout::HWC format!"
            << std::endl;
@@ -74,6 +77,9 @@ bool PadToSize::CpuRun(Mat* mat) {

#ifdef ENABLE_OPENCV_CUDA
bool PadToSize::GpuRun(Mat* mat) {
  if (width_ == -1 || height_ == -1) {
    return true;
  }
  if (mat->layout != Layout::HWC) {
    FDERROR << "PadToSize: The input data must be Layout::HWC format!"
            << std::endl;
@@ -21,7 +21,7 @@ namespace vision {

class PadToSize : public Processor {
 public:
  // only support pad with left-top padding mode
  // only support pad with right-bottom padding mode
  PadToSize(int width, int height, const std::vector<float>& value) {
    width_ = width;
    height_ = height;
@@ -22,12 +22,14 @@ bool ResizeByShort::CpuRun(Mat* mat) {
  int origin_w = im->cols;
  int origin_h = im->rows;
  double scale = GenerateScale(origin_w, origin_h);
  if (use_scale_) {
  if (use_scale_ && fabs(scale - 1.0) >= 1e-06) {
    cv::resize(*im, *im, cv::Size(), scale, scale, interp_);
  } else {
    int width = static_cast<int>(round(scale * im->cols));
    int height = static_cast<int>(round(scale * im->rows));
    cv::resize(*im, *im, cv::Size(width, height), 0, 0, interp_);
    if (width != origin_w || height != origin_h) {
      cv::resize(*im, *im, cv::Size(width, height), 0, 0, interp_);
    }
  }
  mat->SetWidth(im->cols);
  mat->SetHeight(im->rows);
@@ -41,12 +43,14 @@ bool ResizeByShort::GpuRun(Mat* mat) {
  int origin_h = im->rows;
  double scale = GenerateScale(origin_w, origin_h);
  im->convertTo(*im, CV_32FC(im->channels()));
  if (use_scale_) {
  if (use_scale_ && fabs(scale - 1.0) >= 1e-06) {
    cv::cuda::resize(*im, *im, cv::Size(), scale, scale, interp_);
  } else {
    int width = static_cast<int>(round(scale * im->cols));
    int height = static_cast<int>(round(scale * im->rows));
    cv::cuda::resize(*im, *im, cv::Size(width, height), 0, 0, interp_);
    if (width != origin_w || height != origin_h) {
      cv::cuda::resize(*im, *im, cv::Size(width, height), 0, 0, interp_);
    }
  }
  mat->SetWidth(im->cols);
  mat->SetHeight(im->rows);
@@ -59,18 +63,31 @@ double ResizeByShort::GenerateScale(const int origin_w, const int origin_h) {
  int im_size_min = std::min(origin_w, origin_h);
  double scale =
      static_cast<double>(target_size_) / static_cast<double>(im_size_min);
  if (max_size_ > 0) {
    if (round(scale * im_size_max) > max_size_) {
      scale = static_cast<double>(max_size_) / static_cast<double>(im_size_max);

  if (max_hw_.size() > 0) {
    FDASSERT(max_hw_.size() == 2,
             "Require size of max_hw_ be 2, but now it's %zu.", max_hw_.size());
    FDASSERT(
        max_hw_[0] > 0 && max_hw_[1] > 0,
        "Require elements in max_hw_ greater than 0, but now it's [%d, %d].",
        max_hw_[0], max_hw_[1]);

    double scale_h =
        static_cast<double>(max_hw_[0]) / static_cast<double>(origin_h);
    double scale_w =
        static_cast<double>(max_hw_[1]) / static_cast<double>(origin_w);
    double min_scale = std::min(scale_h, scale_w);
    if (min_scale < scale) {
      scale = min_scale;
    }
  }
  return scale;
}

bool ResizeByShort::Run(Mat* mat, int target_size, int interp, bool use_scale,
                        int max_size, ProcLib lib) {
  auto r = ResizeByShort(target_size, interp, use_scale, max_size);
                        const std::vector<int>& max_hw, ProcLib lib) {
  auto r = ResizeByShort(target_size, interp, use_scale, max_hw);
  return r(mat, lib);
}
}  // namespace vision
}  // namespace fastdeploy
@@ -22,9 +22,9 @@ namespace vision {
class ResizeByShort : public Processor {
 public:
  ResizeByShort(int target_size, int interp = 1, bool use_scale = true,
                int max_size = -1) {
                const std::vector<int>& max_hw = std::vector<int>()) {
    target_size_ = target_size;
    max_size_ = max_size;
    max_hw_ = max_hw;
    interp_ = interp;
    use_scale_ = use_scale;
  }
@@ -35,15 +35,16 @@ class ResizeByShort : public Processor {
  std::string Name() { return "ResizeByShort"; }

  static bool Run(Mat* mat, int target_size, int interp = 1,
                  bool use_scale = true, int max_size = -1,
                  bool use_scale = true,
                  const std::vector<int>& max_hw = std::vector<int>(),
                  ProcLib lib = ProcLib::OPENCV_CPU);

 private:
  double GenerateScale(const int origin_w, const int origin_h);
  int target_size_;
  int max_size_;
  std::vector<int> max_hw_;
  int interp_;
  bool use_scale_;
};
}  // namespace vision
}  // namespace fastdeploy
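A worked example of the new max_hw behaviour (illustrative only, assuming mat wraps a 1280x720 image): the short edge is scaled toward target_size, but the result is additionally capped so that height stays within max_hw[0] and width within max_hw[1].

// scale   = 512 / 720  ≈ 0.711   (short-edge target)
// scale_h = 512 / 720  ≈ 0.711, scale_w = 512 / 1280 = 0.400
// min(scale_h, scale_w) = 0.400 < 0.711, so the final scale is 0.400 and the
// image is resized to 512x288, never exceeding the 512x512 (height, width) cap.
fastdeploy::vision::ResizeByShort::Run(&mat, /*target_size=*/512, /*interp=*/1,
                                       /*use_scale=*/true, /*max_hw=*/{512, 512});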
@@ -18,7 +18,10 @@
#include "fastdeploy/vision/common/processors/center_crop.h"
#include "fastdeploy/vision/common/processors/color_space_convert.h"
#include "fastdeploy/vision/common/processors/convert.h"
#include "fastdeploy/vision/common/processors/crop.h"
#include "fastdeploy/vision/common/processors/hwc2chw.h"
#include "fastdeploy/vision/common/processors/limit_by_stride.h"
#include "fastdeploy/vision/common/processors/limit_long.h"
#include "fastdeploy/vision/common/processors/limit_short.h"
#include "fastdeploy/vision/common/processors/normalize.h"
#include "fastdeploy/vision/common/processors/pad.h"
@@ -26,5 +29,4 @@
#include "fastdeploy/vision/common/processors/resize.h"
#include "fastdeploy/vision/common/processors/resize_by_long.h"
#include "fastdeploy/vision/common/processors/resize_by_short.h"
#include "fastdeploy/vision/common/processors/resize_to_int_mult.h"
#include "fastdeploy/vision/common/processors/stride_pad.h"
@@ -122,8 +122,13 @@ bool PPYOLOE::BuildPreprocessPipelineFromConfig() {
    } else {
      int min_target_size = std::min(target_size[0], target_size[1]);
      int max_target_size = std::max(target_size[0], target_size[1]);
      std::vector<int> max_size;
      if (max_target_size > 0) {
        max_size.push_back(max_target_size);
        max_size.push_back(max_target_size);
      }
      processors_.push_back(std::make_shared<ResizeByShort>(
          min_target_size, interp, true, max_target_size));
          min_target_size, interp, true, max_size));
    }
  } else if (op_name == "Permute") {
    // Do nothing, do permute as the last operation
@@ -60,33 +60,54 @@ bool PPMatting::BuildPreprocessPipelineFromConfig() {
    return false;
  }

  FDASSERT((cfg["Deploy"]["input_shape"]),
           "The yaml file should include input_shape parameters");
  // input_shape
  // b c h w
  auto input_shape = cfg["Deploy"]["input_shape"].as<std::vector<int>>();
  FDASSERT(input_shape.size() == 4,
           "The input_shape in yaml file need to be 4-dimensions, but now its "
           "dimension is %zu.",
           input_shape.size());

  bool is_fixed_input_shape = false;
  if (input_shape[2] > 0 && input_shape[3] > 0) {
    is_fixed_input_shape = true;
  }
  if (input_shape[2] < 0 || input_shape[3] < 0) {
    FDWARNING << "Detected dynamic input shape of your model, only Paddle "
                 "Inference / OpenVINO support this model now."
              << std::endl;
  }
  if (cfg["Deploy"]["transforms"]) {
    auto preprocess_cfg = cfg["Deploy"]["transforms"];
    int long_size = -1;
    for (const auto& op : preprocess_cfg) {
      FDASSERT(op.IsMap(),
               "Require the transform information in yaml be Map type.");
      if (op["type"].as<std::string>() == "LimitShort") {
        int max_short = -1;
        int min_short = -1;
        if (op["max_short"]) {
          max_short = op["max_short"].as<int>();
        int max_short = op["max_short"] ? op["max_short"].as<int>() : -1;
        int min_short = op["min_short"] ? op["min_short"].as<int>() : -1;
        if (is_fixed_input_shape) {
          // if the input shape is fixed, will resize by scale, and the max
          // shape will not exceed input_shape
          long_size = max_short;
          std::vector<int> max_size = {input_shape[2], input_shape[3]};
          processors_.push_back(
              std::make_shared<ResizeByShort>(long_size, 1, true, max_size));
        } else {
          processors_.push_back(
              std::make_shared<LimitShort>(max_short, min_short));
        }
        if (op["min_short"]) {
          min_short = op["min_short"].as<int>();
        }
        FDINFO << "Detected LimitShort processing step in yaml file, if the "
                  "model is exported from PaddleSeg, please make sure the "
                  "input of your model is fixed with a square shape, and "
                  "greater than or equal to "
               << max_short << "." << std::endl;
        processors_.push_back(
            std::make_shared<LimitShort>(max_short, min_short));
      } else if (op["type"].as<std::string>() == "ResizeToIntMult") {
        int mult_int = 32;
        if (op["mult_int"]) {
          mult_int = op["mult_int"].as<int>();
        if (is_fixed_input_shape) {
          std::vector<int> max_size = {input_shape[2], input_shape[3]};
          processors_.push_back(
              std::make_shared<ResizeByShort>(long_size, 1, true, max_size));
        } else {
          int mult_int = op["mult_int"] ? op["mult_int"].as<int>() : 32;
          processors_.push_back(std::make_shared<LimitByStride>(mult_int));
        }
        processors_.push_back(std::make_shared<ResizeToIntMult>(mult_int));
      } else if (op["type"].as<std::string>() == "Normalize") {
        std::vector<float> mean = {0.5, 0.5, 0.5};
        std::vector<float> std = {0.5, 0.5, 0.5};
@@ -97,58 +118,40 @@ bool PPMatting::BuildPreprocessPipelineFromConfig() {
          std = op["std"].as<std::vector<float>>();
        }
        processors_.push_back(std::make_shared<Normalize>(mean, std));
      } else if (op["type"].as<std::string>() == "ResizeByLong") {
        int target_size = op["long_size"].as<int>();
        processors_.push_back(std::make_shared<ResizeByLong>(target_size));
      } else if (op["type"].as<std::string>() == "Pad") {
        // size: (w, h)
        auto size = op["size"].as<std::vector<int>>();
        std::vector<float> value = {127.5, 127.5, 127.5};
        if (op["fill_value"]) {
          auto value = op["fill_value"].as<std::vector<float>>();
        }
        processors_.push_back(std::make_shared<Cast>("float"));
        processors_.push_back(
            std::make_shared<PadToSize>(size[1], size[0], value));
      } else if (op["type"].as<std::string>() == "ResizeByShort") {
        int target_size = op["short_size"].as<int>();
        processors_.push_back(std::make_shared<ResizeByShort>(target_size));
        long_size = op["short_size"].as<int>();
        if (is_fixed_input_shape) {
          std::vector<int> max_size = {input_shape[2], input_shape[3]};
          processors_.push_back(
              std::make_shared<ResizeByShort>(long_size, 1, true, max_size));
        } else {
          processors_.push_back(std::make_shared<ResizeByShort>(long_size));
        }
      }
    }
    // the default padding value is {127.5,127.5,127.5} so after normalizing,
    // ((127.5/255)-0.5)/0.5 = 0.0
    std::vector<float> value = {0.0, 0.0, 0.0};
    processors_.push_back(std::make_shared<Cast>("float"));
    processors_.push_back(
        std::make_shared<PadToSize>(input_shape[3], input_shape[2], value));
    processors_.push_back(std::make_shared<HWC2CHW>());
  }

  return true;
}

bool PPMatting::Preprocess(Mat* mat, FDTensor* output,
                           std::map<std::string, std::array<int, 2>>* im_info) {
  (*im_info)["input_shape"] = {mat->Height(), mat->Width()};
  for (size_t i = 0; i < processors_.size(); ++i) {
    if (processors_[i]->Name().compare("LimitShort") == 0) {
      int input_h = static_cast<int>(mat->Height());
      int input_w = static_cast<int>(mat->Width());
      auto processor = dynamic_cast<LimitShort*>(processors_[i].get());
      int max_short = processor->GetMaxShort();
      if (runtime_option.backend != Backend::PDINFER) {
        if (input_w != input_h || input_h < max_short || input_w < max_short) {
          Resize::Run(mat, max_short, max_short);
        }
      }
    }
    if (!(*(processors_[i].get()))(mat)) {
      FDERROR << "Failed to process image data in " << processors_[i]->Name()
              << "." << std::endl;
      return false;
    }
    if (processors_[i]->Name().compare("ResizeByLong") == 0) {
      (*im_info)["resize_by_long"] = {static_cast<int>(mat->Height()),
                                      static_cast<int>(mat->Width())};
    }
  }

  // Record output shape of preprocessed image
  (*im_info)["output_shape"] = {static_cast<int>(mat->Height()),
                                static_cast<int>(mat->Width())};

  (*im_info)["output_shape"] = {mat->Height(), mat->Width()};
  mat->ShareWithTensor(output);
  output->shape.insert(output->shape.begin(), 1);
  output->name = InputInfoOfRuntime(0).name;
@@ -159,8 +162,7 @@ bool PPMatting::Postprocess(
    std::vector<FDTensor>& infer_result, MattingResult* result,
    const std::map<std::string, std::array<int, 2>>& im_info) {
  FDASSERT((infer_result.size() == 1),
           "The default number of output tensor must be 1 according to "
           "modnet.");
           "The default number of output tensor must be 1 ");
  FDTensor& alpha_tensor = infer_result.at(0);  // (1,h,w,1)
  FDASSERT((alpha_tensor.shape[0] == 1), "Only support batch =1 now.");
  if (alpha_tensor.dtype != FDDataType::FP32) {
@@ -170,41 +172,31 @@ bool PPMatting::Postprocess(

  auto iter_ipt = im_info.find("input_shape");
  auto iter_out = im_info.find("output_shape");
  auto resize_by_long = im_info.find("resize_by_long");
  FDASSERT(iter_out != im_info.end() && iter_ipt != im_info.end(),
           "Cannot find input_shape or output_shape from im_info.");
  int out_h = iter_out->second[0];
  int out_w = iter_out->second[1];
  int ipt_h = iter_ipt->second[0];
  int ipt_w = iter_ipt->second[1];

  float* alpha_ptr = static_cast<float*>(alpha_tensor.Data());
  cv::Mat alpha_zero_copy_ref(out_h, out_w, CV_32FC1, alpha_ptr);
  cv::Mat cropped_alpha;
  if (resize_by_long != im_info.end()) {
    int resize_h = resize_by_long->second[0];
    int resize_w = resize_by_long->second[1];
    alpha_zero_copy_ref(cv::Rect(0, 0, resize_w, resize_h))
        .copyTo(cropped_alpha);
  } else {
    cropped_alpha = alpha_zero_copy_ref;
  }
  Mat alpha_resized(cropped_alpha);  // ref-only, zero copy.
  double scale_h = static_cast<double>(iter_out->second[0]) /
                   static_cast<double>(iter_ipt->second[0]);
  double scale_w = static_cast<double>(iter_out->second[1]) /
                   static_cast<double>(iter_ipt->second[1]);
  double actual_scale = std::min(scale_h, scale_w);

  if ((out_h != ipt_h) || (out_w != ipt_w)) {
    // already allocated a new continuous memory after resize.
    // cv::resize(alpha_resized, alpha_resized, cv::Size(ipt_w, ipt_h));
    Resize::Run(&alpha_resized, ipt_w, ipt_h, -1, -1);
  }
  int size_before_pad_h = round(actual_scale * iter_ipt->second[0]);
  int size_before_pad_w = round(actual_scale * iter_ipt->second[1]);
  std::vector<int64_t> dim{0, 2, 3, 1};
  Transpose(alpha_tensor, &alpha_tensor, dim);
  alpha_tensor.Squeeze(0);
  Mat mat = CreateFromTensor(alpha_tensor);

  Crop::Run(&mat, 0, 0, size_before_pad_w, size_before_pad_h);
  Resize::Run(&mat, iter_ipt->second[1], iter_ipt->second[0]);

  result->Clear();
  // note: must be setup shape before Resize
  result->contain_foreground = false;
  result->shape = {static_cast<int64_t>(ipt_h), static_cast<int64_t>(ipt_w)};
  int numel = ipt_h * ipt_w;
  result->shape = {iter_ipt->second[0], iter_ipt->second[1]};
  int numel = iter_ipt->second[0] * iter_ipt->second[1];
  int nbytes = numel * sizeof(float);
  result->Resize(numel);
  std::memcpy(result->alpha.data(), alpha_resized.GetCpuMat()->data, nbytes);
  std::memcpy(result->alpha.data(), mat.GetCpuMat()->data, nbytes);
  return true;
}
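A worked example of the crop-then-resize logic above, with illustrative numbers assuming a model whose input is padded to a fixed 512x512: for a 1920x1080 image, input_shape is {1080, 1920} and output_shape is {512, 512}, so scale_h = 512 / 1080 ≈ 0.474, scale_w = 512 / 1920 ≈ 0.267 and actual_scale ≈ 0.267. The un-padded region of the alpha map is therefore round(0.267 * 1920) = 512 wide and round(0.267 * 1080) = 288 high, so the alpha tensor is transposed to HWC, cropped to 512x288 with Crop::Run, resized back to 1920x1080 with Resize::Run, and finally copied into MattingResult::alpha.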
@@ -214,12 +206,6 @@ bool PPMatting::Predict(cv::Mat* im, MattingResult* result) {

  std::map<std::string, std::array<int, 2>> im_info;

  // Record the shape of image and the shape of preprocessed image
  im_info["input_shape"] = {static_cast<int>(mat.Height()),
                            static_cast<int>(mat.Width())};
  im_info["output_shape"] = {static_cast<int>(mat.Height()),
                             static_cast<int>(mat.Width())};

  if (!Preprocess(&mat, &(processed_data[0]), &im_info)) {
    FDERROR << "Failed to preprocess input data while using model:"
            << ModelName() << "." << std::endl;
tests/eval_example/test_ppmatting.py (new file, 109 lines)
@@ -0,0 +1,109 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import fastdeploy as fd
import cv2
import os
import pickle
import numpy as np


def test_matting_ppmatting():
    model_url = "https://bj.bcebos.com/paddlehub/fastdeploy/PP-Matting-512.tgz"
    input_url = "https://bj.bcebos.com/paddlehub/fastdeploy/matting_input.jpg"
    fd.download_and_decompress(model_url, ".")
    fd.download(input_url, ".")
    model_path = "./PP-Matting-512"
    # Configure the runtime and load the model
    runtime_option = fd.RuntimeOption()
    model_file = os.path.join(model_path, "model.pdmodel")
    params_file = os.path.join(model_path, "model.pdiparams")
    config_file = os.path.join(model_path, "deploy.yaml")
    model = fd.vision.matting.PPMatting(
        model_file, params_file, config_file, runtime_option=runtime_option)

    # Predict the matting result for the image
    im = cv2.imread("./matting_input.jpg")
    result = model.predict(im.copy())
    pkl_url = ""
    if pkl_url:
        fd.download("ppmatting_result.pkl", ".")
        with open("./ppmatting_result.pkl", "rb") as f:
            baseline = pickle.load(f)

        diff = np.fabs(np.array(result.alpha) - np.array(baseline))
        thres = 1e-05
        assert diff.max() < thres, "The diff is %f, which is bigger than %f" % (
            diff.max(), thres)


def test_matting_ppmodnet():
    model_url = "https://bj.bcebos.com/paddlehub/fastdeploy/PPModnet_MobileNetV2.tgz"
    input_url = "https://bj.bcebos.com/paddlehub/fastdeploy/matting_input.jpg"
    fd.download_and_decompress(model_url, ".")
    fd.download(input_url, ".")
    model_path = "./PPModnet_MobileNetV2"
    # Configure the runtime and load the model
    runtime_option = fd.RuntimeOption()
    model_file = os.path.join(model_path, "model.pdmodel")
    params_file = os.path.join(model_path, "model.pdiparams")
    config_file = os.path.join(model_path, "deploy.yaml")
    model = fd.vision.matting.PPMatting(
        model_file, params_file, config_file, runtime_option=runtime_option)

    # Predict the matting result for the image
    im = cv2.imread("./matting_input.jpg")
    result = model.predict(im.copy())

    pkl_url = ""
    if pkl_url:
        fd.download("ppmodnet_result.pkl", ".")
        with open("./ppmodnet_result.pkl", "rb") as f:
            baseline = pickle.load(f)

        diff = np.fabs(np.array(result.alpha) - np.array(baseline))
        thres = 1e-05
        assert diff.max() < thres, "The diff is %f, which is bigger than %f" % (
            diff.max(), thres)


def test_matting_pphumanmatting():
    model_url = "https://bj.bcebos.com/paddlehub/fastdeploy/PPHumanMatting.tgz"
    input_url = "https://bj.bcebos.com/paddlehub/fastdeploy/matting_input.jpg"
    fd.download_and_decompress(model_url, ".")
    fd.download(input_url, ".")
    model_path = "./PPHumanMatting"
    # Configure the runtime and load the model
    runtime_option = fd.RuntimeOption()
    model_file = os.path.join(model_path, "model.pdmodel")
    params_file = os.path.join(model_path, "model.pdiparams")
    config_file = os.path.join(model_path, "deploy.yaml")
    model = fd.vision.matting.PPMatting(
        model_file, params_file, config_file, runtime_option=runtime_option)

    # Predict the matting result for the image
    im = cv2.imread("./matting_input.jpg")
    result = model.predict(im.copy())

    pkl_url = ""
    if pkl_url:
        fd.download("pphumanmatting_result.pkl", ".")
        with open("./pphumanmatting_result.pkl", "rb") as f:
            baseline = pickle.load(f)

        diff = np.fabs(np.array(result.alpha) - np.array(baseline))
        thres = 1e-05
        assert diff.max() < thres, "The diff is %f, which is bigger than %f" % (
            diff.max(), thres)