mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2025-10-22 00:02:10 +08:00
Add custom operator for onnxruntime and fix paddle backend (#35)
Add custom operator for onnxruntime and fix paddle backend
@@ -732,3 +732,212 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

---------
7. https://github.com/oneapi-src/oneDNN/

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   ============================================================================

   Copyright 2016-2021 Intel Corporation
   Copyright 2018 YANDEX LLC
   Copyright 2019-2021 FUJITSU LIMITED
   Copyright 2020 Arm Limited and affiliates
   Copyright 2020 Codeplay Software Limited
   Copyright 2021 Alanna Tempest

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

   This distribution includes third party software ("third party programs").
   This third party software, even if included with the distribution of
   the Intel software, may be governed by separate license terms, including
   without limitation, third party license terms, other Intel software license
   terms, and open source software license terms. These separate license terms
   govern your use of the third party programs as set forth in the
   "THIRD-PARTY-PROGRAMS" file.
7  external/paddle_inference.cmake (vendored)
@@ -83,12 +83,7 @@ ExternalProject_Add(
     BUILD_COMMAND ""
     UPDATE_COMMAND ""
     INSTALL_COMMAND
-        ${CMAKE_COMMAND} -E remove_directory ${PADDLEINFERENCE_INSTALL_DIR} &&
-        ${CMAKE_COMMAND} -E make_directory ${PADDLEINFERENCE_INSTALL_DIR} &&
-        ${CMAKE_COMMAND} -E rename ${PADDLEINFERENCE_SOURCE_DIR}/paddle/
-        ${PADDLEINFERENCE_INSTALL_DIR}/paddle && ${CMAKE_COMMAND} -E rename
-        ${PADDLEINFERENCE_SOURCE_DIR}/third_party ${PADDLEINFERENCE_INSTALL_DIR}/third_party &&
-        ${CMAKE_COMMAND} -E rename ${PADDLEINFERENCE_SOURCE_DIR}/version.txt ${PADDLEINFERENCE_INSTALL_DIR}/version.txt
+        ${CMAKE_COMMAND} -E copy_directory ${PADDLEINFERENCE_SOURCE_DIR} ${PADDLEINFERENCE_INSTALL_DIR}
     BUILD_BYPRODUCTS ${PADDLEINFERENCE_COMPILE_LIB})

 add_library(external_paddle_inference STATIC IMPORTED GLOBAL)
13  external/utils.cmake (vendored)
@@ -13,3 +13,16 @@ function(redefine_file_macro targetname)
     )
   endforeach()
 endfunction()
+
+function(download_and_decompress url filename decompress_dir)
+  if(NOT EXISTS ${filename})
+    message("Downloading file from ${url} ...")
+    file(DOWNLOAD ${url} "${filename}.tmp" SHOW_PROGRESS)
+    file(RENAME "${filename}.tmp" ${filename})
+  endif()
+  if(NOT EXISTS ${decompress_dir})
+    file(MAKE_DIRECTORY ${decompress_dir})
+  endif()
+  message("Decompress file ${filename} ...")
+  execute_process(COMMAND ${CMAKE_COMMAND} -E tar -xf ${filename} WORKING_DIRECTORY ${decompress_dir})
+endfunction()
260  fastdeploy/backends/ort/ops/multiclass_nms.cc (new file)
@@ -0,0 +1,260 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "fastdeploy/backends/ort/ops/multiclass_nms.h"

#include <algorithm>
#include <iostream>

#include "fastdeploy/core/fd_tensor.h"
#include "fastdeploy/utils/utils.h"

namespace fastdeploy {

struct OrtTensorDimensions : std::vector<int64_t> {
  OrtTensorDimensions(Ort::CustomOpApi ort, const OrtValue* value) {
    OrtTensorTypeAndShapeInfo* info = ort.GetTensorTypeAndShape(value);
    std::vector<int64_t>::operator=(ort.GetTensorShape(info));
    ort.ReleaseTensorTypeAndShapeInfo(info);
  }
};

template <class T>
bool SortScorePairDescend(const std::pair<float, T>& pair1,
                          const std::pair<float, T>& pair2) {
  return pair1.first > pair2.first;
}

void GetMaxScoreIndex(const float* scores, const int& score_size,
                      const float& threshold, const int& top_k,
                      std::vector<std::pair<float, int>>* sorted_indices) {
  for (int i = 0; i < score_size; ++i) {
    if (scores[i] > threshold) {
      sorted_indices->push_back(std::make_pair(scores[i], i));
    }
  }
  // Sort the score pairs according to the scores in descending order.
  std::stable_sort(sorted_indices->begin(), sorted_indices->end(),
                   SortScorePairDescend<int>);
  // Keep top_k scores if needed.
  if (top_k > -1 && top_k < static_cast<int>(sorted_indices->size())) {
    sorted_indices->resize(top_k);
  }
}

float BBoxArea(const float* box, const bool& normalized) {
  if (box[2] < box[0] || box[3] < box[1]) {
    // If coordinate values are invalid
    // (e.g. xmax < xmin or ymax < ymin), return 0.
    return 0.f;
  } else {
    const float w = box[2] - box[0];
    const float h = box[3] - box[1];
    if (normalized) {
      return w * h;
    } else {
      // If coordinate values are not within range [0, 1].
      return (w + 1) * (h + 1);
    }
  }
}

float JaccardOverlap(const float* box1, const float* box2,
                     const bool& normalized) {
  if (box2[0] > box1[2] || box2[2] < box1[0] || box2[1] > box1[3] ||
      box2[3] < box1[1]) {
    return 0.f;
  } else {
    const float inter_xmin = std::max(box1[0], box2[0]);
    const float inter_ymin = std::max(box1[1], box2[1]);
    const float inter_xmax = std::min(box1[2], box2[2]);
    const float inter_ymax = std::min(box1[3], box2[3]);
    float norm = normalized ? 0.0f : 1.0f;
    float inter_w = inter_xmax - inter_xmin + norm;
    float inter_h = inter_ymax - inter_ymin + norm;
    const float inter_area = inter_w * inter_h;
    const float bbox1_area = BBoxArea(box1, normalized);
    const float bbox2_area = BBoxArea(box2, normalized);
    return inter_area / (bbox1_area + bbox2_area - inter_area);
  }
}

void MultiClassNmsKernel::FastNMS(const float* boxes, const float* scores,
                                  const int& num_boxes,
                                  std::vector<int>* keep_indices) {
  std::vector<std::pair<float, int>> sorted_indices;
  GetMaxScoreIndex(scores, num_boxes, score_threshold, nms_top_k,
                   &sorted_indices);

  float adaptive_threshold = nms_threshold;
  while (sorted_indices.size() != 0) {
    const int idx = sorted_indices.front().second;
    bool keep = true;
    for (size_t k = 0; k < keep_indices->size(); ++k) {
      if (!keep) {
        break;
      }
      const int kept_idx = (*keep_indices)[k];
      float overlap =
          JaccardOverlap(boxes + idx * 4, boxes + kept_idx * 4, normalized);
      keep = overlap <= adaptive_threshold;
    }
    if (keep) {
      keep_indices->push_back(idx);
    }
    sorted_indices.erase(sorted_indices.begin());
    if (keep && nms_eta < 1.0 && adaptive_threshold > 0.5) {
      adaptive_threshold *= nms_eta;
    }
  }
}

int MultiClassNmsKernel::NMSForEachSample(
    const float* boxes, const float* scores, int num_boxes, int num_classes,
    std::map<int, std::vector<int>>* keep_indices) {
  for (int i = 0; i < num_classes; ++i) {
    if (i == background_label) {
      continue;
    }
    const float* score_for_class_i = scores + i * num_boxes;
    FastNMS(boxes, score_for_class_i, num_boxes, &((*keep_indices)[i]));
  }
  int num_det = 0;
  for (auto iter = keep_indices->begin(); iter != keep_indices->end(); ++iter) {
    num_det += iter->second.size();
  }

  if (keep_top_k > -1 && num_det > keep_top_k) {
    std::vector<std::pair<float, std::pair<int, int>>> score_index_pairs;
    for (const auto& it : *keep_indices) {
      int label = it.first;
      const float* current_score = scores + label * num_boxes;
      auto& label_indices = it.second;
      for (size_t j = 0; j < label_indices.size(); ++j) {
        int idx = label_indices[j];
        score_index_pairs.push_back(
            std::make_pair(current_score[idx], std::make_pair(label, idx)));
      }
    }
    std::stable_sort(score_index_pairs.begin(), score_index_pairs.end(),
                     SortScorePairDescend<std::pair<int, int>>);
    score_index_pairs.resize(keep_top_k);

    std::map<int, std::vector<int>> new_indices;
    for (size_t j = 0; j < score_index_pairs.size(); ++j) {
      int label = score_index_pairs[j].second.first;
      int idx = score_index_pairs[j].second.second;
      new_indices[label].push_back(idx);
    }
    new_indices.swap(*keep_indices);
    num_det = keep_top_k;
  }
  return num_det;
}

void MultiClassNmsKernel::Compute(OrtKernelContext* context) {
  const OrtValue* boxes = ort_.KernelContext_GetInput(context, 0);
  const OrtValue* scores = ort_.KernelContext_GetInput(context, 1);
  const float* boxes_data =
      reinterpret_cast<const float*>(ort_.GetTensorData<float>(boxes));
  const float* scores_data =
      reinterpret_cast<const float*>(ort_.GetTensorData<float>(scores));
  OrtTensorDimensions boxes_dim(ort_, boxes);
  OrtTensorDimensions scores_dim(ort_, scores);
  int score_size = scores_dim.size();

  int64_t batch_size = scores_dim[0];
  int64_t box_dim = boxes_dim[2];
  int64_t out_dim = box_dim + 2;

  int num_nmsed_out = 0;
  FDASSERT(score_size == 3, "Require rank of input scores be 3, but now it's " +
                                std::to_string(score_size) + ".");
  FDASSERT(boxes_dim[2] == 4,
           "Require the 3-dimension of input boxes be 4, but now it's " +
               std::to_string(boxes_dim[2]) + ".");
  std::vector<int64_t> out_num_rois_dims = {batch_size};
  OrtValue* out_num_rois = ort_.KernelContext_GetOutput(
      context, 2, out_num_rois_dims.data(), out_num_rois_dims.size());
  int32_t* out_num_rois_data = ort_.GetTensorMutableData<int32_t>(out_num_rois);

  std::vector<std::map<int, std::vector<int>>> all_indices;
  for (size_t i = 0; i < batch_size; ++i) {
    std::map<int, std::vector<int>> indices;  // indices kept for each class
    const float* current_boxes_ptr =
        boxes_data + i * boxes_dim[1] * boxes_dim[2];
    const float* current_scores_ptr =
        scores_data + i * scores_dim[1] * scores_dim[2];
    int num = NMSForEachSample(current_boxes_ptr, current_scores_ptr,
                               boxes_dim[1], scores_dim[1], &indices);
    num_nmsed_out += num;
    out_num_rois_data[i] = num;
    all_indices.emplace_back(indices);
  }
  std::vector<int64_t> out_box_dims = {num_nmsed_out, 6};
  std::vector<int64_t> out_index_dims = {num_nmsed_out, 1};
  OrtValue* out_box = ort_.KernelContext_GetOutput(
      context, 0, out_box_dims.data(), out_box_dims.size());
  OrtValue* out_index = ort_.KernelContext_GetOutput(
      context, 1, out_index_dims.data(), out_index_dims.size());
  if (num_nmsed_out == 0) {
    int32_t* out_num_rois_data =
        ort_.GetTensorMutableData<int32_t>(out_num_rois);
    for (size_t i = 0; i < batch_size; ++i) {
      out_num_rois_data[i] = 0;
    }
    return;
  }
  float* out_box_data = ort_.GetTensorMutableData<float>(out_box);
  int32_t* out_index_data = ort_.GetTensorMutableData<int32_t>(out_index);

  int count = 0;
  for (size_t i = 0; i < batch_size; ++i) {
    const float* current_boxes_ptr =
        boxes_data + i * boxes_dim[1] * boxes_dim[2];
    const float* current_scores_ptr =
        scores_data + i * scores_dim[1] * scores_dim[2];
    for (const auto& it : all_indices[i]) {
      int label = it.first;
      const auto& indices = it.second;
      const float* current_scores_class_ptr =
          current_scores_ptr + label * scores_dim[2];
      for (size_t j = 0; j < indices.size(); ++j) {
        int start = count * 6;
        out_box_data[start] = label;
        out_box_data[start + 1] = current_scores_class_ptr[indices[j]];
        out_box_data[start + 2] = current_boxes_ptr[indices[j] * 4];
        out_box_data[start + 3] = current_boxes_ptr[indices[j] * 4 + 1];
        out_box_data[start + 4] = current_boxes_ptr[indices[j] * 4 + 2];
        out_box_data[start + 5] = current_boxes_ptr[indices[j] * 4 + 3];
        out_index_data[count] = i * boxes_dim[1] + indices[j];
        count += 1;
      }
    }
  }
}

void MultiClassNmsKernel::GetAttribute(const OrtKernelInfo* info) {
  background_label =
      ort_.KernelInfoGetAttribute<int64_t>(info, "background_label");
  keep_top_k = ort_.KernelInfoGetAttribute<int64_t>(info, "keep_top_k");
  nms_eta = ort_.KernelInfoGetAttribute<float>(info, "nms_eta");
  nms_threshold = ort_.KernelInfoGetAttribute<float>(info, "nms_threshold");
  nms_top_k = ort_.KernelInfoGetAttribute<int64_t>(info, "nms_top_k");
  normalized = ort_.KernelInfoGetAttribute<int64_t>(info, "normalized");
  score_threshold = ort_.KernelInfoGetAttribute<float>(info, "score_threshold");
  std::cout << background_label << " " << keep_top_k << " " << nms_eta << " "
            << nms_threshold << " " << nms_top_k << " " << normalized << " "
            << score_threshold << " " << std::endl;
}

}  // namespace fastdeploy
76  fastdeploy/backends/ort/ops/multiclass_nms.h (new file)
@@ -0,0 +1,76 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#pragma once

#include <map>
#include <vector>

#include "onnxruntime_cxx_api.h"  // NOLINT

namespace fastdeploy {

struct MultiClassNmsKernel {
 protected:
  int64_t background_label = -1;
  int64_t keep_top_k = -1;
  float nms_eta;
  float nms_threshold = 0.7;
  int64_t nms_top_k;
  bool normalized;
  float score_threshold;
  Ort::CustomOpApi ort_;

 public:
  MultiClassNmsKernel(Ort::CustomOpApi ort, const OrtKernelInfo* info)
      : ort_(ort) {
    GetAttribute(info);
  }

  void GetAttribute(const OrtKernelInfo* info);

  void Compute(OrtKernelContext* context);
  void FastNMS(const float* boxes, const float* scores, const int& num_boxes,
               std::vector<int>* keep_indices);
  int NMSForEachSample(const float* boxes, const float* scores, int num_boxes,
                       int num_classes,
                       std::map<int, std::vector<int>>* keep_indices);
};

struct MultiClassNmsOp
    : Ort::CustomOpBase<MultiClassNmsOp, MultiClassNmsKernel> {
  void* CreateKernel(Ort::CustomOpApi api, const OrtKernelInfo* info) const {
    return new MultiClassNmsKernel(api, info);
  }

  const char* GetName() const { return "MultiClassNMS"; }

  size_t GetInputTypeCount() const { return 2; }

  ONNXTensorElementDataType GetInputType(size_t index) const {
    return ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT;
  }

  size_t GetOutputTypeCount() const { return 3; }

  ONNXTensorElementDataType GetOutputType(size_t index) const {
    if (index == 0) {
      return ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT;
    }
    return ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32;
  }

  const char* GetExecutionProviderType() const {
    return "CPUExecutionProvider";
  }
};

}  // namespace fastdeploy
@@ -13,15 +13,19 @@
 // limitations under the License.

 #include "fastdeploy/backends/ort/ort_backend.h"
+#include <memory>
+#include "fastdeploy/backends/ort/ops/multiclass_nms.h"
 #include "fastdeploy/backends/ort/utils.h"
 #include "fastdeploy/utils/utils.h"
-#include <memory>
 #ifdef ENABLE_PADDLE_FRONTEND
 #include "paddle2onnx/converter.h"
 #endif

 namespace fastdeploy {

+std::vector<OrtCustomOp*> OrtBackend::custom_operators_ =
+    std::vector<OrtCustomOp*>();
+
 ONNXTensorElementDataType GetOrtDtype(FDDataType fd_dtype) {
   if (fd_dtype == FDDataType::FP32) {
     return ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT;
@@ -131,7 +135,9 @@ bool OrtBackend::InitFromOnnx(const std::string& model_file,
             << std::endl;
     return false;
   }

+  BuildOption(option);
+  InitCustomOperators();
   if (from_memory_buffer) {
     session_ = {env_, model_file.data(), model_file.size(), session_options_};
   } else {
@@ -275,4 +281,15 @@ TensorInfo OrtBackend::GetOutputInfo(int index) {
   return info;
 }

+void OrtBackend::InitCustomOperators() {
+  if (custom_operators_.size() == 0) {
+    MultiClassNmsOp* custom_op = new MultiClassNmsOp{};
+    custom_operators_.push_back(custom_op);
+  }
+  for (size_t i = 0; i < custom_operators_.size(); ++i) {
+    custom_op_domain_.Add(custom_operators_[i]);
+  }
+  session_options_.Add(custom_op_domain_);
+}
+
 }  // namespace fastdeploy
@@ -68,6 +68,8 @@ class OrtBackend : public BaseBackend {

   TensorInfo GetInputInfo(int index);
   TensorInfo GetOutputInfo(int index);
+  static std::vector<OrtCustomOp*> custom_operators_;
+  void InitCustomOperators();

  private:
   Ort::Env env_;
@@ -76,9 +78,8 @@ class OrtBackend : public BaseBackend {
   std::shared_ptr<Ort::IoBinding> binding_;
   std::vector<OrtValueInfo> inputs_desc_;
   std::vector<OrtValueInfo> outputs_desc_;

+  Ort::CustomOpDomain custom_op_domain_ = Ort::CustomOpDomain("Paddle");
   OrtBackendOption option_;

   void CopyToCpu(const Ort::Value& value, FDTensor* tensor);
 };
 }  // namespace fastdeploy
@@ -17,6 +17,7 @@
 namespace fastdeploy {
 void ShareTensorFromCpu(paddle_infer::Tensor* tensor, FDTensor& fd_tensor) {
   std::vector<int> shape(fd_tensor.shape.begin(), fd_tensor.shape.end());
   tensor->Reshape(shape);
   if (fd_tensor.dtype == FDDataType::FP32) {
     tensor->ShareExternalData(static_cast<const float*>(fd_tensor.Data()),
                               shape, paddle_infer::PlaceType::kCPU);
@@ -18,7 +18,7 @@ namespace fastdeploy {

 bool FastDeployModel::InitRuntime() {
   FDASSERT(
-      ModelFormatCheck(runtime_option.model_file, runtime_option.model_format),
+      CheckModelFormat(runtime_option.model_file, runtime_option.model_format),
       "ModelFormatCheck Failed.");
   if (runtime_initialized_) {
     FDERROR << "The model is already initialized, cannot be initliazed again."
@@ -72,7 +72,7 @@ std::string Str(const Frontend& f) {
   return "UNKNOWN-Frontend";
 }

-bool ModelFormatCheck(const std::string& model_file,
+bool CheckModelFormat(const std::string& model_file,
                       const Frontend& model_format) {
   if (model_format == Frontend::PADDLE) {
     if (model_file.size() < 8 ||
@@ -99,8 +99,28 @@ bool ModelFormatCheck(const std::string& model_file,
   return true;
 }

+Frontend GuessModelFormat(const std::string& model_file) {
+  if (model_file.size() > 8 &&
+      model_file.substr(model_file.size() - 8, 8) == ".pdmodel") {
+    FDLogger() << "Model Format: PaddlePaddle." << std::endl;
+    return Frontend::PADDLE;
+  } else if (model_file.size() > 5 &&
+             model_file.substr(model_file.size() - 5, 5) == ".onnx") {
+    FDLogger() << "Model Format: ONNX." << std::endl;
+    return Frontend::ONNX;
+  }
+
+  FDERROR << "Cannot guess which model format you are using, please set "
+             "RuntimeOption::model_format manually."
+          << std::endl;
+  return Frontend::PADDLE;
+}
+
 bool Runtime::Init(const RuntimeOption& _option) {
   option = _option;
+  if (option.model_format == Frontend::AUTOREC) {
+    option.model_format = GuessModelFormat(_option.model_file);
+  }
   if (option.backend == Backend::UNKNOWN) {
     if (IsBackendAvailable(Backend::ORT)) {
       option.backend = Backend::ORT;
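The new `GuessModelFormat` resolves `Frontend::AUTOREC` purely from the model file's suffix. The same decision logic, sketched in Python for illustration (string names stand in for the `Frontend` enum values):

```python
def guess_model_format(model_file: str) -> str:
    # Python rendering of the C++ GuessModelFormat above: decide the
    # Frontend from the model file's suffix, falling back to PADDLE
    # (the C++ version logs an FDERROR on the fallback path).
    if len(model_file) > 8 and model_file.endswith(".pdmodel"):
        return "PADDLE"
    if len(model_file) > 5 and model_file.endswith(".onnx"):
        return "ONNX"
    return "PADDLE"

formats = [guess_model_format(f)
           for f in ("model.pdmodel", "model.onnx", "model.bin")]
```

The length guards match the C++ `model_file.size() > 8` / `> 5` checks, so a bare `".onnx"` or `".pdmodel"` filename with no stem still falls through to the default.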
@@ -124,6 +144,9 @@ bool Runtime::Init(const RuntimeOption& _option) {
   } else if (option.backend == Backend::PDINFER) {
     FDASSERT(option.device == Device::CPU || option.device == Device::GPU,
              "Backend::TRT only supports Device::CPU/Device::GPU.");
+    FDASSERT(
+        option.model_format == Frontend::PADDLE,
+        "Backend::PDINFER only supports model format of Frontend::PADDLE.");
     CreatePaddleBackend();
   } else {
     FDERROR << "Runtime only support "
@@ -163,8 +186,8 @@ void Runtime::CreatePaddleBackend() {
          "Load model from Paddle failed while initliazing PaddleBackend.");
 #else
   FDASSERT(false,
-           "OrtBackend is not available, please compiled with "
-           "ENABLE_ORT_BACKEND=ON.");
+           "PaddleBackend is not available, please compiled with "
+           "ENABLE_PADDLE_BACKEND=ON.");
 #endif
 }
@@ -21,7 +21,9 @@
 namespace fastdeploy {

 enum FASTDEPLOY_DECL Backend { UNKNOWN, ORT, TRT, PDINFER };
-enum FASTDEPLOY_DECL Frontend { PADDLE, ONNX };
+// AUTOREC will according to the name of model file
+// to decide which Frontend is
+enum FASTDEPLOY_DECL Frontend { AUTOREC, PADDLE, ONNX };

 FASTDEPLOY_DECL std::string Str(const Backend& b);
 FASTDEPLOY_DECL std::string Str(const Frontend& f);
@@ -29,8 +31,9 @@ FASTDEPLOY_DECL std::vector<Backend> GetAvailableBackends();

 FASTDEPLOY_DECL bool IsBackendAvailable(const Backend& backend);

-bool ModelFormatCheck(const std::string& model_file,
+bool CheckModelFormat(const std::string& model_file,
                       const Frontend& model_format);
+Frontend GuessModelFormat(const std::string& model_file);

 struct FASTDEPLOY_DECL RuntimeOption {
   Backend backend = Backend::UNKNOWN;
@@ -71,7 +74,7 @@ struct FASTDEPLOY_DECL RuntimeOption {

   std::string model_file = "";   // Path of model file
   std::string params_file = "";  // Path of parameters file, can be empty
-  Frontend model_format = Frontend::PADDLE;   // format of input model
+  Frontend model_format = Frontend::AUTOREC;  // format of input model
 };

 struct FASTDEPLOY_DECL Runtime {
setup.py (90 lines changed)
@@ -126,6 +126,15 @@ class ONNXCommand(setuptools.Command):
         pass


+def GetAllFiles(dirname):
+    files = list()
+    for root, dirs, filenames in os.walk(dirname):
+        for f in filenames:
+            fullname = os.path.join(root, f)
+            files.append(fullname)
+    return files
+
+
 class create_version(ONNXCommand):
     def run(self):
         with open(os.path.join(SRC_DIR, 'version.py'), 'w') as f:
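The new `GetAllFiles` recursively collects every file under a directory; setup.py later runs the results through `os.path.relpath` to build `package_data` entries. A self-contained demonstration of that walk (the temporary layout below is made up for the demo):

```python
import os
import tempfile

def get_all_files(dirname):
    # Same os.walk traversal as setup.py's GetAllFiles above.
    files = []
    for root, _dirs, filenames in os.walk(dirname):
        for f in filenames:
            files.append(os.path.join(root, f))
    return files

# Demonstrate on a throwaway tree: nested files are all picked up.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "libs", "third_libs"))
    for name in (os.path.join("libs", "a.so"),
                 os.path.join("libs", "third_libs", "b.so")):
        open(os.path.join(tmp, name), "w").close()
    found = sorted(os.path.relpath(f, tmp) for f in get_all_files(tmp))
```

Unlike the `os.listdir`-based loop it replaces, the walk descends into `third_libs` subdirectories, which is why the copied third-party libraries end up in the wheel's `package_data`.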
@@ -326,50 +335,49 @@ if sys.argv[1] == "install" or sys.argv[1] == "bdist_wheel":
     shutil.copy("LICENSE", "fastdeploy")
     depend_libs = list()

-    if platform.system().lower() == "linux":
-        for f in os.listdir(".setuptools-cmake-build"):
-            full_name = os.path.join(".setuptools-cmake-build", f)
-            if not os.path.isfile(full_name):
-                continue
-            if not full_name.count("fastdeploy_main.cpython-"):
-                continue
-            if not full_name.endswith(".so"):
-                continue
-            # modify the search path of libraries
-            command = "patchelf --set-rpath '$ORIGIN/libs/' {}".format(
-                full_name)
-            # The sw_64 not suppot patchelf, so we just disable that.
-            if platform.machine() != 'sw_64' and platform.machine(
-            ) != 'mips64':
-                assert os.system(
-                    command
-                ) == 0, "patch fastdeploy_main.cpython-36m-x86_64-linux-gnu.so failed, the command: {}".format(
-                    command)
-
     # copy fastdeploy library
     pybind_so_file = None
     for f in os.listdir(".setuptools-cmake-build"):
         if not os.path.isfile(os.path.join(".setuptools-cmake-build", f)):
             continue
-        if f.count("libfastdeploy") > 0:
+        if f.count("fastdeploy") > 0:
             shutil.copy(
                 os.path.join(".setuptools-cmake-build", f), "fastdeploy/libs")
-    for dirname in os.listdir(".setuptools-cmake-build/third_libs/install"):
-        for lib in os.listdir(
-                os.path.join(".setuptools-cmake-build/third_libs/install",
-                             dirname, "lib")):
-            if lib.count(".so") == 0 and lib.count(
-                    ".dylib") == 0 and lib.count(".a") == 0:
-                continue
-            if not os.path.isfile(
-                    os.path.join(".setuptools-cmake-build/third_libs/install",
-                                 dirname, "lib", lib)):
-                continue
-            shutil.copy(
-                os.path.join(".setuptools-cmake-build/third_libs/install",
-                             dirname, "lib", lib), "fastdeploy/libs")
         if f.count("fastdeploy_main.cpython-"):
             pybind_so_file = f

-    all_libs = os.listdir("fastdeploy/libs")
-    for lib in all_libs:
-        package_data[PACKAGE_NAME].append(os.path.join("libs", lib))
+    if not os.path.exists(".setuptools-cmake-build/third_libs/install"):
+        raise Exception(
+            "Cannot find directory third_libs/install in .setuptools-cmake-build."
+        )
+
+    if os.path.exists("fastdeploy/libs/third_libs"):
+        shutil.rmtree("fastdeploy/libs/third_libs")
+    shutil.copytree(
+        ".setuptools-cmake-build/third_libs/install",
+        "fastdeploy/libs/third_libs",
+        symlinks=True)
+
+    all_files = GetAllFiles("fastdeploy/libs")
+    for f in all_files:
+        package_data[PACKAGE_NAME].append(os.path.relpath(f, "fastdeploy"))
+
+    if platform.system().lower() == "linux":
+        rpaths = ["${ORIGIN}"]
+        for root, dirs, files in os.walk("fastdeploy/libs/third_libs"):
+            for d in dirs:
+                if d == "lib":
+                    path = os.path.relpath(
+                        os.path.join(root, d), "fastdeploy/libs")
+                    rpaths.append("${ORIGIN}/" + format(path))
+        rpaths = ":".join(rpaths)
+        command = "patchelf --set-rpath '{}' ".format(rpaths) + os.path.join(
+            "fastdeploy/libs", pybind_so_file)
+        # The sw_64 not suppot patchelf, so we just disable that.
+        if platform.machine() != 'sw_64' and platform.machine() != 'mips64':
+            assert os.system(
+                command) == 0, "patchelf {} failed, the command: {}".format(
+                    command, pybind_so_file)

 setuptools.setup(
     name=PACKAGE_NAME,
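The rewritten Linux branch sets the extension module's RPATH to `$ORIGIN` plus every bundled third-party `lib` directory, so the `.so` resolves its dependencies from inside the wheel at import time. A sketch of the same rpath-string construction (the `onnxruntime` directory below is a made-up example layout):

```python
import os
import tempfile

def build_rpaths(libs_dir):
    # Mirrors setup.py above: "$ORIGIN" itself, plus every third-party
    # "lib" directory expressed relative to the package's libs/ folder,
    # joined with ":" for `patchelf --set-rpath`.
    rpaths = ["${ORIGIN}"]
    for root, dirs, _files in os.walk(os.path.join(libs_dir, "third_libs")):
        for d in dirs:
            if d == "lib":
                rel = os.path.relpath(os.path.join(root, d), libs_dir)
                rpaths.append("${ORIGIN}/" + rel)
    return ":".join(rpaths)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <libs>/third_libs/onnxruntime/lib
    os.makedirs(os.path.join(tmp, "third_libs", "onnxruntime", "lib"))
    rpath = build_rpaths(tmp)
```

Using `$ORIGIN`-relative entries keeps the wheel relocatable: the dynamic loader resolves each entry relative to wherever the extension module is installed, instead of an absolute build-machine path.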
@@ -382,9 +390,9 @@ setuptools.setup(
     include_package_data=True,
     setup_requires=setup_requires,
     extras_require=extras_require,
-    author='paddle-infer',
-    author_email='paddle-infer@baidu.com',
-    url='https://github.com/PaddlePaddle/Paddle2ONNX.git',
+    author='fastdeploy',
+    author_email='fastdeploy@baidu.com',
+    url='https://github.com/PaddlePaddle/FastDeploy.git',
     install_requires=REQUIRED_PACKAGES,
     classifiers=[
         "Programming Language :: Python :: 3",