[Model] Support InsightFace model inferencing on RKNPU (#1113)

* Update cross-compilation

* Update cross-compilation

* Update cross-compilation

* Update cross-compilation

* Update cross-compilation

* Update cross-compilation

* Update cross-compilation

* Update cross-compilation

* Update cross-compilation

* Update issues.md

* Update fastdeploy_init.sh

* Update cross-compilation

* Update RKNPU2 support for the InsightFace model series

* Update RKNPU2 support for the InsightFace model series

* Update the documentation

* Update InsightFace

* Try to fix the pybind issue

Co-authored-by: Jason <928090362@qq.com>
Co-authored-by: Jason <jiangjiajun@baidu.com>
Zheng-Bicheng
2023-01-14 20:40:33 +08:00
committed by GitHub
parent f88c662449
commit 1dabfdf3f1
21 changed files with 712 additions and 147 deletions


@@ -14,7 +14,7 @@ An ONNX model cannot run on the NPU of an RK chip directly; the ONNX model must be
* All NPU results are measured with a single core
| Task | Model | Model version (tested versions) | ARM CPU/RKNN latency (ms) |
|----------------------|------------------------------------------------------------------------------------------|--------------------------|--------------------|
| Detection | [Picodet](../../../../examples/vision/detection/paddledetection/rknpu2/README.md) | Picodet-s | 162/112 |
| Detection | [RKYOLOV5](../../../../examples/vision/detection/rkyolo/README.md) | YOLOV5-S-Relu(int8) | -/57 |
| Detection | [RKYOLOX](../../../../examples/vision/detection/rkyolo/README.md) | - | -/- |
@@ -23,4 +23,12 @@ An ONNX model cannot run on the NPU of an RK chip directly; the ONNX model must be
| Segmentation | [PP-HumanSegV2Lite](../../../../examples/vision/segmentation/paddleseg/rknpu2/README.md) | portrait(int8) | 133/43 |
| Segmentation | [PP-HumanSegV2Lite](../../../../examples/vision/segmentation/paddleseg/rknpu2/README.md) | human(int8) | 133/43 |
| Face Detection | [SCRFD](../../../../examples/vision/facedet/scrfd/rknpu2/README.md) | SCRFD-2.5G-kps-640(int8) | 108/42 |
+| Face Recognition | [InsightFace](../../../../examples/vision/faceid/insightface/rknpu2/README_CN.md) | ms1mv3_arcface_r18(int8) | 81/12 |
| Classification | [ResNet](../../../../examples/vision/classification/paddleclas/rknpu2/README.md) | ResNet50_vd | -/33 |
+## Prebuilt library download
+To make development easier, prebuilt FastDeploy 1.0.2 packages are provided here:
+- [FastDeploy RK356X C++ SDK](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-aarch64-rk356X-1.0.2.tgz)
+- [FastDeploy RK3588 C++ SDK](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-aarch64-rk3588-1.0.2.tgz)


@@ -101,7 +101,7 @@ VPL model loading and initialization, where model_file is the exported ONNX model.
#### Predict function
> ```c++
-> ArcFace::Predict(cv::Mat* im, FaceRecognitionResult* result)
+> ArcFace::Predict(const cv::Mat& im, FaceRecognitionResult* result)
> ```
>
> Prediction interface: takes an image and directly returns the recognition result.
@@ -121,8 +121,6 @@ VPL model loading and initialization, where model_file is the exported ONNX model.
modified via InsightFaceRecognitionPreprocessor::SetAlpha(std::vector<float>& alpha)
> > * **beta**(vector&lt;float&gt;): beta values for preprocessing normalization, computed as `x' = x * alpha + beta`; beta defaults to [-1.f, -1.f, -1.f],
modified via InsightFaceRecognitionPreprocessor::SetBeta(std::vector<float>& beta)
-> > * **permute**(bool): whether preprocessing converts BGR to RGB; defaults to true,
-modified via InsightFaceRecognitionPreprocessor::SetPermute(bool permute)
#### InsightFaceRecognitionPostprocessor member variables (postprocessing parameters)
> > * **l2_normalize**(bool): whether to apply L2 normalization before outputting the face embedding; defaults to false,


@@ -100,7 +100,6 @@ ArcFace model loading and initialization, where model_file is the exported ONNX model
> > * **size**(list[int]): resize target used during preprocessing, two integers as [width, height]; defaults to [112, 112]
> > * **alpha**(list[float]): alpha values for preprocessing normalization, computed as `x' = x * alpha + beta`; alpha defaults to [1. / 127.5, 1.f / 127.5, 1. / 127.5]
> > * **beta**(list[float]): beta values for preprocessing normalization, computed as `x' = x * alpha + beta`; beta defaults to [-1.f, -1.f, -1.f]
-> > * **swap_rb**(bool): whether preprocessing converts BGR to RGB; defaults to True
#### AdaFacePostprocessor member variables
The following are member variables of AdaFacePostprocessor


@@ -3,7 +3,6 @@ import cv2
import numpy as np
-# cosine similarity
def cosine_similarity(a, b):
    a = np.array(a)
    b = np.array(b)
@@ -56,24 +55,17 @@ def build_option(args):
args = parse_arguments()
-# configure the runtime and load the model
runtime_option = build_option(args)
model = fd.vision.faceid.ArcFace(args.model, runtime_option=runtime_option)
-# load the images
face0 = cv2.imread(args.face)  # 0 and 1 are the same person
face1 = cv2.imread(args.face_positive)
face2 = cv2.imread(args.face_negative)  # 0 and 2 are different people
-# enable l2 normalize
-model.postprocessor.l2_normalize = True
-# run prediction
result0 = model.predict(face0)
result1 = model.predict(face1)
result2 = model.predict(face2)
-# compute cosine similarity
embedding0 = result0.embedding
embedding1 = result1.embedding
embedding2 = result2.embedding
@@ -81,7 +73,6 @@ embedding2 = result2.embedding
cosine01 = cosine_similarity(embedding0, embedding1)
cosine02 = cosine_similarity(embedding0, embedding2)
-# print results
print(result0, end="")
print(result1, end="")
print(result2, end="")


@@ -3,7 +3,6 @@ import cv2
import numpy as np
-# cosine similarity
def cosine_similarity(a, b):
    a = np.array(a)
    b = np.array(b)
@@ -56,24 +55,17 @@ def build_option(args):
args = parse_arguments()
-# configure the runtime and load the model
runtime_option = build_option(args)
model = fd.vision.faceid.CosFace(args.model, runtime_option=runtime_option)
-# load the images
-face0 = cv2.imread(args.face)  # 0 and 1 are the same person
+face0 = cv2.imread(args.face)
face1 = cv2.imread(args.face_positive)
-face2 = cv2.imread(args.face_negative)  # 0 and 2 are different people
+face2 = cv2.imread(args.face_negative)
-# enable l2 normalize
-model.postprocessor.l2_normalize = True
-# run prediction
result0 = model.predict(face0)
result1 = model.predict(face1)
result2 = model.predict(face2)
-# compute cosine similarity
embedding0 = result0.embedding
embedding1 = result1.embedding
embedding2 = result2.embedding
@@ -81,7 +73,6 @@ embedding2 = result2.embedding
cosine01 = cosine_similarity(embedding0, embedding1)
cosine02 = cosine_similarity(embedding0, embedding2)
-# print results
print(result0, end="")
print(result1, end="")
print(result2, end="")


@@ -3,7 +3,6 @@ import cv2
import numpy as np
-# cosine similarity
def cosine_similarity(a, b):
    a = np.array(a)
    b = np.array(b)
@@ -56,24 +55,18 @@ def build_option(args):
args = parse_arguments()
-# configure the runtime and load the model
runtime_option = build_option(args)
model = fd.vision.faceid.PartialFC(args.model, runtime_option=runtime_option)
# load the images
-face0 = cv2.imread(args.face)  # 0 and 1 are the same person
+face0 = cv2.imread(args.face)
face1 = cv2.imread(args.face_positive)
-face2 = cv2.imread(args.face_negative)  # 0 and 2 are different people
+face2 = cv2.imread(args.face_negative)
-# enable l2 normalize
-model.postprocessor.l2_normalize = True
-# run prediction
result0 = model.predict(face0)
result1 = model.predict(face1)
result2 = model.predict(face2)
-# compute cosine similarity
embedding0 = result0.embedding
embedding1 = result1.embedding
embedding2 = result2.embedding
@@ -81,7 +74,6 @@ embedding2 = result2.embedding
cosine01 = cosine_similarity(embedding0, embedding1)
cosine02 = cosine_similarity(embedding0, embedding2)
-# print results
print(result0, end="")
print(result1, end="")
print(result2, end="")


@@ -3,7 +3,6 @@ import cv2
import numpy as np
-# cosine similarity
def cosine_similarity(a, b):
    a = np.array(a)
    b = np.array(b)
@@ -56,24 +55,17 @@ def build_option(args):
args = parse_arguments()
-# configure the runtime and load the model
runtime_option = build_option(args)
model = fd.vision.faceid.VPL(args.model, runtime_option=runtime_option)
-# load the images
face0 = cv2.imread(args.face)  # 0 and 1 are the same person
face1 = cv2.imread(args.face_positive)
face2 = cv2.imread(args.face_negative)  # 0 and 2 are different people
-# enable l2 normalize
-model.postprocessor.l2_normalize = True
-# run prediction
result0 = model.predict(face0)
result1 = model.predict(face1)
result2 = model.predict(face2)
-# compute cosine similarity
embedding0 = result0.embedding
embedding1 = result1.embedding
embedding2 = result2.embedding
@@ -81,7 +73,6 @@ embedding2 = result2.embedding
cosine01 = cosine_similarity(embedding0, embedding1)
cosine02 = cosine_similarity(embedding0, embedding2)
-# print results
print(result0, end="")
print(result1, end="")
print(result2, end="")


@@ -0,0 +1,54 @@
[English](README.md) | Simplified Chinese
# Preparing InsightFace deployment models for RKNPU
This tutorial describes how to deploy InsightFace models in the RKNPU2 environment, as well as where to download the ONNX models; see the [model introduction](../README.md).
## Supported models
FastDeploy currently supports deploying the following models:
- ArcFace
- CosFace
- PartialFC
- VPL
## Download pretrained ONNX models
For convenience, the exported InsightFace models are provided below for direct download. The accuracy numbers come from the official repository; the metrics are taken from the model descriptions in InsightFace, so refer to InsightFace for details.
| Model | Size | Accuracy (AgeDB_30) |
|:-------------------------------------------------------------------------------------------|:------|:--------------|
| [CosFace-r18](https://bj.bcebos.com/paddlehub/fastdeploy/glint360k_cosface_r18.onnx) | 92MB | 97.7 |
| [CosFace-r34](https://bj.bcebos.com/paddlehub/fastdeploy/glint360k_cosface_r34.onnx) | 131MB | 98.3 |
| [CosFace-r50](https://bj.bcebos.com/paddlehub/fastdeploy/glint360k_cosface_r50.onnx) | 167MB | 98.3 |
| [CosFace-r100](https://bj.bcebos.com/paddlehub/fastdeploy/glint360k_cosface_r100.onnx) | 249MB | 98.4 |
| [ArcFace-r18](https://bj.bcebos.com/paddlehub/fastdeploy/ms1mv3_arcface_r18.onnx) | 92MB | 97.7 |
| [ArcFace-r34](https://bj.bcebos.com/paddlehub/fastdeploy/ms1mv3_arcface_r34.onnx) | 131MB | 98.1 |
| [ArcFace-r50](https://bj.bcebos.com/paddlehub/fastdeploy/ms1mv3_arcface_r50.onnx) | 167MB | - |
| [ArcFace-r100](https://bj.bcebos.com/paddlehub/fastdeploy/ms1mv3_arcface_r100.onnx) | 249MB | 98.4 |
| [ArcFace-r100_lr0.1](https://bj.bcebos.com/paddlehub/fastdeploy/ms1mv3_r100_lr01.onnx) | 249MB | 98.4 |
| [PartialFC-r34](https://bj.bcebos.com/paddlehub/fastdeploy/partial_fc_glint360k_r50.onnx) | 167MB | - |
| [PartialFC-r50](https://bj.bcebos.com/paddlehub/fastdeploy/partial_fc_glint360k_r100.onnx) | 249MB | - |
## Converting to an RKNPU model
```bash
wget https://bj.bcebos.com/paddlehub/fastdeploy/ms1mv3_arcface_r18.onnx
python -m paddle2onnx.optimize --input_model ./ms1mv3_arcface_r18/ms1mv3_arcface_r18.onnx \
                               --output_model ./ms1mv3_arcface_r18/ms1mv3_arcface_r18.onnx \
                               --input_shape_dict "{'data':[1,3,112,112]}"
python /Path/To/FastDeploy/tools/rknpu2/export.py \
       --config_path tools/rknpu2/config/arcface_unquantized.yaml \
       --target_platform rk3588
```
## Detailed deployment documentation
- [Python deployment](python)
- [C++ deployment](cpp)
## Release notes
- This document and the code were written against [InsightFace CommitID:babb9a5](https://github.com/deepinsight/insightface/commit/babb9a5)
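Once the model has been converted, the resulting RKNN file can be loaded through the Python API in the same way the C++ example later in this commit does. A minimal sketch (not part of this PR; the `.rknn` file name is a placeholder for whatever `export.py` actually produces, and `fd.ModelFormat.RKNN` is assumed to mirror the C++ `ModelFormat::RKNN`):

```python
import cv2
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_rknpu2()  # run on the RK NPU instead of the ARM CPU

# Hypothetical output name from the export step above; adjust to your file.
model = fd.vision.faceid.ArcFace(
    "./ms1mv3_arcface_r18/ms1mv3_arcface_r18_rk3588.rknn",
    runtime_option=option,
    model_format=fd.ModelFormat.RKNN)

# The converted model already applies the mean/std from the export config,
# so the host-side normalize/permute steps are switched off.
model.preprocessor.disable_normalize()
model.preprocessor.disable_permute()

result = model.predict(cv2.imread("face_0.jpg"))
print(result)
```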


@@ -0,0 +1,11 @@
PROJECT(infer_demo C CXX)
CMAKE_MINIMUM_REQUIRED (VERSION 3.10)
option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")
include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)
include_directories(${FASTDEPLOY_INCS})
add_executable(infer_arcface_demo ${PROJECT_SOURCE_DIR}/infer_arcface.cc)
target_link_libraries(infer_arcface_demo ${FASTDEPLOY_LIBS})


@@ -0,0 +1,136 @@
[English](README.md) | Simplified Chinese
# InsightFace C++ deployment example
FastDeploy supports deploying InsightFace models, including ArcFace/CosFace/VPL/PartialFC, on RKNPU.
This directory provides `infer_arcface.cc`, a quick example of deploying InsightFace models (ArcFace as the example) on CPU/RKNPU.
Before deployment, confirm the following two steps:
1. The hardware and software environment meets the requirements
2. Download the prebuilt deployment library for your environment, or build the FastDeploy repository from source
For both steps, refer to [building the RKNPU2 deployment library](../../../../../../docs/cn/build_and_install/rknpu2.md)
Run the following commands in this directory to build and test:
```bash
mkdir build
cd build
# FastDeploy version needs to be >= 1.0.3
wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-x.x.x.tgz
tar xvf fastdeploy-linux-x64-x.x.x.tgz
cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-x.x.x
make -j
# Download the officially converted ArcFace model files and test images
wget https://bj.bcebos.com/paddlehub/fastdeploy/ms1mv3_arcface_r18.onnx
wget https://bj.bcebos.com/paddlehub/fastdeploy/rknpu2/face_demo.zip
unzip face_demo.zip
# CPU inference
./infer_arcface_demo ms1mv3_arcface_r100.onnx face_0.jpg face_1.jpg face_2.jpg 0
# RKNPU inference
./infer_arcface_demo ms1mv3_arcface_r100.onnx face_0.jpg face_1.jpg face_2.jpg 1
```
After running, the visualized result is shown below:
<div width="700">
<img width="220" float="left" src="https://user-images.githubusercontent.com/67993288/184321537-860bf857-0101-4e92-a74c-48e8658d838c.JPG">
<img width="220" float="left" src="https://user-images.githubusercontent.com/67993288/184322004-a551e6e4-6f47-454e-95d6-f8ba2f47b516.JPG">
<img width="220" float="left" src="https://user-images.githubusercontent.com/67993288/184321622-d9a494c3-72f3-47f1-97c5-8a2372de491f.JPG">
</div>
The commands above only work on Linux or macOS. For how to use the SDK on Windows, refer to:
- [How to use the FastDeploy C++ SDK on Windows](../../../../../docs/cn/faq/use_sdk_on_windows.md)
## InsightFace C++ interfaces
### ArcFace class
```c++
fastdeploy::vision::faceid::ArcFace(
    const string& model_file,
    const string& params_file = "",
    const RuntimeOption& runtime_option = RuntimeOption(),
    const ModelFormat& model_format = ModelFormat::ONNX)
```
ArcFace model loading and initialization, where model_file is the exported ONNX model.
### CosFace class
```c++
fastdeploy::vision::faceid::CosFace(
    const string& model_file,
    const string& params_file = "",
    const RuntimeOption& runtime_option = RuntimeOption(),
    const ModelFormat& model_format = ModelFormat::ONNX)
```
CosFace model loading and initialization, where model_file is the exported ONNX model.
### PartialFC class
```c++
fastdeploy::vision::faceid::PartialFC(
    const string& model_file,
    const string& params_file = "",
    const RuntimeOption& runtime_option = RuntimeOption(),
    const ModelFormat& model_format = ModelFormat::ONNX)
```
PartialFC model loading and initialization, where model_file is the exported ONNX model.
### VPL class
```c++
fastdeploy::vision::faceid::VPL(
    const string& model_file,
    const string& params_file = "",
    const RuntimeOption& runtime_option = RuntimeOption(),
    const ModelFormat& model_format = ModelFormat::ONNX)
```
VPL model loading and initialization, where model_file is the exported ONNX model.
**Parameters**
> * **model_file**(str): path to the model file
> * **params_file**(str): path to the parameters file; pass an empty string when the model format is ONNX
> * **runtime_option**(RuntimeOption): backend inference configuration; None uses the default configuration
> * **model_format**(ModelFormat): model format; defaults to ONNX
#### Predict function
> ```c++
> ArcFace::Predict(const cv::Mat& im, FaceRecognitionResult* result)
> ```
>
> Prediction interface: takes an image and directly returns the recognition result.
>
> **Parameters**
>
> > * **im**: input image; note it must be in HWC, BGR format
> > * **result**: recognition result; see [vision model prediction results](../../../../../docs/api/vision_results/) for the description of FaceRecognitionResult
### Modifying preprocessing and postprocessing parameters
Preprocessing and postprocessing parameters are modified through the member variables of InsightFaceRecognitionPreprocessor and InsightFaceRecognitionPostprocessor.
#### InsightFaceRecognitionPreprocessor member variables (preprocessing parameters)
> > * **size**(vector&lt;int&gt;): resize target used during preprocessing, two integers as [width, height]; defaults to [112, 112],
modified via InsightFaceRecognitionPreprocessor::SetSize(std::vector<int>& size)
> > * **alpha**(vector&lt;float&gt;): alpha values for preprocessing normalization, computed as `x' = x * alpha + beta`; alpha defaults to [1. / 127.5, 1.f / 127.5, 1. / 127.5],
modified via InsightFaceRecognitionPreprocessor::SetAlpha(std::vector<float>& alpha)
> > * **beta**(vector&lt;float&gt;): beta values for preprocessing normalization, computed as `x' = x * alpha + beta`; beta defaults to [-1.f, -1.f, -1.f],
modified via InsightFaceRecognitionPreprocessor::SetBeta(std::vector<float>& beta)
#### InsightFaceRecognitionPostprocessor member variables (postprocessing parameters)
> > * **l2_normalize**(bool): whether to apply L2 normalization before outputting the face embedding; defaults to false,
modified via InsightFaceRecognitionPostprocessor::SetL2Normalize(bool& l2_normalize)
- [Model introduction](../../../)
- [Python deployment](../python)
- [Vision model prediction results](../../../../../../docs/api/vision_results/README.md)
- [How to switch the inference backend](../../../../../../docs/cn/faq/how_to_change_backend.md)


@@ -0,0 +1,123 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "fastdeploy/vision.h"
void CpuInfer(const std::string& model_file,
              const std::vector<std::string>& image_file) {
  auto model = fastdeploy::vision::faceid::ArcFace(model_file, "");

  cv::Mat face0 = cv::imread(image_file[0]);
  fastdeploy::vision::FaceRecognitionResult res0;
  if (!model.Predict(face0, &res0)) {
    std::cerr << "Prediction Failed." << std::endl;
    return;
  }

  cv::Mat face1 = cv::imread(image_file[1]);
  fastdeploy::vision::FaceRecognitionResult res1;
  if (!model.Predict(face1, &res1)) {
    std::cerr << "Prediction Failed." << std::endl;
    return;
  }

  cv::Mat face2 = cv::imread(image_file[2]);
  fastdeploy::vision::FaceRecognitionResult res2;
  if (!model.Predict(face2, &res2)) {
    std::cerr << "Prediction Failed." << std::endl;
    return;
  }

  std::cout << "Prediction Done!" << std::endl;
  std::cout << "--- [Face 0]:" << res0.Str();
  std::cout << "--- [Face 1]:" << res1.Str();
  std::cout << "--- [Face 2]:" << res2.Str();

  // Compare embeddings: faces 0 and 1 are the same person, faces 0 and 2 are not.
  float cosine01 = fastdeploy::vision::utils::CosineSimilarity(
      res0.embedding, res1.embedding,
      model.GetPostprocessor().GetL2Normalize());
  float cosine02 = fastdeploy::vision::utils::CosineSimilarity(
      res0.embedding, res2.embedding,
      model.GetPostprocessor().GetL2Normalize());
  std::cout << "Detect Done! Cosine 01: " << cosine01
            << ", Cosine 02:" << cosine02 << std::endl;
}

void RKNPUInfer(const std::string& model_file,
                const std::vector<std::string>& image_file) {
  std::string params_file;
  auto option = fastdeploy::RuntimeOption();
  option.UseRKNPU2();
  auto format = fastdeploy::ModelFormat::RKNN;
  auto model = fastdeploy::vision::faceid::ArcFace(model_file, params_file,
                                                   option, format);
  // Disable host-side normalize/permute; these steps are handled by the
  // converted RKNN model.
  model.GetPreprocessor().DisableNormalize();
  model.GetPreprocessor().DisablePermute();

  cv::Mat face0 = cv::imread(image_file[0]);
  fastdeploy::vision::FaceRecognitionResult res0;
  if (!model.Predict(face0, &res0)) {
    std::cerr << "Prediction Failed." << std::endl;
    return;
  }

  cv::Mat face1 = cv::imread(image_file[1]);
  fastdeploy::vision::FaceRecognitionResult res1;
  if (!model.Predict(face1, &res1)) {
    std::cerr << "Prediction Failed." << std::endl;
    return;
  }

  cv::Mat face2 = cv::imread(image_file[2]);
  fastdeploy::vision::FaceRecognitionResult res2;
  if (!model.Predict(face2, &res2)) {
    std::cerr << "Prediction Failed." << std::endl;
    return;
  }

  std::cout << "Prediction Done!" << std::endl;
  std::cout << "--- [Face 0]:" << res0.Str();
  std::cout << "--- [Face 1]:" << res1.Str();
  std::cout << "--- [Face 2]:" << res2.Str();

  float cosine01 = fastdeploy::vision::utils::CosineSimilarity(
      res0.embedding, res1.embedding,
      model.GetPostprocessor().GetL2Normalize());
  float cosine02 = fastdeploy::vision::utils::CosineSimilarity(
      res0.embedding, res2.embedding,
      model.GetPostprocessor().GetL2Normalize());
  std::cout << "Detect Done! Cosine 01: " << cosine01
            << ", Cosine 02:" << cosine02 << std::endl;
}

int main(int argc, char* argv[]) {
  if (argc < 6) {
    std::cout << "Usage: infer_demo path/to/model path/to/image run_option, "
                 "e.g ./infer_arcface_demo ms1mv3_arcface_r100.onnx "
                 "face_0.jpg face_1.jpg face_2.jpg 0"
              << std::endl;
    std::cout << "The data type of run_option is int, "
                 "0: run with cpu; 1: run with rknpu2."
              << std::endl;
    return -1;
  }

  std::vector<std::string> image_files = {argv[2], argv[3], argv[4]};
  if (std::atoi(argv[5]) == 0) {
    CpuInfer(argv[1], image_files);
  } else if (std::atoi(argv[5]) == 1) {
    RKNPUInfer(argv[1], image_files);
  }
  return 0;
}


@@ -0,0 +1,108 @@
[English](README.md) | Simplified Chinese
# InsightFace Python deployment example
FastDeploy supports deploying InsightFace models, including ArcFace/CosFace/VPL/PartialFC, on RKNPU.
This directory provides `infer_arcface.py`, a quick example of deploying InsightFace models (ArcFace as the example) on CPU/RKNPU.
Before deployment, confirm the following:
1. The hardware and software environment meets the requirements; see [FastDeploy environment requirements](../../../../../../docs/cn/build_and_install/rknpu2.md)
```bash
# Download the example code
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd examples/vision/faceid/insightface/python/
# Download the ArcFace model file and test images
wget https://bj.bcebos.com/paddlehub/fastdeploy/ms1mv3_arcface_r100.onnx
wget https://bj.bcebos.com/paddlehub/fastdeploy/rknpu2/face_demo.zip
unzip face_demo.zip
# CPU inference
python infer_arcface.py --model ms1mv3_arcface_r100.onnx \
                        --face face_0.jpg \
                        --face_positive face_1.jpg \
                        --face_negative face_2.jpg \
                        --device cpu
# RKNPU inference
python infer_arcface.py --model ms1mv3_arcface_r100.onnx \
                        --face face_0.jpg \
                        --face_positive face_1.jpg \
                        --face_negative face_2.jpg \
                        --device npu
```
After running, the visualized result is shown below:
<div width="700">
<img width="220" float="left" src="https://user-images.githubusercontent.com/67993288/184321537-860bf857-0101-4e92-a74c-48e8658d838c.JPG">
<img width="220" float="left" src="https://user-images.githubusercontent.com/67993288/184322004-a551e6e4-6f47-454e-95d6-f8ba2f47b516.JPG">
<img width="220" float="left" src="https://user-images.githubusercontent.com/67993288/184321622-d9a494c3-72f3-47f1-97c5-8a2372de491f.JPG">
</div>
```bash
Prediction Done!
--- [Face 0]:FaceRecognitionResult: [Dim(512), Min(-2.309220), Max(2.372197), Mean(0.016987)]
--- [Face 1]:FaceRecognitionResult: [Dim(512), Min(-2.288258), Max(1.995104), Mean(-0.003400)]
--- [Face 2]:FaceRecognitionResult: [Dim(512), Min(-3.243411), Max(3.875866), Mean(-0.030682)]
Detect Done! Cosine 01: 0.814385, Cosine 02:-0.059388
```
## InsightFace Python interfaces
```python
fastdeploy.vision.faceid.ArcFace(model_file, params_file=None, runtime_option=None, model_format=ModelFormat.ONNX)
fastdeploy.vision.faceid.CosFace(model_file, params_file=None, runtime_option=None, model_format=ModelFormat.ONNX)
fastdeploy.vision.faceid.PartialFC(model_file, params_file=None, runtime_option=None, model_format=ModelFormat.ONNX)
fastdeploy.vision.faceid.VPL(model_file, params_file=None, runtime_option=None, model_format=ModelFormat.ONNX)
```
ArcFace model loading and initialization, where model_file is the exported ONNX model
**Parameters**
> * **model_file**(str): path to the model file
> * **params_file**(str): path to the parameters file; it does not need to be set when the model format is ONNX
> * **runtime_option**(RuntimeOption): backend inference configuration; None uses the default configuration
> * **model_format**(ModelFormat): model format; defaults to ONNX
### predict function
> ```python
> ArcFace.predict(image_data)
> ```
>
> Prediction interface: takes an image and directly returns the recognition result.
>
> **Parameters**
>
> > * **image_data**(np.ndarray): input data; note it must be in HWC, BGR format
> **Returns**
>
> > Returns a `fastdeploy.vision.FaceRecognitionResult` structure; see [vision model prediction results](../../../../../docs/api/vision_results/) for details
### Class member properties
#### Preprocessing parameters
Users can modify the preprocessing parameters below to match their needs, which changes the final inference and deployment behavior; a short sketch follows this section.
#### AdaFacePreprocessor member variables
The following are member variables of AdaFacePreprocessor
> > * **size**(list[int]): resize target used during preprocessing, two integers as [width, height]; defaults to [112, 112]
> > * **alpha**(list[float]): alpha values for preprocessing normalization, computed as `x' = x * alpha + beta`; alpha defaults to [1. / 127.5, 1.f / 127.5, 1. / 127.5]
> > * **beta**(list[float]): beta values for preprocessing normalization, computed as `x' = x * alpha + beta`; beta defaults to [-1.f, -1.f, -1.f]
#### AdaFacePostprocessor member variables
The following are member variables of AdaFacePostprocessor
> > * **l2_normalize**(bool): whether to apply L2 normalization before outputting the face embedding; defaults to False
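A minimal sketch of tuning these values from Python, assuming the default ONNX/CPU setup; the property names (`size`, `alpha`, `beta`, `l2_normalize`) come from the bindings added in this commit:

```python
import cv2
import fastdeploy as fd

model = fd.vision.faceid.ArcFace("ms1mv3_arcface_r100.onnx")

# Preprocessing: resize to `size`, then normalize with x' = x * alpha + beta.
model.preprocessor.size = [112, 112]
model.preprocessor.alpha = [1.0 / 127.5, 1.0 / 127.5, 1.0 / 127.5]
model.preprocessor.beta = [-1.0, -1.0, -1.0]

# Postprocessing: L2-normalize the embedding before it is returned.
model.postprocessor.l2_normalize = True

result = model.predict(cv2.imread("face_0.jpg"))
print(result.embedding[:8])
```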
## Other documentation
- [InsightFace model introduction](..)
- [InsightFace C++ deployment](../cpp)
- [Model prediction result description](../../../../../docs/api/vision_results/)
- [How to switch the inference backend](../../../../../docs/cn/faq/how_to_change_backend.md)


@@ -0,0 +1,76 @@
import fastdeploy as fd
import cv2
import numpy as np


def cosine_similarity(a, b):
    a = np.array(a)
    b = np.array(b)
    mul_a = np.linalg.norm(a, ord=2)
    mul_b = np.linalg.norm(b, ord=2)
    mul_ab = np.dot(a, b)
    return mul_ab / (mul_a * mul_b)


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model", required=True, help="Path of insightface onnx model.")
    parser.add_argument(
        "--face", required=True, help="Path of test face image file.")
    parser.add_argument(
        "--face_positive",
        required=True,
        help="Path of test face_positive image file.")
    parser.add_argument(
        "--face_negative",
        required=True,
        help="Path of test face_negative image file.")
    parser.add_argument(
        "--device",
        type=str,
        default='cpu',
        help="Type of inference device, support 'cpu' or 'npu'.")
    return parser.parse_args()


def build_option(args):
    option = fd.RuntimeOption()
    if args.device.lower() == "npu":
        option.use_rknpu2()
    return option


args = parse_arguments()
runtime_option = build_option(args)
model = fd.vision.faceid.ArcFace(args.model, runtime_option=runtime_option)
if args.device.lower() == "npu":
    # The converted RKNN model already contains the normalize/permute steps.
    model.preprocessor.disable_normalize()
    model.preprocessor.disable_permute()

face0 = cv2.imread(args.face)
face1 = cv2.imread(args.face_positive)
face2 = cv2.imread(args.face_negative)

result0 = model.predict(face0)
result1 = model.predict(face1)
result2 = model.predict(face2)

embedding0 = result0.embedding
embedding1 = result1.embedding
embedding2 = result2.embedding

cosine01 = cosine_similarity(embedding0, embedding1)
cosine02 = cosine_similarity(embedding0, embedding2)

print(result0, end="")
print(result1, end="")
print(result2, end="")
print("Cosine 01: ", cosine01)
print("Cosine 02: ", cosine02)
print(model.runtime_option)

fastdeploy/vision/faceid/contrib/insightface/base.cc Executable file → Normal file

@@ -22,7 +22,6 @@ InsightFaceRecognitionBase::InsightFaceRecognitionBase(
    const std::string& model_file, const std::string& params_file,
    const fastdeploy::RuntimeOption& custom_option,
    const fastdeploy::ModelFormat& model_format) {
  if (model_format == ModelFormat::ONNX) {
    valid_cpu_backends = {Backend::ORT};
    valid_gpu_backends = {Backend::ORT, Backend::TRT};
@@ -31,6 +30,7 @@ InsightFaceRecognitionBase::InsightFaceRecognitionBase(
    valid_gpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT};
    valid_kunlunxin_backends = {Backend::LITE};
  }
+  valid_rknpu_backends = {Backend::RKNPU2};
  runtime_option = custom_option;
  runtime_option.model_format = model_format;
  runtime_option.model_file = model_file;
@@ -55,8 +55,9 @@ bool InsightFaceRecognitionBase::Predict(const cv::Mat& im,
  return true;
}
-bool InsightFaceRecognitionBase::BatchPredict(const std::vector<cv::Mat>& images,
-                                              std::vector<FaceRecognitionResult>* results){
+bool InsightFaceRecognitionBase::BatchPredict(
+    const std::vector<cv::Mat>& images,
+    std::vector<FaceRecognitionResult>* results) {
  std::vector<FDMat> fd_images = WrapMat(images);
  FDASSERT(images.size() == 1, "Only support batch = 1 now.");
  if (!preprocessor_.Run(&fd_images, &reused_input_tensors_)) {
@@ -70,8 +71,9 @@ bool InsightFaceRecognitionBase::BatchPredict(const std::vector<cv::Mat>& images
    return false;
  }
-  if (!postprocessor_.Run(reused_output_tensors_, results)){
-    FDERROR << "Failed to postprocess the inference results by runtime." << std::endl;
+  if (!postprocessor_.Run(reused_output_tensors_, results)) {
+    FDERROR << "Failed to postprocess the inference results by runtime."
+            << std::endl;
    return false;
  }
  return true;


@@ -19,7 +19,8 @@ void BindInsightFace(pybind11::module& m) {
  pybind11::class_<vision::faceid::InsightFaceRecognitionPreprocessor>(
      m, "InsightFaceRecognitionPreprocessor")
      .def(pybind11::init())
      .def("run",
           [](vision::faceid::InsightFaceRecognitionPreprocessor& self,
              std::vector<pybind11::array>& im_list) {
             std::vector<vision::FDMat> images;
             for (size_t i = 0; i < im_list.size(); ++i) {
@@ -27,54 +28,78 @@ void BindInsightFace(pybind11::module& m) {
             }
             std::vector<FDTensor> outputs;
             if (!self.Run(&images, &outputs)) {
               throw std::runtime_error(
                   "Failed to preprocess the input data in "
                   "InsightFaceRecognitionPreprocessor.");
             }
             for (size_t i = 0; i < outputs.size(); ++i) {
               outputs[i].StopSharing();
             }
             return outputs;
           })
-      .def_property("permute", &vision::faceid::InsightFaceRecognitionPreprocessor::GetPermute,
-                    &vision::faceid::InsightFaceRecognitionPreprocessor::SetPermute)
+      .def(
+          "disable_normalize",
+          &vision::faceid::InsightFaceRecognitionPreprocessor::DisableNormalize)
+      .def("disable_permute",
+           &vision::faceid::InsightFaceRecognitionPreprocessor::DisablePermute)
      .def_property(
          "alpha",
          &vision::faceid::InsightFaceRecognitionPreprocessor::GetAlpha,
          &vision::faceid::InsightFaceRecognitionPreprocessor::SetAlpha)
      .def_property(
          "beta", &vision::faceid::InsightFaceRecognitionPreprocessor::GetBeta,
          &vision::faceid::InsightFaceRecognitionPreprocessor::SetBeta)
      .def_property(
          "size", &vision::faceid::InsightFaceRecognitionPreprocessor::GetSize,
          &vision::faceid::InsightFaceRecognitionPreprocessor::SetSize);

  pybind11::class_<vision::faceid::InsightFaceRecognitionPostprocessor>(
      m, "InsightFaceRecognitionPostprocessor")
      .def(pybind11::init())
      .def("run",
           [](vision::faceid::InsightFaceRecognitionPostprocessor& self,
              std::vector<FDTensor>& inputs) {
             std::vector<vision::FaceRecognitionResult> results;
             if (!self.Run(inputs, &results)) {
               throw std::runtime_error(
                   "Failed to postprocess the runtime result in "
                   "InsightFaceRecognitionPostprocessor.");
             }
             return results;
           })
      .def("run",
           [](vision::faceid::InsightFaceRecognitionPostprocessor& self,
              std::vector<pybind11::array>& input_array) {
             std::vector<vision::FaceRecognitionResult> results;
             std::vector<FDTensor> inputs;
             PyArrayToTensorList(input_array, &inputs, /*share_buffer=*/true);
             if (!self.Run(inputs, &results)) {
               throw std::runtime_error(
                   "Failed to postprocess the runtime result in "
                   "InsightFaceRecognitionPostprocessor.");
             }
             return results;
           })
      .def_property(
          "l2_normalize",
          &vision::faceid::InsightFaceRecognitionPostprocessor::GetL2Normalize,
          &vision::faceid::InsightFaceRecognitionPostprocessor::SetL2Normalize);

  pybind11::class_<vision::faceid::InsightFaceRecognitionBase, FastDeployModel>(
      m, "InsightFaceRecognitionBase")
      .def(pybind11::init<std::string, std::string, RuntimeOption,
                          ModelFormat>())
      .def("predict",
           [](vision::faceid::InsightFaceRecognitionBase& self,
              pybind11::array& data) {
             cv::Mat im = PyArrayToCvMat(data);
             vision::FaceRecognitionResult result;
             self.Predict(im, &result);
             return result;
           })
      .def("batch_predict",
           [](vision::faceid::InsightFaceRecognitionBase& self,
              std::vector<pybind11::array>& data) {
             std::vector<cv::Mat> images;
             for (size_t i = 0; i < data.size(); ++i) {
               images.push_back(PyArrayToCvMat(data[i]));
@@ -83,19 +108,31 @@ void BindInsightFace(pybind11::module& m) {
             self.BatchPredict(images, &results);
             return results;
           })
      .def_property_readonly(
          "preprocessor",
          &vision::faceid::InsightFaceRecognitionBase::GetPreprocessor)
      .def_property_readonly(
          "postprocessor",
          &vision::faceid::InsightFaceRecognitionBase::GetPostprocessor);

  pybind11::class_<vision::faceid::ArcFace,
                   vision::faceid::InsightFaceRecognitionBase>(m, "ArcFace")
      .def(pybind11::init<std::string, std::string, RuntimeOption,
                          ModelFormat>());

  pybind11::class_<vision::faceid::CosFace,
                   vision::faceid::InsightFaceRecognitionBase>(m, "CosFace")
      .def(pybind11::init<std::string, std::string, RuntimeOption,
                          ModelFormat>());

  pybind11::class_<vision::faceid::PartialFC,
                   vision::faceid::InsightFaceRecognitionBase>(m, "PartialFC")
      .def(pybind11::init<std::string, std::string, RuntimeOption,
                          ModelFormat>());

  pybind11::class_<vision::faceid::VPL,
                   vision::faceid::InsightFaceRecognitionBase>(m, "VPL")
      .def(pybind11::init<std::string, std::string, RuntimeOption,
                          ModelFormat>());
}
} // namespace fastdeploy


@@ -35,6 +35,8 @@ class FASTDEPLOY_DECL ArcFace : public InsightFaceRecognitionBase {
    if (model_format == ModelFormat::ONNX) {
      valid_cpu_backends = {Backend::ORT};
      valid_gpu_backends = {Backend::ORT, Backend::TRT};
+    } else if (model_format == ModelFormat::RKNN) {
+      valid_rknpu_backends = {Backend::RKNPU2};
    } else {
      valid_cpu_backends = {Backend::PDINFER, Backend::ORT, Backend::LITE};
      valid_gpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT};
@@ -63,6 +65,8 @@ class FASTDEPLOY_DECL CosFace : public InsightFaceRecognitionBase {
    if (model_format == ModelFormat::ONNX) {
      valid_cpu_backends = {Backend::ORT};
      valid_gpu_backends = {Backend::ORT, Backend::TRT};
+    } else if (model_format == ModelFormat::RKNN) {
+      valid_rknpu_backends = {Backend::RKNPU2};
    } else {
      valid_cpu_backends = {Backend::PDINFER, Backend::ORT, Backend::LITE};
      valid_gpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT};
@@ -90,6 +94,8 @@ class FASTDEPLOY_DECL PartialFC : public InsightFaceRecognitionBase {
    if (model_format == ModelFormat::ONNX) {
      valid_cpu_backends = {Backend::ORT};
      valid_gpu_backends = {Backend::ORT, Backend::TRT};
+    } else if (model_format == ModelFormat::RKNN) {
+      valid_rknpu_backends = {Backend::RKNPU2};
    } else {
      valid_cpu_backends = {Backend::PDINFER, Backend::ORT, Backend::LITE};
      valid_gpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT};
@@ -117,6 +123,8 @@ class FASTDEPLOY_DECL VPL : public InsightFaceRecognitionBase {
    if (model_format == ModelFormat::ONNX) {
      valid_cpu_backends = {Backend::ORT};
      valid_gpu_backends = {Backend::ORT, Backend::TRT};
+    } else if (model_format == ModelFormat::RKNN) {
+      valid_rknpu_backends = {Backend::RKNPU2};
    } else {
      valid_cpu_backends = {Backend::PDINFER, Backend::ORT, Backend::LITE};
      valid_gpu_backends = {Backend::PDINFER, Backend::ORT, Backend::TRT};


@@ -23,11 +23,10 @@ InsightFaceRecognitionPreprocessor::InsightFaceRecognitionPreprocessor() {
  size_ = {112, 112};
  alpha_ = {1.f / 127.5f, 1.f / 127.5f, 1.f / 127.5f};
  beta_ = {-1.f, -1.f, -1.f};  // RGB
-  permute_ = true;
}
-bool InsightFaceRecognitionPreprocessor::Preprocess(FDMat * mat, FDTensor* output) {
+bool InsightFaceRecognitionPreprocessor::Preprocess(FDMat* mat,
+                                                    FDTensor* output) {
  // face recognition model's preprocess steps in insightface
  // reference: insightface/recognition/arcface_torch/inference.py
  // 1. Resize
@@ -39,13 +38,16 @@ bool InsightFaceRecognitionPreprocessor::Preprocess(FDMat * mat, FDTensor* outpu
  if (resize_h != mat->Height() || resize_w != mat->Width()) {
    Resize::Run(mat, resize_w, resize_h);
  }
-  if (permute_) {
+  if (!disable_permute_) {
    BGR2RGB::Run(mat);
  }
+  if (!disable_normalize_) {
    Convert::Run(mat, alpha_, beta_);
    HWC2CHW::Run(mat);
    Cast::Run(mat, "float");
+  }
  mat->ShareWithTensor(output);
  output->ExpandDim(0);  // reshape to n, h, w, c
@@ -55,7 +57,8 @@ bool InsightFaceRecognitionPreprocessor::Preprocess(FDMat * mat, FDTensor* outpu
bool InsightFaceRecognitionPreprocessor::Run(std::vector<FDMat>* images,
                                             std::vector<FDTensor>* outputs) {
  if (images->empty()) {
-    FDERROR << "The size of input images should be greater than 0." << std::endl;
+    FDERROR << "The size of input images should be greater than 0."
+            << std::endl;
    return false;
  }
  FDASSERT(images->size() == 1, "Only support batch = 1 now.");
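For reference, a rough numpy sketch (an assumption for illustration, not code from this PR) of what the preprocessing path above computes when neither switch has been disabled:

```python
import cv2
import numpy as np


def insightface_preprocess(bgr_image, size=(112, 112),
                           alpha=(1 / 127.5, 1 / 127.5, 1 / 127.5),
                           beta=(-1.0, -1.0, -1.0)):
    """Resize -> BGR2RGB -> x * alpha + beta -> HWC2CHW -> float32 -> NCHW."""
    img = cv2.resize(bgr_image, size)           # 1. resize to [width, height]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # skipped by DisablePermute()
    # The remaining steps are skipped by DisableNormalize():
    img = img.astype(np.float32) * np.asarray(alpha, np.float32) + np.asarray(beta, np.float32)
    img = np.transpose(img, (2, 0, 1))          # HWC -> CHW
    return img[np.newaxis, ...]                 # add the batch dimension
```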


@@ -54,10 +54,11 @@ class FASTDEPLOY_DECL InsightFaceRecognitionPreprocessor {
  /// Set beta.
  void SetBeta(std::vector<float>& beta) { beta_ = beta; }
-  bool GetPermute() { return permute_; }
-  /// Set permute.
-  void SetPermute(bool permute) { permute_ = permute; }
+  /// This function will disable normalize and hwc2chw in preprocessing step.
+  void DisableNormalize() { disable_normalize_ = true; }
+  /// This function will disable hwc2chw in preprocessing step.
+  void DisablePermute() { disable_permute_ = true; }
 protected:
  bool Preprocess(FDMat* mat, FDTensor* output);
@@ -70,9 +71,11 @@ class FASTDEPLOY_DECL InsightFaceRecognitionPreprocessor {
  // Argument for image preprocessing step, beta values for normalization,
  // default beta = {-1.f, -1.f, -1.f}
  std::vector<float> beta_;
+  // for recording the switch of normalize
+  bool disable_normalize_ = false;
  // Argument for image preprocessing step, whether to swap the B and R channel,
  // such as BGR->RGB, default true.
-  bool permute_;
+  bool disable_permute_ = false;
};
} // namespace faceid


@@ -56,13 +56,17 @@ class InsightFaceRecognitionPreprocessor:
""" """
return self._preprocessor.beta return self._preprocessor.beta
@property def disable_normalize(self):
def permute(self):
""" """
Argument for image preprocessing step, whether to swap the B and R channel, This function will disable normalize in preprocessing step.
such as BGR->RGB, default true.
""" """
return self._preprocessor.permute self._preprocessor.disable_normalize()
def disable_permute(self):
"""
This function will disable hwc2chw in preprocessing step.
"""
self._preprocessor.disable_permute()
class InsightFaceRecognitionPostprocessor: class InsightFaceRecognitionPostprocessor:
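A short usage sketch for the two new wrapper methods (an assumption about typical usage: they are meant for models whose preprocessing was folded in at RKNN conversion time, as in the export configs below):

```python
# `model` is an InsightFace model loaded in RKNN format (see infer_arcface.py above).
model.preprocessor.disable_normalize()  # mean/std already applied inside the RKNN model
model.preprocessor.disable_permute()    # channel handling assumed to be done at conversion
```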


@@ -0,0 +1,15 @@
mean:
  -
    - 127.5
    - 127.5
    - 127.5
std:
  -
    - 127.5
    - 127.5
    - 127.5
model_path: ./ms1mv3_arcface_r18/ms1mv3_arcface_r18.onnx
outputs_nodes:
do_quantization: True
dataset: "./ms1mv3_arcface_r18/datasets.txt"
output_folder: "./ms1mv3_arcface_r18"


@@ -0,0 +1,15 @@
mean:
  -
    - 127.5
    - 127.5
    - 127.5
std:
  -
    - 127.5
    - 127.5
    - 127.5
model_path: ./ms1mv3_arcface_r18/ms1mv3_arcface_r18.onnx
outputs_nodes:
do_quantization: False
dataset: "./ms1mv3_arcface_r18/datasets.txt"
output_folder: "./ms1mv3_arcface_r18"