Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2025-10-05 16:48:03 +08:00
Bump up to version 0.3.0 (#371)
* Update VERSION_NUMBER
* Update paddle_inference.cmake
* Delete docs directory
* release new docs
* update version number
* add vision result doc
* update version
* fix dead link
* fix vision
* fix dead link
* Update README_EN.md
* Update README_EN.md
* Update README_EN.md
* Update README_EN.md
* Update README_EN.md
* Update README_CN.md
* Update README_EN.md
* Update README_CN.md
* Update README_EN.md
* Update README_CN.md
* Update README_EN.md
* Update README_EN.md

Co-authored-by: leiqing <54695910+leiqing1@users.noreply.github.com>
README_CN.md (10 changed lines)
````diff
@@ -28,7 +28,15 @@
 
 ## 近期更新
 
-- 🔥 **2022.8.18:发布FastDeploy [release/v0.2.0](https://github.com/PaddlePaddle/FastDeploy/releases/tag/release%2F0.2.0)** <br>
+- 🔥 **2022.10.15:Release FastDeploy [release v0.3.0](https://github.com/PaddlePaddle/FastDeploy/tree/release%2F0.3.0)** <br>
+    - **New server-side deployment upgrade:更快的推理性能,一键量化,更多的视觉和NLP模型**
+        - 集成 OpenVINO 推理引擎,并且保证了使用 OpenVINO 与 使用 TensorRT、ONNX Runtime、Paddle Inference一致的开发体验;
+        - 提供[一键模型量化工具](tools/quantization),支持YOLOv7、YOLOv6、YOLOv5等视觉模型,在CPU和GPU推理速度可提升1.5~2倍;
+        - 新增加 PP-OCRv3, PP-OCRv2, PP-Matting, PP-HumanMatting, ModNet 等视觉模型并提供[端到端部署示例](examples/vision);
+        - 新增加NLP信息抽取模型 UIE 并提供[端到端部署示例](examples/text/uie).
+
+- 🔥 **2022.8.18:发布FastDeploy [release/v0.2.0](https://github.com/PaddlePaddle/FastDeploy/tree/release%2F0.2.0)** <br>
     - **服务端部署全新升级:更快的推理性能,更多的视觉模型支持**
         - 发布基于x86 CPU、NVIDIA GPU的高性能推理引擎SDK,推理速度大幅提升
         - 集成Paddle Inference、ONNX Runtime、TensorRT等推理引擎并提供统一的部署体验
````
README_EN.md (25 changed lines)
````diff
@@ -28,17 +28,24 @@ English | [简体中文](README_CN.md)
 | **Face Alignment** | **3D Object Detection** | **Face Editing** | **Image Animation** |
 | <img src='https://user-images.githubusercontent.com/54695910/188059460-9845e717-c30a-4252-bd80-b7f6d4cf30cb.png' height="126px" width="190px"> | <img src='https://user-images.githubusercontent.com/54695910/188270227-1a4671b3-0123-46ab-8d0f-0e4132ae8ec0.gif' height="126px" width="190px"> | <img src='https://user-images.githubusercontent.com/54695910/188054663-b0c9c037-6d12-4e90-a7e4-e9abf4cf9b97.gif' height="126px" width="126px"> | <img src='https://user-images.githubusercontent.com/54695910/188056800-2190e05e-ad1f-40ef-bf71-df24c3407b2d.gif' height="126px" width="190px"> |
 
-## Updates
+## 📣 Recent Updates
 
-- 🔥 **2022.8.18:Release FastDeploy [release/v0.2.0](https://github.com/PaddlePaddle/FastDeploy/releases/tag/release%2F0.2.0)** <br>
-    - **New server-side deployment upgrade: faster inference performance, support more vision model**
+- 🔥 **2022.10.15:Release FastDeploy [release v0.3.0](https://github.com/PaddlePaddle/FastDeploy/tree/release/0.3.0)** <br>
+    - **New server-side deployment upgrade: support more CV model and NLP model**
+        - Integrate OpenVINO and provide a seamless deployment experience with other inference engines include TensorRT、ONNX Runtime、Paddle Inference;
+        - Support [one-click model quantization](tools/quantization) to improve model inference speed by 1.5 to 2 times on CPU & GPU platform. The supported quantized model are YOLOv7, YOLOv6, YOLOv5, etc.
+        - New CV models include PP-OCRv3, PP-OCRv2, PP-TinyPose, PP-Matting, etc. and provides [end-to-end deployment demos](examples/vision/detection/)
+        - New information extraction model is UIE, and provides [end-to-end deployment demos](examples/text/uie).
+
+- 🔥 **2022.8.18:Release FastDeploy [release v0.2.0](https://github.com/PaddlePaddle/FastDeploy/tree/release%2F0.2.0)** <br>
+    - **New server-side deployment upgrade: faster inference performance, support more CV model**
         - Release high-performance inference engine SDK based on x86 CPUs and NVIDIA GPUs, with significant increase in inference speed
-        - Integrate Paddle Inference, ONNXRuntime, TensorRT and other inference engines and provide a seamless deployment experience
-        - Supports full range of object detection models such as YOLOv7, YOLOv6, YOLOv5, PP-YOLOE and provides [End-To-End Deployment Demos](examples/vision/detection/)
-        - Support over 40 key models and [Demo Examples](examples/vision/) including face detection, face recognition, real-time portrait matting, image segmentation.
+        - Integrate Paddle Inference, ONNX Runtime, TensorRT and other inference engines and provide a seamless deployment experience
+        - Supports full range of object detection models such as YOLOv7, YOLOv6, YOLOv5, PP-YOLOE and provides [end-to-end deployment demos](examples/vision/detection/)
+        - Support over 40 key models and [demo examples](examples/vision/) including face detection, face recognition, real-time portrait matting, image segmentation.
         - Support deployment in both Python and C++
     - **Supports Rockchip, Amlogic, NXP and other NPU chip deployment capabilities on edge device deployment**
-        - Release Lightweight Object Detection [Picodet-NPU Deployment Demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/linux/picodet_detection), providing the full quantized inference capability for INT8.
+        - Release Lightweight Object Detection [Picodet-NPU deployment demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/linux/picodet_detection), providing the full quantized inference capability for INT8.
 
 ## Contents
 
@@ -71,7 +78,7 @@ English | [简体中文](README_CN.md)
 - python >= 3.6
 - OS: Linux x86_64/macOS/Windows 10
 
-##### Install Library with GPU Support
+##### Install Fastdeploy SDK with CPU&GPU support
 
 ```bash
 pip install fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
@@ -83,7 +90,7 @@ pip install fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
 conda config --add channels conda-forge && conda install cudatoolkit=11.2 cudnn=8.2
 ```
 
-##### Install CPU-only Library
+##### Install Fastdeploy SDK with only CPU support
 
 ```bash
 pip install fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
````
VERSION_NUMBER

````diff
@@ -1 +1 @@
-0.0.0
+0.3.0
````
paddle_inference.cmake

````diff
@@ -48,7 +48,7 @@ endif(WIN32)
 
 
 set(PADDLEINFERENCE_URL_BASE "https://bj.bcebos.com/fastdeploy/third_libs/")
-set(PADDLEINFERENCE_VERSION "2.4-dev")
+set(PADDLEINFERENCE_VERSION "2.4-dev1")
 if(WIN32)
   if (WITH_GPU)
     set(PADDLEINFERENCE_FILE "paddle_inference-win-x64-gpu-${PADDLEINFERENCE_VERSION}.zip")
````
@@ -1,278 +0,0 @@ (deleted file)

# FDTensor C++ Tensor Functions

FDTensor is the struct FastDeploy uses to represent tensors at the C++ level. It is mainly used to manage a model's input and output data during inference deployment, and it can be used with the different Runtime backends. When developing a C++ inference application, we often need to process the input and output data to obtain the model's actual input or the application's actual output, for example taking the maximum along the 2nd dimension of a 3-D tensor. Such processing could be written with the C++ standard library alone, but that is fairly laborious. To reduce this development cost and improve efficiency, FastDeploy provides a set of C++ tensor functions built on FDTensor. They currently fall into four categories: Reduce functions, Manipulate functions, Math functions, and Elementwise functions.
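As a quick illustration of the motivating example above (reducing a 3-D tensor along its 2nd dimension), the sketch below combines `SetExternalData` with `Max`. It follows the same fragment style as the per-function examples further down; the header to include and the enclosing namespace depend on the FastDeploy version and are omitted here, so treat those details as assumptions.

```c++
FDTensor input, output;
std::vector<float> data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
// Shape {2, 2, 3}: two 2x3 matrices.
input.SetExternalData({2, 2, 3}, FDDataType::FP32, data.data());

// Reduce along axis 1 (the 2nd dimension). With keep_dim the result shape is
// {2, 1, 3}; the values would be [[[4, 5, 6]], [[10, 11, 12]]].
Max(input, &output, {1}, /* keep_dim = */true);
```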
## Reduce functions

FastDeploy currently supports 7 Reduce functions: Max, Min, Sum, All, Any, Mean, and Prod.

### Max

#### Function signature

```c++
/** Execute the maximum operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axis which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Max(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Calculate the max value for axis 0 of `inputs`
// The output result would be [[7, 4, 5]].
Max(input, &output, {0}, /* keep_dim = */true);
```

### Min

#### Function signature

```c++
/** Execute the minimum operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axis which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Min(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Calculate the min value for axis 0 of `inputs`
// The output result would be [[2, 1, 3]].
Min(input, &output, {0}, /* keep_dim = */true);
```

### Sum

#### Function signature

```c++
/** Execute the sum operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axis which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Sum(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Calculate the sum value for axis 0 of `inputs`
// The output result would be [[9, 5, 8]].
Sum(input, &output, {0}, /* keep_dim = */true);
```

### Mean

#### Function signature

```c++
/** Execute the mean operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axis which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Mean(const FDTensor& x, FDTensor* out,
          const std::vector<int64_t>& dims,
          bool keep_dim = false, bool reduce_all = false);
```

#### Usage example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Calculate the mean value for axis 0 of `inputs`
// The output result would be [[4, 2, 4]].
Mean(input, &output, {0}, /* keep_dim = */true);
```

### Prod

#### Function signature

```c++
/** Execute the product operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axis which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Prod(const FDTensor& x, FDTensor* out,
          const std::vector<int64_t>& dims,
          bool keep_dim = false, bool reduce_all = false);
```

#### Usage example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Calculate the product value for axis 0 of `inputs`
// The output result would be [[14, 4, 15]].
Prod(input, &output, {0}, /* keep_dim = */true);
```

### Any

#### Function signature

```c++
/** Execute the any operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axis which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Any(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage example

```c++
FDTensor input, output;
std::array<bool, 6> bool_inputs = {false, false, true, true, false, true};
input.SetExternalData({2, 3}, FDDataType::INT32, bool_inputs.data());

// Calculate the any value for axis 0 of `inputs`
// The output result would be [[true, false, true]].
Any(input, &output, {0}, /* keep_dim = */true);
```

### All

#### Function signature

```c++
/** Execute the all operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axis which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void All(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage example

```c++
FDTensor input, output;
std::array<bool, 6> bool_inputs = {false, false, true, true, false, true};
input.SetExternalData({2, 3}, FDDataType::INT32, bool_inputs.data());

// Calculate the all value for axis 0 of `inputs`
// The output result would be [[false, false, true]].
All(input, &output, {0}, /* keep_dim = */true);
```

## Manipulate functions

FastDeploy currently supports 1 Manipulate function: Transpose.

### Transpose

#### Function signature

```c++
/** Execute the transpose operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axis which the input tensor will transpose.
*/
void Transpose(const FDTensor& x, FDTensor* out,
               const std::vector<int64_t>& dims);
```

#### Usage example

```c++
FDTensor input, output;
std::vector<float> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::FP32, inputs.data());

// Transpose the input tensor with axis {1, 0}.
// The output result would be [[2, 7], [4, 1], [3, 5]]
Transpose(input, &output, {1, 0});
```

## Math functions

FastDeploy currently supports 1 Math function: Softmax.

### Softmax

#### Function signature

```c++
/** Execute the softmax operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param axis The axis to be computed softmax value.
*/
void Softmax(const FDTensor& x, FDTensor* out, int axis = -1);
```

#### Usage example

```c++
FDTensor input, output;
std::vector<float> inputs = {1, 2, 3, 4, 5, 6};
input.SetExternalData({2, 3}, FDDataType::FP32, inputs.data());

// Compute the softmax of the input tensor along axis 0.
// The output result would be
// [[0.04742587, 0.04742587, 0.04742587],
//  [0.95257413, 0.95257413, 0.95257413]]
Softmax(input, &output, 0);
```

## Elementwise functions

Under development, stay tuned.
@@ -1,84 +0,0 @@ (deleted file)

# Runtime

Once a `RuntimeOption` has been configured, a Runtime can be created on top of the different backends and hardware to run model inference.

## Python class

```
class Runtime(runtime_option)
```
**Parameters**
> * **runtime_option**(fastdeploy.RuntimeOption): a configured RuntimeOption instance

### Member functions

```
infer(data)
```
Run model inference with the given input data.

**Parameters**

> * **data**(dict({str: np.ndarray})): the input data, a dict whose keys are input names and whose values are np.ndarray

**Return value**

Returns a list whose length equals the number of outputs of the original model; each element is an np.ndarray.

```
num_inputs()
```
Returns the number of model inputs.

```
num_outputs()
```
Returns the number of model outputs.

## C++ class

```
class Runtime
```

### Member functions

```
bool Init(const RuntimeOption& runtime_option)
```
Load and initialize the model.

**Parameters**

> * **runtime_option**: a configured RuntimeOption instance

**Return value**

Returns true on successful initialization, false otherwise.

```
bool Infer(vector<FDTensor>& inputs, vector<FDTensor>* outputs)
```
Run inference with the given inputs and write the results to outputs.

**Parameters**

> * **inputs**: input data
> * **outputs**: output data

**Return value**

Returns true on successful inference, false otherwise.

```
int NumInputs()
```
Returns the number of model inputs.

```
int NumOutputs()
```
Returns the number of model outputs.
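A minimal C++ sketch tying these members together is shown below. It only uses the calls documented above; the header path and the model file names are assumptions for illustration, and filling the input tensors is model specific.

```c++
#include <vector>
#include "fastdeploy/runtime.h"  // assumed header providing Runtime/RuntimeOption/FDTensor

int main() {
  fastdeploy::RuntimeOption option;
  option.SetModelPath("model.pdmodel", "model.pdiparams");  // placeholder paths
  option.UseCpu();
  option.UseOrtBackend();

  fastdeploy::Runtime runtime;
  if (!runtime.Init(option)) {
    return -1;  // initialization failed; check the log output
  }

  // One tensor per model input; shapes, dtypes and tensor names are model specific.
  std::vector<fastdeploy::FDTensor> inputs(runtime.NumInputs());
  std::vector<fastdeploy::FDTensor> outputs;
  // ... fill `inputs`, e.g. via SetExternalData, before calling Infer ...

  if (!runtime.Infer(inputs, &outputs)) {
    return -1;  // inference failed
  }
  return 0;
}
```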
@@ -1,231 +0,0 @@ (deleted file)

# RuntimeOption

`RuntimeOption` is used to configure the inference parameters of a model on different backends and hardware.

## Python class

```
class RuntimeOption()
```

### Member functions

```
set_model_path(model_file, params_file="", model_format="paddle")
```
Set the path of the model to load.

**Parameters**

> * **model_file**(str): path of the model file
> * **params_file**(str): path of the parameters file; can be left unset for ONNX models
> * **model_format**(str): model format, "paddle" or "onnx", default "paddle"

```
use_gpu(device_id=0)
```
Use GPU for inference.

**Parameters**

> * **device_id**(int): when several GPUs are available, selects the card used for inference, default 0

```
use_cpu()
```
Use CPU for inference.

```
set_cpu_thread_num(thread_num=-1)
```
Set the number of threads used for inference on CPU.

**Parameters**

> * **thread_num**(int): number of threads; values less than or equal to 0 let the backend decide, default -1

```
use_paddle_backend()
```
Use the Paddle Inference backend; supports CPU/GPU and the Paddle model format.

```
use_ort_backend()
```
Use the ONNX Runtime backend; supports CPU/GPU and the Paddle/ONNX model formats.

```
use_trt_backend()
```
Use the TensorRT backend; supports GPU and the Paddle/ONNX model formats.

```
use_openvino_backend()
```
Use the OpenVINO backend; supports CPU and the Paddle/ONNX model formats.

```
set_paddle_mkldnn(pd_mkldnn=True)
```
When the Paddle Inference backend is used, this switch enables or disables MKLDNN acceleration on CPU; enabled by default.

```
enable_paddle_log_info()
disable_paddle_log_info()
```
When the Paddle Inference backend is used, this switch enables or disables the optimization log printed while loading the model; disabled by default.

```
set_paddle_mkldnn_cache_size(cache_size)
```
When the Paddle Inference backend is used, controls the shape cache size used by MKLDNN acceleration.

**Parameters**
> * **cache_size**(int): cache size

```
set_trt_input_shape(tensor_name, min_shape, opt_shape=None, max_shape=None)
```
When the TensorRT backend is used, sets the shape range of each model input. If only min_shape is given, opt_shape and max_shape are automatically set equal to min_shape.

Calling this interface is optional: during inference FastDeploy updates the shape range automatically from the real input data, but every time a new shape falls outside the current range the backend engine is rebuilt, which costs time. Configuring the range in advance through this interface avoids rebuilding the engine during inference.

**Parameters**
> * **tensor_name**(str): name of the tensor whose shape range is being set
> * **min_shape**(list of int): the minimum shape of the tensor, e.g. [1, 3, 224, 224]
> * **opt_shape**(list of int): the most common shape of the tensor, e.g. [2, 3, 224, 224]; None keeps it equal to min_shape, default None
> * **max_shape**(list of int): the maximum shape of the tensor, e.g. [8, 3, 224, 224]; None keeps it equal to min_shape, default None

```
set_trt_cache_file(cache_file_path)
```
When the TensorRT backend is used, caches the built TensorRT engine to the given path, or skips engine building and directly loads a locally cached TensorRT engine.

- If `cache_file_path` does not exist when this interface is called, FastDeploy builds the TensorRT engine and saves it to `cache_file_path`.
- If `cache_file_path` exists, FastDeploy directly loads the engine stored there, which greatly reduces model loading and initialization time.

This speeds up model initialization on later runs. Note, however, that if you change the loading configuration (for example TensorRT's max_workspace_size or `set_trt_input_shape`) or replace the original Paddle/ONNX model, you must delete the cached `cache_file_path` file first, otherwise the stale cache is reloaded and the program may behave incorrectly.

**Parameters**
> * **cache_file_path**(str): cache file path, e.g. `/Downloads/resnet50.trt`

```
enable_trt_fp16()
disable_trt_fp16()
```
When the TensorRT backend is used, enables or disables half-precision inference. FP16 brings a clear performance gain, but not every GPU supports it; on GPUs without FP16 support, inference falls back to FP32 and the message `Detected FP16 is not supported in the current GPU, will use FP32 instead.` is printed.

## C++ struct

```
struct RuntimeOption
```

### Member functions

```
void SetModelPath(const string& model_file, const string& params_file = "", const string& model_format = "paddle")
```
Set the path of the model to load.

**Parameters**

> * **model_file**: path of the model file
> * **params_file**: path of the parameters file; pass "" for ONNX models
> * **model_format**: model format, "paddle" or "onnx", default "paddle"

```
void UseGpu(int device_id = 0)
```
Use GPU for inference.

**Parameters**

> * **device_id**: when several GPUs are available, selects the card used for inference, default 0

```
void UseCpu()
```
Use CPU for inference.

```
void SetCpuThreadNum(int thread_num=-1)
```
Set the number of threads used for inference on CPU.

**Parameters**

> * **thread_num**: number of threads; values less than or equal to 0 let the backend decide, default -1

```
void UsePaddleBackend()
```
Use the Paddle Inference backend; supports CPU/GPU and the Paddle model format.

```
void UseOrtBackend()
```
Use the ONNX Runtime backend; supports CPU/GPU and the Paddle/ONNX model formats.

```
void UseTrtBackend()
```
Use the TensorRT backend; supports GPU and the Paddle/ONNX model formats.

```
void UseOpenVINOBackend()
```
Use the OpenVINO backend; supports CPU and the Paddle/ONNX model formats.
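A minimal sketch of the basic C++ configuration flow (model path, device, backend) follows; it only uses the setters documented above, and the model file names are placeholders.

```c++
fastdeploy::RuntimeOption option;
option.SetModelPath("resnet50/inference.pdmodel",
                    "resnet50/inference.pdiparams");  // Paddle format (the default)
option.UseGpu(0);           // run on GPU card 0; use option.UseCpu() for CPU
option.SetCpuThreadNum(8);  // only takes effect for CPU inference
option.UseTrtBackend();     // pick one backend: Paddle Inference / ONNX Runtime / TensorRT / OpenVINO
```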
```
void SetPaddleMKLDNN(bool pd_mkldnn = true)
```
When the Paddle Inference backend is used, this switch enables or disables MKLDNN acceleration on CPU; enabled by default.

```
void EnablePaddleLogInfo()
void DisablePaddleLogInfo()
```
When the Paddle Inference backend is used, this switch enables or disables the optimization log printed while loading the model; disabled by default.

```
void SetPaddleMKLDNNCacheSize(int cache_size)
```
When the Paddle Inference backend is used, controls the shape cache size used by MKLDNN acceleration.

**Parameters**
> * **cache_size**: cache size

```
void SetTrtInputShape(const string& tensor_name, const vector<int32_t>& min_shape,
                      const vector<int32_t>& opt_shape = vector<int32_t>(),
                      const vector<int32_t>& max_shape = vector<int32_t>())
```
When the TensorRT backend is used, sets the shape range of each model input. If only min_shape is given, opt_shape and max_shape are automatically set equal to min_shape.

Calling this interface is optional: during inference FastDeploy updates the shape range automatically from the real input data, but every time a new shape falls outside the current range the backend engine is rebuilt, which costs time. Configuring the range in advance through this interface avoids rebuilding the engine during inference.

**Parameters**
> * **tensor_name**: name of the tensor whose shape range is being set
> * **min_shape**: the minimum shape of the tensor, e.g. [1, 3, 224, 224]
> * **opt_shape**: the most common shape of the tensor, e.g. [2, 3, 224, 224]; an empty vector (the default) keeps it equal to min_shape
> * **max_shape**: the maximum shape of the tensor, e.g. [8, 3, 224, 224]; an empty vector (the default) keeps it equal to min_shape
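For example (a sketch; the tensor name "x" and the shapes stand in for an actual model input):

```c++
fastdeploy::RuntimeOption option;
option.UseGpu();
option.UseTrtBackend();
// Declare the dynamic-shape range of input "x" up front so the TensorRT
// engine is built once instead of being rebuilt when new shapes appear.
option.SetTrtInputShape("x", {1, 3, 224, 224},   // min_shape
                             {4, 3, 224, 224},   // opt_shape
                             {8, 3, 224, 224});  // max_shape
```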
```
void SetTrtCacheFile(const string& cache_file_path)
```
When the TensorRT backend is used, caches the built TensorRT engine to the given path, or skips engine building and directly loads a locally cached TensorRT engine.

- If `cache_file_path` does not exist when this interface is called, FastDeploy builds the TensorRT engine and saves it to `cache_file_path`.
- If `cache_file_path` exists, FastDeploy directly loads the engine stored there, which greatly reduces model loading and initialization time.

This speeds up model initialization on later runs. Note, however, that if you change the loading configuration (for example TensorRT's max_workspace_size or `SetTrtInputShape`) or replace the original Paddle/ONNX model, you must delete the cached `cache_file_path` file first, otherwise the stale cache is reloaded and the program may behave incorrectly.

**Parameters**
> * **cache_file_path**: cache file path, e.g. `/Downloads/resnet50.trt`

```
void EnableTrtFp16()
void DisableTrtFp16()
```
When the TensorRT backend is used, enables or disables half-precision inference. FP16 brings a clear performance gain, but not every GPU supports it; on GPUs without FP16 support, inference falls back to FP32 and the message `Detected FP16 is not supported in the current GPU, will use FP32 instead.` is printed.
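The engine cache and FP16 switches are typically combined with the TensorRT backend, as sketched below (the cache file name is a placeholder):

```c++
fastdeploy::RuntimeOption option;
option.UseGpu();
option.UseTrtBackend();
option.SetTrtCacheFile("resnet50.trt");  // build on the first run, reuse afterwards
option.EnableTrtFp16();                  // falls back to FP32 on GPUs without FP16 support
```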
@@ -1,132 +0,0 @@ (deleted file)

# RuntimeOption Inference Backend Configuration

The Runtime in FastDeploy wraps several inference backends; their relationship is as follows:

| Model format \ backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
| :--------------- | :---------- | :--------------- | :------- | :------- |
| Paddle | Supported (via built-in Paddle2ONNX) | Supported | Supported (via built-in Paddle2ONNX) | Supported |
| ONNX | Supported | Supported (needs conversion through X2Paddle) | Supported | Supported |

Hardware supported by each backend:

| Hardware \ backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
| :--------------- | :---------- | :--------------- | :------- | :------- |
| CPU | Supported | Supported | Not supported | Supported |
| GPU | Supported | Supported | Supported | Supported |

For every model, the inference backend and its parameters are configured through `RuntimeOption`. In Python, for example, the current inference configuration can be printed after loading a model:
```python
model = fastdeploy.vision.detection.YOLOv5("yolov5s.onnx")
print(model.runtime_option)
```
which produces output like:

```python
RuntimeOption(
  backend : Backend.ORT              # inference backend: ONNXRuntime
  cpu_thread_num : 8                 # number of CPU threads (only used for CPU inference)
  device : Device.CPU                # inference device: CPU
  device_id : 0                      # device id (for GPU)
  model_file : yolov5s.onnx          # model file path
  params_file :                      # parameters file path
  model_format : ModelFormat.ONNX    # model format
  ort_execution_mode : -1            # options prefixed with ort are ONNXRuntime-specific
  ort_graph_opt_level : -1
  ort_inter_op_num_threads : -1
  trt_enable_fp16 : False            # options prefixed with trt are TensorRT-specific
  trt_enable_int8 : False
  trt_max_workspace_size : 1073741824
  trt_serialize_file :
  trt_fixed_shape : {}
  trt_min_shape : {}
  trt_opt_shape : {}
  trt_max_shape : {}
  trt_max_batch_size : 32
)
```

## Python usage

### RuntimeOption class
Configuration options of `fastdeploy.RuntimeOption()`

#### Options
> * **backend**(fd.Backend): `fd.Backend.ORT`/`fd.Backend.TRT`/`fd.Backend.PDINFER`/`fd.Backend.OPENVINO`, etc.
> * **cpu_thread_num**(int): number of CPU inference threads, only used for CPU inference
> * **device**(fd.Device): `fd.Device.CPU`/`fd.Device.GPU`, etc.
> * **device_id**(int): device id, used with GPU
> * **model_file**(str): path of the model file
> * **params_file**(str): path of the parameters file
> * **model_format**(ModelFormat): model format, `fd.ModelFormat.PADDLE`/`fd.ModelFormat.ONNX`
> * **ort_execution_mode**(int): ORT execution mode; 0 runs operators sequentially, 1 runs them in parallel; default -1 uses ORT's own default
> * **ort_graph_opt_level**(int): ORT graph optimization level; 0: disabled; 1: basic; 2: extended; 99: all; default -1 uses ORT's own default
> * **ort_inter_op_num_threads**(int): when `ort_execution_mode` is 1, the number of threads used for inter-operator parallelism
> * **trt_enable_fp16**(bool): enable FP16 inference in TensorRT
> * **trt_enable_int8**(bool): enable INT8 inference in TensorRT
> * **trt_max_workspace_size**(int): the `max_workspace_size` configured for TensorRT
> * **trt_fixed_shape**(dict[str : list[int]]): when the model has dynamic shapes but the actual input shape never changes, fixes the input shape
> * **trt_min_shape**(dict[str : list[int]]): when the model has dynamic shapes and the input shape changes at runtime, the minimum input shape
> * **trt_opt_shape**(dict[str : list[int]]): when the model has dynamic shapes and the input shape changes at runtime, the optimal input shape
> * **trt_max_shape**(dict[str : list[int]]): when the model has dynamic shapes and the input shape changes at runtime, the maximum input shape
> * **trt_max_batch_size**(int): maximum batch size for TensorRT inference

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.backend = fd.Backend.TRT
# When the TRT backend is used with dynamic input shapes,
# the input shape information has to be configured
option.trt_min_shape = {"x": [1, 3, 224, 224]}
option.trt_opt_shape = {"x": [4, 3, 224, 224]}
option.trt_max_shape = {"x": [8, 3, 224, 224]}

model = fd.vision.classification.PaddleClasModel(
    "resnet50/inference.pdmodel",
    "resnet50/inference.pdiparams",
    "resnet50/inference_cls.yaml",
    runtime_option=option)
```

## C++ usage

### RuntimeOption struct
Configuration options of `fastdeploy::RuntimeOption()`

#### Options
> * **backend**(fastdeploy::Backend): `Backend::ORT`/`Backend::TRT`/`Backend::PDINFER`/`Backend::OPENVINO`, etc.
> * **cpu_thread_num**(int): number of CPU inference threads, only used for CPU inference
> * **device**(fastdeploy::Device): `Device::CPU`/`Device::GPU`, etc.
> * **device_id**(int): device id, used with GPU
> * **model_file**(string): path of the model file
> * **params_file**(string): path of the parameters file
> * **model_format**(fastdeploy::ModelFormat): model format, `ModelFormat::PADDLE`/`ModelFormat::ONNX`
> * **ort_execution_mode**(int): ORT execution mode; 0 runs operators sequentially, 1 runs them in parallel; default -1 uses ORT's own default
> * **ort_graph_opt_level**(int): ORT graph optimization level; 0: disabled; 1: basic; 2: extended; 99: all; default -1 uses ORT's own default
> * **ort_inter_op_num_threads**(int): when `ort_execution_mode` is 1, the number of threads used for inter-operator parallelism
> * **trt_enable_fp16**(bool): enable FP16 inference in TensorRT
> * **trt_enable_int8**(bool): enable INT8 inference in TensorRT
> * **trt_max_workspace_size**(int): the `max_workspace_size` configured for TensorRT
> * **trt_fixed_shape**(map<string, vector<int>>): when the model has dynamic shapes but the actual input shape never changes, fixes the input shape
> * **trt_min_shape**(map<string, vector<int>>): when the model has dynamic shapes and the input shape changes at runtime, the minimum input shape
> * **trt_opt_shape**(map<string, vector<int>>): when the model has dynamic shapes and the input shape changes at runtime, the optimal input shape
> * **trt_max_shape**(map<string, vector<int>>): when the model has dynamic shapes and the input shape changes at runtime, the maximum input shape
> * **trt_max_batch_size**(int): maximum batch size for TensorRT inference

```c++
#include "fastdeploy/vision.h"

int main() {
  auto option = fastdeploy::RuntimeOption();
  option.trt_min_shape["x"] = {1, 3, 224, 224};
  option.trt_opt_shape["x"] = {4, 3, 224, 224};
  option.trt_max_shape["x"] = {8, 3, 224, 224};

  auto model = fastdeploy::vision::classification::PaddleClasModel(
      "resnet50/inference.pdmodel",
      "resnet50/inference.pdiparams",
      "resnet50/inference_cls.yaml",
      option);
  return 0;
}
```
@@ -1,7 +0,0 @@ (deleted file)

# NLP Model Prediction Results

FastDeploy defines different structs to represent the prediction results of NLP models according to the task type, as listed below.

| Struct | Documentation | Description | Related models |
| :----- | :--- | :---- | :------- |
| UIEResult | [C++/Python docs](./uie_result.md) | Result returned by the UIE model | UIE |
@@ -1,34 +0,0 @@ (deleted file)

# UIEResult (UIE Extraction Result)

UIEResult is defined in `fastdeploy/text/uie/model.h` and describes the extraction results of the UIE model together with their confidence.

## C++ definition

`fastdeploy::text::UIEResult`

```c++
struct UIEResult {
  size_t start_;
  size_t end_;
  double probability_;
  std::string text_;
  std::unordered_map<std::string, std::vector<UIEResult>> relation_;
  std::string Str() const;
};
```

- **start_**: member variable, the start position of the extracted `text_` in the original (Unicode) text.
- **end_**: member variable, the end position of the extracted `text_` in the original (Unicode) text.
- **probability_**: member variable, the confidence of the extracted result.
- **text_**: member variable, the extracted result, stored as UTF-8.
- **relation_**: member variable, results related to the current result; commonly used for relation extraction.
- **Str()**: member function, renders the information of the struct as a string (for debugging).
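A short sketch of how these fields are typically consumed follows. The shape of `results` (a map from a schema key to its extractions) is an assumption for illustration and depends on how the UIE predictor returns its output; the includes are the usual standard-library headers.

```c++
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// `results` is assumed to map a schema key to its extracted UIEResult entries.
std::unordered_map<std::string, std::vector<fastdeploy::text::UIEResult>> results;
// ... fill `results` by running the UIE model ...

for (const auto& kv : results) {
  for (const auto& item : kv.second) {
    // start_/end_ index into the original Unicode text; text_ is UTF-8.
    std::cout << kv.first << ": [" << item.start_ << ", " << item.end_ << ") "
              << item.text_ << " (p=" << item.probability_ << ")" << std::endl;
    // Related extractions (relation extraction) are nested in relation_.
    for (const auto& rel : item.relation_) {
      std::cout << "  relation `" << rel.first << "`: " << rel.second.size()
                << " result(s)" << std::endl;
    }
  }
}
```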
## Python definition

`fastdeploy.text.C.UIEResult`

- **start_**(int): member variable, the start position of the extracted `text_` in the original (Unicode) text.
- **end_**(int): member variable, the end position of the extracted `text_` in the original (Unicode) text.
- **text_**(str): member variable, the extracted result, stored as UTF-8.
- **relation_**(dict(str, list(fastdeploy.text.C.UIEResult))): member variable, results related to the current result; commonly used for relation extraction.
- **get_dict()**: returns the fastdeploy.text.C.UIEResult as a dict.
@@ -1,404 +0,0 @@ (deleted file)

# Introduction

This document describes the model SDK in FastDeploy for the Android environment: (1) the inference steps and (2) how to use the SDK, so that developers can understand the project and build on it.

<!--ts-->

* [Introduction](#introduction)

* [Supported systems](#supported-systems)

* [Quick start](#quick-start)
  * [1. Project structure](#1-project-structure)
  * [2. Standard APP test](#2-standard-app-test)
    * [2.1 Try it via QR code](#21-try-it-via-qr-code)
    * [2.2 Run from source](#22-run-from-source)
  * [3. Lite test](#3-lite-test)

* [Using the SDK](#using-the-sdk)
  * [1. Integration guide](#1-integration-guide)
    * [1.1 Integrating the dependency libraries](#11-integrating-the-dependency-libraries)
    * [1.2 Adding permissions](#12-adding-permissions)
    * [1.3 Proguard rules (optional)](#13-proguard-rules-optional)
  * [2. API call flow example](#2-api-call-flow-example)
    * [2.1 Initialization](#21-initialization)
    * [2.2 Predicting an image](#22-predicting-an-image)

* [Error codes](#error-codes)

<!--te-->

# Supported systems

1. Supported Android versions: Android 5.0 (API 21) <= Android < Android 10 (API 29).

2. Supported hardware: arm64-v8a and armeabi-v7a; emulators are not supported for now.
   * Devices tested officially: Redmi K30, Vivo v1981a, Huawei oxp-an00, Huawei cdy-an90, Huawei pct-al10, Honor yal-al00, OPPO Reno5 Pro 5G
3. Other notes
   * [Image segmentation models] (1) Real-time camera inference is not provided for segmentation models yet; developers can add it themselves as needed. (2) The PP-Humanseg-Lite model was designed for landscape scenarios such as video conferencing; this Android SDK only supports portrait mode, and developers can add landscape support as needed.
   * [OCR models] When an OCR task starts for the first time, the first inference takes noticeably longer; this is expected, because model loading, preprocessing and other setup happen at that point.

> The runtime memory available when predicting images should not be too small; in general it should be more than 3 times the size of the model resource folder.

# Quick start

## 1. Project structure

Depending on your model, target chip, and operating system, download the corresponding SDK from the web UI of [PaddlePaddle open-source models](https://ai.baidu.com/easyedge/app/openSource) or from [GitHub](https://github.com/PaddlePaddle/FastDeploy). The SDK layout is as follows:

```
.EasyEdge-Android-SDK
├── app
│   ├── src/main
│   │   ├── assets
│   │   │   ├── demo
│   │   │   │   └── conf.json        # APP name
│   │   │   ├── infer                # model resource folder; one model set adapted to different hardware, OS and deployment modes
│   │   │   │   ├── model            # model structure file
│   │   │   │   ├── params           # model parameters file
│   │   │   │   ├── label_list.txt   # model label file
│   │   │   │   └── infer_cfg.json   # model pre/post-processing configuration
│   │   ├── java/com.baidu.ai.edge/demo
│   │   │   ├── infertest                        # lite test for generic ARM
│   │   │   │   ├── TestInferClassifyTask.java   # image classification
│   │   │   │   ├── TestInferDetectionTask.java  # object detection
│   │   │   │   ├── TestInferSegmentTask.java    # instance segmentation
│   │   │   │   ├── TestInferPoseTask.java       # pose estimation
│   │   │   │   ├── TestInferOcrTask.java        # OCR
│   │   │   │   └── MainActivity.java            # lite-version launcher Activity
│   │   │   ├── MainActivity.java    # Demo APP launcher Activity
│   │   │   ├── CameraActivity.java  # camera UI logic
│   │   │   └── ...
│   │   └── ...
│   ├── libs
│   │   ├── armeabi-v7a              # v7a dependency libraries
│   │   ├── arm64-v8a                # v8a dependency libraries
│   │   └── easyedge-sdk.jar         # jar file
│   └── ...
├── camera_ui                        # UI module containing the camera logic
├── README.md
└── ...                              # other gradle project files
```

## 2. Standard APP test

Because some Android development boards have no camera, this project ships both a standard version and a lite version. The standard version uses the Android camera and feeds the captured frames to AI model inference; the lite version runs on boards without a camera and requires the developer to provide the images. Pick the version that matches your hardware.

### 2.1 Try it via QR code

Scan the QR code (shown on the download page under `体验Demo`); the APP can be downloaded and tried directly on the phone, with no further dependencies.

<div align=center><img src="https://user-images.githubusercontent.com/54695910/175854064-a31755d1-52b9-416d-b35d-885b7338a6cc.png" width="600"></div>

### 2.2 Run from source

(1) Download the corresponding SDK and unpack the project.</br>
<div align=center><img src="https://user-images.githubusercontent.com/54695910/175854071-f4c17de8-83c2-434e-882d-c175f4202a2d.png" width="600"></div>
(2) Open Android Studio and click "Import Project...", i.e. File -> New -> "Import Project...", then select the unpacked directory.</br>
(3) Connect the phone to Android Studio and enable developer mode. (If you are not familiar with developer mode, look it up in a browser.)</br>
(4) Click Run; a new app is installed on the phone and behaves the same as the QR-code version.</br>

<div align=center><img src="https://user-images.githubusercontent.com/54695910/175854049-988414c7-116a-4261-a0c7-2705cc199538.png" width="400"></div>

## 3. Lite test

* Because some Android development boards have no camera, this project also provides a lite version that skips the camera and other UI logic, so it can run on camera-less boards.

* The test image paths used by the lite version are set in `src/main/java/com.baidu.ai.edge/demo/TestInfer*.java`; developers can either place images at the corresponding paths or modify the Java code.

* The lite test supports the following on generic ARM hardware: image classification, object detection, instance segmentation, pose estimation, and text recognition.

The sample code lives in the infertest directory of the app module; change the launcher Activity in app/src/main/AndroidManifest.xml to enable the test.
Before the change:

```
<activity android:name=".MainActivity">
    <intent-filter>
        <action android:name="android.intent.action.MAIN" />
        <category android:name="android.intent.category.LAUNCHER" />
    </intent-filter>
</activity>
<activity
    android:name=".CameraActivity"
    android:screenOrientation="portrait" >
</activity>
```

After the change:

```
<!-- generic ARM as an example -->
<activity android:name=".infertest.MainActivity">
    <intent-filter>
        <action android:name="android.intent.action.MAIN" />
        <category android:name="android.intent.category.LAUNCHER" />
    </intent-filter>
</activity>
```

Note: after the change there is no bundled test data, so developers need to put a test image under `app/src/main/assets/` and name it according to the requirements in `app/src/main/java/com/baidu/ai/edge/demo/infertest/TestInfer*.java`.

<div align="center">

| Demo APP detection model example | Lite-version detection model example |
| --------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
|  |  |
</div>

# Using the SDK

This section describes how to integrate the SDK into your own project.

## 1. Integration guide

Step 1: integrate the dependency libraries
Step 2: add the necessary permissions
Step 3: configure proguard (optional)

### 1.1 Integrating the dependency libraries

A. The project contains no other jar packages or so files yet:

```
// 1. Copy app/libs into the app/libs directory of your project
// 2. Following app/build.gradle, configure the supported NDK ABIs and the so library directory

android {
    ...
    defaultConfig {
        ndk {
            abiFilters 'armeabi-v7a', 'arm64-v8a'
        }
    }
    sourceSets {
        main {
            jniLibs.srcDirs = ['libs']
        }
    }
}
```

B. The project already contains other jar packages but no so files:

```
// 1. Copy app/libs/easyedge-sdk.jar next to the other jar packages
// 2. Copy the armeabi-v7a and arm64-v8a directories under app/libs into app/src/main/jniLibs
// 3. Following app/build.gradle, configure the supported NDK ABIs

android {
    ...
    defaultConfig {
        ndk {
            abiFilters 'armeabi-v7a', 'arm64-v8a'
        }
    }
}
```

C. The project already contains other jar packages and so files:

```
// 1. Copy app/libs/easyedge-sdk.jar next to the other jar packages
// 2. Merge the so files under armeabi-v7a and arm64-v8a in app/libs into the directories holding the other so files of the same architecture
// 3. Following app/build.gradle, configure the supported NDK ABIs

android {
    ...
    defaultConfig {
        ndk {
            abiFilters 'armeabi-v7a', 'arm64-v8a' // only v7a and v8a are supported; remove other architectures
        }
    }
}
```

### 1.2 Adding permissions

Add the permissions configured in app/src/main/AndroidManifest.xml.

```
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>
<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
```

### 1.3 Proguard rules (optional)

Do not obfuscate the jar package; see the configuration in app/proguard-rules.pro.

```
-keep class com.baidu.ai.edge.core.*.*{ *; }
```

## 2. API call flow example

Using image classification on generic ARM as an example (the following sections describe each step in detail):

```
try {
    // step 1-1: prepare the configuration class
    InferConfig config = new InferConfig(context.getAssets(), "infer");

    // step 1-2: prepare the prediction manager
    InferManager manager = new InferManager(context, config, "");

    // step 2-1: prepare the image to predict; it must be in Bitmap.Config.ARGB_8888 format (the usual default)
    Bitmap image = getFromSomeWhere();

    // step 2-2: predict the image
    List<ClassificationResultModel> results = manager.classify(image, 0.3f);

    // step 3: parse the results
    for (ClassificationResultModel resultModel : results) {
        Log.i(TAG, "labelIndex=" + resultModel.getLabelIndex()
                + ", labelName=" + resultModel.getLabel()
                + ", confidence=" + resultModel.getConfidence());
    }

    // step 4: release resources promptly once prediction is finished
    manager.destroy();
} catch (Exception e) {
    Log.e(TAG, e.getMessage());
}
```

### 2.1 Initialization

**Prepare the configuration class**
Chip-to-configuration-class mapping:

- Generic ARM: InferConfig

```
// example
// the second argument is the name of the model resource folder for the chip
InferConfig config = new InferConfig(context.getAssets(), "infer");
```

**Prepare the prediction manager**
Chip-to-manager mapping:

- Generic ARM: InferManager

```
// example
// the second argument is the configuration object
// keep the third argument as an empty string
InferManager manager = new InferManager(context, config, "");
```

> **Note**
>
> 1. Only one Manager may be alive at a time; to create a new Manager, call destroy() on the previous one first.
> 2. No Manager method may be called on the UI thread.
> 3. Because of thread-synchronization constraints, all member variables and methods of a Manager must be used from the same thread.

### 2.2 Predicting an image

This section lists the prediction functions and result parsing for each model type.

> **Note**
> The prediction functions can be called repeatedly, but always from the same thread; concurrent calls are not supported.
> The confidence argument of the prediction functions is optional; the model's recommended value is used by default. Pass 0 to return all results.
> The image to predict must be a Bitmap in Bitmap.Config.ARGB_8888 format.

**Image classification**

```
// prediction functions
List<ClassificationResultModel> classify(Bitmap bitmap) throws BaseException;
List<ClassificationResultModel> classify(Bitmap bitmap, float confidence) throws BaseException;

// result
ClassificationResultModel
- label: classification label, defined in label_list.txt
- labelIndex: index of the classification label
- confidence: confidence, 0-1
```

**Object detection**

```
// prediction functions
List<DetectionResultModel> detect(Bitmap bitmap) throws BaseException;
List<DetectionResultModel> detect(Bitmap bitmap, float confidence) throws BaseException;

// result
DetectionResultModel
- label: label, defined in label_list.txt
- confidence: confidence, 0-1
- bounds: Rect holding the top-left and bottom-right coordinates, i.e. the position of the object in the image
```

**Instance segmentation**

```
// prediction functions
List<SegmentationResultModel> segment(Bitmap bitmap) throws BaseException;
List<SegmentationResultModel> segment(Bitmap bitmap, float confidence) throws BaseException;

// result
SegmentationResultModel
- label: label, defined in label_list.txt
- confidence: confidence, 0-1
- lableIndex: index of the label
- box: Rect, the position of the object in the image
- mask: byte[], a 0/1 mask of the original image size; drawing the pixels whose value is 1 gives the region of the current object
- maskLEcode: run-length encoding of the mask
```

> See the [http demo](https://github.com/Baidu-AIP/EasyDL-Segmentation-Demo) for how to parse maskLEcode.

**Pose estimation**

```
// prediction function
List<PoseResultModel> pose(Bitmap bitmap) throws BaseException;

// result
PoseResultModel
- label: label, defined in label_list.txt
- confidence: confidence, 0-1
- points: Pair<Point, Point>; each pair of points forms one line
```

**Text recognition**

```
// prediction functions
List<OcrResultModel> ocr(Bitmap bitmap) throws BaseException;
List<OcrResultModel> ocr(Bitmap bitmap, float confidence) throws BaseException;

// result
OcrResultModel
- label: the recognized text
- confidence: confidence, 0-1
- points: List<Point>, the points outlining the region containing the text
```

# Error codes

| Code | Description | Details and resolution |
| ---- | ------------------------------ | ------------------------------------------------------------------------------------ |
| 1001 | The configuration file specified under assets does not exist | The SDK can use config.json under the assets directory as its configuration file; this error is raised when the provided config.json is not under assets |
| 1002 | The provided configuration file cannot be parsed as JSON, e.g. some fields are missing | Normally the config.json shipped with the demo should not be modified |
| 19xx | Internal SDK error | Contact Baidu support |
| 2001 | Only one instance of XxxxMANAGER is allowed | If an XxxxMANAGER object already exists, call its destroy method first |
| 2002 | destroy has already been called on this XxxxMANAGER | No further method may be called on a DETECT_MANAGER object whose destroy method has been called |
| 2003 | The model file path under assets is null | XxxxConfig.getModelFileAssetPath() returned null, caused by setModelFileAssetPath(null) |
| 2011 | libedge-xxxx.so failed to load | System.loadLibrary("edge-xxxx"); libedge-xxxx.so is not in the apk. Only the armeabi-v7a and arm64-v8a CPU architectures are supported |
| 2012 | JNI memory error | Not enough heap memory |
| 2103 | License expired | The license is invalid or the system time is wrong |
| 2601 | Failed to open the model file under assets | Check whether the model file exists, following the error message |
| 2611 | When detecting an image, the binary passed to the engine does not match its width and height | See the error message |
| 27xx | Internal SDK error | Contact Baidu support |
| 28xx | Internal engine error | Contact Baidu support |
| 29xx | Internal SDK error | Contact Baidu support |
| 3000 | Failed to load an so file | Make sure all so files are present in the apk |
| 3001 | Failed to load the model | Make sure the model is placed at a loadable, valid path and config.json is configured correctly |
| 3002 | Failed to unload the model | Contact Baidu support |
| 3003 | Error while calling the model | A classification interface was called while the model or the so libraries were not loaded correctly |
| 50xx | Online-mode call exception | Contact Baidu support |
@@ -1,404 +0,0 @@
|
|||||||
# 简介
|
|
||||||
|
|
||||||
本文档介绍FastDeploy中的模型SDK,在ARM Linux C++环境下 : (1)推理部署步骤; (2)介绍模型推流全流程API,方便开发者了解项目后二次开发。
|
|
||||||
其中ARM Linux Python请参考[ARM Linux Python环境下的推理部署](./arm_linux_python_sdk_inference.md)文档。
|
|
||||||
|
|
||||||
**注意**:部分模型(如Tinypose、OCR等)仅支持图像推理,不支持视频推理。
|
|
||||||
|
|
||||||
<!--ts-->
|
|
||||||
|
|
||||||
* [简介](#简介)
|
|
||||||
|
|
||||||
* [环境准备](#环境准备)
|
|
||||||
|
|
||||||
* [1. 硬件支持](#1-硬件支持)
|
|
||||||
* [2. 软件环境](#2-软件环境)
|
|
||||||
|
|
||||||
* [快速开始](#快速开始)
|
|
||||||
|
|
||||||
* [1. 项目结构说明](#1-项目结构说明)
|
|
||||||
* [2. 测试Demo](#2-测试demo)
|
|
||||||
* [2.1 预测图像](#21-预测图像)
|
|
||||||
* [2.2 预测视频流](#22-预测视频流)
|
|
||||||
|
|
||||||
* [预测API流程详解](#预测api流程详解)
|
|
||||||
|
|
||||||
* [1. SDK参数运行配置](#1-sdk参数运行配置)
|
|
||||||
* [2. 初始化Predictor](#2-初始化predictor)
|
|
||||||
* [3. 预测推理](#3-预测推理)
|
|
||||||
* [3.1 预测图像](#31-预测图像)
|
|
||||||
* [3.2 预测视频](#32-预测视频)
|
|
||||||
|
|
||||||
* [FAQ](#faq)
|
|
||||||
|
|
||||||
<!--te-->
|
|
||||||
|
|
||||||
# 环境准备
|
|
||||||
|
|
||||||
## 1. 硬件支持
|
|
||||||
|
|
||||||
目前支持的ARM架构:aarch64 、armv7hf
|
|
||||||
|
|
||||||
## 2. 软件环境
|
|
||||||
|
|
||||||
1.运行二进制文件-环境要求
|
|
||||||
|
|
||||||
* gcc: 5.4 以上 (GLIBCXX_3.4.22)
|
|
||||||
* Linux下查看gcc版本命名(可能因系统差异命令会不同):`gcc --version`
|
|
||||||
* Linux下C++基础库GLIBCXX的命令(因系统差异,库路径会有不同):`strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX`
|
|
||||||
* glibc:2.23以上
|
|
||||||
* Linux查看命令:`ldd --version`
|
|
||||||
|
|
||||||
2.二次开发编译-环境要求
|
|
||||||
|
|
||||||
编译源代码时,除gcc、GLIBCXX、glibc满足`1.运行二进制文件-环境要求`外,cmake需满足:
|
|
||||||
|
|
||||||
* cmake: 3.0 以上
|
|
||||||
|
|
||||||
* Linux查看命令:`cmake --version`
|
|
||||||
|
|
||||||
# 快速开始
|
|
||||||
|
|
||||||
## 1. 项目结构说明
|
|
||||||
|
|
||||||
根据开发者模型、部署芯片、操作系统需要,在图像界面[飞桨开源模型](https://ai.baidu.com/easyedge/app/openSource)或[GIthub](https://github.com/PaddlePaddle/FastDeploy)中选择对应的SDK进行下载。SDK目录结构如下:
|
|
||||||
|
|
||||||
```
|
|
||||||
.EasyEdge-Linux-m43157-b97741-x86
|
|
||||||
├── RES # 模型资源文件夹,一套模型适配不同硬件、OS和部署方式
|
|
||||||
│ ├── conf.json # Android、iOS系统APP名字需要
|
|
||||||
│ ├── model # 模型结构文件
|
|
||||||
│ ├── params # 模型参数文件
|
|
||||||
│ ├── label_list.txt # 模型标签文件
|
|
||||||
│ ├── infer_cfg.json # 模型前后处理等配置文件
|
|
||||||
├── ReadMe.txt
|
|
||||||
├── cpp # C++ SDK 文件结构
|
|
||||||
└── baidu_easyedge_ocr_linux_cpp_aarch64_ARM_gcc5.4_v1.5.1_20220530.tar.gz #armv8架构硬件的C++包,根据自己硬件,选择对应的压缩包解压即可
|
|
||||||
├── ReadMe.txt
|
|
||||||
├── bin # 可直接运行的二进制文件
|
|
||||||
├── include # 二次开发用的头文件
|
|
||||||
├── lib # 二次开发用的所依赖的库
|
|
||||||
├── src # 二次开发用的示例工程
|
|
||||||
└── thirdparty # 第三方依赖
|
|
||||||
└── baidu_easyedge_ocr_linux_cpp_armv7l_armv7hf_ARM_gcc5.4_v1.5.1_20220530.tar.gz #armv7架构硬件的C++包,根据自己硬件,选择对应的压缩包解压即可
|
|
||||||
└── python # Python SDK 文件
|
|
||||||
```
|
|
||||||
|
|
||||||
**注意**:
|
|
||||||
|
|
||||||
1. 【OCR需要编译】因为OCR任务的特殊性,本次SDK没有提供bin文件夹可执行文件。开发者根据需要,满足文档中gcc和cmake要求后,在`src/demo*`路径编译获取可执行文件,具体可参考。
|
|
||||||
2. 【OCR仅支持图像推理,不支持视频流推理】
|
|
||||||
3. ARM-Linux-Python的环境要求和使用,请参考[ARM Linux Python环境下的推理部署](./arm_linux_python_sdk_inference.md)文档。
|
|
||||||
|
|
||||||
## 2. 测试Demo
|
|
||||||
|
|
||||||
> 模型资源文件(即压缩包中的RES文件夹)默认已经打包在开发者下载的SDK包中,请先将tar包整体拷贝到具体运行的设备中,再解压缩使用。
|
|
||||||
|
|
||||||
SDK中已经包含预先编译的二进制,可直接运行。以下运行示例均是`cd cpp/bin`路径下执行的结果。
|
|
||||||
|
|
||||||
### 2.1 预测图像
|
|
||||||
|
|
||||||
```bash
|
|
||||||
./easyedge_image_inference {模型RES文件夹路径} {测试图片路径}
|
|
||||||
```
|
|
||||||
|
|
||||||
运行效果示例:
|
|
||||||
|
|
||||||
<div align=center><img src="https://user-images.githubusercontent.com/54695910/175855351-68d1a4f0-6226-4484-b190-65f1ac2c7128.png" width="400"></div>
|
|
||||||
|
|
||||||
```bash
|
|
||||||
> ./easyedge_image_inference ../../../../RES 2.jpeg
|
|
||||||
2019-02-13 16:46:12,659 INFO [EasyEdge] [easyedge.cpp:34] 140606189016192 Baidu EasyEdge Linux Development Kit 0.2.1(20190213)
|
|
||||||
2019-02-13 16:46:14,083 INFO [EasyEdge] [paddlev2_edge_predictor.cpp:60] 140606189016192 Allocate graph success.
|
|
||||||
2019-02-13 16:46:14,326 DEBUG [EasyEdge] [paddlev2_edge_predictor.cpp:143] 140606189016192 Inference costs 168 ms
|
|
||||||
1, 1:txt_frame, p:0.994905 loc: 0.168161, 0.153654, 0.920856, 0.779621
|
|
||||||
Done
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2.2 预测视频流
|
|
||||||
|
|
||||||
```
|
|
||||||
./easyedge_video_inference {模型RES文件夹路径} {video_type} {video_src_path}
|
|
||||||
```
|
|
||||||
|
|
||||||
其中 video_type 支持三种:
|
|
||||||
|
|
||||||
```
|
|
||||||
video_type : 1 // 本地视频文件
|
|
||||||
video_type : 2 // 摄像头的index
|
|
||||||
video_type : 3 // 网络视频流
|
|
||||||
```
|
|
||||||
|
|
||||||
video_src_path: 为 video_type 数值所对应的本地视频路径 、本地摄像头id、网络视频流地址,如:
|
|
||||||
|
|
||||||
```
|
|
||||||
本地视频文件: ./easyedge_video_inference {模型RES文件夹路径} 1 ~/my_video_file.mp4
|
|
||||||
本地摄像头: ./easyedge_video_inference {模型RES文件夹路径} 2 1 #/dev/video1
|
|
||||||
网络视频流: ./easyedge_video_inference {模型RES文件夹路径} 3 rtmp://192.168.x.x:8733/live/src
|
|
||||||
```
|
|
||||||
|
|
||||||
注:以上路径是假模拟路径,开发者需要根据自己实际图像/视频,准备测试图像,并填写正确的测试路径。
|
|
||||||
|
|
||||||
# 预测API流程详解
|
|
||||||
|
|
||||||
本章节主要结合[2.测试Demo](#4)的Demo示例介绍推理API,方便开发者学习后二次开发。更详细的API请参考`include/easyedge/easyedge*.h`文件。图像、视频的推理包含以下3个API,如下代码片段`step`注释所示。
|
|
||||||
|
|
||||||
> ❗注意:<br>
|
|
||||||
> (1)`src`文件夹中包含完整可编译的cmake工程实例,建议开发者先行了解[cmake工程基本知识](https://cmake.org/cmake/help/latest/guide/tutorial/index.html)。 <br>
|
|
||||||
> (2)请优先参考SDK中自带的Demo工程的使用流程和说明。遇到错误,请优先参考文件中的注释、解释、日志说明。
|
|
||||||
|
|
||||||
```cpp
|
|
||||||
// step 1: SDK配置运行参数
|
|
||||||
EdgePredictorConfig config;
|
|
||||||
config.model_dir = {模型文件目录};
|
|
||||||
|
|
||||||
// step 2: 创建并初始化Predictor;这这里选择合适的引擎
|
|
||||||
auto predictor = global_controller()->CreateEdgePredictor(config);
|
|
||||||
|
|
||||||
// step 3-1: 预测图像
|
|
||||||
auto img = cv::imread({图片路径});
|
|
||||||
std::vector<EdgeResultData> results;
|
|
||||||
predictor->infer(img, results);
|
|
||||||
|
|
||||||
// step 3-2: 预测视频
|
|
||||||
std::vector<EdgeResultData> results;
|
|
||||||
FrameTensor frame_tensor;
|
|
||||||
VideoConfig video_config;
|
|
||||||
video_config.source_type = static_cast<SourceType>(video_type); // source_type 定义参考头文件 easyedge_video.h
|
|
||||||
video_config.source_value = video_src;
|
|
||||||
/*
|
|
||||||
... more video_configs, 根据需要配置video_config的各选项
|
|
||||||
*/
|
|
||||||
auto video_decoding = CreateVideoDecoding(video_config);
|
|
||||||
while (video_decoding->next(frame_tensor) == EDGE_OK) {
|
|
||||||
results.clear();
|
|
||||||
if (frame_tensor.is_needed) {
|
|
||||||
predictor->infer(frame_tensor.frame, results);
|
|
||||||
render(frame_tensor.frame, results, predictor->model_info().kind);
|
|
||||||
}
|
|
||||||
//video_decoding->display(frame_tensor); // 显示当前frame,需在video_config中开启配置
|
|
||||||
//video_decoding->save(frame_tensor); // 存储当前frame到视频,需在video_config中开启配置
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
若需自定义library search path或者gcc路径,修改对应Demo工程下的CMakeList.txt即可。
|
|
||||||
|
|
||||||
## 1. SDK参数运行配置
|
|
||||||
|
|
||||||
SDK的参数通过`EdgePredictorConfig::set_config`和`global_controller()->set_config`配置。本Demo 中设置了模型路径,其他参数保留默认参数。更详细的支持运行参数等,可以参考开发工具包中的头文件(`include/easyedge/easyedge_xxxx_config.h`)的详细说明。
|
|
||||||
|
|
||||||
配置参数使用方法如下:
|
|
||||||
|
|
||||||
```
|
|
||||||
EdgePredictorConfig config;
|
|
||||||
config.model_dir = {模型文件目录};
|
|
||||||
```
|
|
||||||
|
|
||||||
## 2. 初始化Predictor

* 接口

```cpp
auto predictor = global_controller()->CreateEdgePredictor(config);
predictor->init();
```

若返回非0,请查看输出日志排查错误原因。
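
下面给出一段检查初始化返回值的示意代码(仅为示意,模型路径等取值为假设值,错误码含义以SDK头文件和日志为准):

```cpp
// 示意:创建Predictor并检查init()返回值,非0时打印错误码便于排查(需包含 <iostream>)
EdgePredictorConfig config;
config.model_dir = "./RES";  // 假设的模型资源路径,请按实际情况修改
auto predictor = global_controller()->CreateEdgePredictor(config);
int ret = predictor->init();
if (ret != 0) {
    std::cerr << "predictor init failed, code = " << ret << std::endl;  // 结合日志排查模型路径、依赖库等问题
}
```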
|
|
||||||
|
|
||||||
## 3. 预测推理
|
|
||||||
|
|
||||||
### 3.1 预测图像
|
|
||||||
|
|
||||||
> 在Demo中展示了预测接口infer()传入cv::Mat& image图像内容,并将推理结果赋值给std::vector<EdgeResultData>& result。更多关于infer()的使用,可以参考`easyedge.h`头文件中的参数说明,按实际情况自行传入需要的内容做推理。
|
|
||||||
|
|
||||||
* 接口输入
|
|
||||||
|
|
||||||
```cpp
|
|
||||||
/**
|
|
||||||
* @brief
|
|
||||||
* 通用接口
|
|
||||||
* @param image: must be BGR , HWC format (opencv default)
|
|
||||||
* @param result
|
|
||||||
* @return
|
|
||||||
*/
|
|
||||||
virtual int infer(cv::Mat& image, std::vector<EdgeResultData>& result) = 0;
|
|
||||||
```
|
|
||||||
|
|
||||||
图片的格式务必为opencv默认的BGR, HWC格式。
|
|
||||||
|
|
||||||
* 接口返回
|
|
||||||
|
|
||||||
`EdgeResultData`中可以获取对应的分类信息、位置信息。
|
|
||||||
|
|
||||||
```cpp
|
|
||||||
struct EdgeResultData {
|
|
||||||
int index; // 分类结果的index
|
|
||||||
std::string label; // 分类结果的label
|
|
||||||
float prob; // 置信度
|
|
||||||
|
|
||||||
// 物体检测 或 图像分割时使用:
|
|
||||||
float x1, y1, x2, y2; // (x1, y1): 左上角, (x2, y2): 右下角; 均为0~1的长宽比例值。
|
|
||||||
|
|
||||||
// 图像分割时使用:
|
|
||||||
cv::Mat mask; // 0, 1 的mask
|
|
||||||
std::string mask_rle; // Run Length Encoding,游程编码的mask
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
***关于矩形坐标***

x1 * 图片宽度 = 检测框的左上角的横坐标

y1 * 图片高度 = 检测框的左上角的纵坐标

x2 * 图片宽度 = 检测框的右下角的横坐标

y2 * 图片高度 = 检测框的右下角的纵坐标

***关于图像分割mask***

```
cv::Mat mask为图像掩码的二维数组
{
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
}
其中1代表为目标区域,0代表非目标区域
```

***关于图像分割mask_rle***

该字段返回了mask的游程编码,解析方式可参考 [http demo](https://github.com/Baidu-AIP/EasyDL-Segmentation-Demo)

以上字段可以参考demo文件中使用opencv绘制的逻辑进行解析。
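
作为补充,下面给出一段将归一化坐标换算为像素坐标并用OpenCV绘制检测框的示意代码(假设`img`为`cv::imread`读入的原图,`results`为`infer()`的输出):

```cpp
// 示意:按 x * 图片宽、y * 图片高 的方式换算像素坐标,并绘制检测框与标签
for (const auto &res : results) {
    cv::Point left_top(static_cast<int>(res.x1 * img.cols),
                       static_cast<int>(res.y1 * img.rows));
    cv::Point right_bottom(static_cast<int>(res.x2 * img.cols),
                           static_cast<int>(res.y2 * img.rows));
    cv::rectangle(img, left_top, right_bottom, cv::Scalar(0, 255, 0), 2);
    cv::putText(img, res.label, left_top, cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0));
}
cv::imwrite("result.jpg", img);
```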
|
|
||||||
|
|
||||||
### 3.2 预测视频
|
|
||||||
|
|
||||||
SDK 提供了支持摄像头读取、视频文件和网络视频流的解析工具类`VideoDecoding`,此类提供了获取视频帧数据的便利函数。通过`VideoConfig`结构体可以控制视频/摄像头的解析策略、抽帧策略、分辨率调整、结果视频存储等功能。对于抽取到的视频帧可以直接作为SDK infer 接口的参数进行预测。
|
|
||||||
|
|
||||||
* 接口输入
|
|
||||||
|
|
||||||
class`VideoDecoding`:
|
|
||||||
|
|
||||||
```
|
|
||||||
/**
|
|
||||||
* @brief 获取输入源的下一帧
|
|
||||||
* @param frame_tensor
|
|
||||||
* @return
|
|
||||||
*/
|
|
||||||
virtual int next(FrameTensor &frame_tensor) = 0;
|
|
||||||
|
|
||||||
/**
|
|
||||||
* @brief 显示当前frame_tensor中的视频帧
|
|
||||||
* @param frame_tensor
|
|
||||||
* @return
|
|
||||||
*/
|
|
||||||
virtual int display(const FrameTensor &frame_tensor) = 0;
|
|
||||||
|
|
||||||
/**
|
|
||||||
* @brief 将当前frame_tensor中的视频帧写为本地视频文件
|
|
||||||
* @param frame_tensor
|
|
||||||
* @return
|
|
||||||
*/
|
|
||||||
virtual int save(FrameTensor &frame_tensor) = 0;
|
|
||||||
|
|
||||||
/**
|
|
||||||
* @brief 获取视频的fps属性
|
|
||||||
* @return
|
|
||||||
*/
|
|
||||||
virtual int get_fps() = 0;
|
|
||||||
/**
|
|
||||||
* @brief 获取视频的width属性
|
|
||||||
* @return
|
|
||||||
*/
|
|
||||||
virtual int get_width() = 0;
|
|
||||||
|
|
||||||
/**
|
|
||||||
* @brief 获取视频的height属性
|
|
||||||
* @return
|
|
||||||
*/
|
|
||||||
virtual int get_height() = 0;
|
|
||||||
```
|
|
||||||
|
|
||||||
struct `VideoConfig`
|
|
||||||
|
|
||||||
```
|
|
||||||
/**
|
|
||||||
* @brief 视频源、抽帧策略、存储策略的设置选项
|
|
||||||
*/
|
|
||||||
struct VideoConfig {
|
|
||||||
SourceType source_type; // 输入源类型
|
|
||||||
std::string source_value; // 输入源地址,如视频文件路径、摄像头index、网络流地址
|
|
||||||
int skip_frames{0}; // 设置跳帧,每隔skip_frames帧抽取一帧,并把该抽取帧的is_needed置为true
|
|
||||||
int retrieve_all{false}; // 是否抽取所有frame以便于作为显示和存储,对于不满足skip_frames策略的frame,把所抽取帧的is_needed置为false
|
|
||||||
int input_fps{0}; // 在采取抽帧之前设置视频的fps
|
|
||||||
Resolution resolution{Resolution::kAuto}; // 采样分辨率,只对camera有效
|
|
||||||
|
|
||||||
bool enable_display{false}; // 默认不支持。
|
|
||||||
std::string window_name{"EasyEdge"};
|
|
||||||
bool display_all{false}; // 是否显示所有frame,若为false,仅显示根据skip_frames抽取的frame
|
|
||||||
|
|
||||||
bool enable_save{false};
|
|
||||||
std::string save_path; // frame存储为视频文件的路径
|
|
||||||
bool save_all{false}; // 是否存储所有frame,若为false,仅存储根据skip_frames抽取的frame
|
|
||||||
|
|
||||||
std::map<std::string, std::string> conf;
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
| 序号 | 字段 | 含义 |
|
|
||||||
| --- | -------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
|
|
||||||
| 1 | `source_type` | 输入源类型,支持视频文件、摄像头、网络视频流三种,值分别为1、2、3 |
|
|
||||||
| 2 | `source_value` | 若`source_type`为视频文件,该值为指向视频文件的完整路径;若`source_type`为摄像头,该值为摄像头的index,如对于`/dev/video0`的摄像头,则index为0;若`source_type`为网络视频流,则为该视频流的完整地址。 |
|
|
||||||
| 3 | `skip_frames` | 设置跳帧,每隔skip_frames帧抽取一帧,并把该抽取帧的is_needed置为true,标记为is_needed的帧是用来做预测的帧。反之,直接跳过该帧,不经过预测。 |
|
|
||||||
| 4 | `retrieve_all` | 若置该项为true,则无论是否设置跳帧,所有的帧都会被抽取返回,以作为显示或存储用。 |
|
|
||||||
| 5 | `input_fps` | 用于抽帧前设置fps |
|
|
||||||
| 6 | `resolution` | 设置摄像头采样的分辨率,其值请参考`easyedge_video.h`中的定义,注意该分辨率调整仅对输入源为摄像头时有效 |
|
|
||||||
| 7 | `conf` | 高级选项。部分配置会通过该map来设置 |
|
|
||||||
|
|
||||||
***注意:***

1. `VideoConfig`默认不支持`display`功能。如果需要使用`VideoConfig`的`display`功能,需要自行编译带有GTK选项的OpenCV。

2. 使用摄像头抽帧时,如果通过`resolution`设置了分辨率调整,但是不起作用,请添加如下选项:

```
video_config.conf["backend"] = "2";
```

3. 部分设备上的CSI摄像头尚未兼容,如遇到问题,可以通过工单、QQ交流群或微信交流群反馈。

具体接口调用流程,可以参考SDK中的`demo_video_inference`。
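
下面给出一段使用摄像头输入、每隔若干帧抽取一帧做推理的示意代码(取值仅为示例,`predictor`为前文创建的预测器,枚举定义以`easyedge_video.h`为准):

```cpp
// 示意:source_type取2表示摄像头,source_value为摄像头index;每隔4帧抽取1帧做推理
VideoConfig video_config;
video_config.source_type = static_cast<SourceType>(2);  // 枚举定义参考 easyedge_video.h
video_config.source_value = "0";                         // 对应 /dev/video0
video_config.skip_frames = 4;
video_config.conf["backend"] = "2";                      // 分辨率设置不生效时可尝试该选项

auto video_decoding = CreateVideoDecoding(video_config);
FrameTensor frame_tensor;
std::vector<EdgeResultData> results;
while (video_decoding->next(frame_tensor) == EDGE_OK) {
    if (!frame_tensor.is_needed) {
        continue;  // 不满足skip_frames策略的帧直接跳过
    }
    results.clear();
    predictor->infer(frame_tensor.frame, results);
}
```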
|
|
||||||
|
|
||||||
# FAQ
|
|
||||||
|
|
||||||
1. 如何处理一些 undefined reference / error while loading shared libraries?
|
|
||||||
|
|
||||||
> 如:./easyedge_demo: error while loading shared libraries: libeasyedge.so.1: cannot open shared object file: No such file or directory
|
|
||||||
|
|
||||||
遇到该问题时,请找到具体的库的位置,设置LD_LIBRARY_PATH;或者安装缺少的库。
|
|
||||||
|
|
||||||
> 示例一:libverify.so.1: cannot open shared object file: No such file or directory
|
|
||||||
> 链接找不到libverify.so文件,一般可通过 export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:../../lib 解决(实际冒号后面添加的路径以libverify.so文件所在的路径为准)
|
|
||||||
|
|
||||||
> 示例二:libopencv_videoio.so.4.5: cannot open shared object file: No such file or directory
|
|
||||||
> 链接找不到libopencv_videoio.so文件,一般可通过 export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:../../thirdparty/opencv/lib 解决(实际冒号后面添加的路径以libopencv_videoio.so所在路径为准)
|
|
||||||
|
|
||||||
> 示例三:GLIBCXX_X.X.X not found
|
|
||||||
> 链接无法找到glibc版本,请确保系统gcc版本>=SDK的gcc版本。升级gcc/glibc可以百度搜索相关文献。
|
|
||||||
|
|
||||||
2. 运行二进制时,提示 libverify.so cannot open shared object file
|
|
||||||
|
|
||||||
可能cmake没有正确设置rpath, 可以设置LD_LIBRARY_PATH为sdk的lib文件夹后,再运行:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:../lib ./easyedge_demo
|
|
||||||
```
|
|
||||||
|
|
||||||
3. 编译时报错:file format not recognized
|
|
||||||
|
|
||||||
可能是因为在复制SDK时文件信息丢失。请将整个压缩包复制到目标设备中,再解压缩、编译。
|
|
@@ -1,318 +0,0 @@
|
|||||||
# 简介
|
|
||||||
|
|
||||||
本文档介绍FastDeploy中的模型SDK,在ARM Linux C++环境下:(1)服务化推理部署步骤;(2)介绍模型推流全流程API,方便开发者了解项目后二次开发。
|
|
||||||
其中ARM Linux Python请参考[ARM Linux Python环境下的HTTP推理部署](./arm_linux_python_sdk_serving.md)文档。
|
|
||||||
|
|
||||||
**注意**:部分模型(如OCR等)不支持服务化推理。
|
|
||||||
|
|
||||||
<!--ts-->
|
|
||||||
|
|
||||||
* [简介](#简介)
|
|
||||||
|
|
||||||
* [安装准备](#安装准备)
|
|
||||||
|
|
||||||
* [1. 硬件支持](#1-硬件支持)
|
|
||||||
* [2. 软件环境](#2-软件环境)
|
|
||||||
|
|
||||||
* [快速开始](#快速开始)
|
|
||||||
|
|
||||||
* [1. 项目结构说明](#1-项目结构说明)
|
|
||||||
* [2. 测试 HTTP Demo](#2-测试-http-demo)
|
|
||||||
* [2.1 启动HTTP预测服务](#21-启动http预测服务)
|
|
||||||
|
|
||||||
* [HTTP API流程详解](#http-api流程详解)
|
|
||||||
|
|
||||||
* [1. 开启http服务](#1-开启http服务)
|
|
||||||
* [2. 请求http服务](#2-请求http服务)
|
|
||||||
* [2.1 http 请求方式一:不使用图片base64格式](#21-http-请求方式一不使用图片base64格式)
|
|
||||||
* [2.2 http 请求方法二:使用图片base64格式](#22-http-请求方法二使用图片base64格式)
|
|
||||||
* [3. http返回数据](#3-http返回数据)
|
|
||||||
|
|
||||||
* [FAQ](#faq)
|
|
||||||
|
|
||||||
<!--te-->
|
|
||||||
|
|
||||||
# 安装准备
|
|
||||||
|
|
||||||
## 1. 硬件支持
|
|
||||||
|
|
||||||
目前支持的ARM架构:aarch64 、armv7hf
|
|
||||||
|
|
||||||
## 2. 软件环境
|
|
||||||
|
|
||||||
1.运行二进制文件-环境要求
|
|
||||||
|
|
||||||
* gcc: 5.4 以上 (GLIBCXX_3.4.22)
|
|
||||||
* Linux下查看gcc版本命名(可能因系统差异命令会不同):`gcc --version`;
|
|
||||||
* Linux下C++基础库GLIBCXX的命令(可能因系统差异路径会有不同,可检测自己环境下的情况):`strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX`
|
|
||||||
* glibc:2.23以上
|
|
||||||
* Linux查看命令:`ldd --version`
|
|
||||||
|
|
||||||
2.二次开发编译-环境要求
|
|
||||||
|
|
||||||
编译源代码时,除了gcc、GLIBCXX、glibc满足`1.运行二进制文件-环境要求`外,还需要cmake满足要求。
|
|
||||||
|
|
||||||
* cmake: 3.0 以上
|
|
||||||
|
|
||||||
* Linux查看命令:`cmake --version`
|
|
||||||
|
|
||||||
# 快速开始
|
|
||||||
|
|
||||||
## 1. 项目结构说明
|
|
||||||
|
|
||||||
根据开发者模型、部署芯片、操作系统需要,在图像界面[飞桨开源模型](https://ai.baidu.com/easyedge/app/openSource)或[GIthub](https://github.com/PaddlePaddle/FastDeploy)中选择对应的SDK进行下载。解压后SDK目录结构如下:
|
|
||||||
|
|
||||||
```
|
|
||||||
.EasyEdge-Linux-m43157-b97741-x86
|
|
||||||
├── RES # 模型资源文件夹,一套模型适配不同硬件、OS和部署方式
|
|
||||||
│ ├── conf.json # Android、iOS系统APP名字需要
|
|
||||||
│ ├── model # 模型结构文件
|
|
||||||
│ ├── params # 模型参数文件
|
|
||||||
│ ├── label_list.txt # 模型标签文件
|
|
||||||
│ ├── infer_cfg.json # 模型前后处理等配置文件
|
|
||||||
├── ReadMe.txt
|
|
||||||
├── cpp # C++ SDK 文件结构
|
|
||||||
└── baidu_easyedge_linux_cpp_x86_64_CPU.Generic_gcc5.4_v1.4.0_20220325.tar.gz
|
|
||||||
├── bin # 可直接运行的二进制文件
|
|
||||||
├── include # 二次开发用的头文件
|
|
||||||
├── lib # 二次开发用的所依赖的库
|
|
||||||
├── src # 二次开发用的示例工程
|
|
||||||
└── thirdparty # 第三方依赖
|
|
||||||
└── python # Python SDK 文件
|
|
||||||
```
|
|
||||||
|
|
||||||
## 2. 测试 HTTP Demo
|
|
||||||
|
|
||||||
> 模型资源文件(即压缩包中的RES文件夹)默认已经打包在开发者下载的SDK包中,请先将tar包整体拷贝到具体运行的设备中,再解压缩使用。
|
|
||||||
|
|
||||||
SDK中已经包含预先编译的二进制,可直接运行。以下运行示例均是`cd cpp/bin`路径下执行的结果。
|
|
||||||
|
|
||||||
### 2.1 启动HTTP预测服务
|
|
||||||
|
|
||||||
```
|
|
||||||
./easyedge_serving {模型RES文件夹路径}
|
|
||||||
```
|
|
||||||
|
|
||||||
启动后,日志中会显示如下设备IP和24401端口号信息:
|
|
||||||
|
|
||||||
```
|
|
||||||
HTTP is now serving at 0.0.0.0:24401
|
|
||||||
```
|
|
||||||
|
|
||||||
此时,开发者可以打开浏览器,输入链接地址`http://0.0.0.0:24401`(这里的`设备IP和24401端口号`根据开发者电脑显示修改),选择图片来进行测试。
|
|
||||||
|
|
||||||
<div align=center><img src="https://user-images.githubusercontent.com/54695910/175855495-cd8d46ec-2492-4297-b3e4-2bda4cd6727c.png" width="600"></div>
|
|
||||||
|
|
||||||
同时,可以调用HTTP接口来访问服务,具体参考下文的[二次开发](#10)接口说明。
|
|
||||||
|
|
||||||
# HTTP API流程详解
|
|
||||||
|
|
||||||
本章节主要结合[2.1 HTTP Demo]()的API介绍,方便开发者学习并将运行库嵌入到开发者的程序当中,更详细的API请参考`include/easyedge/easyedge*.h`文件。http服务包含服务端和客户端,目前支持的能力包括以下几种方式,Demo中提供了不使用图片base64格式的`方式一:浏览器请求的方式`,其他几种方式开发者根据个人需要,选择开发。
|
|
||||||
|
|
||||||
## 1. 开启http服务

http服务的启动可直接使用`bin/easyedge_serving`,或参考`src/demo_serving.cpp`文件修改相关逻辑

```cpp
/**
 * @brief 开启一个简单的demo http服务。
 * 该方法会block直到收到sigint/sigterm。
 * http服务里,图片的解码运行在cpu之上,可能会降低推理速度。
 * @tparam ConfigT
 * @param config
 * @param host
 * @param port
 * @param service_id service_id user parameter, uri '/get/service_id' will respond this value with 'text/plain'
 * @param instance_num 实例数量,根据内存/显存/时延要求调整
 * @return
 */
template<typename ConfigT>
int start_http_server(
        const ConfigT &config,
        const std::string &host,
        int port,
        const std::string &service_id,
        int instance_num = 1);
```
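
下面是一段调用`start_http_server`启动服务的示意代码(host、port、service_id等取值仅为示例,完整逻辑请以`src/demo_serving.cpp`为准):

```cpp
// 示意:以默认实例数启动HTTP服务,监听 0.0.0.0:24401;该调用会阻塞直至收到 sigint/sigterm
EdgePredictorConfig config;
config.model_dir = "./RES";  // 模型资源目录,按实际路径修改
int ret = start_http_server(config, "0.0.0.0", 24401, "serving_demo", 1);
// 返回非0时请结合日志排查端口占用、模型路径等问题
```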
|
|
||||||
|
|
||||||
## 2. 请求http服务
|
|
||||||
|
|
||||||
> 开发者可以打开浏览器,`http://{设备ip}:24401`,选择图片来进行测试。
|
|
||||||
|
|
||||||
### 2.1 http 请求方式一:不使用图片base64格式

URL中的get参数:

| 参数 | 说明 | 默认值 |
| --------- | --------- | ---------------- |
| threshold | 阈值过滤, 0~1 | 如不提供,则会使用模型的推荐阈值 |

HTTP POST Body即为图片的二进制内容(无需base64, 无需json)

Python请求示例

```Python
import requests

with open('./1.jpg', 'rb') as f:
    img = f.read()
result = requests.post(
    'http://127.0.0.1:24401/',
    params={'threshold': 0.1},
    data=img).json()
```
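
如果客户端是C++程序,也可以使用libcurl发送同样的请求,下面是一段简化的示意代码(URL、图片路径均为示例;如下文FAQ所述,建议附带空的`Expect:`头避免100-continue等待):

```cpp
#include <curl/curl.h>
#include <fstream>
#include <string>
#include <vector>

int main() {
    // 读取图片的二进制内容作为POST Body
    std::ifstream fin("./1.jpg", std::ios::binary);
    std::vector<char> img((std::istreambuf_iterator<char>(fin)),
                          std::istreambuf_iterator<char>());

    CURL *curl = curl_easy_init();
    struct curl_slist *headers = nullptr;
    headers = curl_slist_append(headers, "Expect:");  // 避免100-continue带来的等待

    curl_easy_setopt(curl, CURLOPT_URL, "http://127.0.0.1:24401/?threshold=0.1");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, img.data());
    curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, static_cast<long>(img.size()));
    CURLcode res = curl_easy_perform(curl);  // 返回的JSON默认打印到标准输出

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return res == CURLE_OK ? 0 : 1;
}
```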
|
|
||||||
|
|
||||||
### 2.2 http 请求方法二:使用图片base64格式
|
|
||||||
|
|
||||||
HTTP方法:POST
|
|
||||||
Header如下:
|
|
||||||
|
|
||||||
| 参数 | 值 |
|
|
||||||
| ------------ | ---------------- |
|
|
||||||
| Content-Type | application/json |
|
|
||||||
|
|
||||||
**Body请求填写**:
|
|
||||||
|
|
||||||
* 分类网络:
|
|
||||||
body 中请求示例
|
|
||||||
|
|
||||||
```
|
|
||||||
{
|
|
||||||
"image": "<base64数据>"
|
|
||||||
"top_num": 5
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
body中参数详情
|
|
||||||
|
|
||||||
| 参数 | 是否必选 | 类型 | 可选值范围 | 说明 |
|
|
||||||
| ------- | ---- | ------ | ----- | ----------------------------------------------------------------------------------- |
|
|
||||||
| image | 是 | string | - | 图像数据,base64编码,要求base64图片编码后大小不超过4M,最短边至少15px,最长边最大4096px,支持jpg/png/bmp格式 **注意去掉头部** |
|
|
||||||
| top_num | 否 | number | - | 返回分类数量,不填该参数,则默认返回全部分类结果 |
|
|
||||||
|
|
||||||
* 检测和分割网络:
|
|
||||||
Body请求示例:
|
|
||||||
|
|
||||||
```
|
|
||||||
{
|
|
||||||
"image": "<base64数据>"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
body中参数详情:
|
|
||||||
|
|
||||||
| 参数 | 是否必选 | 类型 | 可选值范围 | 说明 |
|
|
||||||
| --------- | ---- | ------ | ----- | ----------------------------------------------------------------------------------- |
|
|
||||||
| image | 是 | string | - | 图像数据,base64编码,要求base64图片编码后大小不超过4M,最短边至少15px,最长边最大4096px,支持jpg/png/bmp格式 **注意去掉头部** |
|
|
||||||
| threshold | 否 | number | - | 默认为推荐阈值,也可自行根据需要进行设置 |
|
|
||||||
|
|
||||||
Python请求示例:
|
|
||||||
|
|
||||||
```Python
import base64
import requests


def main():
    with open("图像路径", 'rb') as f:
        result = requests.post("http://{服务ip地址}:24401/", json={
            "image": base64.b64encode(f.read()).decode("utf8")
        })
    # print(result.request.body)
    # print(result.request.headers)
    print(result.content)


if __name__ == '__main__':
    main()
```
|
|
||||||
|
|
||||||
## 3. http返回数据
|
|
||||||
|
|
||||||
| 字段 | 类型说明 | 其他 |
|
|
||||||
| ---------- | ------ | ------------------------------------ |
|
|
||||||
| error_code | Number | 0为成功,非0参考message获得具体错误信息 |
|
|
||||||
| results | Array | 内容为具体的识别结果。其中字段的具体含义请参考`预测图像-返回格式`一节 |
|
|
||||||
| cost_ms | Number | 预测耗时ms,不含网络交互时间 |
|
|
||||||
|
|
||||||
返回示例
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"cost_ms": 52,
|
|
||||||
"error_code": 0,
|
|
||||||
"results": [
|
|
||||||
{
|
|
||||||
"confidence": 0.94482421875,
|
|
||||||
"index": 1,
|
|
||||||
"label": "IronMan",
|
|
||||||
"x1": 0.059185408055782318,
|
|
||||||
"x2": 0.18795496225357056,
|
|
||||||
"y1": 0.14762254059314728,
|
|
||||||
"y2": 0.52510076761245728,
|
|
||||||
"mask": "...", // 图像分割模型字段
|
|
||||||
"trackId": 0, // 目标追踪模型字段
|
|
||||||
},
|
|
||||||
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
*** 关于矩形坐标 ***
|
|
||||||
|
|
||||||
x1 * 图片宽度 = 检测框的左上角的横坐标
|
|
||||||
|
|
||||||
y1 * 图片高度 = 检测框的左上角的纵坐标
|
|
||||||
|
|
||||||
x2 * 图片宽度 = 检测框的右下角的横坐标
|
|
||||||
|
|
||||||
y2 * 图片高度 = 检测框的右下角的纵坐标
|
|
||||||
|
|
||||||
*** 关于图像分割mask ***
|
|
||||||
|
|
||||||
```
|
|
||||||
cv::Mat mask为图像掩码的二维数组
|
|
||||||
{
|
|
||||||
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
|
|
||||||
}
|
|
||||||
其中1代表为目标区域,0代表非目标区域
|
|
||||||
```
|
|
||||||
|
|
||||||
# FAQ
|
|
||||||
|
|
||||||
1. 如何处理一些 undefined reference / error while loading shared libraries?
|
|
||||||
|
|
||||||
> 如:./easyedge_demo: error while loading shared libraries: libeasyedge.so.1: cannot open shared object file: No such file or directory
|
|
||||||
|
|
||||||
遇到该问题时,请找到具体的库的位置,设置LD_LIBRARY_PATH;或者安装缺少的库。
|
|
||||||
|
|
||||||
> 示例一:libverify.so.1: cannot open shared object file: No such file or directory
|
|
||||||
> 链接找不到libverify.so文件,一般可通过 export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:../../lib 解决(实际冒号后面添加的路径以libverify.so文件所在的路径为准)
|
|
||||||
|
|
||||||
> 示例二:libopencv_videoio.so.4.5: cannot open shared object file: No such file or directory
|
|
||||||
> 链接找不到libopencv_videoio.so文件,一般可通过 export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:../../thirdparty/opencv/lib 解决(实际冒号后面添加的路径以libopencv_videoio.so所在路径为准)
|
|
||||||
|
|
||||||
> 示例三:GLIBCXX_X.X.X not found
|
|
||||||
> 链接无法找到glibc版本,请确保系统gcc版本>=SDK的gcc版本。升级gcc/glibc可以百度搜索相关文献。
|
|
||||||
|
|
||||||
2. 使用libcurl请求http服务时,速度明显变慢
|
|
||||||
|
|
||||||
这是因为libcurl请求continue导致server等待数据的问题,添加空的header即可
|
|
||||||
|
|
||||||
```bash
|
|
||||||
headers = curl_slist_append(headers, "Expect:");
|
|
||||||
```
|
|
||||||
|
|
||||||
3. 运行二进制时,提示 libverify.so cannot open shared object file
|
|
||||||
|
|
||||||
可能cmake没有正确设置rpath, 可以设置LD_LIBRARY_PATH为sdk的lib文件夹后,再运行:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:../lib ./easyedge_demo
|
|
||||||
```
|
|
||||||
|
|
||||||
4. 编译时报错:file format not recognized
|
|
||||||
|
|
||||||
可能是因为在复制SDK时文件信息丢失。请将整个压缩包复制到目标设备中,再解压缩、编译。
|
|
@@ -1,371 +0,0 @@
|
|||||||
# 简介
|
|
||||||
|
|
||||||
本文档以[千分类模型_MobileNetV3](https://ai.baidu.com/easyedge/app/openSource)为例,介绍FastDeploy中的模型SDK, 在**ARM Linux Python** 环境下:(1)图像推理部署步骤; (2)介绍模型推流全流程API,方便开发者了解项目后二次开发。其中ARM Linux C++请参考[ARM Linux C++环境下的推理部署](./arm_linux_cpp_sdk_inference.md)文档。
|
|
||||||
|
|
||||||
**注意**:部分模型(如Tinypose、OCR等)仅支持图像推理,不支持视频推理。
|
|
||||||
|
|
||||||
<!--ts-->
|
|
||||||
|
|
||||||
* [简介](#简介)
|
|
||||||
|
|
||||||
* [环境准备](#环境准备)
|
|
||||||
|
|
||||||
* [1.SDK下载](#1sdk下载)
|
|
||||||
* [2.硬件支持](#2硬件支持)
|
|
||||||
* [3.python环境](#3python环境)
|
|
||||||
* [4.安装依赖](#4安装依赖)
|
|
||||||
* [4.1.安装paddlepaddle](#41安装paddlepaddle)
|
|
||||||
* [4.2.安装EasyEdge Python Wheel 包](#42安装easyedge-python-wheel-包)
|
|
||||||
|
|
||||||
* [快速开始](#快速开始)
|
|
||||||
|
|
||||||
* [1.文件结构说明](#1文件结构说明)
|
|
||||||
* [2.测试Demo](#2测试demo)
|
|
||||||
* [2.1预测图像](#21预测图像)
|
|
||||||
|
|
||||||
* [Demo API介绍](#demo-api介绍)
|
|
||||||
|
|
||||||
* [1.基础流程](#1基础流程)
|
|
||||||
* [2.初始化](#2初始化)
|
|
||||||
* [3.SDK参数配置](#3sdk参数配置)
|
|
||||||
* [4.预测图像](#4预测图像)
|
|
||||||
|
|
||||||
* [FAQ](#faq)
|
|
||||||
|
|
||||||
<!--te-->
|
|
||||||
|
|
||||||
# 环境准备
|
|
||||||
|
|
||||||
## 1.SDK下载
|
|
||||||
|
|
||||||
根据开发者模型、部署芯片、操作系统需要,在图像界面[飞桨开源模型](https://ai.baidu.com/easyedge/app/openSource)或[GitHub](https://github.com/PaddlePaddle/FastDeploy)中选择对应的SDK进行下载。
|
|
||||||
|
|
||||||
```shell
|
|
||||||
EasyEdge-Linux-x86--[部署芯片]
|
|
||||||
├──...
|
|
||||||
├──python # Linux Python SDK
|
|
||||||
├── # 特定Python版本的EasyEdge Wheel包, 二次开发可使用
|
|
||||||
├── BaiduAI_EasyEdge_SDK-1.3.1-cp36-cp36m-linux_aarch64.whl
|
|
||||||
├── infer_demo # demo体验完整文件
|
|
||||||
│ ├── demo_xxx.py # 包含前后处理的端到端推理demo文件
|
|
||||||
│ └── demo_serving.py # 提供http服务的demo文件
|
|
||||||
├── tensor_demo # 学习自定义算法前后处理时使用
|
|
||||||
│ └── demo_xxx.py
|
|
||||||
```
|
|
||||||
|
|
||||||
## 2.硬件支持
|
|
||||||
|
|
||||||
目前支持的ARM架构:aarch64 、armv7hf
|
|
||||||
|
|
||||||
## 3.python环境
|
|
||||||
|
|
||||||
> ARM Linux SDK仅支持Python 3.6
|
|
||||||
|
|
||||||
使用如下命令获取已安装的Python版本号。如果本机的版本不匹配,建议使用[pyenv](https://github.com/pyenv/pyenv)、[anaconda](https://www.anaconda.com/)等Python版本管理工具对SDK所在目录进行配置。
|
|
||||||
|
|
||||||
```shell
|
|
||||||
$python3 --version
|
|
||||||
```
|
|
||||||
|
|
||||||
接着使用如下命令确认pip的版本是否满足要求,要求pip版本为20.2.2或更高版本。详细的pip安装过程可以参考[官网教程](https://pip.pypa.io/en/stable/installation/)。
|
|
||||||
|
|
||||||
```shell
|
|
||||||
$python3 -m pip --version
|
|
||||||
```
|
|
||||||
|
|
||||||
## 4.安装依赖
|
|
||||||
|
|
||||||
### 4.1.安装paddlepaddle
|
|
||||||
|
|
||||||
根据具体的部署芯片(CPU/GPU)安装对应的PaddlePaddle的whl包。
|
|
||||||
|
|
||||||
`armv8 CPU平台`可以使用如下命令进行安装:
|
|
||||||
|
|
||||||
```shell
|
|
||||||
python3 -m pip install http://aipe-easyedge-public.bj.bcebos.com/easydeploy/paddlelite-2.11-cp36-cp36m-linux_aarch64.whl
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4.2.安装EasyEdge Python Wheel 包
|
|
||||||
|
|
||||||
在`python`目录下,安装特定Python版本的EasyEdge Wheel包。`armv8 CPU平台`可以使用如下命令进行安装:
|
|
||||||
|
|
||||||
```shell
|
|
||||||
python3 -m pip install -U BaiduAI_EasyEdge_SDK-1.3.1-cp36-cp36m-linux_aarch64.whl
|
|
||||||
```
|
|
||||||
|
|
||||||
# 快速开始
|
|
||||||
|
|
||||||
## 1.文件结构说明
|
|
||||||
|
|
||||||
Python SDK文件结构如下:
|
|
||||||
|
|
||||||
```shell
|
|
||||||
.EasyEdge-Linux-x86--[部署芯片]
|
|
||||||
├── RES # 模型资源文件夹,一套模型适配不同硬件、OS和部署方式
|
|
||||||
│ ├── conf.json # Android、iOS系统APP名字需要
|
|
||||||
│ ├── label_list.txt # 模型标签文件
|
|
||||||
│ ├── model # 模型结构文件
|
|
||||||
│ ├── params # 模型参数文件
|
|
||||||
│ └── infer_cfg.json # 模型前后处理等配置文件
|
|
||||||
├── ReadMe.txt
|
|
||||||
├── cpp # C++ SDK 文件结构
|
|
||||||
└── python # Python SDK 文件
|
|
||||||
├── BaiduAI_EasyEdge_SDK-1.3.1-cp36-cp36m-linux_aarch64.whl #EasyEdge Python Wheel 包
|
|
||||||
├── infer_demo
|
|
||||||
├── demo_armv8_cpu.py # 图像推理
|
|
||||||
├── demo_serving.py # HTTP服务化推理
|
|
||||||
└── tensor_demo # 学习自定义算法前后处理时使用
|
|
||||||
├── demo_armv8_cpu.py
|
|
||||||
```
|
|
||||||
|
|
||||||
## 2.测试Demo
|
|
||||||
|
|
||||||
> 模型资源文件默认已经打包在开发者下载的SDK包中, 默认为`RES`目录。
|
|
||||||
|
|
||||||
### 2.1预测图像
|
|
||||||
|
|
||||||
使用infer_demo文件夹下的demo文件。
|
|
||||||
|
|
||||||
```bash
|
|
||||||
python3 demo_armv8_cpu.py {模型RES文件夹} {测试图片路径}
|
|
||||||
```
|
|
||||||
|
|
||||||
运行效果示例:
|
|
||||||
|
|
||||||
<div align=center><img src="https://user-images.githubusercontent.com/54695910/175854068-28d27c0a-ef83-43ee-9e89-b65eed99b476.jpg" width="300"></div>
|
|
||||||
|
|
||||||
```shell
|
|
||||||
2022-06-14 14:40:16 INFO [EasyEdge] [demo_nvidia_gpu.py:38] 140518522509120: Init paddlefluid engine...
|
|
||||||
2022-06-14 14:40:20 INFO [EasyEdge] [demo_nvidia_gpu.py:38] 140518522509120: Paddle version: 2.2.2
|
|
||||||
{'confidence': 0.9012349843978882, 'index': 8, 'label': 'n01514859 hen'}
|
|
||||||
```
|
|
||||||
|
|
||||||
可以看到,运行结果为`index:8,label:hen`,通过imagenet [类别映射表](https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a),可以找到对应的类别,即 'hen',由此说明我们的预测结果正确。
|
|
||||||
|
|
||||||
# Demo API介绍
|
|
||||||
|
|
||||||
本章节主要结合[测试Demo](#2测试Demo)的Demo示例介绍推理API,方便开发者学习后二次开发。
|
|
||||||
|
|
||||||
## 1.基础流程
|
|
||||||
|
|
||||||
> ❗注意,请优先参考SDK中自带demo的使用流程和说明。遇到错误,请优先参考文件中的注释、解释、日志说明。
|
|
||||||
|
|
||||||
`infer_demo/demo_xx_xx.py`
|
|
||||||
|
|
||||||
```python
|
|
||||||
# 引入EasyEdge运行库
|
|
||||||
import BaiduAI.EasyEdge as edge
|
|
||||||
|
|
||||||
# 创建并初始化一个预测Progam;选择合适的引擎
|
|
||||||
pred = edge.Program()
|
|
||||||
pred.init(model_dir={RES文件夹路径}, device=edge.Device.CPU, engine=edge.Engine.PADDLE_FLUID) # x86_64 CPU
|
|
||||||
# pred.init(model_dir=_model_dir, device=edge.Device.GPU, engine=edge.Engine.PADDLE_FLUID) # x86_64 Nvidia GPU
|
|
||||||
# pred.init(model_dir=_model_dir, device=edge.Device.CPU, engine=edge.Engine.PADDLE_LITE) # armv8 CPU
|
|
||||||
|
|
||||||
# 预测图像
|
|
||||||
res = pred.infer_image({numpy.ndarray的图片})
|
|
||||||
|
|
||||||
# 关闭结束预测Progam
|
|
||||||
pred.close()
|
|
||||||
```
|
|
||||||
|
|
||||||
`infer_demo/demo_serving.py`
|
|
||||||
|
|
||||||
```python
|
|
||||||
import BaiduAI.EasyEdge as edge
|
|
||||||
from BaiduAI.EasyEdge.serving import Serving
|
|
||||||
|
|
||||||
# 创建并初始化Http服务
|
|
||||||
server = Serving(model_dir={RES文件夹路径}, license=serial_key)
|
|
||||||
|
|
||||||
# 运行Http服务
|
|
||||||
# 请参考同级目录下demo_xx_xx.py里:
|
|
||||||
# pred.init(model_dir=xx, device=xx, engine=xx, device_id=xx)
|
|
||||||
# 对以下参数device\device_id和engine进行修改
|
|
||||||
server.run(host=host, port=port, device=edge.Device.CPU, engine=edge.Engine.PADDLE_FLUID) # x86_64 CPU
|
|
||||||
# server.run(host=host, port=port, device=edge.Device.GPU, engine=edge.Engine.PADDLE_FLUID) # x86_64 Nvidia GPU
|
|
||||||
# server.run(host=host, port=port, device=edge.Device.CPU, engine=edge.Engine.PADDLE_LITE) # armv8 CPU
|
|
||||||
```
|
|
||||||
|
|
||||||
## 2.初始化
|
|
||||||
|
|
||||||
* 接口
|
|
||||||
|
|
||||||
```python
|
|
||||||
def init(self,
|
|
||||||
model_dir,
|
|
||||||
device=Device.CPU,
|
|
||||||
engine=Engine.PADDLE_FLUID,
|
|
||||||
config_file='conf.json',
|
|
||||||
preprocess_file='preprocess_args.json',
|
|
||||||
model_file='model',
|
|
||||||
params_file='params',
|
|
||||||
label_file='label_list.txt',
|
|
||||||
infer_cfg_file='infer_cfg.json',
|
|
||||||
device_id=0,
|
|
||||||
thread_num=1
|
|
||||||
):
|
|
||||||
"""
|
|
||||||
Args:
|
|
||||||
model_dir: str
|
|
||||||
device: BaiduAI.EasyEdge.Device,比如:Device.CPU
|
|
||||||
engine: BaiduAI.EasyEdge.Engine, 比如: Engine.PADDLE_FLUID
|
|
||||||
config_file: str
|
|
||||||
preprocess_file: str
|
|
||||||
model_file: str
|
|
||||||
params_file: str
|
|
||||||
label_file: str 标签文件
|
|
||||||
infer_cfg_file: 包含预处理、后处理信息的文件
|
|
||||||
device_id: int 设备ID
|
|
||||||
thread_num: int CPU的线程数
|
|
||||||
|
|
||||||
Raises:
|
|
||||||
RuntimeError, IOError
|
|
||||||
Returns:
|
|
||||||
bool: True if success
|
|
||||||
"""
|
|
||||||
```
|
|
||||||
|
|
||||||
若返回不是True,请查看输出日志排查错误原因。
|
|
||||||
|
|
||||||
## 3.SDK参数配置
|
|
||||||
|
|
||||||
使用 CPU 预测时,可以通过在 init 中设置 thread_num 使用多线程预测。如:
|
|
||||||
|
|
||||||
```python
|
|
||||||
pred.init(model_dir=_model_dir, device=edge.Device.CPU, engine=edge.Engine.PADDLE_FLUID, thread_num=4)
|
|
||||||
```
|
|
||||||
|
|
||||||
使用 GPU 预测时,可以通过在 init 中设置 device_id 指定需要的GPU device id。如:
|
|
||||||
|
|
||||||
```python
|
|
||||||
pred.init(model_dir=_model_dir, device=edge.Device.GPU, engine=edge.Engine.PADDLE_FLUID, device_id=0)
|
|
||||||
```
|
|
||||||
|
|
||||||
## 4.预测图像
|
|
||||||
|
|
||||||
* 接口
|
|
||||||
|
|
||||||
```python
|
|
||||||
def infer_image(self, img,
|
|
||||||
threshold=0.3,
|
|
||||||
channel_order='HWC',
|
|
||||||
color_format='BGR',
|
|
||||||
data_type='numpy'):
|
|
||||||
"""
|
|
||||||
|
|
||||||
Args:
|
|
||||||
img: np.ndarray or bytes
|
|
||||||
threshold: float
|
|
||||||
only return result with confidence larger than threshold
|
|
||||||
channel_order: string
|
|
||||||
channel order HWC or CHW
|
|
||||||
color_format: string
|
|
||||||
color format order RGB or BGR
|
|
||||||
data_type: string
|
|
||||||
仅在图像分割时有意义。 'numpy' or 'string'
|
|
||||||
'numpy': 返回已解析的mask
|
|
||||||
'string': 返回未解析的mask游程编码
|
|
||||||
|
|
||||||
Returns:
|
|
||||||
list
|
|
||||||
|
|
||||||
"""
|
|
||||||
```
|
|
||||||
|
|
||||||
* 返回格式: `[dict1, dict2, ...]`
|
|
||||||
|
|
||||||
| 字段 | 类型 | 取值 | 说明 |
|
|
||||||
| ---------- | -------------------- | --------- | ------------------------ |
|
|
||||||
| confidence | float | 0~1 | 分类或检测的置信度 |
|
|
||||||
| label | string | | 分类或检测的类别 |
|
|
||||||
| index | number | | 分类或检测的类别 |
|
|
||||||
| x1, y1 | float | 0~1 | 物体检测,矩形的左上角坐标 (相对长宽的比例值) |
|
|
||||||
| x2, y2 | float | 0~1 | 物体检测,矩形的右下角坐标(相对长宽的比例值) |
|
|
||||||
| mask | string/numpy.ndarray | 图像分割的mask | |
|
|
||||||
|
|
||||||
***关于矩形坐标***
|
|
||||||
|
|
||||||
x1 * 图片宽度 = 检测框的左上角的横坐标
|
|
||||||
|
|
||||||
y1 * 图片高度 = 检测框的左上角的纵坐标
|
|
||||||
|
|
||||||
x2 * 图片宽度 = 检测框的右下角的横坐标
|
|
||||||
|
|
||||||
y2 * 图片高度 = 检测框的右下角的纵坐标
|
|
||||||
|
|
||||||
可以参考 demo 文件中使用 opencv 绘制矩形的逻辑。
|
|
||||||
|
|
||||||
***结果示例***
|
|
||||||
|
|
||||||
i) 图像分类
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"index": 736,
|
|
||||||
"label": "table",
|
|
||||||
"confidence": 0.9
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
ii) 物体检测
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"index": 8,
|
|
||||||
"label": "cat",
|
|
||||||
"confidence": 1.0,
|
|
||||||
"x1": 0.21289,
|
|
||||||
"y1": 0.12671,
|
|
||||||
"x2": 0.91504,
|
|
||||||
"y2": 0.91211,
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
iii) 图像分割
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"name": "cat",
|
|
||||||
"score": 1.0,
|
|
||||||
"location": {
|
|
||||||
"left": ...,
|
|
||||||
"top": ...,
|
|
||||||
"width": ...,
|
|
||||||
"height": ...,
|
|
||||||
},
|
|
||||||
"mask": ...
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
mask字段中,data_type为`numpy`时,返回图像掩码的二维数组
|
|
||||||
|
|
||||||
```
|
|
||||||
{
|
|
||||||
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
|
|
||||||
}
|
|
||||||
其中1代表为目标区域,0代表非目标区域
|
|
||||||
```
|
|
||||||
|
|
||||||
data_type为`string`时,mask的游程编码,解析方式可参考 [demo](https://github.com/Baidu-AIP/EasyDL-Segmentation-Demo)
|
|
||||||
|
|
||||||
# FAQ
|
|
||||||
|
|
||||||
1.执行infer_demo文件时,提示your generated code is out of date and must be regenerated with protoc >= 3.19.0
|
|
||||||
|
|
||||||
进入当前项目,首先卸载protobuf
|
|
||||||
|
|
||||||
```shell
|
|
||||||
python3 -m pip uninstall protobuf
|
|
||||||
```
|
|
||||||
|
|
||||||
安装低版本protobuf
|
|
||||||
|
|
||||||
```shell
|
|
||||||
python3 -m pip install protobuf==3.19.0
|
|
||||||
```
|
|
@@ -1,266 +0,0 @@
|
|||||||
# 简介
|
|
||||||
|
|
||||||
本文档以[千分类模型_MobileNetV3](https://ai.baidu.com/easyedge/app/openSource)为例,介绍FastDeploy中的模型SDK, 在**ARM Linux Python** 环境下: (1)**服务化**推理部署步骤; (2)介绍模型推流全流程API,方便开发者了解项目后二次开发。其中ARM Linux C++请参考[ARM Linux C++环境下的HTTP推理部署](./arm_linux_cpp_sdk_serving.md)文档。
|
|
||||||
|
|
||||||
**注意**:部分模型(如OCR等)不支持服务化推理。
|
|
||||||
|
|
||||||
<!--ts-->
|
|
||||||
|
|
||||||
* [简介](#简介)
|
|
||||||
|
|
||||||
* [环境准备](#环境准备)
|
|
||||||
|
|
||||||
* [1.SDK下载](#1sdk下载)
|
|
||||||
* [2.硬件支持](#2硬件支持)
|
|
||||||
* [3.Python环境](#3python环境)
|
|
||||||
* [4.安装依赖](#4安装依赖)
|
|
||||||
* [4.1.安装paddlepaddle](#41安装paddlepaddle)
|
|
||||||
* [4.2.安装EasyEdge Python Wheel 包](#42安装easyedge-python-wheel-包)
|
|
||||||
|
|
||||||
* [快速开始](#快速开始)
|
|
||||||
|
|
||||||
* [1.文件结构说明](#1文件结构说明)
|
|
||||||
* [2.测试Serving服务](#2测试serving服务)
|
|
||||||
* [2.1 启动HTTP预测服务](#21-启动http预测服务)
|
|
||||||
|
|
||||||
* [HTTP API流程详解](#http-api流程详解)
|
|
||||||
|
|
||||||
* [1. 开启http服务](#1-开启http服务)
|
|
||||||
* [2. 请求http服务](#2-请求http服务)
|
|
||||||
* [2.1 http 请求方式:不使用图片base64格式](#21-http-请求方式不使用图片base64格式)
|
|
||||||
* [3. http返回数据](#3-http返回数据)
|
|
||||||
|
|
||||||
* [FAQ](#faq)
|
|
||||||
|
|
||||||
<!--te-->
|
|
||||||
|
|
||||||
# 环境准备
|
|
||||||
|
|
||||||
## 1.SDK下载
|
|
||||||
|
|
||||||
根据开发者模型、部署芯片、操作系统需要,在图像界面[飞桨开源模型](https://ai.baidu.com/easyedge/app/openSource)或[GitHub](https://github.com/PaddlePaddle/FastDeploy)中选择对应的SDK进行下载。解压缩后的文件结构如下。
|
|
||||||
|
|
||||||
```shell
|
|
||||||
EasyEdge-Linux-x86-[部署芯片]
|
|
||||||
├── RES # 模型文件资源文件夹,可替换为其他模型
|
|
||||||
├── README.md
|
|
||||||
├── cpp # C++ SDK
|
|
||||||
└── python # Python SDK
|
|
||||||
```
|
|
||||||
|
|
||||||
## 2.硬件支持
|
|
||||||
|
|
||||||
目前支持的ARM架构:aarch64 、armv7hf
|
|
||||||
|
|
||||||
## 3.Python环境
|
|
||||||
|
|
||||||
> ARM Linux SDK仅支持Python 3.6
|
|
||||||
|
|
||||||
使用如下命令获取已安装的Python版本号。如果本机的版本不匹配,需要根据ARM Linux下Python安装方式进行安装。(不建议在ARM Linux下使用conda,因为ARM Linux场景通常资源很有限)
|
|
||||||
|
|
||||||
```shell
|
|
||||||
$python3 --version
|
|
||||||
```
|
|
||||||
|
|
||||||
接着使用如下命令确认pip的版本是否满足要求,要求pip版本为20.2.2或更高版本。详细的pip安装过程可以参考[官网教程](https://pip.pypa.io/en/stable/installation/)。
|
|
||||||
|
|
||||||
```shell
|
|
||||||
$python3 -m pip --version
|
|
||||||
```
|
|
||||||
|
|
||||||
## 4.安装依赖
|
|
||||||
|
|
||||||
### 4.1.安装paddlepaddle
|
|
||||||
|
|
||||||
根据具体的部署芯片(CPU/GPU)安装对应的PaddlePaddle的whl包。
|
|
||||||
|
|
||||||
`armv8 CPU平台`可以使用如下命令进行安装:
|
|
||||||
|
|
||||||
```shell
|
|
||||||
python3 -m pip install http://aipe-easyedge-public.bj.bcebos.com/easydeploy/paddlelite-2.11-cp36-cp36m-linux_aarch64.whl
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4.2.安装EasyEdge Python Wheel 包
|
|
||||||
|
|
||||||
在`python`目录下,安装特定Python版本的EasyEdge Wheel包。`armv8 CPU平台`可以使用如下命令进行安装:
|
|
||||||
|
|
||||||
```shell
|
|
||||||
python3 -m pip install -U BaiduAI_EasyEdge_SDK-1.3.1-cp36-cp36m-linux_aarch64.whl
|
|
||||||
```
|
|
||||||
|
|
||||||
# 快速开始
|
|
||||||
|
|
||||||
## 1.文件结构说明
|
|
||||||
|
|
||||||
Python SDK文件结构如下:
|
|
||||||
|
|
||||||
```shell
|
|
||||||
EasyEdge-Linux-x86--[部署芯片]
|
|
||||||
├──...
|
|
||||||
├──python # Linux Python SDK
|
|
||||||
├── # 特定Python版本的EasyEdge Wheel包, 二次开发可使用
|
|
||||||
├── BaiduAI_EasyEdge_SDK-1.3.1-cp36-cp36m-linux_aarch64.whl
|
|
||||||
├── infer_demo # demo体验完整文件
|
|
||||||
│ ├── demo_xxx.py # 包含前后处理的端到端推理demo文件
|
|
||||||
│ └── demo_serving.py # 提供http服务的demo文件
|
|
||||||
├── tensor_demo # 学习自定义算法前后处理时使用
|
|
||||||
│ └── demo_xxx.py
|
|
||||||
```
|
|
||||||
|
|
||||||
## 2.测试Serving服务
|
|
||||||
|
|
||||||
> 模型资源文件默认已经打包在开发者下载的SDK包中, 默认为`RES`目录。
|
|
||||||
|
|
||||||
### 2.1 启动HTTP预测服务
|
|
||||||
|
|
||||||
指定对应的模型文件夹(默认为`RES`)、设备ip和指定端口号,运行如下命令。
|
|
||||||
|
|
||||||
```shell
|
|
||||||
python3 demo_serving.py {模型RES文件夹} {host, default 0.0.0.0} {port, default 24401}
|
|
||||||
```
|
|
||||||
|
|
||||||
成功启动后,终端中会显示如下字样。
|
|
||||||
|
|
||||||
```shell
|
|
||||||
...
|
|
||||||
* Running on {host ip}:24401
|
|
||||||
```
|
|
||||||
|
|
||||||
如果是在局域网内的机器上部署,开发者此时可以打开浏览器,输入`http://{host ip}:24401`,选择图片来进行测试,运行效果如下。
|
|
||||||
|
|
||||||
<img src="https://user-images.githubusercontent.com/54695910/175854073-fb8189e5-0ffb-472c-a17d-0f35aa6a8418.png" style="zoom:50%;" />
|
|
||||||
|
|
||||||
如果是在远程机器上部署,那么可以参考`demo_serving.py`中的 `http_client_test()函数`请求http服务来执行推理。
|
|
||||||
|
|
||||||
# HTTP API流程详解
|
|
||||||
|
|
||||||
## 1. 开启http服务
|
|
||||||
|
|
||||||
http服务的启动使用`demo_serving.py`文件
|
|
||||||
|
|
||||||
```python
class Serving(object):
    """
    SDK local serving
    """

    def __init__(self, model_dir, license='', model_filename='model', params_filename='params'):

        self.program = None
        self.model_dir = model_dir
        self.model_filename = model_filename
        self.params_filename = params_filename
        self.program_lock = threading.Lock()
        self.license_key = license
        # 只有ObjectTracking会初始化video_processor
        self.video_processor = None

    def run(self, host, port, device, engine=Engine.PADDLE_FLUID, service_id=0, device_id=0, **kwargs):
        """
        Args:
            host : str
            port : str
            device : BaiduAI.EasyEdge.Device,比如:Device.CPU
            engine : BaiduAI.EasyEdge.Engine, 比如: Engine.PADDLE_FLUID
        """
        self.run_serving_with_flask(host, port, device, engine, service_id, device_id, **kwargs)
```
|
|
||||||
|
|
||||||
## 2. 请求http服务
|
|
||||||
|
|
||||||
> 开发者可以打开浏览器,`http://{设备ip}:24401`,选择图片来进行测试。
|
|
||||||
|
|
||||||
### 2.1 http 请求方式:不使用图片base64格式
|
|
||||||
|
|
||||||
URL中的get参数:
|
|
||||||
|
|
||||||
| 参数 | 说明 | 默认值 |
|
|
||||||
| --------- | --------- | ---------------- |
|
|
||||||
| threshold | 阈值过滤, 0~1 | 如不提供,则会使用模型的推荐阈值 |
|
|
||||||
|
|
||||||
HTTP POST Body即为图片的二进制内容
|
|
||||||
|
|
||||||
Python请求示例
|
|
||||||
|
|
||||||
```python
import requests

with open('./1.jpg', 'rb') as f:
    img = f.read()
result = requests.post(
    'http://127.0.0.1:24401/',
    params={'threshold': 0.1},
    data=img).json()
```
|
|
||||||
|
|
||||||
## 3. http返回数据
|
|
||||||
|
|
||||||
| 字段 | 类型说明 | 其他 |
|
|
||||||
| ---------- | ------ | ------------------------------------ |
|
|
||||||
| error_code | Number | 0为成功,非0参考message获得具体错误信息 |
|
|
||||||
| results | Array | 内容为具体的识别结果。其中字段的具体含义请参考`预测图像-返回格式`一节 |
|
|
||||||
| cost_ms | Number | 预测耗时ms,不含网络交互时间 |
|
|
||||||
|
|
||||||
返回示例
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"cost_ms": 52,
|
|
||||||
"error_code": 0,
|
|
||||||
"results": [
|
|
||||||
{
|
|
||||||
"confidence": 0.94482421875,
|
|
||||||
"index": 1,
|
|
||||||
"label": "IronMan",
|
|
||||||
"x1": 0.059185408055782318,
|
|
||||||
"x2": 0.18795496225357056,
|
|
||||||
"y1": 0.14762254059314728,
|
|
||||||
"y2": 0.52510076761245728,
|
|
||||||
"mask": "...", // 图像分割模型字段
|
|
||||||
"trackId": 0, // 目标追踪模型字段
|
|
||||||
},
|
|
||||||
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
***关于矩形坐标***
|
|
||||||
|
|
||||||
x1 * 图片宽度 = 检测框的左上角的横坐标
|
|
||||||
|
|
||||||
y1 * 图片高度 = 检测框的左上角的纵坐标
|
|
||||||
|
|
||||||
x2 * 图片宽度 = 检测框的右下角的横坐标
|
|
||||||
|
|
||||||
y2 * 图片高度 = 检测框的右下角的纵坐标
|
|
||||||
|
|
||||||
*** 关于图像分割mask ***
|
|
||||||
|
|
||||||
```
|
|
||||||
cv::Mat mask为图像掩码的二维数组
|
|
||||||
{
|
|
||||||
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
|
|
||||||
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
|
|
||||||
}
|
|
||||||
其中1代表为目标区域,0代表非目标区域
|
|
||||||
```
|
|
||||||
|
|
||||||
# FAQ
|
|
||||||
|
|
||||||
1.执行infer_demo文件时,提示your generated code is out of date and must be regenerated with protoc >= 3.19.0
|
|
||||||
|
|
||||||
进入当前项目,首先卸载protobuf
|
|
||||||
|
|
||||||
```shell
|
|
||||||
python3 -m pip uninstall protobuf
|
|
||||||
```
|
|
||||||
|
|
||||||
安装低版本protobuf
|
|
||||||
|
|
||||||
```shell
|
|
||||||
python3 -m pip install protobuf==3.19.0
|
|
||||||
```
|
|
@@ -1,212 +0,0 @@
|
|||||||
# 简介
|
|
||||||
|
|
||||||
本文档介绍FastDeploy中的模型SDK,在iOS环境下:(1)推理部署步骤;(2)介绍SDK使用说明,方便开发者了解项目后二次开发。
|
|
||||||
|
|
||||||
<!--ts-->
|
|
||||||
|
|
||||||
* [简介](#简介)
|
|
||||||
|
|
||||||
* [系统支持说明](#系统支持说明)
|
|
||||||
|
|
||||||
* [1. 系统支持说明](#1-系统支持说明)
|
|
||||||
* [2. SDK大小说明](#2-sdk大小说明)
|
|
||||||
|
|
||||||
* [快速开始](#快速开始)
|
|
||||||
|
|
||||||
* [1. 项目结构说明](#1-项目结构说明)
|
|
||||||
* [2. 测试Demo](#2-测试demo)
|
|
||||||
|
|
||||||
* [SDK使用说明](#sdk使用说明)
|
|
||||||
|
|
||||||
* [1. 集成指南](#1-集成指南)
|
|
||||||
* [1.1 依赖库集成](#11-依赖库集成)
|
|
||||||
* [2. 调用流程示例](#2-调用流程示例)
|
|
||||||
* [2.1 初始化](#21-初始化)
|
|
||||||
* [2.2 预测图像](#22-预测图像)
|
|
||||||
|
|
||||||
* [FAQ](#faq)
|
|
||||||
|
|
||||||
<!--te-->
|
|
||||||
|
|
||||||
# 系统支持说明
|
|
||||||
|
|
||||||
## 1. 系统支持说明
|
|
||||||
|
|
||||||
1. 系统支持:iOS 9.0及以上。
|
|
||||||
|
|
||||||
2. 硬件支持:支持 arm64 (Standard architectures),暂不支持模拟器。
|
|
||||||
|
|
||||||
* 官方验证过的手机机型:大部分ARM 架构的手机、平板及开发板。
|
|
||||||
|
|
||||||
3.其他说明
|
|
||||||
|
|
||||||
* 3.1 【图像分割类模型】(1)图像分割类Demo暂未提供实时摄像头录制拍摄的能力,开发者可根据自己需要自行开发完成;(2)PP-Humanseg-Lite模型设计初衷为横屏视频会议等场景,本次Demo仅支持竖屏场景,开发者可根据自己需要,开发横屏功能。<br>
|
|
||||||
|
|
||||||
* 3.2 【OCR模型】OCR任务第一次启动任务,第一张推理时间久,属于正常情况(因为涉及到模型加载、预处理等工作)。<br>
|
|
||||||
|
|
||||||
## 2. SDK大小说明
|
|
||||||
|
|
||||||
1. 模型资源文件大小影响 SDK 大小
|
|
||||||
2. SDK 包及 IPA 安装包虽然比较大,但最终安装到设备后所占大小会缩小很多。这与 multi architectures、bitcode 和 AppStore 的优化有关。
|
|
||||||
|
|
||||||
# 快速开始
|
|
||||||
|
|
||||||
## 1. 项目结构说明
|
|
||||||
|
|
||||||
根据开发者模型、部署芯片、操作系统需要,在图像界面[飞桨开源模型](https://ai.baidu.com/easyedge/app/openSource)或[GitHub](https://github.com/PaddlePaddle/FastDeploy)中选择对应的SDK进行下载。SDK目录结构如下:
|
|
||||||
|
|
||||||
```
|
|
||||||
.EasyEdge-iOS-SDK
|
|
||||||
├── EasyDLDemo # Demo工程文件
|
|
||||||
├── LIB # 依赖库
|
|
||||||
├── RES
|
|
||||||
│ ├── easyedge # 模型资源文件夹,一套模型适配不同硬件、OS和部署方式
|
|
||||||
│ ├── conf.json # Android、iOS系统APP名字需要
|
|
||||||
│ ├── model # 模型结构文件
|
|
||||||
│ ├── params # 模型参数文件
|
|
||||||
│ ├── label_list.txt # 模型标签文件
|
|
||||||
│ ├── infer_cfg.json # 模型前后处理等配置文件
|
|
||||||
└── DOC # 文档
|
|
||||||
```
|
|
||||||
|
|
||||||
## 2. 测试Demo
|
|
||||||
|
|
||||||
按如下步骤可直接运行 SDK 体验 Demo:
|
|
||||||
步骤一:用 Xcode 打开 `EasyDLDemo/EasyDLDemo.xcodeproj`
|
|
||||||
步骤二:配置开发者自己的签名(不了解签名机制的,可以看FAQ [iOS签名介绍](#100))</br>
|
|
||||||
步骤三:连接手机运行,不支持模拟器
|
|
||||||
|
|
||||||
检测模型运行示例:
|
|
||||||
|
|
||||||
<div align=center><img src="https://user-images.githubusercontent.com/54695910/175854078-4f1f761d-0629-411a-92cc-6f4180164ca5.png" width="400"></div>
|
|
||||||
|
|
||||||
# SDK使用说明
|
|
||||||
|
|
||||||
本节介绍如何将 SDK 接入开发者的项目中使用。
|
|
||||||
|
|
||||||
## 1. 集成指南
|
|
||||||
|
|
||||||
步骤一:依赖库集成
|
|
||||||
步骤二:`import <EasyDL/EasyDL.h>`
|
|
||||||
|
|
||||||
### 1.1 依赖库集成
|
|
||||||
|
|
||||||
1. 复制 LIB 目录至项目合适的位置
|
|
||||||
2. 配置 Build Settings 中 Search paths: 以 SDK 中 LIB 目录路径为例
|
|
||||||
- Framework Search Paths:`${PROJECT_DIR}/../LIB/lib`
|
|
||||||
- Header Search Paths:`${PROJECT_DIR}/../LIB/include`
|
|
||||||
- Library Search Paths:`${PROJECT_DIR}/../LIB/lib`
|
|
||||||
|
|
||||||
> 集成过程如出现错误,请参考 Demo 工程对依赖库的引用
|
|
||||||
|
|
||||||
## 2. 调用流程示例
|
|
||||||
|
|
||||||
以通用ARM的图像分类预测流程为例,详细说明请参考后续章节:
|
|
||||||
|
|
||||||
```
|
|
||||||
NSError *err;
|
|
||||||
|
|
||||||
// step 1: 初始化模型
|
|
||||||
EasyDLModel *model = [[EasyDLModel alloc] initModelFromResourceDirectory:@"easyedge" withError:&err];
|
|
||||||
|
|
||||||
// step 2: 准备待预测的图像
|
|
||||||
UIImage *image = ...;
|
|
||||||
|
|
||||||
// step 3: 预测图像
|
|
||||||
NSArray *results = [model detectUIImage:image withFilterScore:0 andError:&err];
|
|
||||||
|
|
||||||
// step 4: 解析结果
|
|
||||||
for (id res in results) {
|
|
||||||
EasyDLClassfiData *clsData = (EasyDLClassfiData *) res;
|
|
||||||
NSLog(@"labelIndex=%d, labelName=%@, confidence=%f", clsData.category, clsData.label, clsData.accuracy);
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2.1 初始化
|
|
||||||
|
|
||||||
```
|
|
||||||
// 示例
|
|
||||||
// 参数一为模型资源文件夹名称
|
|
||||||
EasyDLModel *model = [[EasyDLModel alloc] initModelFromResourceDirectory:@"easyedge" withError:&err];
|
|
||||||
```
|
|
||||||
|
|
||||||
> 模型资源文件夹需以 folder reference 方式加入 Xcode 工程,如 `RES/easyedge` 文件夹在 Demo 工程中表现为蓝色
|
|
||||||
|
|
||||||
### 2.2 预测图像
|
|
||||||
|
|
||||||
所有模型类型通过以下接口获取预测结果:
|
|
||||||
|
|
||||||
```
|
|
||||||
// 返回的数组类型不定
|
|
||||||
NSArray *results = [model detectUIImage:image withFilterScore:0 andError:&err];
|
|
||||||
```
|
|
||||||
|
|
||||||
返回的数组类型如下,具体可参考 `EasyDLResultData.h` 中的定义:
|
|
||||||
| 模型类型 | 类型 |
|
|
||||||
| --- | ---- |
|
|
||||||
| 图像分类 | EasyDLClassfiData |
|
|
||||||
| 物体检测/人脸检测 | EasyDLObjectDetectionData |
|
|
||||||
| 实例分割 | EasyDLObjSegmentationData |
|
|
||||||
| 姿态估计 | EasyDLPoseData |
|
|
||||||
| 文字识别 | EasyDLOcrData |
|
|
||||||
|
|
||||||
# FAQ
|
|
||||||
|
|
||||||
1. 如何多线程并发预测?
|
|
||||||
|
|
||||||
SDK内部已经能充分利用多核的计算能力。不建议使用并发来预测。
|
|
||||||
|
|
||||||
如果开发者想并发使用,请务必注意`EasyDLModel`所有的方法都不是线程安全的。请初始化多个实例进行并发使用,如
|
|
||||||
|
|
||||||
```c
|
|
||||||
- (void)testMultiThread {
|
|
||||||
UIImage *img = [UIImage imageNamed:@"1.jpeg"];
|
|
||||||
NSError *err;
|
|
||||||
EasyDLModel * model1 = [[EasyDLModel alloc] initModelFromResourceDirectory:@"easyedge" withError:&err];
|
|
||||||
EasyDLModel * model2 = [[EasyDLModel alloc] initModelFromResourceDirectory:@"easyedge" withError:&err];
|
|
||||||
|
|
||||||
dispatch_queue_t queue1 = dispatch_queue_create("testQueue", DISPATCH_QUEUE_CONCURRENT);
|
|
||||||
dispatch_queue_t queue2 = dispatch_queue_create("testQueue2", DISPATCH_QUEUE_CONCURRENT);
|
|
||||||
|
|
||||||
dispatch_async(queue1, ^{
|
|
||||||
NSError *detectErr;
|
|
||||||
for(int i = 0; i < 1000; ++i) {
|
|
||||||
NSArray * res = [model1 detectUIImage:img withFilterScore:0 andError:&detectErr];
|
|
||||||
NSLog(@"1: %@", res[0]);
|
|
||||||
}
|
|
||||||
});
|
|
||||||
|
|
||||||
dispatch_async(queue2, ^{
|
|
||||||
NSError *detectErr;
|
|
||||||
for(int i = 0; i < 1000; ++i) {
|
|
||||||
NSArray * res = [model2 detectUIImage:img withFilterScore:0 andError:&detectErr];
|
|
||||||
NSLog(@"2: %@", res[0]);
|
|
||||||
}
|
|
||||||
});
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
2. 编译时出现 Undefined symbols for architecture arm64: ...
|
|
||||||
* 出现 `cxx11, vtable` 字样:请引入 `libc++.tbd`
|
|
||||||
* 出现 `cv::Mat` 字样:请引入 `opencv2.framework`
|
|
||||||
* 出现 `CoreML`, `VNRequest` 字样:请引入`CoreML.framework` 并务必`#import <CoreML/CoreML.h> `
|
|
||||||
3. 运行时报错 Image not found: xxx ...
|
|
||||||
|
|
||||||
请Embed具体报错的库。
|
|
||||||
|
|
||||||
4. 编译时报错:Invalid bitcode version
|
|
||||||
|
|
||||||
这个可能是开发者使用的 Xcode 低于12导致,可以升级至12版本。
|
|
||||||
|
|
||||||
5. 错误说明
|
|
||||||
|
|
||||||
SDK 的方法会返回 NSError,直接返回的 NSError 的错误码定义在 `EasyDLDefine.h - EEasyDLErrorCode` 中。NSError 附带 message (有时候会附带 NSUnderlyingError),开发者可根据 code 和 message 进行错误判断和处理。
|
|
||||||
|
|
||||||
6. iOS签名说明
|
|
||||||
|
|
||||||
iOS 签名是苹果生态对 APP 开发者做的限定,对于个人开发者是免费的,对于企业开发者(譬如APP要上架应用市场),是收费的。此处,仅简单说明作为普通开发者,第一次尝试使用 Xcode编译代码,需要进行的签名操作。<br>
|
|
||||||
(1)在Xcode/Preferences/Accounts 中添加个人Apple ID;<br>
|
|
||||||
(2)在对应的EasyDLDemo中做如下图设置:<br>
|
|
||||||
|
|
||||||
<div align=center><img src="https://user-images.githubusercontent.com/54695910/175854089-aa1d1af8-7daa-43ae-868d-32041c27ad86.jpg" width="600"></div>
|
|
||||||
(3)(2)后会在手机上安装好对应APP,还需要在手机上`设置/通用/设备管理/开发者应用/信任appleID`,才能运行该 APP。
|
|
@@ -1,266 +0,0 @@
|
|||||||
<a name="0"></a>
|
|
||||||
# 简介
|
|
||||||
|
|
||||||
本文档介绍如何将FastDeploy的Demo模型,替换成开发者自己训练的AI模型。(**注意**:FastDeploy下载的SDK和Demo仅支持相同算法模型的替换)。本文档要求开发者已经将Demo和SDK运行跑通,如果要了解运行跑通Demo和SDK指导文档,可以参考[SDK使用文档](https://github.com/PaddlePaddle/FastDeploy/blob/develop/README.md#sdk使用)
|
|
||||||
|
|
||||||
* [简介](#0)<br>
|
|
||||||
* [模型替换](#1)<br>
|
|
||||||
* [1.模型准备](#2)<br>
|
|
||||||
* [1.1 Paddle模型](#3)<br>
|
|
||||||
* [1.2 Paddle OCR模型增加一步特殊转换](#4)<br>
|
|
||||||
* [1.2.1 下载模型转换工具](#5)<br>
|
|
||||||
* [1.2.2 模型转换](#6)<br>
|
|
||||||
* [1.3 其他框架模型](#7)<br>
|
|
||||||
* [2.模型名修改和label文件准备](#8)<br>
|
|
||||||
* [2.1 非OCR模型名修改](#9)<br>
|
|
||||||
* [2.2 OCR模型名修改](#10)<br>
|
|
||||||
* [2.3 模型label文件](#11)<br>
|
|
||||||
* [3.修改配置文件](#12)<br>
|
|
||||||
* [测试效果](#13)<br>
|
|
||||||
* [完整配置文件说明](#14)<br>
|
|
||||||
* [1.配置文件字段含义](#15)<br>
|
|
||||||
* [2.预处理顺序](#16)<br>
|
|
||||||
* [FAQ](#17)<br>
|
|
||||||
|
|
||||||
**注意事项:**
|
|
||||||
|
|
||||||
1. PP-PicoDet模型: 在FastDeploy中,支持PP-Picodet模型,是将后处理写到网络里面的方式(即后处理+NMS都在网络结构里面)。Paddle Detection导出静态模型时,有3种方法,选择将后处理和NMS导入到网络里面即可(参考[导出部分](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/picodet#%E5%AF%BC%E5%87%BA%E5%8F%8A%E8%BD%AC%E6%8D%A2%E6%A8%A1%E5%9E%8B))。详细网络区别,可以通过netron工具对比。
|
|
||||||
|
|
||||||
2. PP-Picodet模型:在FastDeploy中,支持PP-Picodet模型,是将前处理写在网络外面的方式。Paddle Detection中的TinyPose算法中,会将PP-PicoDet模型的前处理写入网络中。如果要使用FastDeploy的SDK进行模型替换,需要将前处理写到网络外面。(参考[Detection中的导出命令](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/keypoint/tiny_pose#%E5%B0%86%E8%AE%AD%E7%BB%83%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%AE%9E%E7%8E%B0%E7%AB%AF%E4%BE%A7%E9%83%A8%E7%BD%B2),将TestReader.fuse_normalize=False即可)。
|
|
||||||
|
|
||||||
<a name="1"></a>
|
|
||||||
|
|
||||||
# 模型替换
|
|
||||||
|
|
||||||
开发者将从PaddleDetection、PaddleClas、PaddleOCR、PaddleSeg等飞桨开发套件导出来的对应模型,完成 [1.模型准备](#)、[2.模型名修改和label文件准备](#)、[3.修改配置文件](#) 3步操作(需要相同算法才可替换),即可完成自定义模型的模型文件替换;运行时指定新的模型文件,即可在自己训练的模型上实现相应的预测推理任务。
|
|
||||||
|
|
||||||
* Linux下模型资源文件夹路径:`EasyEdge-Linux-**/RES/` 。
|
|
||||||
* Windows下模型资源文件夹路径:`EasyEdge-Windows-**/data/model/`。
|
|
||||||
* Android下模型资源文件夹路径:`EasyEdge-Android-**/app/src/assets/infer/` 和 ` app/src/assets/demo/conf.json`
|
|
||||||
* iOS下模型资源文件夹路径:`EasyEdge-iOS-**/RES/easyedge/`
|
|
||||||
|
|
||||||
主要涉及到下面4个模型相关的文件(model、params、label_list.txt、infer_cfg.json)和一个APP名相关的配置文件(仅Android、iOS、HTTP服务需要,非必需)。
|
|
||||||
|
|
||||||
* ```
|
|
||||||
├── RES、model、infer # 模型资源文件夹,一套模型适配不同硬件、OS和部署方式
|
|
||||||
│ ├── conf.json # Android、iOS系统APP名字需要
|
|
||||||
│ ├── model # 模型结构文件
|
|
||||||
│ ├── params # 模型参数文件
|
|
||||||
│ ├── label_list.txt # 模型标签文件
|
|
||||||
│ ├── infer_cfg.json # 模型前后处理等配置文件
|
|
||||||
```
|
|
||||||
|
|
||||||
> ❗注意:OCR模型在ARM CPU硬件上(包括Android、Linux、iOS 三款操作系统),因为任务的特殊性,其替换步骤在 [1.模型准备](#)、[2.模型名修改和label文件准备](#) 中不同于其他任务模型,详细参考下面步骤。
|
|
||||||
|
|
||||||
<a name="2"></a>
|
|
||||||
|
|
||||||
## 1.模型准备
|
|
||||||
|
|
||||||
<a name="3"></a>
|
|
||||||
|
|
||||||
### 1.1 Paddle模型
|
|
||||||
|
|
||||||
* 通过PaddleDetection、PaddleClas、PaddleOCR、PaddleSeg等导出来飞桨模型文件,包括如下文件(可能存在导出时修改了名字的情况,后缀`.pdmodel`为模型网络结构文件,后缀`.pdiparams`为模型权重文件):
|
|
||||||
|
|
||||||
```
|
|
||||||
model.pdmodel # 模型网络结构
|
|
||||||
model.pdiparams # 模型权重
|
|
||||||
model.yml # 模型的配置文件(包括预处理参数、模型定义等)
|
|
||||||
```
|
|
||||||
|
|
||||||
<a name="4"></a>
|
|
||||||
|
|
||||||
### 1.2 OCR模型特殊转换(仅在ARM CPU上需要)
|
|
||||||
|
|
||||||
因为推理引擎版本的问题,OCR模型需要在[1.1 Paddle模型](#3)导出`.pdmodel`和`.pdiparams`模型后,多增加一步模型转换的特殊处理,主要执行下面2步:
|
|
||||||
|
|
||||||
<a name="5"></a>
|
|
||||||
|
|
||||||
#### 1.2.1 下载模型转换工具
|
|
||||||
|
|
||||||
Linux 模型转换工具下载链接:[opt_linux](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.11/opt_linux)</br>
|
|
||||||
M1 模型转换工具下载链接:[opt_m1](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.11/opt_m1)</br>
|
|
||||||
mac 模型转换工具下载链接:[opt_mac](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.11/opt_mac)</br>
|
|
||||||
|
|
||||||
<a name="6"></a>
|
|
||||||
|
|
||||||
#### 1.2.2 模型转换
|
|
||||||
|
|
||||||
以下命令,以mac为例,完成模型转换。
|
|
||||||
|
|
||||||
```
|
|
||||||
* 转换 OCR 检测模型命令:
|
|
||||||
./opt_mac --model_dir=./ch_PP-OCRv3_det_infer/ --valid_targets=arm --optimize_out_type=naive_buffer --optimize_out=./ocr_det
|
|
||||||
|
|
||||||
* 转换 OCR 识别模型命令:
|
|
||||||
./opt_mac --model_dir=./ch_PP-OCRv3_rec_infer/ --valid_targets=arm --optimize_out_type=naive_buffer --optimize_out=./ocr_rec
|
|
||||||
```
|
|
||||||
|
|
||||||
产出:
|
|
||||||
|
|
||||||
<div align=center><img src="https://user-images.githubusercontent.com/54695910/175856746-501b05ad-6fba-482e-8e72-fdd68fe52101.png" width="400"></div>
|
|
||||||
|
|
||||||
<a name="7"></a>
|
|
||||||
|
|
||||||
### 1.3 其他框架模型
|
|
||||||
|
|
||||||
* 如果开发者使用的是PyTorch、TensorFlow、Caffe、ONNX等其他框架模型,可以参考[X2Paddle](https://github.com/PaddlePaddle/X2Paddle)官网完成模型转换,即可得到对应的`model.pdmodel`和`model.pdiparams`模型文件。
|
|
||||||
|
|
||||||
<a name="8"></a>
|
|
||||||
|
|
||||||
## 2.模型名修改和label文件准备
|
|
||||||
|
|
||||||
<a name="9"></a>
|
|
||||||
|
|
||||||
### 2.1 非OCR模型名修改
|
|
||||||
|
|
||||||
按照下面的规则,修改套件导出来的模型名和标签文件,并替换到模型资源文件中。
|
|
||||||
|
|
||||||
```
|
|
||||||
1. model.pdmodel 修改成 model
|
|
||||||
2. model.pdiparams 修改成 params
|
|
||||||
```
|
|
||||||
|
|
||||||
<a name="10"></a>
|
|
||||||
|
|
||||||
### 2.2 OCR模型名修改

```
1. ocr_det.nb 修改成 model # 将 检测模型 修改名称成 model
2. ocr_rec.nb 修改成 params # 将 识别模型 修改名称成 params
```
|
|
||||||
|
|
||||||
<a name="11"></a>
|
|
||||||
|
|
||||||
### 2.3 模型label文件

同时需要准备模型文件对应的label文件`label_list.txt`。label文件可以参考原Demo中`label_list.txt`的格式准备。
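
`label_list.txt`通常为每行一个类别名,行号与模型输出的类别index一一对应。下面是一个假设的示例(类别名称需替换为开发者自己的标签):

```
cat
dog
person
```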
|
|
||||||
|
|
||||||
<a name="12"></a>
|
|
||||||
|
|
||||||
## 3. 修改模型相关配置文件
|
|
||||||
|
|
||||||
(1)infer_cfg.json 文件修改
|
|
||||||
|
|
||||||
所有程序开发者都需要关注该配置文件。开发者在自己数据/任务中训练模型时,可能会修改输入图像尺寸、修改阈值等,因此需要根据训练情况修改`RES`文件夹下`infer_cfg.json`文件中对应的字段。CV任务涉及到的配置文件修改包括如下字段:
|
|
||||||
|
|
||||||
```
|
|
||||||
1. "best_threshold": 0.3, #网络输出的阈值,根据开发者模型实际情况修改
|
|
||||||
2. "resize": [512, 512], #[w, h]网络输入图像尺寸,用户根据实际情况修改。
|
|
||||||
```
|
|
||||||
|
|
||||||
(2)conf.json 文件修改
|
|
||||||
仅Android、iOS、HTTP服务应用开发者,需要关注该配置文件。开发者根据自己应用程序命名需要,参考已有`conf.json`即可。
|
|
||||||
|
|
||||||
通常,开发者修改FastDeploy项目中的模型,主要涉及的就是这几个配置信息的修改。FastDeploy详细的配置文件介绍参考[完整配置文件说明](#14)。
|
|
||||||
|
|
||||||
<a name="13"></a>
|
|
||||||
|
|
||||||
# 测试效果
|
|
||||||
|
|
||||||
将自定义准备的`RES`文件,按照第2、3步完成修改后,可以参考[SDK使用文档](https://github.com/PaddlePaddle/FastDeploy/blob/develop/README.md#sdk%E4%BD%BF%E7%94%A8)完成自己模型上的不同预测体验。
|
|
||||||
|
|
||||||
<a name="14"></a>
|
|
||||||
|
|
||||||
# 完整配置文件说明
|
|
||||||
|
|
||||||
<a name="15"></a>
|
|
||||||
|
|
||||||
## 1. 配置文件字段含义
|
|
||||||
|
|
||||||
模型资源文件`infer_cfg.json`涉及到大量不同算法的前后处理等信息,下面是相关字段的介绍,通常开发者如果没有修改算法前后处理,不需要关心这些字段。非标记【必须】的可不填。
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"version": 1,
|
|
||||||
"model_info": {
|
|
||||||
"best_threshold": 0.3, // 默认0.3
|
|
||||||
"model_kind": 1, // 【必须】 1-分类,2-检测,6-实例分割,12-追踪,14-语义分割,401-人脸,402-姿态,10001-决策
|
|
||||||
},
|
|
||||||
"pre_process": { // 【必须】
|
|
||||||
// 归一化, 预处理会把图像 (origin_img - mean) * scale
|
|
||||||
"skip_norm": false, // 默认为false, 如果设置为true,不做mean scale处理
|
|
||||||
"mean": [123, 123, 123], // 【必须,一般不需要动】图像均值,已经根据Paddle套件均值做了转换处理,开发者如果没有修改套件参数,可以不用关注。(X-mean)/ scale
|
|
||||||
"scale": [0.017, 0.017, 0.017], // 【必须,一般不需要动】
|
|
||||||
"color_format": "RGB", // BGR 【必须,一般不需要动】
|
|
||||||
"channel_order": "CHW", // HWC
|
|
||||||
// 大小相关
|
|
||||||
"resize": [300, 300], // w, h 【必须】
|
|
||||||
"rescale_mode": "keep_size", // 默认keep_size, keep_ratio, keep_ratio2, keep_raw_size, warp_affine
|
|
||||||
"max_size": 1366, // keep_ratio 用。如果没有提供,则用 resize[0]
|
|
||||||
"target_size": 800, // keep_ratio 用。如果没有提供,则用 resize[1]
|
|
||||||
"raw_size_range": [100, 10000], // keep_raw_size 用
|
|
||||||
"warp_affine_keep_res": // warp_affine模式使用,默认为false
|
|
||||||
"center_crop_size": [224, 224], // w, h, 如果需要做center_crop,则提供,否则,无需提供该字段
|
|
||||||
"padding": false,
|
|
||||||
"padding_mode": "padding_align32", // 【非必须】默认padding_align32, 其他可指定:padding_fill_size
|
|
||||||
"padding_fill_size": [416, 416], // 【非必须】仅padding_fill_size模式下需要提供, [fill_size_w, fill_size_h], 这里padding fill对齐paddle detection实现,在bottom和right方向实现补齐
|
|
||||||
"padding_fill_value": [114, 114, 114] // 【非必须】仅padding_fill_size模式下需要提供
|
|
||||||
// 其他
|
|
||||||
"letterbox": true,
|
|
||||||
},
|
|
||||||
"post_process": {
|
|
||||||
"box_normed": true, // 默认为true, 如果为false 则表示该模型的box坐标输出不是归一化的
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
<a name="16"></a>
|
|
||||||
|
|
||||||
## 2. 预处理顺序(没有的流程自动略过)
|
|
||||||
|
|
||||||
1. 灰度图 -> rgb图变换
|
|
||||||
2. resize 尺寸变换
|
|
||||||
3. center_crop
|
|
||||||
4. rgb/bgr变换
|
|
||||||
5. padding_fill_size
|
|
||||||
6. letterbox(画个厚边框,填上黑色)
|
|
||||||
7. chw/hwc变换
|
|
||||||
8. 归一化:mean, scale
|
|
||||||
9. padding_align32
|
|
||||||
|
|
||||||
rescale_mode说明:

* keep_size: 将图片缩放到resize指定的大小
* keep_ratio:将图片按比例缩放,长边不超过max_size,短边不超过target_size
* keep_raw_size:保持原图尺寸,但必须在raw_size_range之间
* warp_affine: 仿射变换,可以设置warp_affine_keep_res指定是否keep_res,在keep_res为false场景下,宽高通过resize字段指定
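
以keep_ratio为例,下面用一段示意代码说明缩放后尺寸的计算方式(仅为按上述描述整理的示意,具体实现以SDK为准):

```cpp
// 示意:keep_ratio 先按短边缩放到 target_size,若长边超过 max_size 则改按长边缩放
#include <algorithm>

void keep_ratio_size(int w, int h, int target_size, int max_size, int *out_w, int *out_h) {
    float short_side = static_cast<float>(std::min(w, h));
    float long_side  = static_cast<float>(std::max(w, h));
    float scale = target_size / short_side;
    if (scale * long_side > max_size) {
        scale = max_size / long_side;  // 保证长边不超过 max_size,短边不超过 target_size
    }
    *out_w = static_cast<int>(w * scale);
    *out_h = static_cast<int>(h * scale);
}
```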
|
|
||||||
|
|
||||||
<a name="17"></a>
|
|
||||||
|
|
||||||
# FAQ
|
|
||||||
|
|
||||||
### 1. 如何处理一些 undefined reference / error while loading shared libraries?
|
|
||||||
|
|
||||||
> 如:./easyedge_demo: error while loading shared libraries: libeasyedge.so.1: cannot open shared object file: No such file or directory
|
|
||||||
|
|
||||||
遇到该问题时,请找到具体的库的位置,设置LD_LIBRARY_PATH;或者安装缺少的库。
|
|
||||||
|
|
||||||
> 示例一:libverify.so.1: cannot open shared object file: No such file or directory
|
|
||||||
> 链接找不到libverify.so文件,一般可通过 export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:../../lib 解决(实际冒号后面添加的路径以libverify.so文件所在的路径为准)
|
|
||||||
|
|
||||||
> 示例二:libopencv_videoio.so.4.5: cannot open shared object file: No such file or directory
|
|
||||||
> 链接找不到libopencv_videoio.so文件,一般可通过 export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:../../thirdparty/opencv/lib 解决(实际冒号后面添加的路径以libopencv_videoio.so所在路径为准)
|
|
||||||
|
|
||||||
> 示例三:GLIBCXX_X.X.X not found
|
|
||||||
> 链接无法找到glibc版本,请确保系统gcc版本>=SDK的gcc版本。升级gcc/glibc可以百度搜索相关文献。
|
|
||||||
|
|
||||||
### 2. 使用libcurl请求http服务时,速度明显变慢
|
|
||||||
|
|
||||||
这是因为libcurl请求continue导致server等待数据的问题,添加空的header即可
|
|
||||||
|
|
||||||
```bash
|
|
||||||
headers = curl_slist_append(headers, "Expect:");
|
|
||||||
```
|
|
||||||
|
|
||||||
### 3. 运行二进制时,提示 libverify.so cannot open shared object file
|
|
||||||
|
|
||||||
可能cmake没有正确设置rpath, 可以设置LD_LIBRARY_PATH为sdk的lib文件夹后,再运行:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:../lib ./easyedge_demo
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4. 编译时报错:file format not recognized
|
|
||||||
|
|
||||||
可能是因为在复制SDK时文件信息丢失。请将整个压缩包复制到目标设备中,再解压缩、编译
|
|
@@ -15,7 +15,7 @@ FastDeploy provides prebuilt libraries for each platform that developers can download and install directly.

### Python installation

Release version (currently 0.3.0):
```
pip install fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```
@@ -27,12 +27,12 @@ conda config --add channels conda-forge && conda install cudatoolkit=11.2 cudnn=

### C++ SDK installation

Release version (currently 0.3.0)

| Platform | File | Notes |
| :--- | :--- | :---- |
| Linux x64 | [fastdeploy-linux-x64-gpu-0.3.0.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-gpu-0.3.0.tgz) | Built with g++ 8.2, CUDA 11.2, cuDNN 8.2 |
| Windows x64 | [fastdeploy-win-x64-gpu-0.3.0.zip](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-win-x64-gpu-0.3.0.zip) | Built with Visual Studio 16 2019, CUDA 11.2, cuDNN 8.2 |

## CPU deployment environment

@@ -44,20 +44,20 @@ Release version (currently 0.2.1)

### Python installation

Release version (currently 0.3.0):
```
pip install fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```

## C++ SDK installation

Release version (currently 0.3.0)

| Platform | File | Notes |
| :--- | :--- | :---- |
| Linux x64 | [fastdeploy-linux-x64-0.3.0.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.3.0.tgz) | Built with g++ 8.2 |
| Windows x64 | [fastdeploy-win-x64-0.3.0.zip](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-win-x64-0.3.0.zip) | Built with Visual Studio 16 2019 |
| Mac OSX x64 | [fastdeploy-osx-x86_64-0.3.0.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-osx-x86_64-0.3.0.tgz) | - |
| Mac OSX arm64 | [fastdeploy-osx-arm64-0.3.0.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-osx-arm64-0.3.0.tgz) | - |
| Linux aarch64 | [fastdeploy-linux-aarch64-0.3.0.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-aarch64-0.3.0.tgz) | Built with g++ 6.3.0 |

@@ -1,25 +0,0 @@

# Building FastDeploy

This document covers building the C++ and Python inference libraries; refer to the guide for your platform:

- [Linux & Mac build](how_to_build_linux_and_mac.md)
- [Windows build](how_to_build_windows.md)

The build options available on each platform are listed below:

| Option | Purpose | Notes |
|:---- | :--- | :--- |
| ENABLE_ORT_BACKEND | Enable the ONNX Runtime backend, ON by default | CPU by default; also supports GPU when WITH_GPU is ON |
| ENABLE_PADDLE_BACKEND | Enable the Paddle Inference backend, OFF by default | CPU by default; also supports GPU when WITH_GPU is ON |
| ENABLE_OPENVINO_BACKEND | Enable the OpenVINO backend, OFF by default | CPU only |
| ENABLE_TRT_BACKEND | Enable the TensorRT backend, OFF by default | GPU only |
| WITH_GPU | Whether to build with GPU support, OFF by default | When TRUE, the build supports NVIDIA GPU deployment |
| CUDA_DIRECTORY | CUDA path used at build time, /usr/local/cuda by default | CUDA 11.2 or later |
| TRT_DIRECTORY | TensorRT path, required when the TensorRT backend is enabled | TensorRT 8.4 or later |
| ENABLE_VISION | Enable the vision model module, ON by default | |
| ENABLE_TEXT | Enable the text model module, ON by default | |
| OPENCV_DIRECTORY | Path to an installed OpenCV, empty by default | If not set, OpenCV is downloaded and installed automatically |
| ORT_DIRECTORY | Path to an installed ONNX Runtime, empty by default | If not set, ONNX Runtime is downloaded and installed automatically |


FastDeploy lets users choose at build time which backends to compile in; Paddle Inference, ONNXRuntime and TensorRT (loading the ONNX format) are currently supported. The models supported by FastDeploy have been validated on the different backends, and at runtime a backend is selected automatically from those enabled at build time; if none is available a corresponding message is reported (for example, YOLOv7 currently supports only the ONNXRuntime/TensorRT backends, so if neither was enabled at build time, inference will report that no backend is available). A small illustrative sketch of this selection follows below.

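The runtime fallback described above can be pictured with a small, purely illustrative Python sketch (the function and backend names are placeholders; the real selection happens inside the C++ runtime):

```python
def pick_backend(supported_by_model, enabled_at_build_time):
    """Pick the first backend the model supports that was also compiled into the SDK."""
    for backend in supported_by_model:
        if backend in enabled_at_build_time:
            return backend
    raise RuntimeError(
        "No available backend for this model; rebuild with one of: "
        + ", ".join(supported_by_model))

# YOLOv7 supports only ONNXRuntime/TensorRT, so an ORT-enabled build works ...
print(pick_backend(["ONNXRuntime", "TensorRT"], {"ONNXRuntime", "PaddleInference"}))
# ... while a Paddle-/OpenVINO-only build would raise the "no available backend" error.
```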
@@ -1,38 +0,0 @@

# Linux & Mac build

## Environment requirements

- cmake >= 3.12
- g++ >= 8.2
- cuda >= 11.2 (when WITH_GPU=ON)
- cudnn >= 8.0 (when WITH_GPU=ON)
- TensorRT >= 8.4 (when ENABLE_TRT_BACKEND=ON)

## Building the C++ library
```bash
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy
git checkout develop
mkdir build && cd build
cmake .. -DENABLE_ORT_BACKEND=ON \
         -DENABLE_VISION=ON \
         -DCMAKE_INSTALL_PREFIX=${PWD}/fastdeploy-0.0.3
make -j8
make install
```
The built library is then located in `fastdeploy-0.0.3` under the current directory.

## Building the Python wheel
```bash
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/python
git checkout develop
# Python build options are set through environment variables
export ENABLE_ORT_BACKEND=ON
export ENABLE_VISION=ON
python setup.py build
python setup.py bdist_wheel
```
The built wheel is then located in the `dist` directory under the current directory.

See the [build guide](./README.md) for more build options.

@@ -1,257 +0,0 @@
|
|||||||
# FastDeploy Windows SDK 编译
|
|
||||||
|
|
||||||
## 目录
|
|
||||||
- [环境依赖](#Environment)
|
|
||||||
- [命令行方式编译C++ SDK](#CommandLineCpp)
|
|
||||||
- [编译CPU版本 C++ SDK](#CommandLineCppCPU)
|
|
||||||
- [编译GPU版本 C++ SDK](#CommandLineCppGPU)
|
|
||||||
- [命令行方式编译Python Wheel包](#CommandLinePython)
|
|
||||||
- [编译CPU版本 Python Wheel包](#CommandLinePythonCPU)
|
|
||||||
- [编译GPU版本 Python Wheel包](#CommandLinePythonGPU)
|
|
||||||
- [CMake GUI + Visual Studio 2019 IDE方式编译C++ SDK](#CMakeGuiAndVS2019)
|
|
||||||
- [使用CMake GUI进行基础配置](#CMakeGuiAndVS2019Basic)
|
|
||||||
- [编译CPU版本 C++ SDK设置](#CMakeGuiAndVS2019CPU)
|
|
||||||
- [编译GPU版本 C++ SDK设置](#CMakeGuiAndVS2019GPU)
|
|
||||||
- [使用Visual Studio 2019 IDE进行编译](#CMakeGuiAndVS2019Build)
|
|
||||||
- [Windows下FastDeploy C++ SDK使用方式](#Usage)
|
|
||||||
|
|
||||||
## 1. 环境依赖
|
|
||||||
<div id="Environment"></div>
|
|
||||||
|
|
||||||
- cmake >= 3.12
|
|
||||||
- Visual Studio 16 2019
|
|
||||||
- cuda >= 11.2 (当WITH_GPU=ON)
|
|
||||||
- cudnn >= 8.0 (当WITH_GPU=ON)
|
|
||||||
- TensorRT >= 8.4 (当ENABLE_TRT_BACKEND=ON)
|
|
||||||
|
|
||||||
## 2. 命令行方式编译C++ SDK
|
|
||||||
<div id="CommandLineCpp"></div>
|
|
||||||
|
|
||||||
### 编译CPU版本 C++ SDK
|
|
||||||
<div id="CommandLineCppCPU"></div>
|
|
||||||
|
|
||||||
Windows菜单打开`x64 Native Tools Command Prompt for VS 2019`命令工具,其中`CMAKE_INSTALL_PREFIX`用于指定编译后生成的SDK路径
|
|
||||||
|
|
||||||
```bat
|
|
||||||
git clone https://github.com/PaddlePaddle/FastDeploy.git
|
|
||||||
cd FastDeploy && git checkout develop
|
|
||||||
mkdir build && cd build
|
|
||||||
|
|
||||||
cmake .. -G "Visual Studio 16 2019" -A x64 -DCMAKE_INSTALL_PREFIX=D:\Paddle\FastDeploy\build\fastdeploy-win-x64 -DENABLE_ORT_BACKEND=ON -DENABLE_PADDLE_BACKEND=ON -DENABLE_VISION=ON -DENABLE_VISION_VISUALIZE=ON
|
|
||||||
msbuild fastdeploy.sln /m /p:Configuration=Release /p:Platform=x64
|
|
||||||
msbuild INSTALL.vcxproj /m /p:Configuration=Release /p:Platform=x64
|
|
||||||
```
|
|
||||||
编译后,FastDeploy CPU C++ SDK即在`D:\Paddle\FastDeploy\build\fastdeploy-win-x64`目录下
|
|
||||||
|
|
||||||
### 编译GPU版本 C++ SDK
|
|
||||||
<div id="CommandLineCppGPU"></div>
|
|
||||||
|
|
||||||
Windows菜单打开`x64 Native Tools Command Prompt for VS 2019`命令工具,其中`CMAKE_INSTALL_PREFIX`用于指定编译后生成的SDK路径
|
|
||||||
|
|
||||||
```bat
|
|
||||||
git clone https://github.com/PaddlePaddle/FastDeploy.git
|
|
||||||
cd FastDeploy && git checkout develop
|
|
||||||
mkdir build && cd build
|
|
||||||
|
|
||||||
cmake .. -G "Visual Studio 16 2019" -A x64 -DCMAKE_INSTALL_PREFIX=D:\Paddle\FastDeploy\build\fastdeploy-win-x64-gpu -DWITH_GPU=ON -DENABLE_ORT_BACKEND=ON -DENABLE_PADDLE_BACKEND=ON -DENABLE_VISION=ON -DENABLE_VISION_VISUALIZE=ON -DCUDA_DIRECTORY="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2"
|
|
||||||
msbuild fastdeploy.sln /m /p:Configuration=Release /p:Platform=x64
|
|
||||||
msbuild INSTALL.vcxproj /m /p:Configuration=Release /p:Platform=x64
|
|
||||||
|
|
||||||
% 附加说明:%
|
|
||||||
% (1) -DCUDA_DIRECTORY指定CUDA所在的目录 %
|
|
||||||
% (2) 若编译Paddle后端,设置-DENABLE_PADDLE_BACKEND=ON %
|
|
||||||
% (3) 若编译TensorRT后端,需要设置-DENABLE_TRT_BACKEND=ON,并指定TRT_DIRECTORY %
|
|
||||||
% (4) 如-DTRT_DIRECTORY=D:\x64\third_party\TensorRT-8.4.1.5 %
|
|
||||||
```
|
|
||||||
编译后,FastDeploy GPU C++ SDK即在`D:\Paddle\FastDeploy\build\fastdeploy-win-x64-gpu`目录下
|
|
||||||
|
|
||||||
## 命令行方式编译Python Wheel包
|
|
||||||
<div id="CommandLinePython"></div>
|
|
||||||
|
|
||||||
### 编译CPU版本 Python Wheel包
|
|
||||||
<div id="CommandLinePythonCPU"></div>
|
|
||||||
|
|
||||||
Windows菜单打开x64 Native Tools Command Prompt for VS 2019命令工具。Python编译时,通过环境变量获取编译选项,在命令行终端运行以下命令
|
|
||||||
```bat
|
|
||||||
git clone https://github.com/PaddlePaddle/FastDeploy.git
|
|
||||||
cd FastDeploy/python && git checkout develop
|
|
||||||
|
|
||||||
set ENABLE_ORT_BACKEND=ON
|
|
||||||
set ENABLE_PADDLE_BACKEND=ON
|
|
||||||
set ENABLE_VISION=ON
|
|
||||||
set ENABLE_VISION_VISUALIZE=ON
|
|
||||||
|
|
||||||
% 这里指定用户自己的python解释器 以python3.8为例 %
|
|
||||||
C:\Python38\python.exe setup.py build
|
|
||||||
C:\Python38\python.exe setup.py bdist_wheel
|
|
||||||
```
|
|
||||||
编译好的wheel文件在dist目录下,pip安装编译好的wheel包,命令如下
|
|
||||||
```bat
|
|
||||||
C:\Python38\python.exe -m pip install dist\fastdeploy_python-0.2.1-cp38-cp38-win_amd64.whl
|
|
||||||
```
|
|
||||||
|
|
||||||
### 编译GPU版本 Python Wheel包
|
|
||||||
<div id="CommandLinePythonGPU"></div>
|
|
||||||
|
|
||||||
Windows菜单打开x64 Native Tools Command Prompt for VS 2019命令工具。Python编译时,通过环境变量获取编译选项,在命令行终端运行以下命令
|
|
||||||
```bat
|
|
||||||
% 说明:CUDA_DIRECTORY 为用户自己的CUDA目录 以下为示例 %
|
|
||||||
set CUDA_DIRECTORY=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2
|
|
||||||
% 说明:TRT_DIRECTORY 为下载的TensorRT库所在的目录 以下为示例 如果不编译TensorRT后端 可以不设置 %
|
|
||||||
set TRT_DIRECTORY=D:\x64\third_party\TensorRT-8.4.1.5
|
|
||||||
set WITH_GPU=ON
|
|
||||||
set ENABLE_ORT_BACKEND=ON
|
|
||||||
% 说明:如果不编译TensorRT后端 此项为OFF %
|
|
||||||
set ENABLE_TRT_BACKEND=ON
|
|
||||||
set ENABLE_PADDLE_BACKEND=ON
|
|
||||||
set ENABLE_VISION=ON
|
|
||||||
set ENABLE_VISION_VISUALIZE=ON
|
|
||||||
|
|
||||||
git clone https://github.com/PaddlePaddle/FastDeploy.git
|
|
||||||
cd FastDeploy && git checkout develop
|
|
||||||
|
|
||||||
% 说明:这里指定用户自己的python解释器 以python3.8为例 %
|
|
||||||
C:\Python38\python.exe setup.py build
|
|
||||||
C:\Python38\python.exe setup.py bdist_wheel
|
|
||||||
```
|
|
||||||
编译好的wheel文件在dist目录下,pip安装编译好的wheel包,命令如下
|
|
||||||
```bat
|
|
||||||
C:\Python38\python.exe -m pip install dist\fastdeploy_gpu_python-0.2.1-cp38-cp38-win_amd64.whl
|
|
||||||
```
|
|
||||||
更多编译选项说明参考[编译指南](./README.md)
|
|
||||||
|
|
||||||
## 3. CMake GUI + Visual Studio 2019 IDE方式编译C++ SDK
|
|
||||||
<div id="CMakeGuiAndVS2019"></div>
|
|
||||||
|
|
||||||
### 使用CMake GUI进行基础配置
|
|
||||||
<div id="CMakeGuiAndVS2019Basic"></div>
|
|
||||||
|
|
||||||
步骤一:首先,打开CMake GUI,先初始化FastDeploy工程:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
步骤二:点击Configure后,在弹窗中设置编译"x64"架构:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
初始化完成后,显示如下:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
步骤三:由于FastDeploy目前只支持Release版本,因此,先将"CMAKE_CONFIGURATION_TYPES"修改成"Release"
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
接下来,用户可根据自己实际的开发需求开启对应的编译选项,并生成sln解决方案。以下,针对编译CPU和GPU版本SDK各举一个例子。
|
|
||||||
|
|
||||||
### 编译CPU版本 C++ SDK设置
|
|
||||||
|
|
||||||
<div id="CMakeGuiAndVS2019CPU"></div>
|
|
||||||
|
|
||||||
步骤一:勾选CPU版本对应的编译选项。注意CPU版本,请`不要`勾选WITH_GPU和ENABLE_TRT_BACKEND
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
这个示例中,我们开启ORT、Paddle、OpenVINO等推理后端,并且选择了需要编译TEXT和VISION的API
|
|
||||||
|
|
||||||
|
|
||||||
步骤二:自定义设置SDK安装路径,修改CMAKE_INSTALL_PREFIX
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
由于默认的安装路径是C盘,我们可以修改CMAKE_INSTALL_PREFIX来指定自己的安装路径,这里我们将安装路径修改到`build\fastdeploy-win-x64-0.2.1`目录下。
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
### 编译GPU版本 C++ SDK设置
|
|
||||||
<div id="CMakeGuiAndVS2019GPU"></div>
|
|
||||||
|
|
||||||
步骤一:勾选GPU版本对应的编译选项。注意GPU版本,请`需要`勾选WITH_GPU
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
这个示例中,我们开启ORT、Paddle、OpenVINO和TRT等推理后端,并且选择了需要编译TEXT和VISION的API。并且,由于开启了GPU和TensorRT,此时需要额外指定CUDA_DIRECTORY和TRT_DIRECTORY,在GUI界面中找到这两个变量,点击右侧的选项框,分别选择您安装CUDA的路径和TensorRT的路径
|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
步骤二:自定义设置SDK安装路径,修改CMAKE_INSTALL_PREFIX
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
由于默认的安装路径是C盘,我们可以修改CMAKE_INSTALL_PREFIX来指定自己的安装路径,这里我们将安装路径修改到`build\fastdeploy-win-x64-gpu-0.2.1`目录下。
|
|
||||||
|
|
||||||
|
|
||||||
### 使用Visual Studio 2019 IDE进行编译
|
|
||||||
|
|
||||||
<div id="CMakeGuiAndVS2019Build"></div>
|
|
||||||
|
|
||||||
步骤一:点击"Generate",生成sln解决方案,并用Visual Studio 2019打开
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
这个过程默认会从下载一些编译需要的资源,cmake的dev警告可以不用管。生成完成之后可以看到以下界面:
|
|
||||||
|
|
||||||
CPU版本SDK:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
GPU版本SDK:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
左侧界面,可以看到所有编译需要的include路径和lib路径已经被设置好了,用户可以考虑把这些路径记录下来方便后续的开发。右侧界面,可以看到已经生成fastdeploy.sln解决方案文件。接下来,我们使用Visual Studio 2019打开这个解决方案文件(理论上VS2022也可以编译,但目前建议使用VS2019)。
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
步骤二:在Visual Studio 2019点击"ALL BUILD"->右键点击"生成"开始编译
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
CPU版本SDK编译成功!
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
GPU版本SDK编译成功!
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
步骤三:编译完成后,在Visual Studio 2019点击"INSTALL"->右键点击"生成"将编译好的SDK安装到先前指定的目录
|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
SDK成功安装到指定目录!
|
|
||||||
|
|
||||||
### 编译所有examples(可选)
|
|
||||||
可以在CMake GUI中勾选BUILD_EXAMPLES选项,连带编译所有的examples,编译完成后所有example的可执行文件保存在build/bin/Release目录下
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
## 4. 特别提示
|
|
||||||
|
|
||||||
如果是用户自行编译SDK,理论上支持Windows 10/11,VS 2019/2022,CUDA 11.x 以及 TensorRT 8.x等配置,但建议使用我们推荐的默认配置,即:Windows 10, VS 2019, CUDA 11.2 和 TensorRT 8.4.x版本。另外,如果编译过程中遇到中文字符的编码问题(如UIE example必须传入中文字符进行预测),可以参考Visual Studio的官方文档,设置源字符集为`/utf-8`解决:
|
|
||||||
- [/utf-8(将源字符集和执行字符集设置为 UTF-8)](https://learn.microsoft.com/zh-cn/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-170)
|
|
||||||
|
|
||||||
## 5. Windows下FastDeploy C++ SDK使用方式
|
|
||||||
|
|
||||||
<div id="Usage"></div>
|
|
||||||
|
|
||||||
Windows下FastDeploy C++ SDK使用方式,请参考文档:
|
|
||||||
- [how_to_use_sdk_on_windows.md](./how_to_use_sdk_on_windows.md)
|
|
@@ -1,509 +0,0 @@
|
|||||||
# 在 Windows 使用 FastDeploy C++ SDK
|
|
||||||
|
|
||||||
在 Windows 下使用 FastDeploy C++ SDK 与在 Linux 下使用稍有不同。以下以 PPYOLOE 为例进行演示在CPU/GPU,以及GPU上通过TensorRT加速部署的示例。在部署前,需确认以下两个步骤:
|
|
||||||
- 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../environment.md)
|
|
||||||
- 2. 根据开发环境,下载预编译部署库和samples代码,参考[FastDeploy预编译库](../quick_start)
|
|
||||||
|
|
||||||
## 目录
|
|
||||||
- [1. 环境依赖](#Environment)
|
|
||||||
- [2. 下载 FastDeploy Windows 10 C++ SDK](#Download)
|
|
||||||
- [3. Windows下多种方式使用 C++ SDK 的方式](#CommandLine)
|
|
||||||
- [3.1 命令行方式使用 C++ SDK](#CommandLine)
|
|
||||||
- [3.1.1 在 Windows 命令行终端 上编译 example](#CommandLine)
|
|
||||||
- [3.1.2 运行可执行文件获得推理结果](#CommandLine)
|
|
||||||
- [3.2 Visual Studio 2019 创建sln工程使用 C++ SDK](#VisualStudio2019Sln)
|
|
||||||
- [3.2.1 Visual Studio 2019 创建 sln 工程项目](#VisualStudio2019Sln1)
|
|
||||||
- [3.2.2 从examples中拷贝infer_ppyoloe.cc的代码到工程](#VisualStudio2019Sln2)
|
|
||||||
- [3.2.3 将工程配置设置成"Release x64"配置](#VisualStudio2019Sln3)
|
|
||||||
- [3.2.4 配置头文件include路径](#VisualStudio2019Sln4)
|
|
||||||
- [3.2.5 配置lib路径和添加库文件](#VisualStudio2019Sln5)
|
|
||||||
- [3.2.6 编译工程并运行获取结果](#VisualStudio2019Sln6)
|
|
||||||
- [3.3 Visual Studio 2019 创建CMake工程使用 C++ SDK](#VisualStudio2019)
|
|
||||||
- [3.3.1 Visual Studio 2019 创建CMake工程项目](#VisualStudio20191)
|
|
||||||
- [3.3.2 在CMakeLists中配置 FastDeploy C++ SDK](#VisualStudio20192)
|
|
||||||
- [3.3.3 生成工程缓存并修改CMakeSetting.json配置](#VisualStudio20193)
|
|
||||||
- [3.3.4 生成可执行文件,运行获取结果](#VisualStudio20194)
|
|
||||||
- [4. 多种方法配置exe运行时所需的依赖库](#CommandLineDeps1)
|
|
||||||
- [4.1 使用 fastdeploy_init.bat 进行配置(推荐)](#CommandLineDeps1)
|
|
||||||
- [4.1.1 fastdeploy_init.bat 使用说明](#CommandLineDeps11)
|
|
||||||
- [4.1.2 fastdeploy_init.bat 查看 SDK 中所有的 dll、lib 和 include 路径](#CommandLineDeps12)
|
|
||||||
- [4.1.3 fastdeploy_init.bat 安装 SDK 中所有的 dll 到指定的目录](#CommandLineDeps13)
|
|
||||||
- [4.1.4 fastdeploy_init.bat 配置 SDK 环境变量](#CommandLineDeps14)
|
|
||||||
- [4.2 修改 CMakeLists.txt,一行命令配置(推荐)](#CommandLineDeps2)
|
|
||||||
- [4.3 命令行设置环境变量](#CommandLineDeps3)
|
|
||||||
- [4.4 手动拷贝依赖库到exe的目录下](#CommandLineDeps4)
|
|
||||||
|
|
||||||
|
|
||||||
## 1. 环境依赖
|
|
||||||
<div id="Environment"></div>
|
|
||||||
|
|
||||||
- cmake >= 3.12
|
|
||||||
- Visual Studio 16 2019
|
|
||||||
- cuda >= 11.2 (当WITH_GPU=ON)
|
|
||||||
- cudnn >= 8.0 (当WITH_GPU=ON)
|
|
||||||
|
|
||||||
## 2. 下载 FastDeploy Windows 10 C++ SDK
|
|
||||||
<div id="Download"></div>
|
|
||||||
|
|
||||||
### 2.1 下载预编译库或者从源码编译最新的SDK
|
|
||||||
可以从以下链接下载编译好的 FastDeploy Windows 10 C++ SDK,SDK中包含了examples代码。
|
|
||||||
```text
|
|
||||||
https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-win-x64-gpu-0.2.1.zip
|
|
||||||
```
|
|
||||||
源码编译请参考: [Windows C++ SDK源码编译文档](./how_to_build_windows.md)
|
|
||||||
### 2.2 准备模型文件和测试图片
|
|
||||||
可以从以下链接下载模型文件和测试图片,并解压缩
|
|
||||||
```text
|
|
||||||
https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco.tgz # (下载后解压缩)
|
|
||||||
https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
|
|
||||||
```
|
|
||||||
|
|
||||||
## 3. Windows下多种方式使用 C++ SDK 的方式
|
|
||||||
### 3.1 SDK使用方式一:命令行方式使用 C++ SDK
|
|
||||||
<div id="CommandLine"></div>
|
|
||||||
|
|
||||||
#### 3.1.1 在 Windows 上编译 PPYOLOE
|
|
||||||
Windows菜单打开`x64 Native Tools Command Prompt for VS 2019`命令工具,cd到ppyoloe的demo路径
|
|
||||||
```bat
|
|
||||||
cd fastdeploy-win-x64-gpu-0.2.1\examples\vision\detection\paddledetection\cpp
|
|
||||||
```
|
|
||||||
```bat
|
|
||||||
mkdir build && cd build
|
|
||||||
cmake .. -G "Visual Studio 16 2019" -A x64 -DFASTDEPLOY_INSTALL_DIR=%cd%\..\..\..\..\..\..\..\fastdeploy-win-x64-gpu-0.2.1 -DCUDA_DIRECTORY="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2"
|
|
||||||
```
|
|
||||||
然后执行
|
|
||||||
```bat
|
|
||||||
msbuild infer_demo.sln /m:4 /p:Configuration=Release /p:Platform=x64
|
|
||||||
```
|
|
||||||
|
|
||||||
#### 3.1.2 运行 demo
|
|
||||||
```bat
|
|
||||||
cd Release
|
|
||||||
infer_ppyoloe_demo.exe ppyoloe_crn_l_300e_coco 000000014439.jpg 0 # CPU
|
|
||||||
infer_ppyoloe_demo.exe ppyoloe_crn_l_300e_coco 000000014439.jpg 1 # GPU
|
|
||||||
infer_ppyoloe_demo.exe ppyoloe_crn_l_300e_coco 000000014439.jpg 2 # GPU + TensorRT
|
|
||||||
```
|
|
||||||
|
|
||||||
特别说明,exe运行时所需要的依赖库配置方法,请参考章节: [多种方法配置exe运行时所需的依赖库](#CommandLineDeps)
|
|
||||||
|
|
||||||
### 3.2 SDK使用方式二:Visual Studio 2019 创建 sln 工程使用 C++ SDK
|
|
||||||
|
|
||||||
本章节针对非CMake用户,介绍如何在Visual Studio 2019 中创建 sln 工程使用 FastDeploy C++ SDK. CMake用户请直接看下一章节。另外,本章节内容特别感谢“梦醒南天”同学关于FastDeploy使用的文档教程:[如何在 Windows 上使用 FastDeploy C++ 部署 PaddleDetection 目标检测模型](https://www.bilibili.com/read/cv18807232)
|
|
||||||
|
|
||||||
<div id="VisualStudio2019Sln"></div>
|
|
||||||
|
|
||||||
#### 3.2.1 步骤一:Visual Studio 2019 创建 sln 工程项目
|
|
||||||
|
|
||||||
<div id="VisualStudio2019Sln1"></div>
|
|
||||||
|
|
||||||
(1) 打开Visual Studio 2019,点击"创建新项目"->点击"控制台程序",从而创建新的sln工程项目.
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
(2)点击“创建”,便创建了一个空的sln工程。我们直接从examples里面拷贝infer_ppyoloe的代码这里。
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
#### 3.2.2 步骤二:从examples中拷贝infer_ppyoloe.cc的代码到工程
|
|
||||||
|
|
||||||
<div id="VisualStudio2019Sln2"></div>
|
|
||||||
|
|
||||||
(1)从examples中拷贝infer_ppyoloe.cc的代码到工程,直接替换即可,拷贝代码的路径为:
|
|
||||||
```bat
|
|
||||||
fastdeploy-win-x64-gpu-0.2.1\examples\vision\detection\paddledetection\cpp
|
|
||||||
```
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
#### 3.2.3 步骤三:将工程配置设置成"Release x64"配置
|
|
||||||
|
|
||||||
<div id="VisualStudio2019Sln3"></div>
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
#### 3.2.4 步骤四:配置头文件include路径
|
|
||||||
|
|
||||||
<div id="VisualStudio2019Sln4"></div>
|
|
||||||
|
|
||||||
|
|
||||||
(1)配置头文件include路径:鼠标选择项目,然后单击右键即可弹出下来菜单,在其中单击“属性”。
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
(2)在弹出来的属性页中选择:C/C++ —> 常规 —> 附加包含目录,然后在添加 fastdeploy 和 opencv 的头文件路径。如:
|
|
||||||
|
|
||||||
```bat
|
|
||||||
|
|
||||||
D:\qiuyanjun\fastdeploy_build\built\fastdeploy-win-x64-gpu-0.2.1\include
|
|
||||||
D:\qiuyanjun\fastdeploy_build\built\fastdeploy-win-x64-gpu-0.2.1\third_libs\install\opencv-win-x64-3.4.16\build\include
|
|
||||||
```
|
|
||||||
注意,如果是自行编译最新的SDK或版本>0.2.1,依赖库目录结构有所变动,opencv路径需要做出适当的修改。如:
|
|
||||||
```bat
|
|
||||||
D:\qiuyanjun\fastdeploy_build\built\fastdeploy-win-x64-gpu-0.2.1\third_libs\install\opencv\build\include
|
|
||||||
```
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
用户需要根据自己实际的sdk路径稍作修改。
|
|
||||||
|
|
||||||
|
|
||||||
#### 3.2.5 步骤五:配置lib路径和添加库文件
|
|
||||||
|
|
||||||
<div id="VisualStudio2019Sln5"></div>
|
|
||||||
|
|
||||||
(1)属性页中选择:链接器—>常规—> 附加库目录,然后在添加 fastdeploy 和 opencv 的lib路径。如:
|
|
||||||
```bat
|
|
||||||
D:\qiuyanjun\fastdeploy_build\built\fastdeploy-win-x64-gpu-0.2.1\lib
|
|
||||||
D:\qiuyanjun\fastdeploy_build\built\fastdeploy-win-x64-gpu-0.2.1\third_libs\install\opencv-win-x64-3.4.16\build\x64\vc15\lib
|
|
||||||
```
|
|
||||||
注意,如果是自行编译最新的SDK或版本>0.2.1,依赖库目录结构有所变动,opencv路径需要做出适当的修改。如:
|
|
||||||
```bat
|
|
||||||
D:\qiuyanjun\fastdeploy_build\built\fastdeploy-win-x64-gpu-0.2.1\third_libs\install\opencv\build\include
|
|
||||||
```
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
(2)添加库文件:只需要 fastdeploy.lib 和 opencv_world3416.lib
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
#### 3.2.6 步骤六:编译工程并运行获取结果
|
|
||||||
|
|
||||||
<div id="VisualStudio2019Sln6"></div>
|
|
||||||
|
|
||||||
|
|
||||||
(1)点击菜单栏“生成”->“生成解决方案”
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
编译成功,可以看到exe保存在:
|
|
||||||
```bat
|
|
||||||
D:\qiuyanjun\fastdeploy_test\infer_ppyoloe\x64\Release\infer_ppyoloe.exe
|
|
||||||
```
|
|
||||||
|
|
||||||
(2)执行可执行文件,获得推理结果。 首先需要拷贝所有的dll到exe所在的目录下。同时,也需要把ppyoloe的模型文件和测试图片下载解压缩后,拷贝到exe所在的目录。 特别说明,exe运行时所需要的依赖库配置方法,请参考章节: [多种方法配置exe运行时所需的依赖库](#CommandLineDeps)
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
### 3.3 SDK使用方式三:Visual Studio 2019 创建 CMake 工程使用 C++ SDK
|
|
||||||
<div id="VisualStudio2019"></div>
|
|
||||||
|
|
||||||
本章节针对CMake用户,介绍如何在Visual Studio 2019 中创建 CMake 工程使用 FastDeploy C++ SDK.
|
|
||||||
|
|
||||||
#### 3.3.1 步骤一:Visual Studio 2019 创建“CMake”工程项目
|
|
||||||
|
|
||||||
<div id="VisualStudio20191"></div>
|
|
||||||
|
|
||||||
(1)打开Visual Studio 2019,点击"创建新项目"->点击"CMake",从而创建CMake工程项目。以PPYOLOE为例,来说明如何在Visual Studio 2019 IDE中使用FastDeploy C++ SDK.
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
(2)打开工程发现,Visual Stuio 2019已经为我们生成了一些基本的文件,其中包括CMakeLists.txt。infer_ppyoloe.h头文件这里实际上用不到,我们可以直接删除。
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
#### 3.3.2 步骤二:在CMakeLists中配置 FastDeploy C++ SDK
|
|
||||||
|
|
||||||
<div id="VisualStudio20192"></div>
|
|
||||||
|
|
||||||
(1)在工程创建完成后,我们需要添加infer_ppyoloe推理源码,并修改CMakeLists.txt,修改如下:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
(2)其中infer_ppyoloe.cpp的代码可以直接从examples中的代码拷贝过来:
|
|
||||||
- [examples/vision/detection/paddledetection/cpp/infer_ppyoloe.cc](../../examples/vision/detection/paddledetection/cpp/infer_ppyoloe.cc)
|
|
||||||
|
|
||||||
(3)CMakeLists.txt主要包括配置FastDeploy C++ SDK的路径,如果是GPU版本的SDK,还需要配置CUDA_DIRECTORY为CUDA的安装路径,CMakeLists.txt的配置如下:
|
|
||||||
|
|
||||||
```cmake
|
|
||||||
project(infer_ppyoloe_demo C CXX)
|
|
||||||
cmake_minimum_required(VERSION 3.12)
|
|
||||||
|
|
||||||
# Only support "Release" mode now
|
|
||||||
set(CMAKE_BUILD_TYPE "Release")
|
|
||||||
|
|
||||||
# Set FastDeploy install dir
|
|
||||||
set(FASTDEPLOY_INSTALL_DIR "D:/qiuyanjun/fastdeploy-win-x64-gpu-0.2.1"
|
|
||||||
CACHE PATH "Path to downloaded or built fastdeploy sdk.")
|
|
||||||
|
|
||||||
# Set CUDA_DIRECTORY (CUDA 11.x) for GPU SDK
|
|
||||||
set(CUDA_DIRECTORY "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7"
|
|
||||||
CACHE PATH "Path to installed CUDA Toolkit.")
|
|
||||||
|
|
||||||
include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)
|
|
||||||
|
|
||||||
include_directories(${FASTDEPLOY_INCS})
|
|
||||||
|
|
||||||
add_executable(infer_ppyoloe_demo ${PROJECT_SOURCE_DIR}/infer_ppyoloe.cpp)
|
|
||||||
target_link_libraries(infer_ppyoloe_demo ${FASTDEPLOY_LIBS})
|
|
||||||
|
|
||||||
# Optional: install all DLLs to binary dir.
|
|
||||||
install_fastdeploy_libraries(${CMAKE_CURRENT_BINARY_DIR}/Release)
|
|
||||||
```
|
|
||||||
注意,`install_fastdeploy_libraries`函数仅在最新的代码编译的SDK或版本>0.2.1下有效。
|
|
||||||
|
|
||||||
#### 3.3.3 步骤三:生成工程缓存并修改CMakeSetting.json配置
|
|
||||||
|
|
||||||
<div id="VisualStudio20193"></div>
|
|
||||||
|
|
||||||
(1)点击"CMakeLists.txt"->右键点击"生成缓存":
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
发现已经成功生成缓存了,但是由于打开工程时,默认是Debug模式,我们发现exe和缓存保存路径还是Debug模式下的。 我们可以先修改CMake的设置为Release.
|
|
||||||
|
|
||||||
(2)点击"CMakeLists.txt"->右键点击"infer_ppyoloe_demo的cmake设置",进入CMakeSettings.json的设置面板,把其中的Debug设置修改为Release.
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
同时设置CMake生成器为 "Visual Studio 16 2019 Win64"
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
(3)点击保存CMake缓存以切换为Release配置:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
(4):(4.1)点击"CMakeLists.txt"->右键"CMake缓存仅限x64-Release"->"点击删除缓存";(4.2)点击"CMakeLists.txt"->"生成缓存";(4.3)如果在步骤(4.1)发现删除缓存的选项是灰色的,可以直接点击"CMakeLists.txt"->"生成";若生成失败,则可以重复尝试(4.1)和(4.2)
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
最终可以看到,配置已经成功生成Release模式下的CMake缓存了。
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
|
|
||||||
#### 3.3.4 步骤四:生成可执行文件,运行获取结果。
|
|
||||||
|
|
||||||
<div id="VisualStudio20194"></div>
|
|
||||||
|
|
||||||
(1)点击"CMakeLists.txt"->"生成"。可以发现已经成功生成了infer_ppyoloe_demo.exe,并保存在`out/build/x64-Release/Release`目录下。
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
(2)执行可执行文件,获得推理结果。 首先需要拷贝所有的dll到exe所在的目录下,这里我们可以在CMakeLists.txt添加一下命令,可将FastDeploy中所有的dll安装到指定的目录。注意,该方式仅在最新的代码编译的SDK或版本>0.2.1下有效。其他配置方式,请参考章节: [多种方法配置exe运行时所需的依赖库](#CommandLineDeps)
|
|
||||||
|
|
||||||
```cmake
|
|
||||||
install_fastdeploy_libraries(${CMAKE_CURRENT_BINARY_DIR}/Release)
|
|
||||||
```
|
|
||||||
(3)同时,也需要把ppyoloe的模型文件和测试图片下载解压缩后,拷贝到exe所在的目录。 准备完成后,目录结构如下:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
(4)最后,执行以下命令获得推理结果:
|
|
||||||
|
|
||||||
```bat
|
|
||||||
D:\xxxinfer_ppyoloe\out\build\x64-Release\Release>infer_ppyoloe_demo.exe ppyoloe_crn_l_300e_coco 000000014439.jpg 0
|
|
||||||
[INFO] fastdeploy/runtime.cc(304)::fastdeploy::Runtime::Init Runtime initialized with Backend::OPENVINO in Device::CPU.
|
|
||||||
DetectionResult: [xmin, ymin, xmax, ymax, score, label_id]
|
|
||||||
415.047180,89.311569, 506.009613, 283.863098, 0.950423, 0
|
|
||||||
163.665710,81.914932, 198.585342, 166.760895, 0.896433, 0
|
|
||||||
581.788635,113.027618, 612.623474, 198.521713, 0.842596, 0
|
|
||||||
267.217224,89.777306, 298.796051, 169.361526, 0.837951, 0
|
|
||||||
......
|
|
||||||
153.301407,123.233757, 177.130539, 164.558350, 0.066697, 60
|
|
||||||
505.887604,140.919601, 523.167236, 151.875336, 0.084912, 67
|
|
||||||
|
|
||||||
Visualized result saved in ./vis_result.jpg
|
|
||||||
```
|
|
||||||
|
|
||||||
打开保存的图片查看可视化结果:
|
|
||||||
|
|
||||||
<div align="center">
|
|
||||||
<img src="https://user-images.githubusercontent.com/19339784/184326520-7075e907-10ed-4fad-93f8-52d0e35d4964.jpg", width=480px, height=320px />
|
|
||||||
</div>
|
|
||||||
|
|
||||||
特别说明,exe运行时所需要的依赖库配置方法,请参考章节: [多种方法配置exe运行时所需的依赖库](#CommandLineDeps)
|
|
||||||
|
|
||||||
## 4. 多种方法配置exe运行时所需的依赖库
|
|
||||||
<div id="CommandLineDeps"></div>
|
|
||||||
说明:对于使用的最新源码编译的SDK或SDK版本>0.2.1的用户,我们推荐使用(4.1)和(4.2)中的方式配置运行时的依赖库。如果使用的SDK版本<=0.2.1,请参考(4.3)和(4.4)中的方式进行配置。
|
|
||||||
|
|
||||||
### 4.1 方式一:使用 fastdeploy_init.bat 进行配置(推荐)
|
|
||||||
<div id="CommandLineDeps1"></div>
|
|
||||||
|
|
||||||
对于版本高于0.2.1的SDK,我们提供了 **fastdeploy_init.bat** 工具来管理FastDeploy中所有的依赖库。可以通过该脚本工具查看(show)、拷贝(install) 和 设置(init and setup) SDK中所有的dll,方便用户快速完成运行时环境配置。
|
|
||||||
|
|
||||||
#### 4.1.1 fastdeploy_init.bat 使用说明
|
|
||||||
<div id="CommandLineDeps11"></div>
|
|
||||||
|
|
||||||
首先进入SDK的根目录,运行以下命令,可以查看 fastdeploy_init.bat 的用法说明
|
|
||||||
```bat
|
|
||||||
D:\path-to-your-fastdeploy-sdk-dir>fastdeploy_init.bat help
|
|
||||||
------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
||||||
[1] [help] print help information: fastdeploy_init.bat help
|
|
||||||
[2] [show] show all dlls/libs/include paths: fastdeploy_init.bat show fastdeploy-sdk-dir
|
|
||||||
[3] [init] init all dlls paths for current terminal: fastdeploy_init.bat init fastdeploy-sdk-dir [WARNING: need copy onnxruntime.dll manually]
|
|
||||||
[4] [setup] setup path env for current terminal: fastdeploy_init.bat setup fastdeploy-sdk-dir [WARNING: need copy onnxruntime.dll manually]
|
|
||||||
[5] [install] install all dlls to a specific dir: fastdeploy_init.bat install fastdeploy-sdk-dir another-dir-to-install-dlls **[RECOMMEND]**
|
|
||||||
[6] [install] install all dlls with logging infos: fastdeploy_init.bat install fastdeploy-sdk-dir another-dir-to-install-dlls info
|
|
||||||
------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
||||||
```
|
|
||||||
用法简要说明如下:
|
|
||||||
- help: 打印所有的用法说明
|
|
||||||
- show: 查看SDK中所有的 dll、lib 和 include 路径
|
|
||||||
- init: 初始化所有dll路径信息,后续用于设置terminal环境变量(不推荐,请参考4.3中关于onnxruntime的说明)
|
|
||||||
- setup: 在init之后运行,设置terminal环境变量(不推荐,请参考4.3中关于onnxruntime的说明)
|
|
||||||
- install: 将SDK中所有的dll安装到某个指定的目录(推荐)
|
|
||||||
#### 4.1.2 fastdeploy_init.bat 查看 SDK 中所有的 dll、lib 和 include 路径
|
|
||||||
<div id="CommandLineDeps12"></div>
|
|
||||||
|
|
||||||
进入SDK的根目录,运行show命令,可以查看SDK中所有的 dll、lib 和 include 路径。以下命令中 %cd% 表示当前目录(SDK的根目录)。
|
|
||||||
```bat
|
|
||||||
D:\path-to-fastdeploy-sdk-dir>fastdeploy_init.bat show %cd%
|
|
||||||
------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
||||||
[SDK] D:\path-to-fastdeploy-sdk-dir
|
|
||||||
------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
||||||
[DLL] D:\path-to-fastdeploy-sdk-dir\lib\fastdeploy.dll **[NEEDED]**
|
|
||||||
[DLL] D:\path-to-fastdeploy-sdk-dir\third_libs\install\faster_tokenizer\lib\core_tokenizers.dll **[NEEDED]**
|
|
||||||
[DLL] D:\path-to-fastdeploy-sdk-dir\third_libs\install\opencv\build\x64\vc15\bin\opencv_ffmpeg3416_64.dll **[NEEDED]**
|
|
||||||
......
|
|
||||||
------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
||||||
[Lib] D:\path-to-fastdeploy-sdk-dir\lib\fastdeploy.lib **[NEEDED][fastdeploy]**
|
|
||||||
[Lib] D:\path-to-fastdeploy-sdk-dir\third_libs\install\faster_tokenizer\lib\core_tokenizers.lib **[NEEDED][fastdeploy::text]**
|
|
||||||
[Lib] D:\path-to-fastdeploy-sdk-dir\third_libs\install\opencv\build\x64\vc15\lib\opencv_world3416.lib **[NEEDED][fastdeploy::vision]**
|
|
||||||
......
|
|
||||||
------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
||||||
[Include] D:\path-to-fastdeploy-sdk-dir\include **[NEEDED][fastdeploy]**
|
|
||||||
[Include] D:\path-to-fastdeploy-sdk-dir\third_libs\install\faster_tokenizer\include **[NEEDED][fastdeploy::text]**
|
|
||||||
[Include] D:\path-to-fastdeploy-sdk-dir\third_libs\install\opencv\build\include **[NEEDED][fastdeploy::vision]**
|
|
||||||
......
|
|
||||||
------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
||||||
[XML] D:\path-to-fastdeploy-sdk-dir\third_libs\install\openvino\runtime\bin\plugins.xml **[NEEDED]**
|
|
||||||
------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
||||||
```
|
|
||||||
可以看到该命令会根据您当前的SDK,输出对应的信息,包含 dll、lib 和 include 的路径信息。对于 dll,被标记为 `[NEEDED]`的,是运行时所需要的,如果包含OpenVINO后端,还需要将他的plugins.xml拷贝到exe所在的目录;对于 lib 和 include,被标记为`[NEEDED]`的,是开发时所需要配置的最小依赖。并且,我们还增加了对应的API Tag标记,如果您只使用vision API,则只需要配置标记为 `[NEEDED][fastdeploy::vision]` 的 lib 和 include 路径.
|
|
||||||
|
|
||||||
#### 4.1.3 fastdeploy_init.bat 安装 SDK 中所有的 dll 到指定的目录 (推荐)
|
|
||||||
<div id="CommandLineDeps13"></div>
|
|
||||||
|
|
||||||
进入SDK的根目录,运行install命令,可以将SDK 中所有的 dll 安装到指定的目录(如exe所在的目录)。我们推荐这种方式来配置exe运行所需要的依赖库。比如,可以在SDK根目录下创建一个临时的bin目录备份所有的dll文件。以下命令中 %cd% 表示当前目录(SDK的根目录)。
|
|
||||||
```bat
|
|
||||||
% info参数为可选参数,添加info参数后会打印详细的安装信息 %
|
|
||||||
D:\path-to-fastdeploy-sdk-dir>fastdeploy_init.bat install %cd% bin
|
|
||||||
D:\path-to-fastdeploy-sdk-dir>fastdeploy_init.bat install %cd% bin info
|
|
||||||
```
|
|
||||||
```bat
|
|
||||||
D:\path-to-fastdeploy-sdk-dir>fastdeploy_init.bat install %cd% bin
|
|
||||||
[INFO] Do you want to install all FastDeploy dlls ?
|
|
||||||
[INFO] From: D:\path-to-fastdeploy-sdk-dir
|
|
||||||
[INFO] To: bin
|
|
||||||
Choose y means YES, n means NO: [y/n]y
|
|
||||||
YES.
|
|
||||||
请按任意键继续. . .
|
|
||||||
[INFO] Created bin done!
|
|
||||||
已复制 1 个文件。
|
|
||||||
已复制 1 个文件。
|
|
||||||
已复制 1 个文件。
|
|
||||||
已复制 1 个文件。
|
|
||||||
.....
|
|
||||||
已复制 1 个文件。
|
|
||||||
已复制 1 个文件。
|
|
||||||
已复制 1 个文件。
|
|
||||||
已复制 1 个文件。
|
|
||||||
.....
|
|
||||||
```
|
|
||||||
#### 4.1.4 fastdeploy_init.bat 配置 SDK 环境变量
|
|
||||||
<div id="CommandLineDeps14"></div>
|
|
||||||
|
|
||||||
您也可以选择通过配置环境变量的方式来设置运行时的依赖库环境,这种方式只在当前的terminal有效。如果您使用的SDK中包含了onnxruntime推理后端,我们不推荐这种方式,详细原因请参考(4.3)中关于onnxruntime配置的说明(需要手动拷贝onnxruntime所有的dll到exe所在的目录)。配置 SDK 环境变量的方式如下。以下命令中 %cd% 表示当前目录(SDK的根目录)。
|
|
||||||
```bat
|
|
||||||
% 先运行 init 初始化当前SDK所有的dll文件路径 %
|
|
||||||
D:\path-to-fastdeploy-sdk-dir>fastdeploy_init.bat init %cd%
|
|
||||||
% 再运行 setup 完成 SDK 环境变量配置 %
|
|
||||||
D:\path-to-fastdeploy-sdk-dir>fastdeploy_init.bat setup %cd%
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4.2 方式二:修改CMakeLists.txt,一行命令配置(推荐)
|
|
||||||
<div id="CommandLineDeps2"></div>
|
|
||||||
|
|
||||||
考虑到Windows下C++开发的特殊性,如经常需要拷贝所有的lib或dll文件到某个指定的目录,FastDeploy提供了`install_fastdeploy_libraries`的cmake函数,方便用户快速配置所有的dll。修改ppyoloe的CMakeLists.txt,添加:
|
|
||||||
```cmake
|
|
||||||
install_fastdeploy_libraries(${CMAKE_CURRENT_BINARY_DIR}/Release)
|
|
||||||
```
|
|
||||||
注意,该方式仅在最新的代码编译的SDK或版本>0.2.1下有效。
|
|
||||||
|
|
||||||
### 4.3 方式三:命令行设置环境变量
|
|
||||||
<div id="CommandLineDeps3"></div>
|
|
||||||
|
|
||||||
编译好的exe保存在Release目录下,在运行demo前,需要将模型和测试图片拷贝至该目录。另外,需要在终端指定DLL的搜索路径。请在build目录下执行以下命令。
|
|
||||||
```bat
|
|
||||||
set FASTDEPLOY_HOME=%cd%\..\..\..\..\..\..\..\fastdeploy-win-x64-gpu-0.2.1
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\lib;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\onnxruntime\lib;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\opencv-win-x64-3.4.16\build\x64\vc15\bin;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\paddle_inference\paddle\lib;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\paddle_inference\third_party\install\mkldnn\lib;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\paddle_inference\third_party\install\mklml\lib;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\paddle2onnx\lib;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\tensorrt\lib;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\faster_tokenizer\lib;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\faster_tokenizer\third_party\lib;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\yaml-cpp\lib;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\openvino\bin;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\openvino\3rdparty\tbb\bin;%PATH%
|
|
||||||
```
|
|
||||||
注意,需要拷贝onnxruntime.dll到exe所在的目录。
|
|
||||||
```bat
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\onnxruntime\lib\onnxruntime* Release\
|
|
||||||
```
|
|
||||||
由于较新的Windows在System32系统目录下自带了onnxruntime.dll,因此就算设置了PATH,系统依然会出现onnxruntime的加载冲突。因此需要先拷贝demo用到的onnxruntime.dll到exe所在的目录。如下
|
|
||||||
```bat
|
|
||||||
where onnxruntime.dll
|
|
||||||
C:\Windows\System32\onnxruntime.dll # windows自带的onnxruntime.dll
|
|
||||||
```
|
|
||||||
另外,注意,如果是自行编译最新的SDK或版本>0.2.1,opencv和openvino目录结构有所改变,路径需要做出适当的修改。如:
|
|
||||||
```bat
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\opencv\build\x64\vc15\bin;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\openvino\runtime\bin;%PATH%
|
|
||||||
set PATH=%FASTDEPLOY_HOME%\third_libs\install\openvino\runtime\3rdparty\tbb\bin;%PATH%
|
|
||||||
```
|
|
||||||
可以把上述命令拷贝并保存到build目录下的某个bat脚本文件中(包含copy onnxruntime),如`setup_fastdeploy_dll.bat`,方便多次使用。
|
|
||||||
```bat
|
|
||||||
setup_fastdeploy_dll.bat
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4.4 方式四:手动拷贝依赖库到exe的目录下
|
|
||||||
|
|
||||||
<div id="CommandLineDeps4"></div>
|
|
||||||
|
|
||||||
手动拷贝,或者在build目录下执行以下命令:
|
|
||||||
```bat
|
|
||||||
set FASTDEPLOY_HOME=%cd%\..\..\..\..\..\..\..\fastdeploy-win-x64-gpu-0.2.1
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\lib\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\onnxruntime\lib\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\opencv-win-x64-3.4.16\build\x64\vc15\bin\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\paddle_inference\paddle\lib\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\paddle_inference\third_party\install\mkldnn\lib\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\paddle_inference\third_party\install\mklml\lib\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\paddle2onnx\lib\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\tensorrt\lib\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\faster_tokenizer\lib\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\faster_tokenizer\third_party\lib\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\yaml-cpp\lib\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\openvino\bin\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\openvino\bin\*.xml Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\openvino\3rdparty\tbb\bin\*.dll Release\
|
|
||||||
```
|
|
||||||
另外,注意,如果是自行编译最新的SDK或版本>0.2.1,opencv和openvino目录结构有所改变,路径需要做出适当的修改。如:
|
|
||||||
```bat
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\opencv\build\x64\vc15\bin\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\openvino\runtime\bin\*.dll Release\
|
|
||||||
copy /Y %FASTDEPLOY_HOME%\third_libs\install\openvino\runtime\3rdparty\tbb\bin\*.dll Release\
|
|
||||||
```
|
|
||||||
可以把上述命令拷贝并保存到build目录下的某个bat脚本文件中,如`copy_fastdeploy_dll.bat`,方便多次使用。
|
|
||||||
```bat
|
|
||||||
copy_fastdeploy_dll.bat
|
|
||||||
```
|
|
||||||
特别说明:上述的set和copy命令对应的依赖库路径,需要用户根据自己使用SDK中的依赖库进行适当地修改。比如,若是CPU版本的SDK,则不需要TensorRT相关的设置。
|
|
@@ -1,14 +0,0 @@

# Code contribution guidelines

FastDeploy uses clang-format and cpplint to check and format code. Install pre-commit before committing code:
```bash
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy
git checkout develop

pip install pre-commit
pip install yapf
pip install cpplint
pre-commit install
```
If committing reports that clang-format cannot be found, install clang-format yourself.

@@ -1,279 +0,0 @@
|
|||||||
# FDTensor C++ tensor functions
|
|
||||||
|
|
||||||
FDTensor is FastDeploy's struct that represents the tensor at the C++ level. The struct is mainly used to manage the input and output data of the model during inference deployment and supports different Runtime backends.
|
|
||||||
|
|
||||||
In C++-based inference deployment applications, developers often need to process the input and output data to obtain the actual model input or the actual application output. Simple cases of this data processing can be written with the C++ standard library, but more involved operations, such as finding the maximum value along the 2nd dimension of a 3-dimensional tensor, quickly become tedious to implement by hand. To solve this problem, FastDeploy provides a set of C++ tensor functions built on FDTensor to reduce development cost and improve efficiency for FastDeploy developers. There are currently four groups of functions: Reduce, Manipulate, Math and Elementwise.
|
|
||||||
|
|
||||||
## Reduce Class Function
|
|
||||||
|
|
||||||
Currently FastDeploy supports 7 Reduce functions: Max, Min, Sum, All, Any, Mean, Prod.
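
For intuition, these reductions behave like numpy reductions: `dims` plays the role of `axis` and `keep_dim` of `keepdims`. The mapping below is only an illustration of the semantics, not FastDeploy code:

```python
import numpy as np

x = np.array([[2, 4, 3], [7, 1, 5]], dtype=np.int32)  # same data as the demos below

print(np.max(x, axis=0, keepdims=True))   # [[7 4 5]] -> what Max(..., {0}, keep_dim=true) yields
print(np.sum(x, axis=(0, 1)))             # 22        -> reduce_all=true collapses every dimension
```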
|
|
||||||
|
|
||||||
### Max
|
|
||||||
|
|
||||||
#### Function Signature
|
|
||||||
|
|
||||||
```c++
|
|
||||||
/** Excute the maximum operation for input FDTensor along given dims.
|
|
||||||
@param x The input tensor.
|
|
||||||
@param out The output tensor which stores the result.
|
|
||||||
@param dims The vector of axis which will be reduced.
|
|
||||||
@param keep_dim Whether to keep the reduced dims, default false.
|
|
||||||
@param reduce_all Whether to reduce all dims, default false.
|
|
||||||
*/
|
|
||||||
void Max(const FDTensor& x, FDTensor* out,
|
|
||||||
const std::vector<int64_t>& dims,
|
|
||||||
bool keep_dim = false, bool reduce_all = false);
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Demo
|
|
||||||
|
|
||||||
```c++
|
|
||||||
FDTensor input, output;
|
|
||||||
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
|
|
||||||
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());
|
|
||||||
|
|
||||||
// Calculate the max value for axis 0 of `inputs`
|
|
||||||
// The output result would be [[7, 4, 5]].
|
|
||||||
Max(input, &output, {0}, /* keep_dim = */true);
|
|
||||||
```
|
|
||||||
|
|
||||||
### Min
|
|
||||||
|
|
||||||
#### Function Signature
|
|
||||||
|
|
||||||
```c++
|
|
||||||
/** Excute the minimum operation for input FDTensor along given dims.
|
|
||||||
@param x The input tensor.
|
|
||||||
@param out The output tensor which stores the result.
|
|
||||||
@param dims The vector of axis which will be reduced.
|
|
||||||
@param keep_dim Whether to keep the reduced dims, default false.
|
|
||||||
@param reduce_all Whether to reduce all dims, default false.
|
|
||||||
*/
|
|
||||||
void Min(const FDTensor& x, FDTensor* out,
|
|
||||||
const std::vector<int64_t>& dims,
|
|
||||||
bool keep_dim = false, bool reduce_all = false);
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Demo
|
|
||||||
|
|
||||||
```c++
|
|
||||||
FDTensor input, output;
|
|
||||||
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
|
|
||||||
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());
|
|
||||||
|
|
||||||
// Calculate the min value for axis 0 of `inputs`
|
|
||||||
// The output result would be [[2, 1, 3]].
|
|
||||||
Min(input, &output, {0}, /* keep_dim = */true);
|
|
||||||
```
|
|
||||||
|
|
||||||
### Sum
|
|
||||||
|
|
||||||
#### Function Signature
|
|
||||||
|
|
||||||
```c++
|
|
||||||
/** Excute the sum operation for input FDTensor along given dims.
|
|
||||||
@param x The input tensor.
|
|
||||||
@param out The output tensor which stores the result.
|
|
||||||
@param dims The vector of axis which will be reduced.
|
|
||||||
@param keep_dim Whether to keep the reduced dims, default false.
|
|
||||||
@param reduce_all Whether to reduce all dims, default false.
|
|
||||||
*/
|
|
||||||
void Sum(const FDTensor& x, FDTensor* out,
|
|
||||||
const std::vector<int64_t>& dims,
|
|
||||||
bool keep_dim = false, bool reduce_all = false);
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Demo
|
|
||||||
|
|
||||||
```c++
|
|
||||||
FDTensor input, output;
|
|
||||||
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
|
|
||||||
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());
|
|
||||||
|
|
||||||
// Calculate the sum value for axis 0 of `inputs`
|
|
||||||
// The output result would be [[9, 5, 8]].
|
|
||||||
Sum(input, &output, {0}, /* keep_dim = */true);
|
|
||||||
```
|
|
||||||
|
|
||||||
### Mean
|
|
||||||
|
|
||||||
#### Function Signature
|
|
||||||
|
|
||||||
```c++
|
|
||||||
/** Excute the mean operation for input FDTensor along given dims.
|
|
||||||
@param x The input tensor.
|
|
||||||
@param out The output tensor which stores the result.
|
|
||||||
@param dims The vector of axis which will be reduced.
|
|
||||||
@param keep_dim Whether to keep the reduced dims, default false.
|
|
||||||
@param reduce_all Whether to reduce all dims, default false.
|
|
||||||
*/
|
|
||||||
void Mean(const FDTensor& x, FDTensor* out,
|
|
||||||
const std::vector<int64_t>& dims,
|
|
||||||
bool keep_dim = false, bool reduce_all = false);
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Demo
|
|
||||||
|
|
||||||
```c++
|
|
||||||
FDTensor input, output;
|
|
||||||
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
|
|
||||||
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());
|
|
||||||
|
|
||||||
// Calculate the mean value for axis 0 of `inputs`
|
|
||||||
// The output result would be [[4, 2, 4]].
|
|
||||||
Mean(input, &output, {0}, /* keep_dim = */true);
|
|
||||||
```
|
|
||||||
|
|
||||||
### Prod
|
|
||||||
|
|
||||||
#### Function Signature
|
|
||||||
|
|
||||||
```c++
|
|
||||||
/** Excute the product operation for input FDTensor along given dims.
|
|
||||||
@param x The input tensor.
|
|
||||||
@param out The output tensor which stores the result.
|
|
||||||
@param dims The vector of axis which will be reduced.
|
|
||||||
@param keep_dim Whether to keep the reduced dims, default false.
|
|
||||||
@param reduce_all Whether to reduce all dims, default false.
|
|
||||||
*/
|
|
||||||
void Prod(const FDTensor& x, FDTensor* out,
|
|
||||||
const std::vector<int64_t>& dims,
|
|
||||||
bool keep_dim = false, bool reduce_all = false);
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Demo
|
|
||||||
|
|
||||||
```c++
|
|
||||||
FDTensor input, output;
|
|
||||||
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
|
|
||||||
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());
|
|
||||||
|
|
||||||
// Calculate the product value for axis 0 of `inputs`
|
|
||||||
// The output result would be [[14, 4, 15]].
|
|
||||||
Prod(input, &output, {0}, /* keep_dim = */true);
|
|
||||||
```
|
|
||||||
|
|
||||||
### Any
|
|
||||||
|
|
||||||
#### Function Signature
|
|
||||||
|
|
||||||
```c++
|
|
||||||
/** Excute the any operation for input FDTensor along given dims.
|
|
||||||
@param x The input tensor.
|
|
||||||
@param out The output tensor which stores the result.
|
|
||||||
@param dims The vector of axis which will be reduced.
|
|
||||||
@param keep_dim Whether to keep the reduced dims, default false.
|
|
||||||
@param reduce_all Whether to reduce all dims, default false.
|
|
||||||
*/
|
|
||||||
void Any(const FDTensor& x, FDTensor* out,
|
|
||||||
const std::vector<int64_t>& dims,
|
|
||||||
bool keep_dim = false, bool reduce_all = false);
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Demo
|
|
||||||
|
|
||||||
```c++
|
|
||||||
FDTensor input, output;
|
|
||||||
std::array<bool, 6> bool_inputs = {false, false, true, true, false, true};
|
|
||||||
input.SetExternalData({2, 3}, FDDataType::INT32, bool_inputs.data());
|
|
||||||
|
|
||||||
// Calculate the any value for axis 0 of `inputs`
|
|
||||||
// The output result would be [[true, false, true]].
|
|
||||||
Any(input, &output, {0}, /* keep_dim = */true);
|
|
||||||
```
|
|
||||||
|
|
||||||
### All
|
|
||||||
|
|
||||||
#### Function Signature
|
|
||||||
|
|
||||||
```c++
|
|
||||||
/** Excute the all operation for input FDTensor along given dims.
|
|
||||||
@param x The input tensor.
|
|
||||||
@param out The output tensor which stores the result.
|
|
||||||
@param dims The vector of axis which will be reduced.
|
|
||||||
@param keep_dim Whether to keep the reduced dims, default false.
|
|
||||||
@param reduce_all Whether to reduce all dims, default false.
|
|
||||||
*/
|
|
||||||
void All(const FDTensor& x, FDTensor* out,
|
|
||||||
const std::vector<int64_t>& dims,
|
|
||||||
bool keep_dim = false, bool reduce_all = false);
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Demo
|
|
||||||
|
|
||||||
```c++
|
|
||||||
FDTensor input, output;
|
|
||||||
std::array<bool, 6> bool_inputs = {false, false, true, true, false, true};
|
|
||||||
input.SetExternalData({2, 3}, FDDataType::INT32, bool_inputs.data());
|
|
||||||
|
|
||||||
// Calculate the all value for axis 0 of `inputs`
|
|
||||||
// The output result would be [[false, false, true]].
|
|
||||||
All(input, &output, {0}, /* keep_dim = */true);
|
|
||||||
```
|
|
||||||
|
|
||||||
## Manipulate Class Function
|
|
||||||
|
|
||||||
Currently FastDeploy supports 1 Manipulate class function: Transpose.
|
|
||||||
|
|
||||||
### Transpose
|
|
||||||
|
|
||||||
#### Function Signature
|
|
||||||
|
|
||||||
```c++
|
|
||||||
/** Excute the transpose operation for input FDTensor along given dims.
|
|
||||||
@param x The input tensor.
|
|
||||||
@param out The output tensor which stores the result.
|
|
||||||
@param dims The vector of axis which the input tensor will transpose.
|
|
||||||
*/
|
|
||||||
void Transpose(const FDTensor& x, FDTensor* out,
|
|
||||||
const std::vector<int64_t>& dims);
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Demo
|
|
||||||
|
|
||||||
```c++
|
|
||||||
FDTensor input, output;
|
|
||||||
std::vector<float> inputs = {2, 4, 3, 7, 1, 5};
|
|
||||||
input.SetExternalData({2, 3}, FDDataType::FP32, inputs.data());
|
|
||||||
|
|
||||||
// Transpose the input tensor with axis {1, 0}.
|
|
||||||
// The output result would be [[2, 7], [4, 1], [3, 5]]
|
|
||||||
Transpose(input, &output, {1, 0});
|
|
||||||
```
|
|
||||||
|
|
||||||
## Math Class Function
|
|
||||||
|
|
||||||
Currently FastDeploy supports 1 Math class function: Softmax.
|
|
||||||
|
|
||||||
### Softmax
|
|
||||||
|
|
||||||
#### Function Signature
|
|
||||||
|
|
||||||
```c++
|
|
||||||
/** Excute the softmax operation for input FDTensor along given dims.
|
|
||||||
@param x The input tensor.
|
|
||||||
@param out The output tensor which stores the result.
|
|
||||||
@param axis The axis to be computed softmax value.
|
|
||||||
*/
|
|
||||||
void Softmax(const FDTensor& x, FDTensor* out, int axis = -1);
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Demo
|
|
||||||
|
|
||||||
```c++
|
|
||||||
FDTensor input, output;
|
|
||||||
std::vector<float> inputs = {1, 2, 3, 4, 5, 6};
|
|
||||||
input.SetExternalData({2, 3}, FDDataType::FP32, inputs.data());
|
|
||||||
|
|
||||||
// Compute the softmax along axis 0 of `inputs`.
|
|
||||||
// The output result would be
|
|
||||||
// [[0.04742587, 0.04742587, 0.04742587],
|
|
||||||
// [0.95257413, 0.95257413, 0.95257413]]
|
|
||||||
Softmax(input, &output, 0);
|
|
||||||
```
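
As a quick, purely illustrative cross-check of the expected values above, the same column-wise softmax can be reproduced with numpy:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
e = np.exp(x - x.max(axis=0, keepdims=True))      # subtract the column max for numerical stability
print(e / e.sum(axis=0, keepdims=True))
# [[0.04742587 0.04742587 0.04742587]
#  [0.95257413 0.95257413 0.95257413]]
```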
|
|
||||||
|
|
||||||
## Elementwise Class Function
|
|
||||||
|
|
||||||
To be continued...
|
|
@@ -1,90 +0,0 @@
|
|||||||
# Runtime
|
|
||||||
|
|
||||||
After configuring `RuntimeOption`, developers can create Runtime for model inference on different hardware based on different backends.
|
|
||||||
|
|
||||||
## Python Class
|
|
||||||
|
|
||||||
```
|
|
||||||
class Runtime(runtime_option)
|
|
||||||
```
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **runtime_option**(fastdeploy.RuntimeOption): Configured RuntimeOption class and instance.
|
|
||||||
|
|
||||||
### Member function
|
|
||||||
|
|
||||||
```
|
|
||||||
infer(data)
|
|
||||||
```
|
|
||||||
|
|
||||||
Model inference based on input data
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **data**(dict[str, np.ndarray]): Input data as a dict; the key is the input name and the value is an np.ndarray
|
|
||||||
|
|
||||||
**Return Value**
|
|
||||||
|
|
||||||
Returns a list whose length equals the number of outputs of the model; each element in the list is an np.ndarray
|
|
||||||
|
|
||||||
```
|
|
||||||
num_inputs()
|
|
||||||
```
|
|
||||||
|
|
||||||
Returns the number of inputs of the model
|
|
||||||
|
|
||||||
```
|
|
||||||
num_outputs()
|
|
||||||
```
|
|
||||||
|
|
||||||
Returns the number of outputs of the model
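
A minimal end-to-end usage sketch of the Python interfaces above (the model paths and the input name "x" are placeholders; replace them with your own model's values):

```python
import numpy as np
import fastdeploy as fd

# Configure the runtime: model files, device and backend.
option = fd.RuntimeOption()
option.set_model_path("model.pdmodel", "model.pdiparams")  # placeholder paths
option.use_cpu()
option.use_ort_backend()

runtime = fd.Runtime(option)

# infer() takes {input_name: np.ndarray} and returns a list of np.ndarray,
# one entry per model output (see num_outputs()).
feeds = {"x": np.random.rand(1, 3, 224, 224).astype("float32")}  # "x" is a placeholder input name
outputs = runtime.infer(feeds)
print(runtime.num_inputs(), runtime.num_outputs(), outputs[0].shape)
```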
|
|
||||||
|
|
||||||
## C++ Class
|
|
||||||
|
|
||||||
```
|
|
||||||
class Runtime
|
|
||||||
```
|
|
||||||
|
|
||||||
### Member function
|
|
||||||
|
|
||||||
```
|
|
||||||
bool Init(const RuntimeOption& runtime_option)
|
|
||||||
```
|
|
||||||
|
|
||||||
Model loading initialization
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **runtime_option**: Configured RuntimeOption class and instance
|
|
||||||
|
|
||||||
**Return Value**
|
|
||||||
|
|
||||||
Returns TRUE for successful initialisation, FALSE otherwise
|
|
||||||
|
|
||||||
```
|
|
||||||
bool Infer(vector<FDTensor>& inputs, vector<FDTensor>* outputs)
|
|
||||||
```
|
|
||||||
|
|
||||||
Inference from the input and write the result to outputs
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **inputs**: Input data
|
|
||||||
> * **outputs**: Output data
|
|
||||||
|
|
||||||
**Return Value**
|
|
||||||
|
|
||||||
Returns TRUE for successful inference, FALSE otherwise
|
|
||||||
|
|
||||||
```
|
|
||||||
int NumInputs()
|
|
||||||
```
|
|
||||||
|
|
||||||
Returns the number of inputs of the model
|
|
||||||
|
|
||||||
```
|
|
||||||
int NumOutputs()
|
|
||||||
```
|
|
||||||
|
|
||||||
Returns the number of outputs of the model
|
|
@@ -1,265 +0,0 @@
|
|||||||
# RuntimeOption
|
|
||||||
|
|
||||||
`RuntimeOption` is used to configure the inference parameters of the model on different backends and hardware.
|
|
||||||
|
|
||||||
## Python Class
|
|
||||||
|
|
||||||
```
|
|
||||||
class RuntimeOption()
|
|
||||||
```
|
|
||||||
|
|
||||||
### Member function
|
|
||||||
|
|
||||||
```
|
|
||||||
set_model_path(model_file, params_file="", model_format="paddle")
|
|
||||||
```
|
|
||||||
|
|
||||||
Set the model path for loading
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **model_file**(str): Model file path
|
|
||||||
> * **params_file**(str): Parameter file path. This parameter is not required for onnx model format
|
|
||||||
> * **model_format**(str): Model format. The model supports paddle, onnx format (Paddle by default).
|
|
||||||
|
|
||||||
```
|
|
||||||
use_gpu(device_id=0)
|
|
||||||
```
|
|
||||||
|
|
||||||
Inference on GPU
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **device_id**(int): When there are multiple GPU cards in the environment, this parameter specifies the card for inference. The default is 0.
|
|
||||||
|
|
||||||
```
|
|
||||||
use_cpu()
|
|
||||||
```
|
|
||||||
|
|
||||||
Inference on CPU
|
|
||||||
|
|
||||||
```
|
|
||||||
set_cpu_thread_num(thread_num=-1)
|
|
||||||
```
|
|
||||||
|
|
||||||
Set the number of threads on the CPU for inference
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **thread_num**(int): Number of threads, automatically allocated for the backend when the number is smaller than or equal to 0. The default is -1
|
|
||||||
|
|
||||||
```
|
|
||||||
use_paddle_backend()
|
|
||||||
```
|
|
||||||
|
|
||||||
Inference with Paddle Inference backend (CPU/GPU supported, Paddle model format supported).
|
|
||||||
|
|
||||||
```
|
|
||||||
use_ort_backend()
|
|
||||||
```
|
|
||||||
|
|
||||||
Inference with ONNX Runtime backend (CPU/GPU supported, Paddle and ONNX model format supported).
|
|
||||||
|
|
||||||
```
|
|
||||||
use_trt_backend()
|
|
||||||
```
|
|
||||||
|
|
||||||
Inference with TensorRT backend (GPU supported, Paddle/ONNX model format supported)
|
|
||||||
|
|
||||||
```
|
|
||||||
use_openvino_backend()
|
|
||||||
```
|
|
||||||
|
|
||||||
Inference with OpenVINO backend (CPU supported, Paddle/ONNX model format supported)
|
|
||||||
|
|
||||||
```
|
|
||||||
set_paddle_mkldnn(pd_mkldnn=True)
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the Paddle Inference backend, this parameter determines whether the MKLDNN inference acceleration on the CPU is on or off. It is on by default.
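Putting the CPU-related interfaces above together, a possible configuration sketch (the `model.pdmodel`/`model.pdiparams` paths are placeholders) looks like this:

```python
import fastdeploy as fd

# A possible CPU configuration: Paddle Inference backend with MKLDNN on 8 threads.
option = fd.RuntimeOption()
option.set_model_path("model.pdmodel", "model.pdiparams")  # placeholder paths
option.use_cpu()
option.set_cpu_thread_num(8)
option.use_paddle_backend()
option.set_paddle_mkldnn(True)
```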
|
|
||||||
|
|
||||||
```
|
|
||||||
enable_paddle_log_info()
|
|
||||||
disable_paddle_log_info()
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the Paddle Inference backend, this parameter determines whether the optimization log on model loading is on or off. It is off by default.
|
|
||||||
|
|
||||||
```
|
|
||||||
set_paddle_mkldnn_cache_size(cache_size)
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the Paddle Inference backend, this interface controls the shape cache size of MKLDNN acceleration
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **cache_size**(int): Cache size
|
|
||||||
|
|
||||||
```
|
|
||||||
set_trt_input_shape(tensor_name, min_shape, opt_shape=None, max_shape=None)
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the TensorRT backend, this interface is used to set the Shape range of each input to the model. If only min_shape is set, the opt_shape and max_shape are automatically set to match min_shape.
|
|
||||||
|
|
||||||
FastDeploy will automatically update the shape range during the inference process according to the real-time data. But it will lead to a rebuilding of the back-end engine when it encounters a new shape range, costing more time. It is advisable to configure this interface in advance to avoid engine rebuilding during the inference process.
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **tensor_name**(str): tensor name of the range
|
|
||||||
> * **min_shape**(list of int): Minimum shape of the corresponding tensor, e.g. [1, 3, 224, 224]
|
|
||||||
> * **opt_shape**(list of int): The most common shape of the corresponding tensor, e.g. [2, 3, 224, 224]. If None, it is kept the same as min_shape. The default is None.
|
|
||||||
> * **max_shape**(list of int): The maximum shape of the corresponding tensor, e.g. [8, 3, 224, 224]. If None, it is kept the same as min_shape. The default is None.
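A hedged sketch of configuring the shape range up front, assuming a Paddle model with a dynamic-shape input tensor named `x` (both the paths and the tensor name are placeholders):

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.set_model_path("model.pdmodel", "model.pdiparams")  # placeholder paths
option.use_gpu(0)
option.use_trt_backend()
# Declare the dynamic range of the input tensor "x" up front
# to avoid engine rebuilds when a new shape is encountered.
option.set_trt_input_shape("x", [1, 3, 224, 224],
                           opt_shape=[4, 3, 224, 224],
                           max_shape=[8, 3, 224, 224])
```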
|
|
||||||
|
|
||||||
```
|
|
||||||
set_trt_cache_file(cache_file_path)
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the TensorRT backend, developers can use this interface to cache the built TensorRT model engine to the designated path, or skip the building engine step and load the locally cached TensorRT model directly
|
|
||||||
|
|
||||||
- When this interface is called and `cache_file_path` does not exist, FastDeploy will build the TensorRT model and save the built model to `cache_file_path`
|
|
||||||
- When this interface is called and `cache_file_path` exists, FastDeploy will directly load the built TensorRT model stored in `cache_file_path`, thus greatly reducing the time spent on model load initialization.
|
|
||||||
|
|
||||||
This interface speeds up model loading initialisation on subsequent runs. However, if developers change the model loading configuration, for example the max_workspace_size of TensorRT, or reset `set_trt_input_shape`, or replace the original Paddle or ONNX model, it is better to delete the locally cached `cache_file_path` file first, to avoid loading a stale cache that could make the program misbehave.
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **cache_file_path**(str): cache file path. e.g.`/Downloads/resnet50.trt`
|
|
||||||
|
|
||||||
```
|
|
||||||
enable_trt_fp16()
|
|
||||||
disable_trt_fp16()
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the TensorRT backend, turning half-precision inference acceleration on or off via this interface brings a significant performance boost. However, half-precision inference is not supported on all GPUs. On GPUs that do not support half-precision inference, it will fall back to FP32 inference and give the prompt `Detected FP16 is not supported in the current GPU, will use FP32 instead.`
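Combining the cache and FP16 interfaces above, a possible TensorRT configuration sketch (paths are placeholders; on the first run the engine is built and cached, on later runs it is loaded from `./model.trt`):

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.set_model_path("model.pdmodel", "model.pdiparams")  # placeholder paths
option.use_gpu()
option.use_trt_backend()
option.enable_trt_fp16()                  # falls back to FP32 on unsupported GPUs
option.set_trt_cache_file("./model.trt")  # build on the first run, reuse afterwards
```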
|
|
||||||
|
|
||||||
## C++ Struct
|
|
||||||
|
|
||||||
```
|
|
||||||
struct RuntimeOption
|
|
||||||
```
|
|
||||||
|
|
||||||
### Member function
|
|
||||||
|
|
||||||
```
|
|
||||||
void SetModelPath(const string& model_file, const string& params_file = "", const string& model_format = "paddle")
|
|
||||||
```
|
|
||||||
|
|
||||||
Set the model path for loading
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **model_file**: Model file path
|
|
||||||
> * **params_file**: Parameter file path. This parameter can be left as an empty string ("") for the ONNX model format
|
|
||||||
> * **model_format**: Model format. The model supports paddle, onnx format (Paddle by default).
|
|
||||||
|
|
||||||
```
|
|
||||||
void UseGpu(int device_id = 0)
|
|
||||||
```
|
|
||||||
|
|
||||||
Set to inference on GPU
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **device_id**: When there are multiple GPU cards in the environment, this parameter specifies the card used for inference. The default is 0.
|
|
||||||
|
|
||||||
```
|
|
||||||
void UseCpu()
|
|
||||||
```
|
|
||||||
|
|
||||||
Set to inference on CPU
|
|
||||||
|
|
||||||
```
|
|
||||||
void SetCpuThreadNum(int thread_num=-1)
|
|
||||||
```
|
|
||||||
|
|
||||||
Set the number of threads on the CPU for inference
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **thread_num**: Number of threads, automatically allocated for the backend when the number is smaller than or equal to 0. The default is -1
|
|
||||||
|
|
||||||
```
|
|
||||||
void UsePaddleBackend()
|
|
||||||
```
|
|
||||||
|
|
||||||
Inference with Paddle Inference backend (CPU/GPU supported, Paddle model format supported).
|
|
||||||
|
|
||||||
```
|
|
||||||
void UseOrtBackend()
|
|
||||||
```
|
|
||||||
|
|
||||||
Inference with ONNX Runtime backend (CPU/GPU supported, Paddle and ONNX model format supported).
|
|
||||||
|
|
||||||
```
|
|
||||||
void UseTrtBackend()
|
|
||||||
```
|
|
||||||
|
|
||||||
Inference with TensorRT backend (GPU supported, Paddle/ONNX model format supported)
|
|
||||||
|
|
||||||
```
|
|
||||||
void UseOpenVINOBackend()
|
|
||||||
```
|
|
||||||
|
|
||||||
Inference with OpenVINO backend (CPU supported, Paddle/ONNX model format supported)
|
|
||||||
|
|
||||||
```
|
|
||||||
void SetPaddleMKLDNN(bool pd_mkldnn = true)
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the Paddle Inference backend, this parameter determines whether the MKLDNN inference acceleration on the CPU is on or off. It is on by default.
|
|
||||||
|
|
||||||
```
|
|
||||||
void EnablePaddleLogInfo()
|
|
||||||
void DisablePaddleLogInfo()
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the Paddle Inference backend, this parameter determines whether the optimization log on model loading is on or off. It is off by default.
|
|
||||||
|
|
||||||
```
|
|
||||||
void SetPaddleMKLDNNCacheSize(int cache_size)
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the Paddle Inference backend, this interface controls the shape cache size of MKLDNN acceleration
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **cache_size**: Cache size
|
|
||||||
|
|
||||||
```
|
|
||||||
void SetTrtInputShape(const string& tensor_name, const vector<int32_t>& min_shape,
                      const vector<int32_t>& opt_shape = vector<int32_t>(),
                      const vector<int32_t>& max_shape = vector<int32_t>())
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the TensorRT backend, this interface sets the Shape range of each input to the model. If only min_shape is set, the opt_shape and max_shape are automatically set to match min_shape.
|
|
||||||
|
|
||||||
FastDeploy will automatically update the shape range during the inference process according to the real-time data. But it will lead to a rebuilding of the back-end engine when it encounters a new shape range, costing more time. It is advisable to configure this interface in advance to avoid engine rebuilding during the inference process.
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> - **tensor_name**(str): tensor name of the range
|
|
||||||
> - **min_shape**(list of int): Minimum shape of the corresponding tensor, e.g. [1, 3, 224, 224]
|
|
||||||
> - **opt_shape**(list of int): The most common shape of the corresponding tensor, e.g. [2, 3, 224, 224]. If an empty vector is passed, it is kept the same as min_shape. The default is an empty vector.
|
|
||||||
> - **max_shape**(list of int): The maximum shape of the corresponding tensor, e.g. [8, 3, 224, 224]. If an empty vector is passed, it is kept the same as min_shape. The default is an empty vector.
|
|
||||||
|
|
||||||
```
|
|
||||||
void SetTrtCacheFile(const string& cache_file_path)
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the TensorRT backend, developers can use this interface to cache the built TensorRT model engine to the designated path, or skip the building engine step and load the locally cached TensorRT model directly
|
|
||||||
|
|
||||||
- When this interface is called and `cache_file_path` does not exist, FastDeploy will build the TensorRT model and save the built model to `cache_file_path`
|
|
||||||
- When this interface is called and `cache_file_path` exists, FastDeploy will directly load the built TensorRT model stored in `cache_file_path`, thus greatly reducing the time spent on model load initialization.
|
|
||||||
|
|
||||||
This interface speeds up model loading initialisation on subsequent runs. However, if developers change the model loading configuration, for example the max_workspace_size of TensorRT, or reset `SetTrtInputShape`, or replace the original Paddle or ONNX model, it is better to delete the locally cached `cache_file_path` file first, to avoid loading a stale cache that could make the program misbehave.
|
|
||||||
|
|
||||||
**Parameters**
|
|
||||||
|
|
||||||
> * **cache_file_path**: cache file path, such as `/Downloads/resnet50.trt`
|
|
||||||
|
|
||||||
```
|
|
||||||
void EnableTrtFp16()
|
|
||||||
void DisableTrtFp16()
|
|
||||||
```
|
|
||||||
|
|
||||||
When using the TensorRT backend, turning half-precision inference acceleration on or off via this interface brings a significant performance boost. However, half-precision inference is not supported on all GPUs. On GPUs that do not support half-precision inference, it will fall back to FP32 inference and give the prompt `Detected FP16 is not supported in the current GPU, will use FP32 instead.`
|
|
@@ -1,138 +0,0 @@
|
|||||||
# RuntimeOption Inference Backend Deployment
|
|
||||||
|
|
||||||
The Runtime in the FastDeploy product contains multiple inference backends:
|
|
||||||
|
|
||||||
| Model Format\Inference Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
|
|
||||||
|:------------------------------ |:------------------------------ |:------------------------------------------ |:------------------------------ |:-------- |
|
|
||||||
| Paddle | Support (built-in Paddle2ONNX) | Support | Support (built-in Paddle2ONNX) | Support |
|
|
||||||
| ONNX | Support | Support (requires conversion via X2Paddle) | Support | Support |
|
|
||||||
|
|
||||||
The hardware supported by Runtime is as follows
|
|
||||||
|
|
||||||
| Hardware/Inference Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
|
|
||||||
|:-------------------------- |:----------- |:---------------- |:----------- |:-------- |
|
|
||||||
| CPU | Support | Support | Not Support | Support |
|
|
||||||
| GPU | Support | Support | Support | Support |
|
|
||||||
|
|
||||||
Each model uses `RuntimeOption` to configure the inference backend and parameters, e.g. in python, the inference configuration can be printed after loading the model with the following code
|
|
||||||
|
|
||||||
```python
|
|
||||||
model = fastdeploy.vision.detection.YOLOv5("yolov5s.onnx")
|
|
||||||
print(model.runtime_option)
|
|
||||||
```
|
|
||||||
|
|
||||||
See below:
|
|
||||||
|
|
||||||
```python
|
|
||||||
RuntimeOption(
|
|
||||||
backend : Backend.ORT # Inference Backend ONNXRuntime
|
|
||||||
cpu_thread_num : 8 # Number of CPU threads (valid only when using CPU)
|
|
||||||
device : Device.CPU # Inference hardware is CPU
|
|
||||||
device_id : 0 # Inference hardware id (for GPU)
|
|
||||||
model_file : yolov5s.onnx # Path to the model file
|
|
||||||
params_file : # Parameter file path
|
|
||||||
model_format : ModelFormat.ONNX # Model format
|
|
||||||
ort_execution_mode : -1 # The prefix ort indicates ONNXRuntime backend parameters
|
|
||||||
ort_graph_opt_level : -1
|
|
||||||
ort_inter_op_num_threads : -1
|
|
||||||
trt_enable_fp16 : False # The prefix of trt indicates a TensorRT backend parameter
|
|
||||||
trt_enable_int8 : False
|
|
||||||
trt_max_workspace_size : 1073741824
|
|
||||||
trt_serialize_file :
|
|
||||||
trt_fixed_shape : {}
|
|
||||||
trt_min_shape : {}
|
|
||||||
trt_opt_shape : {}
|
|
||||||
trt_max_shape : {}
|
|
||||||
trt_max_batch_size : 32
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Python
|
|
||||||
|
|
||||||
### RuntimeOption Class
|
|
||||||
|
|
||||||
Configuration options for `fastdeploy.RuntimeOption()`
|
|
||||||
|
|
||||||
#### Configuration options
|
|
||||||
|
|
||||||
> * **backend**(fd.Backend): `fd.Backend.ORT`/`fd.Backend.TRT`/`fd.Backend.PDINFER`/`fd.Backend.OPENVINO`
|
|
||||||
> * **cpu_thread_num**(int): Number of CPU inference threads, valid only on CPU inference
|
|
||||||
> * **device**(fd.Device): `fd.Device.CPU`/`fd.Device.GPU`
|
|
||||||
> * **device_id**(int): Device id, used on GPU
|
|
||||||
> * **model_file**(str): Model file path
|
|
||||||
> * **params_file**(str): Parameter file path
|
|
||||||
> * **model_format**(ModelFormat): Model format, `fd.ModelFormat.PADDLE`/`fd.ModelFormat.ONNX`
|
|
||||||
> * **ort_execution_mode**(int): ORT back-end execution mode, 0 for sequential execution of all operators, 1 for parallel execution of operators, default is -1, i.e. execution in the ORT default configuration
|
|
||||||
> * **ort_graph_opt_level**(int): ORT back-end graph optimisation level; 0: disable graph optimisation; 1: basic optimisation; 2: additional extended optimisation; 99: all optimisations; default is -1, i.e. executed with the ORT default configuration
|
|
||||||
> * **ort_inter_op_num_threads**(int): When `ort_execution_mode` is 1, this parameter sets the number of threads in parallel between operators
|
|
||||||
> * **trt_enable_fp16**(bool): TensorRT turns on FP16 inference
|
|
||||||
> * **trt_enable_int8**(bool):TensorRT turns on INT8 inference
|
|
||||||
> * **trt_max_workspace_size**(int): `max_workspace_size` parameter configured on TensorRT
|
|
||||||
> * **trt_fixed_shape**(dict[str : list[int]]):When the model is a dynamic shape, but the input shape remains constant for the actual inference, the input fixed shape is configured with this parameter
|
|
||||||
> * **trt_min_shape**(dict[str : list[int]]): When the model is a dynamic shape and the input shape changes during the actual inference, the minimum shape of the input is configured with this parameter
|
|
||||||
> * **trt_opt_shape**(dict[str : list[int]]): When the model is a dynamic shape and the input shape changes during the actual inference, the optimal shape of the input is configured with this parameter
|
|
||||||
> * **trt_max_shape**(dict[str : list[int]]): When the model is a dynamic shape and the input shape changes during the actual inference, the maximum shape of the input is configured with this parameter
|
|
||||||
> * **trt_max_batch_size**(int): Maximum number of batches for TensorRT inference
|
|
||||||
|
|
||||||
```python
|
|
||||||
import fastdeploy as fd
|
|
||||||
|
|
||||||
option = fd.RuntimeOption()
|
|
||||||
option.backend = fd.Backend.TRT
|
|
||||||
# When using a TRT backend with a dynamic input shape
|
|
||||||
# Configure input shape information
|
|
||||||
option.trt_min_shape = {"x": [1, 3, 224, 224]}
|
|
||||||
option.trt_opt_shape = {"x": [4, 3, 224, 224]}
|
|
||||||
option.trt_max_shape = {"x": [8, 3, 224, 224]}
|
|
||||||
|
|
||||||
model = fd.vision.classification.PaddleClasModel(
|
|
||||||
"resnet50/inference.pdmodel",
|
|
||||||
"resnet50/inference.pdiparams",
|
|
||||||
"resnet50/inference_cls.yaml",
|
|
||||||
runtime_option=option)
|
|
||||||
```
|
|
||||||
|
|
||||||
## C++
|
|
||||||
|
|
||||||
### RuntimeOption Struct
|
|
||||||
|
|
||||||
Configuration options for `fastdeploy::RuntimeOption()`
|
|
||||||
|
|
||||||
#### Configuration options
|
|
||||||
|
|
||||||
> * **backend**(fastdeploy::Backend): `Backend::ORT`/`Backend::TRT`/`Backend::PDINFER`/`Backend::OPENVINO`
|
|
||||||
> * **cpu_thread_num**(int): Number of CPU inference threads, valid only for CPU inference
|
|
||||||
> * **device**(fastdeploy::Device): `Device::CPU`/`Device::GPU`
|
|
||||||
> * **device_id**(int): Device id, used on GPU
|
|
||||||
> * **model_file**(string): Model file path
|
|
||||||
> * **params_file**(string): Parameter file path
|
|
||||||
> * **model_format**(fastdeploy::ModelFormat): Model format,`ModelFormat::PADDLE`/`ModelFormat::ONNX`
|
|
||||||
> * **ort_execution_mode**(int): ORT back-end execution mode, 0 for sequential execution of all operators, 1 for parallel execution of operators, default is -1, i.e. execution in the ORT default configuration
|
|
||||||
> * **ort_graph_opt_level**(int): ORT back-end graph optimisation level; 0: disable graph optimisation; 1: basic optimisation; 2: additional extended optimisation; 99: all optimisations; default is -1, i.e. executed with the ORT default configuration
|
|
||||||
> * **ort_inter_op_num_threads**(int): When `ort_execution_mode` is 1, this parameter sets the number of threads in parallel between operators
|
|
||||||
> * **trt_enable_fp16**(bool): TensorRT turns on FP16 inference
|
|
||||||
> * **trt_enable_int8**(bool): TensorRT turns on INT8 inference
|
|
||||||
> * **trt_max_workspace_size**(int): `max_workspace_size` parameter configured on TensorRT
|
|
||||||
> * **trt_fixed_shape**(map<string, vector<int>>): When the model is a dynamic shape, but the input shape remains constant for the actual inference, the input fixed shape is configured with this parameter
|
|
||||||
> * **trt_min_shape**(map<string, vector<int>>): When the model is a dynamic shape and the input shape changes during the actual inference, the minimum shape of the input is configured with this parameter
|
|
||||||
> * **trt_opt_shape**(map<string, vector<int>>): When the model is a dynamic shape and the input shape changes during the actual inference, the optimal shape of the input is configured with this parameter
|
|
||||||
> * **trt_max_shape**(map<string, vector<int>>): When the model is a dynamic shape and the input shape changes during the actual inference, the maximum shape of the input is configured with this parameter
|
|
||||||
> * **trt_max_batch_size**(int): Maximum number of batches for TensorRT inference
|
|
||||||
|
|
||||||
```c++
|
|
||||||
#include "fastdeploy/vision.h"
|
|
||||||
|
|
||||||
int main() {
|
|
||||||
auto option = fastdeploy::RuntimeOption();
|
|
||||||
option.trt_min_shape["x"] = {1, 3, 224, 224};
|
|
||||||
option.trt_opt_shape["x"] = {4, 3, 224, 224};
|
|
||||||
option.trt_max_shape["x"] = {8, 3, 224, 224};
|
|
||||||
|
|
||||||
auto model = fastdeploy::vision::classification::PaddleClasModel(
|
|
||||||
"resnet50/inference.pdmodel",
|
|
||||||
"resnet50/inference.pdiparams",
|
|
||||||
"resnet50/inference_cls.yaml",
|
|
||||||
option);
|
|
||||||
return 0;
|
|
||||||
}
|
|
||||||
```
|
|
@@ -1,7 +0,0 @@
|
|||||||
# Natural Language Processing Inference Results
|
|
||||||
|
|
||||||
FastDeploy defines different structs to represent the model inference results according to the task type of the natural language processing model. The details are shown as follow.
|
|
||||||
|
|
||||||
| Struct | Doc | Description | Related Model |
|
|
||||||
|:--------- |:----------------------------- |:----------------- |:------------- |
|
|
||||||
| UIEResult | [C++/Python](./uie_result.md) | UIE model results | UIE Model |
|
|
@@ -1,34 +0,0 @@
|
|||||||
# UIE Results - UIEResult
|
|
||||||
|
|
||||||
The UIEResult struct is defined in `fastdeploy/text/uie/model.h`, indicating the UIE model extraction results and their confidence levels.
|
|
||||||
|
|
||||||
## C++ Definition
|
|
||||||
|
|
||||||
`fastdeploy::text::UIEResult`
|
|
||||||
|
|
||||||
```c++
|
|
||||||
struct UIEResult {
|
|
||||||
size_t start_;
|
|
||||||
size_t end_;
|
|
||||||
double probability_;
|
|
||||||
std::string text_;
|
|
||||||
std::unordered_map<std::string, std::vector<UIEResult>> relation_;
|
|
||||||
std::string Str() const;
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
- **start_**: Member variable that indicates the starting position of the extraction result text_ in the original text (Unicode encoding).
- **end_**: Member variable that indicates the ending position of the extraction result text_ in the original text (Unicode encoding).
- **probability_**: Member variable that indicates the confidence of the extraction result.
- **text_**: Member variable that indicates the extraction result, saved in UTF-8 format.
- **relation_**: Member variable that indicates the relations of the current result. It is commonly used for relation extraction.
- **Str()**: Member function that outputs the information in the struct as a string (for Debug)
|
|
||||||
|
|
||||||
## Python Definition
|
|
||||||
|
|
||||||
`fastdeploy.text.C.UIEResult`
|
|
||||||
|
|
||||||
- **start_**(int): Member variable that indicates the starting position of the extraction result text_ in the original text (Unicode encoding).
- **end_**(int): Member variable that indicates the ending position of the extraction result text_ in the original text (Unicode encoding).
- **text_**(str): Member variable that indicates the extraction result, saved in UTF-8 format.
- **relation_**(dict(str, list(fastdeploy.text.C.UIEResult))): Member variable that indicates the relations of the current result. It is commonly used for relation extraction.
- **get_dict()**: Member function that returns the fastdeploy.text.C.UIEResult as a Python dict.
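As a hedged illustration of reading these fields, the helper below assumes `results` maps an extracted entity name to a list of `fastdeploy.text.C.UIEResult` objects; the UIE prediction call that produces such a mapping is covered in the UIE deployment example, not in this document:

```python
def print_uie_results(results):
    """Pretty-print UIE extraction results.

    Assumes `results` maps an entity name to a list of
    fastdeploy.text.C.UIEResult objects, as the field list above suggests.
    """
    for name, items in results.items():
        for r in items:
            print(f"{name}: '{r.text_}' [{r.start_}, {r.end_}]")
            # get_dict() returns the same information as a plain Python dict
            print(r.get_dict())
```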
|
|
@@ -1,13 +0,0 @@
|
|||||||
# Vision Model Inference Results
|
|
||||||
|
|
||||||
FastDeploy defines different structs (`fastdeploy/vision/common/result.h`) to demonstrate the model inference results according to the task types of vision models. The details are as follows
|
|
||||||
|
|
||||||
| Struct | Doc | Description | Related Models |
|
|
||||||
|:--------------------- |:------------------------------------------ |:---------------------------------------------------------------------------- |:----------------------- |
|
|
||||||
| ClassifyResult | [C++/Python](./classification_result.md) | Image classification results | ResNet50, MobileNetV3 |
| SegmentationResult | [C++/Python](./segmentation_result.md) | Image segmentation results | PP-HumanSeg, PP-LiteSeg |
| DetectionResult | [C++/Python](./detection_result.md) | Object detection results | PPYOLOE, YOLOv7 series |
| FaceDetectionResult | [C++/Python](./face_detection_result.md) | Face detection results | SCRFD, RetinaFace series |
| FaceRecognitionResult | [C++/Python](./face_recognition_result.md) | Face recognition results | ArcFace, CosFace series |
| MattingResult | [C++/Python](./matting_result.md) | Matting results | MODNet series |
| OCRResult | [C++/Python](./ocr_result.md) | Text box detection, classification and optical character recognition results | OCR series |
|
|
@@ -1,28 +0,0 @@
|
|||||||
# Image Classification Results - ClassifyResult
|
|
||||||
|
|
||||||
The ClassifyResult function is defined in `fastdeploy/vision/common/result.h` , indicating the classification results and confidence level of the image.
|
|
||||||
|
|
||||||
## C++ Definition
|
|
||||||
|
|
||||||
`fastdeploy::vision::ClassifyResult`
|
|
||||||
|
|
||||||
```c++
|
|
||||||
struct ClassifyResult {
|
|
||||||
std::vector<int32_t> label_ids;
|
|
||||||
std::vector<float> scores;
|
|
||||||
void Clear();
|
|
||||||
std::string Str();
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
- **label_ids**: Member variable that holds the classification results for a single image. Its length is determined by the topk parameter passed when using the classification model, e.g. the top 5 classification results.
- **scores**: Member variable that holds the confidence of the corresponding classification results for a single image. Its length is also determined by the topk parameter, e.g. the top 5 classification confidences.
|
|
||||||
- **Clear()**: Member function that clears the results stored in a struct.
|
|
||||||
- **Str()**: Member function that outputs the information in the struct as a string (for Debug)
|
|
||||||
|
|
||||||
## Python Definition
|
|
||||||
|
|
||||||
`fastdeploy.vision.ClassifyResult`
|
|
||||||
|
|
||||||
- **label_ids**(list of int): Member variable, indicating the classification results for a single image. The number of member variables is determined by the topk input when using the classification model, e.g. the top 5 classification results.
|
|
||||||
- **scores**(list of float): Member variable, indicating the confidence level of a single image on the corresponding classification result. The number of member variables is determined by the topk input when using the classification model, e.g. the top 5 classification confidence level results.
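A small sketch of consuming these fields, assuming `result` is a `fastdeploy.vision.ClassifyResult` returned by a classification model's predict call (e.g. the PaddleClasModel shown earlier in this document):

```python
def top1(result):
    """Return the best (label_id, score) pair from a fastdeploy.vision.ClassifyResult."""
    pairs = sorted(zip(result.label_ids, result.scores),
                   key=lambda p: p[1], reverse=True)
    return pairs[0]
```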
|
|
@@ -1,67 +0,0 @@
|
|||||||
# Detection Results
|
|
||||||
|
|
||||||
The DetectionResult function is defined in `fastdeploy/vision/common/result.h` , indicating the object's frame, class and confidence level from the image detection.
|
|
||||||
|
|
||||||
## C++ Definition
|
|
||||||
|
|
||||||
```c++
|
|
||||||
fastdeploy::vision::DetectionResult
|
|
||||||
```
|
|
||||||
|
|
||||||
```c++
|
|
||||||
struct DetectionResult {
|
|
||||||
std::vector<std::array<float, 4>> boxes;
|
|
||||||
std::vector<float> scores;
|
|
||||||
std::vector<int32_t> label_ids;
|
|
||||||
std::vector<Mask> masks;
|
|
||||||
bool contain_masks = false;
|
|
||||||
void Clear();
|
|
||||||
std::string Str();
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
- **boxes**: Member variable that indicates the coordinates of all object boxes detected from an image.`boxes.size()` indicates the number of boxes, and each box is represented by 4 float values in the order xmin, ymin, xmax, ymax, i.e. the top left and bottom right coordinates.
|
|
||||||
- **scores**: Member variable that indicates the confidence level of all objects detected from a single image, with the same number of elements as `boxes.size()`.
|
|
||||||
- **label_ids**: Member variable that indicates all object classes detected from a single image, with the same number of elements as `boxes.size()`
|
|
||||||
- **masks**: Member variable that indicates all cases of mask detected from a single image, with the same number of elements and shape size as `boxes`.
|
|
||||||
- **contain_masks**: Member variable that indicates whether the detection result contains masks, which is generally true for instance segmentation models.
|
|
||||||
- **Clear()**: Member function that clears the results stored in a struct.
|
|
||||||
- **Str()**: Member function that outputs the information in the struct as a string (for Debug)
|
|
||||||
|
|
||||||
```c++
|
|
||||||
fastdeploy::vision::Mask
|
|
||||||
```
|
|
||||||
|
|
||||||
```c++
|
|
||||||
struct Mask {
|
|
||||||
std::vector<int32_t> data;
|
|
||||||
std::vector<int64_t> shape; // (H,W) ...
|
|
||||||
|
|
||||||
void Clear();
|
|
||||||
std::string Str();
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
- **data**: Member variable that represents a detected mask
|
|
||||||
- **shape**: Member variable that indicates the shape of the mask, e.g. (h,w)
|
|
||||||
- **Clear()**: Member function that clears the results stored in a struct.
|
|
||||||
- **Str()**: Member function that outputs the information in the struct as a string (for Debug)
|
|
||||||
|
|
||||||
## Python Definition
|
|
||||||
|
|
||||||
```python
|
|
||||||
fastdeploy.vision.DetectionResult
|
|
||||||
```
|
|
||||||
|
|
||||||
- **boxes**(list of list(float)): Member variable that indicates the coordinates of all object boxes detected from an image. Boxes are a list, with each element being a 4-length list presented as a box with 4 float values for xmin, ymin, xmax, ymax, i.e. the top left and bottom right coordinates.
|
|
||||||
- **scores**(list of float): Member variable that indicates the confidence level of all objects detected from a single image
|
|
||||||
- **label_ids**(list of int): Member variable that indicates all object classes detected from a single image
|
|
||||||
- **masks**: Member variable that indicates all cases of mask detected from a single image, with the same number of elements and shape size as `boxes`.
|
|
||||||
- **contain_masks**: Member variable that indicates whether the detection result contains a mask, whose result is generally true for segmentation models.
|
|
||||||
|
|
||||||
```python
|
|
||||||
fastdeploy.vision.Mask
|
|
||||||
```
|
|
||||||
|
|
||||||
- **data**: Member variable that represents a detected mask
|
|
||||||
- **shape**: Member variable that indicates the shape of the mask, e.g. (h,w)
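A hedged helper showing how the fields above are typically consumed, assuming `result` is a `fastdeploy.vision.DetectionResult` produced by a detection model; only the boxes/scores/label_ids fields documented above are used:

```python
def filter_boxes(result, score_threshold=0.5):
    """Keep only detections above a confidence threshold."""
    kept = []
    for box, score, label in zip(result.boxes, result.scores, result.label_ids):
        if score >= score_threshold:
            kept.append({"box": box, "score": score, "label_id": label})
    return kept
```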
|
|
@@ -1,34 +0,0 @@
|
|||||||
# Face Detection Results
|
|
||||||
|
|
||||||
The FaceDetectionResult function is defined in `fastdeploy/vision/common/result.h`, indicating the detected face boxes, face landmarks, confidence scores, and the number of landmarks per face.
|
|
||||||
|
|
||||||
## C++ Definition
|
|
||||||
|
|
||||||
`fastdeploy::vision::FaceDetectionResult`
|
|
||||||
|
|
||||||
```c++
|
|
||||||
struct FaceDetectionResult {
|
|
||||||
std::vector<std::array<float, 4>> boxes;
|
|
||||||
std::vector<std::array<float, 2>> landmarks;
|
|
||||||
std::vector<float> scores;
|
|
||||||
int landmarks_per_face;
|
|
||||||
void Clear();
|
|
||||||
std::string Str();
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
- **boxes**: Member variable that indicates the coordinates of all object boxes detected from an image.`boxes.size()` indicates the number of boxes, and each box is represented by 4 float values in the order xmin, ymin, xmax, ymax, i.e. the top left and bottom right coordinates.
|
|
||||||
- **scores**: Member variable that indicates the confidence level of all objects detected from a single image, with the same number of elements as `boxes.size()`.
|
|
||||||
- **landmarks**: Member variable that indicates the key points of all faces detected in a single image, with the same number of elements as `boxes.size()`.
|
|
||||||
- **landmarks_per_face**: Member variable that indicates the number of key points in each face frame.
|
|
||||||
- **Clear()**: Member function that clears the results stored in a struct.
|
|
||||||
- **Str()**: Member function that outputs the information in the struct as a string (for Debug).
|
|
||||||
|
|
||||||
## Python Definition
|
|
||||||
|
|
||||||
`fastdeploy.vision.FaceDetectionResult`
|
|
||||||
|
|
||||||
- **boxes**(list of list(float)): Member variable that indicates the coordinates of all object boxes detected from an image. Boxes are a list, with each element being a 4-length list presented as a box with 4 float values for xmin, ymin, xmax, ymax, i.e. the top left and bottom right coordinates.
|
|
||||||
- **scores**(list of float): Member variable that indicates the confidence level of all objects detected from a single image.
|
|
||||||
- **landmarks**(list of list(float)): Member variable that indicates the key points of all faces detected in a single image.
|
|
||||||
- **landmarks_per_face**(int): Member variable that indicates the number of key points in each face frame.
|
|
@@ -1,25 +0,0 @@
|
|||||||
# Face Recognition Results
|
|
||||||
|
|
||||||
The FaceRecognitionResult function is defined in `fastdeploy/vision/common/result.h` , indicating the embedding of image features by the face recognition model.
|
|
||||||
|
|
||||||
## C++ Definition
|
|
||||||
|
|
||||||
`fastdeploy::vision::FaceRecognitionResult`
|
|
||||||
|
|
||||||
```c++
|
|
||||||
struct FaceRecognitionResult {
|
|
||||||
std::vector<float> embedding;
|
|
||||||
void Clear();
|
|
||||||
std::string Str();
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
- **embedding**: Member variable that indicates the final abstracted feature embedding by the face recognition model, which can be used to calculate the feature similarity between faces.
|
|
||||||
- **Clear()**: Member function that clears the results stored in a struct.
|
|
||||||
- **Str()**: Member function that outputs the information in the struct as a string (for Debug).
|
|
||||||
|
|
||||||
## Python Definition
|
|
||||||
|
|
||||||
`fastdeploy.vision.FaceRecognitionResult`
|
|
||||||
|
|
||||||
- **embedding**(list of float): Member variable that indicates the final abstracted feature embedding by the face recognition model, which can be used to calculate the feature similarity between faces.
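Since the embedding is intended for similarity computation, a minimal sketch of comparing two results with cosine similarity (assuming both embeddings come from the same face recognition model):

```python
import numpy as np

def face_similarity(result_a, result_b):
    """Cosine similarity between two fastdeploy.vision.FaceRecognitionResult embeddings."""
    a = np.asarray(result_a.embedding, dtype="float32")
    b = np.asarray(result_b.embedding, dtype="float32")
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```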
|
|
@@ -1,34 +0,0 @@
|
|||||||
# Matting Results
|
|
||||||
|
|
||||||
The MattingResult function is defined in `fastdeploy/vision/common/result.h`, indicating the alpha transparency values and the foreground predicted by the model.
|
|
||||||
|
|
||||||
## C++ Definition
|
|
||||||
|
|
||||||
`fastdeploy::vision::MattingResult`
|
|
||||||
|
|
||||||
```c++
|
|
||||||
struct MattingResult {
|
|
||||||
std::vector<float> alpha;
|
|
||||||
std::vector<float> foreground;
|
|
||||||
std::vector<int64_t> shape;
|
|
||||||
bool contain_foreground = false;
|
|
||||||
void Clear();
|
|
||||||
std::string Str();
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
- **alpha**: a one-dimensional vector of predicted alpha transparency value in the range [0.,1.] and length hxw, with h,w being the height and width of the input image
|
|
||||||
- **foreground**: a one-dimensional vector for the predicted foreground, with a value range of [0.,255.] and a length of hxwxc. 'h,w' is the height and width of the input image, and c=3 in general. The foreground feature is not always available. It is only valid if the model predicts the foreground
|
|
||||||
- **contain_foreground**: indicates whether the predicted result contains a foreground
|
|
||||||
- **shape**: indicates the results shape. When contain_foreground is false, the shape only contains (h,w); when contain_foreground is true, the shape contains (h,w,c), and c is generally 3
|
|
||||||
- **Clear()**: Member function that clears the results stored in a struct.
|
|
||||||
- **Str()**: Member function that outputs the information in the struct as a string (for Debug)
|
|
||||||
|
|
||||||
## Python Definition
|
|
||||||
|
|
||||||
`fastdeploy.vision.MattingResult`
|
|
||||||
|
|
||||||
- **alpha**(list of float): a one-dimensional vector of predicted alpha transparency value in the range [0.,1.] and length hxw, with h,w being the height and width of the input image.
|
|
||||||
- **foreground**(list of float): a one-dimensional vector for the predicted foreground, with a value range of [0.,255.] and a length of hxwxc. 'h,w' is the height and width of the input image, and c=3 in general. The foreground feature is not always available. It is only valid if the model predicts the foreground.
|
|
||||||
- **contain_foreground**(bool): indicates whether the predicted result contains a foreground
|
|
||||||
- **shape**(list of int): indicates the results shape. When contain_foreground is false, the shape only contains (h,w); when contain_foreground is true, the shape contains (h,w,c), and c is generally 3
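A minimal sketch of turning the flat `alpha` list back into an image-shaped array, using only the fields documented above:

```python
import numpy as np

def alpha_matte(result):
    """Reshape the flat alpha list of a fastdeploy.vision.MattingResult into an (h, w) array."""
    # shape is (h, w) or (h, w, c); alpha always has h*w entries
    h, w = result.shape[0], result.shape[1]
    return np.asarray(result.alpha, dtype="float32").reshape(h, w)
```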
|
|
@@ -1,42 +0,0 @@
|
|||||||
# OCR Results
|
|
||||||
|
|
||||||
The OCRResult function is defined in `fastdeploy/vision/common/result.h` , indicating the text box detected from the image, the text box direction classification, and the text content inside the text box.
|
|
||||||
|
|
||||||
## C++ Definition
|
|
||||||
|
|
||||||
```c++
|
|
||||||
fastdeploy::vision::OCRResult
|
|
||||||
```
|
|
||||||
|
|
||||||
```c++
|
|
||||||
struct OCRResult {
|
|
||||||
std::vector<std::array<int, 8>> boxes;
|
|
||||||
std::vector<std::string> text;
|
|
||||||
std::vector<float> rec_scores;
|
|
||||||
std::vector<float> cls_scores;
|
|
||||||
std::vector<int32_t> cls_labels;
|
|
||||||
ResultType type = ResultType::OCR;
|
|
||||||
void Clear();
|
|
||||||
std::string Str();
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
- **boxes**: Member variable that indicates the coordinates of all object boxes detected in a single image. `boxes.size()` indicates the number of boxes detected in a single image, with each box's 4 coordinate points being represented in order of 8 int values: lower left, lower right, upper right, upper left.
|
|
||||||
- **text**: Member variable that indicates the text content of multiple identified text boxes, with the same number of elements as `boxes.size()`.
|
|
||||||
- **rec_scores**: Member variable that indicates the confidence level of the text identified in the text box, with the same number of elements as `boxes.size()`.
|
|
||||||
- **cls_scores**: Member variable that indicates the confidence level of the classification result of the text box, with the same number of elements as `boxes.size()`.
|
|
||||||
- **cls_labels**: Member variable that indicates the direction classification of the text box, with the same number of elements as `boxes.size()`.
|
|
||||||
- **Clear()**: Member function that clears the results stored in a struct.
|
|
||||||
- **Str()**: Member function that outputs the information in the struct as a string (for Debug)
|
|
||||||
|
|
||||||
## Python Definition
|
|
||||||
|
|
||||||
```python
|
|
||||||
fastdeploy.vision.OCRResult
|
|
||||||
```
|
|
||||||
|
|
||||||
- **boxes**: Member variable that indicates the coordinates of all object boxes detected in a single image. `boxes.size()` indicates the number of boxes detected in a single image, with each box's 4 coordinate points being represented in order of 8 int values: lower left, lower right, upper right, upper left.
|
|
||||||
- **text**: Member variable that indicates the text content of multiple identified text boxes, with the same number of elements as `boxes.size()`.
|
|
||||||
- **rec_scores**: Member variable that indicates the confidence level of the text identified in the text box, with the same number of elements as `boxes.size()`.
|
|
||||||
- **cls_scores**: Member variable that indicates the confidence level of the classification result of the text box, with the same number of elements as `boxes.size()`.
|
|
||||||
- **cls_labels**: Member variable that indicates the direction classification of the text box, with the same number of elements as `boxes.size()`.
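A small sketch of iterating the parallel fields above, assuming `result` is a `fastdeploy.vision.OCRResult` returned by an OCR pipeline:

```python
def print_ocr(result):
    """Print each recognised text line of a fastdeploy.vision.OCRResult with its box and score."""
    for box, text, score in zip(result.boxes, result.text, result.rec_scores):
        print(f"{text} (score={score:.3f}) box={box}")
```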
|
|
@@ -1,32 +0,0 @@
|
|||||||
# Segmentation Results
|
|
||||||
|
|
||||||
The SegmentationResult function is defined in `fastdeploy/vision/common/result.h` , indicating the predicted segmentation class and the probability value of the segmentation class from each pixel in the image.
|
|
||||||
|
|
||||||
## C++ Definition
|
|
||||||
|
|
||||||
`fastdeploy::vision::SegmentationResult`
|
|
||||||
|
|
||||||
```c++
|
|
||||||
struct SegmentationResult {
|
|
||||||
std::vector<uint8_t> label_map;
|
|
||||||
std::vector<float> score_map;
|
|
||||||
std::vector<int64_t> shape;
|
|
||||||
bool contain_score_map = false;
|
|
||||||
void Clear();
|
|
||||||
std::string Str();
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
- **label_map**: Member variable that indicates the segmentation class for each pixel of a single image, and `label_map.size()` indicates the number of pixel points of the image
|
|
||||||
- **score_map**: Member variable that indicates the predicted probability value of the segmentation class corresponding to label_map (define `without_argmax` when exporting the model); or the probability value normalised by softmax (define `without_argmax` and `with_softmax` when exporting the model or define ` without_argmax` while setting the model [Class Member Attribute](../../../../examples/vision/segmentation/paddleseg/cpp/)`with_softmax=True`) during initialization.
|
|
||||||
- **shape**: Member variable that indicates the shape of the output, e.g. (h,w)
|
|
||||||
- **Clear()**: Member function that clears the results stored in a struct.
|
|
||||||
- **Str()**: Member function that outputs the information in the struct as a string (for Debug)
|
|
||||||
|
|
||||||
## Python Definition
|
|
||||||
|
|
||||||
`fastdeploy.vision.SegmentationResult`
|
|
||||||
|
|
||||||
- **label_map**(list of int): Member variable that indicates the segmentation class for each pixel of a single image
|
|
||||||
- **score_map**(list of float): Member variable that indicates the predicted probability value of the segmentation class corresponding to label_map (define `without_argmax` when exporting the model); or the probability value normalised by softmax (define `without_argmax` and `with_softmax` when exporting the model or define `without_argmax` while setting the model [Class Member Attribute](../../../../examples/vision/segmentation/paddleseg/cpp/)`with_softmax=True`) during initialization.
|
|
||||||
- **shape**(list of int): Member variable that indicates the shape of the output, e.g. (h,w)
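A hedged sketch of post-processing these fields with numpy, assuming `result` is a `fastdeploy.vision.SegmentationResult` whose `shape` is (h, w):

```python
import numpy as np

def class_pixel_counts(result):
    """Count pixels per predicted class in a fastdeploy.vision.SegmentationResult."""
    label_map = np.asarray(result.label_map, dtype="int64").reshape(result.shape)
    classes, counts = np.unique(label_map, return_counts=True)
    return dict(zip(classes.tolist(), counts.tolist()))
```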
|
|
@@ -1,20 +0,0 @@
|
|||||||
# FastDeploy Compile
|
|
||||||
|
|
||||||
This document outlines the compilation process for C++ predictive libraries and Python predictive libraries. Please refer to the following documentation according to your platform:
|
|
||||||
|
|
||||||
- [build on Linux & Mac ](how_to_build_linux_and_mac.md)
|
|
||||||
- [build on Windows](how_to_build_windows.md)
|
|
||||||
|
|
||||||
The compilation options on each platform are listed in the table below:
|
|
||||||
|
|
||||||
| Options | Function | Note |
|
|
||||||
|:--------------------- |:--------------------------------------------------------------------------- |:----------------------------------------------------------------------------- |
|
|
||||||
| ENABLE_ORT_BACKEND | Enable the ONNX Runtime inference backend. The default is ON. | CPU is supported by default; with WITH_GPU enabled, GPU is also supported |
| ENABLE_PADDLE_BACKEND | Enable the Paddle Inference backend. The default is OFF. | CPU is supported by default; with WITH_GPU enabled, GPU is also supported |
| ENABLE_TRT_BACKEND | Enable the TensorRT inference backend. The default is OFF. | GPU only |
| WITH_GPU | Whether to enable GPU. The default is OFF. | When set to ON, the build supports Nvidia GPU deployment |
| CUDA_DIRECTORY | Path to CUDA used for compilation. The default is /usr/local/cuda | CUDA 11.2 and above |
| TRT_DIRECTORY | Path to TensorRT, required when the TensorRT inference backend is enabled | TensorRT 8.4 and above |
| ENABLE_VISION | Enable the vision model module. The default is ON | |
|
|
||||||
|
|
||||||
FastDeploy allows users to choose their own backends for compilation; Paddle Inference, ONNX Runtime, and TensorRT (with ONNX format) are currently supported. The models supported by FastDeploy have been validated on the different backends, and at runtime FastDeploy automatically selects an available backend or prompts the user if none is available. For example, YOLOv7 currently only supports ONNX Runtime and TensorRT; if neither of these backends is enabled at compile time, users will be prompted that no backend is available.
|
|
@@ -1,42 +0,0 @@
|
|||||||
# Compile on Linux & Mac
|
|
||||||
|
|
||||||
## Dependencies
|
|
||||||
|
|
||||||
- cmake >= 3.12
|
|
||||||
- g++ >= 8.2
|
|
||||||
- cuda >= 11.2 (WITH_GPU=ON)
|
|
||||||
- cudnn >= 8.0 (WITH_GPU=ON)
|
|
||||||
- TensorRT >= 8.4 (ENABLE_TRT_BACKEND=ON)
|
|
||||||
|
|
||||||
## Compile C++
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git clone https://github.com/PaddlePaddle/FastDeploy.git
|
|
||||||
cd FastDeploy
|
|
||||||
git checkout develop
|
|
||||||
mkdir build && cd build
|
|
||||||
cmake .. -DENABLE_ORT_BACKEND=ON \
|
|
||||||
-DENABLE_VISION=ON \
|
|
||||||
-DCMAKE_INSTALL_PREFIX=${PWD}/fastdeploy-0.0.3
|
|
||||||
make -j8
|
|
||||||
make install
|
|
||||||
```
|
|
||||||
|
|
||||||
The compiled prediction library is in the `fastdeploy-0.0.3` directory under the current directory
|
|
||||||
|
|
||||||
## Compile Python
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git clone https://github.com/PaddlePaddle/FastDeploy.git
|
|
||||||
cd FastDeploy
|
|
||||||
git checkout develop
|
|
||||||
# set compile options via export environment variable on Python
|
|
||||||
export ENABLE_ORT_BACKEND=ON
|
|
||||||
export ENABLE_VISION=ON
|
|
||||||
python setup.py build
|
|
||||||
python setup.py bdist_wheel
|
|
||||||
```
|
|
||||||
|
|
||||||
The compiled wheel package is in the `dist` directory of current directory
|
|
||||||
|
|
||||||
For more details, please refer to [Compile Readme](./README.md)
|
|
@@ -1,104 +0,0 @@
|
|||||||
# Compile on Windows
|
|
||||||
|
|
||||||
## Dependencies
|
|
||||||
|
|
||||||
- cmake >= 3.12
|
|
||||||
- Visual Studio 16 2019
|
|
||||||
- cuda >= 11.2 (WITH_GPU=ON)
|
|
||||||
- cudnn >= 8.0 (WITH_GPU=ON)
|
|
||||||
- TensorRT >= 8.4 (ENABLE_TRT_BACKEND=ON)
|
|
||||||
|
|
||||||
## Compile C++ SDK for CPU
|
|
||||||
|
|
||||||
Opens the `x64 Native Tools Command Prompt for VS 2019` command tool on the Windows menu. In particular, the `CMAKE_INSTALL_PREFIX` is used to designate the path to the SDK generated after compilation.
|
|
||||||
|
|
||||||
```bat
|
|
||||||
git clone https://github.com/PaddlePaddle/FastDeploy.git
|
|
||||||
cd FastDeploy && git checkout develop
|
|
||||||
mkdir build && cd build
|
|
||||||
|
|
||||||
cmake .. -G "Visual Studio 16 2019" -A x64 -DCMAKE_INSTALL_PREFIX=D:\Paddle\FastDeploy\build\fastdeploy-win-x64-0.2.1-DENABLE_ORT_BACKEND=ON -DENABLE_PADDLE_BACKEND=ON -DENABLE_VISION=ON -DENABLE_VISION_VISUALIZE=ON
|
|
||||||
msbuild fastdeploy.sln /m /p:Configuration=Release /p:Platform=x64
|
|
||||||
msbuild INSTALL.vcxproj /m /p:Configuration=Release /p:Platform=x64
|
|
||||||
```
|
|
||||||
|
|
||||||
After compilation, the FastDeploy CPU C++ SDK is in the `D:\Paddle\FastDeploy\build\fastdeploy-win-x64-0.2.1` directory
|
|
||||||
|
|
||||||
## Compile C++ SDK for GPU
|
|
||||||
|
|
||||||
Opens the `x64 Native Tools Command Prompt for VS 2019` command tool on the Windows menu. In particular, the `CMAKE_INSTALL_PREFIX` is used to designate the path to the SDK generated after compilation.
|
|
||||||
|
|
||||||
```bat
|
|
||||||
git clone https://github.com/PaddlePaddle/FastDeploy.git
|
|
||||||
cd FastDeploy && git checkout develop
|
|
||||||
mkdir build && cd build
|
|
||||||
|
|
||||||
cmake .. -G "Visual Studio 16 2019" -A x64 -DCMAKE_INSTALL_PREFIX=D:\Paddle\FastDeploy\build\fastdeploy-win-x64-gpu-0.2.1 -DWITH_GPU=ON -DENABLE_ORT_BACKEND=ON -DENABLE_PADDLE_BACKEND=ON -DENABLE_VISION=ON -DENABLE_VISION_VISUALIZE=ON -DCUDA_DIRECTORY="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2"
|
|
||||||
msbuild fastdeploy.sln /m /p:Configuration=Release /p:Platform=x64
|
|
||||||
msbuild INSTALL.vcxproj /m /p:Configuration=Release /p:Platform=x64
|
|
||||||
|
|
||||||
% Notes: %
% (1) -DCUDA_DIRECTORY designates the directory of CUDA %
% (2) To compile the Paddle backend, set -DENABLE_PADDLE_BACKEND=ON %
% (3) To compile the TensorRT backend, set -DENABLE_TRT_BACKEND=ON and designate TRT_DIRECTORY %
% (4) e.g. -DTRT_DIRECTORY=D:\x64\third_party\TensorRT-8.4.1.5 %
|
|
||||||
```
|
|
||||||
|
|
||||||
After compilation, the FastDeploy GPU C++ SDK is under `D:\Paddle\FastDeploy\build\fastdeploy-win-x64-gpu-0.2.1`
|
|
||||||
|
|
||||||
## Compile Python Wheel package for CPU
|
|
||||||
|
|
||||||
Opens the `x64 Native Tools Command Prompt for VS 2019` command tool on the Windows menu. In particular, the compilation options are obtained through the environment variables and run the following command in terminal.
|
|
||||||
|
|
||||||
```bat
|
|
||||||
git clone https://github.com/PaddlePaddle/FastDeploy.git
|
|
||||||
cd FastDeploy && git checkout develop
|
|
||||||
|
|
||||||
set ENABLE_ORT_BACKEND=ON
|
|
||||||
set ENABLE_PADDLE_BACKEND=ON
|
|
||||||
set ENABLE_VISION=ON
|
|
||||||
set ENABLE_VISION_VISUALIZE=ON
|
|
||||||
|
|
||||||
% Designate your own Python interpreter here. Take Python 3.8 as an example %
|
|
||||||
C:\Python38\python.exe setup.py build
|
|
||||||
C:\Python38\python.exe setup.py bdist_wheel
|
|
||||||
```
|
|
||||||
|
|
||||||
The compiled wheel files are in the dist directory. Use pip to install the compiled wheel package with the following command:
|
|
||||||
|
|
||||||
```bat
|
|
||||||
C:\Python38\python.exe -m pip install dist\fastdeploy_python-0.2.0-cp38-cp38-win_amd64.whl
|
|
||||||
```
|
|
||||||
|
|
||||||
## Compile Python Wheel package for GPU
|
|
||||||
|
|
||||||
Opens the `x64 Native Tools Command Prompt for VS 2019` command tool on the Windows menu. In particular, the compilation options are obtained through the environment variables and run the following command in terminal.
|
|
||||||
|
|
||||||
```bat
|
|
||||||
% Note:CUDA_DIRECTORY is your own CUDA directory. The following is an example %
|
|
||||||
set CUDA_DIRECTORY=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2
|
|
||||||
% Note:TRT_DIRECTORY is the directory of the downloaded TensorRT library. The following is an example. Ignore the setting if the TensorRT backend is not needed. %
|
|
||||||
set TRT_DIRECTORY=D:\x64\third_party\TensorRT-8.4.1.5
|
|
||||||
set WITH_GPU=ON
|
|
||||||
set ENABLE_ORT_BACKEND=ON
|
|
||||||
% Note:If not compile TensorRT backend, the default is OFF %
|
|
||||||
set ENABLE_TRT_BACKEND=ON
|
|
||||||
set ENABLE_PADDLE_BACKEND=ON
|
|
||||||
set ENABLE_VISION=ON
|
|
||||||
set ENABLE_VISION_VISUALIZE=ON
|
|
||||||
|
|
||||||
git clone https://github.com/PaddlePaddle/FastDeploy.git
|
|
||||||
cd FastDeploy && git checkout develop
|
|
||||||
|
|
||||||
% Note: Designate your own python interpreter here. Take python 3.8 as an example %
|
|
||||||
C:\Python38\python.exe setup.py build
|
|
||||||
C:\Python38\python.exe setup.py bdist_wheel
|
|
||||||
```
|
|
||||||
|
|
||||||
The compiled wheel files are in the dist directory. Use pip to install the compiled wheel package with the following command:
|
|
||||||
|
|
||||||
```bat
|
|
||||||
C:\Python38\python.exe -m pip install dist\fastdeploy_gpu_python-0.2.0-cp38-cp38-win_amd64.whl
|
|
||||||
```
|
|
||||||
|
|
||||||
For more details, please refer to [Compile Readme](./README.md)
|
|
@@ -1,22 +0,0 @@
|
|||||||
# FastDeploy environment requirements
|
|
||||||
|
|
||||||
## System Platform
|
|
||||||
|
|
||||||
- Linux x64/aarch64
|
|
||||||
- Windows 10 x64
|
|
||||||
- Mac OSX (x86 10.0+ / arm64 12.0+)
|
|
||||||
|
|
||||||
## Hardware Dependencies
|
|
||||||
|
|
||||||
- Intel CPU
|
|
||||||
- Nvidia GPU
|
|
||||||
|
|
||||||
## Software Dependencies
|
|
||||||
|
|
||||||
- cmake >= 3.12
|
|
||||||
- gcc/g++ >= 8.2
|
|
||||||
- python >= 3.6
|
|
||||||
- Visual Studio 2019 (Windows)
|
|
||||||
- cuda >= 11.0 (The default installation path of Linux is under /usr/local/cuda)
|
|
||||||
- cudnn >= 8.0
|
|
||||||
- TensorRT, Paddle Inference, ONNX Runtime and other inference engines will be included in the SDK and do not need to be installed separately.
|
|
@@ -1,44 +0,0 @@
|
|||||||
# FastDeploy C++ SDK
|
|
||||||
|
|
||||||
FastDeploy provides prebuilt CPP deployment libraries on Windows/Linux/Mac. Developers can download and use it directly or compile the code manually.
|
|
||||||
|
|
||||||
## Dependencies
|
|
||||||
|
|
||||||
- cuda >= 11.2
|
|
||||||
- cudnn >= 8.0
|
|
||||||
- g++ >= 5.4 (8.2 is recommended)
|
|
||||||
|
|
||||||
## Download
|
|
||||||
|
|
||||||
### Linux x64
|
|
||||||
|
|
||||||
| SDK Download Link | Hardware | Description |
|
|
||||||
|:--------------------------------------------------------------------------------------------------------------------- |:-------- |:---------------------------------------- |
|
|
||||||
| [fastdeploy-linux-x64-0.2.1.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-0.2.1.tgz) | CPU | Built with g++ 8.2 |
|
|
||||||
| [fastdeploy-linux-x64-gpu-0.2.1.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-gpu-0.2.1.tgz) | CPU/GPU | Built with g++ 8.2, cuda 11.2, cudnn 8.2 |
|
|
||||||
|
|
||||||
### Windows 10 x64
|
|
||||||
|
|
||||||
| SDK Download Link | Hardware | Description |
|
|
||||||
|:----------------------------------------------------------------------------------------------------------------- |:-------- |:----------------------------------------------------- |
|
|
||||||
| [fastdeploy-win-x64-0.2.1.zip](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-win-x64-0.2.1.zip) | CPU | Built with Visual Studio 16 2019 |
|
|
||||||
| [fastdeploy-win-x64-gpu-0.2.1.zip](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-win-x64-gpu-0.2.1.zip) | CPU/GPU | Built with Visual Studio 16 2019,cuda 11.2, cudnn 8.2 |
|
|
||||||
|
|
||||||
### Linux aarch64
|
|
||||||
|
|
||||||
| SDK Download Link | Hardware | Description |
|
|
||||||
|:--------------------------------------------------------------------------------------------------------------------- |:-------- |:-------------------- |
|
|
||||||
| [fastdeploy-linux-aarch64-0.2.0.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-aarch64-0.2.0.tgz) | CPU | Built with g++ 6.3.0 |
|
|
||||||
| [comming...] | Jetson | |
|
|
||||||
|
|
||||||
### Mac OSX
|
|
||||||
|
|
||||||
| SDK Download Link | Architecture | Hardware |
|
|
||||||
|:--------------------------------------------------------------------------------------------------------------- |:------------ |:-------- |
|
|
||||||
| [fastdeploy-osx-x86_64-0.2.1.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-osx-x86_64-0.2.1.tgz) | x86 | CPU |
|
|
||||||
| [fastdeploy-osx-arm64-0.2.1.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-osx-arm64-0.2.1.tgz) | arm64 | CPU |
|
|
||||||
|
|
||||||
## Other related docs
|
|
||||||
|
|
||||||
- [Install Python SDK](./install_python_sdk.md)
|
|
||||||
- [Example Vision and NLP Model deployment with C++/Python](../../../examples/)
|
|
@@ -1,36 +0,0 @@
|
|||||||
# FastDeploy Python SDK
|
|
||||||
|
|
||||||
FastDeploy provides pre-compiled Python Wheel packages on Windows/Linux/Mac. Developers can download and install them directly, or compile the code manually.
|
|
||||||
|
|
||||||
Currently, FastDeploy supports
|
|
||||||
|
|
||||||
- Python3.6-3.9 on Linux
|
|
||||||
- Python3.8-3.9 on Windows
|
|
||||||
- Python3.6-3.9 on Mac
|
|
||||||
|
|
||||||
## Install CPU Python
|
|
||||||
|
|
||||||
```bash
|
|
||||||
pip install fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
|
|
||||||
```
|
|
||||||
|
|
||||||
## Install GPU Python
|
|
||||||
|
|
||||||
```bash
|
|
||||||
pip install fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
|
|
||||||
```
|
|
||||||
|
|
||||||
## Notes:
|
|
||||||
|
|
||||||
- Please choose either `fastdeploy-python` or `fastdeploy-gpu-python` for installation.
|
|
||||||
- If you have already installed the CPU `fastdeploy-python`, please execute `pip uninstall fastdeploy-python` to remove it before installing the GPU `fastdeploy-gpu-python`.
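After installing either wheel, a minimal sanity check is to import the package and construct the `RuntimeOption` documented earlier; printing it is assumed to show the default inference configuration, as in the runtime documentation above:

```python
# Quick sanity check after installation; works for both the CPU and GPU wheels.
import fastdeploy as fd

option = fd.RuntimeOption()
print(option)
```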
|
|
||||||
|
|
||||||
## Dependencies
|
|
||||||
|
|
||||||
- cuda >= 11.2
|
|
||||||
- cudnn >= 8.0
|
|
||||||
|
|
||||||
## Other related docs
|
|
||||||
|
|
||||||
- [FastDeploy Prebuilt C++ Libraries](./install_cpp_sdk.md)
|
|
||||||
- [Example Vision and NLP Model deployment with C++/Python](../../../examples/)
|
|
@@ -1,103 +0,0 @@
# Use FastDeploy C++ SDK on Windows

Using the FastDeploy C++ SDK on Windows is slightly different from using it on Linux. Below, PPYOLOE is used as an example of deployment on CPU/GPU, and on GPU accelerated with TensorRT.

Two steps before deployment:

1. Make sure the hardware and software environment meets the requirements. Please refer to [FastDeploy Environment Requirements](../environment.md) for more details.
2. Download the pre-built deployment SDK and sample code for your development environment. For more details, please refer to [install_cpp_sdk](./install_cpp_sdk.md).

## Dependencies

- cmake >= 3.12
- Visual Studio 16 2019
- cuda >= 11.2 (WITH_GPU=ON)
- cudnn >= 8.0 (WITH_GPU=ON)
- TensorRT >= 8.4 (ENABLE_TRT_BACKEND=ON)

## Download FastDeploy Windows 10 C++ SDK

Download the pre-built FastDeploy Windows 10 C++ SDK from the link below. It also contains the sample code.

```text
https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-win-x64-gpu-0.2.0.zip
```

## Prepare model files and test images

Download the model file and test image from the following links and unzip the model archive:

```text
https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco.tgz  # (unzip after download)
https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
```

## Compile PPYOLOE on Windows

Open the `x64 Native Tools Command Prompt for VS 2019` tool from the Windows Start menu and change to the PPYOLOE demo directory:

```bat
cd fastdeploy-win-x64-gpu-0.2.0\examples\vision\detection\paddledetection\cpp
```

```bat
mkdir build && cd build
cmake .. -G "Visual Studio 16 2019" -A x64 -DFASTDEPLOY_INSTALL_DIR=%cd%\..\..\..\..\..\..\..\fastdeploy-win-x64-gpu-0.2.0 -DCUDA_DIRECTORY="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2"
```

Run the following command to build the demo:

```bat
msbuild infer_demo.sln /m:4 /p:Configuration=Release /p:Platform=x64
```

## Configure dependency library path

#### Method 1: Set environment variables on the command line

The compiled exe is stored in the Release directory. Before running the demo, copy the model and test image into this directory. The DLL search paths also need to be set in the terminal. Please execute the following commands in the build directory:

```bat
set FASTDEPLOY_PATH=%cd%\..\..\..\..\..\..\..\fastdeploy-win-x64-gpu-0.2.0
set PATH=%FASTDEPLOY_PATH%\lib;%FASTDEPLOY_PATH%\third_libs\install\onnxruntime\lib;%FASTDEPLOY_PATH%\third_libs\install\opencv-win-x64-3.4.16\build\x64\vc15\bin;%FASTDEPLOY_PATH%\third_libs\install\paddle_inference\paddle\lib;%FASTDEPLOY_PATH%\third_libs\install\paddle_inference\third_party\install\mkldnn\lib;%FASTDEPLOY_PATH%\third_libs\install\paddle_inference\third_party\install\mklml\lib;%FASTDEPLOY_PATH%\third_libs\install\paddle2onnx\lib;%FASTDEPLOY_PATH%\third_libs\install\tensorrt\lib;%FASTDEPLOY_PATH%\third_libs\install\yaml-cpp\lib;%PATH%
```

Note: also copy onnxruntime.dll to the exe directory:

```bat
copy /Y %FASTDEPLOY_PATH%\third_libs\install\onnxruntime\lib\onnxruntime* Release\
```

Recent Windows versions ship an onnxruntime.dll in the System32 directory, so onnxruntime can still be loaded from the wrong place even when PATH is set. To avoid this conflict, copy the onnxruntime.dll used by the demo into the exe directory, as done above. You can check which copy Windows would pick up:

```bat
where onnxruntime.dll
C:\Windows\System32\onnxruntime.dll  # onnxruntime.dll shipped with Windows
```

#### Method 2: Copy dependency libraries to the exe directory

Copy the DLLs manually, or execute the following commands in the build directory:

```bat
set FASTDEPLOY_PATH=%cd%\..\..\..\..\..\..\..\fastdeploy-win-x64-gpu-0.2.0
copy /Y %FASTDEPLOY_PATH%\lib\*.dll Release\
copy /Y %FASTDEPLOY_PATH%\third_libs\install\onnxruntime\lib\*.dll Release\
copy /Y %FASTDEPLOY_PATH%\third_libs\install\opencv-win-x64-3.4.16\build\x64\vc15\bin\*.dll Release\
copy /Y %FASTDEPLOY_PATH%\third_libs\install\paddle_inference\paddle\lib\*.dll Release\
copy /Y %FASTDEPLOY_PATH%\third_libs\install\paddle_inference\third_party\install\mkldnn\lib\*.dll Release\
copy /Y %FASTDEPLOY_PATH%\third_libs\install\paddle_inference\third_party\install\mklml\lib\*.dll Release\
copy /Y %FASTDEPLOY_PATH%\third_libs\install\paddle2onnx\lib\*.dll Release\
copy /Y %FASTDEPLOY_PATH%\third_libs\install\tensorrt\lib\*.dll Release\
copy /Y %FASTDEPLOY_PATH%\third_libs\install\yaml-cpp\lib\*.dll Release\
```

## Run the demo

```bat
cd Release
infer_ppyoloe_demo.exe ppyoloe_crn_l_300e_coco 000000014439.jpg 0 # CPU
infer_ppyoloe_demo.exe ppyoloe_crn_l_300e_coco 000000014439.jpg 1 # GPU
infer_ppyoloe_demo.exe ppyoloe_crn_l_300e_coco 000000014439.jpg 2 # GPU + TensorRT
```

@@ -1,18 +0,0 @@
# FastDeploy Inference Backend

FastDeploy currently integrates a wide range of inference backends. The following table summarises the integrated backends, including the platforms, hardware and model formats they support.

| Inference Backend | Platform | Hardware | Supported Model Format |
|:----------------- |:------------------------------- |:-------- |:---------------------- |
| Paddle Inference | Windows(x64)/Linux(x64) | GPU/CPU | Paddle |
| ONNX Runtime | Windows(x64)/Linux(x64/aarch64) | GPU/CPU | Paddle/ONNX |
| TensorRT | Windows(x64)/Linux(x64/jetson) | GPU | Paddle/ONNX |
| OpenVINO | Windows(x64)/Linux(x64) | CPU | Paddle/ONNX |
| Poros [coming soon] | Linux(x64) | CPU/GPU | TorchScript |

Backends in FastDeploy are independent of each other, and developers can choose to enable one or more of them in a customized build.

The `Runtime` module in FastDeploy provides a unified API for all backends. See the [FastDeploy Runtime User Guideline](usage.md) for more details.

## Related Files

- [FastDeploy Compile](../compile)

@@ -1,47 +0,0 @@
# How to change inference backend

Vision models in FastDeploy support a wide range of backends, including:

- OpenVINO (supports models in Paddle/ONNX formats; inference on CPU only)
- ONNX Runtime (supports models in Paddle/ONNX formats; inference on CPU or GPU)
- TensorRT (supports models in Paddle/ONNX formats; inference on GPU only)
- Paddle Inference (supports models in Paddle format; inference on CPU or GPU)

All models switch their inference backend through `RuntimeOption`.

**Python**

```python
import fastdeploy as fd
option = fd.RuntimeOption()

# Choose CPU or GPU inference
option.use_cpu()
option.use_gpu()

# Choose the backend
option.use_paddle_backend()    # Paddle Inference
option.use_trt_backend()       # TensorRT
option.use_openvino_backend()  # OpenVINO
option.use_ort_backend()       # ONNX Runtime
```

**C++**

```c++
fastdeploy::RuntimeOption option;

// Choose CPU or GPU inference
option.UseCpu();
option.UseGpu();

// Choose the backend
option.UsePaddleBackend();    // Paddle Inference
option.UseTrtBackend();       // TensorRT
option.UseOpenVINOBackend();  // OpenVINO
option.UseOrtBackend();       // ONNX Runtime
```
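
For context, the sketch below shows one way a configured `RuntimeOption` is typically passed to a vision model in the Python API. The model directory, config file name and image path are placeholders, so check the constructor arguments against the model's own example before relying on them.

```python
import cv2
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()  # e.g. run this model on the TensorRT backend

# Placeholder paths: point these at a real exported PaddleDetection model.
model = fd.vision.detection.PPYOLOE(
    "ppyoloe_crn_l_300e_coco/model.pdmodel",
    "ppyoloe_crn_l_300e_coco/model.pdiparams",
    "ppyoloe_crn_l_300e_coco/infer_cfg.yml",
    runtime_option=option)

im = cv2.imread("000000014439.jpg")
result = model.predict(im)
print(result)
```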

Please refer to `FastDeploy/examples/vision` for Python or C++ inference code of the different models.

For more usage of `RuntimeOption`, please refer to the [RuntimeOption API](../api/runtime/runtime_option.md).

@@ -1,45 +0,0 @@
# FastDeploy Runtime User Guideline

`Runtime` is the module for model inference in FastDeploy. It currently integrates a variety of backends and allows users to quickly run inference for different model formats on different hardware, platforms and backends through a unified API. This document demonstrates inference on each type of hardware and backend.

## CPU Inference

Python demo:

```python
import fastdeploy as fd
import numpy as np

option = fd.RuntimeOption()
# Set model path
option.set_model_path("resnet50/inference.pdmodel", "resnet50/inference.pdiparams")
# Use OpenVINO backend
option.use_openvino_backend()
# Initialize runtime
runtime = fd.Runtime(option)
# Get input info
input_name = runtime.get_input_info(0).name
# Construct data and run inference
results = runtime.infer({input_name: np.random.rand(1, 3, 224, 224).astype("float32")})
```

## GPU Inference

```python
import fastdeploy as fd
import numpy as np

option = fd.RuntimeOption()
# Set model path
option.set_model_path("resnet50/inference.pdmodel", "resnet50/inference.pdiparams")
# Use the GPU (0th GPU card)
option.use_gpu(0)
# Use Paddle Inference backend
option.use_paddle_backend()
# Initialize runtime
runtime = fd.Runtime(option)
# Get input info
input_name = runtime.get_input_info(0).name
# Construct data and run inference
results = runtime.infer({input_name: np.random.rand(1, 3, 224, 224).astype("float32")})
```
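
The same GPU demo can in principle be switched to the TensorRT backend by changing only the backend call. The following is a minimal sketch under that assumption; models with dynamic input shapes may additionally need the TensorRT input range declared, so treat it as a starting point rather than a drop-in recipe.

```python
import fastdeploy as fd
import numpy as np

option = fd.RuntimeOption()
option.set_model_path("resnet50/inference.pdmodel", "resnet50/inference.pdiparams")
option.use_gpu(0)
# Switch from Paddle Inference to the TensorRT backend
option.use_trt_backend()
# For dynamic-shape models the input range may need to be declared, e.g.
# option.set_trt_input_shape("inputs", [1, 3, 224, 224])  # hypothetical tensor name

runtime = fd.Runtime(option)
input_name = runtime.get_input_info(0).name
results = runtime.infer({input_name: np.random.rand(1, 3, 224, 224).astype("float32")})
```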

For more Python/C++ inference demos, please refer to [FastDeploy/examples/runtime](../../../examples/runtime).