Mirror of https://github.com/PaddlePaddle/FastDeploy.git (synced 2025-11-03 11:02:01 +08:00)
Bump up to version 0.3.0 (#371)
* Update VERSION_NUMBER * Update paddle_inference.cmake * Delete docs directory * release new docs * update version number * add vision result doc * update version * fix dead link * fix vision * fix dead link * Update README_EN.md * Update README_EN.md * Update README_EN.md * Update README_EN.md * Update README_EN.md * Update README_CN.md * Update README_EN.md * Update README_CN.md * Update README_EN.md * Update README_CN.md * Update README_EN.md * Update README_EN.md Co-authored-by: leiqing <54695910+leiqing1@users.noreply.github.com>
@@ -1,278 +0,0 @@
# FDTensor C++ Tensor Functions

FDTensor is the struct FastDeploy uses to represent tensors at the C++ level. It mainly manages the input and output data of a model during inference deployment and works with the different Runtime backends. When building C++ inference applications, we often need to process the input and output data to produce the actual model input or the final application output. Such processing can be written with the plain C++ standard library, but that is fairly laborious, for example when taking the maximum along the second dimension of a 3-D tensor. To address this, FastDeploy provides a set of C++ tensor functions built on FDTensor that lower development cost and improve efficiency. They currently fall into four categories: Reduce functions, Manipulate functions, Math functions, and Elementwise functions.

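The snippet below is a minimal sketch of the motivating case above, reducing a 3-D tensor along its second dimension. The header paths and the `fastdeploy::` namespace qualification of `Max` are assumptions and may differ between FastDeploy versions; the `Max` signature itself is documented in the Reduce section below.

```c++
#include <vector>

#include "fastdeploy/core/fd_tensor.h"   // assumed header for FDTensor
#include "fastdeploy/function/reduce.h"  // assumed header for the Reduce functions

int main() {
  // A {2, 3, 4} tensor filled with dummy data.
  std::vector<float> data(2 * 3 * 4, 1.0f);
  fastdeploy::FDTensor input, output;
  input.SetExternalData({2, 3, 4}, fastdeploy::FDDataType::FP32, data.data());

  // Reduce along axis 1; with keep_dim=false the output shape becomes {2, 4}.
  fastdeploy::Max(input, &output, {1});
  return 0;
}
```
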
## Reduce Functions

FastDeploy currently supports seven Reduce functions: Max, Min, Sum, All, Any, Mean, and Prod.

### Max

#### Function Signature

```c++
/** Execute the maximum operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Max(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Compute the maximum along axis 0 of `input`.
// The output result would be [[7, 4, 5]].
Max(input, &output, {0}, /* keep_dim = */ true);
```

### Min

#### Function Signature

```c++
/** Execute the minimum operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Min(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Compute the minimum along axis 0 of `input`.
// The output result would be [[2, 1, 3]].
Min(input, &output, {0}, /* keep_dim = */ true);
```

### Sum

#### Function Signature

```c++
/** Execute the sum operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Sum(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Compute the sum along axis 0 of `input`.
// The output result would be [[9, 5, 8]].
Sum(input, &output, {0}, /* keep_dim = */ true);
```

### Mean

#### Function Signature

```c++
/** Execute the mean operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Mean(const FDTensor& x, FDTensor* out,
          const std::vector<int64_t>& dims,
          bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Compute the mean along axis 0 of `input`.
// The output result would be [[4, 2, 4]] (integer output, since the input is INT32).
Mean(input, &output, {0}, /* keep_dim = */ true);
```

### Prod

#### Function Signature

```c++
/** Execute the product operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Prod(const FDTensor& x, FDTensor* out,
          const std::vector<int64_t>& dims,
          bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Compute the product along axis 0 of `input`.
// The output result would be [[14, 4, 15]].
Prod(input, &output, {0}, /* keep_dim = */ true);
```

### Any

#### Function Signature

```c++
/** Execute the any operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Any(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::array<bool, 6> bool_inputs = {false, false, true, true, false, true};
input.SetExternalData({2, 3}, FDDataType::BOOL, bool_inputs.data());

// Compute the logical "any" along axis 0 of `input`.
// The output result would be [[true, false, true]].
Any(input, &output, {0}, /* keep_dim = */ true);
```

### All

#### Function Signature

```c++
/** Execute the all operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void All(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::array<bool, 6> bool_inputs = {false, false, true, true, false, true};
input.SetExternalData({2, 3}, FDDataType::BOOL, bool_inputs.data());

// Compute the logical "all" along axis 0 of `input`.
// The output result would be [[false, false, true]].
All(input, &output, {0}, /* keep_dim = */ true);
```

## Manipulate Functions

FastDeploy currently supports one Manipulate function: Transpose.

### Transpose

#### Function Signature

```c++
/** Execute the transpose operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes giving the permutation to apply to the input tensor.
*/
void Transpose(const FDTensor& x, FDTensor* out,
               const std::vector<int64_t>& dims);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<float> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::FP32, inputs.data());

// Transpose the input tensor with axes {1, 0}.
// The output result would be [[2, 7], [4, 1], [3, 5]].
Transpose(input, &output, {1, 0});
```

## Math Functions

FastDeploy currently supports one Math function: Softmax.

### Softmax

#### Function Signature

```c++
/** Execute the softmax operation for input FDTensor along the given axis.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param axis The axis along which the softmax is computed, default -1 (the last axis).
*/
void Softmax(const FDTensor& x, FDTensor* out, int axis = -1);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<float> inputs = {1, 2, 3, 4, 5, 6};
input.SetExternalData({2, 3}, FDDataType::FP32, inputs.data());

// Compute the softmax along axis 0 of `input`.
// The output result would be
// [[0.04742587, 0.04742587, 0.04742587],
//  [0.95257413, 0.95257413, 0.95257413]]
Softmax(input, &output, 0);
```

## Elementwise Functions

Under development, stay tuned.

@@ -1,84 +0,0 @@
# Runtime

After configuring `RuntimeOption`, a Runtime can be created on top of different backends and hardware to run model inference.

## Python Class

```
class Runtime(runtime_option)
```
**Parameters**
> * **runtime_option**(fastdeploy.RuntimeOption): The configured RuntimeOption instance

### Member Functions

```
infer(data)
```
Runs model inference with the given input data.

**Parameters**

> * **data**(dict[str, np.ndarray]): Input data; a dict whose keys are input names and whose values are np.ndarray arrays

**Returns**

Returns a list whose length equals the number of outputs of the original model; each element of the list is an np.ndarray.


```
num_inputs()
```
Returns the number of model inputs.

```
num_outputs()
```
Returns the number of model outputs.


## C++ Class

```
class Runtime
```

### Member Functions

```
bool Init(const RuntimeOption& runtime_option)
```
Loads and initializes the model.

**Parameters**

> * **runtime_option**: The configured RuntimeOption instance

**Returns**

Returns true if initialization succeeds, false otherwise.


```
bool Infer(vector<FDTensor>& inputs, vector<FDTensor>* outputs)
```
Runs inference with the given inputs and writes the results to outputs.

**Parameters**

> * **inputs**: Input data
> * **outputs**: Output data

**Returns**

Returns true if inference succeeds, false otherwise.

```
int NumInputs()
```
Returns the number of model inputs.

```
int NumOutputs()
```
Returns the number of model outputs.

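The sketch below shows how the member functions above fit together. It is not part of the original API listing; the header path, the FDTensor `name` field, and the input name `"x"` and shape are assumptions to be adapted to your model.

```c++
#include <vector>

#include "fastdeploy/runtime.h"  // assumed header providing Runtime, RuntimeOption, FDTensor

int main() {
  fastdeploy::RuntimeOption option;
  option.SetModelPath("resnet50/inference.pdmodel", "resnet50/inference.pdiparams");
  option.UseCpu();
  option.UseOrtBackend();

  fastdeploy::Runtime runtime;
  if (!runtime.Init(option)) {
    return -1;  // Init() returns false on failure.
  }

  // Prepare one tensor per model input; dtype and shape must match the model.
  std::vector<float> data(1 * 3 * 224 * 224, 0.0f);
  std::vector<fastdeploy::FDTensor> inputs(runtime.NumInputs());
  inputs[0].SetExternalData({1, 3, 224, 224}, fastdeploy::FDDataType::FP32, data.data());
  inputs[0].name = "x";  // assumed: the tensor name must match the model's input name

  std::vector<fastdeploy::FDTensor> outputs;
  if (!runtime.Infer(inputs, &outputs)) {
    return -1;  // Infer() returns false on failure.
  }
  return 0;
}
```
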
@@ -1,231 +0,0 @@
# RuntimeOption

`RuntimeOption` is used to configure the inference parameters of a model on different backends and hardware.

## Python Class

```
class RuntimeOption()
```

### Member Functions

```
set_model_path(model_file, params_file="", model_format="paddle")
```
Sets the path of the model to load.

**Parameters**

> * **model_file**(str): Path of the model file
> * **params_file**(str): Path of the parameters file; can be left empty for models in ONNX format
> * **model_format**(str): Model format; supports "paddle" and "onnx", default "paddle"

```
use_gpu(device_id=0)
```
Use GPU for inference.

**Parameters**

> * **device_id**(int): When multiple GPU cards exist in the environment, this parameter selects the card used for inference, default 0

```
use_cpu()
```
Use CPU for inference.


```
set_cpu_thread_num(thread_num=-1)
```
Sets the number of threads used for inference on CPU.

**Parameters**

> * **thread_num**(int): Number of threads; when less than or equal to 0, the backend allocates it automatically, default -1

```
use_paddle_backend()
```
Use the Paddle Inference backend; supports CPU/GPU and the Paddle model format.

```
use_ort_backend()
```
Use the ONNX Runtime backend; supports CPU/GPU and the Paddle/ONNX model formats.

```
use_trt_backend()
```
Use the TensorRT backend; supports GPU and the Paddle/ONNX model formats.

```
use_openvino_backend()
```
Use the OpenVINO backend; supports CPU and the Paddle/ONNX model formats.

```
set_paddle_mkldnn(pd_mkldnn=True)
```
When using the Paddle Inference backend, this switch enables or disables MKLDNN acceleration on CPU; enabled by default.

```
enable_paddle_log_info()
disable_paddle_log_info()
```
When using the Paddle Inference backend, these switches enable or disable the optimization log printed while loading the model; disabled by default.

```
set_paddle_mkldnn_cache_size(cache_size)
```
When using the Paddle Inference backend, this interface controls the shape cache size used by MKLDNN acceleration.

**Parameters**
> * **cache_size**(int): Cache size

```
set_trt_input_shape(tensor_name, min_shape, opt_shape=None, max_shape=None)
```
When using the TensorRT backend, this interface sets the shape range of each model input. If only min_shape is set, opt_shape and max_shape are automatically set to the same value as min_shape.

You do not have to call this interface yourself: during inference FastDeploy automatically updates the shape range according to the real input data. However, every time a new shape falls outside the current range, the backend engine is rebuilt, which costs time. Configuring the range in advance through this interface avoids engine rebuilds during inference.

**Parameters**
> * **tensor_name**(str): Name of the tensor whose shape range is being set
> * **min_shape**(list of int): Minimum shape of the tensor, e.g. [1, 3, 224, 224]
> * **opt_shape**(list of int): Most common shape of the tensor, e.g. [2, 3, 224, 224]; when None, it is kept the same as min_shape, default None
> * **max_shape**(list of int): Maximum shape of the tensor, e.g. [8, 3, 224, 224]; when None, it is kept the same as min_shape, default None

```
set_trt_cache_file(cache_file_path)
```
When using the TensorRT backend, this interface caches the built TensorRT engine to the given path, or skips engine building altogether and loads a locally cached TensorRT engine.
- If this interface is called and `cache_file_path` does not exist, FastDeploy builds the TensorRT engine and saves it to `cache_file_path`
- If this interface is called and `cache_file_path` exists, FastDeploy directly loads the already-built TensorRT engine stored at `cache_file_path`, which greatly reduces model loading and initialization time

This interface speeds up model loading and initialization from the second run onwards. Note, however, that if you change the loading configuration, e.g. TensorRT's max_workspace_size, reconfigure `set_trt_input_shape`, or replace the original Paddle or ONNX model, you must delete the local `cache_file_path` file first, so that an outdated cache is not reloaded and program correctness is not affected.

**Parameters**
> * **cache_file_path**(str): Cache file path, e.g. `/Downloads/resnet50.trt`

```
enable_trt_fp16()
disable_trt_fp16()
```
When using the TensorRT backend, these interfaces enable or disable half-precision inference, which brings a noticeable performance gain; however, not all GPUs support half precision. On GPUs without half-precision support, inference falls back to FP32 and the message `Detected FP16 is not supported in the current GPU, will use FP32 instead.` is printed.

## C++ Struct

```
struct RuntimeOption
```

### Member Functions

```
void SetModelPath(const string& model_file, const string& params_file = "", const string& model_format = "paddle")
```
Sets the path of the model to load.

**Parameters**

> * **model_file**: Path of the model file
> * **params_file**: Path of the parameters file; pass "" for models in ONNX format
> * **model_format**: Model format; supports "paddle" and "onnx", default "paddle"

```
void UseGpu(int device_id = 0)
```
Use GPU for inference.

**Parameters**

> * **device_id**: When multiple GPU cards exist in the environment, this parameter selects the card used for inference, default 0

```
void UseCpu()
```
Use CPU for inference.


```
void SetCpuThreadNum(int thread_num=-1)
```
Sets the number of threads used for inference on CPU.

**Parameters**

> * **thread_num**: Number of threads; when less than or equal to 0, the backend allocates it automatically, default -1

```
void UsePaddleBackend()
```
Use the Paddle Inference backend; supports CPU/GPU and the Paddle model format.

```
void UseOrtBackend()
```
Use the ONNX Runtime backend; supports CPU/GPU and the Paddle/ONNX model formats.

```
void UseTrtBackend()
```
Use the TensorRT backend; supports GPU and the Paddle/ONNX model formats.

```
void UseOpenVINOBackend()
```
Use the OpenVINO backend; supports CPU and the Paddle/ONNX model formats.

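As a quick illustration, the short sketch below combines the CPU-related options above: select the CPU device, set the thread count, then pick one of the CPU-capable backends. The header path and model file names are assumptions, not part of the original documentation.

```c++
#include "fastdeploy/runtime.h"  // assumed header providing RuntimeOption

int main() {
  fastdeploy::RuntimeOption option;
  option.SetModelPath("resnet50/inference.pdmodel", "resnet50/inference.pdiparams");
  option.UseCpu();               // run on CPU
  option.SetCpuThreadNum(8);     // use 8 inference threads
  option.UseOpenVINOBackend();   // or UseOrtBackend() / UsePaddleBackend()
  return 0;
}
```
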
```
void SetPaddleMKLDNN(bool pd_mkldnn = true)
```
When using the Paddle Inference backend, this switch enables or disables MKLDNN acceleration on CPU; enabled by default.

```
void EnablePaddleLogInfo()
void DisablePaddleLogInfo()
```
When using the Paddle Inference backend, these switches enable or disable the optimization log printed while loading the model; disabled by default.

```
void SetPaddleMKLDNNCacheSize(int cache_size)
```
When using the Paddle Inference backend, this interface controls the shape cache size used by MKLDNN acceleration.

**Parameters**
> * **cache_size**: Cache size

```
void SetTrtInputShape(const string& tensor_name, const vector<int32_t>& min_shape,
                      const vector<int32_t>& opt_shape = vector<int32_t>(),
                      const vector<int32_t>& max_shape = vector<int32_t>())
```
When using the TensorRT backend, this interface sets the shape range of each model input. If only min_shape is set, opt_shape and max_shape are automatically set to the same value as min_shape.

You do not have to call this interface yourself: during inference FastDeploy automatically updates the shape range according to the real input data. However, every time a new shape falls outside the current range, the backend engine is rebuilt, which costs time. Configuring the range in advance through this interface avoids engine rebuilds during inference.

**Parameters**
> * **tensor_name**: Name of the tensor whose shape range is being set
> * **min_shape**: Minimum shape of the tensor, e.g. [1, 3, 224, 224]
> * **opt_shape**: Most common shape of the tensor, e.g. [2, 3, 224, 224]; when left as the default (an empty vector), it is kept the same as min_shape
> * **max_shape**: Maximum shape of the tensor, e.g. [8, 3, 224, 224]; when left as the default (an empty vector), it is kept the same as min_shape

```
void SetTrtCacheFile(const string& cache_file_path)
```
When using the TensorRT backend, this interface caches the built TensorRT engine to the given path, or skips engine building altogether and loads a locally cached TensorRT engine.
- If this interface is called and `cache_file_path` does not exist, FastDeploy builds the TensorRT engine and saves it to `cache_file_path`
- If this interface is called and `cache_file_path` exists, FastDeploy directly loads the already-built TensorRT engine stored at `cache_file_path`, which greatly reduces model loading and initialization time

This interface speeds up model loading and initialization from the second run onwards. Note, however, that if you change the loading configuration, e.g. TensorRT's max_workspace_size, reconfigure `SetTrtInputShape`, or replace the original Paddle or ONNX model, you must delete the local `cache_file_path` file first, so that an outdated cache is not reloaded and program correctness is not affected.

**Parameters**
> * **cache_file_path**: Cache file path, e.g. `/Downloads/resnet50.trt`

```
void EnableTrtFp16()
void DisableTrtFp16()
```
When using the TensorRT backend, these interfaces enable or disable half-precision inference, which brings a noticeable performance gain; however, not all GPUs support half precision. On GPUs without half-precision support, inference falls back to FP32 and the message `Detected FP16 is not supported in the current GPU, will use FP32 instead.` is printed.

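To tie the TensorRT-related options together, the following sketch configures the TensorRT backend with a dynamic-shape range, an engine cache file, and FP16. The header path, model file names, and the input tensor name `"x"` are assumptions.

```c++
#include "fastdeploy/runtime.h"  // assumed header providing RuntimeOption

int main() {
  fastdeploy::RuntimeOption option;
  option.SetModelPath("resnet50/inference.pdmodel", "resnet50/inference.pdiparams");
  option.UseGpu(0);
  option.UseTrtBackend();
  // Declare the dynamic-shape range up front to avoid engine rebuilds at run time.
  option.SetTrtInputShape("x", {1, 3, 224, 224}, {4, 3, 224, 224}, {8, 3, 224, 224});
  // Cache the built engine so that later runs skip engine construction.
  option.SetTrtCacheFile("resnet50.trt");
  option.EnableTrtFp16();  // falls back to FP32 on GPUs without FP16 support
  return 0;
}
```
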
@@ -1,132 +0,0 @@
# RuntimeOption Inference Backend Configuration

The Runtime in FastDeploy contains multiple inference backends; their relationship to model formats is shown below

| Model format \ Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
| :--------------- | :---------- | :--------------- | :------- | :------- |
| Paddle | Supported (built-in Paddle2ONNX) | Supported | Supported (built-in Paddle2ONNX) | Supported |
| ONNX | Supported | Supported (conversion via X2Paddle required) | Supported | Supported |

The hardware supported by each backend is as follows

| Hardware \ Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
| :--------------- | :---------- | :--------------- | :------- | :------- |
| CPU | Supported | Supported | Not supported | Supported |
| GPU | Supported | Supported | Supported | Supported |

For every model, the inference backend and inference parameters are configured through `RuntimeOption`. For example, in Python you can print the inference configuration after loading a model with the following code
```python
model = fastdeploy.vision.detection.YOLOv5("yolov5s.onnx")
print(model.runtime_option)
```
which produces output similar to the following

```python
RuntimeOption(
  backend : Backend.ORT                # Inference backend: ONNXRuntime
  cpu_thread_num : 8                   # Number of CPU threads (effective only for CPU inference)
  device : Device.CPU                  # Inference device: CPU
  device_id : 0                        # Device id (for GPU)
  model_file : yolov5s.onnx            # Model file path
  params_file :                        # Parameters file path
  model_format : ModelFormat.ONNX      # Model format
  ort_execution_mode : -1              # Options prefixed with ort are specific to the ONNXRuntime backend
  ort_graph_opt_level : -1
  ort_inter_op_num_threads : -1
  trt_enable_fp16 : False              # Options prefixed with trt are specific to the TensorRT backend
  trt_enable_int8 : False
  trt_max_workspace_size : 1073741824
  trt_serialize_file :
  trt_fixed_shape : {}
  trt_min_shape : {}
  trt_opt_shape : {}
  trt_max_shape : {}
  trt_max_batch_size : 32
)
```

## Python Usage

### RuntimeOption Class
Configuration options of `fastdeploy.RuntimeOption()`

#### Configuration Options
> * **backend**(fd.Backend): `fd.Backend.ORT`/`fd.Backend.TRT`/`fd.Backend.PDINFER`/`fd.Backend.OPENVINO`, etc.
> * **cpu_thread_num**(int): Number of CPU inference threads; effective only for CPU inference
> * **device**(fd.Device): `fd.Device.CPU`/`fd.Device.GPU`, etc.
> * **device_id**(int): Device id, used with GPU
> * **model_file**(str): Path of the model file
> * **params_file**(str): Path of the parameters file
> * **model_format**(ModelFormat): Model format, `fd.ModelFormat.PADDLE`/`fd.ModelFormat.ONNX`
> * **ort_execution_mode**(int): Execution mode of the ORT backend; 0 runs all operators sequentially, 1 runs operators in parallel; default -1, i.e. use ORT's default configuration
> * **ort_graph_opt_level**(int): Graph optimization level of the ORT backend; 0: disable graph optimization; 1: basic optimizations; 2: extended optimizations; 99: all optimizations; default -1, i.e. use ORT's default configuration
> * **ort_inter_op_num_threads**(int): When `ort_execution_mode` is 1, this parameter sets the number of threads used for inter-operator parallelism
> * **trt_enable_fp16**(bool): Enable FP16 inference in TensorRT
> * **trt_enable_int8**(bool): Enable INT8 inference in TensorRT
> * **trt_max_workspace_size**(int): The `max_workspace_size` parameter passed to TensorRT
> * **trt_fixed_shape**(dict[str : list[int]]): When the model has dynamic shapes but the input shape stays fixed at inference time, use this parameter to configure the fixed input shape
> * **trt_min_shape**(dict[str : list[int]]): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the minimum input shape
> * **trt_opt_shape**(dict[str : list[int]]): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the optimal input shape
> * **trt_max_shape**(dict[str : list[int]]): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the maximum input shape
> * **trt_max_batch_size**(int): Maximum batch size during TensorRT inference

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.backend = fd.Backend.TRT
# When using the TRT backend with dynamic input shapes,
# the input shape information must be configured.
option.trt_min_shape = {"x": [1, 3, 224, 224]}
option.trt_opt_shape = {"x": [4, 3, 224, 224]}
option.trt_max_shape = {"x": [8, 3, 224, 224]}

model = fd.vision.classification.PaddleClasModel(
    "resnet50/inference.pdmodel",
    "resnet50/inference.pdiparams",
    "resnet50/inference_cls.yaml",
    runtime_option=option)
```

## C++ Usage

### RuntimeOption Struct
Configuration options of `fastdeploy::RuntimeOption()`

#### Configuration Options
> * **backend**(fastdeploy::Backend): `Backend::ORT`/`Backend::TRT`/`Backend::PDINFER`/`Backend::OPENVINO`, etc.
> * **cpu_thread_num**(int): Number of CPU inference threads; effective only for CPU inference
> * **device**(fastdeploy::Device): `Device::CPU`/`Device::GPU`, etc.
> * **device_id**(int): Device id, used with GPU
> * **model_file**(string): Path of the model file
> * **params_file**(string): Path of the parameters file
> * **model_format**(fastdeploy::ModelFormat): Model format, `ModelFormat::PADDLE`/`ModelFormat::ONNX`
> * **ort_execution_mode**(int): Execution mode of the ORT backend; 0 runs all operators sequentially, 1 runs operators in parallel; default -1, i.e. use ORT's default configuration
> * **ort_graph_opt_level**(int): Graph optimization level of the ORT backend; 0: disable graph optimization; 1: basic optimizations; 2: extended optimizations; 99: all optimizations; default -1, i.e. use ORT's default configuration
> * **ort_inter_op_num_threads**(int): When `ort_execution_mode` is 1, this parameter sets the number of threads used for inter-operator parallelism
> * **trt_enable_fp16**(bool): Enable FP16 inference in TensorRT
> * **trt_enable_int8**(bool): Enable INT8 inference in TensorRT
> * **trt_max_workspace_size**(int): The `max_workspace_size` parameter passed to TensorRT
> * **trt_fixed_shape**(map<string, vector<int>>): When the model has dynamic shapes but the input shape stays fixed at inference time, use this parameter to configure the fixed input shape
> * **trt_min_shape**(map<string, vector<int>>): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the minimum input shape
> * **trt_opt_shape**(map<string, vector<int>>): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the optimal input shape
> * **trt_max_shape**(map<string, vector<int>>): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the maximum input shape
> * **trt_max_batch_size**(int): Maximum batch size during TensorRT inference

```c++
#include "fastdeploy/vision.h"

int main() {
  auto option = fastdeploy::RuntimeOption();
  option.trt_min_shape["x"] = {1, 3, 224, 224};
  option.trt_opt_shape["x"] = {4, 3, 224, 224};
  option.trt_max_shape["x"] = {8, 3, 224, 224};

  auto model = fastdeploy::vision::classification::PaddleClasModel(
      "resnet50/inference.pdmodel",
      "resnet50/inference.pdiparams",
      "resnet50/inference_cls.yaml",
      option);
  return 0;
}
```

@@ -1,7 +0,0 @@
# Natural Language Model Prediction Results

Depending on the task type of the natural language model, FastDeploy defines different structs to represent the prediction results, as shown in the table below

| Struct | Documentation | Description | Corresponding Models |
| :----- | :--- | :---- | :------- |
| UIEResult | [C++/Python docs](./uie_result.md) | Results returned by UIE models | UIE models |

@@ -1,34 +0,0 @@
# UIEResult

UIEResult is defined in `fastdeploy/text/uie/model.h` and describes the extraction results and their confidence returned by UIE models.

## C++ Definition

`fastdeploy::text::UIEResult`

```c++
struct UIEResult {
  size_t start_;
  size_t end_;
  double probability_;
  std::string text_;
  std::unordered_map<std::string, std::vector<UIEResult>> relation_;
  std::string Str() const;
};
```

- **start_**: Member variable; the start offset of the extracted `text_` in the original (Unicode) text.
- **end_**: Member variable; the end offset of the extracted `text_` in the original (Unicode) text.
- **probability_**: Member variable; the confidence of the extracted result.
- **text_**: Member variable; the extracted result, stored as UTF-8.
- **relation_**: Member variable; results related to the current result, commonly used in relation extraction.
- **Str()**: Member function; returns the information in the struct as a string (for debugging).

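As a small illustration, the sketch below shows how the fields above might be read. The helper function and the map layout (results grouped by schema key) are assumptions, not part of the original documentation; only the UIEResult members shown above are used.

```c++
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

#include "fastdeploy/text/uie/model.h"  // defines fastdeploy::text::UIEResult

// Hypothetical helper: print every extraction result grouped under its schema key.
void PrintResults(
    const std::unordered_map<std::string,
                             std::vector<fastdeploy::text::UIEResult>>& results) {
  for (const auto& kv : results) {
    for (const auto& res : kv.second) {
      // start_/end_ are offsets into the original (Unicode) text; text_ is UTF-8.
      std::cout << kv.first << ": [" << res.start_ << ", " << res.end_ << ") "
                << res.text_ << " (probability " << res.probability_ << ")\n";
    }
  }
}
```
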
## Python Definition

`fastdeploy.text.C.UIEResult`

- **start_**(int): Member variable; the start offset of the extracted `text_` in the original (Unicode) text.
- **end_**(int): Member variable; the end offset of the extracted `text_` in the original (Unicode) text.
- **probability_**(float): Member variable; the confidence of the extracted result.
- **text_**(str): Member variable; the extracted result, stored as UTF-8.
- **relation_**(dict(str, list(fastdeploy.text.C.UIEResult))): Member variable; results related to the current result, commonly used in relation extraction.
- **get_dict()**: Returns the fastdeploy.text.C.UIEResult as a dict.