Merge branch 'develop' into add_ort_path
`docs/docs_en/api/runtime/runtime.md` (new file, 90 lines)
# Runtime

After configuring `RuntimeOption`, developers can create a `Runtime` to run model inference on different hardware with different backends.

## Python Class

```
class Runtime(runtime_option)
```

**Parameters**

> * **runtime_option**(fastdeploy.RuntimeOption): The configured RuntimeOption instance.

### Member functions

```
infer(data)
```

Run model inference on the input data.

**Parameters**

> * **data**(dict[str : np.ndarray]): Input data dict, where the key is the input name and the value is an np.ndarray.

**Return Value**

Returns a list whose length equals the number of outputs of the original model; each element in the list is an np.ndarray.

```
num_inputs()
```

Returns the number of inputs of the model.

```
num_outputs()
```

Returns the number of outputs of the model.
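A minimal end-to-end sketch of the Python interface above, based on the CPU demo later in these docs (the model paths, backend choice and input shape are placeholders):

```python
import fastdeploy as fd
import numpy as np

# Configure the runtime; paths and backend choice are illustrative
option = fd.RuntimeOption()
option.set_model_path("resnet50/inference.pdmodel", "resnet50/inference.pdiparams")
option.use_cpu()
option.use_ort_backend()

# Create the Runtime from the configured RuntimeOption
runtime = fd.Runtime(option)
print("inputs:", runtime.num_inputs(), "outputs:", runtime.num_outputs())

# infer() takes a dict mapping input names to np.ndarray and returns a list of np.ndarray
input_name = runtime.get_input_info(0).name
outputs = runtime.infer({input_name: np.random.rand(1, 3, 224, 224).astype("float32")})
```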
## C++ Class

```
class Runtime
```

### Member functions

```
bool Init(const RuntimeOption& runtime_option)
```

Initialize the runtime and load the model.

**Parameters**

> * **runtime_option**: The configured RuntimeOption instance

**Return Value**

Returns true if initialization succeeds, false otherwise.

```
bool Infer(vector<FDTensor>& inputs, vector<FDTensor>* outputs)
```

Run inference on the input tensors and write the results to `outputs`.

**Parameters**

> * **inputs**: Input data
> * **outputs**: Output data

**Return Value**

Returns true if inference succeeds, false otherwise.

```
int NumInputs()
```

Returns the number of inputs of the model.

```
int NumOutputs()
```

Returns the number of outputs of the model.
`docs/docs_en/api/runtime/runtime_option.md` (new file, 267 lines)
# RuntimeOption

`RuntimeOption` is used to configure the inference parameters of a model on different backends and hardware.

## Python Class

```
class RuntimeOption()
```

### Member functions

```
set_model_path(model_file, params_file="", model_format="paddle")
```

Set the paths of the model to load.

**Parameters**

> * **model_file**(str): Model file path
> * **params_file**(str): Parameter file path. This parameter is not required for the ONNX model format
> * **model_format**(str): Model format. Supported formats are paddle and onnx (paddle by default).

```
use_gpu(device_id=0)
```

Run inference on GPU.

**Parameters**

> * **device_id**(int): When there are multiple GPU cards in the environment, this parameter specifies the card used for inference. The default is 0.

```
use_cpu()
```

Run inference on CPU.

```
set_cpu_thread_num(thread_num=-1)
```

Set the number of CPU threads used for inference.

**Parameters**

> * **thread_num**(int): Number of threads, automatically allocated by the backend when the value is smaller than or equal to 0. The default is -1.
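A brief sketch of the model and device configuration calls above (the model paths and thread count are placeholders):

```python
import fastdeploy as fd

option = fd.RuntimeOption()
# The Paddle model paths below are placeholders
option.set_model_path("model/inference.pdmodel", "model/inference.pdiparams")

# Run on CPU with 8 inference threads ...
option.use_cpu()
option.set_cpu_thread_num(8)

# ... or switch to GPU card 0 instead
# option.use_gpu(0)
```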
```
use_paddle_backend()
```

Use the Paddle Inference backend for inference (CPU/GPU supported, Paddle model format supported).

```
use_ort_backend()
```

Use the ONNX Runtime backend for inference (CPU/GPU supported, Paddle and ONNX model formats supported).

```
use_trt_backend()
```

Use the TensorRT backend for inference (GPU supported, Paddle/ONNX model formats supported).

```
use_openvino_backend()
```

Use the OpenVINO backend for inference (CPU supported, Paddle/ONNX model formats supported).

```
enable_paddle_mkldnn()
disable_paddle_mkldnn()
```

When using the Paddle Inference backend, these interfaces turn MKLDNN inference acceleration on the CPU on or off. It is on by default.

```
enable_paddle_log_info()
disable_paddle_log_info()
```

When using the Paddle Inference backend, these interfaces turn the optimization log printed during model loading on or off. It is off by default.

```
set_paddle_mkldnn_cache_size(cache_size)
```

When using the Paddle Inference backend, this interface controls the shape cache size of MKLDNN acceleration.

**Parameters**

> * **cache_size**(int): Cache size
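For instance, a sketch of selecting the Paddle Inference backend and tuning its CPU options (the cache size value is illustrative):

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_cpu()
option.use_paddle_backend()

# MKLDNN acceleration is on by default; keep it on and bound its shape cache
option.enable_paddle_mkldnn()
option.set_paddle_mkldnn_cache_size(10)

# Keep the optimization log during model loading silent (it is off by default anyway)
option.disable_paddle_log_info()
```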
```
set_trt_input_shape(tensor_name, min_shape, opt_shape=None, max_shape=None)
```

When using the TensorRT backend, this interface sets the shape range of each model input. If only min_shape is set, opt_shape and max_shape are automatically set to match min_shape.

FastDeploy also updates the shape range automatically during inference according to the real data, but encountering a new shape range triggers a rebuild of the backend engine, which costs extra time. It is advisable to configure this interface in advance to avoid engine rebuilding during inference.

**Parameters**

> * **tensor_name**(str): Name of the tensor whose range is being set
> * **min_shape**(list of int): Minimum shape of the corresponding tensor, e.g. [1, 3, 224, 224]
> * **opt_shape**(list of int): Most common shape of the corresponding tensor, e.g. [2, 3, 224, 224]. When it is None, it remains the same as min_shape. The default is None.
> * **max_shape**(list of int): Maximum shape of the corresponding tensor, e.g. [8, 3, 224, 224]. When it is None, it remains the same as min_shape. The default is None.

```
set_trt_cache_file(cache_file_path)
```

When using the TensorRT backend, developers can use this interface to cache the built TensorRT engine to the designated path, or to skip the engine building step and load a locally cached TensorRT engine directly.

- When this interface is called and `cache_file_path` does not exist, FastDeploy builds the TensorRT engine and saves the built engine to `cache_file_path`.
- When this interface is called and `cache_file_path` exists, FastDeploy loads the built TensorRT engine stored in `cache_file_path` directly, greatly reducing model load initialization time.

This interface lets developers speed up model loading on later runs. However, if the loading configuration changes, for example the max_workspace_size of TensorRT or the ranges set by `set_trt_input_shape`, or if the original Paddle or ONNX model is replaced, delete the locally cached `cache_file_path` file first to avoid loading a stale cache, which could break the program.

**Parameters**

> * **cache_file_path**(str): Cache file path, e.g. `/Downloads/resnet50.trt`

```
enable_trt_fp16()
disable_trt_fp16()
```

When using the TensorRT backend, these interfaces turn half-precision inference acceleration on or off; enabling it brings a significant performance boost on supported hardware. Half-precision inference is not supported on all GPUs; on GPUs without such support, inference falls back to FP32 and prints `Detected FP16 is not supported in the current GPU, will use FP32 instead.`
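Putting the TensorRT options above together, a sketch of a typical configuration (the tensor name `x`, the shapes and the cache path are placeholders):

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu(0)
option.use_trt_backend()

# Declare the expected shape range up front to avoid engine rebuilds at runtime
option.set_trt_input_shape("x", [1, 3, 224, 224], [4, 3, 224, 224], [8, 3, 224, 224])

# Cache the built engine so later runs skip the engine building step
option.set_trt_cache_file("/Downloads/resnet50.trt")

# Enable FP16 where the GPU supports it (falls back to FP32 otherwise)
option.enable_trt_fp16()
```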
## C++ Struct

```
struct RuntimeOption
```

### Member functions

```
void SetModelPath(const string& model_file, const string& params_file = "", const string& model_format = "paddle")
```

Set the paths of the model to load.

**Parameters**

> * **model_file**: Model file path
> * **params_file**: Parameter file path. This parameter can be left as "" for the ONNX model format
> * **model_format**: Model format. Supported formats are paddle and onnx (paddle by default).

```
void UseGpu(int device_id = 0)
```

Run inference on GPU.

**Parameters**

> * **device_id**: When there are multiple GPU cards in the environment, this parameter specifies the card used for inference. The default is 0.

```
void UseCpu()
```

Run inference on CPU.

```
void SetCpuThreadNum(int thread_num=-1)
```

Set the number of CPU threads used for inference.

**Parameters**

> * **thread_num**: Number of threads, automatically allocated by the backend when the value is smaller than or equal to 0. The default is -1.

```
void UsePaddleBackend()
```

Use the Paddle Inference backend for inference (CPU/GPU supported, Paddle model format supported).

```
void UseOrtBackend()
```

Use the ONNX Runtime backend for inference (CPU/GPU supported, Paddle and ONNX model formats supported).

```
void UseTrtBackend()
```

Use the TensorRT backend for inference (GPU supported, Paddle/ONNX model formats supported).

```
void UseOpenVINOBackend()
```

Use the OpenVINO backend for inference (CPU supported, Paddle/ONNX model formats supported).

```
void EnablePaddleMKLDNN()
void DisablePaddleMKLDNN()
```

When using the Paddle Inference backend, these interfaces turn MKLDNN inference acceleration on the CPU on or off. It is on by default.

```
void EnablePaddleLogInfo()
void DisablePaddleLogInfo()
```

When using the Paddle Inference backend, these interfaces turn the optimization log printed during model loading on or off. It is off by default.

```
void SetPaddleMKLDNNCacheSize(int cache_size)
```

When using the Paddle Inference backend, this interface controls the shape cache size of MKLDNN acceleration.

**Parameters**

> * **cache_size**: Cache size

```
void SetTrtInputShape(const string& tensor_name, const vector<int32_t>& min_shape,
                      const vector<int32_t>& opt_shape = vector<int32_t>(),
                      const vector<int32_t>& max_shape = vector<int32_t>())
```

When using the TensorRT backend, this interface sets the shape range of each model input. If only min_shape is set, opt_shape and max_shape are automatically set to match min_shape.

FastDeploy also updates the shape range automatically during inference according to the real data, but encountering a new shape range triggers a rebuild of the backend engine, which costs extra time. It is advisable to configure this interface in advance to avoid engine rebuilding during inference.

**Parameters**

> - **tensor_name**(str): Name of the tensor whose range is being set
> - **min_shape**(vector of int): Minimum shape of the corresponding tensor, e.g. [1, 3, 224, 224]
> - **opt_shape**(vector of int): Most common shape of the corresponding tensor, e.g. [2, 3, 224, 224]. When it is an empty vector, it remains the same as min_shape. The default is an empty vector.
> - **max_shape**(vector of int): Maximum shape of the corresponding tensor, e.g. [8, 3, 224, 224]. When it is an empty vector, it remains the same as min_shape. The default is an empty vector.

```
void SetTrtCacheFile(const string& cache_file_path)
```

When using the TensorRT backend, developers can use this interface to cache the built TensorRT engine to the designated path, or to skip the engine building step and load a locally cached TensorRT engine directly.

- When this interface is called and `cache_file_path` does not exist, FastDeploy builds the TensorRT engine and saves the built engine to `cache_file_path`.
- When this interface is called and `cache_file_path` exists, FastDeploy loads the built TensorRT engine stored in `cache_file_path` directly, greatly reducing model load initialization time.

This interface lets developers speed up model loading on later runs. However, if the loading configuration changes, for example the max_workspace_size of TensorRT or the ranges set by `SetTrtInputShape`, or if the original Paddle or ONNX model is replaced, delete the locally cached `cache_file_path` file first to avoid loading a stale cache, which could break the program.

**Parameters**

> * **cache_file_path**: Cache file path, such as `/Downloads/resnet50.trt`

```
void EnableTrtFp16()
void DisableTrtFp16()
```

When using the TensorRT backend, these interfaces turn half-precision inference acceleration on or off; enabling it brings a significant performance boost on supported hardware. Half-precision inference is not supported on all GPUs; on GPUs without such support, inference falls back to FP32 and prints `Detected FP16 is not supported in the current GPU, will use FP32 instead.`
`docs/docs_en/api/runtime_option.md` (new file, 138 lines)
# RuntimeOption Inference Backend Configuration

The Runtime module in FastDeploy contains multiple inference backends:

| Model Format \ Inference Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
|:------------------------------ |:------------------------------ |:------------------------------------------ |:------------------------------ |:-------- |
| Paddle | Support (built-in Paddle2ONNX) | Support | Support (built-in Paddle2ONNX) | Support |
| ONNX | Support | Support (requires conversion via X2Paddle) | Support | Support |

The hardware supported by Runtime is as follows:

| Hardware \ Inference Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
|:-------------------------- |:----------- |:---------------- |:----------- |:-------- |
| CPU | Support | Support | Not Support | Support |
| GPU | Support | Support | Support | Support |

Each model configures its inference backend and parameters through `RuntimeOption`. For example, in Python the inference configuration can be printed after loading a model with the following code:

```python
model = fastdeploy.vision.detection.YOLOv5("yolov5s.onnx")
print(model.runtime_option)
```

The output looks like this:

```python
RuntimeOption(
backend : Backend.ORT                # Inference backend: ONNXRuntime
cpu_thread_num : 8                   # Number of CPU threads (valid only when using CPU)
device : Device.CPU                  # Inference hardware is CPU
device_id : 0                        # Inference hardware id (for GPU)
model_file : yolov5s.onnx            # Path to the model file
params_file :                        # Parameter file path
model_format : Frontend.ONNX         # Model format
ort_execution_mode : -1              # The prefix ort indicates ONNXRuntime backend parameters
ort_graph_opt_level : -1
ort_inter_op_num_threads : -1
trt_enable_fp16 : False              # The prefix trt indicates TensorRT backend parameters
trt_enable_int8 : False
trt_max_workspace_size : 1073741824
trt_serialize_file :
trt_fixed_shape : {}
trt_min_shape : {}
trt_opt_shape : {}
trt_max_shape : {}
trt_max_batch_size : 32
)
```

## Python

### RuntimeOption Class

`fastdeploy.RuntimeOption()` configuration options

#### Configuration options

> * **backend**(fd.Backend): `fd.Backend.ORT`/`fd.Backend.TRT`/`fd.Backend.PDINFER`/`fd.Backend.OPENVINO`
> * **cpu_thread_num**(int): Number of CPU inference threads, valid only for CPU inference
> * **device**(fd.Device): `fd.Device.CPU`/`fd.Device.GPU`
> * **device_id**(int): Device id, used for GPU
> * **model_file**(str): Model file path
> * **params_file**(str): Parameter file path
> * **model_format**(Frontend): Model format, `fd.Frontend.PADDLE`/`fd.Frontend.ONNX`
> * **ort_execution_mode**(int): ORT backend execution mode, 0 for sequential execution of all operators, 1 for parallel execution of operators; the default is -1, i.e. run with the ORT default configuration
> * **ort_graph_opt_level**(int): ORT backend graph optimization level; 0: disable graph optimization; 1: basic optimization; 2: additional extended optimization; 99: all optimizations; the default is -1, i.e. run with the ORT default configuration
> * **ort_inter_op_num_threads**(int): When `ort_execution_mode` is 1, this parameter sets the number of threads used for parallelism between operators
> * **trt_enable_fp16**(bool): Whether TensorRT enables FP16 inference
> * **trt_enable_int8**(bool): Whether TensorRT enables INT8 inference
> * **trt_max_workspace_size**(int): The `max_workspace_size` parameter passed to TensorRT
> * **trt_fixed_shape**(dict[str : list[int]]): When the model has a dynamic shape but the input shape stays constant during actual inference, this parameter configures the fixed shape of the input
> * **trt_min_shape**(dict[str : list[int]]): When the model has a dynamic shape and the input shape changes during actual inference, this parameter configures the minimum shape of the input
> * **trt_opt_shape**(dict[str : list[int]]): When the model has a dynamic shape and the input shape changes during actual inference, this parameter configures the optimal shape of the input
> * **trt_max_shape**(dict[str : list[int]]): When the model has a dynamic shape and the input shape changes during actual inference, this parameter configures the maximum shape of the input
> * **trt_max_batch_size**(int): Maximum batch size for TensorRT inference

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.backend = fd.Backend.TRT
# When using the TRT backend with a dynamic input shape,
# configure the input shape information
option.trt_min_shape = {"x": [1, 3, 224, 224]}
option.trt_opt_shape = {"x": [4, 3, 224, 224]}
option.trt_max_shape = {"x": [8, 3, 224, 224]}

model = fd.vision.classification.PaddleClasModel(
    "resnet50/inference.pdmodel",
    "resnet50/inference.pdiparams",
    "resnet50/inference_cls.yaml",
    runtime_option=option)
```
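Similarly, a sketch of configuring the ONNX Runtime backend through the attributes listed above (the thread counts are illustrative):

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.backend = fd.Backend.ORT
option.device = fd.Device.CPU
option.cpu_thread_num = 8

# Run operators in parallel and give the inter-operator pool 4 threads
option.ort_execution_mode = 1
option.ort_inter_op_num_threads = 4
```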
## C++

### RuntimeOption Struct

`fastdeploy::RuntimeOption` configuration options

#### Configuration options

> * **backend**(fastdeploy::Backend): `Backend::ORT`/`Backend::TRT`/`Backend::PDINFER`/`Backend::OPENVINO`
> * **cpu_thread_num**(int): Number of CPU inference threads, valid only for CPU inference
> * **device**(fastdeploy::Device): `Device::CPU`/`Device::GPU`
> * **device_id**(int): Device id, used for GPU
> * **model_file**(string): Model file path
> * **params_file**(string): Parameter file path
> * **model_format**(fastdeploy::Frontend): Model format, `Frontend::PADDLE`/`Frontend::ONNX`
> * **ort_execution_mode**(int): ORT backend execution mode, 0 for sequential execution of all operators, 1 for parallel execution of operators; the default is -1, i.e. run with the ORT default configuration
> * **ort_graph_opt_level**(int): ORT backend graph optimization level; 0: disable graph optimization; 1: basic optimization; 2: additional extended optimization; 99: all optimizations; the default is -1, i.e. run with the ORT default configuration
> * **ort_inter_op_num_threads**(int): When `ort_execution_mode` is 1, this parameter sets the number of threads used for parallelism between operators
> * **trt_enable_fp16**(bool): Whether TensorRT enables FP16 inference
> * **trt_enable_int8**(bool): Whether TensorRT enables INT8 inference
> * **trt_max_workspace_size**(int): The `max_workspace_size` parameter passed to TensorRT
> * **trt_fixed_shape**(map<string, vector<int>>): When the model has a dynamic shape but the input shape stays constant during actual inference, this parameter configures the fixed shape of the input
> * **trt_min_shape**(map<string, vector<int>>): When the model has a dynamic shape and the input shape changes during actual inference, this parameter configures the minimum shape of the input
> * **trt_opt_shape**(map<string, vector<int>>): When the model has a dynamic shape and the input shape changes during actual inference, this parameter configures the optimal shape of the input
> * **trt_max_shape**(map<string, vector<int>>): When the model has a dynamic shape and the input shape changes during actual inference, this parameter configures the maximum shape of the input
> * **trt_max_batch_size**(int): Maximum batch size for TensorRT inference

```c++
#include "fastdeploy/vision.h"

int main() {
  auto option = fastdeploy::RuntimeOption();
  option.trt_min_shape["x"] = {1, 3, 224, 224};
  option.trt_opt_shape["x"] = {4, 3, 224, 224};
  option.trt_max_shape["x"] = {8, 3, 224, 224};

  auto model = fastdeploy::vision::classification::PaddleClasModel(
      "resnet50/inference.pdmodel",
      "resnet50/inference.pdiparams",
      "resnet50/inference_cls.yaml",
      option);
  return 0;
}
```
`docs/docs_en/api/text_results/README.md` (new file, 7 lines)
# Natural Language Processing Inference Results

FastDeploy defines different structs to represent model inference results according to the task type of the natural language processing model. The details are shown below.

| Struct | Doc | Description | Related Model |
|:--------- |:----------------------------- |:----------------- |:------------- |
| UIEResult | [C++/Python](./uie_result.md) | UIE model results | UIE Model |
`docs/docs_en/api/text_results/uie_result.md` (new file, 34 lines)
# UIEResult Extraction Results

The UIEResult struct is defined in `fastdeploy/text/uie/model.h`, and represents the UIE model extraction results and confidence levels.

## C++ Definition

`fastdeploy::text::UIEResult`

```c++
struct UIEResult {
  size_t start_;
  size_t end_;
  double probability_;
  std::string text_;
  std::unordered_map<std::string, std::vector<UIEResult>> relation_;
  std::string Str() const;
};
```

- **start_**: Member variable that indicates the starting position of the extracted text_ in the original text (in Unicode code points).
- **end_**: Member variable that indicates the ending position of the extracted text_ in the original text (in Unicode code points).
- **text_**: Member variable that holds the extraction result, stored in UTF-8 format.
- **relation_**: Member variable that holds the relations of the current result. It is commonly used for relation extraction.
- **Str()**: Member function that outputs the information in the struct as a string (for debugging)

## Python Definition

`fastdeploy.text.C.UIEResult`

- **start_**(int): Member variable that indicates the starting position of the extracted text_ in the original text (in Unicode code points).
- **end_**(int): Member variable that indicates the ending position of the extracted text_ in the original text (in Unicode code points).
- **text_**(str): Member variable that holds the extraction result, stored in UTF-8 format.
- **relation_**(dict(str, list(fastdeploy.text.C.UIEResult))): Member variable that holds the relations of the current result. It is commonly used for relation extraction.
- **get_dict()**: Returns the fastdeploy.text.C.UIEResult in dict format.
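As an illustration only, a sketch of reading these fields in Python; it assumes `results` is a dict mapping schema keys to lists of `UIEResult` as produced by a UIE predictor, whose API is not documented here:

```python
# results: dict(str, list(fastdeploy.text.C.UIEResult)), e.g. the output of a UIE predictor
for key, items in results.items():
    for item in items:
        # start_/end_ locate text_ inside the original input text
        print(key, item.text_, item.start_, item.end_)
        # get_dict() converts the result, including nested relation_, to a plain dict
        print(item.get_dict())
```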
`docs/docs_en/api/vision_results/matting_result.md` (new file, 34 lines)
# Matting Results

The MattingResult struct is defined in `csrcs/fastdeploy/vision/common/result.h`, and represents the alpha transparency values predicted by the model and the predicted foreground.

## C++ Definition

`fastdeploy::vision::MattingResult`

```c++
struct MattingResult {
  std::vector<float> alpha;
  std::vector<float> foreground;
  std::vector<int64_t> shape;
  bool contain_foreground = false;
  void Clear();
  std::string Str();
};
```

- **alpha**: A one-dimensional vector of predicted alpha transparency values in the range [0., 1.], with length h x w, where h and w are the height and width of the input image
- **foreground**: A one-dimensional vector for the predicted foreground, with values in the range [0., 255.] and length h x w x c, where h and w are the height and width of the input image and c is generally 3. The foreground is not always available; it is only valid if the model actually predicts a foreground
- **contain_foreground**: Indicates whether the predicted result contains a foreground
- **shape**: Indicates the shape of the result. When contain_foreground is false, the shape contains only (h, w); when contain_foreground is true, the shape contains (h, w, c), with c generally being 3
- **Clear()**: Member function that clears the results stored in the struct.
- **Str()**: Member function that outputs the information in the struct as a string (for debugging)

## Python Definition

`fastdeploy.vision.MattingResult`

- **alpha**(list of float): A one-dimensional vector of predicted alpha transparency values in the range [0., 1.], with length h x w, where h and w are the height and width of the input image.
- **foreground**(list of float): A one-dimensional vector for the predicted foreground, with values in the range [0., 255.] and length h x w x c, where h and w are the height and width of the input image and c is generally 3. The foreground is not always available; it is only valid if the model actually predicts a foreground.
- **contain_foreground**(bool): Indicates whether the predicted result contains a foreground
- **shape**(list of int): Indicates the shape of the result. When contain_foreground is false, the shape contains only (h, w); when contain_foreground is true, the shape contains (h, w, c), with c generally being 3
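A small sketch of turning the flat lists above back into image-shaped arrays; it assumes `result` is a MattingResult returned by a matting model's predict call:

```python
import numpy as np

# result: fastdeploy.vision.MattingResult
h, w = result.shape[0], result.shape[1]
alpha = np.array(result.alpha, dtype="float32").reshape(h, w)  # values in [0., 1.]

if result.contain_foreground:
    c = result.shape[2]  # generally 3
    foreground = np.array(result.foreground, dtype="float32").reshape(h, w, c)  # values in [0., 255.]
```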
`docs/docs_en/api/vision_results/ocr_result.md` (new file, 42 lines)
# OCR Results

The OCRResult struct is defined in `fastdeploy/vision/common/result.h`, and represents the text boxes detected in the image, the text box direction classification, and the text content inside each text box.

## C++ Definition

```c++
fastdeploy::vision::OCRResult
```

```c++
struct OCRResult {
  std::vector<std::array<int, 8>> boxes;
  std::vector<std::string> text;
  std::vector<float> rec_scores;
  std::vector<float> cls_scores;
  std::vector<int32_t> cls_labels;
  ResultType type = ResultType::OCR;
  void Clear();
  std::string Str();
};
```

- **boxes**: Member variable that holds the coordinates of all text boxes detected in a single image. `boxes.size()` is the number of boxes detected in the image; each box's 4 corner points are represented by 8 int values in the order lower left, lower right, upper right, upper left.
- **text**: Member variable that holds the recognized text content of the text boxes, with the same number of elements as `boxes.size()`.
- **rec_scores**: Member variable that holds the confidence of the text recognized in each text box, with the same number of elements as `boxes.size()`.
- **cls_scores**: Member variable that holds the confidence of the classification result of each text box, with the same number of elements as `boxes.size()`.
- **cls_labels**: Member variable that holds the direction class of each text box, with the same number of elements as `boxes.size()`.
- **Clear()**: Member function that clears the results stored in the struct.
- **Str()**: Member function that outputs the information in the struct as a string (for debugging)

## Python Definition

```python
fastdeploy.vision.OCRResult
```

- **boxes**: Member variable that holds the coordinates of all text boxes detected in a single image; each box's 4 corner points are represented by 8 int values in the order lower left, lower right, upper right, upper left.
- **text**: Member variable that holds the recognized text content of the text boxes, with the same number of elements as `boxes`.
- **rec_scores**: Member variable that holds the confidence of the text recognized in each text box, with the same number of elements as `boxes`.
- **cls_scores**: Member variable that holds the confidence of the classification result of each text box, with the same number of elements as `boxes`.
- **cls_labels**: Member variable that holds the direction class of each text box, with the same number of elements as `boxes`.
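A brief sketch of walking through an OCRResult in Python; it assumes `result` is an OCRResult returned by an OCR pipeline's predict call:

```python
# result: fastdeploy.vision.OCRResult
for box, txt, rec_score in zip(result.boxes, result.text, result.rec_scores):
    # box holds 8 ints: lower left, lower right, upper right, upper left corners
    print("box:", box, "text:", txt, "confidence:", rec_score)

# Direction classification labels and confidences, one entry per box
print(result.cls_labels, result.cls_scores)
```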
`docs/docs_en/api/vision_results/segmentation_result.md` (new file, 32 lines)
# Segmentation Results

The SegmentationResult struct is defined in `csrcs/fastdeploy/vision/common/result.h`, and represents the predicted segmentation class of each pixel in the image and the probability of that class.

## C++ Definition

`fastdeploy::vision::SegmentationResult`

```c++
struct SegmentationResult {
  std::vector<uint8_t> label_map;
  std::vector<float> score_map;
  std::vector<int64_t> shape;
  bool contain_score_map = false;
  void Clear();
  std::string Str();
};
```

- **label_map**: Member variable that holds the segmentation class of each pixel of a single image; `label_map.size()` is the number of pixels in the image
- **score_map**: Member variable that holds the predicted probability of the segmentation class corresponding to label_map (model exported with `without_argmax`), or the probability normalised by softmax (model exported with `without_argmax` and `with_softmax`, or exported with `without_argmax` while setting the model [class member attribute](../../../examples/vision/segmentation/paddleseg/cpp/) `with_softmax=True` during initialization).
- **shape**: Member variable that indicates the shape of the output, e.g. (h, w)
- **Clear()**: Member function that clears the results stored in the struct.
- **Str()**: Member function that outputs the information in the struct as a string (for debugging)

## Python Definition

`fastdeploy.vision.SegmentationResult`

- **label_map**(list of int): Member variable that holds the segmentation class of each pixel of a single image
- **score_map**(list of float): Member variable that holds the predicted probability of the segmentation class corresponding to label_map (model exported with `without_argmax`), or the probability normalised by softmax (model exported with `without_argmax` and `with_softmax`, or exported with `without_argmax` while setting the model [class member attribute](../../../examples/vision/segmentation/paddleseg/cpp/) `with_softmax=True` during initialization).
- **shape**(list of int): Member variable that indicates the shape of the output, e.g. (h, w)
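A short sketch of reshaping the flat per-pixel lists above; it assumes `result` is a SegmentationResult returned by a segmentation model's predict call:

```python
import numpy as np

# result: fastdeploy.vision.SegmentationResult
h, w = result.shape[0], result.shape[1]
label_map = np.array(result.label_map, dtype="uint8").reshape(h, w)  # class id per pixel

if result.score_map:
    # probability of the predicted class per pixel (model exported with without_argmax)
    score_map = np.array(result.score_map, dtype="float32").reshape(h, w)
```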
`docs/docs_en/runtime/README.md` (new file, 18 lines)
# FastDeploy Inference Backends

FastDeploy currently integrates a wide range of inference backends. The table below summarises the integrated backends, including the platforms, hardware and model formats they support.

| Inference Backend | Platform | Hardware | Supported Model Format |
|:----------------- |:------------------------------- |:-------- |:---------------------- |
| Paddle Inference | Windows(x64)/Linux(x64) | GPU/CPU | Paddle |
| ONNX Runtime | Windows(x64)/Linux(x64/aarch64) | GPU/CPU | Paddle/ONNX |
| TensorRT | Windows(x64)/Linux(x64/jetson) | GPU | Paddle/ONNX |
| OpenVINO | Windows(x64)/Linux(x64) | CPU | Paddle/ONNX |
| Poros (upcoming) | Linux(x64) | CPU/GPU | TorchScript |

Backends in FastDeploy are independent, and developers can choose to enable one or more of them for customized compilation.
The `Runtime` module in FastDeploy provides a unified API for all backends. See the [FastDeploy Runtime User Guide](usage.md) for more details.

## Related Files

- [FastDeploy Compile](../compile)
`docs/docs_en/runtime/how_to_change_inference_backend.md` (new file, 47 lines)
# How to change the inference backend

Vision models in FastDeploy support a wide range of backends, including

- OpenVINO (supports models in Paddle/ONNX formats, inference on CPU only)
- ONNX Runtime (supports models in Paddle/ONNX formats, inference on CPU or GPU)
- TensorRT (supports models in Paddle/ONNX formats, inference on GPU only)
- Paddle Inference (supports models in Paddle format, inference on CPU or GPU)

All models switch their inference backend through RuntimeOption.

**Python**

```
import fastdeploy as fd
option = fd.RuntimeOption()

# Choose inference on CPU or GPU
option.use_cpu()
option.use_gpu()

# Choose the backend
option.use_paddle_backend() # Paddle Inference
option.use_trt_backend() # TensorRT
option.use_openvino_backend() # OpenVINO
option.use_ort_backend() # ONNX Runtime
```

**C++**

```
fastdeploy::RuntimeOption option;

// Choose inference on CPU or GPU
option.UseCpu();
option.UseGpu();

// Choose the backend
option.UsePaddleBackend(); // Paddle Inference
option.UseTrtBackend(); // TensorRT
option.UseOpenVINOBackend(); // OpenVINO
option.UseOrtBackend(); // ONNX Runtime
```

Please refer to `FastDeploy/examples/vision` for Python or C++ inference code of different models.

For more usage of `RuntimeOption`, please refer to the [RuntimeOption API](../../docs/api/runtime/runtime_option.md)
`docs/docs_en/runtime/usage.md` (new file, 45 lines)
# FastDeploy Runtime User Guide

`Runtime` is the module for model inference in FastDeploy and currently integrates a variety of backends. It lets users quickly run inference for different model formats on different hardware, platforms and backends through a unified API. The demos below show inference on different hardware and backends.

## CPU Inference

Python demo

```python
import fastdeploy as fd
import numpy as np
option = fd.RuntimeOption()
# Set the model path
option.set_model_path("resnet50/inference.pdmodel", "resnet50/inference.pdiparams")
# Use the OpenVINO backend
option.use_openvino_backend()
# Initialize the runtime
runtime = fd.Runtime(option)
# Get input info
input_name = runtime.get_input_info(0).name
# Construct random data and run inference
results = runtime.infer({input_name: np.random.rand(1, 3, 224, 224).astype("float32")})
```

## GPU Inference

```python
import fastdeploy as fd
import numpy as np
option = fd.RuntimeOption()
# Set the model path
option.set_model_path("resnet50/inference.pdmodel", "resnet50/inference.pdiparams")
# Use the GPU (GPU card 0)
option.use_gpu(0)
# Use the Paddle Inference backend
option.use_paddle_backend()
# Initialize the runtime
runtime = fd.Runtime(option)
# Get input info
input_name = runtime.get_input_info(0).name
# Construct random data and run inference
results = runtime.infer({input_name: np.random.rand(1, 3, 224, 224).astype("float32")})
```
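For GPU inference with the TensorRT backend, a sketch that combines the same runtime flow with the TensorRT options documented in [RuntimeOption](../api/runtime/runtime_option.md); the input tensor name `x`, the shape range and the cache path are placeholders:

```python
import fastdeploy as fd
import numpy as np
option = fd.RuntimeOption()
option.set_model_path("resnet50/inference.pdmodel", "resnet50/inference.pdiparams")
option.use_gpu(0)
# Use the TensorRT backend
option.use_trt_backend()
# Declare the expected shape range up front to avoid engine rebuilds at runtime
option.set_trt_input_shape("x", [1, 3, 224, 224], [4, 3, 224, 224], [8, 3, 224, 224])
# Cache the built engine so later runs skip the engine building step
option.set_trt_cache_file("resnet50.trt")
runtime = fd.Runtime(option)
input_name = runtime.get_input_info(0).name
results = runtime.infer({input_name: np.random.rand(1, 3, 224, 224).astype("float32")})
```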
For more Python/C++ inference demos, please refer to [FastDeploy/examples/runtime](../../../examples/runtime)