From 77f189fbff80112a1a944019b746747caed5bd00 Mon Sep 17 00:00:00 2001
From: leiqing <54695910+leiqing1@users.noreply.github.com>
Date: Thu, 22 Sep 2022 00:02:52 +0800
Subject: [PATCH] Create runtime_option.md

---
 docs/docs_en/api/runtime_option.md | 138 +++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 docs/docs_en/api/runtime_option.md

diff --git a/docs/docs_en/api/runtime_option.md b/docs/docs_en/api/runtime_option.md
new file mode 100644
index 000000000..3f690b977
--- /dev/null
+++ b/docs/docs_en/api/runtime_option.md
@@ -0,0 +1,138 @@
# RuntimeOption Inference Backend Deployment

The Runtime in FastDeploy integrates multiple inference backends:

| Model Format\Inference Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
|:------------------------------ |:-------------------------------- |:--------------------------------------------- |:-------------------------------- |:--------- |
| Paddle | Supported (built-in Paddle2ONNX) | Supported | Supported (built-in Paddle2ONNX) | Supported |
| ONNX | Supported | Supported (requires conversion via X2Paddle) | Supported | Supported |

The hardware supported by the Runtime is as follows:

| Hardware/Inference Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
|:-------------------------- |:----------- |:---------------- |:------------- |:--------- |
| CPU | Supported | Supported | Not Supported | Supported |
| GPU | Supported | Supported | Supported | Supported |

Each model uses `RuntimeOption` to configure the inference backend and its parameters. In Python, for example, the inference configuration can be printed after loading a model with the following code:

```python
model = fastdeploy.vision.detection.YOLOv5("yolov5s.onnx")
print(model.runtime_option)
```

The output looks like this:

```python
RuntimeOption(
  backend : Backend.ORT              # Inference backend is ONNXRuntime
  cpu_thread_num : 8                 # Number of CPU threads (valid only when inferring on CPU)
  device : Device.CPU                # Inference device is CPU
  device_id : 0                      # Inference device id (used for GPU)
  model_file : yolov5s.onnx          # Path to the model file
  params_file :                      # Path to the parameter file
  model_format : Frontend.ONNX       # Model format
  ort_execution_mode : -1            # Parameters prefixed with ort are ONNXRuntime backend parameters
  ort_graph_opt_level : -1
  ort_inter_op_num_threads : -1
  trt_enable_fp16 : False            # Parameters prefixed with trt are TensorRT backend parameters
  trt_enable_int8 : False
  trt_max_workspace_size : 1073741824
  trt_serialize_file :
  trt_fixed_shape : {}
  trt_min_shape : {}
  trt_opt_shape : {}
  trt_max_shape : {}
  trt_max_batch_size : 32
)
```
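
The fields printed above are writable attributes of `RuntimeOption`, so the defaults can be changed before a model is loaded. Below is a minimal sketch that moves the same YOLOv5 model from the default CPU/ONNXRuntime setup to GPU; it assumes `YOLOv5` accepts a `runtime_option` argument in the same way as the `PaddleClasModel` example later in this document:

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.device = fd.Device.GPU    # move inference from the default CPU to GPU
option.device_id = 0             # GPU card id, as listed in the printed configuration
option.backend = fd.Backend.ORT  # keep ONNXRuntime, which also supports GPU per the table above

# Assumption: YOLOv5 takes the configuration via the runtime_option keyword,
# mirroring the PaddleClasModel example in the Python section below
model = fd.vision.detection.YOLOv5("yolov5s.onnx", runtime_option=option)
print(model.runtime_option)      # verify that device and device_id changed
```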

## Python

### RuntimeOption Class

`fastdeploy.RuntimeOption()` configuration

#### Configuration options

> * **backend**(fd.Backend): `fd.Backend.ORT`/`fd.Backend.TRT`/`fd.Backend.PDINFER`/`fd.Backend.OPENVINO`
> * **cpu_thread_num**(int): Number of CPU inference threads, valid only for CPU inference
> * **device**(fd.Device): `fd.Device.CPU`/`fd.Device.GPU`
> * **device_id**(int): Device id, used for GPU inference
> * **model_file**(str): Model file path
> * **params_file**(str): Parameter file path
> * **model_format**(Frontend): Model format, `fd.Frontend.PADDLE`/`fd.Frontend.ONNX`
> * **ort_execution_mode**(int): ORT backend execution mode; 0 executes all operators sequentially, 1 executes operators in parallel; the default is -1, i.e. the ORT default configuration is used
> * **ort_graph_opt_level**(int): ORT backend graph optimisation level; 0: disable graph optimisation; 1: basic optimisation; 2: additional extended optimisation; 99: all optimisations; the default is -1, i.e. the ORT default configuration is used
> * **ort_inter_op_num_threads**(int): When `ort_execution_mode` is 1, this parameter sets the number of threads used to run operators in parallel
> * **trt_enable_fp16**(bool): Enables FP16 inference in TensorRT
> * **trt_enable_int8**(bool): Enables INT8 inference in TensorRT
> * **trt_max_workspace_size**(int): `max_workspace_size` parameter passed to TensorRT
> * **trt_fixed_shape**(dict[str : list[int]]): When the model has dynamic shapes but the input shape stays constant during actual inference, this parameter configures the fixed input shape
> * **trt_min_shape**(dict[str : list[int]]): When the model has dynamic shapes and the input shape changes during actual inference, this parameter configures the minimum input shape
> * **trt_opt_shape**(dict[str : list[int]]): When the model has dynamic shapes and the input shape changes during actual inference, this parameter configures the optimal input shape
> * **trt_max_shape**(dict[str : list[int]]): When the model has dynamic shapes and the input shape changes during actual inference, this parameter configures the maximum input shape
> * **trt_max_batch_size**(int): Maximum batch size for TensorRT inference

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.backend = fd.Backend.TRT
# The TensorRT backend runs on GPU
option.device = fd.Device.GPU
# When using the TRT backend with a dynamic input shape,
# configure the input shape information
option.trt_min_shape = {"x": [1, 3, 224, 224]}
option.trt_opt_shape = {"x": [4, 3, 224, 224]}
option.trt_max_shape = {"x": [8, 3, 224, 224]}

model = fd.vision.classification.PaddleClasModel(
    "resnet50/inference.pdmodel",
    "resnet50/inference.pdiparams",
    "resnet50/inference_cls.yaml",
    runtime_option=option)
```
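
On top of the dynamic-shape settings above, the other `trt_*` options listed earlier can be combined to tune the TensorRT backend. The following is a minimal sketch using only the attributes documented above; whether FP16 actually speeds things up depends on the GPU:

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.backend = fd.Backend.TRT
option.device = fd.Device.GPU

# Dynamic-shape configuration, as in the previous example
option.trt_min_shape = {"x": [1, 3, 224, 224]}
option.trt_opt_shape = {"x": [4, 3, 224, 224]}
option.trt_max_shape = {"x": [8, 3, 224, 224]}

# Additional TensorRT tuning knobs documented above
option.trt_enable_fp16 = True            # run the engine in FP16 where the GPU supports it
option.trt_max_workspace_size = 1 << 31  # allow a 2 GB max_workspace_size when building the engine
option.trt_max_batch_size = 8            # largest batch size the engine needs to handle

model = fd.vision.classification.PaddleClasModel(
    "resnet50/inference.pdmodel",
    "resnet50/inference.pdiparams",
    "resnet50/inference_cls.yaml",
    runtime_option=option)
```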

## C++

### RuntimeOption Struct

`fastdeploy::RuntimeOption()` configuration options

#### Configuration options

> * **backend**(fastdeploy::Backend): `Backend::ORT`/`Backend::TRT`/`Backend::PDINFER`/`Backend::OPENVINO`
> * **cpu_thread_num**(int): Number of CPU inference threads, valid only for CPU inference
> * **device**(fastdeploy::Device): `Device::CPU`/`Device::GPU`
> * **device_id**(int): Device id, used for GPU inference
> * **model_file**(string): Model file path
> * **params_file**(string): Parameter file path
> * **model_format**(fastdeploy::Frontend): Model format, `Frontend::PADDLE`/`Frontend::ONNX`
> * **ort_execution_mode**(int): ORT backend execution mode; 0 executes all operators sequentially, 1 executes operators in parallel; the default is -1, i.e. the ORT default configuration is used
> * **ort_graph_opt_level**(int): ORT backend graph optimisation level; 0: disable graph optimisation; 1: basic optimisation; 2: additional extended optimisation; 99: all optimisations; the default is -1, i.e. the ORT default configuration is used
> * **ort_inter_op_num_threads**(int): When `ort_execution_mode` is 1, this parameter sets the number of threads used to run operators in parallel
> * **trt_enable_fp16**(bool): Enables FP16 inference in TensorRT
> * **trt_enable_int8**(bool): Enables INT8 inference in TensorRT
> * **trt_max_workspace_size**(int): `max_workspace_size` parameter passed to TensorRT
> * **trt_fixed_shape**(`map<string, vector<int>>`): When the model has dynamic shapes but the input shape stays constant during actual inference, this parameter configures the fixed input shape
> * **trt_min_shape**(`map<string, vector<int>>`): When the model has dynamic shapes and the input shape changes during actual inference, this parameter configures the minimum input shape
> * **trt_opt_shape**(`map<string, vector<int>>`): When the model has dynamic shapes and the input shape changes during actual inference, this parameter configures the optimal input shape
> * **trt_max_shape**(`map<string, vector<int>>`): When the model has dynamic shapes and the input shape changes during actual inference, this parameter configures the maximum input shape
> * **trt_max_batch_size**(int): Maximum batch size for TensorRT inference

```c++
#include "fastdeploy/vision.h"

int main() {
  auto option = fastdeploy::RuntimeOption();
  // The TensorRT backend runs on GPU
  option.backend = fastdeploy::Backend::TRT;
  option.device = fastdeploy::Device::GPU;
  // Configure the dynamic input shape for the TRT backend
  option.trt_min_shape["x"] = {1, 3, 224, 224};
  option.trt_opt_shape["x"] = {4, 3, 224, 224};
  option.trt_max_shape["x"] = {8, 3, 224, 224};

  auto model = fastdeploy::vision::classification::PaddleClasModel(
      "resnet50/inference.pdmodel",
      "resnet50/inference.pdiparams",
      "resnet50/inference_cls.yaml",
      option);
  return 0;
}
```
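
For completeness, the same struct fields can also describe a pure CPU deployment. The following is a minimal sketch that switches the backend to OpenVINO and limits the CPU thread count, assuming the same `PaddleClasModel` constructor as above:

```c++
#include "fastdeploy/vision.h"

int main() {
  auto option = fastdeploy::RuntimeOption();
  // Run on CPU with the OpenVINO backend instead of TensorRT
  option.backend = fastdeploy::Backend::OPENVINO;
  option.device = fastdeploy::Device::CPU;
  option.cpu_thread_num = 4;  // number of CPU inference threads

  auto model = fastdeploy::vision::classification::PaddleClasModel(
      "resnet50/inference.pdmodel",
      "resnet50/inference.pdiparams",
      "resnet50/inference_cls.yaml",
      option);
  return 0;
}
```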