RuntimeOption Inference Backend Configuration

The Runtime in FastDeploy contains multiple inference backends. Support for each model format is as follows:

| Model Format \ Inference Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
|:---|:---|:---|:---|:---|
| Paddle | Support (built-in Paddle2ONNX) | Support | Support (built-in Paddle2ONNX) | Support |
| ONNX | Support | Support (requires conversion via X2Paddle) | Support | Support |

The hardware supported by each backend is as follows:

| Hardware \ Inference Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
|:---|:---|:---|:---|:---|
| CPU | Support | Support | Not Support | Support |
| GPU | Support | Support | Support | Support |
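
Any combination supported in these tables can be selected through the RuntimeOption object described below. As a rough sketch (assuming the option members listed later in this document can be assigned directly, as the examples further down do for backend), choosing OpenVINO on CPU or TensorRT on GPU would look like this:

import fastdeploy as fd

option = fd.RuntimeOption()
option.device = fd.Device.CPU          # run on CPU
option.backend = fd.Backend.OPENVINO   # OpenVINO supports CPU inference (see the table above)

# For GPU inference with TensorRT, you would instead set:
# option.device = fd.Device.GPU
# option.backend = fd.Backend.TRT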

Each model uses a RuntimeOption to configure the inference backend and its parameters. In Python, for example, the inference configuration can be printed after loading a model with the following code:

model = fastdeploy.vision.detection.YOLOv5("yolov5s.onnx")
print(model.runtime_option)

The output is as follows:

RuntimeOption(
  backend : Backend.ORT                # Inference Backend ONNXRuntime
  cpu_thread_num : 8                   # Number of CPU threads (valid only when using CPU)
  device : Device.CPU                  # Inference hardware is CPU
  device_id : 0                        # Inference hardware id (for GPU)
  model_file : yolov5s.onnx            # Path to the model file
  params_file :                        # Parameter file path
  model_format : ModelFormat.ONNX       # Model format
  ort_execution_mode : -1              # The prefix ort indicates ONNXRuntime backend parameters
  ort_graph_opt_level : -1
  ort_inter_op_num_threads : -1
  trt_enable_fp16 : False              # The prefix trt indicates a TensorRT backend parameter
  trt_enable_int8 : False
  trt_max_workspace_size : 1073741824
  trt_serialize_file :
  trt_fixed_shape : {}
  trt_min_shape : {}
  trt_opt_shape : {}
  trt_max_shape : {}
  trt_max_batch_size : 32
)

Python

RuntimeOption Class

fastdeploy.RuntimeOption()

Configuration options

  • backend(fd.Backend): fd.Backend.ORT/fd.Backend.TRT/fd.Backend.PDINFER/fd.Backend.OPENVINO
  • cpu_thread_num(int): Number of CPU inference threads; valid only for CPU inference
  • device(fd.Device): fd.Device.CPU/fd.Device.GPU
  • device_id(int): Device id, used on GPU
  • model_file(str): Model file path
  • params_file(str): Parameter file path
  • model_format(ModelFormat): Model format, fd.ModelFormat.PADDLE/fd.ModelFormat.ONNX
  • ort_execution_mode(int): ORT backend execution mode; 0 runs all operators sequentially, 1 runs operators in parallel; default is -1, i.e. run with ORT's default configuration
  • ort_graph_opt_level(int): ORT backend graph optimisation level; 0: disable graph optimisation; 1: basic optimisation; 2: extended optimisation; 99: all optimisations; default is -1, i.e. run with ORT's default configuration
  • ort_inter_op_num_threads(int): When ort_execution_mode is 1, this parameter sets the number of threads used for inter-operator parallelism
  • trt_enable_fp16(bool): Enable FP16 inference in TensorRT
  • trt_enable_int8(bool): Enable INT8 inference in TensorRT
  • trt_max_workspace_size(int): The max_workspace_size configured for TensorRT
  • trt_fixed_shape(dict[str : list[int]]): When the model has dynamic shapes but the input shape stays constant during actual inference, use this parameter to fix the input shape
  • trt_min_shape(dict[str : list[int]]): When the model has dynamic shapes and the input shape changes during actual inference, use this parameter to set the minimum input shape
  • trt_opt_shape(dict[str : list[int]]): When the model has dynamic shapes and the input shape changes during actual inference, use this parameter to set the optimal input shape
  • trt_max_shape(dict[str : list[int]]): When the model has dynamic shapes and the input shape changes during actual inference, use this parameter to set the maximum input shape
  • trt_max_batch_size(int): Maximum batch size for TensorRT inference
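
For example, to run a ResNet50 classification model from PaddleClas on GPU with the TensorRT backend and dynamic input shapes:
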
import fastdeploy as fd

option = fd.RuntimeOption()
option.device = fd.Device.GPU    # TensorRT only supports GPU (see the hardware table above)
option.backend = fd.Backend.TRT
# When using the TRT backend with a dynamic input shape,
# configure the input shape information
option.trt_min_shape = {"x": [1, 3, 224, 224]}
option.trt_opt_shape = {"x": [4, 3, 224, 224]}
option.trt_max_shape = {"x": [8, 3, 224, 224]}

model = fd.vision.classification.PaddleClasModel(
    "resnet50/inference.pdmodel",
    "resnet50/inference.pdiparams",
    "resnet50/inference_cls.yaml",
    runtime_option=option)
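
The CPU- and ORT-related options are set the same way. The sketch below is a minimal illustration of tuning ONNXRuntime inference on CPU; it reuses the yolov5s.onnx model from the first example and assumes the YOLOv5 loader accepts a runtime_option argument just as PaddleClasModel does:

import fastdeploy as fd

option = fd.RuntimeOption()
option.device = fd.Device.CPU
option.backend = fd.Backend.ORT       # ONNXRuntime backend
option.cpu_thread_num = 8             # use 8 CPU threads for inference
option.ort_graph_opt_level = 99       # enable all ORT graph optimisations
option.ort_execution_mode = 1         # run operators in parallel ...
option.ort_inter_op_num_threads = 4   # ... with 4 inter-operator threads

model = fd.vision.detection.YOLOv5("yolov5s.onnx", runtime_option=option)
print(model.runtime_option)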

C++

RuntimeOption Struct

fastdeploy::RuntimeOption()

Configuration options

  • backend(fastdeploy::Backend): Backend::ORT/Backend::TRT/Backend::PDINFER/Backend::OPENVINO
  • cpu_thread_num(int): Number of CPU inference threads; valid only for CPU inference
  • device(fastdeploy::Device): Device::CPU/Device::GPU
  • device_id(int): Device id, used on GPU
  • model_file(string): Model file path
  • params_file(string): Parameter file path
  • model_format(fastdeploy::ModelFormat): Model format, ModelFormat::PADDLE/ModelFormat::ONNX
  • ort_execution_mode(int): ORT backend execution mode; 0 runs all operators sequentially, 1 runs operators in parallel; default is -1, i.e. run with ORT's default configuration
  • ort_graph_opt_level(int): ORT backend graph optimisation level; 0: disable graph optimisation; 1: basic optimisation; 2: extended optimisation; 99: all optimisations; default is -1, i.e. run with ORT's default configuration
  • ort_inter_op_num_threads(int): When ort_execution_mode is 1, this parameter sets the number of threads used for inter-operator parallelism
  • trt_enable_fp16(bool): Enable FP16 inference in TensorRT
  • trt_enable_int8(bool): Enable INT8 inference in TensorRT
  • trt_max_workspace_size(int): The max_workspace_size configured for TensorRT
  • trt_fixed_shape(map<string, vector<int>>): When the model has dynamic shapes but the input shape stays constant during actual inference, use this parameter to fix the input shape
  • trt_min_shape(map<string, vector<int>>): When the model has dynamic shapes and the input shape changes during actual inference, use this parameter to set the minimum input shape
  • trt_opt_shape(map<string, vector<int>>): When the model has dynamic shapes and the input shape changes during actual inference, use this parameter to set the optimal input shape
  • trt_max_shape(map<string, vector<int>>): When the model has dynamic shapes and the input shape changes during actual inference, use this parameter to set the maximum input shape
  • trt_max_batch_size(int): Maximum batch size for TensorRT inference
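
For example, configuring the same ResNet50 classification model for the TensorRT backend with dynamic input shapes in C++:
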
#include "fastdeploy/vision.h"

int main() {
  auto option = fastdeploy::RuntimeOption();
  option.device = fastdeploy::Device::GPU;    // TensorRT only supports GPU (see the hardware table above)
  option.backend = fastdeploy::Backend::TRT;
  // When using the TRT backend with a dynamic input shape,
  // configure the input shape information
  option.trt_min_shape["x"] = {1, 3, 224, 224};
  option.trt_opt_shape["x"] = {4, 3, 224, 224};
  option.trt_max_shape["x"] = {8, 3, 224, 224};

  auto model = fastdeploy::vision::classification::PaddleClasModel(
                           "resnet50/inference.pdmodel",
                           "resnet50/inference.pdiparams",
                           "resnet50/inference_cls.yaml",
                            option);
  return 0;
}