Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2025-10-25 01:20:43 +08:00
RuntimeOption Inference Backend Deployment
The Runtime in the FastDeploy product contains multiple inference backends:
| Model Format\Inference Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO | 
|---|---|---|---|---|
| Paddle | Support (built-in Paddle2ONNX) | Support | Support (built-in Paddle2ONNX) | Support | 
| ONNX | Support | Support (requires conversion via X2Paddle) | Support | Support | 
The hardware supported by each Runtime backend is as follows:
| Hardware\Inference Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
|---|---|---|---|---|
| CPU | Supported | Supported | Not supported | Supported |
| GPU | Supported | Supported | Supported | Supported |
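As a quick way to read the two tables together, the sketch below transcribes them into plain Python sets and intersects them. The dictionary layout and the helper name `candidate_backends` are illustrative assumptions and are not part of the FastDeploy API; conversion requirements (Paddle2ONNX, X2Paddle) are not modelled.

```python
# Support matrices transcribed from the two tables above.
# The names below are hypothetical helpers, not FastDeploy API.
FORMAT_SUPPORT = {
    "Paddle": {"ONNXRuntime", "Paddle Inference", "TensorRT", "OpenVINO"},
    "ONNX": {"ONNXRuntime", "Paddle Inference", "TensorRT", "OpenVINO"},
}
DEVICE_SUPPORT = {
    "CPU": {"ONNXRuntime", "Paddle Inference", "OpenVINO"},
    "GPU": {"ONNXRuntime", "Paddle Inference", "TensorRT", "OpenVINO"},
}


def candidate_backends(model_format, device):
    """Return backends listed as supported for both the model format and the device."""
    return sorted(FORMAT_SUPPORT[model_format] & DEVICE_SUPPORT[device])


print(candidate_backends("ONNX", "CPU"))
```

For an ONNX model on CPU, TensorRT drops out of the result because the hardware table lists it as not supported on CPU.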
Each model uses a RuntimeOption to configure its inference backend and parameters. In Python, for example, the inference configuration can be printed after loading the model:
model = fastdeploy.vision.detection.YOLOv5("yolov5s.onnx")
print(model.runtime_option)
The output looks like this:
RuntimeOption(
  backend : Backend.ORT                # Inference Backend ONNXRuntime
  cpu_thread_num : 8                   # Number of CPU threads (valid only when using CPU)
  device : Device.CPU                  # Inference hardware is CPU
  device_id : 0                        # Inference hardware id (for GPU)
  model_file : yolov5s.onnx            # Path to the model file
  params_file :                        # Parameter file path
  model_format : ModelFormat.ONNX      # Model format
  ort_execution_mode : -1              # The prefix ort indicates ONNXRuntime backend parameters
  ort_graph_opt_level : -1
  ort_inter_op_num_threads : -1
  trt_enable_fp16 : False              # The prefix trt indicates TensorRT backend parameters
  trt_enable_int8 : False
  trt_max_workspace_size : 1073741824
  trt_serialize_file :
  trt_fixed_shape : {}
  trt_min_shape : {}
  trt_opt_shape : {}
  trt_max_shape : {}
  trt_max_batch_size : 32
)
Python
RuntimeOption Class
fastdeploy.RuntimeOption() configuration options
Configuration options
- backend(fd.Backend): fd.Backend.ORT/fd.Backend.TRT/fd.Backend.PDINFER/fd.Backend.OPENVINO
- cpu_thread_num(int): Number of CPU inference threads; only takes effect for CPU inference
- device(fd.Device): fd.Device.CPU/fd.Device.GPU
- device_id(int): Device id, used on GPU
- model_file(str): Model file path
- params_file(str): Parameter file path
- model_format(ModelFormat): Model format, fd.ModelFormat.PADDLE/fd.ModelFormat.ONNX
- ort_execution_mode(int): ORT backend execution mode; 0: execute all operators sequentially; 1: execute operators in parallel; default is -1, i.e. use the ORT default configuration
- ort_graph_opt_level(int): ORT backend graph optimisation level; 0: disable graph optimisation; 1: basic optimisation; 2: extended optimisation; 99: all optimisations; default is -1, i.e. use the ORT default configuration
- ort_inter_op_num_threads(int): When ort_execution_mode is 1, this parameter sets the number of threads used to run operators in parallel
- trt_enable_fp16(bool): Enable FP16 inference in TensorRT
- trt_enable_int8(bool): Enable INT8 inference in TensorRT
- trt_max_workspace_size(int): The max_workspace_size parameter passed to TensorRT
- trt_fixed_shape(dict[str : list[int]]): When the model has a dynamic shape but the input shape stays constant during actual inference, this parameter sets the fixed input shape
- trt_min_shape(dict[str : list[int]]): When the model has a dynamic shape and the input shape changes during actual inference, this parameter sets the minimum input shape
- trt_opt_shape(dict[str : list[int]]): When the model has a dynamic shape and the input shape changes during actual inference, this parameter sets the optimal input shape
- trt_max_shape(dict[str : list[int]]): When the model has a dynamic shape and the input shape changes during actual inference, this parameter sets the maximum input shape
- trt_max_batch_size(int): Maximum batch size for TensorRT inference
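The integer codes for the ort_* options are easy to mix up, so the following sketch collects the meanings listed above into lookup tables. It is plain Python and does not import FastDeploy; `describe_ort_options` and both dictionaries are hypothetical names introduced only for illustration.

```python
# Meanings copied from the option list above; the helper itself is hypothetical.
ORT_EXECUTION_MODES = {-1: "ORT default", 0: "sequential", 1: "parallel"}
ORT_GRAPH_OPT_LEVELS = {-1: "ORT default", 0: "disabled", 1: "basic", 2: "extended", 99: "all"}


def describe_ort_options(execution_mode=-1, graph_opt_level=-1, inter_op_num_threads=-1):
    """Return a readable summary of the three ort_* values, or raise ValueError."""
    if execution_mode not in ORT_EXECUTION_MODES:
        raise ValueError(f"unknown ort_execution_mode: {execution_mode}")
    if graph_opt_level not in ORT_GRAPH_OPT_LEVELS:
        raise ValueError(f"unknown ort_graph_opt_level: {graph_opt_level}")
    return {
        "execution": ORT_EXECUTION_MODES[execution_mode],
        "graph_optimisation": ORT_GRAPH_OPT_LEVELS[graph_opt_level],
        # Per the list above, the thread count applies when execution mode is 1.
        "inter_op_threads_applied": execution_mode == 1 and inter_op_num_threads > 0,
    }


print(describe_ort_options(execution_mode=1, graph_opt_level=99, inter_op_num_threads=4))
```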
import fastdeploy as fd
option = fd.RuntimeOption()
option.backend = fd.Backend.TRT
# When using a TRT backend with a dynamic input shape
# Configure input shape information
option.trt_min_shape = {"x": [1, 3, 224, 224]}
option.trt_opt_shape = {"x": [4, 3, 224, 224]}
option.trt_max_shape = {"x": [8, 3, 224, 224]}
model = fd.vision.classification.PaddleClasModel(
    "resnet50/inference.pdmodel",
    "resnet50/inference.pdiparams",
    "resnet50/inference_cls.yaml",
    runtime_option=option)
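TensorRT dynamic-shape profiles expect each dimension to satisfy min <= opt <= max, so it can help to check the three dictionaries before building the engine. The sketch below is a standalone check in plain Python; the function name `check_trt_shape_range` is a hypothetical helper, not something FastDeploy provides.

```python
def check_trt_shape_range(min_shape, opt_shape, max_shape):
    """Raise ValueError unless min <= opt <= max holds per dimension for every input."""
    if not (min_shape.keys() == opt_shape.keys() == max_shape.keys()):
        raise ValueError("min/opt/max shapes must describe the same input names")
    for name in min_shape:
        lo, opt, hi = min_shape[name], opt_shape[name], max_shape[name]
        if not (len(lo) == len(opt) == len(hi)):
            raise ValueError(f"input '{name}': shapes have different ranks")
        for axis, (a, b, c) in enumerate(zip(lo, opt, hi)):
            if not (a <= b <= c):
                raise ValueError(
                    f"input '{name}', axis {axis}: expected min <= opt <= max, got {a}, {b}, {c}"
                )


# The same values as in the example above: only the batch dimension varies.
check_trt_shape_range(
    {"x": [1, 3, 224, 224]},
    {"x": [4, 3, 224, 224]},
    {"x": [8, 3, 224, 224]},
)
```

The call passes silently for the shapes used above; swapping the min and max dictionaries would raise a ValueError on axis 0.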
C++
RuntimeOption Struct
fastdeploy::RuntimeOption() configuration options
Configuration options
- backend(fastdeploy::Backend): Backend::ORT/Backend::TRT/Backend::PDINFER/Backend::OPENVINO
- cpu_thread_num(int): Number of CPU inference threads; only takes effect for CPU inference
- device(fastdeploy::Device): Device::CPU/Device::GPU
- device_id(int): Device id, used on GPU
- model_file(string): Model file path
- params_file(string): Parameter file path
- model_format(fastdeploy::ModelFormat): Model format, ModelFormat::PADDLE/ModelFormat::ONNX
- ort_execution_mode(int): ORT backend execution mode; 0: execute all operators sequentially; 1: execute operators in parallel; default is -1, i.e. use the ORT default configuration
- ort_graph_opt_level(int): ORT backend graph optimisation level; 0: disable graph optimisation; 1: basic optimisation; 2: extended optimisation; 99: all optimisations; default is -1, i.e. use the ORT default configuration
- ort_inter_op_num_threads(int): When ort_execution_mode is 1, this parameter sets the number of threads used to run operators in parallel
- trt_enable_fp16(bool): Enable FP16 inference in TensorRT
- trt_enable_int8(bool): Enable INT8 inference in TensorRT
- trt_max_workspace_size(int): The max_workspace_size parameter passed to TensorRT
- trt_fixed_shape(map<string, vector<int>>): When the model has a dynamic shape but the input shape stays constant during actual inference, this parameter sets the fixed input shape
- trt_min_shape(map<string, vector<int>>): When the model has a dynamic shape and the input shape changes during actual inference, this parameter sets the minimum input shape
- trt_opt_shape(map<string, vector<int>>): When the model has a dynamic shape and the input shape changes during actual inference, this parameter sets the optimal input shape
- trt_max_shape(map<string, vector<int>>): When the model has a dynamic shape and the input shape changes during actual inference, this parameter sets the maximum input shape
- trt_max_batch_size(int): Maximum batch size for TensorRT inference
#include "fastdeploy/vision.h"
int main() {
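  auto option = fastdeploy::RuntimeOption();
  // Select the TensorRT backend, matching the Python example
  option.backend = fastdeploy::Backend::TRT;
  // When using the TRT backend with a dynamic input shape,
  // configure the input shape range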
  option.trt_min_shape["x"] = {1, 3, 224, 224};
  option.trt_opt_shape["x"] = {4, 3, 224, 224};
  option.trt_max_shape["x"] = {8, 3, 224, 224};
  auto model = fastdeploy::vision::classification::PaddleClasModel(
                           "resnet50/inference.pdmodel",
                           "resnet50/inference.pdiparams",
                           "resnet50/inference_cls.yaml",
                            option);
  return 0;
}
