
RuntimeOption
RuntimeOption is used to configure the inference parameters of the model on different backends and hardware.
Python Class
class RuntimeOption()
Member functions
set_model_path(model_file, params_file="", model_format="paddle")
Set the model path for loading
Parameters
- model_file(str): Model file path
- params_file(str): Parameter file path. This parameter is not required for the onnx model format
- model_format(str): Model format. Supported formats are paddle and onnx (paddle by default).
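For reference, a minimal sketch of the two call patterns (the file paths below are hypothetical placeholders):

```python
import fastdeploy as fd

option = fd.RuntimeOption()

# Paddle format: both the model file and the params file are required
# (paths are placeholders for illustration).
option.set_model_path("resnet50/inference.pdmodel",
                      "resnet50/inference.pdiparams")

# ONNX format: params_file can stay empty; pass model_format="onnx".
option.set_model_path("resnet50.onnx", model_format="onnx")
```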
use_gpu(device_id=0)
Inference on GPU
Parameters
- device_id(int): When there are multiple GPU cards in the environment, this parameter specifies the card for inference. The default is 0.
use_cpu()
Inference on CPU
set_cpu_thread_num(thread_num=-1)
Set the number of threads on the CPU for inference
Parameters
- thread_num(int): Number of threads. When the value is less than or equal to 0, the backend allocates threads automatically. The default is -1
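A short sketch combining the device and thread settings above:

```python
import fastdeploy as fd

option = fd.RuntimeOption()

# Run on the second GPU card (index 1) when multiple GPUs are present.
option.use_gpu(1)

# Alternatively, run on CPU with 8 inference threads; a value <= 0
# would let the backend allocate threads automatically.
# option.use_cpu()
# option.set_cpu_thread_num(8)
```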
use_paddle_backend()
Inference with Paddle Inference backend (CPU/GPU supported, Paddle model format supported).
use_ort_backend()
Inference with ONNX Runtime backend (CPU/GPU supported, Paddle and ONNX model format supported).
use_trt_backend()
Inference with TensorRT backend (GPU supported, Paddle/ONNX model format supported).
use_openvino_backend()
Inference with OpenVINO backend (CPU supported, Paddle/ONNX model format supported).
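Each use_*_backend() call selects one inference backend; the choice must match the device and model format listed above. A sketch:

```python
import fastdeploy as fd

option = fd.RuntimeOption()

# CPU inference through OpenVINO:
option.use_cpu()
option.use_openvino_backend()

# Or GPU inference through TensorRT:
# option.use_gpu(0)
# option.use_trt_backend()
```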
set_paddle_mkldnn()
When using the Paddle Inference backend, this interface turns MKLDNN inference acceleration on the CPU on or off. It is on by default.
enable_paddle_log_info()
disable_paddle_log_info()
When using the Paddle Inference backend, these interfaces turn the optimization log printed during model loading on or off. It is off by default.
set_paddle_mkldnn_cache_size(cache_size)
When using the Paddle Inference backend, this interface controls the shape cache size for MKLDNN acceleration.
Parameters
- cache_size(int): Cache size
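A sketch of a CPU configuration using the Paddle Inference backend with the options above (the cache size of 10 is an arbitrary illustration):

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_cpu()
option.use_paddle_backend()

option.set_paddle_mkldnn()               # keep MKLDNN acceleration on (the default)
option.set_paddle_mkldnn_cache_size(10)  # cache at most 10 input shapes
option.enable_paddle_log_info()          # print the optimization log while loading
```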
set_trt_input_shape(tensor_name, min_shape, opt_shape=None, max_shape=None)
When using the TensorRT backend, this interface sets the shape range of each input tensor of the model. If only min_shape is set, opt_shape and max_shape are automatically set to match min_shape.
FastDeploy automatically updates the shape range during inference according to the real-time data, but encountering a shape outside the configured range triggers a rebuild of the backend engine, which costs extra time. It is therefore advisable to configure this interface in advance to avoid engine rebuilding during inference, as sketched after the parameter list below.
Parameters
- tensor_name(str): Name of the tensor whose shape range is set
- min_shape(list of int): Minimum shape of the corresponding tensor, e.g. [1, 3, 224, 224]
- opt_shape(list of int): The most common shape of the corresponding tensor, e.g. [2, 3, 224, 224]. When it is None, it defaults to min_shape. The default is None.
- max_shape(list of int): The maximum shape of the corresponding tensor, e.g. [8, 3, 224, 224]. When it is None, it defaults to min_shape. The default is None.
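A sketch that pre-configures the range for one input tensor; the tensor name "images" and the shapes are illustrative assumptions:

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()

# "images" is a hypothetical input tensor name; use your model's actual name.
option.set_trt_input_shape("images",
                           min_shape=[1, 3, 224, 224],
                           opt_shape=[4, 3, 224, 224],
                           max_shape=[8, 3, 224, 224])
```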
set_trt_cache_file(cache_file_path)
When using the TensorRT backend, developers can use this interface to cache the built TensorRT engine to the designated path, or skip the engine-building step and load a locally cached TensorRT engine directly.
- When this interface is called and cache_file_path does not exist, FastDeploy builds the TensorRT engine and saves the built engine to cache_file_path.
- When this interface is called and cache_file_path exists, FastDeploy skips building and directly loads the engine stored in cache_file_path, greatly reducing the time spent on model load initialization.
This interface allows developers to speed up model loading on later runs. However, if the loading configuration changes, for example the max_workspace_size of TensorRT or the ranges set via set_trt_input_shape, or if the original paddle or onnx model is replaced, delete the locally cached cache_file_path file first to avoid loading a stale cache, which could affect the program's behavior.
Parameters
- cache_file_path(str): Cache file path, e.g. /Downloads/resnet50.trt
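A sketch of the caching workflow (the cache path is a placeholder):

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()

# First run: builds the engine and saves it to this path.
# Later runs: loads the cached engine and skips the build.
option.set_trt_cache_file("/Downloads/resnet50.trt")
```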
enable_trt_fp16()
disable_trt_fp16()
When using the TensorRT backend, these interfaces turn half-precision (FP16) inference acceleration on or off; enabling it can bring a significant performance boost. However, half-precision inference is not supported on all GPUs. On GPUs without FP16 support, inference falls back to FP32 and prints the message "Detected FP16 is not supported in the current GPU, will use FP32 instead".
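A sketch enabling FP16 on the TensorRT backend:

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()
option.enable_trt_fp16()  # falls back to FP32 (with a log message) on GPUs without FP16 support
```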
C++ Struct
struct RuntimeOption
Member functions
void SetModelPath(const string& model_file, const string& params_file = "", const string& model_format = "paddle")
Set the model path for loading
Parameters
- model_file: Model file path
- params_file: Parameter file path. This parameter can be left as "" for the onnx model format
- model_format: Model format. Supported formats are paddle and onnx (paddle by default).
void UseGpu(int device_id = 0)
Set inference to run on GPU
Parameters
- device_id: When there are multiple GPU cards in the environment, this parameter specifies the card used for inference. The default is 0.
void UseCpu()
Set inference to run on CPU
void SetCpuThreadNum(int thread_num=-1)
Set the number of threads on the CPU for inference
Parameters
- thread_num: Number of threads. When the value is less than or equal to 0, the backend allocates threads automatically. The default is -1
void UsePaddleBackend()
Inference with Paddle Inference backend (CPU/GPU supported, Paddle model format supported).
void UseOrtBackend()
Inference with ONNX Runtime backend (CPU/GPU supported, Paddle and ONNX model format supported).
void UseTrtBackend()
Inference with TensorRT backend (GPU supported, Paddle/ONNX model format supported).
void UseOpenVINOBackend()
Inference with OpenVINO backend (CPU supported, Paddle/ONNX model format supported).
void SetPaddleMKLDNN(bool pd_mkldnn = true)
When using the Paddle Inference backend, this interface turns MKLDNN inference acceleration on the CPU on or off. It is on by default.
void EnablePaddleLogInfo()
void DisablePaddleLogInfo()
When using the Paddle Inference backend, these interfaces turn the optimization log printed during model loading on or off. It is off by default.
void SetPaddleMKLDNNCacheSize(int cache_size)
When using the Paddle Inference backend, this interface controls the shape cache size for MKLDNN acceleration.
Parameters
- cache_size: Cache size
void SetTrtInputShape(const string& tensor_name, const vector<int32_t>& min_shape,
                      const vector<int32_t>& opt_shape = vector<int32_t>(),
                      const vector<int32_t>& max_shape = vector<int32_t>())
When using the TensorRT backend, this interface sets the shape range of each input tensor of the model. If only min_shape is set, opt_shape and max_shape are automatically set to match min_shape.
FastDeploy automatically updates the shape range during inference according to the real-time data, but encountering a shape outside the configured range triggers a rebuild of the backend engine, which costs extra time. It is therefore advisable to configure this interface in advance to avoid engine rebuilding during inference.
Parameters
- tensor_name: Name of the tensor whose shape range is set
- min_shape: Minimum shape of the corresponding tensor, e.g. [1, 3, 224, 224]
- opt_shape: The most common shape of the corresponding tensor, e.g. [2, 3, 224, 224]. When it is an empty vector, it defaults to min_shape. The default is an empty vector.
- max_shape: The maximum shape of the corresponding tensor, e.g. [8, 3, 224, 224]. When it is an empty vector, it defaults to min_shape. The default is an empty vector.
void SetTrtCacheFile(const string& cache_file_path)
When using the TensorRT backend, developers can use this interface to cache the built TensorRT engine to the designated path, or skip the engine-building step and load a locally cached TensorRT engine directly.
- When this interface is called and cache_file_path does not exist, FastDeploy builds the TensorRT engine and saves the built engine to cache_file_path.
- When this interface is called and cache_file_path exists, FastDeploy skips building and directly loads the engine stored in cache_file_path, greatly reducing the time spent on model load initialization.
This interface allows developers to speed up model loading on later runs. However, if the loading configuration changes, for example the max_workspace_size of TensorRT or the ranges set via SetTrtInputShape, or if the original paddle or onnx model is replaced, delete the locally cached cache_file_path file first to avoid loading a stale cache, which could affect the program's behavior.
Parameters
- cache_file_path: Cache file path, e.g. /Downloads/resnet50.trt
void EnableTrtFp16()
void DisableTrtFp16()
When using the TensorRT backend, these interfaces turn half-precision (FP16) inference acceleration on or off; enabling it can bring a significant performance boost. However, half-precision inference is not supported on all GPUs. On GPUs without FP16 support, inference falls back to FP32 and prints the message "Detected FP16 is not supported in the current GPU, will use FP32 instead".