Mirror of https://github.com/PaddlePaddle/FastDeploy.git (synced 2025-11-03 11:02:01 +08:00)
Bump up to version 0.3.0 (#371)
* Update VERSION_NUMBER * Update paddle_inference.cmake * Delete docs directory * release new docs * update version number * add vision result doc * update version * fix dead link * fix vision * fix dead link * Update README_EN.md * Update README_EN.md * Update README_EN.md * Update README_EN.md * Update README_EN.md * Update README_CN.md * Update README_EN.md * Update README_CN.md * Update README_EN.md * Update README_CN.md * Update README_EN.md * Update README_EN.md Co-authored-by: leiqing <54695910+leiqing1@users.noreply.github.com>
@@ -1,278 +0,0 @@
# FDTensor C++ Tensor Functions

FDTensor is the struct FastDeploy uses to represent tensors at the C++ level. It mainly manages the input and output data of a model during inference deployment and works with the different Runtime backends. When building C++ inference applications, we often need to process the input and output data to produce the actual model input or the final application output. Such processing can be written with the plain C++ standard library, but that is fairly laborious, for example when taking the maximum along the second dimension of a 3-D tensor. To address this, FastDeploy provides a set of C++ tensor functions built on FDTensor that lower development cost and improve efficiency. They currently fall into four categories: Reduce functions, Manipulate functions, Math functions, and Elementwise functions.

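The snippet below is a minimal sketch of the motivating case above, reducing a 3-D tensor along its second dimension. The header paths and the `fastdeploy::` namespace qualification of `Max` are assumptions and may differ between FastDeploy versions; the `Max` signature itself is documented in the Reduce section below.

```c++
#include <vector>

#include "fastdeploy/core/fd_tensor.h"   // assumed header for FDTensor
#include "fastdeploy/function/reduce.h"  // assumed header for the Reduce functions

int main() {
  // A {2, 3, 4} tensor filled with dummy data.
  std::vector<float> data(2 * 3 * 4, 1.0f);
  fastdeploy::FDTensor input, output;
  input.SetExternalData({2, 3, 4}, fastdeploy::FDDataType::FP32, data.data());

  // Reduce along axis 1; with keep_dim=false the output shape becomes {2, 4}.
  fastdeploy::Max(input, &output, {1});
  return 0;
}
```
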
## Reduce Functions

FastDeploy currently supports seven Reduce functions: Max, Min, Sum, All, Any, Mean, and Prod.

### Max

#### Function Signature

```c++
/** Execute the maximum operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Max(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Compute the maximum along axis 0 of `input`.
// The output result would be [[7, 4, 5]].
Max(input, &output, {0}, /* keep_dim = */ true);
```

### Min

#### Function Signature

```c++
/** Execute the minimum operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Min(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Compute the minimum along axis 0 of `input`.
// The output result would be [[2, 1, 3]].
Min(input, &output, {0}, /* keep_dim = */ true);
```

### Sum

#### Function Signature

```c++
/** Execute the sum operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Sum(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Compute the sum along axis 0 of `input`.
// The output result would be [[9, 5, 8]].
Sum(input, &output, {0}, /* keep_dim = */ true);
```

### Mean

#### Function Signature

```c++
/** Execute the mean operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Mean(const FDTensor& x, FDTensor* out,
          const std::vector<int64_t>& dims,
          bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Compute the mean along axis 0 of `input`.
// The output result would be [[4, 2, 4]] (integer output, since the input is INT32).
Mean(input, &output, {0}, /* keep_dim = */ true);
```

### Prod

#### Function Signature

```c++
/** Execute the product operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Prod(const FDTensor& x, FDTensor* out,
          const std::vector<int64_t>& dims,
          bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<int> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::INT32, inputs.data());

// Compute the product along axis 0 of `input`.
// The output result would be [[14, 4, 15]].
Prod(input, &output, {0}, /* keep_dim = */ true);
```

### Any

#### Function Signature

```c++
/** Execute the any operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void Any(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::array<bool, 6> bool_inputs = {false, false, true, true, false, true};
input.SetExternalData({2, 3}, FDDataType::BOOL, bool_inputs.data());

// Compute the logical "any" along axis 0 of `input`.
// The output result would be [[true, false, true]].
Any(input, &output, {0}, /* keep_dim = */ true);
```

### All

#### Function Signature

```c++
/** Execute the all operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes which will be reduced.
    @param keep_dim Whether to keep the reduced dims, default false.
    @param reduce_all Whether to reduce all dims, default false.
*/
void All(const FDTensor& x, FDTensor* out,
         const std::vector<int64_t>& dims,
         bool keep_dim = false, bool reduce_all = false);
```

#### Usage Example

```c++
FDTensor input, output;
std::array<bool, 6> bool_inputs = {false, false, true, true, false, true};
input.SetExternalData({2, 3}, FDDataType::BOOL, bool_inputs.data());

// Compute the logical "all" along axis 0 of `input`.
// The output result would be [[false, false, true]].
All(input, &output, {0}, /* keep_dim = */ true);
```

## Manipulate Functions

FastDeploy currently supports one Manipulate function: Transpose.

### Transpose

#### Function Signature

```c++
/** Execute the transpose operation for input FDTensor along given dims.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param dims The vector of axes giving the permutation to apply to the input tensor.
*/
void Transpose(const FDTensor& x, FDTensor* out,
               const std::vector<int64_t>& dims);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<float> inputs = {2, 4, 3, 7, 1, 5};
input.SetExternalData({2, 3}, FDDataType::FP32, inputs.data());

// Transpose the input tensor with axes {1, 0}.
// The output result would be [[2, 7], [4, 1], [3, 5]].
Transpose(input, &output, {1, 0});
```

## Math Functions

FastDeploy currently supports one Math function: Softmax.

### Softmax

#### Function Signature

```c++
/** Execute the softmax operation for input FDTensor along the given axis.
    @param x The input tensor.
    @param out The output tensor which stores the result.
    @param axis The axis along which the softmax is computed, default -1 (the last axis).
*/
void Softmax(const FDTensor& x, FDTensor* out, int axis = -1);
```

#### Usage Example

```c++
FDTensor input, output;
std::vector<float> inputs = {1, 2, 3, 4, 5, 6};
input.SetExternalData({2, 3}, FDDataType::FP32, inputs.data());

// Compute the softmax along axis 0 of `input`.
// The output result would be
// [[0.04742587, 0.04742587, 0.04742587],
//  [0.95257413, 0.95257413, 0.95257413]]
Softmax(input, &output, 0);
```

## Elementwise Functions

Under development, stay tuned.

@@ -1,84 +0,0 @@
# Runtime

After configuring `RuntimeOption`, a Runtime can be created on top of different backends and hardware to run model inference.

## Python Class

```
class Runtime(runtime_option)
```
**Parameters**
> * **runtime_option**(fastdeploy.RuntimeOption): The configured RuntimeOption instance

### Member Functions

```
infer(data)
```
Runs model inference with the given input data.

**Parameters**

> * **data**(dict[str, np.ndarray]): Input data; a dict whose keys are input names and whose values are np.ndarray arrays

**Returns**

Returns a list whose length equals the number of outputs of the original model; each element of the list is an np.ndarray.


```
num_inputs()
```
Returns the number of model inputs.

```
num_outputs()
```
Returns the number of model outputs.


## C++ Class

```
class Runtime
```

### Member Functions

```
bool Init(const RuntimeOption& runtime_option)
```
Loads and initializes the model.

**Parameters**

> * **runtime_option**: The configured RuntimeOption instance

**Returns**

Returns true if initialization succeeds, false otherwise.


```
bool Infer(vector<FDTensor>& inputs, vector<FDTensor>* outputs)
```
Runs inference with the given inputs and writes the results to outputs.

**Parameters**

> * **inputs**: Input data
> * **outputs**: Output data

**Returns**

Returns true if inference succeeds, false otherwise.

```
int NumInputs()
```
Returns the number of model inputs.

```
int NumOutputs()
```
Returns the number of model outputs.

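The sketch below shows how the member functions above fit together. It is not part of the original API listing; the header path, the FDTensor `name` field, and the input name `"x"` and shape are assumptions to be adapted to your model.

```c++
#include <vector>

#include "fastdeploy/runtime.h"  // assumed header providing Runtime, RuntimeOption, FDTensor

int main() {
  fastdeploy::RuntimeOption option;
  option.SetModelPath("resnet50/inference.pdmodel", "resnet50/inference.pdiparams");
  option.UseCpu();
  option.UseOrtBackend();

  fastdeploy::Runtime runtime;
  if (!runtime.Init(option)) {
    return -1;  // Init() returns false on failure.
  }

  // Prepare one tensor per model input; dtype and shape must match the model.
  std::vector<float> data(1 * 3 * 224 * 224, 0.0f);
  std::vector<fastdeploy::FDTensor> inputs(runtime.NumInputs());
  inputs[0].SetExternalData({1, 3, 224, 224}, fastdeploy::FDDataType::FP32, data.data());
  inputs[0].name = "x";  // assumed: the tensor name must match the model's input name

  std::vector<fastdeploy::FDTensor> outputs;
  if (!runtime.Infer(inputs, &outputs)) {
    return -1;  // Infer() returns false on failure.
  }
  return 0;
}
```
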
@@ -1,231 +0,0 @@
# RuntimeOption

`RuntimeOption` is used to configure the inference parameters of a model on different backends and hardware.

## Python Class

```
class RuntimeOption()
```

### Member Functions

```
set_model_path(model_file, params_file="", model_format="paddle")
```
Sets the path of the model to load.

**Parameters**

> * **model_file**(str): Path of the model file
> * **params_file**(str): Path of the parameters file; can be left empty for models in ONNX format
> * **model_format**(str): Model format; supports "paddle" and "onnx", default "paddle"

```
use_gpu(device_id=0)
```
Use GPU for inference.

**Parameters**

> * **device_id**(int): When multiple GPU cards exist in the environment, this parameter selects the card used for inference, default 0

```
use_cpu()
```
Use CPU for inference.


```
set_cpu_thread_num(thread_num=-1)
```
Sets the number of threads used for inference on CPU.

**Parameters**

> * **thread_num**(int): Number of threads; when less than or equal to 0, the backend allocates it automatically, default -1

```
use_paddle_backend()
```
Use the Paddle Inference backend; supports CPU/GPU and the Paddle model format.

```
use_ort_backend()
```
Use the ONNX Runtime backend; supports CPU/GPU and the Paddle/ONNX model formats.

```
use_trt_backend()
```
Use the TensorRT backend; supports GPU and the Paddle/ONNX model formats.

```
use_openvino_backend()
```
Use the OpenVINO backend; supports CPU and the Paddle/ONNX model formats.

```
set_paddle_mkldnn(pd_mkldnn=True)
```
When using the Paddle Inference backend, this switch enables or disables MKLDNN acceleration on CPU; enabled by default.

```
enable_paddle_log_info()
disable_paddle_log_info()
```
When using the Paddle Inference backend, these switches enable or disable the optimization log printed while loading the model; disabled by default.

```
set_paddle_mkldnn_cache_size(cache_size)
```
When using the Paddle Inference backend, this interface controls the shape cache size used by MKLDNN acceleration.

**Parameters**
> * **cache_size**(int): Cache size

```
set_trt_input_shape(tensor_name, min_shape, opt_shape=None, max_shape=None)
```
When using the TensorRT backend, this interface sets the shape range of each model input. If only min_shape is set, opt_shape and max_shape are automatically set to the same value as min_shape.

You do not have to call this interface yourself: during inference FastDeploy automatically updates the shape range according to the real input data. However, every time a new shape falls outside the current range, the backend engine is rebuilt, which costs time. Configuring the range in advance through this interface avoids engine rebuilds during inference.

**Parameters**
> * **tensor_name**(str): Name of the tensor whose shape range is being set
> * **min_shape**(list of int): Minimum shape of the tensor, e.g. [1, 3, 224, 224]
> * **opt_shape**(list of int): Most common shape of the tensor, e.g. [2, 3, 224, 224]; when None, it is kept the same as min_shape, default None
> * **max_shape**(list of int): Maximum shape of the tensor, e.g. [8, 3, 224, 224]; when None, it is kept the same as min_shape, default None

```
set_trt_cache_file(cache_file_path)
```
When using the TensorRT backend, this interface caches the built TensorRT engine to the given path, or skips engine building altogether and loads a locally cached TensorRT engine.
- If this interface is called and `cache_file_path` does not exist, FastDeploy builds the TensorRT engine and saves it to `cache_file_path`
- If this interface is called and `cache_file_path` exists, FastDeploy directly loads the already-built TensorRT engine stored at `cache_file_path`, which greatly reduces model loading and initialization time

This interface speeds up model loading and initialization from the second run onwards. Note, however, that if you change the loading configuration, e.g. TensorRT's max_workspace_size, reconfigure `set_trt_input_shape`, or replace the original Paddle or ONNX model, you must delete the local `cache_file_path` file first, so that an outdated cache is not reloaded and program correctness is not affected.

**Parameters**
> * **cache_file_path**(str): Cache file path, e.g. `/Downloads/resnet50.trt`

```
enable_trt_fp16()
disable_trt_fp16()
```
When using the TensorRT backend, these interfaces enable or disable half-precision inference, which brings a noticeable performance gain; however, not all GPUs support half precision. On GPUs without half-precision support, inference falls back to FP32 and the message `Detected FP16 is not supported in the current GPU, will use FP32 instead.` is printed.

## C++ Struct

```
struct RuntimeOption
```

### Member Functions

```
void SetModelPath(const string& model_file, const string& params_file = "", const string& model_format = "paddle")
```
Sets the path of the model to load.

**Parameters**

> * **model_file**: Path of the model file
> * **params_file**: Path of the parameters file; pass "" for models in ONNX format
> * **model_format**: Model format; supports "paddle" and "onnx", default "paddle"

```
void UseGpu(int device_id = 0)
```
Use GPU for inference.

**Parameters**

> * **device_id**: When multiple GPU cards exist in the environment, this parameter selects the card used for inference, default 0

```
void UseCpu()
```
Use CPU for inference.


```
void SetCpuThreadNum(int thread_num=-1)
```
Sets the number of threads used for inference on CPU.

**Parameters**

> * **thread_num**: Number of threads; when less than or equal to 0, the backend allocates it automatically, default -1

```
void UsePaddleBackend()
```
Use the Paddle Inference backend; supports CPU/GPU and the Paddle model format.

```
void UseOrtBackend()
```
Use the ONNX Runtime backend; supports CPU/GPU and the Paddle/ONNX model formats.

```
void UseTrtBackend()
```
Use the TensorRT backend; supports GPU and the Paddle/ONNX model formats.

```
void UseOpenVINOBackend()
```
Use the OpenVINO backend; supports CPU and the Paddle/ONNX model formats.

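As a quick illustration, the short sketch below combines the CPU-related options above: select the CPU device, set the thread count, then pick one of the CPU-capable backends. The header path and model file names are assumptions, not part of the original documentation.

```c++
#include "fastdeploy/runtime.h"  // assumed header providing RuntimeOption

int main() {
  fastdeploy::RuntimeOption option;
  option.SetModelPath("resnet50/inference.pdmodel", "resnet50/inference.pdiparams");
  option.UseCpu();               // run on CPU
  option.SetCpuThreadNum(8);     // use 8 inference threads
  option.UseOpenVINOBackend();   // or UseOrtBackend() / UsePaddleBackend()
  return 0;
}
```
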
```
void SetPaddleMKLDNN(bool pd_mkldnn = true)
```
When using the Paddle Inference backend, this switch enables or disables MKLDNN acceleration on CPU; enabled by default.

```
void EnablePaddleLogInfo()
void DisablePaddleLogInfo()
```
When using the Paddle Inference backend, these switches enable or disable the optimization log printed while loading the model; disabled by default.

```
void SetPaddleMKLDNNCacheSize(int cache_size)
```
When using the Paddle Inference backend, this interface controls the shape cache size used by MKLDNN acceleration.

**Parameters**
> * **cache_size**: Cache size

```
void SetTrtInputShape(const string& tensor_name, const vector<int32_t>& min_shape,
                      const vector<int32_t>& opt_shape = vector<int32_t>(),
                      const vector<int32_t>& max_shape = vector<int32_t>())
```
When using the TensorRT backend, this interface sets the shape range of each model input. If only min_shape is set, opt_shape and max_shape are automatically set to the same value as min_shape.

You do not have to call this interface yourself: during inference FastDeploy automatically updates the shape range according to the real input data. However, every time a new shape falls outside the current range, the backend engine is rebuilt, which costs time. Configuring the range in advance through this interface avoids engine rebuilds during inference.

**Parameters**
> * **tensor_name**: Name of the tensor whose shape range is being set
> * **min_shape**: Minimum shape of the tensor, e.g. [1, 3, 224, 224]
> * **opt_shape**: Most common shape of the tensor, e.g. [2, 3, 224, 224]; when left as the default (an empty vector), it is kept the same as min_shape
> * **max_shape**: Maximum shape of the tensor, e.g. [8, 3, 224, 224]; when left as the default (an empty vector), it is kept the same as min_shape

```
void SetTrtCacheFile(const string& cache_file_path)
```
When using the TensorRT backend, this interface caches the built TensorRT engine to the given path, or skips engine building altogether and loads a locally cached TensorRT engine.
- If this interface is called and `cache_file_path` does not exist, FastDeploy builds the TensorRT engine and saves it to `cache_file_path`
- If this interface is called and `cache_file_path` exists, FastDeploy directly loads the already-built TensorRT engine stored at `cache_file_path`, which greatly reduces model loading and initialization time

This interface speeds up model loading and initialization from the second run onwards. Note, however, that if you change the loading configuration, e.g. TensorRT's max_workspace_size, reconfigure `SetTrtInputShape`, or replace the original Paddle or ONNX model, you must delete the local `cache_file_path` file first, so that an outdated cache is not reloaded and program correctness is not affected.

**Parameters**
> * **cache_file_path**: Cache file path, e.g. `/Downloads/resnet50.trt`

```
void EnableTrtFp16()
void DisableTrtFp16()
```
When using the TensorRT backend, these interfaces enable or disable half-precision inference, which brings a noticeable performance gain; however, not all GPUs support half precision. On GPUs without half-precision support, inference falls back to FP32 and the message `Detected FP16 is not supported in the current GPU, will use FP32 instead.` is printed.

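To tie the TensorRT-related options together, the following sketch configures the TensorRT backend with a dynamic-shape range, an engine cache file, and FP16. The header path, model file names, and the input tensor name `"x"` are assumptions.

```c++
#include "fastdeploy/runtime.h"  // assumed header providing RuntimeOption

int main() {
  fastdeploy::RuntimeOption option;
  option.SetModelPath("resnet50/inference.pdmodel", "resnet50/inference.pdiparams");
  option.UseGpu(0);
  option.UseTrtBackend();
  // Declare the dynamic-shape range up front to avoid engine rebuilds at run time.
  option.SetTrtInputShape("x", {1, 3, 224, 224}, {4, 3, 224, 224}, {8, 3, 224, 224});
  // Cache the built engine so that later runs skip engine construction.
  option.SetTrtCacheFile("resnet50.trt");
  option.EnableTrtFp16();  // falls back to FP32 on GPUs without FP16 support
  return 0;
}
```
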
@@ -1,132 +0,0 @@
# RuntimeOption Inference Backend Configuration

The Runtime in FastDeploy contains multiple inference backends; their relationship to model formats is shown below

| Model format \ Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
| :--------------- | :---------- | :--------------- | :------- | :------- |
| Paddle | Supported (built-in Paddle2ONNX) | Supported | Supported (built-in Paddle2ONNX) | Supported |
| ONNX | Supported | Supported (conversion via X2Paddle required) | Supported | Supported |

The hardware supported by each backend is as follows

| Hardware \ Backend | ONNXRuntime | Paddle Inference | TensorRT | OpenVINO |
| :--------------- | :---------- | :--------------- | :------- | :------- |
| CPU | Supported | Supported | Not supported | Supported |
| GPU | Supported | Supported | Supported | Supported |

For every model, the inference backend and inference parameters are configured through `RuntimeOption`. For example, in Python you can print the inference configuration after loading a model with the following code
```python
model = fastdeploy.vision.detection.YOLOv5("yolov5s.onnx")
print(model.runtime_option)
```
which produces output similar to the following

```python
RuntimeOption(
  backend : Backend.ORT                # Inference backend: ONNXRuntime
  cpu_thread_num : 8                   # Number of CPU threads (effective only for CPU inference)
  device : Device.CPU                  # Inference device: CPU
  device_id : 0                        # Device id (for GPU)
  model_file : yolov5s.onnx            # Model file path
  params_file :                        # Parameters file path
  model_format : ModelFormat.ONNX      # Model format
  ort_execution_mode : -1              # Options prefixed with ort are specific to the ONNXRuntime backend
  ort_graph_opt_level : -1
  ort_inter_op_num_threads : -1
  trt_enable_fp16 : False              # Options prefixed with trt are specific to the TensorRT backend
  trt_enable_int8 : False
  trt_max_workspace_size : 1073741824
  trt_serialize_file :
  trt_fixed_shape : {}
  trt_min_shape : {}
  trt_opt_shape : {}
  trt_max_shape : {}
  trt_max_batch_size : 32
)
```

## Python Usage

### RuntimeOption Class
Configuration options of `fastdeploy.RuntimeOption()`

#### Configuration Options
> * **backend**(fd.Backend): `fd.Backend.ORT`/`fd.Backend.TRT`/`fd.Backend.PDINFER`/`fd.Backend.OPENVINO`, etc.
> * **cpu_thread_num**(int): Number of CPU inference threads; effective only for CPU inference
> * **device**(fd.Device): `fd.Device.CPU`/`fd.Device.GPU`, etc.
> * **device_id**(int): Device id, used with GPU
> * **model_file**(str): Path of the model file
> * **params_file**(str): Path of the parameters file
> * **model_format**(ModelFormat): Model format, `fd.ModelFormat.PADDLE`/`fd.ModelFormat.ONNX`
> * **ort_execution_mode**(int): Execution mode of the ORT backend; 0 runs all operators sequentially, 1 runs operators in parallel; default -1, i.e. use ORT's default configuration
> * **ort_graph_opt_level**(int): Graph optimization level of the ORT backend; 0: disable graph optimization; 1: basic optimizations; 2: extended optimizations; 99: all optimizations; default -1, i.e. use ORT's default configuration
> * **ort_inter_op_num_threads**(int): When `ort_execution_mode` is 1, this parameter sets the number of threads used for inter-operator parallelism
> * **trt_enable_fp16**(bool): Enable FP16 inference in TensorRT
> * **trt_enable_int8**(bool): Enable INT8 inference in TensorRT
> * **trt_max_workspace_size**(int): The `max_workspace_size` parameter passed to TensorRT
> * **trt_fixed_shape**(dict[str : list[int]]): When the model has dynamic shapes but the input shape stays fixed at inference time, use this parameter to configure the fixed input shape
> * **trt_min_shape**(dict[str : list[int]]): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the minimum input shape
> * **trt_opt_shape**(dict[str : list[int]]): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the optimal input shape
> * **trt_max_shape**(dict[str : list[int]]): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the maximum input shape
> * **trt_max_batch_size**(int): Maximum batch size during TensorRT inference

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.backend = fd.Backend.TRT
# When using the TRT backend with dynamic input shapes,
# the input shape information must be configured.
option.trt_min_shape = {"x": [1, 3, 224, 224]}
option.trt_opt_shape = {"x": [4, 3, 224, 224]}
option.trt_max_shape = {"x": [8, 3, 224, 224]}

model = fd.vision.classification.PaddleClasModel(
    "resnet50/inference.pdmodel",
    "resnet50/inference.pdiparams",
    "resnet50/inference_cls.yaml",
    runtime_option=option)
```

## C++ Usage

### RuntimeOption Struct
Configuration options of `fastdeploy::RuntimeOption()`

#### Configuration Options
> * **backend**(fastdeploy::Backend): `Backend::ORT`/`Backend::TRT`/`Backend::PDINFER`/`Backend::OPENVINO`, etc.
> * **cpu_thread_num**(int): Number of CPU inference threads; effective only for CPU inference
> * **device**(fastdeploy::Device): `Device::CPU`/`Device::GPU`, etc.
> * **device_id**(int): Device id, used with GPU
> * **model_file**(string): Path of the model file
> * **params_file**(string): Path of the parameters file
> * **model_format**(fastdeploy::ModelFormat): Model format, `ModelFormat::PADDLE`/`ModelFormat::ONNX`
> * **ort_execution_mode**(int): Execution mode of the ORT backend; 0 runs all operators sequentially, 1 runs operators in parallel; default -1, i.e. use ORT's default configuration
> * **ort_graph_opt_level**(int): Graph optimization level of the ORT backend; 0: disable graph optimization; 1: basic optimizations; 2: extended optimizations; 99: all optimizations; default -1, i.e. use ORT's default configuration
> * **ort_inter_op_num_threads**(int): When `ort_execution_mode` is 1, this parameter sets the number of threads used for inter-operator parallelism
> * **trt_enable_fp16**(bool): Enable FP16 inference in TensorRT
> * **trt_enable_int8**(bool): Enable INT8 inference in TensorRT
> * **trt_max_workspace_size**(int): The `max_workspace_size` parameter passed to TensorRT
> * **trt_fixed_shape**(map<string, vector<int>>): When the model has dynamic shapes but the input shape stays fixed at inference time, use this parameter to configure the fixed input shape
> * **trt_min_shape**(map<string, vector<int>>): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the minimum input shape
> * **trt_opt_shape**(map<string, vector<int>>): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the optimal input shape
> * **trt_max_shape**(map<string, vector<int>>): When the model has dynamic shapes and the input shape varies at inference time, use this parameter to configure the maximum input shape
> * **trt_max_batch_size**(int): Maximum batch size during TensorRT inference

```c++
#include "fastdeploy/vision.h"

int main() {
  auto option = fastdeploy::RuntimeOption();
  option.trt_min_shape["x"] = {1, 3, 224, 224};
  option.trt_opt_shape["x"] = {4, 3, 224, 224};
  option.trt_max_shape["x"] = {8, 3, 224, 224};

  auto model = fastdeploy::vision::classification::PaddleClasModel(
      "resnet50/inference.pdmodel",
      "resnet50/inference.pdiparams",
      "resnet50/inference_cls.yaml",
      option);
  return 0;
}
```

@@ -1,7 +0,0 @@
# Natural Language Model Prediction Results

Depending on the task type of the natural language model, FastDeploy defines different structs to represent the prediction results, as shown in the table below

| Struct | Documentation | Description | Corresponding Models |
| :----- | :--- | :---- | :------- |
| UIEResult | [C++/Python docs](./uie_result.md) | Results returned by UIE models | UIE models |

@@ -1,34 +0,0 @@
# UIEResult

UIEResult is defined in `fastdeploy/text/uie/model.h` and describes the extraction results and their confidence returned by UIE models.

## C++ Definition

`fastdeploy::text::UIEResult`

```c++
struct UIEResult {
  size_t start_;
  size_t end_;
  double probability_;
  std::string text_;
  std::unordered_map<std::string, std::vector<UIEResult>> relation_;
  std::string Str() const;
};
```

- **start_**: Member variable; the start offset of the extracted `text_` in the original (Unicode) text.
- **end_**: Member variable; the end offset of the extracted `text_` in the original (Unicode) text.
- **probability_**: Member variable; the confidence of the extracted result.
- **text_**: Member variable; the extracted result, stored as UTF-8.
- **relation_**: Member variable; results related to the current result, commonly used in relation extraction.
- **Str()**: Member function; returns the information in the struct as a string (for debugging).

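As a small illustration, the sketch below shows how the fields above might be read. The helper function and the map layout (results grouped by schema key) are assumptions, not part of the original documentation; only the UIEResult members shown above are used.

```c++
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

#include "fastdeploy/text/uie/model.h"  // defines fastdeploy::text::UIEResult

// Hypothetical helper: print every extraction result grouped under its schema key.
void PrintResults(
    const std::unordered_map<std::string,
                             std::vector<fastdeploy::text::UIEResult>>& results) {
  for (const auto& kv : results) {
    for (const auto& res : kv.second) {
      // start_/end_ are offsets into the original (Unicode) text; text_ is UTF-8.
      std::cout << kv.first << ": [" << res.start_ << ", " << res.end_ << ") "
                << res.text_ << " (probability " << res.probability_ << ")\n";
    }
  }
}
```
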
## Python Definition

`fastdeploy.text.C.UIEResult`

- **start_**(int): Member variable; the start offset of the extracted `text_` in the original (Unicode) text.
- **end_**(int): Member variable; the end offset of the extracted `text_` in the original (Unicode) text.
- **probability_**(float): Member variable; the confidence of the extracted result.
- **text_**(str): Member variable; the extracted result, stored as UTF-8.
- **relation_**(dict(str, list(fastdeploy.text.C.UIEResult))): Member variable; results related to the current result, commonly used in relation extraction.
- **get_dict()**: Returns the fastdeploy.text.C.UIEResult as a dict.