[CVCUDA] Vision Processor Python API and Tutorial (#1394)

* bind success

* bind success fix

* FDMat pybind, ResizeByShort pybind

* FDMat pybind, ResizeByShort pybind, remove initialized_

* override BindProcessorManager::Run in python is available

* PyProcessorManager done

* vision_pybind fix

* manager.py fix

* add tutorials

* remove Apply() bind

* remove Apply() bind and fix

* fix reviewed problem

* fix reviewed problem

* fix reviewed problem readme

* fix reviewed problem readme etc

* apply return outputs

* nits

* update readme

* fix FDMatbatch

* add op pybind: CenterCrop, Pad

* add op overload for pass FDMatBatch

---------

Co-authored-by: Wang Xinyu <shaywxy@gmail.com>
guxukai
2023-03-10 14:42:32 +08:00
committed by GitHub
parent cb7c8a07d4
commit c6480de736
22 changed files with 530 additions and 34 deletions


@@ -0,0 +1,43 @@
English | [中文](README_CN.md)
# Vision Processor
Vision Processor is used to implement model preprocessing, postprocessing, etc. The following 3rd party vision libraries are integrated:
- OpenCV, general CPU image processing
- FlyCV, mainly optimized for ARM CPU
- CV-CUDA, for NVIDIA GPU
## C++
TODO(guxukai)
## Python
The Python API currently supports the following operators:
- ResizeByShort
- NormalizeAndPermute

Users can implement an image processing module by inheriting from the `PyProcessorManager` class. The base class `PyProcessorManager` handles GPU memory management, CUDA stream management, etc. Users only need to implement the `apply()` function, which calls the vision processors in this library to carry out the processing logic. For a concrete implementation, please refer to the demo code.
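
The division of labor described above (the base class owns the call flow and resource handling, the subclass supplies only `apply()`) can be illustrated without FastDeploy installed. The following is a minimal pure-Python sketch, not FastDeploy's implementation: `BatchSketch`, `ManagerSketch` and `DoubleProcessor` are hypothetical stand-ins, and the real `PyProcessorManager` is backed by C++ bindings.

```python
class BatchSketch:
    """Hypothetical stand-in for an image batch; it only holds a list of items."""

    def __init__(self, mats):
        self.mats = list(mats)


class ManagerSketch:
    """Hypothetical stand-in for a manager base class.

    It owns the call flow: wrap the inputs in a batch, run the
    subclass-defined apply(), and return whatever apply() returns.
    A real manager would also set up device memory and streams here.
    """

    def __call__(self, images):
        batch = BatchSketch(images)
        return self.apply(batch)

    def apply(self, image_batch):
        raise NotImplementedError("subclasses implement apply()")


class DoubleProcessor(ManagerSketch):
    """Example subclass: the only code a user writes is apply()."""

    def apply(self, image_batch):
        # Stand-in for calling operators such as ResizeByShort on the batch.
        image_batch.mats = [m * 2 for m in image_batch.mats]
        return list(image_batch.mats)


outputs = DoubleProcessor()([1, 2, 3])
print(outputs)
```

Calling the processor object, rather than `apply()` directly, mirrors how the demo invokes `preprocessor(images)`.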
### Demo
- [Python Demo](python)
### Performance comparison between CV-CUDA and OpenCV:
CPU: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
GPU: T4
CUDA: 11.6
Processing logic: Resize -> NormalizeAndPermute
Warm up for 100 rounds, then run 1000 rounds and report the average latency.
| Input Image Shape | Target shape | Batch Size | OpenCV | CV-CUDA | Gain |
| ----------- | -- | ---------- | ------- | ------ | ------ |
| 1920x1080 | 640x360 | 1 | 1.1572ms | 0.9067ms | 16.44% |
| 1280x720 | 640x360 | 1 | 2.7551ms | 0.5296ms | 80.78% |
| 360x240 | 640x360 | 1 | 3.3450ms | 0.2421ms | 92.76% |
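
The measurement procedure above (100 warmup rounds, then the average over 1000 timed rounds) and the Gain column can be sketched as follows. This is an illustrative harness, not the script used to produce the table: `average_latency_ms` and `gain_percent` are hypothetical helpers, and the gain formula (relative latency reduction versus OpenCV) is an assumption inferred from the 1280x720 and 360x240 rows, which it reproduces.

```python
import time


def average_latency_ms(fn, warmup=100, rounds=1000):
    """Run fn() `warmup` times untimed, then return the mean latency in ms."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(rounds):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / rounds


def gain_percent(baseline_ms, candidate_ms):
    """Relative latency reduction of candidate versus baseline, in percent."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


# Stand-in workload; replace with a call into the real preprocessing pipeline.
latency = average_latency_ms(lambda: sum(range(1000)), warmup=10, rounds=100)
print(f"avg latency: {latency:.4f} ms")

# Latencies taken from the 1280x720 and 360x240 rows of the table.
print(f"{gain_percent(2.7551, 0.5296):.2f}%")
print(f"{gain_percent(3.3450, 0.2421):.2f}%")
```

When the timed callable launches asynchronous GPU work, it must block until the result is ready; otherwise the loop measures only the launch overhead.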


@@ -0,0 +1,42 @@
中文 | [English](README.md)
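# Multi-Hardware Image Processing Library
The multi-hardware image processing library, Vision Processor, can be used to implement image operations such as model preprocessing and postprocessing. It wraps several third-party image processing libraries, including:
- OpenCV, for general CPU image processing
- FlyCV, mainly accelerated for ARM CPUs
- CV-CUDA, for NVIDIA GPUs
## C++
To be written
## Python
The Python API currently supports the following operators:
- ResizeByShort
- NormalizeAndPermute

Users can implement their own image processing module by inheriting from the `PyProcessorManager` class. The base class `PyProcessorManager` handles GPU memory management, CUDA stream management, etc. Users only need to implement the `apply()` function, calling the operators of the multi-hardware image processing library in it to carry out the processing logic. For a concrete implementation, please refer to the demo code.
### Demo
- [Python Demo](python)
### Performance comparison between CV-CUDA and OpenCV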
CPU: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
GPU: T4
CUDA: 11.6
Processing logic: Resize -> NormalizeAndPermute
Warm up for 100 rounds, then run 1000 rounds and report the average latency.
| Input Image Shape | Target shape | Batch Size | OpenCV | CV-CUDA | Gain |
| ----------- | -- | ---------- | ------- | ------ | ------ |
| 1920x1080 | 640x360 | 1 | 1.1572ms | 0.9067ms | 16.44% |
| 1280x720 | 640x360 | 1 | 2.7551ms | 0.5296ms | 80.78% |
| 360x240 | 640x360 | 1 | 3.3450ms | 0.2421ms | 92.76% |


@@ -0,0 +1,19 @@
English | [中文](README_CN.md)
# Preprocessor Python Demo
1. [Build FastDeploy (Python)](../../../docs/cn/build_and_install), or download the [FastDeploy prebuilt library (Python)](../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
2. Run the Demo
```bash
# Download the test image
wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg
# Run the Demo
# OpenCV
python preprocess.py
# CV-CUDA
python preprocess.py --use_cvcuda True
```
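
The `--use_cvcuda True` invocation above hands the literal string `True` to the script. With `argparse`, declaring such a flag as `type=bool` does not parse that string: `bool()` of any non-empty string is `True`, so `--use_cvcuda False` would also enable the flag. A minimal sketch of an explicit converter follows; `str2bool` is an illustrative helper, not a FastDeploy API.

```python
import argparse


def str2bool(value):
    """Convert common textual booleans to bool; reject anything else."""
    text = value.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--use_cvcuda", type=str2bool, default=False)

# Parse explicit argument lists instead of sys.argv to show each case.
print(parser.parse_args(["--use_cvcuda", "True"]).use_cvcuda)
print(parser.parse_args(["--use_cvcuda", "False"]).use_cvcuda)
print(parser.parse_args([]).use_cvcuda)

# For comparison, the pitfall when the built-in bool() is the converter.
print(bool("False"))
```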


@@ -0,0 +1,19 @@
中文 | [English](README.md)
# Preprocessor Python Demo
1. [Build FastDeploy (Python)](../docs/cn/build_and_install), or directly download the [FastDeploy prebuilt library (Python)](../docs/cn/build_and_install/download_prebuilt_libraries.md)
2. Run the demo
```bash
# Download the test image
wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg
# Run the demo
# OpenCV
python preprocess.py
# CV-CUDA
python preprocess.py --use_cvcuda True
```


@@ -0,0 +1,89 @@
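import fastdeploy as fd
import cv2
from fastdeploy.vision.common.manager import PyProcessorManager


def parse_arguments():
    import argparse

    def str2bool(value):
        # type=bool would treat any non-empty string (even "False") as True.
        return value.lower() in ("true", "1", "yes")

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--use_cvcuda",
        required=False,
        type=str2bool,
        default=False,
        help="Use CV-CUDA in preprocess")
    return parser.parse_args()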

# Define a custom processor on top of PyProcessorManager.
class CustomProcessor(PyProcessorManager):
    def __init__(self) -> None:
        super().__init__()
        # Create the operators.
        hw = [500, 500]
        self.resize_op = fd.C.vision.processors.ResizeByShort(100, 1, True, hw)

        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]
        is_scale = True
        # Named min_value/max_value to avoid shadowing the built-ins.
        min_value = []
        max_value = []
        swap_rb = False
        self.normalize_permute_op = fd.C.vision.processors.NormalizeAndPermute(
            mean, std, is_scale, min_value, max_value, swap_rb)

        width = 50
        height = 50
        self.centercrop_op = fd.C.vision.processors.CenterCrop(width, height)

        top = 5
        bottom = 5
        left = 5
        right = 5
        pad_value = [225, 225, 225]
        self.pad_op = fd.C.vision.processors.Pad(top, bottom, left, right,
                                                 pad_value)

    def apply(self, image_batch):
        # Apply each operator to the batch in turn, then collect the mats.
        outputs = []
        self.resize_op(image_batch)
        self.centercrop_op(image_batch)
        self.pad_op(image_batch)
        self.normalize_permute_op(image_batch)
        for i in range(len(image_batch.mats)):
            outputs.append(image_batch.mats[i])
        return outputs

if __name__ == "__main__":
    # Read the test image twice to form a batch of two.
    im1 = cv2.imread('ILSVRC2012_val_00000010.jpeg')
    im2 = cv2.imread('ILSVRC2012_val_00000010.jpeg')

    mat1 = fd.C.vision.FDMat()
    mat1.from_numpy(im1)
    mat2 = fd.C.vision.FDMat()
    mat2.from_numpy(im2)
    images = [mat1, mat2]

    args = parse_arguments()

    # Create the processor.
    preprocessor = CustomProcessor()

    # Use CV-CUDA if requested.
    if args.use_cvcuda:
        preprocessor.use_cuda(True, -1)

    # Show the inputs.
    for i in range(len(images)):
        images[i].print_info('images' + str(i) + ': ')

    # Run the processor (OpenCV by default, CV-CUDA when enabled above).
    outputs = preprocessor(images)

    # Show the outputs.
    for i in range(len(outputs)):
        outputs[i].print_info('outputs' + str(i) + ': ')