* yolov5 use external stream
* yolov5lite/v6/v7/v7e2etrt: optimize output tensor and cuda stream
* avoid reallocating output tensors
* add input and output tensors to FastDeployModel
* add cuda.cmake
* rename to reused_input/output_tensors
* eliminate cmake cuda arch error
* use swap to release input and output tensors
Co-authored-by: Jason <jiangjiajun@baidu.com>
* feat(ipu): add ipu support for paddle_infer backend.
* fix: remove unused env.
* fix(ipu): simplify user API for IPU.
* fix(cmake): fix merge conflict error in CMakeLists.txt.
Co-authored-by: Jason <jiangjiajun@baidu.com>
* add yolo cuda preprocessing
* cmake build cuda src
* yolov5 support cuda preprocessing
* yolov5 cuda preprocessing configurable
* yolov5 update get mat data api
* yolov5 check cuda preprocess args
* refactor cuda function name
* yolo cuda preprocess padding value configurable
* yolov5 release cuda memory
* cuda preprocess pybind api update
* move use_cuda_preprocessing option to yolov5 model
* yolov5lite cuda preprocessing
* yolov6 cuda preprocessing
* yolov7 cuda preprocessing
* yolov7_e2e cuda preprocessing
* remove cuda preprocessing in runtime option
* refine log and cmake variable name
* fix model runtime ptr type
Co-authored-by: Jason <jiangjiajun@baidu.com>