* cuda normalize and permute, cuda concat
* add use cuda option for preprocessor
* ppyoloe use cuda normalize
* ppseg use cuda normalize
* add proclib cuda in processor base
* ppcls add use cuda preprocess api
* ppcls preprocessor set gpu id
* fix pybind
* refine ppcls preprocessing use gpu logic
* FDTensor device id is -1 by default
* refine assert message
Co-authored-by: heliqi <1101791222@qq.com>
* Add InferShape func for all the vision processors
* fix infer shape of limit short
* Fix infer shape bug of stride_pad
* revert modification of processor
* add function pad
* TRT cast int64 to int32
* windows cmake build cuda src
* fix windows cmake error when build cuda src
* add a notice in windows gpu build doc
* cmake add cuda std=11
* TRT cast output from int32 to int64
* nits
* trt get original input output dtype
* Add FDTensor copy and move assignment and constructor
* Upgrade transpose to accept an output tensor that is the same as the input tensor
* Add note
* Add realloc for FDTensor
* Support output equal to input for softmax
* Remove FDTensor::Alloc