* norm and permute batch processing
* move cache to mat, batch processors
* get batched tensor logic, resize on cpu logic
* fix cpu compile error
* remove vector mat api
* nits
* add comments
* nits
* fix batch size
* move initial resize on cpu option to use_cuda api
* fix pybind
* processor manager pybind
* rename mat and matbatch
* move initial resize on cpu to ppcls preprocessor
---------
Co-authored-by: Jason <jiangjiajun@baidu.com>
* cvcuda resize
* cvcuda center crop
* cvcuda resize
* add an fdtensor in fdmat
* get cv mat and get tensor support gpu
* paddleclas cvcuda preprocessor
* fix compile error
* fix windows compile error
* rename reused to cached
* address comment
* remove debug code
* add comment
* add manager run
* use cuda and cuda used
* use cv cuda doc
* address comment
---------
Co-authored-by: Jason <jiangjiajun@baidu.com>