* norm and permute batch processing
* move cache to mat, batch processors
* get batched tensor logic, resize-on-CPU logic
* fix CPU compile error
* remove vector mat API
* nits
* add comments
* nits
* fix batch size
* move initial resize-on-CPU option to use_cuda API
* fix pybind
* processor manager pybind
* rename mat and matbatch
* move initial resize on CPU to ppcls preprocessor
---------
Co-authored-by: Jason <jiangjiajun@baidu.com>