	English | 中文
Usage of FastDeploy model multi-thread or multi-process prediction
FastDeploy provides the following multi-thread and multi-process examples for Python and C++ developers:
- Python multi-thread and multi-process prediction example
- C++ multi-thread prediction example
Models that currently support multi-thread and multi-process prediction
| Task type | Description | Model download link |
|---|---|---|
| Detection | Supports PaddleDetection series models | PaddleDetection |
| Segmentation | Supports PaddleSeg series models | PaddleSeg |
| Classification | Supports PaddleClas series models | PaddleClas |
| OCR | Supports PaddleOCR series models | PaddleOCR |
Notice:
- Click a model download link above and download the model from the "Download pre-training model" module.
- OCR is a pipeline model. For its multi-thread examples, refer to the pipeline folder. The multi-thread examples for the other, single-model tasks are in the single_model folder.
Clone model when using multi-thread prediction
The inference process of a vision model consists of three stages:
- Preprocess: load the image and preprocess it to obtain the Tensor that is fed to the model Runtime.
- Infer: the model Runtime receives the Tensor, runs inference, and produces the output Tensor.
- Postprocess: process the Runtime's output Tensor to obtain the final structured information, such as DetectionResult or SegmentationResult.
For these three stages, FastDeploy provides three corresponding classes: Preprocessor, Runtime, and PostProcessor.
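The three stages can be pictured with a toy pipeline. This is only a sketch in plain Python: `ToyPreprocessor`, `ToyRuntime`, `ToyPostprocessor`, and `ToyClassifyResult` are hypothetical stand-ins invented for illustration and are not FastDeploy classes.

```python
from dataclasses import dataclass


@dataclass
class ToyClassifyResult:
    """Hypothetical structured result, loosely modelled on a classification output."""
    label_id: int
    score: float


class ToyPreprocessor:
    def run(self, image):
        # Preprocess stage: scale 0..255 pixel values to 0..1.
        return [pixel / 255.0 for pixel in image]


class ToyRuntime:
    def __init__(self, weights):
        # One row of weights per class (illustrative only).
        self.weights = weights

    def infer(self, tensor):
        # Infer stage: one dot product per class.
        return [sum(w * x for w, x in zip(row, tensor)) for row in self.weights]


class ToyPostprocessor:
    def run(self, scores):
        # Postprocess stage: pick the best class and wrap it in a result object.
        best = max(range(len(scores)), key=scores.__getitem__)
        return ToyClassifyResult(label_id=best, score=scores[best])


def predict(image, preprocessor, runtime, postprocessor):
    # Chain the three stages in order.
    return postprocessor.run(runtime.infer(preprocessor.run(image)))
```

Calling `predict(image, ToyPreprocessor(), ToyRuntime(weights), ToyPostprocessor())` runs the stages in sequence, which is the separation that makes it possible to reason about thread safety stage by stage.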
When using FastDeploy for multi-thread inference, several issues should be considered:
- Can the Preprocessor, Runtime, and PostProcessor each support parallel processing?
- Given support for multi-thread concurrency, can memory or GPU memory usage be minimized?
FastDeploy handles multi-thread inference by copying the objects, so that each thread has its own independent instances of Preprocessor, Runtime, and PostProcessor. To reduce memory usage, the Runtime copies share a single copy of the model weights, which reduces the memory overhead of holding multiple objects.
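The idea of independent instances that share one copy of the weights can be sketched in a few lines of plain Python. `ToyWeights` and `ToyModel` are hypothetical names used only to illustrate the concept; they say nothing about how FastDeploy implements it internally.

```python
class ToyWeights:
    """Hypothetical read-only weight store shared by all clones."""

    def __init__(self, values):
        self.values = values


class ToyModel:
    def __init__(self, weights):
        self.weights = weights  # shared between clones, never modified
        self.scratch = []       # per-instance working buffer

    def clone(self):
        # The clone references the same weights object but gets its own scratch buffer.
        return ToyModel(self.weights)

    def predict(self, x):
        # Only per-instance state is written during prediction.
        self.scratch = [w * x for w in self.weights.values]
        return sum(self.scratch)
```

Because each clone writes only to its own `scratch` buffer and only reads the shared weights, clones can run in separate threads without locking, while the large weight data exists once.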
FastDeploy provides the following interfaces to clone a model (taking PaddleClas as an example):
- Python: PaddleClasModel.clone()
- C++: PaddleClasModel::Clone()
Python
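import cv2
import fastdeploy as fd

# model_file, params_file, config_file and image are paths supplied by the caller
option = fd.RuntimeOption()
model = fd.vision.classification.PaddleClasModel(model_file,
                                                 params_file,
                                                 config_file,
                                                 runtime_option=option)
# Clone the model so that another thread can use its own instance
model2 = model.clone()
im = cv2.imread(image)
res = model.predict(im)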
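A minimal sketch of how one clone per thread might be wired up with the standard `threading` module follows. The helper `predict_in_threads` is hypothetical and not part of FastDeploy; it only assumes that the object passed in exposes `clone()` and `predict()`, as `PaddleClasModel` does in the Python snippet here.

```python
import threading


def predict_in_threads(model, images, num_threads):
    """Run model.predict over images, giving every worker thread its own model instance."""
    results = [None] * len(images)

    def worker(thread_model, indices):
        # Each thread only touches its own model instance and its own result slots.
        for i in indices:
            results[i] = thread_model.predict(images[i])

    threads = []
    for t in range(num_threads):
        # The first thread reuses the original model; the others work on clones.
        thread_model = model if t == 0 else model.clone()
        # Interleaved split: thread t handles images t, t + num_threads, ...
        indices = range(t, len(images), num_threads)
        thread = threading.Thread(target=worker, args=(thread_model, indices))
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()
    return results
```

With a real model this could be called as `predict_in_threads(model, [cv2.imread(p) for p in image_paths], num_threads=4)`, where `image_paths` is a placeholder list of image files.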
C++
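// model_file, params_file, config_file, image_file and option are supplied by the caller
auto model = fastdeploy::vision::classification::PaddleClasModel(model_file,
                                                                 params_file,
                                                                 config_file,
                                                                 option);
// Clone the model so that another thread can use its own instance
auto model2 = model.Clone();
auto im = cv::imread(image_file);
fastdeploy::vision::ClassifyResult res;
model.Predict(im, &res);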
Notice: For the APIs of other models, refer to the official C++ documentation and the official Python documentation.
Python multi-thread and multi-process
Because of Python's Global Interpreter Lock (GIL), multi-threading cannot make full use of hardware resources in computationally intensive scenarios. Python examples are therefore provided for both multi-process and multi-thread prediction. Their similarities and differences are as follows:
Comparison of multi-process and multi-thread inference with FastDeploy models
| | resource usage | computationally intensive | I/O intensive | inter-process or inter-thread communication |
|---|---|---|---|---|
| multi-process | large | fast | fast | slow |
| multi-thread | small | slow | relatively fast | fast |
Note: The comparison above is theoretical. In practice, Python already optimizes some computing tasks; for example, NumPy computations can run in parallel across threads. In addition, aggregating results from multiple processes requires inter-process communication, which is time-consuming. It is also hard to tell in advance whether a task is computationally intensive or I/O intensive, so both approaches should be tested on the actual task.
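Since the outcome depends on the task, a small timing helper can make the comparison concrete. The sketch below uses only the standard library; `run_with_pool` and `cpu_bound` are hypothetical names, and `cpu_bound` is a stand-in workload rather than a FastDeploy prediction.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound(n):
    # Stand-in workload: pure-Python arithmetic that holds the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_with_pool(pool_cls, fn, items, workers):
    """Map fn over items with the given executor class; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with pool_cls(max_workers=workers) as pool:
        results = list(pool.map(fn, items))
    return results, time.perf_counter() - start


# Thread-based run; results come back in the same order as the inputs.
results, elapsed = run_with_pool(ThreadPoolExecutor, cpu_bound, [10, 100], workers=2)
```

To time the multi-process variant, pass `concurrent.futures.ProcessPoolExecutor` as `pool_cls` instead. In that case the call should sit under an `if __name__ == "__main__":` guard and the worker function must be defined at module level so that it can be pickled.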
C++ multi-thread
C++ multi-threading uses few resources and runs fast, so multi-thread inference is the best choice in C++.
C++ memory usage comparison: multi-thread with and without Clone
Hardware: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
Model: ResNet50_vd_infer
Backend: CPU OPENVINO Backend
Memory usage when initializing multiple models in a single process
| Number of models | After model.Clone() | After model->predict() with model.Clone() | After initializing the model without model.Clone() | After model->predict() without model.Clone() |
|---|---|---|---|---|
| 1 | 322M | 325M | 322M | 325M | 
| 2 | 322M | 325M | 559M | 560M | 
| 3 | 322M | 325M | 771M | 771M | 
Memory usage with multiple threads
| Number of threads | After model.Clone() | After model->predict() with model.Clone() | After initializing the model without model.Clone() | After model->predict() without model.Clone() |
|---|---|---|---|---|
| 1 | 322M | 337M | 322M | 337M | 
| 2 | 322M | 343M | 548M | 566M | 
| 3 | 322M | 347M | 752M | 784M |
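The saving from Clone in the multi-thread table can be worked out directly from the "after predict" columns. The short calculation below only restates the table's numbers and assumes that "M" means megabytes.

```python
# (threads, memory after predict with Clone, memory after predict without Clone),
# copied from the multi-thread table; unit assumed to be MB.
rows = [(1, 337, 337), (2, 343, 566), (3, 347, 784)]


def saving_percent(with_clone, without_clone):
    # Share of the no-Clone memory that is saved by using Clone.
    return round(100 * (without_clone - with_clone) / without_clone, 1)


savings = {threads: saving_percent(w, wo) for threads, w, wo in rows}
print(savings)  # {1: 0.0, 2: 39.4, 3: 55.7}
```

With one thread there is nothing to share, while at three threads the cloned setup uses roughly 56% less memory than three independently initialized models.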