Files
FastDeploy/docs/model_compression/quant.md
yunyaoXYY 1efc0fa6b0 Add Quantization Function. (#256)
* Add PaddleOCR Support

* Add PaddleOCR Support

* Add PaddleOCRv3 Support

* Add PaddleOCRv3 Support

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Add PaddleOCRv3 Support

* Add PaddleOCRv3 Supports

* Add PaddleOCRv3 Suport

* Fix Rec diff

* Remove useless functions

* Remove useless comments

* Add PaddleOCRv2 Support

* Add PaddleOCRv3 & PaddleOCRv2 Support

* remove useless parameters

* Add utils of sorting det boxes

* Fix code naming convention

* Fix code naming convention

* Fix code naming convention

* Fix bug in the Classify process

* Imporve OCR Readme

* Fix diff in Cls model

* Update Model Download Link in Readme

* Fix diff in PPOCRv2

* Improve OCR readme

* Imporve OCR readme

* Improve OCR readme

* Improve OCR readme

* Imporve OCR readme

* Improve OCR readme

* Fix conflict

* Add readme for OCRResult

* Improve OCR readme

* Add OCRResult readme

* Improve OCR readme

* Improve OCR readme

* Add Model Quantization Demo

* Fix Model Quantization Readme

* Fix Model Quantization Readme

* Add the function to do PTQ quantization

* Improve quant tools readme

* Improve quant tool readme

* Improve quant tool readme

* Add PaddleInference-GPU for OCR Rec model

* Add QAT method to fastdeploy-quantization tool

* Remove examples/slim for now

* Move configs folder

* Add Quantization Support for Classification Model

* Imporve ways of importing preprocess

* Upload YOLO Benchmark on readme

* Upload YOLO Benchmark on readme

* Upload YOLO Benchmark on readme

* Improve Quantization configs and readme

* Add support for multi-inputs model
2022-10-08 15:45:28 +08:00

38 lines
2.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# FastDeploy 支持量化模型部署
量化是一种流行的模型压缩方法,量化后的模型拥有更小的体积和更快的推理速度. FastDeploy支持部署量化后的模型帮助用户实现推理加速.
## 1. FastDeploy 多个引擎支持量化模型部署
当前FastDeploy中多个推理后端可以在不同硬件上支持量化模型的部署. 支持情况如下:
| 硬件/推理后端 | ONNXRuntime | Paddle Inference | TensorRT |
| :-----------| :-------- | :--------------- | :------- |
| CPU | 支持 | 支持 | |
| GPU | | | 支持 |
## 2. 用户如何量化模型
### 量化方式
用户可以通过PaddleSlim来量化模型, 量化主要有量化训练和离线量化两种方式, 量化训练通过模型训练来获得量化模型, 离线量化不需要模型训练即可完成模型的量化. FastDeploy 对两种方式产出的量化模型均能部署.
两种方法的主要对比如下表所示:
| 量化方法 | 量化过程耗时 | 量化模型精度 | 模型体积 | 推理速度 |
| :-----------| :--------| :-------| :------- | :------- |
| 离线量化 | 无需训练,耗时短 | 比量化训练稍低 | 两者一致 | 两者一致 |
| 量化训练 | 需要训练,耗时高 | 较未量化模型有少量损失 | 两者一致 |两者一致 |
### 用户使用fastdeploy_quant命令量化模型
Fastdeploy 为用户提供了一键模型量化的功能,请参考如下文档进行模型量化.
- [FastDeploy 一键模型量化](../../tools/quantization/)
当用户获得产出的量化模型之后即可以使用FastDeploy来部署量化模型.
## 3. FastDeploy 部署量化模型
用户只需要简单地传入量化后的模型路径及相应参数即可以使用FastDeploy进行部署.
具体请用户参考示例文档:
- [YOLOv5s 量化模型Python部署](../../examples/slim/yolov5s/python/)
- [YOLOv5s 量化模型C++部署](../../examples/slim/yolov5s/cpp/)
- [YOLOv6s 量化模型Python部署](../../examples/slim/yolov6s/python/)
- [YOLOv6s 量化模型C++部署](../../examples/slim/yolov6s/cpp/)
- [YOLOv7 量化模型Python部署](../../examples/slim/yolov7/python/)
- [YOLOv7 量化模型C++部署](../../examples/slim/yolov7/cpp/)