FastDeploy/benchmark/README.md
WJJ1995 4d2fbcb030 Add Benchmark readme (#236)
Co-authored-by: Jason <jiangjiajun@baidu.com>
2022-09-15 21:36:10 +08:00

# FastDeploy Benchmarks
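Before running the benchmarks, confirm the following two prerequisites:

* 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../docs/environment.md).
* 2. The FastDeploy Python whl package is installed; see [FastDeploy Python installation](../docs/quick_start).

FastDeploy currently supports multiple inference backends. The example below uses PaddleClas MobileNetV1 to produce benchmark data for each backend on CPU and GPU.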
```bash
# Download the MobileNetV1 model
wget https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV1_x0_25_infer.tgz
tar -xvf MobileNetV1_x0_25_infer.tgz
# Download the test image
wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg
# CPU
# Paddle Inference
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend paddle
# ONNX Runtime
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend ort
# OpenVINO
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend ov
# GPU
# Paddle Inference
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend paddle
# ONNX Runtime
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend ort
# TensorRT
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend trt
# TensorRT fp16
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend trt --enable_trt_fp16 True
```
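
The seven commands above differ only in device, backend, and the FP16 flag, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of the repository: `build_commands` is a hypothetical helper, and it uses only the flags and values that appear in the commands above.

```python
import shlex

# Backends per device, taken from the commands in this README.
CPU_BACKENDS = ["paddle", "ort", "ov"]
GPU_BACKENDS = ["paddle", "ort", "trt"]


def build_commands(model, image, iter_num=2000, cpu_num_thread=8):
    """Return one argv list per benchmark run listed in this README (hypothetical helper)."""
    base = ["python", "benchmark_ppcls.py", "--model", model, "--image", image]
    commands = []
    for backend in CPU_BACKENDS:
        commands.append(base + ["--cpu_num_thread", str(cpu_num_thread),
                                "--iter_num", str(iter_num), "--backend", backend])
    for backend in GPU_BACKENDS:
        commands.append(base + ["--device", "gpu",
                                "--iter_num", str(iter_num), "--backend", backend])
    # TensorRT with FP16 enabled is a separate run.
    commands.append(base + ["--device", "gpu", "--iter_num", str(iter_num),
                            "--backend", "trt", "--enable_trt_fp16", "True"])
    return commands


if __name__ == "__main__":
    for cmd in build_commands("MobileNetV1_x0_25_infer", "ILSVRC2012_val_00000010.jpeg"):
        # Printed only; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before spending time on 2000-iteration runs.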
**Parameter descriptions**

| Parameter            | Description                                      |
| -------------------- | ------------------------------------------------ |
| --model              | Model path                                       |
| --image              | Image path                                       |
| --device             | Run on CPU or GPU; defaults to CPU               |
| --cpu_num_thread     | Number of CPU threads                            |
| --device_id          | GPU device ID                                    |
| --iter_num           | Number of benchmark iterations                   |
| --backend            | Backend type; one of ort, ov, trt, paddle        |
| --enable_trt_fp16    | Whether to enable FP16 when the backend is trt   |

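
To make the table concrete, here is a hedged sketch of how these flags could be declared with `argparse`. It is not the actual parser in `benchmark_ppcls.py`; `build_parser` and `str2bool` are hypothetical names, and every default except `--device` (CPU, per the table) is an assumption made for illustration.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings; plain bool() would treat 'False' as True."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical mirror of the flags in the table above."""
    parser = argparse.ArgumentParser(description="Illustrative benchmark flags")
    parser.add_argument("--model", required=True, help="Model path")
    parser.add_argument("--image", required=True, help="Image path")
    parser.add_argument("--device", default="cpu", choices=["cpu", "gpu"],
                        help="Run on CPU or GPU (default CPU, per the table)")
    parser.add_argument("--cpu_num_thread", type=int, default=8,
                        help="Number of CPU threads (default is an assumption)")
    parser.add_argument("--device_id", type=int, default=0,
                        help="GPU device ID (default is an assumption)")
    parser.add_argument("--iter_num", type=int, default=2000,
                        help="Number of benchmark iterations (default is an assumption)")
    parser.add_argument("--backend", choices=["ort", "ov", "trt", "paddle"],
                        help="Backend type")
    parser.add_argument("--enable_trt_fp16", type=str2bool, default=False,
                        help="Enable FP16 when the backend is trt")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--model", "MobileNetV1_x0_25_infer", "--image", "ILSVRC2012_val_00000010.jpeg",
         "--device", "gpu", "--backend", "trt", "--enable_trt_fp16", "True"])
    print(args)
```

The explicit `str2bool` converter matters because the command passes the literal string `True`; with `type=bool`, the string `False` would also evaluate to true.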
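
**Final txt results**

To merge all txt files in the current directory and convert them into a structured form, run the following commands: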
```bash
# Merge all txt files
cat *.txt >> ./result_ppcls.txt
# Convert to structured results
python convert_info.py --txt_path result_ppcls.txt --domain ppcls
```
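
Because `*.txt` also matches `result_ppcls.txt` and any `struct_*.txt` files that may be left over from an earlier run, repeating the `cat` step can fold old output back into the merged file. The sketch below shows one way to do the merge in Python while skipping those files. `merge_txt` is a hypothetical helper, not part of the repository, and unlike `cat ... >>` it overwrites the output instead of appending to it.

```python
from pathlib import Path


def merge_txt(directory, output_name="result_ppcls.txt", skip_prefixes=("struct_",)):
    """Concatenate the .txt files in `directory` into `output_name` (hypothetical helper).

    The output file itself and files starting with any of `skip_prefixes`
    are excluded, so re-running the merge does not duplicate earlier results.
    Returns the names of the files that were merged, in sorted order.
    """
    directory = Path(directory)
    sources = sorted(
        p for p in directory.glob("*.txt")
        if p.name != output_name and not p.name.startswith(skip_prefixes)
    )
    with (directory / output_name).open("w", encoding="utf-8") as out:
        for path in sources:
            text = path.read_text(encoding="utf-8")
            # Make sure each file ends with a newline so records do not run together.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return [p.name for p in sources]


if __name__ == "__main__":
    print(merge_txt("."))
```

The merged file can then be passed to `convert_info.py` exactly as in the commands above.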
This produces the CPU results in ```struct_cpu_ppcls.txt``` and the GPU results in ```struct_gpu_ppcls.txt```, as shown below:
```bash
# struct_cpu_ppcls.txt
model_name thread_nums ort_run ort_end2end cpu_rss_mb ov_run ov_end2end cpu_rss_mb paddle_run paddle_end2end cpu_rss_mb
MobileNetV1_x0_25 8 1.18 3.27 270.43 0.87 1.98 272.26 3.13 5.29 899.57
# struct_gpu_ppcls.txt
model_name ort_run ort_end2end gpu_rss_mb paddle_run paddle_end2end gpu_rss_mb trt_run trt_end2end gpu_rss_mb trt_fp16_run trt_fp16_end2end gpu_rss_mb
MobileNetV1_x0_25 1.25 3.24 677.06 2.00 3.77 945.06 0.67 2.66 851.06 0.53 2.46 839.06
```
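**Result notes**

* The ```_run``` suffix is the time for a single inference, including the H2D and D2H copies; the ```_end2end``` suffix is the time including pre- and post-processing.
* ```cpu_rss_mb``` is host memory usage; ```gpu_rss_mb``` is GPU memory usage.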
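
The structured files repeat the `cpu_rss_mb` / `gpu_rss_mb` column name once per backend, so a plain header-to-value mapping would lose data. The sketch below groups the columns by backend instead. `parse_struct` is a hypothetical helper, not part of the repository, and it assumes (from the column order in the sample output) that each memory column belongs to the backend whose `_run` / `_end2end` columns precede it.

```python
def parse_struct(text):
    """Parse struct_*.txt content into {model: {backend: {metric: value}}}.

    Assumption: each *_rss_mb column belongs to the most recent backend
    seen in a *_run or *_end2end column to its left.
    """
    lines = [ln.split() for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    header, rows = lines[0], lines[1:]
    results = {}
    for row in rows:
        model_name = None
        backends = {}
        current = None
        for name, value in zip(header, row):
            if name == "model_name":
                model_name = value
            elif name.endswith("_end2end"):
                current = name[: -len("_end2end")]
                backends.setdefault(current, {})["end2end"] = float(value)
            elif name.endswith("_run"):
                current = name[: -len("_run")]
                backends.setdefault(current, {})["run"] = float(value)
            elif name.endswith("_rss_mb") and current is not None:
                backends[current]["rss_mb"] = float(value)
            # Other columns (for example thread_nums) are ignored in this sketch.
        results[model_name] = backends
    return results


# Sample copied from the struct_cpu_ppcls.txt output shown above.
CPU_SAMPLE = """\
model_name thread_nums ort_run ort_end2end cpu_rss_mb ov_run ov_end2end cpu_rss_mb paddle_run paddle_end2end cpu_rss_mb
MobileNetV1_x0_25 8 1.18 3.27 270.43 0.87 1.98 272.26 3.13 5.29 899.57
"""

if __name__ == "__main__":
    for model, backends in parse_struct(CPU_SAMPLE).items():
        for backend, m in backends.items():
            # end2end minus run is the share spent in pre- and post-processing.
            overhead = m["end2end"] - m["run"]
            print(f"{model} {backend}: run={m['run']} end2end={m['end2end']} "
                  f"pre/post={overhead:.2f} rss_mb={m['rss_mb']}")
```

Subtracting `run` from `end2end` isolates the pre- and post-processing cost, which is useful when comparing backends whose inference times are already close.
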
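
If there are multiple PaddleClas models, create a ```ppcls_model``` directory in the current directory, place all models in it, and run the following command: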
```bash
sh run_benchmark_ppcls.sh
```
This produces the benchmark data for all models on both CPU and GPU in a single step.