mirror of https://github.com/PaddlePaddle/FastDeploy.git
synced 2025-10-27 18:41:02 +08:00

[Doc] Update cpp benchmark docs for CPU/GPU (#1377)

* [Benchmark] Add benchmark precision API
* [Benchmark] Calculate the stats of diff
* [Benchmark] Add SplitDataLine utils
* [Benchmark] Add LexSortByXY func
* [Benchmark] Add LexSortDetectionResultByXY func
* [Benchmark] Add tensor diff precision test
* [Benchmark] Fix conflicts and tensor diff calculation
* Fix build bugs and CI bugs when WITH_TESTING=ON
* [Doc] Add and update cpp benchmark docs

137  benchmark/cpp/README.md  Normal file

@@ -0,0 +1,137 @@
# FastDeploy C++ Benchmarks

## 1. Build Options

The following build options are benchmark-related and must be enabled when building the SDK used to run the benchmarks.

| Option | Required Value | Description |
|---|---|---|
| ENABLE_BENCHMARK | ON | Defaults to OFF; enables benchmark mode |
| ENABLE_VISION | ON | Defaults to OFF; builds the deployment modules for vision models |
| ENABLE_TEXT | ON | Defaults to OFF; builds the deployment modules for text (NLP) models |

To run the FastDeploy C++ benchmarks, first prepare the corresponding environment and build the FastDeploy C++ SDK from source with ENABLE_BENCHMARK=ON. The sections below describe the system requirements per hardware target. For detailed requirements in each environment, see the [FastDeploy environment requirements](../../docs/cn/build_and_install).
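After configuring the build, the required options can be double-checked against the CMake cache. A minimal sketch, assuming a standard `CMakeCache.txt` in the build directory; the helper names are hypothetical, not part of FastDeploy:

```bash
# check_option: succeed iff a boolean option is ON in a CMakeCache.txt
# (cache entries use CMake's VAR:TYPE=VALUE format).
check_option() {
  local cache="$1" opt="$2"
  grep -q "^${opt}:BOOL=ON$" "$cache"
}

# check_all: verify the three benchmark-related options in one pass.
check_all() {
  local cache="$1" opt
  for opt in ENABLE_BENCHMARK ENABLE_VISION ENABLE_TEXT; do
    check_option "$cache" "$opt" || { echo "not ON: $opt" >&2; return 1; }
  done
}
```

Typical use: `check_all build/CMakeCache.txt` before running `make`.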
## 2. Benchmark Parameters

<div id="参数设置说明"></div>

| Parameter | Description |
| -------------------- | ------------------------------------------ |
| --model | Path to the model |
| --image | Path to the test image |
| --device | Device to run on: CPU/GPU/XPU; defaults to CPU |
| --cpu_thread_nums | Number of CPU threads; defaults to 8 |
| --device_id | GPU/XPU device id; defaults to 0 |
| --warmup | Number of warmup runs before benchmarking; defaults to 200 |
| --repeat | Number of benchmark iterations; defaults to 1000 |
| --profile_mode | Which stage to profile: one of `[runtime, end2end]`; defaults to runtime |
| --include_h2d_d2h | Whether to include H2D+D2H transfer time in the measurement; only effective when profile_mode is runtime; defaults to false |
| --backend | Backend type: default, ort, ov, trt, paddle, paddle_trt, lite, etc. With default, the best backend is chosen automatically; setting an explicit backend is recommended. Defaults to default |
| --use_fp16 | Whether to enable FP16; currently only effective for the trt, paddle_trt, and lite backends; defaults to false |
| --collect_memory_info | Whether to record CPU/GPU memory info; defaults to false |
| --sampling_interval | Sampling interval for recording CPU/GPU memory info, in ms; defaults to 50 |
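As a small illustration of the `--include_h2d_d2h` constraint in the table, a hypothetical wrapper script might validate the two flags before launching the benchmark (the helper name and behavior are assumptions, not part of the tool):

```bash
# validate_flags: --include_h2d_d2h only takes effect when --profile_mode is
# runtime, so warn and fail on the meaningless combination.
validate_flags() {
  local profile_mode="$1" include_h2d_d2h="$2"
  if [ "$include_h2d_d2h" = "true" ] && [ "$profile_mode" != "runtime" ]; then
    echo "warning: --include_h2d_d2h has no effect with --profile_mode ${profile_mode}" >&2
    return 1
  fi
  return 0
}
```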
## 3. Running Benchmarks on X86_64 CPU and NVIDIA GPU

### 3.1 Environment Setup

Building on Linux requires:

- gcc/g++ >= 5.4 (8.2 recommended)
- cmake >= 3.18.0
- CUDA >= 11.2
- cuDNN >= 8.2
- TensorRT >= 8.5

Building FastDeploy for GPU requires a working CUDA environment and TensorRT; see the [GPU build documentation](https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/cn/build_and_install/gpu.md) for details.
### 3.2 Build the FastDeploy C++ SDK

```bash
# Build the SDK from source
git clone https://github.com/PaddlePaddle/FastDeploy.git -b develop
cd FastDeploy
mkdir build && cd build
# -DENABLE_BENCHMARK=ON enables benchmark mode
cmake .. -DWITH_GPU=ON \
         -DENABLE_ORT_BACKEND=ON \
         -DENABLE_PADDLE_BACKEND=ON \
         -DENABLE_OPENVINO_BACKEND=ON \
         -DENABLE_TRT_BACKEND=ON \
         -DENABLE_VISION=ON \
         -DENABLE_TEXT=ON \
         -DENABLE_BENCHMARK=ON \
         -DTRT_DIRECTORY=/Paddle/TensorRT-8.5.2.2 \
         -DCUDA_DIRECTORY=/usr/local/cuda \
         -DCMAKE_INSTALL_PREFIX=${PWD}/compiled_fastdeploy_sdk

make -j12
make install

# Point FD_GPU_SDK at the installed SDK
cd ..
export FD_GPU_SDK=${PWD}/build/compiled_fastdeploy_sdk
```
### 3.3 Build the Benchmark Examples

```bash
cd benchmark/cpp
mkdir build && cd build
cmake .. -DFASTDEPLOY_INSTALL_DIR=${FD_GPU_SDK}
make -j4
```
### 3.4 Run the Benchmark Examples

On X86 CPU + NVIDIA GPU, FastDeploy currently supports multiple inference backends. The following uses PaddleYOLOv8 as an example to produce benchmark numbers for multiple backends on CPU/GPU.

- Download the model file and test image

```bash
wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov8_s_500e_coco.tgz
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
tar -zxvf yolov8_s_500e_coco.tgz
```
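Before running, it can help to confirm the unpacked model directory and test image are actually in place. A minimal sketch; the `require` helper is hypothetical, and the path names follow the downloads above:

```bash
# require: fail loudly if a needed input path is missing.
require() {
  [ -e "$1" ] || { echo "missing input: $1" >&2; return 1; }
}

# Typical use before benchmarking (paths from the downloads above):
#   require yolov8_s_500e_coco && require 000000014439.jpg
```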
- Run the yolov8 benchmark example

```bash
# Profile performance
# CPU
# Paddle Inference
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --device cpu --cpu_thread_nums 8 --backend paddle --profile_mode runtime

# ONNX Runtime
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --device cpu --cpu_thread_nums 8 --backend ort --profile_mode runtime

# OpenVINO
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --device cpu --cpu_thread_nums 8 --backend ov --profile_mode runtime

# GPU
# Paddle Inference
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --device gpu --device_id 0 --backend paddle --profile_mode runtime --warmup 200 --repeat 2000

# Paddle Inference + TensorRT
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --device gpu --device_id 0 --backend paddle_trt --profile_mode runtime --warmup 200 --repeat 2000

# Paddle Inference + TensorRT + FP16
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --device gpu --device_id 0 --backend paddle_trt --profile_mode runtime --warmup 200 --repeat 2000 --use_fp16

# ONNX Runtime
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --device gpu --device_id 0 --backend ort --profile_mode runtime --warmup 200 --repeat 2000

# TensorRT
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --device gpu --device_id 0 --backend trt --profile_mode runtime --warmup 200 --repeat 2000

# TensorRT + FP16
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --device gpu --device_id 0 --backend trt --profile_mode runtime --warmup 200 --repeat 2000 --use_fp16

# Collect CPU/GPU memory usage
# add the --collect_memory_info flag
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --device cpu --cpu_thread_nums 8 --backend paddle --profile_mode runtime --collect_memory_info
```

Note: to avoid skewing the timing statistics, it is best not to enable memory collection while profiling performance. When --collect_memory_info is given, only the memory figures are stable and reliable. For more options, see the [parameter reference](#参数设置说明).
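The per-backend invocations above can also be generated from one helper. A sketch that only prints the commands, so it stays runnable without the binary; the helper name and echo-only behavior are assumptions, not part of the repository:

```bash
# make_cmd: compose a CPU benchmark command line for one backend,
# mirroring the CPU invocations listed above.
make_cmd() {
  echo "./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --device cpu --cpu_thread_nums 8 --backend $1 --profile_mode runtime"
}

# Print the whole CPU sweep (pipe to `sh` to actually run it):
for b in paddle ort ov; do
  make_cmd "$b"
done
```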
## 4. Running Benchmarks on ARM CPU

- TODO

## 5. Running Benchmarks on Kunlunxin XPU

- TODO
```diff
@@ -63,6 +63,7 @@ static void PrintUsage() {
 }
 
 static void PrintBenchmarkInfo() {
+#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
   // Get model name
   std::vector<std::string> model_names;
   fastdeploy::benchmark::Split(FLAGS_model, model_names, sep);
@@ -97,5 +98,6 @@ static void PrintBenchmarkInfo() {
          << "ms" << std::endl;
   }
   std::cout << ss.str() << std::endl;
+#endif
   return;
 }
```
0  benchmark/cpp/run_benchmark_ppyolov8.sh  Normal file
```diff
@@ -2,8 +2,8 @@
 
 Before running the benchmarks, confirm the following two steps:
 
-* 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](..//docs/cn/build_and_install/download_prebuilt_libraries.md)
-* 2. The FastDeploy Python wheel is installed; see [FastDeploy Python installation](../docs/cn/build_and_install/download_prebuilt_libraries.md)
+* 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../docs/cn/build_and_install/download_prebuilt_libraries.md)
+* 2. The FastDeploy Python wheel is installed; see [FastDeploy Python installation](../../docs/cn/build_and_install/download_prebuilt_libraries.md)
 
 FastDeploy currently supports multiple inference backends. The following uses PaddleClas MobileNetV1 as an example to produce benchmark numbers for multiple backends on CPU/GPU.
```