Revert "[Benchmark]Benchmark cpp for YOLOv5" (#1250)
Revert "[Benchmark]Benchmark cpp for YOLOv5 (#1224)"
This reverts commit c487359e33.
@@ -1,111 +0,0 @@
# FastDeploy Benchmarks

Before running the benchmarks, confirm the following two steps:

* 1. The hardware and software environment meets the requirements, see [FastDeploy environment requirements](../docs/cn/build_and_install/download_prebuilt_libraries.md)
* 2. The FastDeploy Python wheel is installed, see [FastDeploy Python installation](../docs/cn/build_and_install/download_prebuilt_libraries.md)

FastDeploy currently supports multiple inference backends. Taking PaddleClas MobileNetV1 as an example, the commands below collect benchmark data for each backend on CPU/GPU.

```bash
# Download the MobileNetV1 model
wget https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV1_x0_25_infer.tgz
tar -xvf MobileNetV1_x0_25_infer.tgz

# Download the test image
wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg

# CPU
# Paddle Inference
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend paddle

# ONNX Runtime
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend ort

# OpenVINO
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend ov

# GPU
# Paddle Inference
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend paddle

# Paddle Inference + TensorRT
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend paddle_trt

# Paddle Inference + TensorRT FP16
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend paddle_trt --enable_trt_fp16 True

# ONNX Runtime
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend ort

# TensorRT
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend trt

# TensorRT FP16
python benchmark_ppcls.py --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend trt --enable_trt_fp16 True
```

**Parameter description**

| Parameter | Description |
| -------------------- | ------------------------------------------ |
| --model | Path of the model directory |
| --image | Path of the test image |
| --device | Run on CPU or GPU, defaults to CPU |
| --cpu_num_thread | Number of CPU threads |
| --device_id | GPU card id |
| --iter_num | Number of benchmark iterations |
| --backend | Backend type, one of the five options: ort, ov, trt, paddle, paddle_trt |
| --enable_trt_fp16 | Whether to enable FP16 when the backend is trt or paddle_trt |
| --enable_collect_memory_info | Whether to record CPU/GPU memory info, defaults to False |

**Final txt results**

To aggregate all txt files in the current directory and convert them into structured results, run the following commands:

```bash
# Aggregate
cat *.txt >> ./result_ppcls.txt

# Structure the information
python convert_info.py --txt_path result_ppcls.txt --domain ppcls --enable_collect_memory_info True
```

This produces the CPU results ```struct_cpu_ppcls.txt``` and the GPU results ```struct_gpu_ppcls.txt```, shown below:

```bash
# struct_cpu_ppcls.txt
model_name thread_nums ort_run ort_end2end cpu_rss_mb ov_run ov_end2end cpu_rss_mb paddle_run paddle_end2end cpu_rss_mb
MobileNetV1_x0_25 8 1.18 3.27 270.43 0.87 1.98 272.26 3.13 5.29 899.57

# struct_gpu_ppcls.txt
model_name ort_run ort_end2end gpu_rss_mb paddle_run paddle_end2end gpu_rss_mb trt_run trt_end2end gpu_rss_mb trt_fp16_run trt_fp16_end2end gpu_rss_mb
MobileNetV1_x0_25 1.25 3.24 677.06 2.00 3.77 945.06 0.67 2.66 851.06 0.53 2.46 839.06
```

**Notes on the results**

* The ```_run``` suffix is the latency of a single infer call, including H2D and D2H copies; the ```_end2end``` suffix additionally includes pre- and post-processing.
* ```cpu_rss_mb``` is the host memory usage; ```gpu_rss_mb``` is the GPU memory usage.
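For a quick look at these structured files, a plain whitespace-delimited parse is enough. The snippet below is a minimal, hypothetical helper (not part of the benchmark scripts) that assumes the layout shown above: one header row followed by one row per model.

```python
# Minimal sketch: read a struct_*_ppcls.txt file produced by convert_info.py.
# Assumes the whitespace-separated layout shown above; read_struct_txt is a
# hypothetical helper, not part of the FastDeploy benchmark scripts.
def read_struct_txt(path):
    with open(path) as f:
        rows = [line.split() for line in f if line.strip()]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]

records = read_struct_txt("struct_cpu_ppcls.txt")
for rec in records:
    print(rec["model_name"], "ort_run(ms):", rec["ort_run"])
```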

If you have multiple PaddleClas models, create a ppcls_model directory in the current directory, put all the models into it, and run the following command:

```bash
sh run_benchmark_ppcls.sh
```

This produces the benchmark data for all models on both CPU and GPU in one go.

**Adding a new device**

If a new device has been added and you want to benchmark it, take ```ipu``` as an example:

Add an ```ipu``` choice to ```--device``` in the corresponding benchmark script and enable it via ```option.use_ipu()``` (a sketch follows this section).

Then run the following command to start the benchmark:

```shell
python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --iter_num 2000 --backend paddle --device ipu
```
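As a concrete illustration of the step above, here is a hedged sketch (not the committed implementation) of how an ```ipu``` branch could sit next to the existing cpu/gpu branches of ```build_option``` in benchmark_ppcls.py. ```option.use_ipu()``` is the call named in this README; the function name and the paddle-only restriction are assumptions for illustration.

```python
# Sketch only: a possible `ipu` branch for build_option(), mirroring the
# existing cpu/gpu branches. `option.use_ipu()` is the switch named in this
# README; restricting to the paddle backend is an assumption, not a spec.
def extend_option_for_ipu(option, backend):
    option.use_ipu()
    if backend == "paddle":
        option.use_paddle_backend()
    elif backend == "default":
        return option
    else:
        raise Exception(
            "While inference with IPU, only default/paddle are supported in this sketch, {} is not.".
            format(backend))
    return option
```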
@@ -1,282 +0,0 @@
import paddlenlp
import numpy as np
from paddlenlp.transformers import AutoTokenizer
from paddlenlp.datasets import load_dataset
import fastdeploy as fd
import os
import time
import distutils.util
import sys
import pynvml
import psutil
import GPUtil
from prettytable import PrettyTable
import multiprocessing


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_dir",
        required=True,
        help="The directory of model and tokenizer.")
    parser.add_argument(
        "--device",
        type=str,
        default='gpu',
        choices=['gpu', 'cpu'],
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default='pp',
        choices=['ort', 'pp', 'trt', 'pp-trt'],
        help="The inference runtime backend.")
    parser.add_argument(
        "--device_id", type=int, default=0, help="device(gpu) id")
    parser.add_argument(
        "--batch_size", type=int, default=32, help="The batch size of data.")
    parser.add_argument(
        "--max_length",
        type=int,
        default=128,
        help="The max length of sequence.")
    parser.add_argument(
        "--log_interval",
        type=int,
        default=10,
        help="The interval of logging.")
    parser.add_argument(
        "--cpu_num_threads",
        type=int,
        default=1,
        help="The number of threads when inferring on cpu.")
    parser.add_argument(
        "--use_fp16",
        type=distutils.util.strtobool,
        default=False,
        help="Use FP16 mode")
    parser.add_argument(
        "--use_fast",
        type=distutils.util.strtobool,
        default=True,
        help="Whether to use fast_tokenizer to accelerate the tokenization.")
    return parser.parse_args()


def create_fd_runtime(args):
    option = fd.RuntimeOption()
    model_path = os.path.join(args.model_dir, "infer.pdmodel")
    params_path = os.path.join(args.model_dir, "infer.pdiparams")
    option.set_model_path(model_path, params_path)
    if args.device == 'cpu':
        option.use_cpu()
        option.set_cpu_thread_num(args.cpu_num_threads)
    else:
        option.use_gpu(args.device_id)
    if args.backend == 'pp':
        option.use_paddle_backend()
    elif args.backend == 'ort':
        option.use_ort_backend()
    else:
        option.use_trt_backend()
        if args.backend == 'pp-trt':
            option.enable_paddle_to_trt()
            option.enable_paddle_trt_collect_shape()
        trt_file = os.path.join(args.model_dir, "infer.trt")
        option.set_trt_input_shape(
            'input_ids',
            min_shape=[1, args.max_length],
            opt_shape=[args.batch_size, args.max_length],
            max_shape=[args.batch_size, args.max_length])
        option.set_trt_input_shape(
            'token_type_ids',
            min_shape=[1, args.max_length],
            opt_shape=[args.batch_size, args.max_length],
            max_shape=[args.batch_size, args.max_length])
        if args.use_fp16:
            option.enable_trt_fp16()
            trt_file = trt_file + ".fp16"
        option.set_trt_cache_file(trt_file)
    return fd.Runtime(option)


def convert_examples_to_data(dataset, batch_size):
    texts, text_pairs, labels = [], [], []
    batch_text, batch_text_pair, batch_label = [], [], []

    for i, item in enumerate(dataset):
        batch_text.append(item['sentence1'])
        batch_text_pair.append(item['sentence2'])
        batch_label.append(item['label'])
        if (i + 1) % batch_size == 0:
            texts.append(batch_text)
            text_pairs.append(batch_text_pair)
            labels.append(batch_label)
            batch_text, batch_text_pair, batch_label = [], [], []
    return texts, text_pairs, labels


def postprocess(logits):
    max_value = np.max(logits, axis=1, keepdims=True)
    exp_data = np.exp(logits - max_value)
    probs = exp_data / np.sum(exp_data, axis=1, keepdims=True)
    out_dict = {
        "label": probs.argmax(axis=-1),
        "confidence": probs.max(axis=-1)
    }
    return out_dict


def get_statistics_table(tokenizer_time_costs, runtime_time_costs,
                         postprocess_time_costs):
    x = PrettyTable()
    x.field_names = [
        "Stage", "Mean latency", "P50 latency", "P90 latency", "P95 latency"
    ]
    x.add_row([
        "Tokenization", f"{np.mean(tokenizer_time_costs):.4f}",
        f"{np.percentile(tokenizer_time_costs, 50):.4f}",
        f"{np.percentile(tokenizer_time_costs, 90):.4f}",
        f"{np.percentile(tokenizer_time_costs, 95):.4f}"
    ])
    x.add_row([
        "Runtime", f"{np.mean(runtime_time_costs):.4f}",
        f"{np.percentile(runtime_time_costs, 50):.4f}",
        f"{np.percentile(runtime_time_costs, 90):.4f}",
        f"{np.percentile(runtime_time_costs, 95):.4f}"
    ])
    x.add_row([
        "Postprocessing", f"{np.mean(postprocess_time_costs):.4f}",
        f"{np.percentile(postprocess_time_costs, 50):.4f}",
        f"{np.percentile(postprocess_time_costs, 90):.4f}",
        f"{np.percentile(postprocess_time_costs, 95):.4f}"
    ])
    return x


def get_current_memory_mb(gpu_id=None):
    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_full_info()
    cpu_mem = info.uss / 1024. / 1024.
    gpu_mem = 0
    if gpu_id is not None:
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_id)
        meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
        gpu_mem = meminfo.used / 1024. / 1024.
    return cpu_mem, gpu_mem


def get_current_gputil(gpu_id):
    GPUs = GPUtil.getGPUs()
    gpu_load = GPUs[gpu_id].load
    return gpu_load


def sample_gpuutil(gpu_id, gpu_utilization=[]):
    while True:
        gpu_utilization.append(get_current_gputil(gpu_id))
        time.sleep(0.01)


def show_statistics(tokenizer_time_costs,
                    runtime_time_costs,
                    postprocess_time_costs,
                    correct_num,
                    total_num,
                    cpu_mem,
                    gpu_mem,
                    gpu_util,
                    prefix=""):
    print(
        f"{prefix}Acc = {correct_num/total_num*100:.2f} ({correct_num}/{total_num})."
        f" CPU memory: {np.mean(cpu_mem):.2f} MB, GPU memory: {np.mean(gpu_mem):.2f} MB,"
        f" GPU utilization {np.max(gpu_util) * 100:.2f}%.")
    print(
        get_statistics_table(tokenizer_time_costs, runtime_time_costs,
                             postprocess_time_costs))


if __name__ == "__main__":
    args = parse_arguments()

    tokenizer = AutoTokenizer.from_pretrained(
        "ernie-3.0-medium-zh", use_faster=args.use_fast)
    runtime = create_fd_runtime(args)
    input_ids_name = runtime.get_input_info(0).name
    token_type_ids_name = runtime.get_input_info(1).name

    test_ds = load_dataset("clue", "afqmc", splits=['dev'])
    texts, text_pairs, labels = convert_examples_to_data(test_ds,
                                                         args.batch_size)
    gpu_id = args.device_id

    def run_inference(warmup_steps=None):
        tokenizer_time_costs = []
        runtime_time_costs = []
        postprocess_time_costs = []
        cpu_mem = []
        gpu_mem = []

        total_num = 0
        correct_num = 0

        # Start the process to sample gpu utilization
        manager = multiprocessing.Manager()
        gpu_util = manager.list()
        p = multiprocessing.Process(
            target=sample_gpuutil, args=(gpu_id, gpu_util))
        p.start()
        for i, (text, text_pair,
                label) in enumerate(zip(texts, text_pairs, labels)):
            start = time.time()
            encoded_inputs = tokenizer(
                text=text,
                text_pair=text_pair,
                max_length=args.max_length,
                padding='max_length',
                truncation=True,
                return_tensors='np')
            tokenizer_time_costs += [(time.time() - start) * 1000]

            start = time.time()
            input_map = {
                input_ids_name: encoded_inputs["input_ids"].astype('int64'),
                token_type_ids_name:
                encoded_inputs["token_type_ids"].astype('int64'),
            }
            results = runtime.infer(input_map)
            runtime_time_costs += [(time.time() - start) * 1000]

            start = time.time()
            output = postprocess(results[0])
            postprocess_time_costs += [(time.time() - start) * 1000]

            cm, gm = get_current_memory_mb(gpu_id)
            cpu_mem.append(cm)
            gpu_mem.append(gm)

            total_num += len(label)
            correct_num += (label == output["label"]).sum()
            if warmup_steps is not None and i >= warmup_steps:
                break
            if (i + 1) % args.log_interval == 0:
                show_statistics(tokenizer_time_costs, runtime_time_costs,
                                postprocess_time_costs, correct_num, total_num,
                                cpu_mem, gpu_mem, gpu_util,
                                f"Step {i + 1: 6d}: ")
        show_statistics(tokenizer_time_costs, runtime_time_costs,
                        postprocess_time_costs, correct_num, total_num,
                        cpu_mem, gpu_mem, gpu_util, f"Final statistics: ")
        p.terminate()

    # Warm up
    print("Warm up")
    run_inference(10)
    print("Start to test the benchmark")
    run_inference()
    print("Finish")
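For reference, the ```postprocess``` helper above turns raw logits into labels and confidences via a numerically stable softmax. The snippet below is a small standalone check of that calculation; the logits are dummy values chosen only for illustration.

```python
import numpy as np

# Standalone check of the softmax-based postprocess used above; the logits
# here are made-up values, for illustration only.
logits = np.array([[2.0, 0.5], [0.1, 1.2]])
max_value = np.max(logits, axis=1, keepdims=True)
exp_data = np.exp(logits - max_value)
probs = exp_data / np.sum(exp_data, axis=1, keepdims=True)
print(probs.argmax(axis=-1))   # predicted labels, e.g. [0 1]
print(probs.max(axis=-1))      # per-sample confidences
```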
@@ -1,325 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import fastdeploy as fd
import cv2
import os
import numpy as np
import time
from tqdm import tqdm


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model", required=True, help="Path of PaddleClas model.")
    parser.add_argument(
        "--image", type=str, required=False, help="Path of test image file.")
    parser.add_argument(
        "--cpu_num_thread",
        type=int,
        default=8,
        help="default number of cpu thread.")
    parser.add_argument(
        "--device_id", type=int, default=0, help="device(gpu) id")
    parser.add_argument(
        "--profile_mode",
        type=str,
        default="runtime",
        help="runtime or end2end.")
    parser.add_argument(
        "--repeat",
        required=True,
        type=int,
        default=1000,
        help="number of repeats for profiling.")
    parser.add_argument(
        "--warmup",
        required=True,
        type=int,
        default=50,
        help="number of warmup for profiling.")
    parser.add_argument(
        "--device",
        default="cpu",
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default="default",
        help="inference backend, default, ort, ov, trt, paddle, paddle_trt.")
    parser.add_argument(
        "--enable_trt_fp16",
        type=ast.literal_eval,
        default=False,
        help="whether enable fp16 in trt backend")
    parser.add_argument(
        "--enable_collect_memory_info",
        type=ast.literal_eval,
        default=False,
        help="whether enable collect memory info")
    parser.add_argument(
        "--include_h2d_d2h",
        type=ast.literal_eval,
        default=False,
        help="whether run profiling with h2d and d2h")
    args = parser.parse_args()
    return args


def build_option(args):
    option = fd.RuntimeOption()
    device = args.device
    backend = args.backend
    enable_trt_fp16 = args.enable_trt_fp16
    if args.profile_mode == "runtime":
        option.enable_profiling(args.include_h2d_d2h, args.repeat, args.warmup)
    option.set_cpu_thread_num(args.cpu_num_thread)
    if device == "gpu":
        option.use_gpu()
        if backend == "ort":
            option.use_ort_backend()
        elif backend == "paddle":
            option.use_paddle_backend()
        elif backend == "ov":
            option.use_openvino_backend()
            option.set_openvino_device(name="GPU")
            # change name and shape for models
            option.set_openvino_shape_info({"x": [1, 3, 224, 224]})
        elif backend in ["trt", "paddle_trt"]:
            option.use_trt_backend()
            if backend == "paddle_trt":
                option.enable_paddle_to_trt()
            if enable_trt_fp16:
                option.enable_trt_fp16()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with GPU, only support default/ort/paddle/trt/paddle_trt now, {} is not supported.".
                format(backend))
    elif device == "cpu":
        if backend == "ort":
            option.use_ort_backend()
        elif backend == "ov":
            option.use_openvino_backend()
        elif backend == "paddle":
            option.use_paddle_backend()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with CPU, only support default/ort/ov/paddle now, {} is not supported.".
                format(backend))
    else:
        raise Exception(
            "Only support device CPU/GPU now, {} is not supported.".format(
                device))

    return option


class StatBase(object):
    """StatBase"""
    nvidia_smi_path = "nvidia-smi"
    gpu_keys = ('index', 'uuid', 'name', 'timestamp', 'memory.total',
                'memory.free', 'memory.used', 'utilization.gpu',
                'utilization.memory')
    nu_opt = ',nounits'
    cpu_keys = ('cpu.util', 'memory.util', 'memory.used')


class Monitor(StatBase):
    """Monitor"""

    def __init__(self, use_gpu=False, gpu_id=0, interval=0.1):
        self.result = {}
        self.gpu_id = gpu_id
        self.use_gpu = use_gpu
        self.interval = interval
        self.cpu_stat_q = multiprocessing.Queue()

    def start(self):
        cmd = '%s --id=%s --query-gpu=%s --format=csv,noheader%s -lms 50' % (
            StatBase.nvidia_smi_path, self.gpu_id, ','.join(StatBase.gpu_keys),
            StatBase.nu_opt)
        if self.use_gpu:
            self.gpu_stat_worker = subprocess.Popen(
                cmd,
                stderr=subprocess.STDOUT,
                stdout=subprocess.PIPE,
                shell=True,
                close_fds=True,
                preexec_fn=os.setsid)
        # cpu stat
        pid = os.getpid()
        self.cpu_stat_worker = multiprocessing.Process(
            target=self.cpu_stat_func,
            args=(self.cpu_stat_q, pid, self.interval))
        self.cpu_stat_worker.start()

    def stop(self):
        try:
            if self.use_gpu:
                os.killpg(self.gpu_stat_worker.pid, signal.SIGUSR1)
            # os.killpg(p.pid, signal.SIGTERM)
            self.cpu_stat_worker.terminate()
            self.cpu_stat_worker.join(timeout=0.01)
        except Exception as e:
            print(e)
            return

        # gpu
        if self.use_gpu:
            lines = self.gpu_stat_worker.stdout.readlines()
            lines = [
                line.strip().decode("utf-8") for line in lines
                if line.strip() != ''
            ]
            gpu_info_list = [{
                k: v
                for k, v in zip(StatBase.gpu_keys, line.split(', '))
            } for line in lines]
            if len(gpu_info_list) == 0:
                return
            result = gpu_info_list[0]
            for item in gpu_info_list:
                for k in item.keys():
                    if k not in ["name", "uuid", "timestamp"]:
                        result[k] = max(int(result[k]), int(item[k]))
                    else:
                        result[k] = max(result[k], item[k])
            self.result['gpu'] = result

        # cpu
        cpu_result = {}
        if self.cpu_stat_q.qsize() > 0:
            cpu_result = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
        while not self.cpu_stat_q.empty():
            item = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
            for k in StatBase.cpu_keys:
                cpu_result[k] = max(cpu_result[k], item[k])
        cpu_result['name'] = cpuinfo.get_cpu_info()['brand_raw']
        self.result['cpu'] = cpu_result

    def output(self):
        return self.result

    def cpu_stat_func(self, q, pid, interval=0.0):
        """cpu stat function"""
        stat_info = psutil.Process(pid)
        while True:
            # pid = os.getpid()
            cpu_util, mem_util, mem_use = stat_info.cpu_percent(
            ), stat_info.memory_percent(), round(stat_info.memory_info().rss /
                                                 1024.0 / 1024.0, 4)
            q.put([cpu_util, mem_util, mem_use])
            time.sleep(interval)
        return


if __name__ == '__main__':

    args = parse_arguments()
    option = build_option(args)
    model_file = os.path.join(args.model, "inference.pdmodel")
    params_file = os.path.join(args.model, "inference.pdiparams")
    config_file = os.path.join(args.model, "inference_cls.yaml")

    gpu_id = args.device_id
    enable_collect_memory_info = args.enable_collect_memory_info
    dump_result = dict()
    cpu_mem = list()
    gpu_mem = list()
    gpu_util = list()
    if args.device == "cpu":
        file_path = args.model + "_model_" + args.backend + "_" + \
            args.device + "_" + str(args.cpu_num_thread) + ".txt"
    else:
        if args.enable_trt_fp16:
            file_path = args.model + "_model_" + \
                args.backend + "_fp16_" + args.device + ".txt"
        else:
            file_path = args.model + "_model_" + args.backend + "_" + args.device + ".txt"
    f = open(file_path, "w")
    f.writelines("===={}====: \n".format(os.path.split(file_path)[-1][:-4]))

    try:
        model = fd.vision.classification.PaddleClasModel(
            model_file, params_file, config_file, runtime_option=option)
        if enable_collect_memory_info:
            import multiprocessing
            import subprocess
            import psutil
            import signal
            import cpuinfo
            enable_gpu = args.device == "gpu"
            monitor = Monitor(enable_gpu, gpu_id)
            monitor.start()

        im_ori = cv2.imread(args.image)
        if args.profile_mode == "runtime":
            result = model.predict(im_ori)
            profile_time = model.get_profile_time()
            dump_result["runtime"] = profile_time * 1000
            f.writelines("Runtime(ms): {} \n".format(
                str(dump_result["runtime"])))
            print("Runtime(ms): {} \n".format(str(dump_result["runtime"])))
        else:
            # end2end
            for i in range(args.warmup):
                result = model.predict(im_ori)

            start = time.time()
            for i in tqdm(range(args.repeat)):
                result = model.predict(im_ori)
            end = time.time()
            dump_result["end2end"] = ((end - start) / args.repeat) * 1000.0
            f.writelines("End2End(ms): {} \n".format(
                str(dump_result["end2end"])))
            print("End2End(ms): {} \n".format(str(dump_result["end2end"])))

        if enable_collect_memory_info:
            monitor.stop()
            mem_info = monitor.output()
            dump_result["cpu_rss_mb"] = mem_info['cpu'][
                'memory.used'] if 'cpu' in mem_info else 0
            dump_result["gpu_rss_mb"] = mem_info['gpu'][
                'memory.used'] if 'gpu' in mem_info else 0
            dump_result["gpu_util"] = mem_info['gpu'][
                'utilization.gpu'] if 'gpu' in mem_info else 0

        if enable_collect_memory_info:
            f.writelines("cpu_rss_mb: {} \n".format(
                str(dump_result["cpu_rss_mb"])))
            f.writelines("gpu_rss_mb: {} \n".format(
                str(dump_result["gpu_rss_mb"])))
            f.writelines("gpu_util: {} \n".format(
                str(dump_result["gpu_util"])))
            print("cpu_rss_mb: {} \n".format(str(dump_result["cpu_rss_mb"])))
            print("gpu_rss_mb: {} \n".format(str(dump_result["gpu_rss_mb"])))
            print("gpu_util: {} \n".format(str(dump_result["gpu_util"])))
    except Exception as e:
        f.writelines("!!!!!Infer Failed\n")
        raise e

    f.close()
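The ```Monitor``` class above is self-contained and can be exercised around any workload, not just the benchmark loop. The sketch below assumes it has been copied into a module together with the conditional imports the script loads (multiprocessing, subprocess, psutil, signal, cpuinfo); the two-second sleep is a stand-in for a real inference loop.

```python
# Hedged usage sketch for the Monitor class defined above: wrap a workload
# and read back peak CPU (and optionally GPU) statistics. Assumes Monitor and
# its imports (multiprocessing, subprocess, psutil, signal, cpuinfo) exist.
import time

monitor = Monitor(use_gpu=False, gpu_id=0)
monitor.start()
time.sleep(2)                 # stand-in for the real inference loop
monitor.stop()
stats = monitor.output()
print(stats.get("cpu", {}))   # peak cpu.util / memory.util / memory.used
```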
@@ -1,384 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import fastdeploy as fd
import cv2
import os
import numpy as np
import time
from tqdm import tqdm


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model", required=True, help="Path of PaddleClas model.")
    parser.add_argument(
        "--image", type=str, required=False, help="Path of test image file.")
    parser.add_argument(
        "--cpu_num_thread",
        type=int,
        default=8,
        help="default number of cpu thread.")
    parser.add_argument(
        "--device_id", type=int, default=0, help="device(gpu) id")
    parser.add_argument(
        "--profile_mode",
        type=str,
        default="runtime",
        help="runtime or end2end.")
    parser.add_argument(
        "--repeat",
        required=True,
        type=int,
        default=1000,
        help="number of repeats for profiling.")
    parser.add_argument(
        "--warmup",
        required=True,
        type=int,
        default=50,
        help="number of warmup for profiling.")
    parser.add_argument(
        "--device",
        default="cpu",
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default="default",
        help="inference backend, default, ort, ov, trt, paddle, paddle_trt.")
    parser.add_argument(
        "--enable_trt_fp16",
        type=ast.literal_eval,
        default=False,
        help="whether enable fp16 in trt backend")
    parser.add_argument(
        "--enable_lite_fp16",
        type=ast.literal_eval,
        default=False,
        help="whether enable fp16 in Paddle Lite backend")
    parser.add_argument(
        "--enable_collect_memory_info",
        type=ast.literal_eval,
        default=False,
        help="whether enable collect memory info")
    parser.add_argument(
        "--include_h2d_d2h",
        type=ast.literal_eval,
        default=False,
        help="whether run profiling with h2d and d2h")
    args = parser.parse_args()
    return args


def build_option(args):
    option = fd.RuntimeOption()
    device = args.device
    backend = args.backend
    enable_trt_fp16 = args.enable_trt_fp16
    enable_lite_fp16 = args.enable_lite_fp16
    if args.profile_mode == "runtime":
        option.enable_profiling(args.include_h2d_d2h, args.repeat, args.warmup)
    option.set_cpu_thread_num(args.cpu_num_thread)
    if device == "gpu":
        option.use_gpu()
        if backend == "ort":
            option.use_ort_backend()
        elif backend == "paddle":
            option.use_paddle_backend()
        elif backend == "ov":
            option.use_openvino_backend()
            # Using GPU and CPU heterogeneous execution mode
            option.set_openvino_device("HETERO:GPU,CPU")
            # change name and shape for models
            option.set_openvino_shape_info({
                "image": [1, 3, 320, 320],
                "scale_factor": [1, 2]
            })
            # Run these operators on CPU
            option.set_openvino_cpu_operators(["MulticlassNms"])
        elif backend in ["trt", "paddle_trt"]:
            option.use_trt_backend()
            if backend == "paddle_trt":
                option.enable_paddle_to_trt()
            if enable_trt_fp16:
                option.enable_trt_fp16()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with GPU, only support default/ort/paddle/trt/paddle_trt now, {} is not supported.".
                format(backend))
    elif device == "cpu":
        if backend == "ort":
            option.use_ort_backend()
        elif backend == "ov":
            option.use_openvino_backend()
        elif backend == "paddle":
            option.use_paddle_backend()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with CPU, only support default/ort/ov/paddle now, {} is not supported.".
                format(backend))
    elif device == "kunlunxin":
        option.use_kunlunxin()
        if backend == "lite":
            option.use_lite_backend()
        elif backend == "ort":
            option.use_ort_backend()
        elif backend == "paddle":
            option.use_paddle_backend()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with KunlunXin, only support default/ort/lite/paddle now, {} is not supported.".
                format(backend))
    elif device == "ascend":
        option.use_ascend()
        if backend == "lite":
            option.use_lite_backend()
            if enable_lite_fp16:
                option.enable_lite_fp16()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with Ascend, only support default/lite now, {} is not supported.".
                format(backend))
    else:
        raise Exception(
            "Only support device CPU/GPU/Kunlunxin/Ascend now, {} is not supported.".
            format(device))

    return option


class StatBase(object):
    """StatBase"""
    nvidia_smi_path = "nvidia-smi"
    gpu_keys = ('index', 'uuid', 'name', 'timestamp', 'memory.total',
                'memory.free', 'memory.used', 'utilization.gpu',
                'utilization.memory')
    nu_opt = ',nounits'
    cpu_keys = ('cpu.util', 'memory.util', 'memory.used')


class Monitor(StatBase):
    """Monitor"""

    def __init__(self, use_gpu=False, gpu_id=0, interval=0.1):
        self.result = {}
        self.gpu_id = gpu_id
        self.use_gpu = use_gpu
        self.interval = interval
        self.cpu_stat_q = multiprocessing.Queue()

    def start(self):
        cmd = '%s --id=%s --query-gpu=%s --format=csv,noheader%s -lms 50' % (
            StatBase.nvidia_smi_path, self.gpu_id, ','.join(StatBase.gpu_keys),
            StatBase.nu_opt)
        if self.use_gpu:
            self.gpu_stat_worker = subprocess.Popen(
                cmd,
                stderr=subprocess.STDOUT,
                stdout=subprocess.PIPE,
                shell=True,
                close_fds=True,
                preexec_fn=os.setsid)
        # cpu stat
        pid = os.getpid()
        self.cpu_stat_worker = multiprocessing.Process(
            target=self.cpu_stat_func,
            args=(self.cpu_stat_q, pid, self.interval))
        self.cpu_stat_worker.start()

    def stop(self):
        try:
            if self.use_gpu:
                os.killpg(self.gpu_stat_worker.pid, signal.SIGUSR1)
            # os.killpg(p.pid, signal.SIGTERM)
            self.cpu_stat_worker.terminate()
            self.cpu_stat_worker.join(timeout=0.01)
        except Exception as e:
            print(e)
            return

        # gpu
        if self.use_gpu:
            lines = self.gpu_stat_worker.stdout.readlines()
            lines = [
                line.strip().decode("utf-8") for line in lines
                if line.strip() != ''
            ]
            gpu_info_list = [{
                k: v
                for k, v in zip(StatBase.gpu_keys, line.split(', '))
            } for line in lines]
            if len(gpu_info_list) == 0:
                return
            result = gpu_info_list[0]
            for item in gpu_info_list:
                for k in item.keys():
                    if k not in ["name", "uuid", "timestamp"]:
                        result[k] = max(int(result[k]), int(item[k]))
                    else:
                        result[k] = max(result[k], item[k])
            self.result['gpu'] = result

        # cpu
        cpu_result = {}
        if self.cpu_stat_q.qsize() > 0:
            cpu_result = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
        while not self.cpu_stat_q.empty():
            item = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
            for k in StatBase.cpu_keys:
                cpu_result[k] = max(cpu_result[k], item[k])
        cpu_result['name'] = cpuinfo.get_cpu_info()['brand_raw']
        self.result['cpu'] = cpu_result

    def output(self):
        return self.result

    def cpu_stat_func(self, q, pid, interval=0.0):
        """cpu stat function"""
        stat_info = psutil.Process(pid)
        while True:
            # pid = os.getpid()
            cpu_util, mem_util, mem_use = stat_info.cpu_percent(
            ), stat_info.memory_percent(), round(stat_info.memory_info().rss /
                                                 1024.0 / 1024.0, 4)
            q.put([cpu_util, mem_util, mem_use])
            time.sleep(interval)
        return


if __name__ == '__main__':

    args = parse_arguments()
    option = build_option(args)
    model_file = os.path.join(args.model, "model.pdmodel")
    params_file = os.path.join(args.model, "model.pdiparams")
    config_file = os.path.join(args.model, "infer_cfg.yml")

    gpu_id = args.device_id
    enable_collect_memory_info = args.enable_collect_memory_info
    dump_result = dict()
    cpu_mem = list()
    gpu_mem = list()
    gpu_util = list()
    if args.device == "cpu":
        file_path = args.model + "_model_" + args.backend + "_" + \
            args.device + "_" + str(args.cpu_num_thread) + ".txt"
    else:
        if args.enable_trt_fp16:
            file_path = args.model + "_model_" + args.backend + "_fp16_" + args.device + ".txt"
        else:
            file_path = args.model + "_model_" + args.backend + "_" + args.device + ".txt"
    f = open(file_path, "w")
    f.writelines("===={}====: \n".format(os.path.split(file_path)[-1][:-4]))

    try:
        if "ppyoloe" in args.model:
            model = fd.vision.detection.PPYOLOE(
                model_file, params_file, config_file, runtime_option=option)
        elif "picodet" in args.model:
            model = fd.vision.detection.PicoDet(
                model_file, params_file, config_file, runtime_option=option)
        elif "yolox" in args.model:
            model = fd.vision.detection.PaddleYOLOX(
                model_file, params_file, config_file, runtime_option=option)
        elif "yolov3" in args.model:
            model = fd.vision.detection.YOLOv3(
                model_file, params_file, config_file, runtime_option=option)
        elif "yolov8" in args.model:
            model = fd.vision.detection.PaddleYOLOv8(
                model_file, params_file, config_file, runtime_option=option)
        elif "ppyolo_r50vd_dcn_1x_coco" in args.model or "ppyolov2_r101vd_dcn_365e_coco" in args.model:
            model = fd.vision.detection.PPYOLO(
                model_file, params_file, config_file, runtime_option=option)
        elif "faster_rcnn" in args.model:
            model = fd.vision.detection.FasterRCNN(
                model_file, params_file, config_file, runtime_option=option)
        else:
            raise Exception("model {} not support now in ppdet series".format(
                args.model))
        if enable_collect_memory_info:
            import multiprocessing
            import subprocess
            import psutil
            import signal
            import cpuinfo
            enable_gpu = args.device == "gpu"
            monitor = Monitor(enable_gpu, gpu_id)
            monitor.start()

        im_ori = cv2.imread(args.image)
        if args.profile_mode == "runtime":
            result = model.predict(im_ori)
            profile_time = model.get_profile_time()
            dump_result["runtime"] = profile_time * 1000
            f.writelines("Runtime(ms): {} \n".format(
                str(dump_result["runtime"])))
            print("Runtime(ms): {} \n".format(str(dump_result["runtime"])))
        else:
            # end2end
            for i in range(args.warmup):
                result = model.predict(im_ori)

            start = time.time()
            for i in tqdm(range(args.repeat)):
                result = model.predict(im_ori)
            end = time.time()
            dump_result["end2end"] = ((end - start) / args.repeat) * 1000.0
            f.writelines("End2End(ms): {} \n".format(
                str(dump_result["end2end"])))
            print("End2End(ms): {} \n".format(str(dump_result["end2end"])))

        if enable_collect_memory_info:
            monitor.stop()
            mem_info = monitor.output()
            dump_result["cpu_rss_mb"] = mem_info['cpu'][
                'memory.used'] if 'cpu' in mem_info else 0
            dump_result["gpu_rss_mb"] = mem_info['gpu'][
                'memory.used'] if 'gpu' in mem_info else 0
            dump_result["gpu_util"] = mem_info['gpu'][
                'utilization.gpu'] if 'gpu' in mem_info else 0

        if enable_collect_memory_info:
            f.writelines("cpu_rss_mb: {} \n".format(
                str(dump_result["cpu_rss_mb"])))
            f.writelines("gpu_rss_mb: {} \n".format(
                str(dump_result["gpu_rss_mb"])))
            f.writelines("gpu_util: {} \n".format(
                str(dump_result["gpu_util"])))
            print("cpu_rss_mb: {} \n".format(str(dump_result["cpu_rss_mb"])))
            print("gpu_rss_mb: {} \n".format(str(dump_result["gpu_rss_mb"])))
            print("gpu_util: {} \n".format(str(dump_result["gpu_util"])))
    except Exception as e:
        f.writelines("!!!!!Infer Failed\n")
        raise e

    f.close()
@@ -1,380 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import fastdeploy as fd
import cv2
import os
import numpy as np
import time


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_dir", required=True, help="Model dir of PPOCR.")
    parser.add_argument(
        "--det_model", required=True, help="Path of Detection model of PPOCR.")
    parser.add_argument(
        "--cls_model",
        required=True,
        help="Path of Classification model of PPOCR.")
    parser.add_argument(
        "--rec_model",
        required=True,
        help="Path of Recognition model of PPOCR.")
    parser.add_argument(
        "--rec_label_file",
        required=True,
        help="Path of Recognition label file of PPOCR.")
    parser.add_argument(
        "--image", type=str, required=False, help="Path of test image file.")
    parser.add_argument(
        "--cpu_num_thread",
        type=int,
        default=8,
        help="default number of cpu thread.")
    parser.add_argument(
        "--device_id", type=int, default=0, help="device(gpu) id")
    parser.add_argument(
        "--iter_num",
        required=True,
        type=int,
        default=300,
        help="number of iterations for computing performance.")
    parser.add_argument(
        "--device",
        default="cpu",
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default="default",
        help="inference backend, default, ort, ov, trt, paddle, paddle_trt.")
    parser.add_argument(
        "--enable_trt_fp16",
        type=ast.literal_eval,
        default=False,
        help="whether enable fp16 in trt backend")
    parser.add_argument(
        "--enable_collect_memory_info",
        type=ast.literal_eval,
        default=False,
        help="whether enable collect memory info")
    args = parser.parse_args()
    return args


def build_option(args):
    option = fd.RuntimeOption()
    device = args.device
    backend = args.backend
    enable_trt_fp16 = args.enable_trt_fp16
    option.set_cpu_thread_num(args.cpu_num_thread)
    if device == "gpu":
        option.use_gpu()
        if backend == "ort":
            option.use_ort_backend()
        elif backend == "paddle":
            option.use_paddle_backend()
        elif backend in ["trt", "paddle_trt"]:
            option.use_trt_backend()
            if backend == "paddle_trt":
                option.enable_paddle_trt_collect_shape()
                option.enable_paddle_to_trt()
            if enable_trt_fp16:
                option.enable_trt_fp16()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with GPU, only support default/ort/paddle/trt/paddle_trt now, {} is not supported.".
                format(backend))
    elif device == "cpu":
        if backend == "ort":
            option.use_ort_backend()
        elif backend == "ov":
            option.use_openvino_backend()
        elif backend == "paddle":
            option.use_paddle_backend()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with CPU, only support default/ort/ov/paddle now, {} is not supported.".
                format(backend))
    else:
        raise Exception(
            "Only support device CPU/GPU now, {} is not supported.".format(
                device))

    return option


class StatBase(object):
    """StatBase"""
    nvidia_smi_path = "nvidia-smi"
    gpu_keys = ('index', 'uuid', 'name', 'timestamp', 'memory.total',
                'memory.free', 'memory.used', 'utilization.gpu',
                'utilization.memory')
    nu_opt = ',nounits'
    cpu_keys = ('cpu.util', 'memory.util', 'memory.used')


class Monitor(StatBase):
    """Monitor"""

    def __init__(self, use_gpu=False, gpu_id=0, interval=0.1):
        self.result = {}
        self.gpu_id = gpu_id
        self.use_gpu = use_gpu
        self.interval = interval
        self.cpu_stat_q = multiprocessing.Queue()

    def start(self):
        cmd = '%s --id=%s --query-gpu=%s --format=csv,noheader%s -lms 50' % (
            StatBase.nvidia_smi_path, self.gpu_id, ','.join(StatBase.gpu_keys),
            StatBase.nu_opt)
        if self.use_gpu:
            self.gpu_stat_worker = subprocess.Popen(
                cmd,
                stderr=subprocess.STDOUT,
                stdout=subprocess.PIPE,
                shell=True,
                close_fds=True,
                preexec_fn=os.setsid)
        # cpu stat
        pid = os.getpid()
        self.cpu_stat_worker = multiprocessing.Process(
            target=self.cpu_stat_func,
            args=(self.cpu_stat_q, pid, self.interval))
        self.cpu_stat_worker.start()

    def stop(self):
        try:
            if self.use_gpu:
                os.killpg(self.gpu_stat_worker.pid, signal.SIGUSR1)
            # os.killpg(p.pid, signal.SIGTERM)
            self.cpu_stat_worker.terminate()
            self.cpu_stat_worker.join(timeout=0.01)
        except Exception as e:
            print(e)
            return

        # gpu
        if self.use_gpu:
            lines = self.gpu_stat_worker.stdout.readlines()
            lines = [
                line.strip().decode("utf-8") for line in lines
                if line.strip() != ''
            ]
            gpu_info_list = [{
                k: v
                for k, v in zip(StatBase.gpu_keys, line.split(', '))
            } for line in lines]
            if len(gpu_info_list) == 0:
                return
            result = gpu_info_list[0]
            for item in gpu_info_list:
                for k in item.keys():
                    if k not in ["name", "uuid", "timestamp"]:
                        result[k] = max(int(result[k]), int(item[k]))
                    else:
                        result[k] = max(result[k], item[k])
            self.result['gpu'] = result

        # cpu
        cpu_result = {}
        if self.cpu_stat_q.qsize() > 0:
            cpu_result = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
        while not self.cpu_stat_q.empty():
            item = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
            for k in StatBase.cpu_keys:
                cpu_result[k] = max(cpu_result[k], item[k])
        cpu_result['name'] = cpuinfo.get_cpu_info()['brand_raw']
        self.result['cpu'] = cpu_result

    def output(self):
        return self.result

    def cpu_stat_func(self, q, pid, interval=0.0):
        """cpu stat function"""
        stat_info = psutil.Process(pid)
        while True:
            # pid = os.getpid()
            cpu_util, mem_util, mem_use = stat_info.cpu_percent(
            ), stat_info.memory_percent(), round(stat_info.memory_info().rss /
                                                 1024.0 / 1024.0, 4)
            q.put([cpu_util, mem_util, mem_use])
            time.sleep(interval)
        return


if __name__ == '__main__':

    args = parse_arguments()
    option = build_option(args)
    # Detection Model
    det_model_file = os.path.join(args.model_dir, args.det_model,
                                  "inference.pdmodel")
    det_params_file = os.path.join(args.model_dir, args.det_model,
                                   "inference.pdiparams")
    # Classification Model
    cls_model_file = os.path.join(args.model_dir, args.cls_model,
                                  "inference.pdmodel")
    cls_params_file = os.path.join(args.model_dir, args.cls_model,
                                   "inference.pdiparams")
    # Recognition Model
    rec_model_file = os.path.join(args.model_dir, args.rec_model,
                                  "inference.pdmodel")
    rec_params_file = os.path.join(args.model_dir, args.rec_model,
                                   "inference.pdiparams")
    rec_label_file = os.path.join(args.model_dir, args.rec_label_file)

    gpu_id = args.device_id
    enable_collect_memory_info = args.enable_collect_memory_info
    dump_result = dict()
    end2end_statis = list()
    cpu_mem = list()
    gpu_mem = list()
    gpu_util = list()
    if args.device == "cpu":
        file_path = args.model_dir + "_model_" + args.backend + "_" + \
            args.device + "_" + str(args.cpu_num_thread) + ".txt"
    else:
        if args.enable_trt_fp16:
            file_path = args.model_dir + "_model_" + args.backend + "_fp16_" + args.device + ".txt"
        else:
            file_path = args.model_dir + "_model_" + args.backend + "_" + args.device + ".txt"
    f = open(file_path, "w")
    f.writelines("===={}====: \n".format(os.path.split(file_path)[-1][:-4]))

    try:
        if "OCRv2" in args.model_dir:
            det_option = option
            if args.backend in ["trt", "paddle_trt"]:
                det_option.set_trt_input_shape(
                    "x", [1, 3, 64, 64], [1, 3, 640, 640], [1, 3, 960, 960])
            det_model = fd.vision.ocr.DBDetector(
                det_model_file, det_params_file, runtime_option=det_option)
            cls_option = option
            if args.backend in ["trt", "paddle_trt"]:
                cls_option.set_trt_input_shape(
                    "x", [1, 3, 48, 10], [10, 3, 48, 320], [64, 3, 48, 1024])
            cls_model = fd.vision.ocr.Classifier(
                cls_model_file, cls_params_file, runtime_option=cls_option)
            rec_option = option
            if args.backend in ["trt", "paddle_trt"]:
                rec_option.set_trt_input_shape(
                    "x", [1, 3, 32, 10], [10, 3, 32, 320], [32, 3, 32, 2304])
            rec_model = fd.vision.ocr.Recognizer(
                rec_model_file,
                rec_params_file,
                rec_label_file,
                runtime_option=rec_option)
            model = fd.vision.ocr.PPOCRv2(
                det_model=det_model, cls_model=cls_model, rec_model=rec_model)
        elif "OCRv3" in args.model_dir:
            det_option = option
            if args.backend in ["trt", "paddle_trt"]:
                det_option.set_trt_input_shape(
                    "x", [1, 3, 64, 64], [1, 3, 640, 640], [1, 3, 960, 960])
            det_model = fd.vision.ocr.DBDetector(
                det_model_file, det_params_file, runtime_option=det_option)
            cls_option = option
            if args.backend in ["trt", "paddle_trt"]:
                cls_option.set_trt_input_shape(
                    "x", [1, 3, 48, 10], [10, 3, 48, 320], [64, 3, 48, 1024])
            cls_model = fd.vision.ocr.Classifier(
                cls_model_file, cls_params_file, runtime_option=cls_option)
            rec_option = option
            if args.backend in ["trt", "paddle_trt"]:
                rec_option.set_trt_input_shape(
                    "x", [1, 3, 48, 10], [10, 3, 48, 320], [64, 3, 48, 2304])
            rec_model = fd.vision.ocr.Recognizer(
                rec_model_file,
                rec_params_file,
                rec_label_file,
                runtime_option=rec_option)
            model = fd.vision.ocr.PPOCRv3(
                det_model=det_model, cls_model=cls_model, rec_model=rec_model)
        else:
            raise Exception("model {} not support now in ppocr series".format(
                args.model_dir))
        if enable_collect_memory_info:
            import multiprocessing
            import subprocess
            import psutil
            import signal
            import cpuinfo
            enable_gpu = args.device == "gpu"
            monitor = Monitor(enable_gpu, gpu_id)
            monitor.start()

        det_model.enable_record_time_of_runtime()
        cls_model.enable_record_time_of_runtime()
        rec_model.enable_record_time_of_runtime()
        im_ori = cv2.imread(args.image)
        for i in range(args.iter_num):
            im = im_ori
            start = time.time()
            result = model.predict(im)
            end2end_statis.append(time.time() - start)

        runtime_statis_det = det_model.print_statis_info_of_runtime()
        runtime_statis_cls = cls_model.print_statis_info_of_runtime()
        runtime_statis_rec = rec_model.print_statis_info_of_runtime()

        warmup_iter = args.iter_num // 5
        end2end_statis_repeat = end2end_statis[warmup_iter:]
        if enable_collect_memory_info:
            monitor.stop()
            mem_info = monitor.output()
            dump_result["cpu_rss_mb"] = mem_info['cpu'][
                'memory.used'] if 'cpu' in mem_info else 0
            dump_result["gpu_rss_mb"] = mem_info['gpu'][
                'memory.used'] if 'gpu' in mem_info else 0
            dump_result["gpu_util"] = mem_info['gpu'][
                'utilization.gpu'] if 'gpu' in mem_info else 0

        dump_result["runtime"] = (
            runtime_statis_det["avg_time"] + runtime_statis_cls["avg_time"] +
            runtime_statis_rec["avg_time"]) * 1000
        dump_result["end2end"] = np.mean(end2end_statis_repeat) * 1000

        f.writelines("Runtime(ms): {} \n".format(str(dump_result["runtime"])))
        f.writelines("End2End(ms): {} \n".format(str(dump_result["end2end"])))
        print("Runtime(ms): {} \n".format(str(dump_result["runtime"])))
        print("End2End(ms): {} \n".format(str(dump_result["end2end"])))
        if enable_collect_memory_info:
            f.writelines("cpu_rss_mb: {} \n".format(
                str(dump_result["cpu_rss_mb"])))
            f.writelines("gpu_rss_mb: {} \n".format(
                str(dump_result["gpu_rss_mb"])))
            f.writelines("gpu_util: {} \n".format(
                str(dump_result["gpu_util"])))
            print("cpu_rss_mb: {} \n".format(str(dump_result["cpu_rss_mb"])))
            print("gpu_rss_mb: {} \n".format(str(dump_result["gpu_rss_mb"])))
            print("gpu_util: {} \n".format(str(dump_result["gpu_util"])))
    except:
        f.writelines("!!!!!Infer Failed\n")

    f.close()
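In the end-to-end path of the OCR benchmark above, the first ```iter_num // 5``` iterations are treated as warmup and dropped before averaging. The snippet below restates that calculation on synthetic timings (the values are illustrative only, not measured numbers).

```python
import numpy as np

# Restates the end-to-end averaging used above: drop the first iter_num // 5
# timings as warmup, then report the mean in milliseconds. Timings here are
# synthetic, for illustration only.
iter_num = 10
end2end_statis = [0.030, 0.012, 0.011] + [0.010] * (iter_num - 3)  # seconds
warmup_iter = iter_num // 5
end2end_statis_repeat = end2end_statis[warmup_iter:]
print("End2End(ms):", np.mean(end2end_statis_repeat) * 1000)
```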
@@ -1,317 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import fastdeploy as fd
import cv2
import os
import numpy as np
import time


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model", required=True, help="Path of PaddleSeg model.")
    parser.add_argument(
        "--image", type=str, required=False, help="Path of test image file.")
    parser.add_argument(
        "--cpu_num_thread",
        type=int,
        default=8,
        help="default number of cpu thread.")
    parser.add_argument(
        "--device_id", type=int, default=0, help="device(gpu) id")
    parser.add_argument(
        "--iter_num",
        required=True,
        type=int,
        default=300,
        help="number of iterations for computing performance.")
    parser.add_argument(
        "--device",
        default="cpu",
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default="default",
        help="inference backend, default, ort, ov, trt, paddle, paddle_trt.")
    parser.add_argument(
        "--enable_trt_fp16",
        type=ast.literal_eval,
        default=False,
        help="whether enable fp16 in trt backend")
    parser.add_argument(
        "--enable_collect_memory_info",
        type=ast.literal_eval,
        default=False,
        help="whether enable collect memory info")
    args = parser.parse_args()
    return args


def build_option(args):
    option = fd.RuntimeOption()
    device = args.device
    backend = args.backend
    enable_trt_fp16 = args.enable_trt_fp16
    option.set_cpu_thread_num(args.cpu_num_thread)
    if device == "gpu":
        option.use_gpu()
        if backend == "ort":
            option.use_ort_backend()
        elif backend == "paddle":
            option.use_paddle_backend()
        elif backend == "ov":
            option.use_openvino_backend()
            option.set_openvino_device(name="GPU")  # use gpu
            # change name and shape for models
            option.set_openvino_shape_info({"x": [1, 3, 512, 512]})
        elif backend in ["trt", "paddle_trt"]:
            option.use_trt_backend()
            if "Deeplabv3_ResNet101" in args.model or "FCN_HRNet_W18" in args.model or "Unet_cityscapes" in args.model or "PP_LiteSeg_B_STDC2_cityscapes" in args.model:
                option.set_trt_input_shape("x", [1, 3, 1024, 2048],
                                           [1, 3, 1024, 2048],
                                           [1, 3, 1024, 2048])
            elif "Portrait_PP_HumanSegV2_Lite_256x144" in args.model:
                option.set_trt_input_shape("x", [1, 3, 144, 256],
                                           [1, 3, 144, 256], [1, 3, 144, 256])
            elif "PP_HumanSegV1_Server" in args.model:
                option.set_trt_input_shape("x", [1, 3, 512, 512],
                                           [1, 3, 512, 512], [1, 3, 512, 512])
            else:
                option.set_trt_input_shape("x", [1, 3, 192, 192],
                                           [1, 3, 192, 192], [1, 3, 192, 192])
            if backend == "paddle_trt":
                option.enable_paddle_trt_collect_shape()
                option.enable_paddle_to_trt()
            if enable_trt_fp16:
                option.enable_trt_fp16()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with GPU, only support default/ort/paddle/trt/paddle_trt now, {} is not supported.".
                format(backend))
    elif device == "cpu":
        if backend == "ort":
            option.use_ort_backend()
        elif backend == "ov":
            option.use_openvino_backend()
        elif backend == "paddle":
            option.use_paddle_backend()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with CPU, only support default/ort/ov/paddle now, {} is not supported.".
                format(backend))
    else:
        raise Exception(
            "Only support device CPU/GPU now, {} is not supported.".format(
                device))

    return option


class StatBase(object):
    """StatBase"""
    nvidia_smi_path = "nvidia-smi"
    gpu_keys = ('index', 'uuid', 'name', 'timestamp', 'memory.total',
                'memory.free', 'memory.used', 'utilization.gpu',
                'utilization.memory')
    nu_opt = ',nounits'
    cpu_keys = ('cpu.util', 'memory.util', 'memory.used')


class Monitor(StatBase):
    """Monitor"""

    def __init__(self, use_gpu=False, gpu_id=0, interval=0.1):
        self.result = {}
        self.gpu_id = gpu_id
        self.use_gpu = use_gpu
        self.interval = interval
        self.cpu_stat_q = multiprocessing.Queue()

    def start(self):
        cmd = '%s --id=%s --query-gpu=%s --format=csv,noheader%s -lms 50' % (
            StatBase.nvidia_smi_path, self.gpu_id, ','.join(StatBase.gpu_keys),
            StatBase.nu_opt)
        if self.use_gpu:
            self.gpu_stat_worker = subprocess.Popen(
                cmd,
                stderr=subprocess.STDOUT,
                stdout=subprocess.PIPE,
                shell=True,
                close_fds=True,
                preexec_fn=os.setsid)
        # cpu stat
        pid = os.getpid()
        self.cpu_stat_worker = multiprocessing.Process(
            target=self.cpu_stat_func,
            args=(self.cpu_stat_q, pid, self.interval))
        self.cpu_stat_worker.start()

    def stop(self):
        try:
            if self.use_gpu:
                os.killpg(self.gpu_stat_worker.pid, signal.SIGUSR1)
            # os.killpg(p.pid, signal.SIGTERM)
            self.cpu_stat_worker.terminate()
            self.cpu_stat_worker.join(timeout=0.01)
        except Exception as e:
            print(e)
            return

        # gpu
        if self.use_gpu:
            lines = self.gpu_stat_worker.stdout.readlines()
            lines = [
                line.strip().decode("utf-8") for line in lines
                if line.strip() != ''
            ]
            gpu_info_list = [{
                k: v
                for k, v in zip(StatBase.gpu_keys, line.split(', '))
            } for line in lines]
            if len(gpu_info_list) == 0:
                return
            result = gpu_info_list[0]
            for item in gpu_info_list:
                for k in item.keys():
                    if k not in ["name", "uuid", "timestamp"]:
                        result[k] = max(int(result[k]), int(item[k]))
                    else:
                        result[k] = max(result[k], item[k])
            self.result['gpu'] = result

        # cpu
        cpu_result = {}
        if self.cpu_stat_q.qsize() > 0:
            cpu_result = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
        while not self.cpu_stat_q.empty():
            item = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
            for k in StatBase.cpu_keys:
                cpu_result[k] = max(cpu_result[k], item[k])
        cpu_result['name'] = cpuinfo.get_cpu_info()['brand_raw']
        self.result['cpu'] = cpu_result

    def output(self):
        return self.result

    def cpu_stat_func(self, q, pid, interval=0.0):
        """cpu stat function"""
|
||||
stat_info = psutil.Process(pid)
|
||||
while True:
|
||||
# pid = os.getpid()
|
||||
cpu_util, mem_util, mem_use = stat_info.cpu_percent(
|
||||
), stat_info.memory_percent(), round(stat_info.memory_info().rss /
|
||||
1024.0 / 1024.0, 4)
|
||||
q.put([cpu_util, mem_util, mem_use])
|
||||
time.sleep(interval)
|
||||
return
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
||||
args = parse_arguments()
|
||||
option = build_option(args)
|
||||
model_file = os.path.join(args.model, "model.pdmodel")
|
||||
params_file = os.path.join(args.model, "model.pdiparams")
|
||||
config_file = os.path.join(args.model, "deploy.yaml")
|
||||
|
||||
gpu_id = args.device_id
|
||||
enable_collect_memory_info = args.enable_collect_memory_info
|
||||
dump_result = dict()
|
||||
end2end_statis = list()
|
||||
cpu_mem = list()
|
||||
gpu_mem = list()
|
||||
gpu_util = list()
|
||||
if args.device == "cpu":
|
||||
file_path = args.model + "_model_" + args.backend + "_" + \
|
||||
args.device + "_" + str(args.cpu_num_thread) + ".txt"
|
||||
else:
|
||||
if args.enable_trt_fp16:
|
||||
file_path = args.model + "_model_" + \
|
||||
args.backend + "_fp16_" + args.device + ".txt"
|
||||
else:
|
||||
file_path = args.model + "_model_" + args.backend + "_" + args.device + ".txt"
|
||||
f = open(file_path, "w")
|
||||
f.writelines("===={}====: \n".format(os.path.split(file_path)[-1][:-4]))
|
||||
|
||||
try:
|
||||
model = fd.vision.segmentation.PaddleSegModel(
|
||||
model_file, params_file, config_file, runtime_option=option)
|
||||
if enable_collect_memory_info:
|
||||
import multiprocessing
|
||||
import subprocess
|
||||
import psutil
|
||||
import signal
|
||||
import cpuinfo
|
||||
enable_gpu = args.device == "gpu"
|
||||
monitor = Monitor(enable_gpu, gpu_id)
|
||||
monitor.start()
|
||||
|
||||
model.enable_record_time_of_runtime()
|
||||
im_ori = cv2.imread(args.image)
|
||||
for i in range(args.iter_num):
|
||||
im = im_ori
|
||||
start = time.time()
|
||||
result = model.predict(im)
|
||||
end2end_statis.append(time.time() - start)
|
||||
|
||||
runtime_statis = model.print_statis_info_of_runtime()
|
||||
|
||||
warmup_iter = args.iter_num // 5
|
||||
end2end_statis_repeat = end2end_statis[warmup_iter:]
|
||||
if enable_collect_memory_info:
|
||||
monitor.stop()
|
||||
mem_info = monitor.output()
|
||||
dump_result["cpu_rss_mb"] = mem_info['cpu'][
|
||||
'memory.used'] if 'cpu' in mem_info else 0
|
||||
dump_result["gpu_rss_mb"] = mem_info['gpu'][
|
||||
'memory.used'] if 'gpu' in mem_info else 0
|
||||
dump_result["gpu_util"] = mem_info['gpu'][
|
||||
'utilization.gpu'] if 'gpu' in mem_info else 0
|
||||
|
||||
dump_result["runtime"] = runtime_statis["avg_time"] * 1000
|
||||
dump_result["end2end"] = np.mean(end2end_statis_repeat) * 1000
|
||||
|
||||
f.writelines("Runtime(ms): {} \n".format(str(dump_result["runtime"])))
|
||||
f.writelines("End2End(ms): {} \n".format(str(dump_result["end2end"])))
|
||||
print("Runtime(ms): {} \n".format(str(dump_result["runtime"])))
|
||||
print("End2End(ms): {} \n".format(str(dump_result["end2end"])))
|
||||
if enable_collect_memory_info:
|
||||
f.writelines("cpu_rss_mb: {} \n".format(
|
||||
str(dump_result["cpu_rss_mb"])))
|
||||
f.writelines("gpu_rss_mb: {} \n".format(
|
||||
str(dump_result["gpu_rss_mb"])))
|
||||
f.writelines("gpu_util: {} \n".format(
|
||||
str(dump_result["gpu_util"])))
|
||||
print("cpu_rss_mb: {} \n".format(str(dump_result["cpu_rss_mb"])))
|
||||
print("gpu_rss_mb: {} \n".format(str(dump_result["gpu_rss_mb"])))
|
||||
print("gpu_util: {} \n".format(str(dump_result["gpu_util"])))
|
||||
except:
|
||||
f.writelines("!!!!!Infer Failed\n")
|
||||
|
||||
f.close()
|
@@ -1,321 +0,0 @@
import numpy as np
import os
import time
import distutils.util
import sys
import json

import fastdeploy as fd
from fastdeploy.text import UIEModel, SchemaLanguage


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_dir",
        required=True,
        help="The directory of model and tokenizer.")
    parser.add_argument(
        "--data_path", required=True, help="The path of uie data.")
    parser.add_argument(
        "--device",
        type=str,
        default='cpu',
        choices=['gpu', 'cpu'],
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default='paddle',
        choices=['ort', 'paddle', 'trt', 'paddle_trt', 'ov'],
        help="The inference runtime backend.")
    parser.add_argument(
        "--device_id", type=int, default=0, help="device(gpu) id")
    parser.add_argument(
        "--batch_size", type=int, default=1, help="The batch size of data.")
    parser.add_argument(
        "--max_length",
        type=int,
        default=128,
        help="The max length of sequence.")
    parser.add_argument(
        "--cpu_num_threads",
        type=int,
        default=8,
        help="The number of threads when inferring on cpu.")
    parser.add_argument(
        "--enable_trt_fp16",
        type=distutils.util.strtobool,
        default=False,
        help="whether to enable fp16 in trt backend")
    parser.add_argument(
        "--epoch", type=int, default=1, help="The epoch of test")
    parser.add_argument(
        "--enable_collect_memory_info",
        type=ast.literal_eval,
        default=False,
        help="whether to collect memory info")
    return parser.parse_args()


def build_option(args):
    option = fd.RuntimeOption()
    if args.device == 'cpu':
        option.use_cpu()
        option.set_cpu_thread_num(args.cpu_num_threads)
    else:
        option.use_gpu(args.device_id)
    if args.backend == 'paddle':
        option.use_paddle_backend()
    elif args.backend == 'ort':
        option.use_ort_backend()
    elif args.backend == 'ov':
        option.use_openvino_backend()
    else:
        option.use_trt_backend()
        if args.backend == 'paddle_trt':
            option.enable_paddle_to_trt()
            option.enable_paddle_trt_collect_shape()
        trt_file = os.path.join(args.model_dir, "infer.trt")
        option.set_trt_input_shape(
            'input_ids',
            min_shape=[1, 1],
            opt_shape=[args.batch_size, args.max_length // 2],
            max_shape=[args.batch_size, args.max_length])
        option.set_trt_input_shape(
            'token_type_ids',
            min_shape=[1, 1],
            opt_shape=[args.batch_size, args.max_length // 2],
            max_shape=[args.batch_size, args.max_length])
        option.set_trt_input_shape(
            'pos_ids',
            min_shape=[1, 1],
            opt_shape=[args.batch_size, args.max_length // 2],
            max_shape=[args.batch_size, args.max_length])
        option.set_trt_input_shape(
            'att_mask',
            min_shape=[1, 1],
            opt_shape=[args.batch_size, args.max_length // 2],
            max_shape=[args.batch_size, args.max_length])
        if args.enable_trt_fp16:
            option.enable_trt_fp16()
            trt_file = trt_file + ".fp16"
        option.set_trt_cache_file(trt_file)
    return option


class StatBase(object):
    """StatBase"""
    nvidia_smi_path = "nvidia-smi"
    gpu_keys = ('index', 'uuid', 'name', 'timestamp', 'memory.total',
                'memory.free', 'memory.used', 'utilization.gpu',
                'utilization.memory')
    nu_opt = ',nounits'
    cpu_keys = ('cpu.util', 'memory.util', 'memory.used')


class Monitor(StatBase):
    """Monitor"""

    def __init__(self, use_gpu=False, gpu_id=0, interval=0.1):
        self.result = {}
        self.gpu_id = gpu_id
        self.use_gpu = use_gpu
        self.interval = interval
        self.cpu_stat_q = multiprocessing.Queue()

    def start(self):
        cmd = '%s --id=%s --query-gpu=%s --format=csv,noheader%s -lms 50' % (
            StatBase.nvidia_smi_path, self.gpu_id, ','.join(StatBase.gpu_keys),
            StatBase.nu_opt)
        if self.use_gpu:
            self.gpu_stat_worker = subprocess.Popen(
                cmd,
                stderr=subprocess.STDOUT,
                stdout=subprocess.PIPE,
                shell=True,
                close_fds=True,
                preexec_fn=os.setsid)
        # cpu stat
        pid = os.getpid()
        self.cpu_stat_worker = multiprocessing.Process(
            target=self.cpu_stat_func,
            args=(self.cpu_stat_q, pid, self.interval))
        self.cpu_stat_worker.start()

    def stop(self):
        try:
            if self.use_gpu:
                os.killpg(self.gpu_stat_worker.pid, signal.SIGUSR1)
            # os.killpg(p.pid, signal.SIGTERM)
            self.cpu_stat_worker.terminate()
            self.cpu_stat_worker.join(timeout=0.01)
        except Exception as e:
            print(e)
            return

        # gpu
        if self.use_gpu:
            lines = self.gpu_stat_worker.stdout.readlines()
            lines = [
                line.strip().decode("utf-8") for line in lines
                if line.strip() != ''
            ]
            gpu_info_list = [{
                k: v
                for k, v in zip(StatBase.gpu_keys, line.split(', '))
            } for line in lines]
            if len(gpu_info_list) == 0:
                return
            result = gpu_info_list[0]
            for item in gpu_info_list:
                for k in item.keys():
                    if k not in ["name", "uuid", "timestamp"]:
                        result[k] = max(int(result[k]), int(item[k]))
                    else:
                        result[k] = max(result[k], item[k])
            self.result['gpu'] = result

        # cpu
        cpu_result = {}
        if self.cpu_stat_q.qsize() > 0:
            cpu_result = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
        while not self.cpu_stat_q.empty():
            item = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
            for k in StatBase.cpu_keys:
                cpu_result[k] = max(cpu_result[k], item[k])
        cpu_result['name'] = cpuinfo.get_cpu_info()['brand_raw']
        self.result['cpu'] = cpu_result

    def output(self):
        return self.result

    def cpu_stat_func(self, q, pid, interval=0.0):
        """cpu stat function"""
        stat_info = psutil.Process(pid)
        while True:
            # pid = os.getpid()
            cpu_util, mem_util, mem_use = stat_info.cpu_percent(
            ), stat_info.memory_percent(), round(
                stat_info.memory_info().rss / 1024.0 / 1024.0, 4)
            q.put([cpu_util, mem_util, mem_use])
            time.sleep(interval)
        return


def get_dataset(data_path, max_seq_len=512):
    json_lines = []
    with open(data_path, 'r', encoding='utf-8') as f:
        for line in f:
            json_line = json.loads(line)
            content = json_line['content'].strip()
            prompt = json_line['prompt']
            # Model input looks like: [CLS] Prompt [SEP] Content [SEP]
            # It includes three summary tokens.
            if max_seq_len <= len(prompt) + 3:
                raise ValueError(
                    "The value of max_seq_len is too small, please set a larger value"
                )
            json_lines.append(json_line)

    return json_lines


if __name__ == '__main__':
    args = parse_arguments()
    runtime_option = build_option(args)
    model_path = os.path.join(args.model_dir, "inference.pdmodel")
    param_path = os.path.join(args.model_dir, "inference.pdiparams")
    vocab_path = os.path.join(args.model_dir, "vocab.txt")

    gpu_id = args.device_id
    enable_collect_memory_info = args.enable_collect_memory_info
    dump_result = dict()
    end2end_statis = list()
    cpu_mem = list()
    gpu_mem = list()
    gpu_util = list()
    if args.device == "cpu":
        file_path = args.model_dir + "_model_" + args.backend + "_" + \
            args.device + "_" + str(args.cpu_num_threads) + ".txt"
    else:
        if args.enable_trt_fp16:
            file_path = args.model_dir + "_model_" + \
                args.backend + "_fp16_" + args.device + ".txt"
        else:
            file_path = args.model_dir + "_model_" + args.backend + "_" + args.device + ".txt"
    f = open(file_path, "w")
    f.writelines("===={}====: \n".format(os.path.split(file_path)[-1][:-4]))

    ds = get_dataset(args.data_path)
    schema = ["时间"]
    uie = UIEModel(
        model_path,
        param_path,
        vocab_path,
        position_prob=0.5,
        max_length=args.max_length,
        batch_size=args.batch_size,
        schema=schema,
        runtime_option=runtime_option,
        schema_language=SchemaLanguage.ZH)

    try:
        if enable_collect_memory_info:
            import multiprocessing
            import subprocess
            import psutil
            import signal
            import cpuinfo
            enable_gpu = args.device == "gpu"
            monitor = Monitor(enable_gpu, gpu_id)
            monitor.start()
        uie.enable_record_time_of_runtime()

        for ep in range(args.epoch):
            for i, sample in enumerate(ds):
                curr_start = time.time()
                uie.set_schema([sample['prompt']])
                result = uie.predict([sample['content']])
                end2end_statis.append(time.time() - curr_start)
        runtime_statis = uie.print_statis_info_of_runtime()

        warmup_iter = args.epoch * len(ds) // 5

        end2end_statis_repeat = end2end_statis[warmup_iter:]
        if enable_collect_memory_info:
            monitor.stop()
            mem_info = monitor.output()
            dump_result["cpu_rss_mb"] = mem_info['cpu'][
                'memory.used'] if 'cpu' in mem_info else 0
            dump_result["gpu_rss_mb"] = mem_info['gpu'][
                'memory.used'] if 'gpu' in mem_info else 0
            dump_result["gpu_util"] = mem_info['gpu'][
                'utilization.gpu'] if 'gpu' in mem_info else 0

        dump_result["runtime"] = runtime_statis["avg_time"] * 1000
        dump_result["end2end"] = np.mean(end2end_statis_repeat) * 1000

        time_cost_str = f"Runtime(ms): {dump_result['runtime']}\n" \
                        f"End2End(ms): {dump_result['end2end']}\n"
        f.writelines(time_cost_str)
        print(time_cost_str)

        if enable_collect_memory_info:
            mem_info_str = f"cpu_rss_mb: {dump_result['cpu_rss_mb']}\n" \
                           f"gpu_rss_mb: {dump_result['gpu_rss_mb']}\n" \
                           f"gpu_util: {dump_result['gpu_util']}\n"
            f.writelines(mem_info_str)
            print(mem_info_str)
    except:
        f.writelines("!!!!!Infer Failed\n")

    f.close()
@@ -1,351 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import fastdeploy as fd
import cv2
import os
import numpy as np
import time

from fastdeploy import ModelFormat


def parse_arguments():
    import argparse
    import ast
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model", required=True, help="Path of Yolo onnx model.")
    parser.add_argument(
        "--image", type=str, required=False, help="Path of test image file.")
    parser.add_argument(
        "--cpu_num_thread",
        type=int,
        default=8,
        help="default number of cpu threads.")
    parser.add_argument(
        "--device_id", type=int, default=0, help="device(gpu) id")
    parser.add_argument(
        "--iter_num",
        required=True,
        type=int,
        default=300,
        help="number of iterations for computing performance.")
    parser.add_argument(
        "--device",
        default="cpu",
        help="Type of inference device, support 'cpu' or 'gpu'.")
    parser.add_argument(
        "--backend",
        type=str,
        default="default",
        help="inference backend, one of: default, ort, ov, trt, paddle, paddle_trt.")
    parser.add_argument(
        "--enable_trt_fp16",
        type=ast.literal_eval,
        default=False,
        help="whether to enable fp16 in trt backend")
    parser.add_argument(
        "--enable_collect_memory_info",
        type=ast.literal_eval,
        default=False,
        help="whether to collect memory info")
    args = parser.parse_args()
    return args


def build_option(args):
    option = fd.RuntimeOption()
    device = args.device
    backend = args.backend
    enable_trt_fp16 = args.enable_trt_fp16
    option.set_cpu_thread_num(args.cpu_num_thread)
    if device == "gpu":
        option.use_gpu()
        if backend == "ort":
            option.use_ort_backend()
        elif backend == "paddle":
            option.use_paddle_backend()
        elif backend == "ov":
            option.use_openvino_backend()
            option.set_openvino_device(name="GPU")
            # change name and shape for models
            option.set_openvino_shape_info({"images": [1, 3, 640, 640]})
        elif backend in ["trt", "paddle_trt"]:
            option.use_trt_backend()
            if backend == "paddle_trt":
                option.enable_paddle_to_trt()
            if enable_trt_fp16:
                option.enable_trt_fp16()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with GPU, only support default/ort/paddle/trt/paddle_trt now, {} is not supported.".
                format(backend))
    elif device == "cpu":
        if backend == "ort":
            option.use_ort_backend()
        elif backend == "ov":
            option.use_openvino_backend()
        elif backend == "paddle":
            option.use_paddle_backend()
        elif backend == "default":
            return option
        else:
            raise Exception(
                "While inference with CPU, only support default/ort/ov/paddle now, {} is not supported.".
                format(backend))
    else:
        raise Exception(
            "Only support device CPU/GPU now, {} is not supported.".format(
                device))

    return option


class StatBase(object):
    """StatBase"""
    nvidia_smi_path = "nvidia-smi"
    gpu_keys = ('index', 'uuid', 'name', 'timestamp', 'memory.total',
                'memory.free', 'memory.used', 'utilization.gpu',
                'utilization.memory')
    nu_opt = ',nounits'
    cpu_keys = ('cpu.util', 'memory.util', 'memory.used')


class Monitor(StatBase):
    """Monitor"""

    def __init__(self, use_gpu=False, gpu_id=0, interval=0.1):
        self.result = {}
        self.gpu_id = gpu_id
        self.use_gpu = use_gpu
        self.interval = interval
        self.cpu_stat_q = multiprocessing.Queue()

    def start(self):
        cmd = '%s --id=%s --query-gpu=%s --format=csv,noheader%s -lms 50' % (
            StatBase.nvidia_smi_path, self.gpu_id, ','.join(StatBase.gpu_keys),
            StatBase.nu_opt)
        if self.use_gpu:
            self.gpu_stat_worker = subprocess.Popen(
                cmd,
                stderr=subprocess.STDOUT,
                stdout=subprocess.PIPE,
                shell=True,
                close_fds=True,
                preexec_fn=os.setsid)
        # cpu stat
        pid = os.getpid()
        self.cpu_stat_worker = multiprocessing.Process(
            target=self.cpu_stat_func,
            args=(self.cpu_stat_q, pid, self.interval))
        self.cpu_stat_worker.start()

    def stop(self):
        try:
            if self.use_gpu:
                os.killpg(self.gpu_stat_worker.pid, signal.SIGUSR1)
            # os.killpg(p.pid, signal.SIGTERM)
            self.cpu_stat_worker.terminate()
            self.cpu_stat_worker.join(timeout=0.01)
        except Exception as e:
            print(e)
            return

        # gpu
        if self.use_gpu:
            lines = self.gpu_stat_worker.stdout.readlines()
            lines = [
                line.strip().decode("utf-8") for line in lines
                if line.strip() != ''
            ]
            gpu_info_list = [{
                k: v
                for k, v in zip(StatBase.gpu_keys, line.split(', '))
            } for line in lines]
            if len(gpu_info_list) == 0:
                return
            result = gpu_info_list[0]
            for item in gpu_info_list:
                for k in item.keys():
                    if k not in ["name", "uuid", "timestamp"]:
                        result[k] = max(int(result[k]), int(item[k]))
                    else:
                        result[k] = max(result[k], item[k])
            self.result['gpu'] = result

        # cpu
        cpu_result = {}
        if self.cpu_stat_q.qsize() > 0:
            cpu_result = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
        while not self.cpu_stat_q.empty():
            item = {
                k: v
                for k, v in zip(StatBase.cpu_keys, self.cpu_stat_q.get())
            }
            for k in StatBase.cpu_keys:
                cpu_result[k] = max(cpu_result[k], item[k])
        cpu_result['name'] = cpuinfo.get_cpu_info()['brand_raw']
        self.result['cpu'] = cpu_result

    def output(self):
        return self.result

    def cpu_stat_func(self, q, pid, interval=0.0):
        """cpu stat function"""
        stat_info = psutil.Process(pid)
        while True:
            # pid = os.getpid()
            cpu_util, mem_util, mem_use = stat_info.cpu_percent(
            ), stat_info.memory_percent(), round(
                stat_info.memory_info().rss / 1024.0 / 1024.0, 4)
            q.put([cpu_util, mem_util, mem_use])
            time.sleep(interval)
        return


if __name__ == '__main__':
    args = parse_arguments()
    option = build_option(args)
    model_file = args.model

    gpu_id = args.device_id
    enable_collect_memory_info = args.enable_collect_memory_info
    dump_result = dict()
    end2end_statis = list()
    cpu_mem = list()
    gpu_mem = list()
    gpu_util = list()
    if args.device == "cpu":
        file_path = args.model + "_model_" + args.backend + "_" + \
            args.device + "_" + str(args.cpu_num_thread) + ".txt"
    else:
        if args.enable_trt_fp16:
            file_path = args.model + "_model_" + args.backend + "_fp16_" + args.device + ".txt"
        else:
            file_path = args.model + "_model_" + args.backend + "_" + args.device + ".txt"
    f = open(file_path, "w")
    f.writelines("===={}====: \n".format(os.path.split(file_path)[-1][:-4]))

    try:
        if "yolox" in model_file:
            if ".onnx" in model_file:
                model = fd.vision.detection.YOLOX(
                    model_file, runtime_option=option)
            else:
                model_file = os.path.join(args.model, "model.pdmodel")
                params_file = os.path.join(args.model, "model.pdiparams")
                model = fd.vision.detection.YOLOX(
                    model_file,
                    params_file,
                    runtime_option=option,
                    model_format=ModelFormat.PADDLE)
        elif "yolov5" in model_file:
            if ".onnx" in model_file:
                model = fd.vision.detection.YOLOv5(
                    model_file, runtime_option=option)
            else:
                model_file = os.path.join(args.model, "model.pdmodel")
                params_file = os.path.join(args.model, "model.pdiparams")
                model = fd.vision.detection.YOLOv5(
                    model_file,
                    params_file,
                    runtime_option=option,
                    model_format=ModelFormat.PADDLE)
        elif "yolov6" in model_file:
            if ".onnx" in model_file:
                model = fd.vision.detection.YOLOv6(
                    model_file, runtime_option=option)
            else:
                model_file = os.path.join(args.model, "model.pdmodel")
                params_file = os.path.join(args.model, "model.pdiparams")
                model = fd.vision.detection.YOLOv6(
                    model_file,
                    params_file,
                    runtime_option=option,
                    model_format=ModelFormat.PADDLE)
        elif "yolov7" in model_file:
            if ".onnx" in model_file:
                model = fd.vision.detection.YOLOv7(
                    model_file, runtime_option=option)
            else:
                model_file = os.path.join(args.model, "model.pdmodel")
                params_file = os.path.join(args.model, "model.pdiparams")
                model = fd.vision.detection.YOLOv7(
                    model_file,
                    params_file,
                    runtime_option=option,
                    model_format=ModelFormat.PADDLE)
        else:
            raise Exception("model {} not support now in yolo series".format(
                args.model))
        if enable_collect_memory_info:
            import multiprocessing
            import subprocess
            import psutil
            import signal
            import cpuinfo
            enable_gpu = args.device == "gpu"
            monitor = Monitor(enable_gpu, gpu_id)
            monitor.start()

        model.enable_record_time_of_runtime()
        im_ori = cv2.imread(args.image)
        for i in range(args.iter_num):
            im = im_ori
            start = time.time()
            result = model.predict(im)
            end2end_statis.append(time.time() - start)

        runtime_statis = model.print_statis_info_of_runtime()

        warmup_iter = args.iter_num // 5
        end2end_statis_repeat = end2end_statis[warmup_iter:]
        if enable_collect_memory_info:
            monitor.stop()
            mem_info = monitor.output()
            dump_result["cpu_rss_mb"] = mem_info['cpu'][
                'memory.used'] if 'cpu' in mem_info else 0
            dump_result["gpu_rss_mb"] = mem_info['gpu'][
                'memory.used'] if 'gpu' in mem_info else 0
            dump_result["gpu_util"] = mem_info['gpu'][
                'utilization.gpu'] if 'gpu' in mem_info else 0

        dump_result["runtime"] = runtime_statis["avg_time"] * 1000
        dump_result["end2end"] = np.mean(end2end_statis_repeat) * 1000

        f.writelines("Runtime(ms): {} \n".format(str(dump_result["runtime"])))
        f.writelines("End2End(ms): {} \n".format(str(dump_result["end2end"])))
        print("Runtime(ms): {} \n".format(str(dump_result["runtime"])))
        print("End2End(ms): {} \n".format(str(dump_result["end2end"])))
        if enable_collect_memory_info:
            f.writelines("cpu_rss_mb: {} \n".format(
                str(dump_result["cpu_rss_mb"])))
            f.writelines("gpu_rss_mb: {} \n".format(
                str(dump_result["gpu_rss_mb"])))
            f.writelines("gpu_util: {} \n".format(
                str(dump_result["gpu_util"])))
            print("cpu_rss_mb: {} \n".format(str(dump_result["cpu_rss_mb"])))
            print("gpu_rss_mb: {} \n".format(str(dump_result["gpu_rss_mb"])))
            print("gpu_util: {} \n".format(str(dump_result["gpu_util"])))
    except:
        f.writelines("!!!!!Infer Failed\n")

    f.close()
@@ -1,179 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import argparse

parser = argparse.ArgumentParser(description='manual to this script')
parser.add_argument('--txt_path', type=str, default='result.txt')
parser.add_argument('--domain', type=str, default='ppcls')
parser.add_argument(
    "--enable_collect_memory_info",
    type=bool,
    default=False,
    help="whether enable collect memory info")
args = parser.parse_args()
txt_path = args.txt_path
domain = args.domain
enable_collect_memory_info = args.enable_collect_memory_info

f1 = open(txt_path, "r")
lines = f1.readlines()
line_nums = len(lines)
ort_cpu_thread1 = dict()
ort_cpu_thread8 = dict()
ort_gpu = dict()
ov_cpu_thread1 = dict()
ov_cpu_thread8 = dict()
paddle_cpu_thread1 = dict()
paddle_cpu_thread8 = dict()
paddle_gpu = dict()
paddle_trt_gpu = dict()
paddle_trt_gpu_fp16 = dict()
trt_gpu = dict()
trt_gpu_fp16 = dict()
model_name_set = set()

for i in range(line_nums):
    if "====" in lines[i]:
        model_name = lines[i].strip().split("_model")[0][4:]
        model_name_set.add(model_name)
        runtime = "-"
        end2end = "-"
        cpu_rss_mb = "-"
        gpu_rss_mb = "-"
        if "Runtime(ms)" in lines[i + 1]:
            runtime_ori = lines[i + 1].split(": ")[1]
            # two decimal places
            runtime_list = runtime_ori.split(".")
            runtime = runtime_list[0] + "." + runtime_list[1][:2]
        if "End2End(ms)" in lines[i + 2]:
            end2end_ori = lines[i + 2].split(": ")[1]
            # two decimal places
            end2end_list = end2end_ori.split(".")
            end2end = end2end_list[0] + "." + end2end_list[1][:2]
        if enable_collect_memory_info:
            if "cpu_rss_mb" in lines[i + 3]:
                cpu_rss_mb_ori = lines[i + 3].split(": ")[1]
                # two decimal places
                cpu_rss_mb_list = cpu_rss_mb_ori.split(".")
                cpu_rss_mb = cpu_rss_mb_list[0] + "." + cpu_rss_mb_list[1][:2]
            if "gpu_rss_mb" in lines[i + 4]:
                gpu_rss_mb_ori = lines[i + 4].split(": ")[1].strip()
                gpu_rss_mb = str(gpu_rss_mb_ori) + ".0"
        if "ort_cpu_1" in lines[i]:
            ort_cpu_thread1[
                model_name] = runtime + "\t" + end2end + "\t" + cpu_rss_mb
        elif "ort_cpu_8" in lines[i]:
            ort_cpu_thread8[
                model_name] = runtime + "\t" + end2end + "\t" + cpu_rss_mb
        elif "ort_gpu" in lines[i]:
            ort_gpu[model_name] = runtime + "\t" + end2end + "\t" + gpu_rss_mb
        elif "ov_cpu_1" in lines[i]:
            ov_cpu_thread1[
                model_name] = runtime + "\t" + end2end + "\t" + cpu_rss_mb
        elif "ov_cpu_8" in lines[i]:
            ov_cpu_thread8[
                model_name] = runtime + "\t" + end2end + "\t" + cpu_rss_mb
        elif "paddle_cpu_1" in lines[i]:
            paddle_cpu_thread1[
                model_name] = runtime + "\t" + end2end + "\t" + cpu_rss_mb
        elif "paddle_cpu_8" in lines[i]:
            paddle_cpu_thread8[
                model_name] = runtime + "\t" + end2end + "\t" + cpu_rss_mb
        elif "paddle_gpu" in lines[i]:
            paddle_gpu[
                model_name] = runtime + "\t" + end2end + "\t" + gpu_rss_mb
        elif "paddle_trt_gpu" in lines[i]:
            paddle_trt_gpu[
                model_name] = runtime + "\t" + end2end + "\t" + gpu_rss_mb
        elif "paddle_trt_fp16_gpu" in lines[i]:
            paddle_trt_gpu_fp16[
                model_name] = runtime + "\t" + end2end + "\t" + gpu_rss_mb
        elif "trt_gpu" in lines[i]:
            trt_gpu[model_name] = runtime + "\t" + end2end + "\t" + gpu_rss_mb
        elif "trt_fp16_gpu" in lines[i]:
            trt_gpu_fp16[
                model_name] = runtime + "\t" + end2end + "\t" + gpu_rss_mb

f2 = open("struct_cpu_" + domain + ".txt", "w")
f2.writelines(
    "model_name\tthread_nums\tort_run\tort_end2end\tcpu_mem\tov_run\tov_end2end\tcpu_mem\tpaddle_run\tpaddle_end2end\tcpu_mem\n"
)
for model_name in model_name_set:
    lines1 = model_name + '\t1\t'
    lines2 = model_name + '\t8\t'
    if model_name in ort_cpu_thread1 and ort_cpu_thread1[model_name] != "":
        lines1 += ort_cpu_thread1[model_name] + '\t'
    else:
        lines1 += "-\t-\t-\t"
    if model_name in ov_cpu_thread1 and ov_cpu_thread1[model_name] != "":
        lines1 += ov_cpu_thread1[model_name] + '\t'
    else:
        lines1 += "-\t-\t-\t"
    if model_name in paddle_cpu_thread1 and paddle_cpu_thread1[
            model_name] != "":
        lines1 += paddle_cpu_thread1[model_name] + '\n'
    else:
        lines1 += "-\t-\t-\n"
    f2.writelines(lines1)
    if model_name in ort_cpu_thread8 and ort_cpu_thread8[model_name] != "":
        lines2 += ort_cpu_thread8[model_name] + '\t'
    else:
        lines2 += "-\t-\t-\t"
    if model_name in ov_cpu_thread8 and ov_cpu_thread8[model_name] != "":
        lines2 += ov_cpu_thread8[model_name] + '\t'
    else:
        lines2 += "-\t-\t-\t"
    if model_name in paddle_cpu_thread8 and paddle_cpu_thread8[
            model_name] != "":
        lines2 += paddle_cpu_thread8[model_name] + '\n'
    else:
        lines2 += "-\t-\t-\n"
    f2.writelines(lines2)
f2.close()

f3 = open("struct_gpu_" + domain + ".txt", "w")
f3.writelines(
    "model_name\tort_run\tort_end2end\tgpu_mem\tpaddle_run\tpaddle_end2end\tgpu_mem\tpaddle_trt_run\tpaddle_trt_end2end\tgpu_mem\tpaddle_trt_fp16_run\tpaddle_trt_fp16_end2end\tgpu_mem\ttrt_run\ttrt_end2end\tgpu_mem\ttrt_fp16_run\ttrt_fp16_end2end\tgpu_mem\n"
)
for model_name in model_name_set:
    lines1 = model_name + '\t'
    if model_name in ort_gpu and ort_gpu[model_name] != "":
        lines1 += ort_gpu[model_name] + '\t'
    else:
        lines1 += "-\t-\t-\t"
    if model_name in paddle_gpu and paddle_gpu[model_name] != "":
        lines1 += paddle_gpu[model_name] + '\t'
    else:
        lines1 += "-\t-\t-\t"
    if model_name in paddle_trt_gpu and paddle_trt_gpu[model_name] != "":
        lines1 += paddle_trt_gpu[model_name] + '\t'
    else:
        lines1 += "-\t-\t-\t"
    if model_name in paddle_trt_gpu_fp16 and paddle_trt_gpu_fp16[
            model_name] != "":
        lines1 += paddle_trt_gpu_fp16[model_name] + '\t'
    else:
        lines1 += "-\t-\t-\t"
    if model_name in trt_gpu and trt_gpu[model_name] != "":
        lines1 += trt_gpu[model_name] + '\t'
    else:
        lines1 += "-\t-\t-\t"
    if model_name in trt_gpu_fp16 and trt_gpu_fp16[model_name] != "":
        lines1 += trt_gpu_fp16[model_name] + '\n'
    else:
        lines1 += "-\t-\t-\n"
    f3.writelines(lines1)
f3.close()
@@ -1 +0,0 @@
numpy
@@ -1,25 +0,0 @@
# Download and decompress the ERNIE 3.0 Medium model finetuned on AFQMC
# wget https://bj.bcebos.com/fastdeploy/models/ernie-3.0/ernie-3.0-medium-zh-afqmc.tgz
# tar xvfz ernie-3.0-medium-zh-afqmc.tgz

# Download and decompress the quantization model of ERNIE 3.0 Medium model
# wget https://bj.bcebos.com/fastdeploy/models/ernie-3.0/ernie-3.0-medium-zh-afqmc-new-quant.tgz
# tar xvfz ernie-3.0-medium-zh-afqmc-new-quant.tgz

# PP-TRT
python benchmark_ernie_seq_cls.py --batch_size 40 --model_dir ernie-3.0-medium-zh-afqmc --backend pp-trt
python benchmark_ernie_seq_cls.py --batch_size 40 --model_dir ernie-3.0-medium-zh-afqmc --backend pp-trt --use_fp16 True
python benchmark_ernie_seq_cls.py --batch_size 40 --model_dir ernie-3.0-medium-zh-afqmc-new-quant --backend pp-trt

# TRT
python benchmark_ernie_seq_cls.py --batch_size 40 --model_dir ernie-3.0-medium-zh-afqmc --backend trt
python benchmark_ernie_seq_cls.py --batch_size 40 --model_dir ernie-3.0-medium-zh-afqmc --backend trt --use_fp16 True
python benchmark_ernie_seq_cls.py --batch_size 40 --model_dir ernie-3.0-medium-zh-afqmc-new-quant --backend trt --use_fp16 True

# CPU PP
python benchmark_ernie_seq_cls.py --batch_size 40 --cpu_num_threads 10 --model_dir ernie-3.0-medium-zh-afqmc --backend pp --device cpu
python benchmark_ernie_seq_cls.py --batch_size 40 --cpu_num_threads 10 --model_dir ernie-3.0-medium-zh-afqmc-new-quant --backend pp --device cpu

# CPU ORT
python benchmark_ernie_seq_cls.py --batch_size 40 --cpu_num_threads 10 --model_dir ernie-3.0-medium-zh-afqmc --backend ort --device cpu
python benchmark_ernie_seq_cls.py --batch_size 40 --cpu_num_threads 10 --model_dir ernie-3.0-medium-zh-afqmc-new-quant --backend ort --device cpu
@@ -1,35 +0,0 @@
echo "[FastDeploy] Running PPcls benchmark..."

num_of_models=$(ls -d ppcls_model/* | wc -l)

counter=1
for model in $(ls -d ppcls_model/* )
do
    echo "[Benchmark-PPcls] ${counter}/${num_of_models} $model ..."
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 1 --iter_num 2000 --backend ort --enable_collect_memory_info True
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend ort --enable_collect_memory_info True
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 1 --iter_num 2000 --backend paddle --enable_collect_memory_info True
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend paddle --enable_collect_memory_info True
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 1 --iter_num 2000 --backend ov --enable_collect_memory_info True
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend ov --enable_collect_memory_info True
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend ort --enable_collect_memory_info True
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend paddle --enable_collect_memory_info True
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend paddle_trt --enable_collect_memory_info True
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend paddle_trt --enable_trt_fp16 True --enable_collect_memory_info True
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend trt --enable_collect_memory_info True
    python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend trt --enable_trt_fp16 True --enable_collect_memory_info True
    counter=$(($counter+1))
    step=$(( $counter % 1 ))
    if [ $step = 0 ]
    then
        wait
    fi
done

wait

rm -rf result_ppcls.txt
touch result_ppcls.txt
cat ppcls_model/*.txt >> ./result_ppcls.txt

python convert_info.py --txt_path result_ppcls.txt --domain ppcls --enable_collect_memory_info True
@@ -1,35 +0,0 @@
echo "[FastDeploy] Running PPdet benchmark..."

num_of_models=$(ls -d ppdet_model/* | wc -l)

counter=1
for model in $(ls -d ppdet_model/* )
do
    echo "[Benchmark-PPdet] ${counter}/${num_of_models} $model ..."
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --cpu_num_thread 1 --iter_num 2000 --backend ort --enable_collect_memory_info True
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --cpu_num_thread 8 --iter_num 2000 --backend ort --enable_collect_memory_info True
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --cpu_num_thread 1 --iter_num 2000 --backend paddle --enable_collect_memory_info True
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --cpu_num_thread 8 --iter_num 2000 --backend paddle --enable_collect_memory_info True
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --cpu_num_thread 1 --iter_num 2000 --backend ov --enable_collect_memory_info True
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --cpu_num_thread 8 --iter_num 2000 --backend ov --enable_collect_memory_info True
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --device gpu --iter_num 2000 --backend ort --enable_collect_memory_info True
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --device gpu --iter_num 2000 --backend paddle --enable_collect_memory_info True
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --device gpu --iter_num 2000 --backend paddle_trt --enable_collect_memory_info True
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --device gpu --iter_num 2000 --backend paddle_trt --enable_trt_fp16 True --enable_collect_memory_info True
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --device gpu --iter_num 2000 --backend trt --enable_collect_memory_info True
    python benchmark_ppdet.py --model $model --image 000000014439.jpg --device gpu --iter_num 2000 --backend trt --enable_trt_fp16 True --enable_collect_memory_info True
    counter=$(($counter+1))
    step=$(( $counter % 1 ))
    if [ $step = 0 ]
    then
        wait
    fi
done

wait

rm -rf result_ppdet.txt
touch result_ppdet.txt
cat ppdet_model/*.txt >> ./result_ppdet.txt

python convert_info.py --txt_path result_ppdet.txt --domain ppdet --enable_collect_memory_info True
@@ -1,23 +0,0 @@
echo "[FastDeploy] Running PPOCR benchmark..."

# for PPOCRv2
python benchmark_ppocr.py --model_dir ch_PP-OCRv2 --det_model ch_PP-OCRv2_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv2_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --cpu_num_thread 8 --iter_num 2000 --backend ort --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv2 --det_model ch_PP-OCRv2_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv2_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --cpu_num_thread 8 --iter_num 2000 --backend paddle --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv2 --det_model ch_PP-OCRv2_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv2_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --cpu_num_thread 8 --iter_num 2000 --backend ov --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv2 --det_model ch_PP-OCRv2_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv2_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend ort --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv2 --det_model ch_PP-OCRv2_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv2_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend paddle --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv2 --det_model ch_PP-OCRv2_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv2_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend paddle_trt --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv2 --det_model ch_PP-OCRv2_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv2_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend paddle_trt --enable_trt_fp16 True --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv2 --det_model ch_PP-OCRv2_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv2_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend trt --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv2 --det_model ch_PP-OCRv2_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv2_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend trt --enable_trt_fp16 True --enable_collect_memory_info True

# for PPOCRv3
python benchmark_ppocr.py --model_dir ch_PP-OCRv3 --det_model ch_PP-OCRv3_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv3_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --cpu_num_thread 8 --iter_num 2000 --backend ort --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv3 --det_model ch_PP-OCRv3_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv3_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --cpu_num_thread 8 --iter_num 2000 --backend paddle --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv3 --det_model ch_PP-OCRv3_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv3_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --cpu_num_thread 8 --iter_num 2000 --backend ov --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv3 --det_model ch_PP-OCRv3_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv3_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend ort --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv3 --det_model ch_PP-OCRv3_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv3_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend paddle --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv3 --det_model ch_PP-OCRv3_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv3_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend paddle_trt --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv3 --det_model ch_PP-OCRv3_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv3_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend paddle_trt --enable_trt_fp16 True --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv3 --det_model ch_PP-OCRv3_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv3_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend trt --enable_collect_memory_info True
python benchmark_ppocr.py --model_dir ch_PP-OCRv3 --det_model ch_PP-OCRv3_det_infer --cls_model ch_ppocr_mobile_v2.0_cls_infer --rec_model ch_PP-OCRv3_rec_infer --rec_label_file ppocr_keys_v1.txt --image 12.jpg --device gpu --iter_num 2000 --backend trt --enable_trt_fp16 True --enable_collect_memory_info True
@@ -1,35 +0,0 @@
echo "[FastDeploy] Running PPseg benchmark..."

num_of_models=$(ls -d ppseg_model/* | wc -l)

counter=1
for model in $(ls -d ppseg_model/* )
do
    echo "[Benchmark-PPseg] ${counter}/${num_of_models} $model ..."
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 1 --iter_num 2000 --backend ort --enable_collect_memory_info True
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend ort --enable_collect_memory_info True
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 1 --iter_num 2000 --backend paddle --enable_collect_memory_info True
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend paddle --enable_collect_memory_info True
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 1 --iter_num 2000 --backend ov --enable_collect_memory_info True
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --cpu_num_thread 8 --iter_num 2000 --backend ov --enable_collect_memory_info True
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend ort --enable_collect_memory_info True
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend paddle --enable_collect_memory_info True
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend paddle_trt --enable_collect_memory_info True
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend paddle_trt --enable_trt_fp16 True --enable_collect_memory_info True
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend trt --enable_collect_memory_info True
    python benchmark_ppseg.py --model $model --image ILSVRC2012_val_00000010.jpeg --device gpu --iter_num 2000 --backend trt --enable_trt_fp16 True --enable_collect_memory_info True
    counter=$(($counter+1))
    step=$(( $counter % 1 ))
    if [ $step = 0 ]
    then
        wait
    fi
done

wait

rm -rf result_ppseg.txt
touch result_ppseg.txt
cat ppseg_model/*.txt >> ./result_ppseg.txt

python convert_info.py --txt_path result_ppseg.txt --domain ppseg --enable_collect_memory_info True
@@ -1,27 +0,0 @@
# wget https://bj.bcebos.com/fastdeploy/benchmark/uie/reimbursement_form_data.txt
# wget https://bj.bcebos.com/fastdeploy/models/uie/uie-base.tgz
# tar xvfz uie-base.tgz

DEVICE_ID=0

echo "[FastDeploy] Running UIE benchmark..."

# GPU
echo "-------------------------------GPU Benchmark---------------------------------------"
python benchmark_uie.py --epoch 5 --model_dir uie-base --data_path reimbursement_form_data.txt --backend paddle --device_id $DEVICE_ID --device gpu --enable_collect_memory_info True
python benchmark_uie.py --epoch 5 --model_dir uie-base --data_path reimbursement_form_data.txt --backend ort --device_id $DEVICE_ID --device gpu --enable_collect_memory_info True
python benchmark_uie.py --epoch 5 --model_dir uie-base --data_path reimbursement_form_data.txt --backend paddle_trt --device_id $DEVICE_ID --device gpu --enable_trt_fp16 False --enable_collect_memory_info True
python benchmark_uie.py --epoch 5 --model_dir uie-base --data_path reimbursement_form_data.txt --backend trt --device_id $DEVICE_ID --device gpu --enable_trt_fp16 False --enable_collect_memory_info True
python benchmark_uie.py --epoch 5 --model_dir uie-base --data_path reimbursement_form_data.txt --backend paddle_trt --device_id $DEVICE_ID --device gpu --enable_trt_fp16 True --enable_collect_memory_info True
python benchmark_uie.py --epoch 5 --model_dir uie-base --data_path reimbursement_form_data.txt --backend trt --device_id $DEVICE_ID --device gpu --enable_trt_fp16 True --enable_collect_memory_info True
echo "-----------------------------------------------------------------------------------"

# CPU
echo "-------------------------------CPU Benchmark---------------------------------------"
for cpu_num_threads in 1 8;
do
    python benchmark_uie.py --epoch 5 --model_dir uie-base --data_path reimbursement_form_data.txt --backend paddle --device cpu --cpu_num_threads ${cpu_num_threads} --enable_collect_memory_info True
    python benchmark_uie.py --epoch 5 --model_dir uie-base --data_path reimbursement_form_data.txt --backend ort --device cpu --cpu_num_threads ${cpu_num_threads} --enable_collect_memory_info True
    python benchmark_uie.py --epoch 5 --model_dir uie-base --data_path reimbursement_form_data.txt --backend ov --device cpu --cpu_num_threads ${cpu_num_threads} --enable_collect_memory_info True
done
echo "-----------------------------------------------------------------------------------"
@@ -1,32 +0,0 @@
echo "[FastDeploy] Running Yolo benchmark..."

num_of_models=$(ls -d yolo_model/* | wc -l)

counter=1
for model in $(ls -d yolo_model/* )
do
    echo "[Benchmark-Yolo] ${counter}/${num_of_models} $model ..."
    python benchmark_yolo.py --model $model --image 000000014439.jpg --cpu_num_thread 8 --iter_num 1000 --backend paddle --enable_collect_memory_info True
    python benchmark_yolo.py --model $model --image 000000014439.jpg --cpu_num_thread 8 --iter_num 1000 --backend ort --enable_collect_memory_info True
    python benchmark_yolo.py --model $model --image 000000014439.jpg --cpu_num_thread 8 --iter_num 1000 --backend ov --enable_collect_memory_info True
    python benchmark_yolo.py --model $model --image 000000014439.jpg --device gpu --iter_num 1000 --backend ort --enable_collect_memory_info True
    python benchmark_yolo.py --model $model --image 000000014439.jpg --device gpu --iter_num 1000 --backend paddle --enable_collect_memory_info True
    python benchmark_yolo.py --model $model --image 000000014439.jpg --device gpu --iter_num 1000 --backend paddle_trt --enable_collect_memory_info True
    python benchmark_yolo.py --model $model --image 000000014439.jpg --device gpu --iter_num 1000 --backend paddle_trt --enable_trt_fp16 True --enable_collect_memory_info True
    python benchmark_yolo.py --model $model --image 000000014439.jpg --device gpu --iter_num 1000 --backend trt --enable_collect_memory_info True
    python benchmark_yolo.py --model $model --image 000000014439.jpg --device gpu --iter_num 1000 --backend trt --enable_trt_fp16 True --enable_collect_memory_info True
    counter=$(($counter+1))
    step=$(( $counter % 1 ))
    if [ $step = 0 ]
    then
        wait
    fi
done

wait

rm -rf result_yolo.txt
touch result_yolo.txt
cat yolo_model/*.txt >> ./result_yolo.txt

python convert_info.py --txt_path result_yolo.txt --domain yolo --enable_collect_memory_info True