YOLOv5-seg Example
This demo uses a YOLOv5-seg model to detect objects and provide instance segmentation.
Usage
Make sure you have downloaded the data files needed by the examples. This only needs to be done once for all examples.
cd example/
git clone --depth=1 https://github.com/swdee/go-rknnlite-data.git data
Run the YOLOv5-seg example on rk3588, or replace rk3588 with your platform's model.
cd example/yolov5-seg
go run yolov5-seg.go -p rk3588
This produces output similar to the following:
Driver Version: 0.9.6, API Version: 2.3.0 (c949ad889d@2024-11-07T11:35:33)
Model Input Number: 1, Ouput Number: 7
Input tensors:
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
Output tensors:
index=0, name=output0, n_dims=4, dims=[1, 255, 80, 80], n_elems=1632000, size=1632000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
index=1, name=output1, n_dims=4, dims=[1, 96, 80, 80], n_elems=614400, size=614400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=20, scale=0.022222
index=2, name=376, n_dims=4, dims=[1, 255, 40, 40], n_elems=408000, size=408000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
index=3, name=377, n_dims=4, dims=[1, 96, 40, 40], n_elems=153600, size=153600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=29, scale=0.023239
index=4, name=379, n_dims=4, dims=[1, 255, 20, 20], n_elems=102000, size=102000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003918
index=5, name=380, n_dims=4, dims=[1, 96, 20, 20], n_elems=38400, size=38400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=32, scale=0.024074
index=6, name=371, n_dims=4, dims=[1, 32, 160, 160], n_elems=819200, size=819200, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-116, scale=0.022475
dog @ (197 83 357 299) 0.786010
cat @ (714 101 900 336) 0.706588
dog @ (312 93 526 304) 0.693387
cat @ (28 113 171 292) 0.641764
cat @ (530 141 712 299) 0.616804
Model first run speed: inference=45.977719ms, post processing=46.101967ms, rendering=1.395305ms, total time=93.474991ms
Saved object detection result to ../data/catdog-yolov5-seg-out.jpg
Benchmark time=7.346591785s, count=100, average total time=73.465917ms
done
The saved JPG image shows the detected objects with instance segmentation outlines.
See the help output for all command line parameters.
$ go run yolov5-seg.go --help
Usage of /tmp/go-build2169893350/b001/exe/yolov5-seg:
-i string
Image file to run object detection on (default "../data/catdog.jpg")
-l string
Text file containing model labels (default "../data/coco_80_labels_list.txt")
-m string
RKNN compiled YOLO model file (default "../data/yolov5s-seg-rk3588.rknn")
-o string
The output JPG file with object detection markers (default "../data/catdog-yolov5-seg-out.jpg")
-p string
Rockchip CPU Model number [rk3562|rk3566|rk3568|rk3576|rk3582|rk3588] (default "rk3588")
-r string
The rendering format used for instance segmentation [outline|mask|dump] (default "outline")
Docker
To run the YOLOv5-seg example using the prebuilt docker image, make sure the data files have been downloaded first, then run:
# from project root directory
docker run --rm \
--device /dev/dri:/dev/dri \
-v "$(pwd):/go/src/app" \
-v "$(pwd)/example/data:/go/src/data" \
-v "/usr/include/rknn_api.h:/usr/include/rknn_api.h" \
-v "/usr/lib/librknnrt.so:/usr/lib/librknnrt.so" \
-w /go/src/app \
swdee/go-rknnlite:latest \
go run ./example/yolov5-seg/yolov5-seg.go -p rk3588
Rendering Methods
The default rendering method draws an outline around the edge of each detected object, as depicted in the image above. However, this method takes the most resources to calculate, and the cost becomes more noticeable the more objects the scene contains.
A faster rendering method is also provided, which draws bounding boxes around the objects and a single transparent overlay to indicate the segment mask.
This can be selected with the following flag.
go run yolov5-seg.go -p rk3588 -r mask
For visualisation and debugging purposes, the segmentation mask can also be dumped to an image.
go run yolov5-seg.go -p rk3588 -r dump
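To give a sense of why outline rendering costs more than the mask overlay, here is a conceptual sketch of the two approaches using gocv (the OpenCV bindings the examples are built on). This is an illustration rather than the repository's actual render code; binMask is assumed to be a single-channel 8-bit mask produced during post processing.

package rendersketch

import (
	"image/color"

	"gocv.io/x/gocv"
)

// drawOutline traces the contours of the binary mask and draws them
// onto img. Contour extraction is the expensive step, and its cost
// grows with the number of objects detected in the scene.
func drawOutline(img *gocv.Mat, binMask gocv.Mat) {
	contours := gocv.FindContours(binMask, gocv.RetrievalExternal, gocv.ChainApproxSimple)
	defer contours.Close()
	gocv.DrawContours(img, contours, -1, color.RGBA{R: 255, A: 255}, 2)
}

// drawMask blends a pre-colourised overlay of the segment mask onto
// img in a single pass, so the cost stays fixed regardless of how
// many objects were detected.
func drawMask(img *gocv.Mat, overlay gocv.Mat) {
	gocv.AddWeighted(*img, 0.7, overlay, 0.3, 0, img)
}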
Model Segment Mask Size
The Rockchip examples are based on models with an input tensor size of 640x640 and 3 channels (RGB). The corresponding output tensor for the segment mask is 160x160; in the tensor listing above this is output index=6 with dims=[1, 32, 160, 160].
If your model has been trained with a different output segment mask size, such as 320x320, you will need to pass those sizes to the YOLOv5SegParams.Prototype* variables, eg:
// start with default COCO Parameters
yParams := postprocess.YOLOv5SegCOCOParams()
// Set your Models output mask size
yParams.PrototypeHeight = 320
yParams.PrototypeWidth = 320
// create YOLO Processor instance
yoloProcessor := postprocess.NewYOLOv5Seg(yParams)
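The processor then slots into the example's existing flow. The following sketch shows roughly where it sits; the runtime calls follow the go-rknnlite README, while DetectObjects, SegmentMask, and the resizer variable (the letterbox resizer the example creates for the input image) are assumptions based on the repository's example code, so check yolov5-seg.go for the exact calls in your version.

// create the runtime and load the model onto the NPU
// (error handling elided for brevity)
rt, _ := rknnlite.NewRuntime("../data/yolov5s-seg-rk3588.rknn", rknnlite.NPUCoreAuto)
// run inference on an image loaded as a gocv.Mat
outputs, _ := rt.Inference([]gocv.Mat{img})
// post process the raw output tensors into detections and segment
// masks (call names are assumptions, see yolov5-seg.go)
detectObjs := yoloProcessor.DetectObjects(outputs, resizer)
segMask := yoloProcessor.SegmentMask(detectObjs)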
Benchmarks
The following table compares the benchmark results across three different platforms.
Platform | Execution Time | Average Total Time Per Image
---|---|---
rk3588 | 7.34s | 73.46ms
rk3576 | 8.91s | 89.14ms
rk3566 | 32.64s | 326.44ms
Note that these examples run inference on a single NPU core only; the results would differ when running a Pool of models across all available NPU cores. Also note that the Rock 4D (rk3576) has DDR5 memory, versus the slower DDR4 memory of the Rock 5B (rk3588).
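As a rough illustration of the pooled approach, the sketch below follows the Pool usage described in the go-rknnlite README; treat the exact signatures (rknnlite.NewPool, pool.Get, pool.Return, and the rknnlite.RK3588 core list) as assumptions and consult the pool example in the repository.

// create a pool of three runtimes spread across the rk3588's NPU
// cores (signatures are assumptions, see the repository's pool example)
pool, err := rknnlite.NewPool(3, "../data/yolov5s-seg-rk3588.rknn", rknnlite.RK3588)
if err != nil {
	log.Fatal(err)
}
// take a runtime from the pool, run inference, then return it so
// other goroutines can use it
rt := pool.Get()
outputs, err := rt.Inference([]gocv.Mat{img})
pool.Return(rt)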
Background
This YOLOv5-seg example is a Go conversion of the C API example, with improvements inspired by Ultralytics Instance Segmentation.