YOLO-NAS Example

Usage

Make sure you have first downloaded the data files used by the examples. This only needs to be done once for all examples.

cd example/
git clone --depth=1 https://github.com/swdee/go-rknnlite-data.git data

Run the YOLO-NAS example on rk3588, or replace rk3588 with your platform's model.

cd example/yolo-nas
go run yolo-nas.go -p rk3588

This will produce output similar to the following:

Driver Version: 0.9.6, API Version: 2.3.0 (c949ad889d@2024-11-07T11:35:33)
Model Input Number: 1, Ouput Number: 2
Input tensors:
  index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
Output tensors:
  index=0, name=output, n_dims=3, dims=[1, 8400, 4, 0], n_elems=33600, size=33600, fmt=UNDEFINED, type=INT8, qnt_type=AFFINE, zp=-90, scale=3.387419
  index=1, name=1100, n_dims=3, dims=[1, 8400, 80, 0], n_elems=672000, size=672000, fmt=UNDEFINED, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003898
bus @ (98 132 565 447) 0.970715
person @ (213 240 284 508) 0.962918
person @ (108 240 230 535) 0.951223
person @ (474 233 565 521) 0.943426
person @ (77 321 132 518) 0.682229
Model first run speed: inference=74.299132ms, post processing=2.090633ms, rendering=715.155µs, total time=77.10492ms
Saved object detection result to ../data/bus-yolo-nas-out.jpg
Benchmark time=5.940757606s, count=100, average total time=59.407576ms
done

The saved JPG image with object detection markers.

bus-yolo-nas-out.jpg

To use your own RKNN-compiled model and images:

go run yolo-nas.go -m <RKNN model file> -i <image file> -l <labels txt file> -o <output jpg file> -p <platform>
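
For example, running with the default files shown in the help output below:

go run yolo-nas.go -m ../data/models/rk3588/yolonas-s-rk3588.rknn -i ../data/bus.jpg -l ../data/coco_80_labels_list.txt -o ../data/bus-yolo-nas-out.jpg -p rk3588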

The labels file should be a plain text file containing the labels the model was trained on, one label per line.
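
For example, the first few lines of a labels file for the COCO dataset:

person
bicycle
car
...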

See the help for command line parameters.

$ go run yolo-nas.go --help

Usage of /tmp/go-build4215758863/b001/exe/yolo-nas:
  -i string
        Image file to run object detection on (default "../data/bus.jpg")
  -l string
        Text file containing model labels (default "../data/coco_80_labels_list.txt")
  -m string
        RKNN compiled YOLO model file (default "../data/models/rk3588/yolonas-s-rk3588.rknn")
  -o string
        The output JPG file with object detection markers (default "../data/bus-yolo-nas-out.jpg")
  -p string
        Rockchip CPU Model number [rk3562|rk3566|rk3568|rk3576|rk3582|rk3588] (default "rk3588")

Docker

To run the YOLO-NAS example using the prebuilt Docker image, make sure the data files have been downloaded first, then run:

# from project root directory

docker run --rm \
  --device /dev/dri:/dev/dri \
  -v "$(pwd):/go/src/app" \
  -v "$(pwd)/example/data:/go/src/data" \
  -v "/usr/include/rknn_api.h:/usr/include/rknn_api.h" \
  -v "/usr/lib/librknnrt.so:/usr/lib/librknnrt.so" \
  -w /go/src/app \
  swdee/go-rknnlite:latest \
  go run ./example/yolo-nas/yolo-nas.go -p rk3588

Benchmarks

The following table compares the benchmark results across three platforms.

Platform   Execution Time   Average Inference Time Per Image
rk3588     5.94s            59.40ms
rk3576     4.68s            46.86ms
rk3566     13.16s           131.61ms

RK3588 Bug

Note that the rk3588 benchmark uses an RKNN model compiled with rknn-toolkit2 version 2.1.0, which has slower inference times than the newer version 2.3.2 used on the other platforms.

This is due to a bug reported here. If you wish to use your own trained YOLO-NAS model on rk3588, make sure you compile it to RKNN format using the older version of rknn-toolkit2.
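
For example, assuming rknn-toolkit2 is installed from PyPI, the older release can be pinned with:

pip install rknn-toolkit2==2.1.0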

Converting YOLO-NAS Model

YOLO-NAS is not supported by the upstream vendor's model zoo, so to export it to ONNX and convert it to RKNN format, follow these steps.

Set up a Python 3.10 virtual environment.
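
For example, assuming python3.10 is available on your path (the .venv directory name matches the sed commands below):

python3.10 -m venv .venv
source .venv/bin/activate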

Install the Python dependencies for the YOLO-NAS project.

pip install super-gradients torch onnx onnx-simplifier

Patch the download links in the super-gradients source code.

sed -i -e "s/sghub.deci.ai/sg-hub-nv.s3.amazonaws.com/g" .venv/lib/python3.10/site-packages/super_gradients/training/pretrained_models.py
sed -i -e "s/sghub.deci.ai/sg-hub-nv.s3.amazonaws.com/g" .venv/lib/python3.10/site-packages/super_gradients/training/utils/checkpoint_utils.py

Create a Python script named export_onnx.py with the following contents:

import torch
from super_gradients.training import models
from super_gradients.common.object_names import Models

# Load the YOLO-NAS model (you can also load your own checkpoint by using the `checkpoint_path` arg)
model = models.get(Models.YOLO_NAS_S, pretrained_weights="coco")

# Switch to eval mode and prepare for conversion
model.eval()

# Specify the expected input shape: [batch_size, channels, height, width]
model.prep_model_for_conversion(input_size=(1, 3, 640, 640))

# Export with torch.onnx.export
dummy_input = torch.randn(1, 3, 640, 640)
torch.onnx.export(
    model,
    dummy_input,
    "yolo_nas_s_manual.onnx",
    opset_version=11,
    input_names=["images"],
    output_names=["output"],
) 

Then run the script in your Python 3.10 virtual environment; it will save the model in ONNX format.

python export_onnx.py
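
Optionally, the onnx-simplifier package installed earlier can be used to simplify the exported graph before conversion; this is a suggested extra step, not required:

python -m onnxsim yolo_nas_s_manual.onnx yolo_nas_s_sim.onnx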

The above script uses the pretrained COCO weights, so you can use the YOLOv8 convert.py script from the model zoo to convert the ONNX model to RKNN format.
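
For example, assuming the model zoo's convert.py takes the ONNX model, target platform, quantization type, and output path as arguments:

python convert.py yolo_nas_s_manual.onnx rk3588 i8 yolonas-s-rk3588.rknn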