# yolotriton
Go (Golang) gRPC client for YOLO-NAS, YOLO inference using the Triton Inference Server.
## Installation

Use `go get` to install this package:

```sh
go get github.com/dev6699/yolotriton
```
## Get YOLO-NAS, YOLO TensorRT model

### Export of quantized YOLO model
Install ultralytics:

```sh
pip install ultralytics
```

NOTE: Replace `yolo12n.pt` with your target model.
```sh
# Export ONNX format, then use trtexec to convert
yolo export model=yolo12n.pt format=onnx
trtexec --onnx=yolo12n.onnx --saveEngine=model_repository/yolov12/1/model.plan
```
NOTE: Inputs/outputs still remain as FP32 for compatibility reasons.
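Before handing the ONNX file to `trtexec`, it can be worth a quick sanity check that the export succeeded and that the graph I/O really is FP32. A minimal sketch using the `onnx` package, assuming the `yolo12n.onnx` produced above:

```python
import onnx

# Load the exported model and run ONNX's structural validation.
model = onnx.load("yolo12n.onnx")
onnx.checker.check_model(model)

# Print graph input/output names and element types;
# elem_type 1 is FP32, 10 is FP16.
for tensor in list(model.graph.input) + list(model.graph.output):
    print(tensor.name, tensor.type.tensor_type.elem_type)
```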
```sh
# Export FP32 TensorRT format directly
yolo export model=yolo12n.pt format=engine

# Export quantized FP16 TensorRT
yolo export model=yolo12n.pt format=engine half

# Export quantized INT8 TensorRT
yolo export model=yolo12n.pt format=engine int8
```
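The same exports can also be driven from a Python script through the `ultralytics` API; a sketch mirroring the CLI calls above (the `half`/`int8` keyword arguments correspond to the CLI flags):

```python
from ultralytics import YOLO

model = YOLO("yolo12n.pt")

# FP32 TensorRT engine
model.export(format="engine")

# Quantized FP16 TensorRT engine
model.export(format="engine", half=True)

# Quantized INT8 TensorRT engine; INT8 needs calibration images,
# which ultralytics draws from a dataset (a data=... argument can override the default).
model.export(format="engine", int8=True)
```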
References:
- https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html
- https://docs.ultralytics.com/modes/export/#export-formats
- https://github.com/NVIDIA/TensorRT/tree/master/samples/trtexec
Troubleshooting:
- Use `trtexec --loadEngine=yolo12n.engine` to check the engine.
- If the exported engine fails to load, check the related issue.
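The same check can be done from Python by deserializing the plan with the `tensorrt` package; a minimal sketch (`deserialize_cuda_engine` returns `None` when the engine cannot be loaded, e.g. it was built with a different TensorRT version or for a different GPU):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the serialized engine file; None signals an
# incompatible or corrupt plan.
with open("yolo12n.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

print("engine loaded:", engine is not None)
```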
### Convert to FP16 with onnxconverter_common

NOTE: Set `keep_io_types=True` to keep inputs/outputs as FP32; otherwise they will be converted to FP16 as well.
```python
import onnx
from onnxconverter_common import float16

# Load the original FP32 model
model = onnx.load("model.onnx")

# Convert weights to FP16; uncomment keep_io_types to keep FP32 inputs/outputs
model_fp16 = float16.convert_float_to_float16(
    model,
    # keep_io_types=True,
    node_block_list=[],
)

# Save the converted model
onnx.save(model_fp16, "model_fp16.onnx")
```
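To verify what the conversion did to the model interface, the converted file can be loaded into an onnxruntime session and its inputs inspected; a small sketch (with `keep_io_types=True` the input type stays `tensor(float)`, otherwise it becomes `tensor(float16)`):

```python
import onnxruntime as ort

# Creating a session also confirms the converted model still loads.
sess = ort.InferenceSession("model_fp16.onnx", providers=["CPUExecutionProvider"])

for inp in sess.get_inputs():
    print(inp.name, inp.type, inp.shape)
for out in sess.get_outputs():
    print(out.name, out.type, out.shape)
```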
### Export of quantized YOLO-NAS INT8 model

- Export quantized ONNX model:
```python
from super_gradients.conversion.conversion_enums import ExportQuantizationMode
from super_gradients.conversion import DetectionOutputFormatMode
from super_gradients.common.object_names import Models
from super_gradients.training import models

# From a custom model:
# model = models.get(Models.YOLO_NAS_S, num_classes=1, checkpoint_path='ckpt_best.pth')
model = models.get(Models.YOLO_NAS_S, pretrained_weights="coco")

export_result = model.export(
    "yolo_nas_s_int8.onnx",
    output_predictions_format=DetectionOutputFormatMode.BATCH_FORMAT,
    quantization_mode=ExportQuantizationMode.INT8,  # or ExportQuantizationMode.FP16
)
print(export_result)
```
- Convert to TensorRT with the INT8 builder:

```sh
trtexec --onnx=yolo_nas_s_int8.onnx --saveEngine=yolo_nas_s_int8.plan --int8
```
## Start triton inference server

```sh
docker compose up tritonserver
```
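Once the container is up, a quick way to confirm the server is reachable is Triton's Python gRPC client; a sketch assuming the gRPC port is published to the host as `localhost:8001` (inside the compose network the URL is `tritonserver:8001`, as in the examples below):

```python
import tritonclient.grpc as grpcclient

# Connect to Triton's gRPC endpoint and check liveness/readiness.
client = grpcclient.InferenceServerClient(url="localhost:8001")
print("live: ", client.is_server_live())
print("ready:", client.is_server_ready())
```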
## Sample usage

Check `cmd/main.go` for more details.

- For help:

```sh
go run cmd/main.go --help
```
```
  -b    Run benchmark.
  -i string
        Inference Image. (default "images/1.jpg")
  -m string
        Name of model being served (Required) (default "yolonas")
  -n int
        Number of benchmark run. (default 1)
  -o float
        Intersection over Union (IoU) (default 0.7)
  -p float
        Minimum probability (default 0.5)
  -t string
        Type of model. Available options: [yolonas, yolonasint8, yolofp16, yolofp32] (default "yolonas")
  -u string
        Inference Server URL. (default "tritonserver:8001")
  -x string
        Version of model. Default: Latest Version
```
- Sample usage with yolonasint8 model:

```sh
go run cmd/main.go -m yolonasint8 -t yolonasint8 -i images/1.jpg
```
```
1. processing time: 123.027909ms
prediction: 0
class: dog
confidence: 0.96
bboxes: [ 669 130 1061 563 ]
---------------------
prediction: 1
class: person
confidence: 0.96
bboxes: [ 440 30 760 541 ]
---------------------
prediction: 2
class: dog
confidence: 0.93
bboxes: [ 168 83 495 592 ]
---------------------
```
- Sample usage to get benchmark results:

```sh
go run cmd/main.go -m yolonasint8 -t yolonasint8 -i images/1.jpg -b -n 10
```
```
 1. processing time: 64.253978ms
 2. processing time: 51.812457ms
 3. processing time: 80.037468ms
 4. processing time: 96.73738ms
 5. processing time: 87.22928ms
 6. processing time: 95.28627ms
 7. processing time: 61.609115ms
 8. processing time: 87.625844ms
 9. processing time: 70.356198ms
10. processing time: 74.130759ms
Avg processing time: 76.93539ms
```
## Results

| Input | Output |
| --- | --- |
| ![]() | ![]() |
| ![]() | ![]() |