Authors: WU Jia, GAO Jinwei
NPU, short for Neural Processing Unit, is a specialized processor designed to accelerate common machine learning workloads, typically neural network inference. Besides the speedup, an NPU offloads work from the CPU and is very power efficient.
OpenCV’s Deep Neural Network (DNN) module is a light and efficient deep learning inference engine. It is highly optimized and can deploy deep learning models on a wide range of hardware. Since the release of 4.6.0 last June, OpenCV supports running models on NPU! Khadas VIM3 is the first dev board officially supported by OpenCV for running quantized deep learning models on NPU via the DNN module. Beyond that, OpenCV DNN works with other dev boards that use the same Amlogic A311D System-on-Chip (SoC) as the VIM3.
Let’s first take a look at how the VIM3 NPU accelerates inference (the tests below use the benchmark from the OpenCV Model Zoo with OpenCV 4.6.0-pre).
| DL Model | Khadas VIM3 CPU (ms) | Khadas VIM3 NPU (ms) |
|---|---|---|
| YuNet | 5.42 | 4.04 |
| SFace | 82.22 | 46.25 |
| CRNN-EN | 181.89 | 125.30 |
| CRNN-CN | 238.95 | 166.79 |
| PP-ResNet | 543.69 | 75.45 |
| PP-HumanSeg | 82.85 | 31.36 |
| YoutuReID | 486.33 | 44.61 |
As mentioned above, models can also run on the NPU of other dev boards, as long as they use the A311D SoC. Here are the results on another dev board, the EAIS-750E.
| DL Model | EAIS-750E CPU (ms) | EAIS-750E NPU (ms) |
|---|---|---|
| YuNet | 5.99 | 4.20 |
| SFace | 85.70 | 52.00 |
| CRNN-EN | 237.40 | 158.00 |
| CRNN-CN | 310.50 | 210.00 |
| PP-ResNet | 541.00 | 67.70 |
| PP-HumanSeg | 92.20 | 36.20 |
| YoutuReID | 502.70 | 45.50 |
We can see from the tables above that the NPU can bring more than a 10x speedup to inference. Exciting news! What’s more, it takes little effort to enable the NPU via OpenCV DNN: the only thing required is to set the backend and the target device.
```python
import cv2 as cv

# preprocessing: build the input blob from your image (e.g. cv.dnn.blobFromImage)
# ...

# load the (quantized) model
net = cv.dnn.readNet(PATH_TO_MODEL)

# settings
net.setPreferableBackend(cv.dnn.DNN_BACKEND_TIMVX)  # set backend
net.setPreferableTarget(cv.dnn.DNN_TARGET_NPU)      # set target device

# inference
net.setInput(input_blob)  # feed the preprocessed blob
output = net.forward()

# postprocessing
# ...
```
A little more work needs to be done to enable the OpenCV DNN NPU backend though. The instructions are available on GitHub. If you are using a dev board other than the VIM3 but with the same A311D chip, you can refer to this blog post for more tips.
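Once OpenCV has been rebuilt with TIM-VX support (typically by turning on the corresponding CMake option, e.g. `-DWITH_TIMVX=ON`; the exact steps are in the instructions linked above), a quick sanity check like the sketch below can confirm that the NPU target is actually visible to the DNN module:

```python
import cv2 as cv

# List the targets available for the TIM-VX backend in this OpenCV build.
# An empty list usually means OpenCV was built without TIM-VX support.
targets = cv.dnn.getAvailableTargets(cv.dnn.DNN_BACKEND_TIMVX)

if cv.dnn.DNN_TARGET_NPU in targets:
    print("TIM-VX backend with NPU target is available.")
else:
    print("NPU target not found; check your OpenCV build configuration.")
```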
Khadas is a Shenzhen-based single-board computer (SBC) manufacturer serving the open source community and the streaming media player industry, and is one of the OpenCV Development Partners. The VIM3 SBC is a powerful dev board built around the 12nm Amlogic A311D SoC, which combines four Cortex-A73 performance cores (2.2 GHz) and two Cortex-A53 efficiency cores (1.8 GHz) in a hexa-core configuration, plus an onboard NPU rated at 5.0 trillion operations per second (TOPS) for neural network applications.
To end this post, let’s do something simple but cool with the VIM3: control a robot arm using ‘vision’, tracking the face in front of its camera by running a deep learning face detection model on the NPU in real time. See the GIF below:
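For readers who want to try the vision side of this demo, here is a minimal sketch of real-time face detection on the NPU using OpenCV’s FaceDetectorYN (YuNet) interface. The model file name is illustrative (use the quantized YuNet model from the OpenCV Model Zoo), and the robot-arm control logic is omitted.

```python
import cv2 as cv

# Illustrative file name; use the quantized YuNet model from the OpenCV Model Zoo.
MODEL_PATH = "face_detection_yunet_int8.onnx"

cap = cv.VideoCapture(0)  # on-board / USB camera
frame_w = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
frame_h = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))

# Create the YuNet face detector and ask DNN to run it on the NPU via TIM-VX.
detector = cv.FaceDetectorYN.create(
    MODEL_PATH, "", (frame_w, frame_h),
    0.9, 0.3, 5000,                  # score threshold, NMS threshold, top_k
    cv.dnn.DNN_BACKEND_TIMVX,        # backend
    cv.dnn.DNN_TARGET_NPU)           # target device

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # faces is an Nx15 array: x, y, w, h, five landmark points, score
    _, faces = detector.detect(frame)
    if faces is not None:
        for face in faces:
            x, y, w, h = face[:4].astype(int)
            cv.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            # A real demo would turn the face center into robot-arm commands here.

    cv.imshow("YuNet face detection on NPU", frame)
    if cv.waitKey(1) == 27:  # press Esc to quit
        break

cap.release()
cv.destroyAllWindows()
```

As the YuNet numbers in the benchmark above suggest, this kind of detection loop comfortably runs in real time on the VIM3 NPU.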