The Deep Learning Inference Engine backend from the Intel OpenVINO toolkit is one of the supported OpenCV DNN backends. As mentioned in the previous post, ARM CPU support was recently added to Inference Engine via the dedicated ARM CPU plugin. Let’s review how the OpenCV DNN module can leverage Inference Engine and this plugin to run DL networks on ARM CPUs.
Several options for configuring Inference Engine with OpenCV are described in the OpenCV wiki. We will build all components from scratch: OpenVINO, the ARM CPU plugin, and OpenCV, and then run YOLOv4-tiny inference on a Raspberry Pi.
OpenVINO and OpenCV cross-compilation
We will cross-compile OpenVINO with the plugin and OpenCV in a Docker container on an x86 platform. This speeds up compilation considerably: native compilation on a Raspberry Pi would take a while.
First, we will create a Docker image with a configured build environment that contains the OpenCV and OpenVINO dependencies and runs a build script. To do that, create a Dockerfile with the following content:
FROM debian:buster
USER root
RUN echo 'deb http://deb.debian.org/debian unstable main' > /etc/apt/sources.list.d/unstable.list
RUN dpkg --add-architecture armhf && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        crossbuild-essential-armhf \
        python3-dev \
        python3-pip \
        python3-numpy/unstable \
        git-lfs \
        scons \
        wget \
        xz-utils \
        cmake \
        libusb-1.0-0-dev:armhf \
        libgtk-3-dev:armhf \
        libavcodec-dev:armhf \
        libavformat-dev:armhf \
        libswscale-dev:armhf \
        libgstreamer1.0-dev:armhf \
        libgstreamer-plugins-base1.0-dev:armhf \
        libpython3-dev:armhf && \
    rm -rf /var/lib/apt/lists/*
COPY arm_build.sh /arm_build.sh
RUN mkdir /arm
WORKDIR /arm/
CMD ["sh", "/arm_build.sh"]
Before we build the Docker image, we need to create the build script. The script consists of three parts:
- clone OpenCV, OpenVINO, and OpenVINO contrib repositories
- build OpenVINO with ARM CPU plugin
- build OpenCV with IE backend support
Create a file named “arm_build.sh” and add the following content to it:
#!/bin/sh
set -x

fail() {
    echo $1
    exit 1
}

git clone --recurse-submodules --shallow-submodules --depth 1 https://github.com/opencv/opencv.git && \
git clone --recurse-submodules --shallow-submodules --depth 1 https://github.com/openvinotoolkit/openvino.git && \
git clone --recurse-submodules --shallow-submodules --depth 1 https://github.com/openvinotoolkit/openvino_contrib.git || \
fail "Failed to clone source repositories"

cd /arm/openvino && \
mkdir openvino_install && mkdir openvino_build && cd openvino_build && \
cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_INSTALL_PREFIX="../openvino_install" \
      -DCMAKE_TOOLCHAIN_FILE="../cmake/arm.toolchain.cmake" \
      -DTHREADING=SEQ \
      -DIE_EXTRA_MODULES=/arm/openvino_contrib/modules/arm_plugin \
      -DTHREADS_PTHREAD_ARG="-pthread" \
      -DCMAKE_INSTALL_LIBDIR=lib \
      -DCMAKE_CXX_FLAGS=-latomic \
      -DENABLE_TESTS=OFF -DENABLE_BEH_TESTS=OFF -DENABLE_FUNCTIONAL_TESTS=OFF \
      .. && \
make --jobs=$(nproc --all) && make install && \
cp /arm/openvino/bin/armv7l/Release/lib/libarmPlugin.so \
   /arm/openvino/openvino_install/deployment_tools/inference_engine/lib/armv7l/ || \
fail "OpenVINO build failed"

cd /arm/opencv && mkdir opencv_install && mkdir opencv_build && cd opencv_build && \
PYTHONVER=`ls /usr/include | grep "python3.*"` && \
cmake -DCMAKE_BUILD_TYPE=Release \
      -DBUILD_LIST=imgcodecs,videoio,highgui,dnn,python3 \
      -DCMAKE_INSTALL_PREFIX="../opencv_install" \
      -DOPENCV_CONFIG_INSTALL_PATH="cmake" \
      -DCMAKE_TOOLCHAIN_FILE="../platforms/linux/arm-gnueabi.toolchain.cmake" \
      -DWITH_IPP=OFF \
      -DBUILD_TESTS=OFF \
      -DBUILD_PERF_TESTS=OFF \
      -DOPENCV_ENABLE_PKG_CONFIG=ON \
      -DPYTHON3_PACKAGES_PATH="../opencv_install/python" \
      -DPKG_CONFIG_EXECUTABLE="/usr/bin/arm-linux-gnueabihf-pkg-config" \
      -DBUILD_opencv_python2=OFF -DBUILD_opencv_python3=ON \
      -DPYTHON3_INCLUDE_PATH="/usr/include/${PYTHONVER}" \
      -DPYTHON3_NUMPY_INCLUDE_DIRS="/usr/lib/python3/dist-packages/numpy/core/include" \
      -DPYTHON3_LIMITED_API=ON \
      -DOPENCV_SKIP_PYTHON_LOADER=ON \
      -DENABLE_NEON=ON \
      -DCPU_BASELINE="NEON" \
      -DWITH_INF_ENGINE=ON \
      -DWITH_NGRAPH=ON \
      -Dngraph_DIR="/arm/openvino/openvino_build/ngraph" \
      -DINF_ENGINE_RELEASE=2021030000 \
      -DInferenceEngine_DIR="/arm/openvino/openvino_build" \
      -DINF_ENGINE_LIB_DIRS="/arm/openvino/bin/armv7l/Release/lib" \
      -DINF_ENGINE_INCLUDE_DIRS="/arm/openvino/inference-engine/include" \
      -DCMAKE_FIND_ROOT_PATH="/arm/openvino" \
      -DENABLE_CXX11=ON \
      .. && \
make --jobs=$(nproc --all) && make install || \
fail "OpenCV build failed"
Now we are ready to build the image and run it:
docker image build -t cross_armhf .
mkdir arm && docker container run -u `id -u`:`id -g` --rm -t -v $PWD/arm:/arm cross_armhf
Once the build script completes, you can find the OpenCV artifacts in the arm/opencv/opencv_install directory and the OpenVINO artifacts in the arm/openvino/openvino_install directory. We need to copy these artifacts to the target ARM platform. If the target ARM platform is accessible over SSH, the scp tool can be used:
scp -r arm/{opencv/opencv_install/,openvino/openvino_install} <user>@<host>:<path>
Running the application
To evaluate the ARM CPU plugin, we can use the following Python application, which detects objects in a video:
import cv2 as cv
import numpy as np
import os
import argparse

def draw_boxes(image, boxes, confidences, class_ids, idxs):
    if len(idxs) > 0:
        for i in idxs.flatten():
            # extract bounding box coordinates
            left, top = boxes[i][0], boxes[i][1]
            width, height = boxes[i][2], boxes[i][3]
            # draw bounding box and label
            cv.rectangle(image, (left, top), (left + width, top + height), (0, 255, 0))
            label = "%s: %.2f" % (classes[class_ids[i]], confidences[i])
            cv.putText(image, label, (left, top - 5), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))
    return image

def make_prediction(net, layer_names, labels, frame, conf_threshold, nms_threshold):
    boxes = []
    confidences = []
    class_ids = []
    frame_height, frame_width = frame.shape[:2]
    # create a blob from a frame
    blob = cv.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)
    # extract bounding boxes, confidences and class ids
    for output in outputs:
        for detection in output:
            # extract the scores, class id and confidence
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            # consider the predictions that are above the threshold
            if confidence > conf_threshold:
                center_x = int(detection[0] * frame_width)
                center_y = int(detection[1] * frame_height)
                width = int(detection[2] * frame_width)
                height = int(detection[3] * frame_height)
                # get top left corner coordinates
                left = int(center_x - (width / 2))
                top = int(center_y - (height / 2))
                boxes.append([left, top, width, height])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    idxs = cv.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)
    return boxes, confidences, class_ids, idxs

parser = argparse.ArgumentParser()
parser.add_argument('--model', default='yolov4-tiny.weights', help='Path to a binary file of model')
parser.add_argument('--config', default='yolov4-tiny.cfg', help='Path to network configuration file')
parser.add_argument('--classes', default='coco.names', help='Path to label file')
parser.add_argument('--conf_threshold', type=float, default=0.5, help='Confidence threshold')
parser.add_argument('--nms_threshold', type=float, default=0.3, help='Non-maximum suppression threshold')
parser.add_argument('--input', help='Path to video file')
parser.add_argument('--output', default='', help='Path to directory for output video file')
args = parser.parse_args()

# load names of classes
classes = open(args.classes).read().rstrip('\n').split('\n')

# load a network
net = cv.dnn.readNet(args.config, args.model)
net.setPreferableBackend(cv.dnn.DNN_BACKEND_INFERENCE_ENGINE)
layer_names = net.getUnconnectedOutLayersNames()

cap = cv.VideoCapture(args.input)

# define the codec and create VideoWriter object
if args.output != '':
    input_file_name = os.path.basename(args.input)
    output_file_name, output_file_format = os.path.splitext(input_file_name)
    output_file_name += '-output'
    if output_file_format != '':
        fourcc = int(cap.get(cv.CAP_PROP_FOURCC))
    else:
        output_file_format = '.mp4'
        fourcc = cv.VideoWriter_fourcc(*'mp4v')
    output_file_path = args.output + output_file_name + output_file_format
    fps = cap.get(cv.CAP_PROP_FPS)
    frame_size = (int(cap.get(cv.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv.CAP_PROP_FRAME_HEIGHT)))
    out = cv.VideoWriter(output_file_path, fourcc, fps, frame_size)

while cv.waitKey(1) < 0:
    hasFrame, frame = cap.read()
    if not hasFrame:
        break
    boxes, confidences, class_ids, idxs = make_prediction(net, layer_names, classes, frame,
                                                          args.conf_threshold, args.nms_threshold)
    frame = draw_boxes(frame, boxes, confidences, class_ids, idxs)
    if args.output != '':
        out.write(frame)
    else:
        cv.imshow('object detection', frame)
There is no ARM-specific code in the demo application. The default CPU target together with the detected ARM architecture tells Inference Engine to use the ARM CPU plugin for inference. If the same application is run on an x86 platform, Inference Engine will select the MKL-DNN plugin for model inference instead.
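For clarity, here is a minimal sketch of that selection as it appears in code (using the same model files as the demo). Only the backend is set explicitly; the CPU target is the default and is shown here just to make the behavior visible:

import cv2 as cv

# load the same YOLOv4-tiny model as the demo uses
net = cv.dnn.readNet('yolov4-tiny.cfg', 'yolov4-tiny.weights')

# delegate inference to Inference Engine
net.setPreferableBackend(cv.dnn.DNN_BACKEND_INFERENCE_ENGINE)

# CPU is the default target: on ARM, Inference Engine resolves it to the
# ARM CPU plugin; on x86, to the MKL-DNN plugin
net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)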
Before we run the application, we need to download a pre-trained YOLOv4-tiny model and a test video file:
wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights
wget https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny.cfg
wget https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/coco.names
wget https://raw.githubusercontent.com/intel-iot-devkit/sample-videos/master/people-detection.mp4
We also need to define the LD_LIBRARY_PATH and PYTHONPATH environment variables:
export PYTHONPATH=$PYTHONPATH:<artifacts_dir>/opencv_install/python/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<artifacts_dir>/opencv_install/lib/:<artifacts_dir>/openvino_install/deployment_tools/ngraph/lib/:<artifacts_dir>/openvino_install/deployment_tools/inference_engine/lib/armv7l/
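As a quick sanity check (assuming <artifacts_dir> is replaced with the actual path to the copied artifacts), you can verify that the cross-compiled Python bindings import and that Inference Engine support was compiled in:

# confirm the OpenCV module imports from the cross-compiled build
python3 -c "import cv2; print(cv2.__version__)"
# the build information should mention Inference Engine support
python3 -c "import cv2; print(cv2.getBuildInformation())" | grep -i "inference"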
Finally, we can run the application:
python3 object_detection.py --model yolov4-tiny.weights --config yolov4-tiny.cfg --classes coco.names --input people-detection.mp4
If a window can’t be displayed on the platform, you can save the output to a video file using the --output flag:
python3 object_detection.py --model yolov4-tiny.weights --config yolov4-tiny.cfg --classes coco.names --input people-detection.mp4 --output ./
Conclusion
In this article, you have learned how to use the ARM CPU plugin with OpenCV and validate it by running the YOLO object detection demo. If you run into any issues with the ARM CPU plugin, please don’t hesitate to raise a ticket in the OpenVINO contrib repository, where the plugin is hosted.