Written by CHEN Zhangjie (Junior, Department of Computer Science and Engineering)
Face detection is a computer vision task in which a program detects human faces and locates them in an image or a video stream. The technology has advanced and matured rapidly, and many models have been developed. Its popularity has also raised expectations for speed, detection rate, accuracy, and other properties, all of which influence the choice of model.
The Haar Feature-based Cascade Classifier was proposed in 2001 by Paul Viola and Michael Jones. It is a classic feature-extraction algorithm that detects specific objects using Haar features. The integral image makes these features cheap to compute, and the AdaBoost algorithm is used to train the strong classifier. The Haar Feature-based Cascade Classifier has been widely used in face detection because of its good performance.
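To illustrate why the integral image helps, here is a minimal NumPy sketch. It is not the OpenCV implementation; the function names (`integral_image`, `rect_sum`, `two_rect_haar_feature`) are hypothetical and chosen only for this example. Once the integral image is built, the sum of any rectangle costs four lookups, so a two-rectangle Haar feature costs a constant number of operations regardless of its size.

```python
import numpy as np


def integral_image(gray):
    """Return the integral image, padded with a leading zero row and column."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w*h rectangle whose top-left corner is (x, y)."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])


def two_rect_haar_feature(ii, x, y, w, h):
    """Left half minus right half: a simple edge-type Haar feature (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# A 4*4 patch that is bright on the left and dark on the right.
patch = np.zeros((4, 4), dtype=np.uint8)
patch[:, :2] = 255
ii = integral_image(patch)
feature = two_rect_haar_feature(ii, 0, 0, 4, 4)
print("edge feature response:", feature)
```

A strong positive response indicates a vertical bright-to-dark edge inside the window. A real cascade evaluates many such features at many positions and scales.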
YuNet is a Convolutional Neural Network (CNN)-based face detector developed by Shiqi Yu in 2018 and open-sourced in 2019. It is a powerful, lightweight model that can be deployed on many devices. YuNet is reported to reach up to 1000 frames per second while maintaining high accuracy. It is also known for its ability to detect difficult side faces and occluded faces.
When users need a model for face detection, they face a choice: keep using the traditional classifier, or move to a newer method based on neural networks. Many people may think that traditional methods are easy to train, consume little computation power, and detect faces efficiently. But is this opinion still correct? This article compares the two models through a series of tests and shows that YuNet surpasses the traditional classifier in both detection rate and efficiency.
- Test platform: Windows 10 x64, Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz 1.61 GHz
- Test parameters: Cascade Classifier: defaults (ScaleFactor=1.1, MinNeighbors=3); YuNet: defaults (ConfThreshold=0.9, NmsThreshold=0.3)
- Time cost measurement: Repeat detection 100 times, then calculate the average time cost.
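The measurement procedure can be sketched as a small helper. This is an illustrative sketch only: `average_time_ms` is a hypothetical name, and the workload in the demo is a stand-in for a real detector call. Note that `range(repeats)` runs exactly `repeats` iterations, so the total is divided by the same number.

```python
import time


def average_time_ms(fn, repeats=100):
    """Call fn() `repeats` times and return the mean wall-clock time in ms."""
    tic = time.perf_counter()
    for _ in range(repeats):
        fn()
    toc = time.perf_counter()
    return (toc - tic) / repeats * 1000.0


# Stand-in workload; in the real tests this would be one detection call.
calls = []
avg_ms = average_time_ms(lambda: calls.append(sum(range(1000))), repeats=100)
print(f"{len(calls)} runs, {avg_ms:.4f} ms per run on average")
```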
Test 1:
Test image size: 320*320
Test results:
Model | Number of Detections | Time Consumption |
---|---|---|
Cascade Classifier | 8 | 25.81ms |
YuNet | 8 | 5.09ms |
Both models successfully detect all eight frontal faces, which means both have acceptable detection rates for unoccluded frontal faces.
However, YuNet takes only about 1/5 of the time the Cascade Classifier needs.
Test 2:
Test image size: 320*320
Test Results:
Model | Number of Detections | Time Consumption |
---|---|---|
Cascade Classifier | 7 | 24.00ms |
YuNet | 10 | 5.36ms |
Test 2 focuses on the models’ capability to detect side faces and occluded faces, which exist in the above test image.
YuNet correctly detected 10 faces, while the Cascade Classifier detected only 7. This shows that YuNet has a better detection rate for side faces and occluded faces than the traditional method.
The time cost is similar to the previous test.
Test 3:
Test image size: 320*320
Test results:
Model | Number of Detections | Time Consumption |
---|---|---|
Cascade Classifier | 7 | 24.58ms |
YuNet | 37 | 5.12ms |
Test 3 uses the World’s Largest Selfie as the test image, which contains faces at many different scales.
The result shows that the Haar Cascade Classifier correctly detects only 6 faces: it cannot detect the smaller faces, and it also produces false detections. YuNet also had difficulty identifying the smaller faces, but it correctly identified 37 faces in the front half of the image, about six times as many as the Cascade Classifier. This shows that YuNet handles faces of varying scale better than the Cascade Classifier.
In terms of time consumption, YuNet is still far more efficient than the traditional method. Across the three tests, the traditional method takes around 25ms to run face detection on a 320*320 image, while YuNet stays at around 5ms.
Test 4:
Test image size: 640*640
Test results:
Model | Number of Detections | Time Consumption |
---|---|---|
Cascade Classifier | 29 | 111.26ms |
YuNet | 137 | 22.32ms |
Test 4 checks whether YuNet’s advantage still holds at a larger input size.
The Haar feature classifier correctly detects more faces after the image is enlarged, but this improvement is small compared with YuNet’s: in this test, the traditional method correctly identifies 29 faces, while YuNet detects 137. YuNet still has the better detection rate on larger images.
There is still a significant gap in efficiency between the two methods. When the image size increases, YuNet remains about five times faster than the Cascade Classifier.
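The ratios behind these statements can be worked out directly from the four result tables. The short script below only restates the measured times; the helper names `speedup` and `frames_per_second` are hypothetical, and the frame rates are a simple conversion of the per-image times rather than separately measured values.

```python
# (Cascade Classifier ms, YuNet ms), copied from the result tables above.
timings_ms = {
    "Test 1 (320*320)": (25.81, 5.09),
    "Test 2 (320*320)": (24.00, 5.36),
    "Test 3 (320*320)": (24.58, 5.12),
    "Test 4 (640*640)": (111.26, 22.32),
}


def speedup(haar_ms, yunet_ms):
    """How many times faster YuNet is than the Cascade Classifier."""
    return haar_ms / yunet_ms


def frames_per_second(ms_per_image):
    """Convert a per-image time in milliseconds to images per second."""
    return 1000.0 / ms_per_image


for name, (haar_ms, yunet_ms) in timings_ms.items():
    print(f"{name}: speedup {speedup(haar_ms, yunet_ms):.2f}x, "
          f"Cascade {frames_per_second(haar_ms):.1f} fps, "
          f"YuNet {frames_per_second(yunet_ms):.1f} fps")

# Growth in time cost when the input goes from 320*320 to 640*640 (4x pixels).
haar_growth = timings_ms["Test 4 (640*640)"][0] / timings_ms["Test 3 (320*320)"][0]
yunet_growth = timings_ms["Test 4 (640*640)"][1] / timings_ms["Test 3 (320*320)"][1]
print(f"Time growth from Test 3 to Test 4: Cascade {haar_growth:.2f}x, YuNet {yunet_growth:.2f}x")
```

Both models slow down by a factor close to the fourfold increase in pixel count, so the relative gap between them stays roughly constant.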
Summary
It can be concluded from the tests that CNN-based YuNet has the following advantages:
- Has better detection rates and efficiency.
- Can detect more side faces and occluded faces.
- Is more lightweight. (The file ‘face_detection_yunet_2022mar.onnx’ is 337 KB, while the file ‘haarcascade_frontalface_default.xml’ is 908 KB.)
- Saves time on parameter tuning. The Cascade Classifier’s parameters need to be chosen carefully according to variables such as picture size, number of faces, and face size in order to achieve the best result. YuNet performs well on most images with default parameters.
- Has stable efficiency. The time consumption of the Cascade Classifier is closely tied to its parameter settings, while YuNet’s is not.
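One way to see why the Cascade Classifier’s running time depends on its parameters is to count how many image scales a sliding-window detector has to scan. The sketch below is a simplified model, not OpenCV’s actual implementation (which also honors minimum and maximum object sizes). The function name `count_pyramid_levels` is hypothetical, and the 24-pixel detection window is an assumption about the frontal face cascade.

```python
def count_pyramid_levels(width, height, scale_factor, window=24):
    """Count the scales scanned before the image shrinks below the window.

    Simplified model: the image is repeatedly shrunk by `scale_factor`
    until its shorter side is smaller than the detection window.
    `window=24` is an assumed window size, not read from the cascade file.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    levels = 0
    scale = 1.0
    while min(width, height) / scale >= window:
        levels += 1
        scale *= scale_factor
    return levels


for sf in (1.05, 1.1, 1.3):
    print(f"scaleFactor={sf}: "
          f"{count_pyramid_levels(320, 320, sf)} levels at 320*320, "
          f"{count_pyramid_levels(640, 640, sf)} levels at 640*640")
```

Under this model, a smaller scale factor means more scales to scan and therefore more work per image, which is consistent with the parameter sensitivity described above.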
For face detection, CNN-based models should replace the traditional methods as the mainstream choice.
Test Code
Haar cascade classifier
    import cv2
    import time

    img = cv2.imread('test_pics/selfie640.png')
    k = 100  # number of repetitions
    face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

    tic = time.perf_counter()
    # Convert into grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Detect faces, repeated k times
    for i in range(k):
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    toc = time.perf_counter()

    # Average time per detection in milliseconds
    print(f'Average time: {(toc - tic) / k * 1000:.2f} ms')
YuNet
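    import cv2 as cv
    import numpy as np
    from yunet import YuNet  # wrapper class from a local yunet.py
    import time

    img = cv.imread('test_pics/selfie640.png')
    k = 100  # number of repetitions
    # backendId=3 is cv.dnn.DNN_BACKEND_OPENCV, targetId=0 is cv.dnn.DNN_TARGET_CPU
    model = YuNet(modelPath='face_detection_yunet_2022mar.onnx',
                  inputSize=[320, 320],
                  confThreshold=0.9,
                  nmsThreshold=0.3,
                  topK=5000,
                  backendId=3,
                  targetId=0)

    h, w, _ = img.shape
    # Match the model input size to the image
    model.setInputSize([w, h])

    tic = time.perf_counter()
    # Inference, repeated k times
    for i in range(k):
        results = model.infer(img)
    toc = time.perf_counter()

    # Average time per detection in milliseconds
    print(f'Average time: {(toc - tic) / k * 1000:.2f} ms')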
References
All the test images are cropped from the WIDER FACE dataset and the World’s Largest Selfie.