Open Computing Language (OpenCL) is an open standard for writing code that runs across heterogeneous platforms, including CPUs, GPUs, DSPs, etc. In particular, OpenCL gives applications access to GPUs for non-graphical computing (GPGPU), which in some cases results in a significant speed-up. In computer vision, many algorithms run much more efficiently on a GPU than on a CPU, e.g. image processing, matrix arithmetic, computational photography and object detection.
Acceleration of OpenCV with OpenCL was started in 2011 by AMD. As a result, the OpenCV-2.4.3 release included the new ocl module containing OpenCL implementations of some existing OpenCV algorithms. That is, when an OpenCL runtime and a compatible device are available on a client machine, the user may call cv::ocl::resize() instead of cv::resize() to use the accelerated code. Over the next three years, more and more functions and classes were added to the ocl module, but it remained a separate API alongside the primary CPU-oriented API in OpenCV-2.x.
In OpenCV-3.x, the architecture was changed to the so-called Transparent API (T-API). In the new architecture, the separate OpenCL-accelerated cv::ocl::resize() is removed from the external API and becomes a branch inside the regular cv::resize(). This branch is taken automatically when it is possible and makes sense from a performance point of view. More details on the T-API can be found in the following slides. The T-API implementation was sponsored by AMD and Intel.
Some performance numbers are shown in the picture below:
| Regular CPU code | OpenCL-aware code OpenCV-2.x | OpenCL-aware code OpenCV-3.x |
|---|---|---|
| // initialization | // initialization | // initialization |
| VideoCapture vcap(...); | VideoCapture vcap(...); | VideoCapture vcap(...); |
| CascadeClassifier fd("haar_ff.xml"); | ocl::OclCascadeClassifier fd("haar_ff.xml"); | CascadeClassifier fd("haar_ff.xml"); |
| Mat frame, frameGray; | ocl::oclMat frame, frameGray; | UMat frame, frameGray; |
| | Mat frameCpu; | |
| vector | vector | vector |