Compile OpenCV with Python Bindings, GPU Support, and Examples

The idea of re-compiling OpenCV came up when I was using OpenCV for keypoint detection. The official OpenCV Python tutorial is not very detailed: only a few CPU-based keypoint detection algorithms are covered, which leads to high CPU usage and long processing times. When I dug into the OpenCV C++ documentation, however, I found several implementations of these functions that make use of the GPU.

So in this article, I document the compilation process and give some examples of the wrapped OpenCV functions in Python. With hardware acceleration, for example, the GPU implementation of the SURF algorithm runs about 5.5 times faster than the CPU implementation.

Ⅰ. Preparation

OpenCV stands for Open Source Computer Vision Library. It is written in C++ and provides APIs for C++, Python, MATLAB, and more. To enable GPU acceleration, a few dependencies are needed.

Dependencies

The OpenCV in this article is compiled with:

Name                      Description
Visual Studio 2017        runtime
CMake 3.19.2 win64        configure
MSDK2020R1                Intel Quick Sync hardware video encoder/decoder
CUDA 11.0.2               CUDA binaries
cuDNN                     NVIDIA CUDA Deep Neural Network library
Video Codec SDK 11.0.10   NVIDIA Video Codec SDK
OpenCV 4.5.1              OpenCV source (main)
OpenCV_contrib 4.5.1      OpenCV source (contrib)
Anaconda                  Python environment

1. Install Anaconda

  • Remember to choose Register Anaconda as my default Python.

  • After the installation, install numpy before compiling OpenCV.

    conda install -y numpy
    

2. Install Visual Studio 2017

  • Choose to install “Desktop development with C++”.

3. Install CUDA and cuDNN

  • Install CUDA with default settings.
  • Unzip cuDNN and copy the bin, include, and lib folders to the CUDA installation directory; the default path is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0.

4. Install CMake

  • Choose “Add CMake to the system PATH for the current user”.

5. Prepare OpenCV source

  • OpenCV source (main) is a self-extracting archive. Extract it to a folder, e.g., C:\; the exe file will create the folder C:\opencv.
  • Unzip OpenCV source (contrib).
  • Copy the contrib modules into C:\opencv\modules.

6. Install Intel Media SDK

The Intel Media SDK provides hardware acceleration for the FourCC codes MPG2, H264, X264, AVC, H265, and HEVC. Video can be decoded/encoded through the cv2.CAP_INTEL_MFX backend.

  • Install MSDK2020R1 with default settings.

7. Integrate Nvidia Codec SDK

With the NVIDIA Video Codec SDK, we can use cv2.cudacodec.createVideoReader() to decode video on the GPU.

  • Unzip the zip file.
  • Copy Lib/x64/nvcuvid.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\lib\x64, and Lib/Win32/nvcuvid.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\lib\Win32.
  • Copy the .h files in the Interface folder to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include.

Ⅱ. Build OpenCV

Building OpenCV takes several steps.

1. Choose the source and build folders

  • Choose the source and build folder.

    [Screenshot cmake_1: selecting the source and build folders]

  • Click Configure.

2. Choose the generator and platform

  • Choose the generator and the platform.

    [Screenshot cmake_2: selecting the generator and platform]

  • The configuration will take a while.

3. Set the build options

  • Check the parameters in the table:
Name                        Value
ENABLE_FAST_MATH            ON
INSTALL_CREATE_DISTRIB      ON
OPENCV_ENABLE_NONFREE       ON
OPENCV_EXTRA_MODULES_PATH   C:/opencv/modules
WITH_CUDA                   ON
WITH_MFX                    ON
WITH_OPENGL                 ON
WITH_OPENMP                 ON
  • Click Configure again.

4. Set the CUDA options

  • While CMake is busy configuring, we can look up the GPU Compute Capability. Specifying the compute capability of the GPU explicitly shortens the compilation. For example, the compute capability of a GeForce RTX 2070 is 7.5.
Name                 Value
CUDA_ARCH_BIN        7.5
CUDA_ARCH_PTX        7.5
CUDA_FAST_MATH       ON
WITH_NVCUVID         ON
BUILD_opencv_world   ON
  • Click Configure again.

5. Generate, build, and install

  • Check if NVCUVID and FAST MATH will be built.

    [Screenshot cmake_3: NVCUVID and FAST MATH listed in the build summary]

  • Check if python3 is included in the OpenCV Modules.

    [Screenshot cmake_4: python3 listed under the OpenCV modules to be built]

  • Click Generate.

  • Go to C:\opencv\build and open a command prompt from the address bar.

  • Compile with the command:

    cmake.exe --build . --config Release --target INSTALL
    
  • This will take a long time. Once it finishes, modify the system environment variables:

    Action   Variable     Value
    Create   OPENCV_DIR   C:\opencv\build
    Create   OPENCV_VER   451
    Add      PATH         %OPENCV_DIR%\bin\Debug
    Add      PATH         %OPENCV_DIR%\bin\Release
    Add      PATH         %OPENCV_DIR%\install\x64\vc15\bin

Ⅲ. Python Bindings

  • Go to the site-packages folder of Anaconda; the default path is C:\Users\{your name}\miniconda3\Lib\site-packages\cv2\python-3.8, where {your name} is your Windows user name.
  • Copy the cv2.cp38-win_amd64.pyd to C:\Users\{your name}\miniconda3\Lib\site-packages\.
  • Now you can import cv2 in the Anaconda base environment.
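
To quickly verify the build, a minimal check could look like the sketch below; getCudaEnabledDeviceCount() should report at least one device on a machine with a supported NVIDIA GPU.

    import cv2

    print(cv2.__version__)                       # should print 4.5.1
    print(cv2.cuda.getCudaEnabledDeviceCount())  # >= 1 means the CUDA modules are in
    print(cv2.getBuildInformation())             # look for CUDA, cuDNN, NVCUVID and Intel Media SDK entries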

Ⅳ. Short Python Examples

It feels a bit odd that we now write Python programs while reading the OpenCV C++ documentation. ;)

CUDA Decoder

After compiling the NVCUVID module, we can use the cv2.cudacodec.createVideoReader() function. With it, we can capture the frames directly on the GPU. Each captured frame is of type cv2.cuda_GpuMat.

A CUDA video reader can be used like this:

video = cv2.cudacodec.createVideoReader(videoPath)  # create a GPU video reader

while True:
    status, currentFrame = video.nextFrame()  # currentFrame is a cv2.cuda_GpuMat
    if status:
        # ... process the frame on the GPU ...
        pass
    else:
        break
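
The frames stay in GPU memory. To bring a frame back to the CPU, for example to display it, the cv2.cuda_GpuMat can be downloaded to a NumPy array. A minimal sketch (depending on the codec, the downloaded frame may be BGRA rather than BGR):

status, gpuFrame = video.nextFrame()
if status:
    frame = gpuFrame.download()   # copy the GpuMat back to a NumPy array on the CPU
    cv2.imshow('frame', frame)    # convert with cv2.cvtColor first if the frame is BGRA
    cv2.waitKey(40)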

MediaSDK backend Decoder/Encoder

Once MSDK2020R1 is installed, we can pass cv2.CAP_INTEL_MFX as the backend to decode or encode video.

Media containers (e.g., .mp4 or .avi) are not supported by the MFX backend, so before using the decoder we need to extract the raw video stream into one of the supported formats (.264, .h264, .mp2, .mpeg2, .265 or .hevc).

The official documentation shows how to do this with FFmpeg:

# H264
ffmpeg -i video.avi -vcodec copy -an -bsf:v h264_mp4toannexb video.264
# H265
ffmpeg -i in.mkv -c:v copy -bsf hevc_mp4toannexb out.h265

After that, we can decode the video with cv2.CAP_INTEL_MFX.

video = cv2.VideoCapture('video.264', cv2.CAP_INTEL_MFX)
while True:
    status, frame = video.read()
    if status:
        cv2.imshow('video', frame)
        cv2.waitKey(40)
    else:
        break

We can also use the VideoWriter with cv2.CAP_INTEL_MFX.

fourcc = cv2.VideoWriter_fourcc(*'H264')
writer = cv2.VideoWriter(videoPath, cv2.CAP_INTEL_MFX, fourcc, fps, frameSize)
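
A usage sketch that continues from the MFX reader opened above; videoPath, fps, and frameSize are placeholders for your own output path, frame rate, and frame size:

while True:
    status, frame = video.read()   # read from the MFX decoder opened earlier
    if not status:
        break
    writer.write(frame)            # the frame dimensions must match frameSize

writer.release()                   # finalize the output file
video.release()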

Keypoint Detection Algorithm with CUDA

OpenCV provides SURF and ORB algorithms with both CPU and GPU implementations.

  • SURF CPU

    • the keypoints can be detected by:

      surf = cv2.xfeatures2d.SURF_create(100)
      kp, des = surf.detectAndCompute(img, mask)
      

      The input image is a NumPy array. The keypoints are returned as a list of cv2.KeyPoint objects and the descriptors as a NumPy array.

  • SURF GPU

    • the keypoints can be detected by:

      surf = cv2.cuda.SURF_CUDA_create(100)
      gpu_kp, gpu_des = surf.detectWithDescriptors(gpu_img, gpu_mask)
      kp = surf.downloadKeypoints(gpu_kp)
      

      The input image is a cv2.cuda_GpuMat, and the detected keypoints and descriptors stay in GPU memory. The keypoints can be downloaded to RAM via the downloadKeypoints() method. The descriptors can either be matched by a GPU matcher (see the sketch after this list) or downloaded to RAM via the download() method.
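
As a sketch of GPU-side matching, assume gpu_des1 and gpu_des2 are descriptor GpuMats produced as above for two images; a brute-force matcher can then run entirely on the GPU:

matcher = cv2.cuda.DescriptorMatcher_createBFMatcher(cv2.NORM_L2)  # SURF descriptors are float, so use the L2 norm
matches = matcher.match(gpu_des1, gpu_des2)                        # returns a list of cv2.DMatch objects
good = sorted(matches, key=lambda m: m.distance)[:50]              # keep the 50 closest matches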

I read 2,000 images three times on my computer with each implementation. The results show a significant improvement with the GPU implementation:

Run   SURF (CPU) [sec]      SURF (GPU) [sec]
1     178.2835454940796     34.70819616317749
2     175.7257044315338     32.26314163208008
3     175.60720658302308    32.25424408912659
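
For reference, a minimal sketch of how such a timing run could be set up; imagePaths is a hypothetical list of the image files, and this is not the exact script used for the numbers above:

import time
import cv2

# CPU timing
surf_cpu = cv2.xfeatures2d.SURF_create(100)
start = time.perf_counter()
for path in imagePaths:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    surf_cpu.detectAndCompute(img, None)
print(f'SURF (CPU): {time.perf_counter() - start:.2f} sec')

# GPU timing (the image upload is included in the measurement)
surf_gpu = cv2.cuda.SURF_CUDA_create(100)
empty_mask = cv2.cuda_GpuMat()                 # no mask
start = time.perf_counter()
for path in imagePaths:
    gpu_img = cv2.cuda_GpuMat()
    gpu_img.upload(cv2.imread(path, cv2.IMREAD_GRAYSCALE))
    surf_gpu.detectWithDescriptors(gpu_img, empty_mask)
print(f'SURF (GPU): {time.perf_counter() - start:.2f} sec')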

Ⅴ. Summary

OpenCV provides many convenient functions; complex algorithms can be run with just a few lines of code.

With GPU acceleration, we get much faster algorithms with much lower CPU usage.

Since OpenCV 4 integrates the DNN module (optionally with a GPU backend), many computer vision models, such as YOLOv4, can be run through it. These changes make it much easier to pick up new computer vision technology, integrate it into projects, and improve our daily lives.
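
As a sketch of enabling the GPU backend for the DNN module, assuming the build also enabled CUDA for DNN (the OPENCV_DNN_CUDA option) and using placeholder file names for a YOLOv4 model:

import cv2

net = cv2.dnn.readNetFromDarknet('yolov4.cfg', 'yolov4.weights')  # placeholder model files
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)                # run inference on the GPU
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

img = cv2.imread('image.jpg')                                     # placeholder input image
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())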

Thanks to everyone who contributes to the community!

Updated: 2021-02-13