Compile OpenCV with Python & GPU support & Examples

The idea of recompiling OpenCV came up while I was using it for keypoint detection. The official OpenCV tutorial for Python is not very detailed: it only covers a few CPU-based keypoint detection algorithms, which lead to high CPU usage and long run times. When I looked into the OpenCV C++ documentation, however, I found several functions that are implemented on the GPU.

So in this article, I document the compile process and give some examples of the wrapped OpenCV functions in Python. With hardware acceleration, the GPU implementation of the SURF algorithm, for example, runs more than 5 times faster than the CPU implementation.

Ⅰ. Preparation

OpenCV stands for Open Source Computer Vision Library. It is written in C++ and provides APIs for C++, Python, MATLAB, etc. To enable GPU acceleration, a few dependencies are needed.

Dependencies

The OpenCV in this article is compiled with:

Name	Description
Visual Studio 2017	runtime
CMake 3.19.2 win64	configure
MSDK2020R1	Intel Quick Sync hardware video encoder/decoder
CUDA 11.0.2	CUDA binaries
cuDNN	DNN
Video Codec SDK 11.0.10	NVIDIA Video Codec SDK
OpenCV 4.5.1	OpenCV source main
OpenCV_contrib 4.5.1	OpenCV source contrib
Anaconda	Python environment

1. Install Anaconda

Remember to select “Register Anaconda as my default Python”.
After the installation, install numpy before compiling OpenCV:
```
conda install -y numpy
```

2. Install Visual Studio 2017

Choose to install “Desktop development with C++”.

3. Install CUDA and cuDNN

Install CUDA with default settings.
Unzip cuDNN and copy its bin, include and lib folders into the CUDA installation folder. The default path is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0.

4. Install CMake

Choose “Add CMake to the system PATH for the current user”.

5. Prepare OpenCV source

The OpenCV source main package is a self-extracting file. Extract it to a folder of your choice, e.g., C:\; the exe file will create the folder C:\opencv.
Unzip the OpenCV source contrib package.
Copy its modules to C:\opencv\modules.

6. Install Intel Media SDK

The Intel Media SDK provides hardware acceleration for the FourCC codes MPG2, H264, X264, AVC, H265 and HEVC. Videos can then be decoded or encoded with the cv2.CAP_INTEL_MFX backend.

Install MSDK2020R1 with default settings.

7. Integrate Nvidia Codec SDK

With Nvidia Codec SDK, we can use cv2.cudacodec.VideoReader() to help us decode the video.

Unzip the zip file
Copy Lib/x64/nvcuvid.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\lib\x64, copy Lib/Win32/nvcuvid.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\lib\Win32
Copy the .h files in folder Interface to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include
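
The copy steps above could also be scripted. The following is only a sketch under assumptions: the function name copy_codec_sdk and both folder arguments are hypothetical, the folder layout (Lib/x64, Lib/Win32, Interface) is taken from the steps above, and writing into C:\Program Files usually requires an elevated prompt.

```python
import shutil
from pathlib import Path


def copy_codec_sdk(sdk_root, cuda_root):
    """Copy nvcuvid.lib and the Interface headers into the CUDA folder.

    Both arguments are placeholders for the unzipped Video Codec SDK
    folder and the CUDA installation folder on your machine.
    """
    sdk_root, cuda_root = Path(sdk_root), Path(cuda_root)
    copied = []

    # Import libraries, one per architecture; skip an architecture that is absent.
    for arch in ("x64", "Win32"):
        src = sdk_root / "Lib" / arch / "nvcuvid.lib"
        if src.is_file():
            dst_dir = cuda_root / "lib" / arch
            dst_dir.mkdir(parents=True, exist_ok=True)
            copied.append(Path(shutil.copy2(src, dst_dir)))

    # Header files from the Interface folder.
    include_dir = cuda_root / "include"
    include_dir.mkdir(parents=True, exist_ok=True)
    for header in sorted((sdk_root / "Interface").glob("*.h")):
        copied.append(Path(shutil.copy2(header, include_dir)))

    return copied
```

The function returns the list of files it wrote, so the result can be printed and compared against the manual steps.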

Ⅱ. Build OpenCV

We build OpenCV in several steps.

1. Configuration

Choose the source and build folder.
Click Configure.

2. Configuration

Choose the generator and the platform.
The configuration will take a while.

3. Configuration

Check the parameters in the table:

Name	Value
ENABLE_FAST_MATH	ON
INSTALL_CREATE_DISTRIB	ON
OPENCV_ENABLE_NONFREE	ON
OPENCV_EXTRA_MODULES_PATH	C:/opencv/modules
WITH_CUDA	ON
WITH_MFX	ON
WITH_OPENGL	ON
WITH_OPENMP	ON

Click Configure again.

4. Configuration

While CMake is busy with the configuration, we can look up the compute capability of the GPU. Specifying it shortens the compilation, because the CUDA code is then built only for that architecture. For example, the compute capability of the GeForce RTX 2070 is 7.5.

Name	Value
CUDA_ARCH_BIN	7.5
CUDA_ARCH_PTX	7.5
CUDA_FAST_MATH	ON
WITH_NVCUVID	ON
BUILD_opencv_world	ON

Click Configure again.
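
For readers who prefer the command line, the options from the two tables above can in principle be passed to cmake as -D flags. The sketch below only assembles and prints such a command; it does not run it. The helper name, the source folder C:/opencv/sources and the generator/platform pair ("Visual Studio 15 2017", x64) are my assumptions and need to be adapted to your setup.

```python
import subprocess

# Options copied from the two tables above.
OPTIONS = {
    "ENABLE_FAST_MATH": "ON",
    "INSTALL_CREATE_DISTRIB": "ON",
    "OPENCV_ENABLE_NONFREE": "ON",
    "OPENCV_EXTRA_MODULES_PATH": "C:/opencv/modules",
    "WITH_CUDA": "ON",
    "WITH_MFX": "ON",
    "WITH_OPENGL": "ON",
    "WITH_OPENMP": "ON",
    "CUDA_ARCH_BIN": "7.5",
    "CUDA_ARCH_PTX": "7.5",
    "CUDA_FAST_MATH": "ON",
    "WITH_NVCUVID": "ON",
    "BUILD_opencv_world": "ON",
}


def cmake_configure_command(source_dir, build_dir, options,
                            generator="Visual Studio 15 2017", platform="x64"):
    """Return the cmake configure command as an argument list."""
    cmd = ["cmake", "-S", source_dir, "-B", build_dir,
           "-G", generator, "-A", platform]
    cmd += [f"-D{name}={value}" for name, value in options.items()]
    return cmd


if __name__ == "__main__":
    # C:/opencv/sources is an assumed location of the extracted sources.
    cmd = cmake_configure_command("C:/opencv/sources", "C:/opencv/build", OPTIONS)
    print(subprocess.list2cmdline(cmd))
```

Keeping the options in a dictionary makes it easy to change a single value, such as the compute capability, without retyping the whole command.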

5. Configuration

Check if NVCUVID and FAST MATH will be built.
Check if python3 is included in the OpenCV Modules.
Click Generate.
Go to C:\opencv\build and open a command prompt there (type cmd in the Explorer address bar).

Compile with the command:

cmake.exe --build . --config Release --target INSTALL

This will take a long time. Once finished, modify the system environment variables:

Action	Variable	Value
Create	OPENCV_DIR	C:\opencv\build
Create	OPENCV_VER	451
Add	PATH	%OPENCV_DIR%\bin\Debug
Add	PATH	%OPENCV_DIR%\bin\Release
Add	PATH	%OPENCV_DIR%\install\x64\vc15\bin

Ⅲ. Python Bindings

Go to the cv2 folder in the Anaconda site-packages. The default path is C:\Users\{your name}\miniconda3\Lib\site-packages\cv2\python-3.8, where {your name} is your user name.
Copy cv2.cp38-win_amd64.pyd to C:\Users\{your name}\miniconda3\Lib\site-packages\.
Now you can import cv2 in the Anaconda base environment.
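
A quick way to check the result is to look at the build information that cv2 reports. The sketch below is a small, hypothetical helper that picks a few lines out of cv2.getBuildInformation(); the labels it searches for ("NVIDIA CUDA", "NVIDIA GPU arch", "Interpreter") are assumptions about how your build prints them, so adjust them if nothing is found.

```python
def build_summary(info, keys=("NVIDIA CUDA", "NVIDIA GPU arch", "Interpreter")):
    """Return {label: value} for build-information lines whose label is in keys."""
    summary = {}
    for line in info.splitlines():
        # Each line of interest looks like "  <label>:   <value>".
        label, sep, value = line.strip().partition(":")
        if sep and label in keys:
            summary[label] = value.strip()
    return summary


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 cannot be imported in this environment")
    else:
        print(build_summary(cv2.getBuildInformation()))
        # Number of CUDA devices this build of OpenCV can use.
        print("CUDA devices:", cv2.cuda.getCudaEnabledDeviceCount())
```

If the device count is 0 or the CUDA entry is missing, the module being imported is probably not the one that was just compiled.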

Ⅳ. Short Python Examples

It is a bit odd that we now need to write Python programs with the OpenCV C++ documentation at hand. ;)

CUDA Decoder

After compiling the NVCUVID module, we can use the cv2.cudacodec.VideoReader() function. With this function, we can decode the frames directly on the GPU. Each captured frame is of type cv2.cuda_GpuMat.

A CUDA video reader can be used like this:

import cv2

video_path = 'video.mp4'  # placeholder path to your video file
video = cv2.cudacodec.createVideoReader(video_path)  # create a video reader

while True:
    status, current_frame = video.nextFrame()  # decode the next frame on the GPU
    if not status:
        break
    # process current_frame (a cv2.cuda_GpuMat) here
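
The read loop can also be wrapped in a generator, which keeps the processing code separate from the decoding code. This is only a sketch: iter_gpu_frames and first_frame_shape are hypothetical names, and the only thing assumed about the reader is the nextFrame() method used above.

```python
def iter_gpu_frames(reader):
    """Yield frames from a reader until nextFrame() reports failure."""
    while True:
        status, gpu_frame = reader.nextFrame()
        if not status:
            return
        yield gpu_frame


def first_frame_shape(video_path):
    """Decode one frame on the GPU and return its shape after download."""
    import cv2  # needs the CUDA-enabled build from this article

    reader = cv2.cudacodec.createVideoReader(video_path)
    for gpu_frame in iter_gpu_frames(reader):
        frame = gpu_frame.download()  # copy the frame from GPU memory to a NumPy array
        return frame.shape
    return None
```

Downloading every frame gives away part of the speed advantage, so in a real pipeline it is usually better to keep working on the cv2.cuda_GpuMat as long as possible.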

MediaSDK backend Decoder/Encoder

Once we have installed MSDK2020R1, we can call cv2.CAP_INTEL_MFX to help us decode or encode the video.

The MFX backend does not support media containers yet, so before we use the decoder, we need to convert the video to a raw stream format (.264, .h264, .mp2, .mpeg2, .265 or .hevc).

The official documentation shows how to do this with the FFmpeg tool.

# H264
ffmpeg -i video.avi -vcodec copy -an -bsf:v h264_mp4toannexb video.264
# H265
ffmpeg -i in.mkv -c:v copy -bsf hevc_mp4toannexb out.h265
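
If many videos have to be converted, the same commands can be assembled in Python. The sketch below only builds the argument list from the flags shown above; the helper name annexb_command and the codec keys are hypothetical, and ffmpeg is assumed to be on the PATH when the command is actually run.

```python
import subprocess

# Bitstream filters taken from the two FFmpeg commands above.
BITSTREAM_FILTERS = {"h264": "h264_mp4toannexb", "hevc": "hevc_mp4toannexb"}


def annexb_command(src, dst, codec):
    """Return the ffmpeg argument list that copies the video stream of src to a raw file dst."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "copy",                       # copy the video stream without re-encoding
        "-an",                                # drop the audio stream
        "-bsf:v", BITSTREAM_FILTERS[codec],   # convert to Annex B byte-stream format
        dst,
    ]


def convert(src, dst, codec):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(annexb_command(src, dst, codec), check=True)
```

For example, convert('video.avi', 'video.264', 'h264') corresponds to the first command above.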

After that, we can decode the video with cv2.CAP_INTEL_MFX.

video = cv2.VideoCapture('video.264', cv2.CAP_INTEL_MFX)
while True:
    status, frame = video.read()
    if status:
        cv2.imshow('video', frame)
        cv2.waitKey(40)
    else:
        break

We can also use the VideoWriter with cv2.CAP_INTEL_MFX.

fourcc = cv2.VideoWriter_fourcc(*'H264')
writer = cv2.VideoWriter(videoPath, cv2.CAP_INTEL_MFX, fourcc, fps, frameSize)

Keypoint Detection Algorithm with CUDA

OpenCV provides SURF and ORB algorithms with both CPU and GPU implementations.

SURF CPU
- the keypoints can be detected by:

  surf = cv2.xfeatures2d.SURF_create(100)  # 100 is the Hessian threshold
  kp, des = surf.detectAndCompute(img, mask)

  The input image is a NumPy array. The keypoints are returned as a list of cv2.KeyPoint objects and the descriptors as a NumPy array.
SURF GPU
- the keypoints can be detected by:

  surf = cv2.cuda.SURF_CUDA_create(100)
  gpu_kp, gpu_des = surf.detectWithDescriptors(gpu_img, gpu_mask)
  kp = surf.downloadKeypoints(gpu_kp)

  The input image is a cv2.cuda_GpuMat, and the detected keypoints and descriptors stay in GPU memory. The keypoints can be downloaded to RAM with the downloadKeypoints() method. The descriptors can be matched by a GPU matcher or downloaded to RAM with the download() method.

I ran both implementations on 2,000 images, three times each, on my computer. The results show a significant speedup with the GPU implementation.

Run	SURF (CPU) [sec]	SURF (GPU) [sec]
1	178.2835454940796	34.70819616317749
2	175.7257044315338	32.26314163208008
3	175.60720658302308	32.25424408912659

Ⅴ. Summary

OpenCV provides many convenient functions. With them, we can run complex algorithms in just a few lines of code.

With the GPU acceleration, we get a much faster algorithm with much lower CPU usage.

Since OpenCV 4 integrates the DNN module (optionally with a GPU backend), many computer vision algorithms, such as YOLOv4, can be called from the DNN module. These changes make it easier to enjoy new computer vision technology, integrate it into projects and improve our daily lives.

Thanks to everyone who contributes to the community!
