AliceVision: SIFT with modern GPU programming techniques


What is natural feature extraction?

Humans are very good at tracking moving things with their eyes over time. Unlike a computer, we don't see pixels in a video frame that need to be associated with each other in a complex manner. Humans make use of many clues that reduce the complexity of the tracking task, including the structure of the scene and the properties that are most relevant for tracking in a given context. The computer, however, must reconstruct things by interpreting groups of pixels, and several approaches for this rely on identifying "special" pixels in video frames that represent a point in space that can easily be found again between frames (or even in entirely different images). If the recorded scene is not prepared with any special markers, this is done by finding pixels in an image that are "naturally" special, for example the tip of a flagpole or the corner of an empty table. A high-res video frame will usually contain a few tens of thousands of such points. They are called natural feature points.

Generally, a single desktop computer cannot extract all of these natural feature points from a video stream in real time today.

However, with AliceVision PopSift, we created open-source software that was capable of extracting natural features from the frames of a 1K video stream in real time on an NVidia GTX 980 GPU, and GPU hardware is much more advanced now.

SIFT is a very famous, general natural feature extractor that has remained nearly unbeaten since its inception in 2004. Although machine-learning-based approaches can regularly achieve better performance in restricted cases, SIFT is still more successful if no restrictions are applied.

Our PopSift implementation, however, shows its age. The code was written for NVidia CUDA GPUs of Compute Capability 3.5 and does not make use of the features of newer GPUs, and it is most certainly not vendor-independent. This vendor lock-in is the main reason that PopSift is not part of the default build of AliceVision.

Environment

The research for this thesis is conducted in the SINLab at IFI. In the lab, we have a UR10e robot (that you can already see in our lab) with a 5-finger Shadow Hand.

This robot will be used by a remote person to grab an object that they see in VR. A basic precondition for this is the 3D reconstruction of the remote environment, and AliceVision Meshroom is the intended tool. PopSift is an important means of reducing the computation time for the 3D model.

Specific question for this thesis

The task of the thesis is to compare approaches to modernizing PopSift with modern C++-based languages that make general-purpose GPU programming portable to all CPUs and GPUs and avoid future vendor lock-in. The current contenders for portable code are OpenACC and SYCL.

OpenACC uses hints inside source code to tell a compiler about the programmer's preferred parallelization option, including SIMD programming, multithreading, and offloading to the GPU. The programmer uses pragma statements for these hints.
Khronos SYCL also uses hints inside C++ programs. Inside the code, it appears as a class library for a modern C++ compiler, although it requires advanced compiler features.
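
To illustrate the pragma-based style, here is a minimal sketch of an OpenACC-annotated loop. The function scale_pixels is a hypothetical helper invented for this example and is not part of PopSift. A compiler without OpenACC support simply ignores the pragma and runs the loop serially on the CPU, which is part of what makes this approach attractive for portable code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not taken from PopSift): multiplies every pixel of a
// grayscale image by a constant factor.
std::vector<float> scale_pixels(const std::vector<float>& in, float factor)
{
    std::vector<float> out(in.size());
    const float* src = in.data();
    float*       dst = out.data();
    const int    n   = static_cast<int>(in.size());

    // Hint for an OpenACC compiler: the iterations are independent and may be
    // parallelized or offloaded. copyin/copyout describe the data movement.
    // Compilers without OpenACC support ignore this line.
    #pragma acc parallel loop copyin(src[0:n]) copyout(dst[0:n])
    for (int i = 0; i < n; ++i) {
        dst[i] = src[i] * factor;
    }
    return out;
}
```

The loop body is plain C++; only the pragma line carries the parallelization hint. A SYCL version would instead express the same loop through library classes and a lambda submitted to a queue.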

While both tools are heavily promoted, OpenACC by NVidia (among others) and SYCL by Intel, both admit openly that the resulting GPU code cannot achieve the performance of vendor-specific toolsets (like NVidia CUDA) because they are currently not able to use hardware-specific features (like the texture engines of the NVidia GPUs).

In the thesis, PopSift will first be pruned to remove unnecessary options (among other things, PopSift implements 8 approaches for creating the image pyramid, but only one at a time is actually used). After this, a CPU-only version of PopSift will be created by turning PopSift's CUDA kernels into lambdas.
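
The idea of turning a kernel into a lambda can be sketched as follows. This is an illustrative assumption about how such a conversion might look, not PopSift code: for_each_pixel and horizontal_gradient are hypothetical names, and the gradient computation stands in for a real kernel body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CUDA kernel launch: calls body(x, y) once per
// pixel, as a 2D grid of GPU threads would, but sequentially on the CPU.
template <typename Body>
void for_each_pixel(int width, int height, Body body)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            body(x, y);
}

// Illustrative kernel (not from PopSift): horizontal central difference.
// In CUDA, x and y would be derived from blockIdx, blockDim and threadIdx.
std::vector<float> horizontal_gradient(const std::vector<float>& img,
                                       int width, int height)
{
    assert(img.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> out(img.size(), 0.0f);

    // The former kernel body, now a lambda over pixel coordinates.
    auto kernel = [&](int x, int y) {
        if (x == 0 || x == width - 1) return;  // border pixels stay 0
        const int i = y * width + x;
        out[i] = 0.5f * (img[i + 1] - img[i - 1]);
    };

    for_each_pixel(width, height, kernel);
    return out;
}
```

Once the kernel body is a lambda, the driver for_each_pixel is the only part that has to change to target a different backend, for example an OpenACC-annotated loop or a SYCL parallel_for.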

Finally, the parallelisation opportunities of OpenACC and SYCL will be explored based on this C++ code, and benchmarked against the CUDA-only and CPU-only versions to gain a detailed understanding of the performance bottlenecks of the portable languages.
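
A comparison of this kind needs a timing harness that treats all variants the same way. The following is a minimal sketch under the assumption that each variant can be wrapped in a callable; median_runtime_ms is a hypothetical helper, and the actual benchmarking methodology is for the thesis to define.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: runs `work` `repetitions` times and returns the median
// wall-clock time of a single run in milliseconds (the upper middle sample
// when the number of repetitions is even).
template <typename Work>
double median_runtime_ms(Work work, int repetitions)
{
    assert(repetitions > 0);
    std::vector<double> samples;
    samples.reserve(repetitions);

    for (int i = 0; i < repetitions; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();  // GPU variants must synchronize inside work() before returning
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

The median is used here because it is less sensitive than the mean to outliers such as a slow first run. Since GPU kernel launches are typically asynchronous, the callable has to wait for completion, otherwise only the launch overhead is measured.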

Incidentally, the work will also provide the open-source community with a platform-independent GPU-accelerated version of PopSift.

Learning outcome

You will:

gain experience in formulating, investigating and answering research questions
gain experience with several modern techniques for GPGPU programming for real-time tasks
gain insights into non-ML real-time video and image processing tasks
gain experience in conducting, evaluating and interpreting experiments

Conditions

We expect that you:

have been admitted to a master's program in MatNat@UiO - primarily PROSA
take this as a long thesis
will participate actively in the weekly SINLab meetings
are present in the lab and collaborate with other students and staff
are interested in and have some knowledge of C++ programming
are willing to share your results on GitHub
include the course IN5050 in the study plan
include the course IN5060 in the study plan, unless you have already completed a course on classical (non-ML) data analysis

Published Oct. 2, 2023 16:46 - Last modified Oct. 4, 2023 11:34

Supervisor(s)

Carsten Griwodz Universitetet i Oslo
Håkon Kvale Stensland Universitetet i Oslo