AliceVision: modern GPU programming for real-time tracking

What is visual tracking?

Humans are very good at tracking moving things with their eyes over time. Unlike a computer, we don't see pixels in a video frame that need to be associated with each other in a complex manner. Humans make use of many clues that reduce the complexity of the tracking task, including the structure of the scene and the properties that are most relevant for tracking in a given context. Computers cannot achieve that in the general case in real time today.

So, to allow computers to track objects in real time without a lot of computational overhead, it is quite typical to put tags onto the objects. In computer vision terminology, these are called fiducial markers.

On GitHub, you can find tools such as ARTag, AprilTag and CCTag. Their markers are simple geometric shapes that can be found with very little computational overhead in every frame of a video, and thereby tracked over time.

The applications for doing this in real time are many. Frequently, the tags are used to understand how a human operator moves the camera through a room, which can be very useful for virtual reality applications. Another application is placing virtual objects onto a moving object in the real world, which is useful for augmented reality applications. The markers can also be stuck onto objects that are moved around in a larger physical space, which is useful for robots that grab and reposition these objects.

In our lab, we have a lot of use for such tags, and we have used CCTags in the past. We have even read a paper about them.

Environment

The research for this thesis is conducted in the SINLab at IFI. In the lab, we have a UR10e robot (that you can already see in our lab) with a 5-finger Shadow Hand (which will arrive in Oslo in October).

This robot will be used by a remote person to grab an object that they see in VR. That makes the tags on the object important for two things: (a) showing the object's current position correctly in VR, and (b) making sure that the robot doesn't break the object because the remote person makes a mistake.

Specific question for this thesis

The task of the thesis is to compare the suitability of modern approaches to GPGPU programming in the context of CCTags for robot interaction, and to create an easily maintainable modern version of CCTags.

The CCTags code was originally written with CUDA for NVIDIA GPUs with Compute Capability 3, and with Intel's Threading Building Blocks (TBB) to gain performance from multi-threading on the CPU. The CPU code serves as backup code for computers without an NVIDIA GPU.

Both of these are today very outdated and have been replaced by several newer paradigms. These include:

  • OpenACC, which uses hints inside source code to tell a compiler about the programmer's preferred parallelization option, including SIMD programming, multithreading, and offloading to the GPU. The programmer uses pragma statements for these hints.
  • Khronos SYCL, which also uses hints inside C++ programs. Inside the code, it appears as a class library for a modern C++ compiler, although it requires advanced compiler features.
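
To make the pragma-based style described for OpenACC more concrete, here is a minimal sketch. It is not taken from the CCTags code base: the function name `threshold` and the choice of a simple per-pixel thresholding task are assumptions made purely for illustration. The only OpenACC feature used is the standard `parallel loop` directive with `copyin` and `copyout` data clauses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of CCTags): binarize a grayscale image.
// Every pixel is processed independently, so the loop is a natural
// candidate for a pragma-based parallelization hint.
std::vector<std::uint8_t> threshold(const std::vector<std::uint8_t>& gray,
                                    std::uint8_t cutoff)
{
    const std::size_t n = gray.size();
    std::vector<std::uint8_t> mask(n);

    // Raw pointers are used because OpenACC data clauses describe
    // array sections, not C++ containers.
    const std::uint8_t* in = gray.data();
    std::uint8_t* out = mask.data();
    assert(n == 0 || (in != nullptr && out != nullptr));

    // Hint to an OpenACC compiler: copy `in` to the device, run the
    // iterations in parallel, and copy `out` back afterwards.
    // A compiler without OpenACC support ignores the pragma and runs
    // the loop sequentially on the CPU.
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<std::uint8_t>(in[i] > cutoff ? 255 : 0);
    }
    return mask;
}
```

Calling `threshold(image, 128)` returns a mask in which pixels brighter than 128 are set to 255 and all others to 0. The source remains ordinary C++ whether or not the pragma is honored, which is the main maintainability argument for this style. A SYCL version of the same loop would instead express the loop body as a C++ lambda submitted to a queue.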

In the thesis, the CCTags algorithm will be rewritten using both of these modern programming paradigms. This will allow a side-by-side comparison of the two new approaches, as well as a comparison with the CPU (Intel OpenMP) and GPU (CUDA) baselines.

Incidentally, the work will also provide the open-source community with a platform-independent GPU-accelerated version of CCTags.

Learning outcome

You will gain experience in:

  • formulating, investigating and answering research questions
  • applying several modern techniques for GPGPU programming for real-time tasks
  • non-ML real-time video and image processing tasks
  • conducting, evaluating and interpreting experiments

Conditions

We expect that you:

  • have been admitted to a master's program in MatNat@UiO - primarily PROSA
  • take this as a long thesis
  • will participate actively in the weekly SINLab meetings
  • are present in the lab and collaborate with other students and staff
  • are interested in and have some knowledge of C++ programming
  • are willing to share your results on GitHub
  • include the course IN5050 in the study plan
  • include the course IN5060 in the study plan, unless you have already completed a course on classical (non-ML) data analysis
Published Oct. 2, 2023 15:36 - Last modified Oct. 4, 2023 11:36

Supervisor(s)

Scope (credits)

60