In-network Acceleration of Distributed In-Memory Machine Learning

This master's project will investigate novel uses of emerging programmable network cards to improve the performance of distributed in-memory machine learning.

Big data is a combination of structured, semi-structured and unstructured data that can be mined for information and used in machine learning projects. However, applying machine learning algorithms to massive datasets is challenging because most of the popular machine learning algorithms were not designed for parallel architectures. Apache Spark, with its MLlib library, is one of the few big data frameworks for parallel computing that runs machine learning algorithms on large datasets held in cluster memory, avoiding repeated writes to disk and thereby running faster. However, one of the main overheads of distributing data across the memory of many machines comes from network communication. Moreover, running the machine learning algorithms themselves places a load on the CPU.

Goal

In this thesis, we will use a special type of network card equipped with an Arm processor to optimise network communication and offload computation from the main CPU, with the aim of improving the performance of Spark's machine learning algorithms. The results of this master's project have the potential to benefit systems that rely on distributed machine learning.

Learning outcome

In this thesis, you will learn about state-of-the-art technologies used in large-scale datacenters. Specifically, you will learn how to apply machine learning algorithms to large datasets.

Qualifications

Some knowledge of networking and operating systems, as well as C/C++ programming, is beneficial but not mandatory. You will get support in organising the work and in implementing new systems on programmable network cards.

Workplace

Simula will provide the student with the appropriate equipment and a workplace. The project description is also available at Simula.

Published 27 Sep. 2022 09:29 - Last modified 27 Sep. 2022 09:40

Supervisor(s)

Scope (credits)

60