This project is no longer available

Performance Analysis of Job-Scheduling in Multi-User Hadoop Clusters

Abstract

The last decade has seen a drastic increase in the data generated on computer systems and on the Internet. With a growing number of Internet users, the combined data available online is enormous. Search engines try to index all the world's information available online, and businesses try to gain insight into user patterns across their data systems to better understand their users. With this rapid increase in available data, a system to process all this information is needed. Building large clusters to process these amounts of data can be costly, with cluster sizes reaching thousands of computers. Companies and institutions therefore look to colocate data on a single processing cluster, which makes optimizing the efficiency of that cluster desirable.

Apache Hadoop started as an open source project designed to process data at large scale with reliability and scalability in mind. The project began as a batch processing framework that processed a single job at a time. Recent development has turned to sharing Hadoop clusters among multiple users to better utilize the resources available within a cluster. A few task schedulers have been written to meet this need. Languages such as Pig and Hive have been developed to quickly write analytic queries over data.

This thesis looks at the feasibility of running multiple concurrent jobs on a single processing cluster. It describes the system setup and sheds some light on the obstacles that occur when using each task scheduler. It also asks what one can expect in terms of performance from a system shared between multiple users.

Findings from the experiments indicate that there are considerable differences between the task schedulers benchmarked in this thesis. Choosing a task scheduler that fits your needs is therefore essential, as these differences have a large impact on performance.

Published Oct. 4, 2017 11:39 - Last modified Oct. 4, 2017 11:39

Supervisor(s)

Student(s)

  • Joachim Seilfaldet

Scope (credits)

60