The anatomy of a large science data archive
Stein Vidar Hagfors Haugan

Friday colloquium (Fredagskollokvium)

Abstract

A few years from now, the Hinode Science Data Centre (SDC) at ITA will contain at least 12 million observations (*) from the Hinode satellite. By comparison, the 10-year SOHO archive contains 4.7 million files. Systems handling such volumes of data easily become sluggish, making "data mining" almost unbearable due to long search times and limited information about each observation. Our aim has been to avoid this.
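
To make the scale concrete: a search across 12 million records stays fast only if the metadata columns being queried are indexed, so the database can jump straight to the matching rows instead of scanning every one. A minimal sketch of the idea in Python with SQLite follows; the table and column names (observations, instrument, date_obs, xcen, ycen) are hypothetical illustrations, not the SDC's actual schema.

```python
import sqlite3

# A toy metadata catalogue; table and column names are hypothetical,
# not the actual Hinode SDC schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE observations (
        file_id    INTEGER PRIMARY KEY,
        instrument TEXT,   -- e.g. 'EIS', 'SOT', 'XRT'
        date_obs   TEXT,   -- observation start time (ISO 8601)
        xcen       REAL,   -- pointing, arcsec from disc centre
        ycen       REAL
    )
""")

# Without this index the query below must scan all rows; with it,
# the database seeks directly to the matching date range.
conn.execute("CREATE INDEX idx_obs ON observations (instrument, date_obs)")

rows = conn.execute(
    "SELECT file_id, date_obs FROM observations "
    "WHERE instrument = ? AND date_obs BETWEEN ? AND ?",
    ("EIS", "2007-01-01", "2007-01-02"),
).fetchall()
```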

Over the past eight months, we have developed a system that handles the expected volume quite gracefully, and it is now essentially ready for use.

I will explain the processes that take the data from Japan to the end user, focusing on the development steps, the tools, the essence of databases, and the tricks and treats needed to tackle such a project. In short: How do you go about designing a large, flexible, and fast data archive for public use?

Time permitting, I will also look at exciting possibilities for the future of the Hinode archive and the data centre.

(*) We are likely to have an additional 100 million PNG/JPEG files!

Published Aug. 12, 2009 09:23 - Last modified June 15, 2011 13:49