Supervisors and topics for master's theses in statistics

As part of the master's programme, each student writes a thesis. There are two options: a long thesis, corresponding to one year of full-time work, and a short thesis, corresponding to one semester of full-time work. Work on a long thesis typically starts in the second semester of the master's programme (as described here), while a short thesis should be written during the last semester. Below we give an overview of possible supervisors and topics for the master's thesis.

Riccardo De Bin

My main field of research is statistical learning (STK-IN4300), with a focus on methodological issues related to high-dimensional data. Potential topics for a master's project are related to statistical boosting, and could be either methodological or more applied, in collaboration with industry (for example, forecasting energy prices with boosting).

Examples of master's theses that I have supervised:

Vegard Stikbakke. A boosting algorithm to extend first-hitting-time models to a high-dimensional survival setting. Long thesis, 2019.

Jonas Gjesvik. Statistical modelling of Goalkeepers in the Norwegian Tippeliga. Long thesis, 2019.

Lars H. B. Olsen. Likelihood-Based Boosting: Approximate Confidence Bands and Intervals for Generalized Additive Models. Long thesis, 2020.

Ingrid Kristine Glad

My main research interest is in developing methodology tailored for the analysis of high-dimensional data, with links to topics in, for example, STK-IN4300. Since 2023 I have been co-directing the new Centre of Excellence Integreat, in which we work on integrating various types of knowledge into machine learning methodology. This is an emerging field of research in which it will be possible to define many interesting master's projects.

I also work with high-dimensional genomic data connected to cancer research. I can offer master's projects related to the statistical analysis of multi-omic data, where the high dimensionality of the data leads to interesting methodological challenges. The aim is often to integrate different types of genomic measurements in order to understand cancer risk, or to predict the effect of therapy or the survival of cancer patients.

I am also heavily involved in BigInsight, the centre for research-based innovation on big and complex data problems, as researcher and co-director. For me, the most relevant BigInsight projects to supervise concern high-dimensional, high-frequency sensor data, mainly from the maritime sector (see Innovation Objective Sensor Systems on the BigInsight web pages), in collaboration with Morten Stakkeland (UiO/ABB) and Erik Vanem (UiO/DNV). BigInsight terminates in December 2024, but if somebody is really interested, it should be possible to arrange something connected to this.

Examples of master's theses that I have supervised:

Martin Tveten: Multi-Stream Sequential Change Detection Using Sparsity and Dimension Reduction, long thesis, 2017

Camilla Lingjærde: Tailored Graphical Lasso for Data Integration in Gene Network Reconstruction, long thesis, 2019

Nikola Kaletka: Effects of prior information on monotonicity directions in additive monotone regression, long thesis, 2021

Anders Kielland: Integrating Biological Domain Knowledge in Machine Learning Models for Cancer Precision Medicine, long thesis, 2023

Nils Lid Hjort

I work in several areas of theoretical and applied statistics, with some key words being model building, model selection and model averaging, confidence distributions, estimation theory, survival analysis, Bayesian nonparametrics, stability and change. I led the research project group FocuStat with several PostDocs, PhDs, and Master level students, from 2014 to 2018, and several of our themes are continued, in partly new directions. Two projects, flowing from FocuStat themes, are From Processes to Models (ways of constructing better models for data) and Stability and Change (theory for finding changes, and conditions for stability, with application to war-and-peace data).

You may check the FocuStat webpage, including blog posts, with various themes that may also lead to Master thesis projects.

The majority of my students are working on the theoretical side of the spectrum, but from time to time I also supervise more applied projects (examples being recommender systems for finn.no; analysis of track and field data; examination of the forensic information used to convict Fredrik Fasting Torgersen for murder in 1958; the keeper's role in football matches; Markov chains for modelling escalation in armed conflicts).

With Håvard Hegre at the Peace Research Institute Oslo (PRIO) I recently led an international cross-disciplinary research group at the Centre of Advanced Study, Academy of Sciences, Oslo, from August 2022 to June 2023. A few dimensions of our Stability and Change project will continue. In particular, we study various conflict-and-war processes over time. The aim is to develop models and analysis methods for such processes, which are partly characterised, statistically speaking, by heavy power-law right tails. There are master's thesis possibilities in these general directions.

If interested, check the list of master's and PhD students at the FocuStat website, which includes brief descriptions of, and links to, their projects and theses.

Ingrid Hobæk Haff

My main field of research over the past years has been multivariate modelling, in particular copulas, with applications within, for instance, finance, insurance and climate. I am also involved in BigInsight, mainly within personalised fraud detection, as well as in the convergence environment ImmunoLingo, a transdisciplinary project whose aim is to decipher the molecular language of adaptive immunity. These projects are now ending, but I may still propose master's thesis topics from them.

Another topic I am involved in is the generation of synthetic data, which is related to my interest in copula models. One setting where synthetic data are useful is when the original data are sensitive and therefore cannot be published, which is often the case in, for instance, medical applications. It is then particularly important to preserve the privacy of the patients in the synthetic data, which is typically done by introducing noise when fitting the synthetic data generator. The more noise is introduced, the better the privacy preservation; on the other hand, if there is too much noise, the resulting synthetic data are not useful, i.e. they have little utility. The aim is then to construct the synthetic data generator in such a way that it balances these two concerns as well as possible. I may also offer master's thesis projects within this topic.
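
The trade-off between privacy and utility described above can be illustrated with a small sketch. It assumes one particular formalisation that the text does not name, the Laplace mechanism from differential privacy, applied to a toy generator that only needs a mean. The function names, the clipping bounds and the values of epsilon are all hypothetical.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mean(data, epsilon, lower, upper, rng):
    """Release the mean of `data` with Laplace noise (illustrative assumption).

    Values are clipped to [lower, upper], so changing one record moves the
    mean by at most (upper - lower) / n. A smaller epsilon means more noise.
    """
    clipped = [min(max(x, lower), upper) for x in data]
    sensitivity = (upper - lower) / len(clipped)
    return statistics.fmean(clipped) + laplace_noise(sensitivity / epsilon, rng)


def average_error(data, epsilon, lower, upper, repetitions=200, seed=1):
    """Utility proxy: mean absolute error of the noisy mean over repeated fits."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(data)
    errors = [
        abs(noisy_mean(data, epsilon, lower, upper, rng) - true_mean)
        for _ in range(repetitions)
    ]
    return statistics.fmean(errors)


# Hypothetical "sensitive" measurements, bounded between 0 and 100.
data_rng = random.Random(0)
data = [min(max(data_rng.gauss(50.0, 10.0), 0.0), 100.0) for _ in range(200)]

for epsilon in (0.1, 1.0, 10.0):
    error = average_error(data, epsilon, lower=0.0, upper=100.0)
    print(f"epsilon={epsilon:>4}: average error of the released mean = {error:.3f}")
```

A synthetic data generator built on such a released mean inherits its noise: a small epsilon gives stronger privacy protection but a less accurate generator, and a large epsilon gives the reverse.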

Examples of master's theses that I have supervised:

Anna Skovbæk Mortensen: Fraud detection using copula regression. Short thesis, 2021.

Håkon Bliksås Carlsen: Studying the application of semi-supervised learning for fraud detection. Long thesis, 2021.

Shuijing Liao: The application of penalized logistic regression for fraud detection: Studying measures of prediction performance for class imbalanced and high-dimensional data. Long thesis, 2022.

Alaa Ayoub: Discovering interactions that affect immune recognization using logic regression. Long thesis, 2023.

Aliaksandr Hubin

I mainly specialize in Bayesian model selection, model averaging, and automatic model configuration in complex regression contexts, ranging from simple linear models to highly nonlinear Bayesian logic regressions (BLR) [1], Bayesian generalized nonlinear models (BGNLM) [2] and Bayesian neural networks (BNN) [3]. Potential master's projects with me could be on further developing variational inference within BGNLM [2]. Alternatively, we can work on further developing subsampling in MCMC [4] for model selection. A third topic is developing Bayesian additive regression trees (BART) as special cases of BGNLM [2]. The last direction is related to BNN [3], where the topics are (a) developing interpretable BNN [3] through full structure learning, which allows inferring the level of nonlinearity per covariate and the overall importance of covariates; and (b) developing state-space BNN by modelling the weights as random effects with a flexible spatio-temporal structure based on normalizing flows.

Examples of master's theses I have supervised or co-supervised:

Lachmann, Jon. "Subsampling Strategies for Bayesian Variable Selection and Model Averaging in GLM and BGNLM." (2021). @Stockholm University, 30 ECTS

Skaaret-Lund, Lars. "Improving latent binary Bayesian neural networks using the local reparametrization trick and normalizing flows." (2022). @University of Oslo, 60 ECTS

Hauglie Sommerfelt, Philip Sebastian. "Combining Variational Bayes and GMJMCMC for Scalable Inference on Bayesian Generalized Nonlinear Models." (2023). @University of Oslo, 60 ECTS 

Ellingsen, Herman. "Outlier Detection in Bayesian Neural Networks." (2023). @NMBU, 30 ECTS 

Abidi, Osama. "Using Graph Bayesian Neural Networks for fraud pattern detection and classification from bank transactions data." (2023). @NMBU, 30 ECTS 

References:

  • [1] Hubin, Aliaksandr, Geir Storvik, and Florian Frommlet. "A novel algorithmic approach to Bayesian logic regression (with discussion)." Bayesian Analysis (2020).
  • [2] Hubin, Aliaksandr, Geir Storvik, and Florian Frommlet. "Flexible Bayesian Nonlinear Model Configuration." Journal of Artificial Intelligence Research (2021).
  • [3] Hubin, Aliaksandr, and Geir Storvik. "Variational Inference for Bayesian Neural Networks under Model and Parameter Uncertainty." arXiv:2305.00934 (2023).
  • [4] Lachmann, Jon, Geir Storvik, Florian Frommlet, and Aliaksandr Hubin. "A subsampling approach for Bayesian model selection." International Journal of Approximate Reasoning (2022).

Odd Kolbjørnsen

My main fields of research are Bayesian statistics and machine learning, utilizing methods of computational statistics (STK4051). Phrasing methods from deep learning in the framework of Bayesian statistics provides a rigorous interpretation of knowledge and uncertainty.

One topic of research is inverse problems. Inverse problems arise from indirect measurements, and have applications in medical imaging, remote sensing and geophysical data. A typical example is reconstructing a high-resolution image from a low-resolution image. The interesting part is that the solution is non-unique, and we will investigate how to use methods from deep learning to explore the space of possible solutions, perhaps using methodology from (Semin-Sanchis and Kolbjørnsen, 2023).

Another topic of research is Bayesian neural nets. The approach of function approximation utilizing random functions (Rahimi and Recht, 2008) has recently been studied for deep neural nets (Bolager et al., 2023). We want to extend this approach to a setting where weights are trained in a Bayesian framework.

For all master's projects I supervise, I recommend the courses STK4021, STK4051 and IN4310 as background, but this is a topic we can discuss.

Examples of master's theses that I have supervised:

Marius Aasan: Invertible and Pseudo-Invertible Encoders: An Approach to Inverse Problems with Neural Networks. Long thesis, 2021.

Torgeir Ladstein Waagbø: Autoregressive Generative Models with Applications to Super Resolution. Long thesis, 2023.

References:

C. Semin-Sanchis and O. Kolbjørnsen. Sampling-Free Bayesian Inference for Local Refinement in Linear Inversion Problems With a Latent Target Property. IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1-14, 2023.

Ali Rahimi and Benjamin Recht. Uniform approximation of functions with random bases. In 2008 46th Annual Allerton Conference on Communication, Control, and Computing, pages 555–561, Monticello, IL, USA, September 2008. IEEE. ISBN 978-1-4244-2925-7.

Erik Lien Bolager, Iryna Burak, Chinmay V. Datar, Qing Sun, and Felix Dietrich. Sampling weights of deep neural networks. arXiv:2306.16830, 2023.

Johan Pensar

My research interest is in the field of statistical machine learning and, in particular, a class of models called probabilistic graphical models (PGMs). PGMs are used to model the joint distribution over a collection of variables and a key idea is to use a graph structure to encode how the variables in the system interact. The dependence structure implied by the graph can then be used to break down the potentially very high-dimensional joint distribution into smaller components. The key learning task related to PGM is to infer the graph structure from data, known as structure learning. I am particularly interested in developing new algorithms for structure learning for various classes of PGMs and using causal graphical models for identifying and estimating causal relationships.
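
As a small illustration of how a graph breaks a joint distribution into smaller components, the sketch below assumes a directed chain A -> B -> C over three binary variables, so that p(a, b, c) = p(a) p(b | a) p(c | b). The graph and all probability values are made up for the example.

```python
from itertools import product

# Hypothetical conditional probability tables for the chain A -> B -> C.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # p_c_given_b[b][c]


def joint(a, b, c):
    """Joint probability implied by the chain: p(a) * p(b | a) * p(c | b)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


# The factorised joint is a proper distribution: it sums to one over all states.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
print(f"sum over all 8 states: {total:.6f}")

# Free parameters: 1 for p(a), 2 for p(b | a), 2 for p(c | b),
# against 2**3 - 1 for an unrestricted joint table.
factorised_parameters = 1 + 2 + 2
unrestricted_parameters = 2**3 - 1
print(f"free parameters: {factorised_parameters} (chain) vs {unrestricted_parameters} (full table)")
```

With only three binary variables the saving is small, but the gap between the two counts grows quickly with the number of variables when the graph is sparse.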

An overview of available Master's thesis projects can be found here.

Sven Ove Samuelsen

My field is event history analysis (cf. STK4080), and in particular I have been interested in theoretical developments and applications in epidemiology. Some of the master's theses I have supervised have focused on developing case-control studies and other epidemiological designs from a time-to-event perspective, which includes developing the statistical analysis to use with such designs. Other master's theses have been more directly connected to analyses of specific epidemiological data sets. The problems and data sets for the master's theses often turn up in connection with collaborations with researchers at the Norwegian Institute of Public Health, where I work part time, or at the Norwegian Cancer Registry, where I have several collaborators.

Examples of master's theses that I have supervised:

Marthe Kittilsviken Tendrup: Two-phase validation correction: A simulation study for robust variance estimation. Long thesis, 2022.

Graeme Lawrence: Bacteraemia and Cardiovascular Disease. Long thesis, 2020.

Morten Madshus: A Match Too Much? - A simulation study on overmatching in nested case-control studies. Long thesis, 2019.

Simon Lergenmuller: Two-stage predictor substitution for time-to-event data. Long thesis, 2017.

Ida Scheel

My research themes are often connected to some type of Bayesian modelling and analysis, motivated by applications. I am involved in the new Centre of Excellence Integreat, which will be a centre for knowledge-driven machine learning, and for many years I have been a participant in the centre for research-based innovation (SFI) BigInsight, where I have mainly worked on projects related to the innovation objective 'Personalized Marketing'. One of the projects concerns a Bayesian recommender system for clicking data. 'Clicking data', the history that consumers have of clicking on webpages, reveal their preferences. Such data arise in very many areas of the digital world, including web-based businesses such as streaming services for movies/series (e.g. Netflix, NRK) and music (e.g. Spotify), e-commerce web sites (e.g. amazon.com) and marketplaces for second-hand items, job adverts, and real estate (e.g. finn.no). A recommender system aims at making personalised recommendations based on the clicking/preference history of the consumer and of other consumers. Methodological challenges include modelling, scalability of algorithms, multiple optimisation criteria (consumer vs business perspectives, diversity-driven vs popularity-driven, ...), and inconsistent data. More purely theoretically motivated is my interest in model diagnostics, mainly concerning checking for possible modelling conflicts at the node level of (Bayesian) hierarchical models.

Examples of master’s theses I have supervised:

Haakon Muggerud: Ranked-based Bayesian clustering and variable selection for high-dimensional data. Long thesis, 2023

Øystein Skauli: Modelling Short Term Changes in User Interest for Online Marketplaces. Long thesis, 2021

Jonas Fredrik Schenkel: Collaborative Filtering for Implicit Feedback: Investigating how to improve NRK TV's recommender system by including context. Long thesis, 2017.

 

Geir Storvik

My main fields of research are (Bayesian) computational statistics and Monte Carlo methods. Sequential Monte Carlo methods, in which information is updated as new data arrive, are of particular interest (such a method is currently used for estimating the Norwegian reproduction rate for the Covid-19 pandemic). I am also interested in Bayesian versions of neural-network-type regression models.
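
To give a flavour of how sequential Monte Carlo updates information as new data arrive, here is a minimal sketch of a bootstrap particle filter. It assumes a toy random-walk state observed with Gaussian noise; the model, the parameter values and the function name are hypothetical and are not related to the Covid-19 application mentioned above.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500, state_sd=1.0, obs_sd=1.0, seed=0):
    """Bootstrap particle filter for x_t = x_{t-1} + noise, y_t = x_t + noise.

    Returns one value per observation, approximating E[x_t | y_1, ..., y_t].
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    filtered_means = []
    for y in observations:
        # 1. Propagate every particle through the random-walk state model.
        particles = [x + rng.gauss(0.0, state_sd) for x in particles]
        # 2. Weight by the Gaussian likelihood of the new observation,
        #    working on the log scale to avoid numerical underflow.
        log_weights = [-0.5 * ((y - x) / obs_sd) ** 2 for x in particles]
        largest = max(log_weights)
        weights = [math.exp(lw - largest) for lw in log_weights]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3. Record the weighted mean as the filtering estimate.
        filtered_means.append(sum(w * x for w, x in zip(weights, particles)))
        # 4. Resample in proportion to the weights (multinomial resampling).
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return filtered_means


# Hypothetical data stream: twenty observations of a state sitting near 5.
observations = [5.0] * 20
filtered = bootstrap_filter(observations)
print(f"estimate after 1 observation:   {filtered[0]:.2f}")
print(f"estimate after 20 observations: {filtered[-1]:.2f}")
```

Each new observation only triggers one propagate-weight-resample pass over the particles, which is what makes the approach suited to online settings.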

I am involved in BigInsight, a centre for research-based innovation (SFI), in particular in the sensor project, where huge amounts of multivariate sensor data need to be analyzed.

For all master's projects I supervise, STK4021 and STK4051 are useful background courses.

Possible master-projects:

  • Bayesian methods in neural networks: Bayesian methods have the advantage of both allowing prior knowledge to be incorporated and avoiding overfitting. However, both the specification of appropriate priors and the computational issues are challenging. This project aims at exploring different approaches to using Bayesian methods within neural networks and, hopefully, at proposing further improvements.
  • Sequential Monte Carlo for high-dimensional settings where in particular online analysis of sensor data is of interest.

Examples of master's theses I have supervised:

Fredrik Lundvall Wollbraaten: Parallel subsampling MCMC and the Perturbed Subset Parameter Approximation. Long thesis, 2020.

Kjersti Moss: The Poisson-Binomial Model for Fish Abundance Estimation: With Applications to Northeast Arctic cod. Long thesis, 2015.

Henrik Nyhus: Spatio-temporal analysis of cod catch data from the Barents Sea using INLA. Long thesis, 2014

Thordis Thorarinsdottir 

My research interests are forecast evaluation and the modelling of randomness and quantification of uncertainty in natural processes, with links to topics in STK4021 and STK4150. I am a co-leader of the center for research-based innovation Climate Futures and I collaborate with scientists at the Norwegian Meteorological Institute and the Norwegian Water Resources and Energy Directorate. 

My work has a methodological focus, often with environmental applications in mind. We build new methods or models for situations for which no applicable methods are available, or where current methods are lacking. Sometimes this requires developing new theory or theoretical frameworks; sometimes we need to discover new connections to understand how available methods may be applied in new settings.

Possible topics for master theses include:

  • Bayesian hierarchical modelling of extremes and non-stationary modelling of extremes in space-time 
  • Multivariate modelling of extremes with applications in weather and climate
  • Methods for forecast evaluation for high-dimensional forecasts 

Examples of master's theses I have supervised:

Silius M. Vandeskog: Modelling diurnal temperature range in Norway. At NTNU joint with Ingelin Steinsland, 2019. 

Silje Hindenes: Comparison between a mixture of exponential distributions and the generalized Pareto distributions for flood frequency analysis. At NTNU joint with Ingelin Steinsland, 2017. 

 

     

Published 29 Sep. 2020 11:17 - Last modified 18 Oct. 2023 19:11