Dag Tjøstheim: Statistical embedding: Beyond principal components


There has been intense recent activity in the embedding of very high-dimensional and nonlinear data structures, much of it in the data science and machine learning literature. I survey this activity in four parts. The first part covers nonlinear methods such as principal curves, multidimensional scaling, ISOMAP, graph-based methods, and kernel-based methods. The second part is concerned with topological embedding methods such as persistence diagrams. Network data are considered in the third part, where the task is to embed such data in a vector space so that traditional clustering and classification techniques can be used. The final part treats visualization, mentioning t-SNE, UMAP, and LargeVis. The methods are illustrated and compared on two simulated data sets, one consisting of a triplet of noisy Ranunculoid curves and one consisting of networks of increasing complexity generated with stochastic block models. This talk is based on joint work with Anders Løland and Martin Jullum at the Norwegian Computing Center.

Dag Tjøstheim is Professor Emeritus at the University of Bergen. He is internationally well known, in particular for his contributions to the theory and methods of time series, but has more generally worked on stochastic processes and dependent data. Tjøstheim was the first recipient of the Sverdrup Prize.

Published Nov. 1, 2022 1:05 PM - Last modified Nov. 3, 2022 10:16 AM