ML for Marine Research

Possible MSc topics with machine learning for marine science. All topics will need to be developed with relevant experts in the field (co-advised).

Images

Learning a hierarchical metric

Labels often have defined relationships to each other, for instance in a hierarchical taxonomy. E.g., ImageNet labels are derived from the WordNet graph, and biological species are taxonomically related and can also share similarities depending on life stage, sex, or other properties.

Here, we will develop a metric learning method that learns from data with hierarchical labels. Using ArcFace or similar, we will simultaneously learn representations that are optimized for the leaf label as well as for the intermediate labels on the path from leaf to root of the label tree. We will study performance as a function of the choice of sharpness/temperature and margins (class-wise or level-wise), and compare it to existing methods.
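As a rough illustration of the idea, the sketch below sums one ArcFace-style loss per level of the label tree, each with its own margin and weighting coefficient. It is a minimal NumPy sketch under stated assumptions, not a proposed method: the names `arcface_logits`, `hierarchical_loss`, the toy tree, and all parameter values are hypothetical, and the margin is added to the angle without the safeguards full implementations use when the angle plus margin exceeds pi.

```python
import numpy as np

def arcface_logits(emb, weights, labels, scale, margin):
    """Scaled cosine logits with an additive angular margin on the true class."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    weights = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    theta = np.arccos(np.clip(emb @ weights.T, -1.0, 1.0))
    theta[np.arange(len(labels)), labels] += margin  # penalize the target angle
    return scale * np.cos(theta)

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def hierarchical_loss(emb, leaf_labels, levels, scale=16.0):
    """levels: list of (class_weights, leaf_to_level_map, margin, coefficient)."""
    total = 0.0
    for weights, leaf_to_level, margin, coef in levels:
        labels = leaf_to_level[leaf_labels]  # label of each sample at this level
        logits = arcface_logits(emb, weights, labels, scale, margin)
        total += coef * cross_entropy(logits, labels)
    return total

# Toy tree (assumption): leaves 0 and 1 share parent 0, leaf 2 has parent 1.
emb = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
leaf_labels = np.array([0, 1, 2])
leaf_weights = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
parent_weights = np.array([[1.0, 0.3], [0.0, 1.0]])
levels = [
    (leaf_weights, np.arange(3), 0.3, 1.0),             # leaf level
    (parent_weights, np.array([0, 0, 1]), 0.2, 0.5),    # parent level
]
print(hierarchical_loss(emb, leaf_labels, levels))
```

Level-wise margins correspond to the `margin` entry of each tuple; class-wise margins would replace that scalar with one value per class.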

Self-supervised object detection in video

(Co-supervisor from coastvision)

Use Selective Search on each frame to generate region proposals. Filter the proposals by a weak estimate of fish likelihood, and learn unsupervised representations (e.g., through DINO). Re-select the images by proximity to a set of fish/non-fish images. Generate tracks based on similarity, movement, and fishiness.

Benettino (2016): https://link.springer.com/chapter/10.1007/978-3-319-48881-3_56
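The track-generation step could, for example, be approached by greedily linking detections in consecutive frames using a score that combines embedding similarity, movement, and fishiness. The sketch below is only one possible reading: `Detection`, `link_score`, `build_tracks`, and the thresholds `max_move` and `min_score` are hypothetical names and values, and the way the three terms are multiplied is an assumption.

```python
import math
from dataclasses import dataclass

@dataclass
class Detection:
    cx: float          # box centre, x (pixels)
    cy: float          # box centre, y (pixels)
    embedding: tuple   # representation vector, e.g. from a self-supervised model
    fishiness: float   # weak fish-likelihood score in [0, 1]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def link_score(prev, cur, max_move=50.0):
    """Combine similarity, movement and fishiness into one linking score."""
    dist = math.hypot(cur.cx - prev.cx, cur.cy - prev.cy)
    if dist > max_move:
        return 0.0  # too far to be the same object in the next frame
    movement = 1.0 - dist / max_move
    return cosine(prev.embedding, cur.embedding) * movement * cur.fishiness

def build_tracks(frames, min_score=0.2):
    """Greedily extend tracks frame by frame; frames is a list of detection lists."""
    tracks, active = [], []
    for detections in frames:
        pairs = sorted(
            ((link_score(tracks[t][-1], d), t, i)
             for t in active for i, d in enumerate(detections)),
            key=lambda p: p[0], reverse=True)
        used_tracks, used_dets, next_active = set(), set(), []
        for score, t, i in pairs:
            if score < min_score:
                break
            if t in used_tracks or i in used_dets:
                continue
            tracks[t].append(detections[i])
            used_tracks.add(t)
            used_dets.add(i)
            next_active.append(t)
        for i, d in enumerate(detections):
            if i not in used_dets:  # unmatched detections start new tracks
                tracks.append([d])
                next_active.append(len(tracks) - 1)
        active = next_active
    return tracks
```

A greedy matcher is the simplest choice; an optimal assignment per frame pair would be a natural refinement.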

Acoustics

Large-scale visualization of acoustic data

(Co-supervisor: Nils Olav)

Develop a method to easily browse and zoom through terabytes of data. The project requires dimensionality reduction to embed the representations in a lower-dimensional space that is searchable (e.g., using t-SNE). It also requires development of aggressive caching algorithms, paired with a web front end.
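As a starting point for the caching component, a least-recently-used cache over tiles keyed by zoom level and position might look like the sketch below. This is an illustrative baseline only: the `TileCache` class, the `(zoom, x, y)` key scheme, and the `load_tile` callback are assumptions, and a real system would likely add prefetching and a persistent tier.

```python
from collections import OrderedDict

class TileCache:
    """Least-recently-used cache for tiles keyed by (zoom, x, y)."""

    def __init__(self, load_tile, capacity=256):
        self.load_tile = load_tile    # slow loader, e.g. reads and downsamples raw data
        self.capacity = capacity
        self._tiles = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, zoom, x, y):
        key = (zoom, x, y)
        if key in self._tiles:
            self._tiles.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._tiles[key]
        self.misses += 1
        tile = self.load_tile(zoom, x, y)
        self._tiles[key] = tile
        if len(self._tiles) > self.capacity:
            self._tiles.popitem(last=False)   # evict the least recently used tile
        return tile
```

The hit and miss counters make it easy to measure how a given browsing pattern interacts with the cache size.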

Denoising and noise detection in acoustic data

Echosounder data is an important source of information for fisheries and marine science, but data quality is sometimes compromised by noise. This task has two goals:
1) a system to detect the occurrence (and level) of noise, and
2) a filter that removes noise from the data.

Both systems can be trained on data with simulated noise added, and will be verified on real data through evaluation by acoustic interpretation experts at IMR.
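One way to generate such training data is to add synthetic noise to clean echograms and keep both the clean data and the noise level as targets. The sketch below assumes the echogram is stored in dB and that noise is added in the linear domain; the function names, the exponential noise model, and the noise range are illustrative assumptions, not a description of real echosounder noise.

```python
import numpy as np

def add_simulated_noise(sv_db, noise_db, rng):
    """Add background noise of mean level noise_db to an echogram given in dB."""
    linear = 10.0 ** (sv_db / 10.0)
    # Assumed noise model: exponentially distributed power around the mean level.
    noise = 10.0 ** (noise_db / 10.0) * rng.exponential(1.0, size=sv_db.shape)
    return 10.0 * np.log10(linear + noise)

def make_training_pair(sv_db, rng, noise_range=(-90.0, -60.0)):
    """Return (noisy input, clean target, noise level) for one echogram."""
    noise_db = rng.uniform(*noise_range)
    noisy = add_simulated_noise(sv_db, noise_db, rng)
    return noisy, sv_db, noise_db

rng = np.random.default_rng(0)
clean = np.full((4, 5), -70.0)          # toy echogram: pings x range samples
noisy, target, level = make_training_pair(clean, rng)
```

The noise level can serve as the regression target for the detector, and the clean echogram as the target for the denoising filter.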

School identification from omni-directional sonar data using machine learning techniques

(Co-supervisor: Hector Peña)

Omnidirectional sonars can search for and inspect fish schools in a volume around the ship, and are important for pelagic fisheries. Modern sonars produce digital output of the raw data, allowing its use for scientific purposes. The identification and segmentation of schools in raw sonar data is currently done in a module inside LSSS.

This project will analyze sonar data from the NVG May surveys from 2017 to 2020. During each 3-week survey, sonar data was collected continuously, at a rate of ca. 1 ping per second. Each ping contains backscattering data from 64 beams covering a range of 600 around the vessel. The beams are electronically tilted, covering depths between the surface and 80 m. The raw data also contains position and time for each ping. The data has been scrutinized by experts, and herring schools have been labeled with a list of parameters (acoustic and morphological) and their geographical and temporal position.

The aim is to use machine learning techniques to develop a system for automatic detection of fish schools using the raw data.

Region-proposal networks for acoustic classification

Acoustic (echosounder) data is one of the most important data resources for management of fish and other marine resources. With new broadband echosounders (EK80), the volume and complexity of the data are enormous, and increased collection from autonomous platforms means that expert analysts are becoming the bottleneck.

In many cases, data is annotated as polygons assigned to fish species. The task is to use data with such annotations as the training set, and to design and implement a classifier using a region proposal network architecture (for instance, from the R-CNN family).
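Detectors in the R-CNN family are usually trained on bounding boxes, with proposals matched to ground truth by intersection over union (IoU). A minimal sketch of the preprocessing this implies for polygon annotations is shown below; the coordinate convention (ping index, range sample) and the function names are assumptions made for illustration.

```python
def polygon_to_box(polygon):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a polygon.

    polygon: list of (ping, range_sample) vertices (assumed convention).
    """
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (min(xs), min(ys), max(xs), max(ys))

def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0  # boxes do not overlap
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

school = [(2, 5), (6, 3), (4, 9)]       # toy annotation polygon
box = polygon_to_box(school)
```

Architectures that predict masks could use the polygons directly instead of reducing them to boxes.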

Learning acoustic target classification from simulation

(Co-supervisor: Babak)

Use simulated frequency spectra from geometric objects and train an ML model to estimate the geometric and material properties of objects. Also train on more complex objects simulated via FEA.

General

Learning from scratch: what is the minimum amount of data needed?

While deep learning systems perform extremely well given enough training data, adequate training data is often lacking. While raw data is often plentiful, annotations are costly, and if they exist at all, they tend to be less than ideal, i.e., of low quality and poorly organized.

We have had some degree of success bootstrapping classifiers and object detectors with simulated data. The task here is to explore this and other avenues to training an object detection system with a minimum of annotated data.

Representation learning for classification or object detection

(Co-supervisor: Kim H/Tonje S)

While traditional classifiers work well with data that is labeled with disjoint classes and reasonably balanced class abundances, reality is often less clean. An alternative is to learn a vector space embedding that reflects semantic relationships between objects, and to derive classes from this representation. This is especially useful for few-shot classification (i.e., very few examples in the training data).
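One simple way to derive classes from an embedding is nearest-prototype classification: average the embeddings of the few labeled examples of each class, and assign a query to the class with the most similar prototype. The sketch below assumes embeddings are already computed by some model; the function names and the toy two-dimensional vectors are hypothetical.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Mean embedding per class, from a (possibly tiny) labeled support set."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query, classes, protos):
    """Assign the query to the class whose prototype has highest cosine similarity."""
    q = query / np.linalg.norm(query)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return classes[int(np.argmax(p @ q))]

# Toy support set: two examples of one class, a single example of another.
support = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
support_labels = ["cod", "cod", "herring"]
classes, protos = class_prototypes(support, support_labels)
print(classify(np.array([0.8, 0.2]), classes, protos))
```

Adding a new or rare class then only requires computing one more prototype, without retraining the embedding.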

The task here is to explore such methods on central data sets. Possibilities are underwater video surveillance (challenge: recognize new/rare species of fish), plankton (fuzzy categories and many rare species), or benthic video.

Teacher/student networks for fast profile search

Teacher/student learning, i.e., training a network (the student) with the aid of another, already trained network (the teacher), is a well known technique.

In many cases, there exist large collections of data (for instance images, video, or acoustics) that can be mined to answer a variety of questions. While a neural network classifier can be used to search over the data, high-quality classifiers often use substantial compute power.

The goal is to maximize search speed without sacrificing accuracy. To achieve this, we will use T/S learning from existing classifiers to train lightweight classifiers to act as filters. The filters will be designed to run with minimal use of resources (compute time and memory footprint) and with a negligible loss of sensitivity (but we will allow false positives).
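The trade-off between sensitivity and false positives can be made explicit by choosing the filter threshold on a validation set so that a required fraction of the teacher's positives is retained. The sketch below illustrates this under simple assumptions: `pick_filter_threshold`, `cascade_search`, and the toy scores are hypothetical, and the teacher's decisions are treated as ground truth.

```python
import math

def pick_filter_threshold(student_scores, teacher_positive, min_sensitivity=0.99):
    """Highest threshold that still keeps min_sensitivity of the teacher's positives."""
    positives = sorted(
        (s for s, is_pos in zip(student_scores, teacher_positive) if is_pos),
        reverse=True)
    if not positives:
        raise ValueError("need at least one teacher-positive example")
    keep = max(1, math.ceil(min_sensitivity * len(positives)))
    return positives[keep - 1]

def cascade_search(items, student, teacher, threshold):
    """Run the expensive teacher only on items that pass the cheap student filter."""
    return [x for x in items if student(x) >= threshold and teacher(x)]

# Toy validation data: student scores and the teacher's decisions.
scores = [0.9, 0.8, 0.3, 0.6, 0.1, 0.4]
teacher_says = [True, True, False, True, False, True]
threshold = pick_filter_threshold(scores, teacher_says, min_sensitivity=0.75)
```

Lowering the threshold lets more false positives through to the teacher, which costs compute but not accuracy.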

In addition, we will explore whether the filter can output an intermediate representation that can be used as an input representation to the high-quality classifier.

Published 12 Oct. 2022 15:28 - Last modified 12 Oct. 2022 15:28

Supervisor(s)

Adín Ramírez Rivera, Universitetet i Oslo
Kjetil Malde