Disputation: Erik Bryhn Myklebust

Doctoral candidate Erik Bryhn Myklebust at the Department of Informatics, Faculty of Mathematics and Natural Sciences, is defending the thesis Ecotoxicological Effect Prediction using a Tailored Knowledge Graph

for the degree of Philosophiae Doctor.

Time and place: Oct. 3, 2022 1:15 PM, Kristen Nygaards sal (5370), Ole-Johan Dahls hus / Zoom

The PhD defence will be partially digital, in Kristen Nygaards sal (5370), Ole-Johan Dahls hus and streamed directly using Zoom. The host of the session will moderate the technicalities while the chair of the defence will moderate the disputation.

Ex auditorio questions: the chair of the defence will invite the attending audience at Kristen Nygaards sal to ask ex auditorio questions.

Trial lecture

"Biological relationship extraction"

Monday 3rd October 2022, 11:15 am, in Kristen Nygaards sal (5370), Ole-Johan Dahls hus / Zoom

Main research findings

Assessing the health of ecosystems is essential for gaining insight into the impact of human activities.
The health of an ecosystem can be viewed as the sum of the impacts on the organisms occupying it. The largest concern is the impact that released chemicals have on these organisms. Chemical concentrations in the environment can be measured, but they must be compared against reference points that relate chemical concentration to its effect on organisms. Normally, these references are gathered through laboratory experiments in which organisms are exposed to increasing concentrations of toxic chemicals. These experiments have both an ethical and a monetary cost, and a large scientific community is working to reduce the use of test organisms and valuable laboratory resources.

Methods for estimating toxicity to an organism have been developed, and they fall into two categories: QSAR and read-across. QSAR (quantitative structure-activity relationship) methods look at the properties of each chemical to determine what toxic impact it has on an organism, and how. Read-across, on the other hand, looks at larger groups of chemicals or species, identifies higher-level similarities, and in this way uses existing data for a group of species (or chemicals) to fill in the blanks in our knowledge.

This thesis investigates a hybrid of the two approaches described above by introducing background knowledge into the modelling methods. This background knowledge takes the form of a knowledge graph, which is a collection of facts expressed in a way that is both machine and human readable. The knowledge graph contains facts about species, chemicals, and existing laboratory experiments, as well as large amounts of related metadata.
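To make the idea of "a collection of facts" concrete, the following is a minimal sketch of how such facts could be stored as subject-predicate-object triples. The identifiers, predicate names, and the helper `facts_about` are hypothetical and chosen only for illustration; they are not taken from the thesis or its knowledge graph.

```python
# Hypothetical facts, each written as a (subject, predicate, object) triple.
# All names below are illustrative assumptions, not the thesis's vocabulary.
triples = [
    ("chemical:copper_sulfate", "type", "Chemical"),
    ("species:daphnia_magna", "type", "Species"),
    ("species:daphnia_magna", "memberOf", "taxon:Daphnia"),
    ("experiment:001", "hasChemical", "chemical:copper_sulfate"),
    ("experiment:001", "hasSpecies", "species:daphnia_magna"),
]


def facts_about(entity, facts):
    """Return every triple in which the entity is the subject or the object."""
    return [f for f in facts if f[0] == entity or f[2] == entity]


# Print all facts that mention the (hypothetical) species entry.
for subject, predicate, obj in facts_about("species:daphnia_magna", triples):
    print(subject, predicate, obj)
```

Each triple reads almost like a short sentence, which is what makes this representation readable for both people and programs.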

Knowledge graphs are symbolic knowledge, which is not ideal for modelling methods such as machine learning, where numerical values are necessary. Therefore, we employ knowledge graph embedding methods. The task of these methods is to turn entities (e.g. a chemical) in the knowledge graph into numerical representations in the form of vectors. These methods take the structure of the knowledge graph into account and try to preserve it as well as possible in the numerical representation.
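As an illustration of what an embedding method does, here is a small sketch in the style of TransE, one well-known knowledge graph embedding model. It is an assumption made for illustration only: the text above does not say which embedding methods the thesis uses, and the triples, the function name `train_transe`, and all parameter values are hypothetical. The sketch also omits the vector normalisation used in the original TransE formulation.

```python
import random

# Hypothetical triples; names are illustrative assumptions only.
triples = [
    ("species:daphnia_magna", "memberOf", "taxon:Daphnia"),
    ("species:daphnia_pulex", "memberOf", "taxon:Daphnia"),
    ("experiment:001", "hasSpecies", "species:daphnia_magna"),
    ("experiment:001", "hasChemical", "chemical:copper_sulfate"),
]


def train_transe(facts, dim=8, epochs=200, lr=0.05, margin=1.0, seed=0):
    """TransE-style training: learn vectors so that head + relation ~ tail."""
    rng = random.Random(seed)
    known = set(facts)
    entities = sorted({x for h, _, t in facts for x in (h, t)})
    relations = sorted({r for _, r, _ in facts})
    ent = {e: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for e in entities}
    rel = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in relations}

    def diff(h, r, t):
        return [ent[h][i] + rel[r][i] - ent[t][i] for i in range(dim)]

    def dist(h, r, t):
        # Squared Euclidean distance; small means the fact looks plausible.
        return sum(d * d for d in diff(h, r, t))

    for _ in range(epochs):
        for h, r, t in facts:
            # Corrupt the tail to obtain a (probably) false fact.
            t_neg = rng.choice(entities)
            if t_neg == t or (h, r, t_neg) in known:
                continue
            # Hinge loss: update only if the margin is violated.
            if margin + dist(h, r, t) - dist(h, r, t_neg) <= 0:
                continue
            d_pos, d_neg = diff(h, r, t), diff(h, r, t_neg)
            for i in range(dim):
                g_pos, g_neg = 2 * d_pos[i], 2 * d_neg[i]
                ent[h][i] -= lr * (g_pos - g_neg)
                rel[r][i] -= lr * (g_pos - g_neg)
                ent[t][i] += lr * g_pos
                ent[t_neg][i] -= lr * g_neg
    return ent, rel, dist


ent, rel, dist = train_transe(triples)
# Each entity is now a vector of 8 numbers that a downstream model can use.
print(ent["chemical:copper_sulfate"])
```

The structure of the graph is preserved in the sense that entities connected by a relation are pulled towards positions where adding the relation vector to one lands near the other.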

Now that the knowledge graph is represented numerically, we can learn relationships between the representations of species and chemicals and the toxic effect the latter have on the former. We found that by including the background knowledge in this modelling method we were able to increase the prediction accuracy over a method using chemical and taxonomic similarity. The modelling methods used are inherently difficult to explain; that is, the relationships learned from the data can be very complex, and it is impossible to derive them in a simple way. Therefore, we use the knowledge graph in a few ways to increase our understanding of how the model makes predictions. We look at how much data is available in the knowledge graph, and it turns out that the amount of data correlates with the error of each prediction. Although not surprising, this is an interesting result. We also use the knowledge graph to provide facts that are relevant to the prediction. These facts can be presented to a domain expert, who can draw conclusions about areas where knowledge is lacking and needs to be expanded.

We have demonstrated, using machine learning and knowledge graphs, that large-scale, generic models for toxicity are possible to develop. However, large areas of this field remain unknown, and further research is needed to increase the robustness and explainability of models.

Adjudication committee:

Associate Professor Claudia d’Amato, Computer Science Department, University of Bari, Italy
Environmental Health Data Scientist Sean Mackey Watford, US EPA Office of Research and Development, Center for Public, Washington, USA
Professor Martin Steffen, Department of Informatics, University of Oslo, Norway

Supervisors

Professor Martin Giese, Department of Informatics, UiO
Dr. Ernesto Jimenez-Ruiz, University of London
Dr. Raoul Wolf, NIVA, Norway
Professor Knut Erik Tollefsen, NIVA, Norway
Senior Researcher Jiaoyan Chen, University of Oxford

Chair of defence:

Professor Stephan Oepen

Candidate contact information: https://www.researchgate.net/profile/Erik-Myklebust

Contact information at Department: Mozhdeh Sheibani Harat

Published Sep. 19, 2022 9:00 AM - Last modified Oct. 3, 2022 10:38 AM