Rubbish or quality science? A quality-control tool for research papers in your hands

Great science depends on quality assurance of the existing knowledge available in Portable Document Format (PDF). A literature review or meta-analysis can then be performed to summarize that knowledge. However, before trusting the results of a paper, scientists have to read it to decide whether its quality is good enough for it to be included. This is often done with guidelines, where researchers check for the presence of certain elements in the publication*. The process is time-consuming and frustrating. A user-friendly tool that speeds up or automates this process would be a great asset for scientific personnel.

The main objectives of this computational master thesis are the following:

  1. Compile an overview of existing frameworks for assessing the quality of articles (human health domain, cancer, toxicology).
  2. Using the SciRAP* approach for evaluating the reliability and relevance of data, together with other approaches, build an ontology for extracting information from PDFs for scoring purposes.
  3. Using Python or R, build user-friendly software that extracts the required information from PDF files.
  4. Build an ontology for better extraction of text from the PDF format.
  5. Deploy the application on public servers.
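
To give a feel for objectives 2 and 3, here is a minimal Python sketch of checklist-style scoring. It assumes the article text has already been extracted from the PDF (for example with a PDF library of your choice). The criteria, the regular expressions, and the names `Criterion`, `score_text`, and `fraction_fulfilled` are hypothetical illustrations. They are not the SciRAP criteria and not a prescribed design for the thesis.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One checklist item: a label and a regex that signals its presence."""
    name: str
    pattern: str


# Hypothetical example criteria, for illustration only (NOT the SciRAP criteria).
EXAMPLE_CRITERIA = (
    Criterion("control group described", r"\bcontrol (group|animals|samples?)\b"),
    Criterion("sample size reported", r"\bn\s*=\s*\d+"),
    Criterion("statistical method named", r"\b(t-test|anova|regression|chi-square)\b"),
)


def score_text(text, criteria=EXAMPLE_CRITERIA):
    """Return {criterion name: bool} telling which criteria appear in the text."""
    return {
        c.name: re.search(c.pattern, text, flags=re.IGNORECASE) is not None
        for c in criteria
    }


def fraction_fulfilled(results):
    """Share of criteria that were found; 0.0 if there are no criteria."""
    return sum(results.values()) / len(results) if results else 0.0


if __name__ == "__main__":
    sample = (
        "Mice were split into a treated and a control group (n = 12 each). "
        "Differences were tested with ANOVA."
    )
    results = score_text(sample)
    for name, found in results.items():
        print(f"{name}: {'yes' if found else 'no'}")
    print(f"fraction fulfilled: {fraction_fulfilled(results):.2f}")
```

Plain keyword matching like this is only a starting point. An ontology and natural language processing, as named in the objectives, would be needed to handle synonyms and context.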

With this thesis you will get an overview of how quality control (QC) of scientific papers is performed within the domains of biology, human disease, and toxicology. You will work with publicly available data and use a high-performance computing (HPC) system.

Skills required: No prior knowledge of biology, medicine, or chemistry is needed. However, proficiency in Python, R, or another programming language is needed to make progress with this thesis, as is an understanding of the challenges of working with PDF. Knowledge of natural language processing, machine learning, and graph theory is advantageous and should be further developed during the course of the master's project. Familiarity with ontology frameworks is advantageous but not necessary, and should likewise be developed during the project. Knowledge of the Resource Description Framework (RDF) for representing interconnected data on the web is advantageous but not necessary; RDF is used to integrate data from multiple sources. You should be a creative and goal-oriented person. You will use GitHub to track your progress and goals. Care for the environment, animals, and other humans is a must.

Working environment: You will work partly at the Norwegian Institute of Public Health (NIPH, Folkehelseinstituttet), which contributed to risk assessment during the SARS-CoV-2 pandemic and advises the Ministry of Health. We care about a pleasant working atmosphere and about students' well-being and progress.

Supervisors: Main supervisor: Marcin Wojewodzic, Researcher at the Norwegian Institute of Public Health (https://www.fhi.no/) and the Norwegian Cancer Registry. Email for interview: maww@fhi.no. Internal supervisor at IFI: Professor Torbjørn Rognes.

Literature:
* https://www.frontiersin.org/articles/10.3389/ftox.2021.746430/full

Published Oct. 4, 2023 14:31 - Last modified Nov. 28, 2023 13:14

Scope (credits)

60