Oslo Bioinformatics Workshop Week 2023

December 11th-15th 2023 will see the second edition of the Oslo Bioinformatics Workshop Week at the University of Oslo, Norway. This event is organised by the Student Committee of the Centre for Bioinformatics at UiO, in collaboration with the ISCB Regional Student group in Norway. These workshops are open to the scientific community in Oslo and the surrounding area.

Registration is now closed. Any questions or registration modifications should be addressed to oslo-bioinfo-workshops@ifi.uio.no.

Agenda


Morning sessions run 9:00-12:00 and afternoon sessions 13:00-16:00.
Mon 11.12.23

Genome annotation and comparative genomics, part 1

The art of storytelling in science

Introduction to machine learning applications in genomic analysis

Statistical principles in machine learning for small biomedical data

Tue 12.12.23

Introduction to Machine Learning for Survival Analysis with mlr3

Genome annotation and comparative genomics, part 2

Introduction to Git and Development Cycle

Analysis of single cell data

Wed 13.12.23

Missing value imputation in methylation data

Networking event with panel discussion: Fostering Innovation in Bioinformatics

Create and Handle a Data Management Plan

Introduction to gene expression regulation by transcription factors and its computational analysis

Thu 14.12.23

Reproducible research with Nextflow & building pipelines with nf-core

Using Omnipy for data wrangling and metadata mapping

Metagenomes meta-analysis: bioinformatic ecosystem from idea to data

Reconstructing and analyzing transkingdom/multi-omic networks to identify potential drivers of disease

A hands-on introduction to NeLS and usegalaxy.no

Fri 15.12.23

microRNA annotation and expression analyses

Docker and Apptainer for Reproducible Research: Getting Started with Containers

An introduction to Snakemake and Snakemake workflows

The Oslo Bioinformatics Workshop Week 2023 is supported by UiO:Life Science, the ISCB Regional Student Group Norway, the Temporary Employees Committee of the Division of Laboratory Medicine (Klinikk for Laboratoriemedisin), Oslo University Hospital (KLM-TEC), and NCMM (Norsk senter for molekylærmedisin, the Centre for Molecular Medicine Norway).


Statistical principles in machine learning for small biomedical data

Date: Monday 11 December 2023 9:00-12:00

Room: Perl (room 2453) in Ole-Johan Dahls hus (OJD)

A central problem in machine learning is how to make an algorithm perform well not just on the training data, but also on new inputs. Many strategies in machine learning are explicitly designed to reduce this test error, possibly at the expense of increased training error. These strategies are collectively known as regularisation, and they are instrumental for the good performance of any kind of prediction or classification model, especially in the context of small data (many features, few samples). We will discuss the connected basic concepts of generalisation, overfitting, the bias-variance trade-off and regularisation, and will illustrate these principles with penalised (generalised) linear regression models, with ridge, lasso and elastic net penalties as prominent examples. Finally, we will present the idea of structured penalties and priors, which can be tailored to account for structures present in the data, e.g. multi-modality or complex correlation structures. We will use examples from large-scale cancer pharmacogenomic screens, where penalised regression and alternative Bayesian approaches are used to predict drug sensitivity and synergy from the genomic characterisation of tumour samples. In the hands-on tutorial, we will use R to perform an integrated analysis of multi-omics data with penalised regression.
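To make the three penalties concrete, here is a minimal sketch in plain Python (the tutorial itself uses R) of cyclic coordinate descent for the elastic-net objective. The glmnet-style parametrisation is an assumption: alpha = 1 gives the lasso and alpha = 0 gives ridge regression. Features and response are assumed to be centred, so no intercept is fitted, and the function names and toy data are illustrative only.

```python
def soft_threshold(z, gamma):
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for the (assumed, glmnet-style) objective
        1/(2n) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2)
    X is a list of rows and y a list; both are assumed centred (no intercept).
    alpha = 1 gives the lasso, alpha = 0 gives ridge regression.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residuals y - X b, with b = 0 at the start
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Inner product of feature j with the partial residual (feature j's
            # current contribution added back), scaled by 1/n.
            z = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep residuals in sync with b
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


# Toy, centred data in which y = 2 * x1 - x2 exactly.
X = [[1, 0], [0, 1], [1, 1], [-1, 0], [0, -1], [-1, -1]]
y = [2, -1, 1, -2, 1, -1]
print(elastic_net(X, y, lam=0.0, alpha=1.0))  # close to [2.0, -1.0] (least squares)
print(elastic_net(X, y, lam=0.5, alpha=0.0))  # ridge: close to [0.933, -0.267]
print(elastic_net(X, y, lam=2.0, alpha=1.0))  # lasso: [0.0, 0.0]
```

With lam = 0 the updates reduce to ordinary least squares; increasing lam shrinks the coefficients, and with alpha = 1 a large enough lam sets them exactly to zero, which is the variable-selection behaviour that makes the lasso attractive for many-features, few-samples data.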

Learning outcomes

After attending this workshop, learners will understand key concepts for training machine learning models such as regularisation and how to incorporate data structure in the regularisation process.


Target audience

Graduate and postgraduate students, and researchers at any level, who are interested in applying machine learning methods to small data (i.e., few samples but potentially many features) or noisy data (e.g. biomedical data).

Pre-requisites

  • Basic familiarity with R and introductory-level statistics, including regression.

Equipment to bring

Laptop with a recent version of R and RStudio installed.

Instructor(s):
Manuela Zucknick (main)
Theophilus Asenso, Chi Zhang

Introduction to gene expression regulation by transcription factors and its computational analysis

Date: Wednesday 13 December 2023 9:00-12:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

Gene regulation by transcription factors (TFs) is a key process in cells, controlling when and where particular sets of genes are transcribed. This precise control is important for cell growth and differentiation, but it also means that perturbations of these mechanisms can lead to diseases, e.g. cancer. Various experimental techniques have been developed to map TF binding locations along the genome; the most widely used today is ChIP-seq. The resulting experimental data require dedicated computational tools for analysis and interpretation. A wide range of such tools is already available, from peak callers and motif discovery tools to more complex deep learning algorithms. In this workshop, we will introduce transcription factors and how they control gene expression. We will describe the properties of TF binding, such as sequence binding motifs. Furthermore, we will discuss the in vivo and in vitro experimental techniques available for capturing transcriptional regulation events. Finally, we will walk through the analysis of data produced by different techniques, focusing on TF motif discovery and enrichment, and on TF binding site determination. We will show a few freely available resources that store TF binding profiles and sites, and demonstrate a selection of online tools for data analysis. The last part of the workshop will consist of a Q&A session (participants can bring their project-related questions) and hands-on exercises, in which each participant will try motif discovery tools. Then, using the knowledge and resources covered, we will summarize and discuss the results to interpret the binding of selected TFs.
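Sequence binding motifs are commonly summarised as position weight matrices (PWMs). As an illustration only, the sketch below builds a log-odds PWM from a handful of aligned binding sites and scans a sequence for the best-scoring window. The sites and sequence are made up, a uniform background and a pseudocount of 0.5 are assumptions, and only the forward strand is scanned; this is not the workflow of any specific tool shown in the workshop.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Log-odds PWM from equal-length, aligned, upper-case binding sites.

    Returns one dict per motif position mapping each base to
    log2(p(base at position) / p(base in background)).
    """
    background = background or {base: 0.25 for base in BASES}  # uniform background (assumption)
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2(counts[base] / total / background[base]) for base in BASES})
    return pwm


def scan(sequence, pwm):
    """Score every window on the forward strand; return (best_start, best_score)."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Made-up aligned sites and a made-up sequence containing the consensus at index 5.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(sites)
print(scan("GGGGGTGACTCAGGGG", pwm))  # best window starts at index 5
```

Real scanners also score the reverse complement and convert raw scores into p-values or relative scores before calling a binding site.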

Learning outcomes

After the workshop, participants will have a comprehensive overview of gene expression regulation by transcription factors and of how these biological processes and the resulting experimental data can be investigated and interpreted using computational methods. Learners will also know some of the major resources that store relevant information and/or offer data analysis.


Target audience

Any career stage

Pre-requisites

  • No prerequisites, except interest in gene expression regulation by transcription factors and computational analysis of such data.

Equipment to bring

A laptop

Instructor(s):
Ieva Rauluseviciute (main)
Jaime A. Castro Mondragon

Create and Handle a Data Management Plan

Date: Wednesday 13 December 2023 9:00-12:00

Room: Perl (room 2453) in Ole-Johan Dahls hus (OJD)

This workshop aims to provide a hands-on session on creating a data management plan (DMP) using the Data Stewardship Wizard (DSW), a dynamic web forms system provided by ELIXIR Europe. The Norwegian node of ELIXIR curates a national instance of DSW (https://elixir-no.ds-wizard.org) and provides a DMP form customised for life scientists in Norway. Participants will learn which information needs to be captured in a DMP and how to use DSW for this purpose. We will demonstrate how to create a DMP, starting either from scratch or from a set of recommendations for specific scientific domains including, among others, sequencing, light microscopy and high-throughput screening. We will also demonstrate how to work collaboratively on DMPs and how to export them to a document that complies with the requirements of national and international funding bodies. In the second part of the workshop, participants will work, individually or in groups, on a DMP for projects they are involved in or interested in. The instructors will give advice and raise discussion points based on feedback from the participants. Depending on the audience, there may also be discussions on how to modify a questionnaire or on other technical aspects.

Learning outcomes

After this workshop, learners will:
  • Understand the relevant information to be captured in a DMP
  • Start a new DMP on DSW, from scratch or from a set of pre-filled recommendations
  • Share a DMP and work collaboratively on it
  • Assess good practices in research data management and FAIRness
  • Export a DMP according to formal requirements from funding bodies such as the Research Council of Norway, Science Europe, and Horizon Europe


Target audience

This workshop targets researchers at any career stage (Ph.D. candidates, Postdoctoral Fellows, Researchers, and Professors), and especially principal investigators and personnel responsible for the data generated within a project. Data specialists or technical personnel in the life sciences who want to learn more about DSW are also welcome.

Pre-requisites

  • A user account on the Norwegian instance of DSW (https://elixir-no.ds-wizard.org)

Equipment to bring

Laptop

Instructor(s):
Federico Bianchini (main)
Nazeefa Fatima

Reproducible research with Nextflow & building pipelines with nf-core

Date: Thursday 14 December 2023 9:00-12:00 and 13:00-16:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

As bioinformatics datasets grow ever larger, workflow managers are needed to run automated analyses. They can dynamically launch and chain numerous bioinformatics tools across hundreds of samples. Of these, Nextflow (https://nextflow.io) has gained a large community and has become a de facto standard for many research groups and facilities. The nf-core community (https://nf-co.re) has grown around Nextflow and provides a meeting point for people to collaborate on pipelines and components. In the morning session, we will introduce Nextflow: its concepts, the language it uses, and an overview of how pipelines are constructed. In the afternoon, we will introduce nf-core, showing how to use community pipelines and how to develop new pipelines using nf-core shared components.

Learning outcomes

  • Understand the basics of how Nextflow works
  • Be able to write simple workflows from scratch
  • Find and launch nf-core pipelines
  • Create simple pipelines using nf-core tooling


Target audience

Anyone interested in building or running reproducible analysis workflows at scale! People from any career stage or discipline are welcome.

Pre-requisites

Equipment to bring

Laptop, GitHub account.

Instructor(s):
Phil Ewels (main)
Maxime Garcia

Genome annotation and comparative genomics, part 1

Date: Monday 11 December 2023 9:00-12:00 and 13:00-16:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

High-quality genome assemblies are becoming available for many groups of species, thanks to the efforts of projects such as the Earth Biogenome Project (EBP), the Darwin Tree of Life (DToL), and our own Earth Biogenome Project Norway (EBP-Nor). The increase in the number and quality of genomes allows us to perform comparative genomics analyses on a scale that was not previously possible. We can now compare entire genomes (genes and non-genic regions) across different species to better understand their biology and evolution. Annotating genomes (finding elements such as genes, regulatory regions or repeats) is not a straightforward process, and generating high-quality, standardised annotations is a necessary step for many downstream analyses (e.g. comparative genomics). In this workshop, we will discuss methods for annotating all genes in a genome assembly of a species. We will then use annotations from multiple species to identify orthologous genes (genes in different species that descend from a single gene in their last common ancestor) using OrthoFinder. This workshop is sponsored by EBP-Nor.

Learning outcomes

In this workshop, participants will learn about genome annotation and how to use some of the more popular tools for creating and validating genome annotations. In addition, participants will learn to perform comparative genomics analyses based on orthologous genes identified between the generated genome annotation and those of other species.


Target audience

The workshop is targeted towards anyone who wants to learn more about genome annotation and comparative genomics and/or wants to generate their own genome annotations.

Pre-requisites

Equipment to bring

A laptop with a terminal installed. A guide to setting this up is available at https://swcarpentry.github.io/shell-novice/.

Instructor(s):
Ole Kristian Tørresen (main)
Helle Tessand Baalsrud, José Cerca, Bram Danneels

Genome annotation and comparative genomics, part 2

Date: Tuesday 12 December 2023 9:00-12:00 and 13:00-16:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

The increase in the number and quality of annotated genomes allows us to perform comparative genomics analyses on a scale that was not previously possible. We can now compare entire gene sets across different species to better understand their biology and evolution. In this workshop, we will introduce different concepts of comparative genomics. We will demonstrate how to use and interpret OrthoFinder results to reconstruct the evolutionary relationships among genes in orthogroups (sets of genes derived from a single gene in the last common ancestor of all the species under consideration). In a phylogenetic framework, we will infer gene gains and gene losses in different species or clades. We will also use the gene trees obtained for each orthogroup to estimate selection across different genes or branches using various tests based on the dN/dS ratio. These comparative genomics tools are powerful and efficient ways of investigating gene evolution on a genome-wide scale, and can be applied to any organism group of interest. This workshop is sponsored by EBP-Nor.
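The dN/dS ratio (omega) compares the number of non-synonymous substitutions per non-synonymous site (dN) with the number of synonymous substitutions per synonymous site (dS): omega < 1 suggests purifying selection, omega close to 1 neutral evolution, and omega > 1 positive selection. The sketch below is a simplified, Nei-Gojobori-style calculation for one pair of sequences: it takes already-counted sites and differences as input and applies the Jukes-Cantor correction. The counts are invented, and the likelihood-based tests typically applied to orthogroup gene trees are considerably more involved than this.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under Jukes-Cantor")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """omega = dN / dS from pre-counted differences and sites for one sequence pair."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0:
        raise ValueError("no synonymous differences: omega is undefined")
    return d_n / d_s


# Invented counts: 15 differences over 300 non-synonymous sites,
# 20 differences over 100 synonymous sites.
omega = dn_ds(15, 300, 20, 100)
print(round(omega, 3))  # 0.222 (< 1, consistent with purifying selection)
```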

Learning outcomes

After attending this workshop, participants should be able to formulate and test hypotheses associated with gene gain/loss and positive selection on genomes.


Target audience

The workshop is targeted towards anyone who wants to learn more about comparative genomics, with a focus on orthologous genes identified with OrthoFinder.

Pre-requisites

Equipment to bring

A laptop with a terminal installed. A guide to setting this up is available at https://swcarpentry.github.io/shell-novice/. Please also bring your BankID, as it may be needed to access a computing cluster.

Instructor(s):
Ole Kristian Tørresen (main)
Helle Tessand Baalsrud, José Cerca, Bram Danneels

Reconstructing and analyzing transkingdom/multi-omic networks to identify potential drivers of disease

Date: Thursday 14 December 2023 9:00-12:00

Room: Perl (room 2453) in Ole-Johan Dahls hus (OJD)

This workshop will focus on using large-scale multi-omic (microbial, transcriptomic, metabolomic, phenotypic, etc.) data to reconstruct and analyze transkingdom biological networks using a tool called Transkingdom Network Analysis (TkNA, https://github.com/CAnBioNet/TkNA). This tool uses principles of causality, statistics and networks to identify drivers (microbes, genes, etc.) of disease. Before the event, attendees will need to clone the GitHub repository to their computer, following the instructions we will provide. During the workshop, attendees will first be familiarized with the overall structure of the pipeline, experimental considerations, data formats, and suggested normalization procedures. Then, using a Unix-based terminal, they will run through the Python pipeline, producing figures and output files. We will then cover how to interpret those files and figures, and discuss potential experiments that can be performed to validate the findings of TkNA experimentally. We will also show attendees how to visualize their networks using Cytoscape. For users who are not comfortable working on the command line, a graphical user interface (GUI) version of the software may be ready by the time of the workshop. More information on TkNA can be found at the GitHub link above, or in the manuscript, which is currently under review at Nature Protocols (https://www.biorxiv.org/content/10.1101/2023.02.22.529449v2).

Learning outcomes

Following completion of this workshop, attendees can expect to know how to reconstruct a transkingdom/multi-omic network using the TkNA pipeline. Additionally, they will learn how to analyze these networks to find biological pathways represented in their networks, as well as how to identify key players (microbes, genes, etc.) that likely contribute to the development of disease. They will then become familiar with visualizing their networks using the free Cytoscape software. Finally, attendees will also learn methods to experimentally validate the results of TkNA.


Target audience

TkNA has diverse uses and can be applied in many fields. Generally, however, it is aimed at biologists studying specific disease systems. Researchers who have large-scale (microbial, transcriptomic, metabolomic, etc.) data will benefit the most. While the software has traditionally been used to understand transkingdom (e.g. host-microbiome) interactions, it can also be used on single-omic data (e.g. a network can be built and analyzed in TkNA using transcriptomic data alone). Anyone interested in this software is encouraged to join, regardless of their computational experience. While the README on GitHub may seem lengthy and a bit intimidating, running the code is quick and easy, and a GUI version may be ready by the time of the workshop.

Pre-requisites

Equipment to bring

Laptop with access to a Unix terminal, with the TkNA GitHub repository cloned in advance as described above.

Instructor(s):
Nolan Newman (main)
Tatiana Belova

The art of storytelling in science

Date: Monday 11 December 2023 9:00-12:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

The main topics and hands-on exercises I plan to cover in the workshop are:
  1. Why does writing matter in science, and what functions does it serve?
  2. Why storytelling, and why is it important not only for communication but also for doing research?
  3. Different types of stories (how certain story themes repeatedly appear).
  4. The stories we write in science (story themes in science).
  5. Hands-on exercise: we select a paper, follow its writing, and examine how the authors structured it and chose their words.
  6. Hands-on exercise: we write an imaginary paper together, applying what we have learned.

Learning outcomes

After the workshop, participants will:
  1. Have a better understanding of storytelling.
  2. Identify the structure (storyline) of papers more easily when reading them.
  3. Be able to improve the structure and delivery of their own papers.


Target audience

PhD students and postdocs.

Pre-requisites

Equipment to bring

None

Instructor(s):
Youness Azimzade (main)

A hands-on introduction to NeLS and usegalaxy.no

Date: Thursday 14 December 2023 13:00-16:00

Room: Perl (room 2453) in Ole-Johan Dahls hus (OJD)

This workshop provides an introduction to the Norwegian e-Infrastructure for Life Sciences (NeLS) and usegalaxy.no. These services, both administered by ELIXIR Norway, provide national users with data storage and the ability to run analyses and define workflows in a reproducible way. The workshop will cover a general introduction to NeLS and its three-tier storage system, with a specific focus on data transfer. A general introduction to transferring files between NeLS and usegalaxy.no will also be provided, followed by a dedicated session on the standard Galaxy tools and "histories" for reproducible analysis. The final session will cover how to create and run workflows on usegalaxy.no and share them with other users. Each session will start with a short presentation, followed by a detailed hands-on session.

Learning outcomes

  • Transfer files from local machines to NeLS
  • Transfer files between UseGalaxy and NeLS
  • Understand how to use Galaxy tools for data analysis
  • Create Galaxy "histories" for reproducible analysis
  • Create, run and share workflows on UseGalaxy


Target audience

Researchers in life science/bioinformatics

Pre-requisites

Equipment to bring

Laptop, and Feide credentials or a NeLS identity

Instructor(s):
Jon Lærdahl (main)
Federico Bianchini

Docker and Apptainer for Reproducible Research: Getting Started with Containers

Date: Friday 15 December 2023 9:00-12:00 and 13:00-16:00

Room: Perl (room 2453) in Ole-Johan Dahls hus (OJD)

This course will teach you how to use container software to manage reproducible bioinformatics environments. Containers are a lightweight and scalable way to bundle all of the software and libraries that you need to run your analyses, regardless of the computing environment. This is particularly useful for bioinformatics research, where analyses often require specialized software and large amounts of memory and CPU resources. In this course, you will learn how to use Docker and Apptainer to create, share, and run containerized bioinformatics environments. You will also learn how to use Apptainer to run Docker containers on shared computing environments, such as HPC clusters, without requiring privileged access. The course will take a practical approach, with several hands-on exercises to help you learn the basics of using, developing, and sharing containers.

Learning outcomes

  • Understand the basic concepts and terminology associated with container-based virtualization
  • Customize, store, manage and share containerized environments with Docker
  • Describe and understand the essential differences between Docker and Apptainer
  • Use Apptainer to run containers in a shared computing environment (e.g. an HPC cluster)


Target audience

This course is designed for those interested in working with containers in the context of bioinformatics.

Pre-requisites

Equipment to bring

Participants are required to bring their own laptop with Docker pre-installed, and to have created an account on Docker Hub. Note that participants need admin rights on their computer to install and use Docker. In addition, Windows users require a recent version of Windows.

Instructor(s):
Geert van Geest (main)

Metagenomes meta-analysis: bioinformatic ecosystem from idea to data

Date: Thursday 14 December 2023 13:00-16:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

In recent years, various large-scale research projects have delivered increasingly well-standardised metagenomic sequencing data to the public domain. Such data are often generated for pioneering studies or to address broad research questions, and therefore offer rich, untapped information for meta-analyses and in-depth studies addressing more specific questions. First, we will explore such projects and see that many features of their data can be exploited. Smaller projects also yield data that can be combined into insightful meta-analyses, which consist of collecting samples from independent sources and processing them together to address new questions, usually to test reproducibility or to gain statistical power from large sample sizes. For example, one might want to understand universal mechanisms by integrating soil and ocean samples from a given pH range, or to assess where genome lineages from an air mass branch in a phylogeny containing genomes from the underlying surface ocean. For this to be possible, meta-analyses both rely on and serve the quintessential purpose of standardization and the FAIR principles. It is important to interrogate the metadata to identify relevant samples before collecting all the sequence data files of a study. We will therefore learn how to query public data repositories through different entry points (online and programmatically), and will focus on metadata quality and content, using Python and visualization tools to measure the potential for a useful meta-analysis. The rest is simple engineering: we will use the NCBI SRA Toolkit to download and format raw sequence data for bioinformatic processing, and briefly discuss the technical challenges of, and avenues for, metagenomics.
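The metadata step described above (harmonising fields across studies, then selecting relevant samples) can be sketched in a few lines of Python using only the standard library. The two inline tables, their column names, the shared field names and the pH window are all hypothetical; in practice the tables would come from repository queries, and real metadata usually needs more cleaning than a float conversion.

```python
import csv
import io

# Hypothetical metadata exports from two studies that name the same fields differently.
STUDY_A = "run,material,pH\nA1,soil,6.8\nA2,soil,4.9\nA3,soil,\n"
STUDY_B = "accession,environment,ph_value\nB1,ocean,8.1\nB2,ocean,7.9\nB3,ocean,NA\n"

# Hypothetical mappings from each study's column names to one shared schema.
MAP_A = {"run": "sample_id", "material": "environment", "pH": "ph"}
MAP_B = {"accession": "sample_id", "environment": "environment", "ph_value": "ph"}


def harmonise(table_text, column_map, study):
    """Parse a CSV table and rename its columns to the shared schema."""
    rows = []
    for record in csv.DictReader(io.StringIO(table_text)):
        row = {target: (record.get(source) or "").strip() for source, target in column_map.items()}
        row["study"] = study
        rows.append(row)
    return rows


def select_by_ph(rows, low, high):
    """Keep samples whose pH is numeric and within [low, high]."""
    selected = []
    for row in rows:
        try:
            ph = float(row["ph"])
        except ValueError:
            continue  # missing or non-numeric pH: the sample cannot be placed in the range
        if low <= ph <= high:
            selected.append(row)
    return selected


merged = harmonise(STUDY_A, MAP_A, "A") + harmonise(STUDY_B, MAP_B, "B")
picked = select_by_ph(merged, 6.5, 8.0)
print([(row["study"], row["sample_id"], row["ph"]) for row in picked])
# [('A', 'A1', '6.8'), ('B', 'B2', '7.9')]
```

Samples whose pH is missing or non-numeric are dropped rather than guessed; counting how many samples survive each filter is a quick way to gauge the potential for a meta-analysis.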

Learning outcomes

After this half-day workshop, participants will know:
  • what the major metagenomic projects and public data repositories are;
  • how to interrogate these repositories (NCBI's SRA, EBI's ENA and Qiita);
  • how to collect sample metadata and select relevant samples;
  • how to merge metadata information from different studies and assess the potential for meta-analysis;
  • how to retrieve sequencing data using the SRA Toolkit (sra-tools);
  • what the challenges for a robust meta-analysis are and how to proceed with the sequence analysis.


Target audience

This workshop is not intended for experienced computational scientists, but rather for Master's and PhD-level biologists, as well as senior biologists with little experience with R or Python, who are willing to learn about accessing and collecting public metagenomic data.

Pre-requisites

Equipment to bring

Laptop with Jupyter Notebook installed and ca. 2 GB of free storage

Instructor(s):
Franck Lejzerowicz (main)

microRNA annotation and expression analyses

Date: Friday 15 December 2023 9:00-12:00 and 13:00-16:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

Small RNA sequencing (sRNA-Seq) uses next-generation sequencing (NGS) to profile small RNAs in a specific length range (typically less than 300 nucleotides), unlike RNA-Seq, which profiles the whole polyadenylated transcriptome of a cell. The key difference between sRNA-Seq and RNA-Seq library preparation is size selection: total RNA is run on a denaturing polyacrylamide gel, and only the section of the gel containing RNA of the appropriate size (~22 nt for animal microRNAs) is cut out and used for sRNA sequencing. An sRNA-Seq sample is usually dominated by a small number of highly abundant microRNA genes, while the majority of microRNA genes are expressed at relatively low levels and are highly diverse. In this tutorial, you will first analyze example sRNA-Seq data (mouse liver and blood samples) to get to know a range of tools, and then carry out a “real life” analysis in which you reproduce findings and plots from an actual publication (human cell-line knock-out data for the microRNA biogenesis pathway).
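Because the library insert (~22 nt for animal microRNAs) is usually shorter than the sequencing read, each read runs into the 3' adapter, which has to be trimmed before the read-length distribution can be inspected. The sketch below mimics that in silico step in pure Python with exact matching only. The adapter shown is assumed to be the Illumina TruSeq small RNA 3' adapter, and the 8-nt seed and 18-26 nt length window are illustrative choices; dedicated trimmers also handle sequencing errors and partial adapters at the read end.

```python
from collections import Counter

# Assumption: Illumina TruSeq small RNA 3' adapter; check the kit actually used.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(read, adapter=ADAPTER, seed=8):
    """Cut the read at the first exact match of the adapter's first `seed` bases."""
    idx = read.find(adapter[:seed])
    return read[:idx] if idx != -1 else read


def select_mirna_sized(reads, min_len=18, max_len=26):
    """Trim reads, keep those in the microRNA-like length window, and tally lengths."""
    trimmed = [trim_adapter(read) for read in reads]
    kept = [read for read in trimmed if min_len <= len(read) <= max_len]
    return kept, Counter(len(read) for read in kept)


reads = [
    "TAGCTTATCAGACTGATGTTGA" + ADAPTER[:15],  # 22-nt insert followed by adapter
    "ACGTACGTACGT" + ADAPTER,                 # 12-nt insert: too short, dropped
    "A" * 40,                                 # no adapter found: too long, dropped
]
kept, lengths = select_mirna_sized(reads)
print(kept, lengths)
```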

Learning outcomes

After the workshop attendees will be able to access, process and analyze microRNA sequencing data and reproduce figures from papers.


Target audience

Anyone with the prerequisites

Pre-requisites

Equipment to bring

Laptop, and see above

Instructor(s):
Bastian Fromm (main)
Anju Anglina Hembrom, Vanessa Paynter

Analysis of single cell data

Date: Tuesday 12 December 2023 13:00-16:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

Single-cell RNA sequencing has become the state-of-the-art approach for unravelling the heterogeneity and complexity of RNA transcripts within individual cells, and for revealing the composition of cell types and their functions within highly organized tissues, organs and organisms. Single-cell analysis allows the study of cell-to-cell variation within a cell population. To perform tasks such as differential expression analysis, we must go through several essential steps, including processing the data from the sequencer, quality assessment, normalization, and dimensionality reduction. For these tasks, we will use the Seurat R package.
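As one example of the normalization step, Seurat's default NormalizeData method ("LogNormalize") divides each gene's count by the total count of its cell, multiplies by a scale factor (10,000 by default) and takes the natural log of one plus the result. The sketch below reproduces that formula in plain Python on a nested dictionary, a layout chosen here only to keep it dependency-free; in the workshop this is done with Seurat in R on sparse matrices, and the cell and gene names are hypothetical.

```python
import math


def log_normalize(counts_by_cell, scale_factor=10_000):
    """LogNormalize-style normalization.

    counts_by_cell: {cell: {gene: raw count}}. Each count is divided by the
    cell's total count, multiplied by scale_factor, and transformed with log1p
    (natural log of 1 + x).
    """
    normalized = {}
    for cell, counts in counts_by_cell.items():
        total = sum(counts.values())  # must be > 0; filter empty cells beforehand
        normalized[cell] = {gene: math.log1p(count / total * scale_factor)
                            for gene, count in counts.items()}
    return normalized


# Two hypothetical cells with identical relative expression, ten-fold different depth.
norm = log_normalize({"cell_1": {"gene_a": 5, "gene_b": 5},
                      "cell_2": {"gene_a": 50, "gene_b": 50}})
print(norm["cell_1"]["gene_a"] == norm["cell_2"]["gene_a"])  # True
```

Cells with the same relative expression but different sequencing depth end up with identical normalized values, which is the point of the step; cells with zero total counts would cause a division by zero and are normally removed during quality control.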

Learning outcomes

After attending this workshop, learners will be able to perform basic downstream analysis of single-cell data.


Target audience

Anyone interested in the single-cell field.

Pre-requisites

Equipment to bring

Laptop with the latest version of the Seurat package installed (https://satijalab.org/seurat/), and internet access to download data

Instructor(s):
Diana Domanska (main)
Victoria Karlsen

Introduction to Machine Learning for Survival Analysis with mlr3

Date: Tuesday 12 December 2023 9:00-12:00 and 13:00-16:00

Room: Perl (room 2453) in Ole-Johan Dahls hus (OJD)

This introductory workshop is designed to equip participants with practical skills and knowledge for performing survival analysis using machine learning techniques. Survival analysis, a fundamental statistical method in biomedical and clinical research, focuses on analyzing time-to-event data, such as the time to disease progression or patient survival. In this workshop, attendees will work with clinical and gene expression data to build, train, and test survival models. They will learn how to leverage R's mlr3 ecosystem for efficient model development, incorporating sophisticated machine learning models such as penalized linear models and random forests to enhance the accuracy of the survival predictions. Participants will also explore survival metrics and model validation techniques to assess the quality and reliability of their models in the context of real-world data. Whether you're new to survival analysis or seeking to enhance your skills, this workshop offers valuable insights and hands-on experience for tackling challenging clinical and biomedical questions.
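Among survival metrics, Harrell's concordance index (C-index) is a common measure of how well predicted risks rank subjects by their observed survival times. A minimal pure-Python version is sketched below as an illustration (the workshop itself uses R and mlr3). It treats a pair as comparable when the subject with the shorter observed time had an event, gives half credit for tied risk scores, and, as a simplifying assumption, skips pairs with tied times.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk scores (higher = expected to experience the event sooner)
    Pairs with tied times are skipped (a simplifying assumption).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # only a subject with an observed event can be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:  # j outlived i, whether or not j was censored later
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]
events = [1, 1, 1, 0]  # the last subject is censored
print(harrell_c_index(times, events, [4, 3, 2, 1]))  # 1.0: risks perfectly ordered
```

A value of 1.0 means perfect ranking and 0.5 is what constant predictions give; censored subjects still contribute as the longer-surviving member of a pair.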

Learning outcomes

  • Understand the foundations of survival analysis and its applications in clinical and high-dimensional research.
  • Develop skills in using the mlr3 framework for survival analysis, allowing you to build and evaluate predictive models.
  • Explore the various survival prediction types and survival metrics to assess model performance.
  • Work with real-world clinical and gene expression datasets to apply machine learning techniques in a research context.


Target audience

Anyone interested in survival analysis and/or machine learning who has some knowledge of R programming

Pre-requisites

Equipment to bring

Laptop with the mlr3verse, mlr3proba and mlr3extralearners R packages installed

Instructor(s):
John Zobolas (main)

Missing value imputation in methylation data

Date: Wednesday 13 December 2023 9:00-12:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

This workshop will teach you how to handle missing values in your methylation data. We will go over the types of missing values, the consequences of performing (or not performing) missing value imputation, and how imputation performs under different missing value patterns. We will also discuss several packages for missing value imputation and try one of them out.
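The simplest imputation baseline, replacing each missing beta value with the mean of the observed values at the same CpG site, is sketched below in plain Python to show the mechanics. It is not one of the packages discussed in the workshop: the data layout (a dictionary of per-site lists with None for missing values) and the site names are hypothetical, and mean imputation ignores correlation between sites and samples, so it is mainly defensible when values are missing completely at random.

```python
def impute_site_means(beta):
    """Replace missing beta values (None) with the mean of the observed values
    at the same CpG site (i.e. the same row across samples)."""
    imputed = {}
    for site, values in beta.items():
        observed = [v for v in values if v is not None]
        if not observed:
            raise ValueError(f"{site}: no observed values to impute from")
        mean = sum(observed) / len(observed)
        imputed[site] = [mean if v is None else v for v in values]
    return imputed


# Hypothetical beta values (0 = unmethylated, 1 = fully methylated) for three samples.
beta = {"cpg_1": [0.2, None, 0.4], "cpg_2": [None, 0.9, 0.9]}
print(impute_site_means(beta))
```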

Learning outcomes

After this workshop, learners will understand why missing value imputation is necessary and will be able to perform it confidently with a suggested package.


Target audience

Anyone analysing methylation data

Pre-requisites

Equipment to bring

Laptop with R installed

Instructor(s):
Anna Plaksienko (main)

Introduction to machine learning applications in genomic analysis

Date: Monday 11 December 2023 13:00-16:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

The field of genomics has developed significantly in recent years, leading to increasing demand for professionals who are well-versed in the latest methods and tools that use machine learning. Our short course, "Introduction to Machine Learning Applications in Genomic Analysis", is designed to provide an overview of key machine learning concepts and their use cases in genomic analysis. The course begins with an introduction to the main machine learning paradigms, including supervised, unsupervised and reinforcement learning, giving participants a basic understanding of how they work. Next, the course will introduce various genomic analysis methods that can potentially benefit from machine learning. Here, we will introduce participants to recently developed machine-learning-based genomic analysis tools and explain their underlying technologies. Importantly, we will highlight the challenges of using these tools in genomic analysis and discuss the associated risks. By the end of the course, participants will have gained a basic understanding of machine learning use cases in genomic analysis, as well as of the current challenges and risks in this growing field.

Learning outcomes

  • Gain a basic understanding of machine learning methods and their applications in genomic analysis
  • Understand the current challenges and risks of using machine-learning-based tools in genomic analysis


Target audience

Biologists interested in machine learning applications in genomics

Pre-requisites

Equipment to bring

Nothing special

Instructor(s):
Pubudu Saneth Samarakoon (main)
Burcin Buket Ogul


Using Omnipy for data wrangling and metadata mapping

Date: Thursday 14 December 2023 9:00-12:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

Researchers in the life sciences often spend a significant amount of time on data wrangling tasks, such as reformatting, cleaning and integrating data from different sources. Despite the availability of software tools, they often end up with workflows that are difficult to reuse and require manual steps. Omnipy is a new Python library that offers a systematic and scalable approach to wrangling research data and metadata. It allows researchers to import data in various formats and continuously reshape it through typed transformations. For large datasets, Omnipy seamlessly scales up jobs that were tested locally and provides persistent access to the data state at every step. This workshop will provide down-to-earth tutorials and examples to help data scientists in the life sciences use Omnipy to wrangle real-world datasets into shape. The workshop is divided into three parts:

1. The first part will introduce the concepts of models, datasets, tasks and flows in Omnipy through small examples. We will also touch upon Python type hints and pydantic models as needed, as these are important building blocks of Omnipy.
2. In the second part, participants will be given a rough example dataset that requires cleaning. As a hands-on exercise, participants will carry out step-wise parsing and shaping of the data to make it comply with a specified metadata schema.
3. In the last part, participants will be introduced to the metadata mapping functionality in Omnipy and led through another hands-on exercise to set up a transformation that maps the data from one metadata schema to another.

Learning outcomes

  • Introduction to Python type hints and pydantic models
  • How to use type hints to define models, datasets, tasks and flows in Omnipy
  • How to wrangle a rough dataset into the shape required by a metadata schema
  • How to set up an executable mapping of data from one metadata schema to another


Target audience

PhD candidates, postdocs and technical personnel with an interest in, and experience with, programming in an academic setting. Several of the use cases assume bioinformatics experience, so a background in bioinformatics will help. Most of the databases and ontologies in the use cases are from the biological domain.

Pre-requisites

Equipment to bring

Participants should have some experience with Python programming/scripting. We will not spend time explaining basic syntax and concepts, except those related to type hints. Experience with type hints in Python is useful, but not required.

Instructor(s):
Sveinung Gundersen (main)
Federico Bianchini, Pável Vázquez


Introduction to Git and Development Cycle

Date: Tuesday 12 December 2023 9:00-12:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

This workshop introduces the Git version control system and shows how it can support the software development process. Git is a fast and widely used version control system that lets you create branches to work on different parts of your code. It is especially useful when several people collaborate on the same project. We will go through the standard process of using Git while working on a project, introducing the most relevant commands as we go. Every participant is invited to follow along on their own laptop.

Learning outcomes

After attending the workshop, you should be able to use the basic git commands to maintain your repository and be familiar with the software development process.


Target audience

Anyone who develops, or would like to develop, software and wants to learn more about how git can help with that.

Pre-requisites

Equipment to bring

A laptop and a GitHub account (having Git installed is optional)

Instructor(s):
Ladislav Hovan (main)
Tatiana Belova


An introduction to Snakemake and Snakemake workflows

Date: Friday 15 December 2023 13:00-16:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

This workshop gives a short introduction to the Snakemake workflow management system. It covers installation, running simple scripts via Snakemake, some of Snakemake's advanced features, and how users can apply Snakemake in their daily tasks.

Learning outcomes

How to install Snakemake and how to use it for designing simple bioinformatics pipelines.


Target audience

Students and researchers who want to create bioinformatics pipelines.

Pre-requisites

Equipment to bring

A laptop, Mac or Linux preferred (Windows users need the Windows Subsystem for Linux and admin access)

Instructor(s):
Sinan U. Umu (main)
Lex Nederbragt

Published Sep. 18, 2023 1:45 PM - Last modified Dec. 7, 2023 9:06 AM