Description

The course provides an introduction to machine learning (ML) methods and workflows for life science research. It covers the full end-to-end ML workflow, from data preprocessing and feature engineering to model training, evaluation, interpretation, and reproducible reporting, with a focus on the analysis of complex, high-dimensional biological data. Participants explore biological datasets using unsupervised methods such as dimensionality reduction and clustering, and build predictive models using supervised approaches including linear and tree-based models. Methods for multi-omics integration, including partial least squares (PLS), are introduced, along with regularization techniques for high-dimensional data such as ridge, lasso, and elastic net.

Content

  • Overview of the machine learning workflow
  • Dimensionality reduction methods such as PCA and UMAP
  • Unsupervised learning and clustering methods
  • Supervised learning models, including tree-based models
  • Partial least squares (PLS) for multi-omics integration
  • Regularization methods
  • Model training, evaluation and validation strategies
  • Model interpretation and explainable machine learning methods

Upcoming Training Instances

No upcoming training instances.

Details

Language
English
Licence
Creative Commons Attribution Non Commercial 4.0 International
Affiliations
SciLifeLab
Last Updated
June 08, 2026 07:50

Learning Outcomes

  • Explain the main components of the machine learning workflow and their role in life science research.
  • Perform data preprocessing and exploratory analysis of high-dimensional biological datasets.
  • Apply unsupervised learning methods to discover structure and generate biological hypotheses.
  • Train, evaluate, and compare supervised learning models commonly used in life sciences.
  • Assess model performance using appropriate evaluation metrics and validation strategies.
  • Apply regularization techniques to improve model generalization in high-dimensional settings.
  • Interpret and communicate model results using explainable machine learning techniques.
  • Apply basic principles of reproducible and FAIR machine learning workflows.
  • Deploy and share machine learning models using accessible tools.
  • Collaborate in interdisciplinary teams to design, implement, and present an ML-based data analysis.

Structure & Duration

One intensive week in Uppsala, from 09:00 to 17:00 each day, combining lectures, hands-on exercises, live coding sessions, and group discussions. The course also includes three 3-hour online sessions: two held before the on-site week and one follow-up session afterwards.

Prerequisites & Technical Requirements

Prior Knowledge
  • Basic programming skills in R or Python, including working with data frames and running scripts
  • Prior exposure to basic statistical concepts (e.g. descriptive statistics, linear regression)
  • Familiarity with data analysis environments such as RStudio or Jupyter Notebooks
Technical Requirements

Applicants are expected to bring their own laptop: a reasonably modern machine running Linux/Unix, macOS, or Windows, with an internet connection.

Audience & Keywords

Target Audience
PhD students, postdocs, staff scientists
Keywords
Machine Learning, Dimension reduction, Unsupervised learning, Supervised learning, Model validation, Regularization, Model interpretation, ML framework

Course Team

Authors
  • Payam Emami
  • Olga Dethlefsen
  • Eva Freyhult
