Machine Learning for Life Sciences
Description
The course introduces machine learning (ML) methods and workflows for life science research. It covers the full end-to-end ML workflow, from data preprocessing and feature engineering to model training, evaluation, interpretation, and reproducible reporting, with a focus on the analysis of complex, high-dimensional biological data. Participants explore biological datasets using unsupervised methods such as dimensionality reduction and clustering, and build predictive models using supervised approaches, including linear and tree-based models. The course also covers methods for multi-omics integration, including partial least squares (PLS), and regularization techniques for high-dimensional data, such as ridge, lasso, and elastic net.
Content
- Overview of the machine learning workflow
- Dimensionality reduction methods such as PCA and UMAP
- Unsupervised learning and clustering methods
- Supervised learning models, including tree-based models
- Partial least squares (PLS) for multi-omics integration
- Regularization methods such as ridge, lasso, and elastic net
- Model training, evaluation, and validation strategies
- Model interpretation and explainable machine learning methods
Learning Outcomes
- Explain the main components of the machine learning workflow and their role in life science research.
- Perform data preprocessing and exploratory analysis of high-dimensional biological datasets.
- Apply unsupervised learning methods to discover structure and generate biological hypotheses.
- Train, evaluate, and compare supervised learning models commonly used in life sciences.
- Assess model performance using appropriate evaluation metrics and validation strategies.
- Apply regularization techniques to improve model generalization in high-dimensional settings.
- Interpret and communicate model results using explainable machine learning techniques.
- Apply basic principles of reproducible and FAIR machine learning workflows.
- Deploy and share machine learning models using accessible tools.
- Collaborate in interdisciplinary teams to design, implement, and present an ML-based data analysis.
Prerequisites & Technical Requirements
Prerequisites
- Basic programming skills in R or Python, including working with data frames and running scripts
- Prior exposure to basic statistical concepts (e.g. descriptive statistics, linear regression)
- Familiarity with data analysis environments such as RStudio or Jupyter Notebooks
Technical requirements
Participants are expected to bring their own laptop: a reasonably modern machine running Linux/Unix, macOS, or Windows, with an internet connection.