Open Online Training Curriculum

In 2014 we received funding from the NIH BD2K initiative to develop MOOCs for biomedical data science. The courses are divided into two series: Data Analysis for the Life Sciences and Genomics Data Analysis.

Most of the teaching material is available as an open online book as well as on our GitHub repo.

Data Analysis for the Life Sciences XSeries

The series is composed of four courses. You can follow the links to enroll:

  1. Statistics and R for the Life Sciences
  2. Introduction to Linear Models and Matrix Algebra
  3. Statistical Inference and Modeling for High-throughput Experiments
  4. High-Dimensional Data Analysis

While not required for the first class, some familiarity with R and RStudio will serve you well, so consider the online courses listed on the resources page.

For these courses we will be using this freely available book, which includes links to the online material.

Genomics Data Analysis XSeries

This series focuses on Bioconductor and how it is used to analyze high-throughput data, with an emphasis on next-generation sequencing. The series is composed of three courses. You can follow the links to enroll:

  1. Introduction to Bioconductor: Annotation and Analysis of Genomes and Genomic Assays
  2. High-performance Computing for Reproducible Genomics
  3. Case Studies in Functional Genomics

Featuring the stellar instructors [course #]:

  • [1,2] Vincent Carey of Harvard Medical School and Brigham and Women’s Hospital
  • [3] X. Shirley Liu and her lab at the Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health

Future courses

We are currently developing a course on Data Analysis with Python, which should be available this academic year.

Announcements will be made here and on Twitter: @rafalab

These classes were supported in part by NIH grant R25GM114818.