This Fall 2018 THRIV Scholars Biomedical Data Science Training Program is a seven-part series of courses introducing the essentials of biomedical data science using R, directed toward junior faculty seeking a clinical and translational research career. This class introduces methods, tools, and software for reproducibly managing, manipulating, analyzing, and visualizing large-scale biomedical data using the R statistical computing environment.

Please see the THRIV syllabus for more information.

Click here to download the entire course archive as a zip file. Once downloaded and extracted, double-click the index.html file in the extracted folder to view the course material offline.

Course Material

FAQ

What’s this series all about?

This class introduces methods, tools, and software for reproducibly managing, manipulating, analyzing, and visualizing large-scale biomedical data. Specifically, the course introduces the R statistical computing environment and packages for manipulating and visualizing high-dimensional data, covers strategies for reproducible research, and culminates in predictive modeling and forecasting analyses of real public health data.

This is not a “Tool X” or “Software Y” class. I want you to take away from this series the ability to use an extremely powerful scientific computing environment (R) to do many of the things that you’ll do across study designs and disciplines – managing, manipulating, visualizing, and analyzing large, sometimes high-dimensional data. Whether that data is gene expression data from yeast, microbial genomics data from B. pertussis, public health data from Gapminder, RNA-seq data from humans, movie preference trends from Netflix, or truck routing data from FedEx, you’ll need the same computational know-how and data literacy to do the same kinds of basic tasks in each. I might show you how to use specific tools here and there (DESeq2 for RNA-seq analysis, ggtree for drawing phylogenetic trees, etc.), but these are not important – you probably won’t be using the same specific software or methods 10 years from now, but you’ll still use the same underlying data and computational foundation. That is the point of this series – to arm you with a basic foundation, and more importantly, to enable you to figure out how to use this tool or that tool on your own, when you need to.

This is not a statistics class. There is a short lesson on essential statistics using R, but this 3-hour lesson offers neither a comprehensive background on the underlying theory nor in-depth coverage of implementation strategies in R. Some general knowledge of statistics and study design is helpful but isn’t required for this course.

What are the prerequisites?

There are none!

However, click here for instructions on setting up your computer. Each class involves lots of hands-on coding practice, and you’ll need to download and install some free software prior to our first class. This may take up to an hour or so. Please do not hesitate to email me prior to the workshop if you are having difficulty.

Do I need a laptop?

YES. You must have access to a computer on which you can install software. The class will be a mix of lecture and discussion, but primarily live coding. You must bring your laptop to each session. Bring your charging cable as well. Please follow the setup instructions prior to the workshop.

Where do I get additional help?

Glad you asked! See here.


Attribution: Course material is inspired by and/or modified in part from Jenny Bryan’s Stat 545 course, Software Carpentry, Data Carpentry, David Robinson’s blog, Marian Schmidt’s MSU NGS Workshop, Vanderbilt Department of BioStatistics Datasets, the ggtree vignettes, Shirin Glander’s blog, and likely many others.