Reproducibility of Microarray and Gene Expression Analysis: Setup

This page contains instructions and links for downloading the resources and tools that will be needed for the course. Step-by-step instructions for most of the software tools can be found on their respective websites, so we have not included that information here.

Data

The paper by Sørlie et al. referenced throughout this course is freely available online from the Proceedings of the National Academy of Sciences. Be sure to also view the supporting information under the “SI” tab, particularly Supporting Figure 6, Supporting Table 3, and the Supporting Materials and Methods.

As stated in the paper, additional data are hosted on the Stanford Genomics Breast Cancer Consortium portal. The most complete version of Supporting Figure 6 is also available through this portal. Three files used to create Supporting Figure 6 will be needed:

TreeView

The TreeView program to be used in Lesson 3 and Lesson 5 can be found here. Please download the current release (which was Alpha 3 from July 5, 2016 at the time of writing) and use the JAR file to run the program. Your computer will need to have Java 7 or higher.

Cluster

The Cluster program was originally developed by Michael Eisen at Stanford University. A more recent implementation, Cluster 3.0, has been made available by de Hoon et al. and will be used in Lesson 5. Please download the appropriate installer for your environment. On Windows, for example, download the installer and run the .exe file to install Cluster 3.0 under Programs. You can then run Cluster 3.0 from its installation directory or the Start menu.

R

Several lessons will use the R language to process data and create visualizations. Installation packages and instructions for R can be obtained from its website, where versions are available for Microsoft Windows, macOS, and Linux/UNIX platforms. Once installed, R can be run in an interactive session where commands are entered one at a time. Alternatively, R commands can be placed in a script file, and any portion of the file (up to and including the entire file) can be executed at once.
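As a minimal sketch of the script workflow described above, the following writes a two-line script to a temporary file and then runs the whole file with source(). The file contents and the names script_path, x, and x_mean are made up for illustration. In practice you would open one of the lesson scripts instead.

```r
# Create a small script file in a temporary location (hypothetical contents).
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)",
             "x_mean <- mean(x)"),
           script_path)

# Run every command in the file at once. By default, source() evaluates
# the file in the global environment, so x and x_mean are created there.
source(script_path)

# The objects created by the script can now be used like any others.
print(x_mean)
```

Typing the same two assignments at the R prompt one at a time would leave the session in the same state. The script file simply makes the steps repeatable.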

IDE

Managing multiple R script files is much easier with an integrated development environment (IDE) such as RStudio, but it is not required for these lessons.

Once the R environment is up and running, you will need to install several additional packages (HiveR, dendextend, dendextendRcpp, and NMF) for use in Lesson 4 and Lesson 6. These packages can be installed directly from R by running the following commands in an R session:

install.packages("HiveR")
install.packages("dendextend")
install.packages("dendextendRcpp")
install.packages("NMF")
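If you re-run the setup, it can be convenient to skip packages that are already installed. The following is one possible sketch of that idea, not a required step. The helper names missing_packages and install_if_missing are hypothetical and defined here only for illustration, while installed.packages() and install.packages() are standard R functions. The sketch assumes all four packages are obtainable from your configured repository.

```r
# Packages named in the install commands above.
required <- c("HiveR", "dendextend", "dendextendRcpp", "NMF")

# Hypothetical helper: return the entries of `pkgs` that are not found
# in any library on the current library path.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only what is missing and invisibly
# return the names of the packages it tried to install.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Usage (needs a network connection, so it is left commented out here):
# install_if_missing(required)
```

Calling missing_packages(required) on its own reports which of the four packages still need to be installed without changing anything on your system.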