Reproducibility of Microarray and Gene Expression Analysis: User Stories

Daniel

Daniel is an undergraduate student in a biology program. His professor, Dr. Biggschott, has decided to teach Daniel about scientific reproducibility by having him reproduce figures from published work. For this assignment, Daniel needs to identify an important research area, learn the specific background knowledge necessary to understand the work in this area, and then attempt to reproduce a published figure by obtaining data used in the original publication. Dr. Biggschott is also highly interested in data visualization and has asked Daniel to explore other methods for presenting the results in the figures.

In the Reproducibility of Microarray and Gene Expression Analysis course, Daniel will learn the skills needed to reproduce Supporting Figure 6 from the paper, “Repeated observation of breast tumor subtypes in independent gene expression data sets,” by Sørlie et al. (2003). Daniel will learn background information on microarray analysis and hierarchical clustering, which are instrumental in the creation of the figure. He will also gain valuable insight into how to obtain and analyze data from published work, including how to represent the data in an alternative visualization.

After completing the course, Daniel will have a better understanding of the complexities involved in scientific reproducibility and the importance of detailed supplementary materials. He will also have a clearer set of guidelines for reproducing work in other scientific papers. Finally, he will have a better grasp of the principles of microarray analysis and hierarchical clustering, and of how data visualization techniques can be used to present scientific findings. These skills will help him impress Dr. Biggschott as he examines the reproducibility of other scientific publications.

Anu

Anu is a biomedical informatics graduate student and a teaching assistant at Georgetown University. He wants to organize and teach a short course on the reproduction of scientific findings, particularly the recreation of visual elements in scientific publications. In his lectures, Anu must ensure that his students have the background knowledge to understand the visualizations they aim to reproduce, the ability to locate and reprocess published data, and the inquisitiveness to attempt new analyses or new visualization techniques. He identifies the Reproducibility of Microarray and Gene Expression Analysis course as a solid foundation for several lessons for his students, not all of whom have experience with biomedical informatics.

The online course focuses on Supporting Figure 6 from the paper, “Repeated observation of breast tumor subtypes in independent gene expression data sets,” by Sørlie et al. (2003), which shows a heat map of gene expression data. Anu likes that the course explains the basic principles of microarray analysis and hierarchical clustering, which are important for understanding the figure. He intends to use the material as a case study that shows his students the importance of reproducibility while teaching them how to critically examine published work. Although his students are not very familiar with the R language, Anu will be able to walk through the code in the lessons and show them how certain analysis steps can be recreated or how new visualizations can be made. Overall, Anu thinks that the course will give his students a better understanding of the scientific process and of how to obtain and analyze data from published studies.