Reproducibility of Microarray and Gene Expression Analysis

Recreating Supporting Figure 6 with TreeView

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • Can Supporting Figure 6 be reproduced with the same visualization tool?

Objectives
  • Understand the key functionalities of TreeView 3.0.

  • Learn how to import data into TreeView 3.0 for heat map and dendrogram visualization.

  • Interpret images of gene expression as described in the previous lessons.

Data sharing for reproducibility is an important aspect of any scientific field. It can often be productive and informative to examine the materials and method descriptions provided with a manuscript to see if certain analyses can be reproduced. In this lesson, we will explore the recreation of Supporting Figure 6 using the final data files provided by Sørlie et al. and the description of their analysis provided in the publication. We will attempt to use the same visualization tool and the same data to exactly recreate the figure.

TreeView

From the Supporting Materials and Methods section of the paper, we learn that “the cluster diagrams were visualized using TreeView.” A link for this tool is provided in the text, but it is now outdated. A further search for TreeView identifies a Java implementation last modified on January 1, 2014. The Java TreeView webpage in turn points to the TreeView 3.0 project, a more recent effort.

The TreeView 3.0 tool (Alpha 3 version released on July 5, 2016) is based on the original implementation by Michael Eisen (the TreeView version mentioned in the paper) and the later one by Alok Saldanha (as implemented in the Java TreeView tool). TreeView 3.0 is described as an open-source Java app for visualizing large data matrices. It can load a dataset, cluster it, browse it, customize its appearance, and export it (or parts of it) into a figure.

Generating the Figure

To recreate Supporting Figure 6, we will need the TreeView 3.0 program and the three original data files (SupplFigure6.cdt, SupplFigure6.atr, and SupplFigure6.gtr). These files contain post-clustering data, so only visualization is necessary (a detailed discussion of the files is presented in Lesson 5). Instructions for obtaining the data files and the TreeView 3.0 program are available on the Setup page. We recommend creating a new folder (e.g., “Cluster_Results”) and placing all three data files in it. Within TreeView 3.0, we import the complete data file (SupplFigure6.cdt) and are then able to view the data as shown below.

The data import window in TreeView 3.0

TreeView will then generate a version of the heat map and dendrograms for genes and arrays, provided the extra data tree files (SupplFigure6.gtr and SupplFigure6.atr) are in the same folder as the complete data table file. Without these files, the dendrograms would not be visualized.
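Before importing, it can be handy to confirm that both tree files sit next to the data table. The following is a minimal Python sketch of such a check, not part of TreeView or the original lesson materials. The helper name `missing_tree_files` is hypothetical, and the sketch assumes the tree files share the base name of the .cdt file (as the three SupplFigure6 files do) and that the data live in the suggested “Cluster_Results” folder.

```python
from pathlib import Path


def missing_tree_files(cdt_path):
    """Return the names of companion tree files (.gtr, .atr) not found next to the .cdt file."""
    cdt = Path(cdt_path)
    # Assumption: tree files share the .cdt file's base name and folder.
    return [
        cdt.with_suffix(ext).name
        for ext in (".gtr", ".atr")
        if not cdt.with_suffix(ext).is_file()
    ]


if __name__ == "__main__":
    # Hypothetical location, following the folder name suggested in this lesson.
    missing = missing_tree_files("Cluster_Results/SupplFigure6.cdt")
    if missing:
        print("Missing tree files:", ", ".join(missing))
    else:
        print("Both tree files are present.")
```

If the script reports a missing file, copy it into the folder before opening the .cdt file so that the corresponding dendrogram can be drawn.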

The default heat map generated in TreeView 3.0 after loading the original data

This version of the heat map and the dendrograms represents the same data as Supporting Figure 6 in the paper, although the color-coding and image layout are slightly different. As evidence, compare the genes and arrays highlighted in figures A and B below.

A side-by-side comparison of a subset of the heat map produced by TreeView 3.0 using the original data and the published Supporting Figure 6

The same arrays and genes, as well as the same classification, are present in both images. Apart from a few cosmetic differences, the basic elements are the same, and we can reasonably conclude that Supporting Figure 6 can be reproduced from the full dataset distributed with the paper.

Examination Questions

1

Examine the heat map generated by TreeView 3.0 and identify the region with the highest gene expression values (in bright red). Select this region, zoom into the selected area, and name the genes that are highly expressed in the selected arrays. Then zoom out and repeat the same exercise for a region with the lowest gene expression values (in bright green).

2

Experiment with the export functionality in TreeView (File > Export) by exporting both the full visualization and a selected region to at least two formats. Discuss potential limitations in the process and the completeness of the output.

3

Remove the SupplFigure6.gtr and SupplFigure6.atr files from the folder containing the SupplFigure6.cdt file. Open the SupplFigure6.cdt file in a new instance of TreeView 3.0 and comment on the generated visualization.

Key Points