‘Data in Brief’ articles make reproducibility a reality
The open access journal Genomics Data helps researchers make the most of their data
By Paige Shaklee, PhD. Posted on 5 March 2014
Update: Two new apps are now available for Genomics Data. Read about them at the end of this article.
In genomics, big data really is big. Genomic datasets quickly consume terabytes of storage and hold more information than we can fully analyze or understand. As a result, scientists often publish an article about only a small part or a particular aspect of their data, leaving the rest ripe for interpretation by others. In the words of Richard Young, group leader at the Whitehead Institute for Biomedical Research at MIT, "Information isn't the bottleneck for discovery, interpretation is."
Although this precious genomic data is uploaded to public repositories, sadly, few people dare to touch it because it is too complicated to understand. Data files are often mislabeled; data may be raw or already analyzed, and the analysis varies widely from dataset to dataset; experimental subtleties go unmentioned; and the software code used to filter the data is not available.
The lack of reproducibility has far-reaching consequences.
In a recent Nature news article, Dr. Francis Collins, Director of the National Institutes of Health (NIH), and Lawrence Tabak, Deputy Director of the NIH, point to "a troubling frequency of published reports that claim a significant result, but fail to be reproducible."
Dr. John Greally, Professor at Albert Einstein College of Medicine in New York and Editor of the Genome Analysis and Tools Section of the journal Genomics, gave one example from his work. "It is impossible to truly review genomic research grant proposals," he said, because there are so many variables in how genomic datasets are acquired, presented and analyzed. Research articles are essential for advancing science, but they seem to fall short of describing data well enough to make it reproducible.
That's where Genomics Data, one of Elsevier's new open access journals, comes in. It provides an avenue for researchers to bring their data – along with the details necessary to understand and reuse the data – to the forefront.
The journal's signature "Data in Brief" articles describe publicly available genomic datasets thoroughly so the data can be easily found, reproduced, reused and reanalyzed. Data in Brief are intended to supplement a research article, describing all of the nitty-gritty details that are essential to understanding the data.
Genomics Data Editor Dr. Jessica Mar, Assistant Professor in Albert Einstein's Department of Systems & Computational Biology, pointed out that the journal will fill an important need in her work. Currently, she said, essential "clinical variables such as blood pressure and blood sugar" are missing from the publicly available genomic datasets from diabetes patients that she examines, yet these variables are key to comparing different datasets and gathering new insights from the data.
Now, these kinds of accompanying details must be documented in a Specifications table at the top of each Data in Brief. The journal's Editorial Board also checks that any related software or programming code is submitted alongside the Data in Brief.
Because Data in Brief articles are reviewed by the Editorial Board, authors receive a decision quickly: on average, within one week. As a result, researchers can get the word out about their datasets quickly, driving more traffic to their data and to any research article that discusses interpretations of that data.
We hope Data in Brief will be a big step forward in making genomic data sharing and reproduction a reality.
Data in Brief — how it works
Two essential components of genomic research are:
- The data, available in a public repository: supports your research article but is not published or copyrighted as a part of that research article.
- The Research Article: an interpretation of the data.
The Data in Brief articles support these elements by providing a thorough description of the data, including quality-control checks and base-level analysis.
Data in Brief articles:
- Thoroughly describe data, facilitating reproducibility.
- Make deposited genomic data easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
The first published Data in Brief articles are freely available in Genomics Data on ScienceDirect.
Submit your own Data in Brief
To submit a Data in Brief article to Genomics Data:
1. Fill in this template.
Update: Two new apps for Genomics Data
The Interactive Phylogenetic Tree Viewer lets readers interactively explore the phylogenetic data submitted by authors. Using the viewer, readers can zoom in and out of tree areas, change the tree layout, search the tree, view bootstrap values, export tree data, and collapse and expand tree nodes. Authors must upload their tree data in Newick or NeXML format with their articles.
The Gene Expression Omnibus (GEO) app identifies all GEO accession numbers mentioned in the online article, displays a summary of each data entry next to the article, and links to the complete GEO record. Authors provide the unique GEO accession numbers.
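The article does not say how the GEO app locates accession numbers in an article. As a rough illustration only, the Python sketch below assumes simple pattern matching on GEO's four accession prefixes (GSE for Series, GSM for Samples, GPL for Platforms, GDS for DataSets), drops duplicates, and builds a link to each record on NCBI's public GEO accession page. The function names `find_geo_accessions` and `describe_accession` are hypothetical, and the accession numbers in the example text are placeholders.

```python
import re

# Hypothetical sketch: the article does not describe the GEO app's matching logic.
# We assume plain pattern matching on the four GEO accession prefixes.
GEO_RECORD_TYPES = {
    "GSE": "Series",
    "GSM": "Sample",
    "GPL": "Platform",
    "GDS": "DataSet",
}

# Word boundaries keep strings such as "XGSE1" or "GSE12a" from matching.
GEO_ACCESSION = re.compile(r"\b(?:GSE|GSM|GPL|GDS)\d+\b")


def find_geo_accessions(text):
    """Return unique GEO accessions in the order they first appear in text."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(m.group(0) for m in GEO_ACCESSION.finditer(text)))


def describe_accession(accession):
    """Pair an accession with its record type and a link to the full GEO record."""
    record_type = GEO_RECORD_TYPES[accession[:3]]
    # Assumed link format: NCBI's public GEO accession display page.
    url = f"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={accession}"
    return accession, record_type, url


if __name__ == "__main__":
    # Placeholder accession numbers, for illustration only.
    article_text = (
        "Reads are deposited in GEO under GSE12345 (samples GSM100 and GSM101, "
        "platform GPL570); see GSE12345 for the full series."
    )
    for accession in find_geo_accessions(article_text):
        print(describe_accession(accession))
```

Each accession is reported once, in reading order, which mirrors the app's behavior of showing one summary per data entry; fetching and displaying the actual GEO summary is left out of this sketch.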
Two articles in Genomics Data feature the GEO app:
Elsevier Connect Contributor
Dr. Paige Shaklee made her way from studying physics at Colorado School of Mines to nanoscience at TU Delft to biophysics at Leiden University, where she received her PhD. After doing postdoctoral research in Biochemistry at Stanford University, she joined Cell Press in 2011 as the Editor of Trends in Biotechnology. Last year, she joined Elsevier's biochemistry publishing team as a Publisher for the Genomics portfolio. She is based in Cambridge, Massachusetts.