1 Introduction
Zoé Lacroix and Terence Critchlow
1.1 Overview
1.2 Problem and Scope
1.3 Biological Data Integration
1.4 Developing a Biological Data Integration System
1.4.1 Specifications
1.4.2 Translating Specifications into a Technical Approach
1.4.3 Development Process
1.4.4 Evaluation of the System
References
2 Challenges Faced in the Integration of Biological Information
Su Yun Chung and John C. Wooley
2.1 The Life Science Discovery Process
2.2 An Information Integration Environment for Life Science Discovery
2.3 The Nature of Biological Data
2.3.1 Diversity
2.3.2 Variability
2.4 Data Sources in Life Science
2.4.1 Biological Databases Are Autonomous
2.4.2 Biological Databases Are Heterogeneous in Data Formats
2.4.3 Biological Data Sources Are Dynamic
2.4.4 Computational Analysis Tools Require Specific Input/Output Formats and Broad Domain Knowledge
2.5 Challenges in Information Integration
2.5.1 Data Integration
2.5.2 Meta-Data Specification
2.5.3 Data Provenance and Data Accuracy
2.5.4 Ontology
2.5.5 Web Presentations
Conclusion
References
3 A Practitioner's Guide to Data Management and Data Integration in Bioinformatics
Barbara A. Eckman
3.1 Introduction
3.2 Data Management in Bioinformatics
3.2.1 Data Management Basics
3.2.2 Two Popular Data Management Strategies and Their Limitations
3.2.3 Traditional Database Management
3.3 Dimensions Describing the Space of
Life science data integration and interoperability is one of the most challenging problems facing bioinformatics today. In the current age of the life sciences, investigators have to interpret many types of information from a variety of sources: lab instruments, public databases, gene expression profiles, raw sequence traces, single nucleotide polymorphisms, chemical screening data, proteomic data, putative metabolic pathway models, and many others. Unfortunately, scientists are not currently able to easily identify and access this information because of the variety of semantics, interfaces, and data formats used by the underlying data sources.
Bioinformatics: Managing Scientific Data tackles this challenge head-on by discussing the current approaches and the variety of systems available to help bioinformaticians with this increasingly complex issue. The heart of the book lies in the collaborative efforts of eight distinct bioinformatics teams, each describing its own approach to data integration and interoperability. Each system receives its own chapter, in which the lead contributors provide valuable insight into the specific problems the system addresses, why its particular architecture was chosen, and the system's strengths and weaknesses. In closing, the editors provide important criteria for evaluating these systems that bioinformatics professionals will find useful.
- Provides a clear overview of the state of the art in data integration and interoperability in genomics, highlighting a variety of systems and giving insight into the strengths and weaknesses of their different approaches.
- Discusses shared vocabulary, design issues, complexity of use cases, and the difficulties of transferring existing data management approaches to bioinformatics systems, which serves to connect computer and life scientists.
- Written by the primary contributors of eight reputable bioinformatics systems in academia and industry including: BioKris, TAMBIS, K2, GeneExpress, P/FDM, MBM, SDSC, SRS, and DiscoveryLink.
Bioinformaticians involved in data management (development, design, management, etc.) at corporations and research companies. CS and life science students in bioinformatics programs.
- © Morgan Kaufmann 2004
- 18th July 2003
- Morgan Kaufmann
An exciting compilation that addresses the key issues in biological data management. -Sylvia Spengler, Lawrence Berkeley National Laboratory
Dr. Zoé Lacroix is currently a Research Assistant Professor at Arizona State University. She received a Ph.D. in Computer Science in 1996 from the University of Paris XI (France). Her research interests cover various aspects of data management. She has published over twenty journal articles, conference papers, and book chapters. She has also served on numerous conference program committees, organized several panels and workshops, and been an active member of the XML Query Language and XML Forms working groups at the World Wide Web Consortium (W3C). Dr. Lacroix has been involved in bioinformatics for over seven years. She has interacted with the Center of Bioinformatics at the University of Pennsylvania and worked for two biotech companies: Gene Logic Inc. and SurroMed Inc. Her contributions to bioinformatics include publications, invited talks (Symposium on Bioinformatics organized at the National University of Singapore), and data integration middleware such as the Object-Web Wrapper currently used at GlaxoSmithKline.
Arizona State University, USA
Dr. Terence Critchlow is a computer scientist in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, where he leads the DataFoundry project. His involvement in bioinformatics began over seven years ago as part of a collaboration between the University of Utah Computer Science department and the Utah Human Genome Center. Since completing his dissertation and joining LLNL in 1997, he has been an active member of the research community, publishing in both computer science and informatics forums, giving invited talks, serving on program committees, and organizing the XML Enabled Searches in Bioinformatics workshop.
Lawrence Livermore National Laboratory, Livermore, CA, USA