Open Source Software in Life Science Research

Practical Solutions to Common Challenges in the Pharmaceutical Industry and Beyond

1st Edition - October 31, 2012

  • Editors: Lee Harland, Mark Forster
  • eBook ISBN: 9781908818249
  • Hardcover ISBN: 9781907568978

The free/open source approach has grown from a minor activity into a significant producer of robust, task-orientated software for a wide variety of situations and applications. To life science informatics groups, these systems present an appealing proposition: high-quality software at a very attractive price. Open Source Software in Life Science Research considers how industry and applied research groups have embraced these resources, discussing practical implementations that address real-world business problems.

The book is divided into five parts. Part one looks at laboratory data management and chemical informatics, covering software such as Bioclipse, OpenTox, ImageJ and KNIME. In part two, the focus turns to genomics and bioinformatics tools, with chapters examining GenomicTools and EBI Atlas software, as well as the practicalities of setting up an ‘omics’ platform and managing large volumes of data. Chapters in part three examine information and knowledge management, covering a range of topics including software for web-based collaboration, open source search and visualisation technologies for scientific business applications, and specific software such as Design Tracker and Utopia Documents. Part four looks at semantic technologies such as Semantic MediaWiki, TripleMap and Chem2Bio2RDF, before part five examines clinical analytics as well as the validation and regulatory compliance of free/open source software. Finally, the book concludes by looking at future perspectives and the economics of free/open source software in industry.

Key Features

  • Discusses a broad range of applications from a variety of sectors
  • Provides a unique perspective on work normally performed behind closed doors
  • Highlights the criteria used to compare and assess different approaches to solving problems


Information and informatics professionals across the life sciences; non-IT professionals with a remit covering business practices and new technologies

Table of Contents

  • Dedication

    List of figures and tables


    About the editors

    About the contributors


    Chapter 1: Building research data handling systems with open source tools


    1.1 Introduction

    1.2 Legacy

    1.3 Ambition

    1.4 Path chosen

    1.5 The ‘ilities

    1.6 Overall vision

    1.7 Lessons learned

    1.8 Implementation

    1.9 Who uses LSP today?

    1.10 Organisation

    1.11 Future aspirations

    Chapter 2: Interactive predictive toxicology with Bioclipse and OpenTox


    2.1 Introduction

    2.2 Basic Bioclipse-OpenTox interaction examples

    2.3 Use Case 1: Removing toxicity without interfering with pharmacology

    2.4 Use Case 2: Toxicity prediction on compound collections

    2.5 Discussion

    2.6 Availability

    Chapter 3: Utilizing open source software to facilitate communication of chemistry at RSC


    3.1 Introduction

    3.2 Project Prospect and open ontologies

    3.3 ChemSpider

    3.4 ChemDraw Digester

    3.5 Learn Chemistry Wiki

    3.6 Conclusion

    3.7 Acknowledgments

    Chapter 4: Open source software for mass spectrometry and metabolomics


    4.1 Introduction

    4.2 A short mass spectrometry primer

    4.3 Metabolomics and metabonomics

    4.4 Data types

    4.5 Metabolomics data processing

    4.6 Metabolomics data processing using the open source workflow engine, KNIME

    4.7 Open source software for multivariate analysis

    4.8 Performing PCA on metabolomics data in R/KNIME

    4.9 Other open source packages

    4.10 Perspective

    4.11 Acknowledgments

    Chapter 5: Open source software for image processing and analysis: picture this with ImageJ


    5.1 Introduction

    5.2 ImageJ

    5.3 ImageJ macros: an overview

    5.4 Graphical user interface

    5.5 Industrial applications of image analysis

    5.6 Summary

    Chapter 6: Integrated data analysis with KNIME


    6.1 The KNIME platform

    6.2 The KNIME success story

    6.3 Benefits of 'professional open source'

    6.4 Application examples

    6.5 Conclusion and outlook

    6.6 Acknowledgments

    Chapter 7: Investigation-Study-Assay, a toolkit for standardizing data capture and sharing


    7.1 The growing need for content curation in industry

    7.2 The BioSharing initiative: cooperating standards needed

    7.3 The ISA framework – principles for progress

    7.4 Lessons learned

    7.5 Acknowledgments

    Chapter 8: GenomicTools: an open source platform for developing high-throughput analytics in genomics


    8.1 Introduction

    8.2 Data types

    8.3 Tools overview

    8.4 C++ API for developers

    8.5 Case study: a simple ChIP-seq pipeline

    8.6 Performance

    8.7 Conclusion

    8.8 Resources

    Chapter 9: Creating an in-house ’omics data portal using EBI Atlas software


    9.1 Introduction

    9.2 Leveraging ’omics data for drug discovery

    9.3 The EBI Atlas software

    9.4 Deploying Atlas in the enterprise

    9.5 Conclusion and learnings

    9.6 Acknowledgments

    Chapter 10: Setting up an ’omics platform in a small biotech


    10.1 Introduction

    10.2 General changes over time

    10.3 The hardware solution

    10.4 Maintenance of the system

    10.5 Backups

    10.6 Keeping up-to-date

    10.7 Disaster recovery

    10.8 Personnel skill sets

    10.9 Conclusion

    10.10 Acknowledgements

    Chapter 11: Squeezing big data into a small organisation


    11.1 Introduction

    11.2 Our service and its goals

    11.3 Manage the data: relieving the burden of data-handling

    11.4 Organising the data

    11.5 Standardising to your requirements

    11.6 Analysing the data: helping users work with their own data

    11.7 Helping biologists to stick to the rules

    11.8 Running programs

    11.9 Helping the user to understand the details

    11.10 Summary

    Chapter 12: Design Tracker: an easy-to-use and flexible hypothesis tracking system to aid project team working


    12.1 Overview

    12.2 Methods

    12.3 Technical overview

    12.4 Infrastructure

    12.5 Review

    12.6 Acknowledgements

    Chapter 13: Free and open source software for web-based collaboration


    13.1 Introduction

    13.2 Application of the FLOSS assessment framework

    13.3 Conclusion

    13.4 Acknowledgements

    Chapter 14: Developing scientific business applications using open source search and visualisation technologies


    14.1 A changing attitude

    14.2 The need to make sense of large amounts of data

    14.3 Open source search technologies

    14.4 Creating the foundation layer

    14.5 Visualisation technologies

    14.6 Prefuse visualisation toolkit

    14.7 Business applications

    14.8 Other applications

    14.9 Challenges and future developments

    14.10 Reflections

    14.11 Thanks and Acknowledgements

    Chapter 15: Utopia Documents: transforming how industrial scientists interact with the scientific literature


    15.1 Utopia Documents in industry

    15.2 Enabling collaboration

    15.3 Sharing, while playing by the rules

    15.4 History and future of Utopia Documents

    Chapter 16: Semantic MediaWiki in applied life science and industry: building an Enterprise Encyclopaedia


    16.1 Introduction

    16.2 Wiki-based Enterprise Encyclopaedia

    16.3 Semantic MediaWiki

    16.4 Conclusion and future directions

    16.5 Acknowledgements

    Chapter 17: Building disease and target knowledge with Semantic MediaWiki


    17.1 The Targetpedia

    17.2 The Disease Knowledge Workbench (DKWB)

    17.3 Conclusion

    17.4 Acknowledgements

    Chapter 18: Chem2Bio2RDF: a semantic resource for systems chemical biology and drug discovery


    18.1 The need for integrated, semantic resources in drug discovery

    18.2 The Semantic Web in drug discovery

    18.3 Implementation challenges

    18.4 Chem2Bio2RDF architecture

    18.5 Tools and methodologies that use Chem2Bio2RDF

    18.6 Conclusions

    Chapter 19: TripleMap: a web-based semantic knowledge discovery and collaboration application for biomedical research


    19.1 The challenge of Big Data

    19.2 Semantic technologies

    19.3 Semantic technologies overview

    19.4 The design and features of TripleMap

    19.5 TripleMap Generated Entity Master ('GEM') semantic data core

    19.6 TripleMap semantic search interface

    19.7 TripleMap collaborative, dynamic knowledge maps

    19.8 Comparison and integration with third-party systems

    19.9 Conclusions

    Chapter 20: Extreme scale clinical analytics with open source software


    20.1 Introduction

    20.2 Interoperability

    20.3 Mirth

    20.4 Mule ESB

    20.5 Unified Medical Language System (UMLS)

    20.6 Open source databases

    20.7 Analytics

    20.8 Final architectural overview

    Chapter 21: Validation and regulatory compliance of free/open source software


    21.1 Introduction

    21.2 The need to validate open source applications

    21.3 Who should validate open source software?

    21.4 Validation planning

    21.5 Risk management and open source software

    21.6 Key validation activities

    21.7 Ongoing validation and compliance

    21.8 Conclusions

    Chapter 22: The economics of free/open source software in industry


    22.1 Introduction

    22.2 Background

    22.3 Open source innovation

    22.4 Open source software in the pharmaceutical industry

    22.5 Open source as a catalyst for pre-competitive collaboration in the pharmaceutical industry

    22.6 The Pistoia Alliance Sequence Services Project

    22.7 Conclusion


Product details

  • No. of pages: 582
  • Language: English
  • Copyright: © Woodhead Publishing 2012
  • Imprint: Woodhead Publishing

About the Editors

Lee Harland

Lee Harland currently leads the information engineering group at Pfizer, a group tasked with developing cutting-edge software that helps scientists use internal and external information more effectively. He is also a leading member of the Pistoia Alliance, the pharmaceutical industry's pre-competitive group, and has 13 years' experience in bioinformatics, software development and information science within major pharma.

Affiliations and Expertise


Mark Forster

Mark Forster is currently a senior information domain specialist within the Syngenta R&D Information Systems (RDIS) group, supporting R&D scientists in the fields of small molecule discovery and development, plant breeding and biotechnology. He has 15 years of industrial experience in scientific software development, deployment and support in the US and the UK.

Affiliations and Expertise

Syngenta, UK