
Open Source Software in Life Science Research

1st Edition

Practical Solutions to Common Challenges in the Pharmaceutical Industry and Beyond

Editors: Lee Harland, Mark Forster
eBook ISBN: 9781908818249
Hardcover ISBN: 9781907568978
Imprint: Woodhead Publishing
Published Date: 31st October 2012
Page Count: 582

Table of Contents


List of figures and tables


About the editors

About the contributors


Chapter 1: Building research data handling systems with open source tools


1.1 Introduction

1.2 Legacy

1.3 Ambition

1.4 Path chosen

1.5 The ‘ilities

1.6 Overall vision

1.7 Lessons learned

1.8 Implementation

1.9 Who uses LSP today?

1.10 Organisation

1.11 Future aspirations

Chapter 2: Interactive predictive toxicology with Bioclipse and OpenTox


2.1 Introduction

2.2 Basic Bioclipse-OpenTox interaction examples

2.3 Use Case 1: Removing toxicity without interfering with pharmacology

2.4 Use Case 2: Toxicity prediction on compound collections

2.5 Discussion

2.6 Availability

Chapter 3: Utilizing open source software to facilitate communication of chemistry at RSC


3.1 Introduction

3.2 Project Prospect and open ontologies

3.3 ChemSpider

3.4 ChemDraw Digester

3.5 Learn Chemistry Wiki

3.6 Conclusion

3.7 Acknowledgments

Chapter 4: Open source software for mass spectrometry and metabolomics


4.1 Introduction

4.2 A short mass spectrometry primer

4.3 Metabolomics and metabonomics

4.4 Data types

4.5 Metabolomics data processing

4.6 Metabolomics data processing using the open source workflow engine, KNIME

4.7 Open source software for multivariate analysis

4.8 Performing PCA on metabolomics data in R/KNIME

4.9 Other open source packages

4.10 Perspective

4.11 Acknowledgments

Chapter 5: Open source software for image processing and analysis: picture this with ImageJ


5.1 Introduction

5.2 ImageJ

5.3 ImageJ macros: an overview

5.4 Graphical user interface

5.5 Industrial applications of image analysis

5.6 Summary

Chapter 6: Integrated data analysis with KNIME


6.1 The KNIME platform

6.2 The KNIME success story

6.3 Benefits of 'professional open source'

6.4 Application examples

6.5 Conclusion and outlook

6.6 Acknowledgments

Chapter 7: Investigation-Study-Assay, a toolkit for standardizing data capture and sharing


7.1 The growing need for content curation in industry

7.2 The BioSharing initiative: cooperating standards needed

7.3 The ISA framework – principles for progress

7.4 Lessons learned

7.5 Acknowledgments

Chapter 8: GenomicTools: an open source platform for developing high-throughput analytics in genomics


8.1 Introduction

8.2 Data types

8.3 Tools overview

8.4 C++ API for developers

8.5 Case study: a simple ChIP-seq pipeline

8.6 Performance

8.7 Conclusion

8.8 Resources

Chapter 9: Creating an in-house ’omics data portal using EBI Atlas software


9.1 Introduction

9.2 Leveraging ’omics data for drug discovery

9.3 The EBI Atlas software

9.4 Deploying Atlas in the enterprise

9.5 Conclusion and learnings

9.6 Acknowledgments

Chapter 10: Setting up an ’omics platform in a small biotech


10.1 Introduction

10.2 General changes over time

10.3 The hardware solution

10.4 Maintenance of the system

10.5 Backups

10.6 Keeping up-to-date

10.7 Disaster recovery

10.8 Personnel skill sets

10.9 Conclusion

10.10 Acknowledgements

Chapter 11: Squeezing big data into a small organisation


11.1 Introduction

11.2 Our service and its goals

11.3 Manage the data: relieving the burden of data-handling

11.4 Organising the data

11.5 Standardising to your requirements

11.6 Analysing the data: helping users work with their own data

11.7 Helping biologists to stick to the rules

11.8 Running programs

11.9 Helping the user to understand the details

11.10 Summary

Chapter 12: Design Tracker: an easy to use and flexible hypothesis tracking system to aid project team working


12.1 Overview

12.2 Methods

12.3 Technical overview

12.4 Infrastructure

12.5 Review

12.6 Acknowledgements

Chapter 13: Free and open source software for web-based collaboration


13.1 Introduction

13.2 Application of the FLOSS assessment framework

13.3 Conclusion

13.4 Acknowledgements

Chapter 14: Developing scientific business applications using open source search and visualisation technologies


14.1 A changing attitude

14.2 The need to make sense of large amounts of data

14.3 Open source search technologies

14.4 Creating the foundation layer

14.5 Visualisation technologies

14.6 Prefuse visualisation toolkit

14.7 Business applications

14.8 Other applications

14.9 Challenges and future developments

14.10 Reflections

14.11 Thanks and Acknowledgements

Chapter 15: Utopia Documents: transforming how industrial scientists interact with the scientific literature


15.1 Utopia Documents in industry

15.2 Enabling collaboration

15.3 Sharing, while playing by the rules

15.4 History and future of Utopia Documents

Chapter 16: Semantic MediaWiki in applied life science and industry: building an Enterprise Encyclopaedia


16.1 Introduction

16.2 Wiki-based Enterprise Encyclopaedia

16.3 Semantic MediaWiki

16.4 Conclusion and future directions

16.5 Acknowledgements

Chapter 17: Building disease and target knowledge with Semantic MediaWiki


17.1 The Targetpedia

17.2 The Disease Knowledge Workbench (DKWB)

17.3 Conclusion

17.4 Acknowledgements

Chapter 18: Chem2Bio2RDF: a semantic resource for systems chemical biology and drug discovery


18.1 The need for integrated, semantic resources in drug discovery

18.2 The Semantic Web in drug discovery

18.3 Implementation challenges

18.4 Chem2Bio2RDF architecture

18.5 Tools and methodologies that use Chem2Bio2RDF

18.6 Conclusions

Chapter 19: TripleMap: a web-based semantic knowledge discovery and collaboration application for biomedical research


19.1 The challenge of Big Data

19.2 Semantic technologies

19.3 Semantic technologies overview

19.4 The design and features of TripleMap

19.5 TripleMap Generated Entity Master ('GEM') semantic data core

19.6 TripleMap semantic search interface

19.7 TripleMap collaborative, dynamic knowledge maps

19.8 Comparison and integration with third-party systems

19.9 Conclusions

Chapter 20: Extreme scale clinical analytics with open source software


20.1 Introduction

20.2 Interoperability

20.3 Mirth

20.4 Mule ESB

20.5 Unified Medical Language System (UMLS)

20.6 Open source databases

20.7 Analytics

20.8 Final architectural overview

Chapter 21: Validation and regulatory compliance of free/open source software


21.1 Introduction

21.2 The need to validate open source applications

21.3 Who should validate open source software?

21.4 Validation planning

21.5 Risk management and open source software

21.6 Key validation activities

21.7 Ongoing validation and compliance

21.8 Conclusions

Chapter 22: The economics of free/open source software in industry


22.1 Introduction

22.2 Background

22.3 Open source innovation

22.4 Open source software in the pharmaceutical industry

22.5 Open source as a catalyst for pre-competitive collaboration in the pharmaceutical industry

22.6 The Pistoia Alliance Sequence Services Project

22.7 Conclusion



The free/open source approach has grown from a minor activity into a significant producer of robust, task-orientated software for a wide variety of situations and applications. To life science informatics groups, these systems present an appealing proposition: high-quality software at a very attractive price. Open Source Software in Life Science Research considers how industry and applied research groups have embraced these resources, discussing practical implementations that address real-world business problems.

The book is divided into five parts. Part one looks at laboratory data management and chemical informatics, covering software such as Bioclipse, OpenTox, ImageJ and KNIME. In part two, the focus turns to genomics and bioinformatics tools, with chapters examining GenomicTools and the EBI Atlas software, as well as the practicalities of setting up an ’omics platform and managing large volumes of data. Chapters in part three examine information and knowledge management, covering topics including software for web-based collaboration, open source search and visualisation technologies for scientific business applications, and specific software such as Design Tracker and Utopia Documents. Part four looks at semantic technologies such as Semantic MediaWiki, TripleMap and Chem2Bio2RDF, before part five examines clinical analytics and the validation and regulatory compliance of free/open source software. The book concludes by looking at future perspectives and the economics of free/open source software in industry.

Key Features

  • Discusses a broad range of applications from a variety of sectors
  • Provides a unique perspective on work normally performed behind closed doors
  • Highlights the criteria used to compare and assess different approaches to solving problems


Readership

Information and informatics professionals across the life sciences; non-IT professionals with a remit concerning business practices and new technologies


© Woodhead Publishing 2012


About the Editors

Lee Harland

Lee Harland currently leads the information engineering group at Pfizer, a group tasked with developing cutting-edge software that helps scientists use internal and external information more effectively. He is also a leading member of the Pistoia Alliance, a pre-competitive group within the pharmaceutical industry, and has 13 years’ experience in bioinformatics, software development and information science within major pharmaceutical companies.



Mark Forster

Mark Forster is currently a senior information domain specialist within the Syngenta R&D Information Systems (RDIS) group, supporting R&D scientists in the fields of small molecule discovery and development, plant breeding and biotechnology. He has 15 years of industrial experience in scientific software development, deployment and support in the US and the UK.

Affiliations and Expertise

Syngenta, UK