The Web has become the largest decentralised database the world has ever seen, but problems with reliability, accessibility and "socialisation of that data" stop it from being as universally useful as it could be.
That's the premise put forward by Alice Bonasio, Communications Manager for Mendeley, who goes on to describe a large-scale project to tackle this issue.
To set the stage, Bonasio cites Tim Berners-Lee, who envisioned the next generation of the World Wide Web as a "Giant Global Graph" with what we now call Big Data available in a context people could easily access and take advantage of. In other words, it would be social. Then she quotes Dr. Gerhard Weikum, Research Director at the Max Planck Institute for Informatics, who proposed bringing Big Data and Open Data together, creating Linked Open Data.
Their ideals, however, have yet to become reality. As Bonasio explains:
Major administrative authorities already publish their statistical data in a Linked Data aware format, but the actual value of these datasets is not unleashed or fully exploited, because data needs context to be of value, and "socialising" is what provides such context. One example of this is the Digital Agenda EU Portal, which has a huge number of datasets on important European indicators, but does not allow people to share their findings or to discuss its interpretations. This means that the context, which gives the data most of its meaning, is simply missing.
That is the problem a group of EU-funded researchers is taking on, together with industry partners that include Mendeley, the London-based company that created the Mendeley research collaboration platform and was acquired by Elsevier a year ago. The group launched an open beta version of 42-Data, which is the main output of the CODE project.
Their goal is essentially to create a 'flea market for research data' by combining crowdsourced workflows with offline statistical data. This would create a Linked Open Data cloud capable of generating customised datasets to back up and answer all manner of research questions.
Scientific articles are obviously the perfect fodder for this cloud database, but they come with a major problem attached: most papers are in PDF format, which means that it's difficult (not to say impossible) to extract the primary research data contained in tables and figures. The CODE project, which Mendeley participates in, addresses this by reverse-engineering the paper to extract this information in a format that can then be easily processed and analysed.
Quoting key players in the partnership, Bonasio goes on to explain how the project works and how it will enable researchers to find data and make sense of it much more easily.