Editor’s note: In his recent talk at the Bio-IT World Conference (Bio-IT 18), Zen and the Art of Data Science Maintenance, Elsevier’s Dr. Jabe Wilson used examples of data science undertaken at Elsevier to illustrate why data science is an art – and how best to support data scientists based on this insight. You can see his slide presentation at the end of this story.
The idea of programming as an art – and by extension data science – is not new; it can be traced right back to the founding of the discipline of computer science. It is mentioned in the 1959 statement of purposes at the founding of the Communications of the ACM, and put forward as a positive characteristic in a famous article from 1974:
We have seen that computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty.
More recently, similar realizations have been made for the field of data science, such as the recent article in Forbes, which refers to data science as an art.
In my talk at Bio-IT, I shared some lessons from the experience of data science at Elsevier: specifically, why it is an art and how to support it. The important thing is that we need to support programming and data science as it is experienced rather than with an idealized view of the practice.
That data science is an art seems obvious to me due to my experiences growing up in a household of artists and programming from an early age. The art-based approach is where you understand a problem more fully over time by progressively working on alternative solutions, which lead to greater understanding of the problem itself. Watching my parents work on a painting over several weeks and seeing the canvas change as they worked towards their vision for the work has many parallels to trying out lines of code and testing different algorithms.
The image for this article is my dad’s artist studio from the 1970s, and I think it illustrates the creative workplace nicely.
I studied artificial intelligence (AI) at the multidisciplinary Centre for Cognitive Sciences at the University of Sussex and taught interaction design at the Royal College of Art, so I can see the strong association between the creative arts, programming and data science.
Data science can be described as an art because it requires an exploratory workflow (similar ideas about artist-design and engineering-design as applied to software design were expressed by my colleague Gillian Crampton-Smith at the Royal College of Art in the mid-1990s). There are a number of challenges you face as you work:
- First, you have to start by clearly defining a problem that may initially be ill-defined.
- Then you have to identify and work on preparing the relevant data (described as data curation and feature engineering).
- Next you have to choose the algorithmic approach you take.
- And finally, you might have to adjust these elements based on your experience of running your system.
This keeps you busy working, and to paraphrase Pablo Picasso, that is when inspiration can find you!
These themes of multidisciplinary teams working collaboratively and the creative approach to problem solving are present in the work I do at Elsevier and the teams I work with. This is also the approach being taken across Elsevier and RELX – recently voted one of the world’s most innovative companies by Forbes.
3 principles of supporting creative data science
Understanding and thereby supporting data science practice is a critical driver to the work we do in building new scientifically nuanced platforms. It also underpins our work on the Professional Services team as we provide consulting services to pharmaceutical companies and commercial R&D organizations.
You can summarize our approach in three principles:
- Good data: cleaned and curated to remove noise, including curation and feature engineering such as scaling or reducing dimensionality. We spend a great deal of effort on data curation.
- Right data: having enough of the relevant data for your hypothesis to build a predictive model. Problem description is important (defining the hypothesis or model you want to explore), as is the choice of algorithmic approach (Are you choosing Naïve Bayes, support vector machines or logistic regression?).
- In-time data: avoiding the opportunity cost of waiting hours between process steps. Create a platform that brings this together so you have the information at your fingertips when you need it.
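As a rough illustration of the "good data" principle, here is a minimal sketch of one feature-engineering step mentioned above: scaling. It standardizes each feature to zero mean and unit variance using only the Python standard library. The function names (`standardize`, `scale_features`) and the sample values are hypothetical, invented for this example; they are not taken from any Elsevier tool or dataset.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale one numeric feature to zero mean and unit variance (z-scores)."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        # A constant feature carries no signal; return zeros instead of dividing by zero.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def scale_features(rows):
    """Standardize every feature (column) in a list of equal-length rows."""
    columns = list(zip(*rows))  # transpose rows into columns
    scaled_columns = [standardize(col) for col in columns]
    return [list(row) for row in zip(*scaled_columns)]  # transpose back to rows


# Made-up sample: two features on very different scales (illustration only).
samples = [
    (180.2, 1.2),
    (151.2, 0.5),
    (206.3, 3.9),
    (194.2, -0.1),
]

scaled = scale_features(samples)
for row in scaled:
    print([round(value, 3) for value in row])
```

After scaling, both features contribute on a comparable scale, which tends to matter for distance- or margin-based methods such as support vector machines. In practice a library routine would usually replace this hand-rolled version.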
In my opinion, where you see examples of data science failing (like the well-publicized MD Anderson example), it comes down to failings across the range of these factors, or as was said recently in the Harvard Business Review: “if your data is bad, your machine learning tools are useless.”
Data science in practice: examples from Elsevier
At Elsevier, we use data science extensively in operations. Our Elsevier Labs team operates on the cutting edge exploring emerging techniques; and our Professional Services team brings all of this together to create bespoke solutions to customer problems.
Creative data science is delivered through a multidisciplinary workplace – at Elsevier we have domain experts, specialists in creating taxonomies and ontologies, and folks who focus on how best to apply the right algorithms that make up AI approaches.
This is our culture around data science and AI, as Dan Olley, CIO and EVP of Product Development at Elsevier, talked about recently in Forbes.
Here are some examples of the work we do:
- Rare disease treatment: Our team has used highly curated data to make predictions about which drugs can be repurposed to treat rare diseases.
- Translational safety: We have worked alongside Bayer to determine where animal models can be avoided when testing drug toxicity.
- Evidence selection: The Elsevier Labs team has successfully applied neural networks to identify complex evidence-based statements to choose the right data for building data science models.
- Real-world data interpretation: We are investigating how to bring together machine learning classification of text with taxonomies, and alongside images, to deliver learning across multimodal data sets. These approaches create opportunities to develop classification of data sources with unstructured text and unlabeled images from real-world data, such as patient records.
Our Professional Services team – and our users – have benefited from this vision of data science as an art form: starting with the right canvas in terms of good data, getting to the right model (using qualitative data and intuition), and putting the tools at your fingertips so inspiration can find you busily at work, whether you’re identifying candidate drugs for treating rare diseases or reducing the need to use animals in drug safety experiments.
Why this is important for all of us
AI and data science hold the keys to overcoming some of the challenges we face as a society. We face a productivity crisis both in research and commercial R&D, which is hitting economic growth with social implications and limiting our abilities to respond to health and environmental challenges. People know Elsevier for great content; what they will learn more about is the innovative culture that exists here in applying data science to advancing science and commercial R&D and thereby addressing the productivity crisis experienced in research and economics.
Elsevier is well placed to take a lead in data science to meet the global challenges we face in productivity. Because as I wrote recently in The Guardian:
It’s important to remember that while AI has great promise, it’s not simply a case of ‘plug and play.’ The use of AI in healthcare will necessitate purpose-built platforms that are not only technologically advanced but scientifically nuanced. Such platforms will require huge volumes of accurate, varied, multidisciplinary data, along with many years of training and algorithm-building by human ‘masters.’
At Elsevier, we’re constantly working to enhance productivity by developing tools to support research and commercial R&D. One of the important ways we do so is by supporting the data scientists who develop these tools.
Human creativity and AI
We need to augment our intelligence, and success is still down to human creativity
The hope is that we can achieve far more by combining human creativity with AI than we can through automation alone. But the very activity of creating AI systems through data science is creative in itself.
An important insight is that human researchers are more efficient using AI systems than working alone – and that together, they are more effective than advanced AI systems working without human guidance.
As James Bridle wrote in The Guardian recently:
While even a mid-level chess computer can today wipe the floor with most grandmasters, an average player paired with an average computer is capable of beating the most sophisticated supercomputer.
Therefore, we need both to support data scientists as they bring their creativity to developing AI systems and to enable people to work with the resulting systems, augmenting their own intelligence to solve problems.
Elsevier is doing important work to support data science. We are building tools that bring data together in ways that are accessible to AI. We have made significant, well-curated data sources available and are putting them in a semantic data knowledge hub that can be harvested for insights or as a source of features for machine learning to deliver predictive analytics. We are really excited about the opportunities this brings when our teams are able to work in a dynamic manner that allows them to explore diverse information spaces driven by their intuition and inspiration. We’re using our tools both to improve the quality of the data and to help you be sure you have the right data for the task at hand.
Watch this space for new announcements on our scientifically nuanced data science platforms.
We’re seeking data scientists
Teams throughout Elsevier use data scientists to develop and improve our products. If you like the sound of the work we are doing to address the productivity crisis through data science, come join us!
Elsevier’s Professional Services team
Dr. Jabe Wilson is on the Professional Services team. Here’s more about the team and the work they do.
Elsevier is more than an information provider; we’re a partner in the curation, normalization and integration of scientific and medical data. Composed of experts in life sciences research and informatics solutions, the Professional Services team enables customers to increase R&D productivity and return on information, and reduce the cost of IT support.
Drawing on Elsevier’s 140+ years’ experience in curating and classifying information and data, the Professional Services team partners with customers to design tailored solutions that address specific needs. You can work with Professional Services to:
- Customize data retrieval to pinpoint relevant answers in the vast amount of published scientific literature. (See our Text mining page.)
- Optimize data integration and normalization processes that facilitate comparison and analysis.
- Harmonize internal and third-party resources to create integrated, customized databases.