Data science is becoming increasingly important in academia and industry. University departments — from biology to economics — are turning to data science more and more, and businesses that face big-data problems are working in partnership with the academic institutions that are equipped to solve them.
However, many outlets have identified a talent gap in data science, with industry and academia facing challenges in recruiting data scientists who have the skills to address their needs.
What I’m seeing is tremendous skills in the continuous deployment world not matched with skills in configuring data science devices. The one thing I try to tell students is that there are far more choices in data science technologies than what most people use.
Other panelists were Dr. Amy Apon, Professor and Chair of the Division of Computer Science in the School of Computing at Clemson University in South Carolina; Dr. Johann-Christoph Freytag, Full Professor for Databases and Information Systems (DBIS) at the Computer Science Department of the Humboldt-Universität zu Berlin; and Dr. John Preston, Interim Dean of the College of Computing and Software Engineering at Kennesaw State University in Georgia.
Dr. Freytag shared similar observations to Dr. Menzies and added:
My concern is people only knowing how to turn knobs and then thinking things fall into place as if by magic. We have to teach people the basics of how things really work. We need people who really understand the basics and understand this is not magic but also needs preparation, modeling and a lot of sophisticated steps before you can really come out with something that stands.
Dr. Apon, meanwhile, explained how her team has been helping industry bridge the skills gap. “A large number of industry partners will approach us with research problems that are hard problems that they want solved,” she said, and they want to recruit her students to solve them.
In response, faculty at Clemson have created a new kind of senior capstone class in which teams of students work on problems motivated by industry. Each problem has an industry champion who helps define the problem and then mentors the students. The approach allows students to learn not only the approaches and skills they need to graduate but also how to apply them to problems of interest to potential employers.
In Germany the situation is sometimes reversed, according to Dr. Freytag, with mixed results:
I have seen companies in Germany who request that people from academia give courses and are therefore trying to introduce these ideas more from an academic point of view. It’s a double-edged sword, meaning … the professors are very high up in the clouds and then the people who listen don’t really know how to apply that knowledge. We need lecturers … who understand both sides of the coin: the academic research technology and at the same time the way it is put into tools that can be used.
Dr. Preston and his team at Kennesaw State University approached the problem using data science itself to determine how to maximize the success of their students:
We use data and analytics to track where our students are and how they’re being successful. We’ve done an analysis in the past year that tracks students flowing through our degree programs and … the bottlenecks and where are they tripping up, and we’re changing our curriculum based on that; so not only can we help the … students learn but we can be efficient with the resources that we have. It’s completely transforming how we’re offering our programs in computing.
This annual event is an important part of HPCC Systems’ outreach efforts. According to Ann Gabriel, VP of Academic and Research Relations for Elsevier’s Global Strategic Networks, “HPCC Community Day creates welcome opportunities to highlight use cases and effective ways to leverage the technology and, most importantly, to network across a vibrant and innovative group of professionals and researchers.”
HPCC Systems itself is one of the most important tools for leveraging big-data collaborations.
It’s an open source big-data analytical system originally designed and built as part of the infrastructure of Elsevier’s sister company LexisNexis Risk Solutions but later released to the open source community. “As part of our outreach efforts,” said Dr. Flavio Villanustre, who leads HPCC Systems and is also VP of Technology for LexisNexis Risk Solutions, “we have numerous collaboration initiatives with a number of higher education institutions and researchers around the world.”
Retaining talent when students “disappear into industry”
With Gabriel moderating, the panel also discussed the difficulty of retaining student talent, how decreased governmental funding is affecting the field, and how the growth of the field affects those issues.
As a result of the talent gap, Gabriel pointed out, it can be difficult for universities to retain talented students when industry demand for their skills is so high.
“Most of my grad students disappear into industry,” said Dr. Menzies. “It’s either the greatest success or the greatest problem that I have.”
Dr. Apon faces a similar problem at Clemson. “CMU is a campus that resonates in particular, where they have Google and Uber setting up shops on campus which is helpful in one way but not so helpful in another because the talent is then not retained.”
While the demand for data science projects and skilled data scientists has continued to increase, funding for such projects from agencies such as the National Science Foundation has stagnated. The panelists seemed undaunted, however:
“The outlook appears flat for federal funding,” said Dr. Apon. “What that means is that because of the increased number of students who are coming to university that it’s increasingly important for the universities to be able to leverage our partners like HPCC Systems to help fund and support research.”
Dr. Apon continued, “we’ve had some really interesting projects where we’re looking at a data problem from a company, say an automobile manufacturer or an agricultural company, and then a technology provider like HPCC Systems, and those synergies can be really exciting.”
While agreeing, Dr. Freytag advised caution about industry partnerships. “You have to be very careful about finding the right partners,” he said. He pointed to Humboldt University’s initiative in partnership with Elsevier and LexisNexis, the Humboldt Elsevier Advanced Data & Text (HEADT) Centre. In such a partnership, Dr. Freytag explained, researchers receive their funding from industry and are able to confront real-world problems. “From my side,” he said, “these are the most fun projects because you have a lot of freedom, but on the other hand you have a receiving partner, and the partner is usually not a passive one.”
Effects of growth
The explosive growth of data science has created new opportunities for researchers, and it has also allowed them to make their approaches ever more interdisciplinary, as other university departments as well as industry increasingly require big-data solutions.
“We’re seeing a lot of interdisciplinary and applied courses and degree programs to meet the needs of data analytics,” said Dr. Apon. “We’ve created a new degree program in biomedical informatics and analytics that’s joint with the Medical University of South Carolina.”
Dr. Preston believes that for data science to have meaning, it must be interdisciplinary. “I’m keen to say that computing by itself is a box on a desk,” he said. “So how are we partnering across the university into the College of Business or the College of the Arts, the College of Humanities, but also outside as well, working with industry partners, solving their data problems in our labs?”
Dr. Freytag agreed. In Berlin, he explained, four major universities have teamed up to create the Einstein Center Digital Future, an inter-university digital research nucleus. The aim of the center is to foster innovative, cutting-edge interdisciplinary research and to provide outstanding training for talented young scholars. “It is really more or less like an incubator,” he said, “trying to get new talent, new professors, new thinkers outside of computer science. Because I believe, and I think that’s shared by many of my colleagues, data sciences is not something about computer science alone; it’s more about the domains where things get applied.”
Putting the science into data science
While hailing the interdisciplinary approach and academia’s partnerships with industry, Dr. Menzies argued that there was an even more compelling need at the heart of the field:
When I think about science, it’s a community of people collecting and curating and critiquing a set of ideas and everyone doing each other the courtesy to try and prove their ideas. Most data science is not science. People produce conclusions, and that’s the end of the story. Their conclusions aren’t registered. We don’t have anomaly detectors that tell us when our conclusions go out of date. We don’t have incremental methods for doing minimum change to old models to produce new models that don’t go out of date.
“I think the model we most need in data science,” said Dr. Menzies, “is science.”