Join us as Elsevier data scientists present at Columbia University

Seminar explored semi-automated extraction of data in scientific tables — watch the video

Ron Daniel, PhD, Director of Elsevier Labs, presented with his colleague Corey Harper at the Columbia University Data Science Institute. (Photo by Alison Bert)

Update: Watch the video and view the slides.

Semi-automated Exploration and Extraction of Data in Scientific Tables from Elsevier

Check out our technology careers

Update: Snapshots from the seminar

October 2, 2018

Akshata Patel and her classmates are MS students in Data Science at Columbia University. (Photo by Alison Bert)

Akshata Patel, who’s working on an MS in data science at Columbia University, said the presentation gave her ideas on how to create better tables for her own research papers. She said she was impressed by the extensive work Elsevier has done in analyzing the large number of papers on particular topics, and also by the Chrome extension the team developed that enables publishers to correct their tables and easily integrate them into the databases maintained by Elsevier. She especially liked the interactive world map of plastics in the oceans from an oceanography paper.

“You could understand the gist of the paper with a single view,” she explained.

Ron Daniel talks with Simran Lamba, who graduated recently with an MS in Data Science from Columbia University. (Photo by Alison Bert)

Afterwards, Simran Lamba, an NLP data scientist for Revelio Labs who recently received her MS in data science from Columbia University, asked Ron Daniel what kind of natural language processing his team uses to find relationships between entities. For her own work, she wants to use NLP to create vector representations of words. Applied to articles, these word vectors can map semantic relationships among them.

Corey Harper, a data scientist and Research Technology Director from Elsevier Labs, talks about his team's work with semi-automated extraction of data from scientific tables. (Photo by Alison Bert)

Students in Columbia University's MS in Data Science program talk with Corey Harper after his presentation. (Photo by Alison Bert)

The seminar

On Wednesday, September 26, data science experts from Elsevier will present as part of Columbia University's Data Science Institute Industry-Innovation Seminar series, known as I3. The presenters are from Elsevier Labs, an advanced technology and R&D group within Elsevier that concentrates on smart content and on the future of scholarly communications.

The seminar is from 5 to 6 pm EDT at Columbia University, followed by a reception. Dr. Ron Daniel, Director of Elsevier Labs, describes the presentation, titled Semi-automated exploration and extraction of data in scientific tables:

"Most of the experimental results reported in scientific articles, and recorded in databases or in supplements to the article, are provided in tables. Unfortunately, the amazing recent progress in natural language understanding is of little help if we want to automatically understand those tables. Tables are, after all, not your grandmother’s natural language. Despite this, we believe significant progress can be made towards the goal of combining tables of related information into larger sets that can be analyzed, visualized, understood, and used as the basis for decisions.

"Elsevier Labs is prototyping tools to help guide people in the exploration of tables from many articles and the extraction and merging of the data they contain. This talk will show examples of what has been accomplished by manually merging such data. With those as examples of the desired outcomes, we will describe our experiments to duplicate such examples, the work flow in which they operate, and our most recent results."

Watch the livestream — or attend in person

Presentation: Semi-automated exploration and extraction of data in scientific tables, with Elsevier Labs

When: Wednesday, Sept. 26, from 5 to 6 pm EDT, followed by a 30-minute reception

Where: Columbia University's Davis Auditorium, Room 412 on the fourth floor of Shapiro CEPSR, 530 W. 120th Street, New York City

Online: Watch the livestream here

The presenters

Dr. Ron Daniel is the Director of Elsevier Labs. Educated as an electrical engineer, Ron has done extensive work on metadata standards such as the Dublin Core, RDF, and PRISM. Before joining Elsevier in 2010, he worked at a startup that was acquired for its automatic classification technology, and consulted on taxonomy and information management issues for nine years. Ron received his PhD in Electrical Engineering from Oklahoma State University and was a postdoctoral researcher at Cambridge University and the Los Alamos National Laboratory. Ron is bemused by the way technology reincarnates itself, specifically in the way that parallel implementations of neural networks for machine vision are currently in vogue, just as they were 30 years ago when he was working on them in grad school.

Corey Harper is a data scientist and Technology Research Director for Elsevier Labs. He spent nearly 15 years building digital libraries, administering library systems, and managing library metadata. He has held metadata librarian positions at both New York University and the University of Oregon, where his research focused on linked data, digital repositories, and library discovery. His current research interests include natural language processing, machine learning, predictive analytics, and data visualization, applied to issues in research communications. In addition, he is involved in both the Digital Public Library of America (DPLA) and code4lib communities. Corey has an MBA from NYU's Stern School of Business and an MSLS from the University of North Carolina.

Written by

Alison Bert, DMA

As Executive Editor of Strategic Communications at Elsevier, Dr. Alison Bert works with contributors around the world to publish daily stories for the global science and health communities. Previously, she was Editor-in-Chief of Elsevier Connect, which won the 2016 North American Excellence Award for Science & Education.

Alison joined Elsevier in 2007 from the world of journalism, where she was a business reporter and blogger for The Journal News, a Gannett daily newspaper in New York. In the previous century, she was a classical guitarist on the music faculty of Syracuse University. She received a doctorate in music from the University of Arizona, was a Fulbright scholar in Spain, and studied in a master class with Andrés Segovia.
