Update: Watch the video and view the slides.
Update: Snapshots from the seminar
October 2, 2018
Akshata Patel, who’s working on an MS in data science at Columbia University, said the presentation gave her ideas on how to create better tables for her own research papers. She said she was impressed by the extensive work Elsevier has done in analyzing the large number of papers on particular topics, and by the Chrome extension the team developed that enables publishers to correct their tables and easily integrate them into the databases maintained by Elsevier. She especially liked the interactive world map of plastics in the oceans from an oceanography paper.
“You could understand the gist of the paper with a single view,” she explained.
Afterwards, Simran Lamba, an NLP data scientist for Revelio Labs who recently received her MS in data science from Columbia University, asked Ron Daniel what kind of natural language processing his team uses to find relationships between entities. For her own work, she wants to use NLP to create vector representations of words. When embedded into articles, these word vectors can map semantic relationships among articles.
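To make the idea of word vectors mapping relationships among articles concrete, here is a minimal Python sketch. It is not the approach used by Lamba or by Elsevier Labs, which the article does not describe. The three-dimensional vectors in `WORD_VECTORS` are made-up toy values, and averaging word vectors to represent a text is just one simple, commonly used assumption.

```python
import math

# Toy 3-dimensional word vectors. These values are invented for illustration;
# real embeddings are learned from large text collections.
WORD_VECTORS = {
    "ocean": [0.9, 0.1, 0.0],
    "plastic": [0.8, 0.2, 0.1],
    "marine": [0.85, 0.15, 0.05],
    "neural": [0.0, 0.9, 0.3],
    "network": [0.1, 0.8, 0.4],
}


def article_vector(text):
    """Represent a text as the average of the vectors of its known words."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("no known words in text")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


oceans = article_vector("plastic in the ocean")
marine = article_vector("marine plastic")
vision = article_vector("neural network")

# Texts on related topics should score higher than texts on unrelated ones.
related = cosine_similarity(oceans, marine)
unrelated = cosine_similarity(oceans, vision)
```

With these toy values, `related` comes out higher than `unrelated`, which is the sense in which vector representations can map semantic relationships among texts.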
On Wednesday, September 26, three of Elsevier's data science experts will present as part of Columbia University's Data Science Institute Industry-Innovation Seminar series, known as I3. The presenters are from Elsevier Labs, an advanced technology and R&D group within Elsevier that concentrates on smart content and on the future of scholarly communications.
The seminar is from 5 to 6 pm EDT at Columbia University, followed by a reception. Dr. Ron Daniel, Director of Elsevier Labs, describes the presentation, titled Semi-automated exploration and extraction of data in scientific tables:
"Most of the experimental results reported in scientific articles, and recorded in databases or in supplements to the article, are provided in tables. Unfortunately, the amazing recent progress in natural language understanding is of little help if we want to automatically understand those tables. Tables are, after all, not your grandmother’s natural language. Despite this, we believe significant progress can be made towards the goal of combining tables of related information into larger sets that can be analyzed, visualized, understood, and used as the basis for decisions.
"Elsevier Labs is prototyping tools to help guide people in the exploration of tables from many articles and the extraction and merging of the data they contain. This talk will show examples of what has been accomplished by manually merging such data. With those as examples of the desired outcomes, we will describe our experiments to duplicate such examples, the work flow in which they operate, and our most recent results."
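As a rough illustration of what merging tables from several articles involves, the Python sketch below maps differing column headers onto shared names and combines the rows. It is not Elsevier Labs' tooling, which the article does not detail. The `HEADER_ALIASES` mapping, the function names, and all table values are hypothetical and invented for this example.

```python
# Hypothetical mapping from header text as it might appear in different
# articles to one shared column name. A real tool would need far more than
# a lookup table; this is only a sketch.
HEADER_ALIASES = {
    "site": "location",
    "sampling location": "location",
    "conc. (items/km2)": "density_items_per_km2",
    "density (items per km^2)": "density_items_per_km2",
}


def normalize_row(row, source):
    """Rename recognized columns and record unrecognized ones for review."""
    normalized = {"source": source}
    unrecognized = []
    for header, value in row.items():
        key = HEADER_ALIASES.get(header.strip().lower())
        if key is None:
            unrecognized.append(header)  # left for a person to resolve
        else:
            normalized[key] = value
    return normalized, unrecognized


def merge_tables(tables):
    """Combine rows from several tables into one list with shared columns."""
    merged, needs_review = [], set()
    for source, rows in tables.items():
        for row in rows:
            normalized, unrecognized = normalize_row(row, source)
            merged.append(normalized)
            needs_review.update(unrecognized)
    return merged, sorted(needs_review)


# Invented example data, not taken from any real paper.
tables = {
    "article_a": [{"Site": "North Atlantic", "Conc. (items/km2)": 1200.0}],
    "article_b": [
        {
            "Sampling location": "South Pacific",
            "Density (items per km^2)": 26000.0,
            "Notes": "surface trawl",
        }
    ],
}

merged, needs_review = merge_tables(tables)
```

Returning the unrecognized headers separately reflects the "semi-automated" part of the talk title: the code handles the columns it can match and leaves the rest for a person to decide.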
Watch the livestream — or attend in person
Presentation: Semi-automated exploration and extraction of data in scientific tables, with Elsevier Labs
When: Wednesday, Sept. 26, from 5 to 6 pm EDT, followed by a 30-minute reception
Where: Columbia University's Davis Auditorium, Room 412 on the fourth floor of Shapiro CEPSR, 530 W. 120th Street, New York City
Online: Watch the livestream here
Dr. Ron Daniel is the Director of Elsevier Labs. Educated as an electrical engineer, Ron has done extensive work on metadata standards such as the Dublin Core, RDF, and PRISM. Before joining Elsevier in 2010, he worked at a startup that was acquired for its automatic classification technology, and consulted on taxonomy and information management issues for nine years. Ron received his PhD in Electrical Engineering from Oklahoma State University and was a postdoctoral researcher at Cambridge University and the Los Alamos National Laboratory. Ron is bemused by the way technology reincarnates itself, specifically in the way that parallel implementations of neural networks for machine vision are currently in vogue, just as they were 30 years ago when he was working on them in grad school.
Corey Harper is a data scientist and Technology Research Director for Elsevier Labs. He spent nearly 15 years building digital libraries, administering library systems, and managing library metadata. He has held metadata librarian positions at both New York University and the University of Oregon, where his research focused on linked data, digital repositories, and library discovery. His current research interests include natural language processing, machine learning, predictive analytics, and data visualization, applied to issues in research communications. In addition, he is involved in both the Digital Public Library of America (DPLA) and code4lib communities. Corey has an MBA from NYU's Stern School of Business and an MSLS from the University of North Carolina.