Battle of the search engines — and the winner is …

A side-by-side comparison of ScienceDirect and Google Scholar shows how their differences can impact scientific research

Dr. Antonio Gullí (@antoniogulli) is VP of Product Management for Researcher Operating System and Awareness Technologies at Elsevier. He has 20 years of industry expertise in web search, machine learning and big data, which he recently applied to this comparison of ScienceDirect and Google Scholar.

When Google was launched 15 years ago, its use was driven by college students. Less than a decade later, it had blown nearly all other search engines out of the water, and it is now so often the default search engine that “to google” has become a verb we all use, much like “xerox” for copying.

But what if you are a researcher looking for the most relevant data on a specific topic? Let’s say you’re an economist looking for publications or data on quantitative easing. Google’s results are certainly impressive if all you are looking for is sheer volume. A recent search turned up more than 3 million results, which may very well include some interesting articles and data (one of them is a really clever little video produced by the Khan Academy that explains the concept in simple terms), but sorting through them to determine what’s useful requires considerable time and patience.

Using the same search terms in Google Scholar yielded 54,000 results, many of which were dated. Like any other professionals, researchers are always working against the clock, and time is money, so it doesn’t make sense to use a search engine that provides a volume response for the sake of mass quantity.

The tale of the tape

Recently I compared Google Scholar and ScienceDirect using the so-called side-by-side (win/lose) approach, a standard industry process to complement solid A/B tests and qualitative studies. The outcome demonstrated how a targeted search using a platform designed for researchers yields far more relevant results. A walk through three examples illustrates the widely disparate results where Google systematically fails against ScienceDirect.   
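To make the side-by-side (win/lose) idea concrete, here is a minimal sketch of how per-query verdicts could be tallied. The function name `side_by_side_summary`, the verdict labels "A", "B" and "tie", and the sample list are all assumptions made for illustration; they are not the data or tooling behind the comparison described here.

```python
from collections import Counter

# Hypothetical verdict labels: "A" means engine A gave the better results
# for a query, "B" means engine B did, and "tie" means no clear winner.
VALID_VERDICTS = {"A", "B", "tie"}


def side_by_side_summary(verdicts):
    """Tally per-query verdicts into win counts and a win rate for engine A."""
    unknown = set(verdicts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    counts = Counter(verdicts)
    decided = counts["A"] + counts["B"]
    return {
        "wins_a": counts["A"],
        "wins_b": counts["B"],
        "ties": counts["tie"],
        # Win rate over decided queries only; None if every query was a tie.
        "win_rate_a": counts["A"] / decided if decided else None,
    }


# Made-up verdicts for five queries, for illustration only.
example_verdicts = ["A", "A", "B", "tie", "A"]
summary = side_by_side_summary(example_verdicts)
print(summary)
```

Ties are left out of the win rate so that undecided queries do not dilute the comparison. A real study would also need enough queries and independent judges for the rate to mean anything.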

Google search for 'quantitative easing'

The first word search on Google Scholar using “quantitative easing” brings up more than 54,000 articles, but many are dated as far back as 2010. Using the same search terms, ScienceDirect yielded 140,845 targeted results. Not only was the yield nearly three times as large, the articles were also far more recent and highly relevant to the search parameters.

ScienceDirect search for 'quantitative easing'

Let’s take it one step further. On deep searches, the user’s need will likely be highly specific and will require a search engine that can detect subtle differences in content in order to match results to the search terms.

In the example shown, Machine Learning is the discipline, and a specific innovation, deep learning autoencoders, supplies the search terms. Google Scholar returns the seminal paper from 2006 that is considered the starting point for the renaissance of Neural Networks and their evolution into modern Deep Learning systems.

Google search for deep learning autoencoders

However, this paper does not talk about autoencoders, which are deep learning machines able to auto-learn the important features in a dataset with no human intervention. Instead, it talks about deep belief nets, and while this is a slightly related topic, it is not a result that is useful for this particular search. When using ScienceDirect in the same exercise, the returns on Deep Learning and Autoencoders are much more relevant and recent. 


It should also be noted that the number of results differed significantly: Google flagged 3,710 items and ScienceDirect 151. But as mentioned earlier, on such detailed, nuanced searches, it is the quality of the results, not the quantity, that matters most.

Oops, I made a mistake …!

So what if you make a mistake when entering search terms? (And who has not?) Going back to basics, it’s safe to assume that users will make mistakes as they type.

Google search for 'analytical chemistry'

In this example, the mistake is made on purpose to simulate a user whose keyboard is set up for a foreign-language alphabet. The search engine should normalize such input automatically, but in this case it does not.

In the next test, the task was to search for a specific item related to prostate cancer, ARN-509. Here the query is deliberately written as “ARN -509”, with a stray space before the hyphen, and no match is returned.

Google search for 'ARN-509'

In both instances, ScienceDirect provides a match regardless of the mistake, while Google Scholar matches only with the exact term.
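The kind of normalization these two tests probe can be sketched in a few lines. This is only an illustration, assuming accent stripping and whitespace cleanup are the relevant steps; it does not describe how ScienceDirect or Google Scholar actually process queries, and the function name `normalize_query` is hypothetical.

```python
import re
import unicodedata


def normalize_query(query: str) -> str:
    """Return a canonical form of a search query (illustrative sketch only)."""
    # Decompose characters so that accented letters become a base letter
    # followed by separate combining marks (for example "é" -> "e" + accent).
    decomposed = unicodedata.normalize("NFKD", query)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove stray whitespace around hyphens so "ARN -509" becomes "ARN-509".
    joined = re.sub(r"\s*-\s*", "-", stripped)
    # Collapse remaining runs of whitespace and ignore letter case.
    return re.sub(r"\s+", " ", joined).strip().casefold()


print(normalize_query("ARN -509") == normalize_query("ARN-509"))
print(normalize_query("ànalytical   chemistry"))
```

An engine would apply the same function to indexed terms and to incoming queries, so that both sides of the match are compared in canonical form. Gluing hyphenated tokens together is a simplification: it would also rewrite a query such as "a - b", so a production system would need more careful rules.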


Additional tests using other disciplines and search terms (neuroscience and Higgs boson) yielded similarly well-matched returns in ScienceDirect relative to those from Google Scholar.

The tests showcased here are by no means conclusive, but they do point out the very different results that each search engine can yield. For the busy researcher, that difference can translate into more efficient time management and, ultimately, research output.

For more examples and information, visit my blog: Coding Playground.

Elsevier Connect Contributor

Dr. Antonio Gullí (@antoniogulli) is VP of Product Management for Researcher Operating System and Awareness Technologies at Elsevier, where he brings his years of industry expertise to the world of academic research. He has 20 years of experience in web search, machine learning and big data. Before joining Elsevier in Amsterdam last year, Antonio worked for Microsoft, where he led the Bing development team in London. He created algorithms to determine whether news articles were popular, suggest related articles, and enable users to refine their searches.

Previously, he served as CTO for in Europe (now part of IAC), where he created a European Development Center, managing teams in the US and Europe. Before that, Antonio was the CEO of Ideare, one of the earliest search and pay-per-click advertising companies in Europe, which he co-founded and sold to Tiscali. Back in 1996, Antonio co-developed the first Italian search engine, Arianna, and he was the product owner of Web classification technologies at Fireball, the first German search engine.  

Antonio earned his PhD in computer science from the University of Pisa, Italy. He has authored many articles for peer-reviewed journals and filed more than 20 patents. Antonio blogs at Antonio Gullí’s Coding Playground.
