PDF versus HTML — which do researchers prefer?

A survey by the Article of the Future team suggests a shared future

Over the past few years, the Article of the Future team at Elsevier has introduced an array of article content innovations to enhance the online reading experience. More details on the most recent of these can be found in the Editors’ Update article "How to handle digital content" and in the Elsevier Connect articles "Designing the Article of the Future" and "Executable papers".

With so many of these innovations already live and functioning on ScienceDirect, we thought the time was ripe to poll researchers about their thoughts on the future of the scientific article format — in particular the online version (HTML) and the traditional PDF. In January, we posed a series of questions online to a group of 500 researchers worldwide whose members represent a variety of disciplines. We received a total of 281 responses, and the answers provided some interesting food for thought.

How we conducted the survey

The survey was divided into two phases. We began by asking participants for their thoughts on both the HTML and the PDF formats. We asked them to outline the pros and cons of accessing and reading articles in each format, and to say which they preferred. We also asked them to indicate which format they expected to use in the future. Once they had completed their answers, participants moved on to phase two. This involved viewing a video outlining Elsevier’s Article of the Future project, which focuses on enriching content in a discipline-specific manner. They were then presented with a second set of questions.

Responses from Phase 1

During the first stage, we gained some key insights from the survey:

  • Researchers favor the PDF format primarily for in-depth reading. They find the HTML format more convenient and suitable for discovering content and determining its relevance.
  • Researchers feel there will be a place for both HTML and PDF formats in the future — PDF for offline access and the storage of content for later reference, HTML for online access and immediate learning and discovery.
  • If we want to promote the uptake of HTML, we need to address concerns around author input, printing, storing and sharing content, and make the HTML layout customizable.

In the words of one participant: “If the article contains interactive elements, then (the) HTML version would make more sense; otherwise (the) good old PDF will be preferable.”

[note color="#f1f9fc" position="center" width=800 margin=10]

Sample comments

[pullquote align="right"]If the article contains interactive elements, then (the) HTML version would make more sense; otherwise (the) good old PDF will be preferable.[/pullquote]

“There exist so many articles. And it's hard to open or download whenever I find interesting things. So it's more reasonable to read (the) HTML version first.”

“I prefer to work with the printed version of the article because I don’t like reading from (the) screen.”

“I use articles in the HTML format because they may contain links to additional information and tools (missing in PDFs).”

“PDF version is formatted just like the article in print, I can easily navigate to the places I want.”

“HTML files are ok on the screen but messy to handle downloaded. But I do sort normally on abstract or relevant info I get in the HTML environment, before I click on the PDF icon for download.”

“I prefer to share the article URLs rather than sending big PDF files around.”

“(The PDF) looks and feels more like a paper article. If I want to print it, I think it will look better printed from PDF. If I want to save or email it, it is easier with PDF.”[/note]

These replies match the ones received in earlier Article of the Future studies, which led us to develop the three-pane article view now available on ScienceDirect. The PDF-style center pane is ideal for reading the paper while the other two panes offer a series of discipline-specific presentation and content enrichments that add real value to the article. The preference for a PDF format when printing is something that the Article of the Future project is taking into consideration: we are currently looking into how we can make the center pane easy to print, while maintaining its optimized reading format.

[caption id="attachment_5776" align="aligncenter" width="798"]Three-pane format on ScienceDirect[/caption]

Responses from Phase 2

In the second phase of the survey, participants were shown the Article of the Future video, which discusses recent improvements to article presentation, content and context, and the introduction of the three-pane article view on ScienceDirect.

When asked if the video had changed their perception of the usability of the HTML format, 60 percent agreed it had, while 25 percent said it had not and 15 percent were unsure. A sample of participants’ comments is shown below:

60% said: ‘Yes, it has changed my perception’

  • “I conventionally don’t like html articles because of the way they are presented in the screen, nevertheless, the Article of the Future is taking the traditional way to a new frontier, beyond hypertext to metacontent management by user or reader.”
  • “Elsevier has used the power of the internet to make sure the article is a truly dynamic, interactive and well annotated and connected scientific document.”
  • “I like the interaction with the content, and the ease of exploring other links and references without having to go back to search for them later.”
  • “I think there's a great deal of advantages to providing the option to publish in such a format. I would be interested to see how authors can access the tools to represent their data in these new formats.”
  • “… much more powerful than pdfs that I currently use.”

25% said: ‘No, it has not changed my perception’

  • “This is more or less what I can see on some programming software, but applied to articles, good idea but questions remain: How will it age? How expensive to maintain? How to keep it alive and operational?”
  • “Logical path forward. The current problem is that not all users are equipped with appropriate technology, nor do they master it.”
  • “…as long as I cannot download it, it's hard to archive and I prefer reading it offline.”
  • “It offers greater interactivity - but there again a much more powerful PDF viewer (with interactive tools built in) would preserve PDF's ascendancy.”
  • “I didn't see how the features were relevant to articles, only handbooks and textbooks.”

More than 65 percent said they thought there would be a shift towards HTML use in the future.

We also asked participants to let us know whether they expected the way they access and use articles today to change in the future. Their responses varied:

  • Probably not: 10.3%
  • Don't know / not sure: 5.8%

The digital revolution has radically changed the way in which scientists carry out their research, and process and store their results. It is clear that as long as technology develops, the way scientists access and use articles will develop as well. The important question is, "How and by how much?"

Twenty years ago, the only article format was paper; 10 to 15 years ago, it became PDF; and now tablets have introduced a new way of using articles. Does that mean that we threw away paper and will now throw away desktop computers? No – we use whichever format and mode of use suits us at the moment of use. The same will hold for PDF and online HTML: I think there is a future for both. PDF will remain the preferred format for archiving and offline use, while online HTML will increasingly become the standard for online use, as it is so much richer and more in tune with ongoing developments in the everyday research process.

PDF and HTML – the pros

PDF:

  • Consistent layout
  • Offline accessibility
  • Easy to store and organize
  • Similar to print version
  • Easy to print
  • Displays images well
  • Easy to share by email (when small)
  • Easy to annotate

HTML:

  • Customized to device (including mobile)
  • Enriched and interactive content
  • Always latest version
  • Up-to-date and linked context
  • Linked with data repositories
  • Easy to search
  • Easy to share by link (also when large)
  • Fast access from lists
  • Includes supplementary material

A similar version of this article originally appeared in Editors' Update. Visit their site to take a poll on this topic.

The Author

[caption id="attachment_5777" align="alignleft" width="110"]IJsbrand Jan Aalbersberg, PhD[/caption]

Dr. IJsbrand Jan Aalbersberg is Senior VP of Journal and Content Technology for Elsevier. After joining the company in 1997, he served as VP of Technology at Elsevier Engineering Information from 1999 to 2002. As Technology Director in Elsevier Science & Technology from 2002 to 2005, he was one of the initiators of Scopus, responsible for its publishing-technology connection, and subsequently focused on product development in Elsevier’s Corporate Markets division from 2006 to 2009. He then took on the role of VP of Content Innovation, which he held until 2012. In that role and the position he now holds, he has striven to help scientists communicate research in ways they were not able to do before.

Dr. Aalbersberg holds a PhD in theoretical computer science from Leiden University in the Netherlands. He is based in Amsterdam.
