PDF versus HTML – a shared future!

Results of a recent survey offer interesting food for thought

As you may know, over the past few years the Article of the Future team at Elsevier has introduced an array of article content innovations to enhance the online reading experience. More details on the most recent of these can be found in the article How to handle digital content in this edition of Editors’ Update.

With so many of these innovations already live and functioning on ScienceDirect, we thought the time was ripe to poll researchers about their thoughts on the future of the scientific article format - in particular the online version (HTML) and the traditional PDF. We posed a series of questions to a 500-strong online group of researchers, whose members represent a variety of disciplines worldwide*. Their answers provided some interesting food for thought.

How we conducted the survey

The survey was divided into two phases. We began by asking participants for their thoughts on both the HTML and the PDF. We asked them to outline the pros and cons of accessing and reading articles in both formats and to say which they preferred. We also asked them to indicate which format they expected to be using in the future. Once they had completed their answers, participants entered stage two. This involved viewing a video outlining Elsevier’s Article of the Future project, which focuses on enriching content in a discipline-specific manner. They were then presented with a second set of questions.

During the first stage, key insights gained included:

  • Researchers favor the PDF format primarily for in-depth reading - they find the HTML format more convenient and suitable for discovering content and determining its relevance.
  • Researchers feel there will be a place for both HTML and PDF formats in the future: PDF for offline access and the storage of content for later reference, HTML for online access and immediate learning and discovery.
  • If we want to promote the uptake of HTML, we need to address concerns around author input, printing, storing and sharing content, and make the HTML layout customizable.

In the words of one participant: “If the article contains interactive elements, then (the) HTML version would make more sense; otherwise (the) good old PDF will be preferable.”

Other participant comments included:

“There exist so many articles. And it's hard to open or download whenever I find interesting things. So it's more reasonable to read (the) HTML version first.”

“I prefer to work with the printed version of the article because I don’t like reading from (the) screen.”

“I use articles in the HTML format because they may contain links to additional information and tools (missing in PDFs).”

“PDF version is formatted just like the article in print, I can easily navigate to the places I want.”

“HTML files are ok on the screen but messy to handle downloaded. But I do sort normally on abstract or relevant info I get in the HTML environment, before I click on the PDF icon for download.”

“I prefer to share the article URLs rather than sending big PDF files around.”

“(The PDF) looks and feels more like a paper article.  If I want to print it, I think it will look better printed from PDF.  If I want to save or email it, it is easier with PDF.”

IJsbrand Jan Aalbersberg, Senior Vice President Journal & Content Technology, leads Elsevier’s Content Innovation team. He was not surprised by the findings of the first stage of the research. He explained: “These replies match the ones received in earlier Article of the Future studies, which led us to develop the three-pane article view now available on ScienceDirect. The PDF-style center pane is ideal for reading the paper while the other two panes offer a series of discipline-specific presentation and content enrichments that add real value to the article. The preference for a PDF format when printing is something that the Article of the Future project is taking into consideration: we are currently looking into how we can make the center pane easy to print, while maintaining its optimized reading format.”

Figure 1. An article in the journal Digital Applications in Archaeology and Cultural Heritage on ScienceDirect, displayed using the three-pane view.

Phase two

In the second stage of the survey, participants were shown the Article of the Future video (below), which discusses recent improvements to article presentation, content and context and the introduction of the article three-pane view on ScienceDirect.


When asked if the video had changed their perception of the usability of the HTML format, 60% agreed it had, while 25% said it hadn’t and 15% were unsure. A sample of participants’ comments is recorded below.

60% said: ‘Yes, it has changed my perception’

“I conventionally don’t like html articles because of the way they are presented in the screen, nevertheless, the Article of the Future is taking the traditional way to a new frontier, beyond hypertext to metacontent management by user or reader.”

“Elsevier has used the power of the internet to make sure the article is a truly dynamic, interactive and well annotated and connected scientific document.”

“I like the interaction with the content, and the ease of exploring other links and references without having to go back to search for them later.”

“I think there's a great deal of advantages to providing the option to publish in such a format. I would be interested to see how authors can access the tools to represent their data in these new formats.”

“… much more powerful than pdfs that I currently use.”

25% said: ‘No, it has not changed my perception’

“This is more or less what I can see on some programming software, but applied to articles, good idea but questions remain: How will it age? How expensive to maintain? How to keep it alive and operational?”

“Logical path forward. The current problem is that not all users are equipped with appropriate technology, nor do they master it.”

“…as long as I cannot download it, it's hard to archive and I prefer reading it offline.”

“It offers greater interactivity - but there again a much more powerful PDF viewer (with interactive tools built in) would preserve PDF's ascendancy.”

“I didn't see how the features were relevant to articles, only handbooks and textbooks.”

More than 65% thought there would be a shift towards HTML use in the future.

We also asked participants to let us know whether they expected the way they access and use articles today to change in the future. Their responses varied:

Yes 30.8%
Maybe 50.0%
Probably not 10.3%
No 3.2%
Don't know / not sure 5.8%

Aalbersberg commented: “The digital revolution has radically changed the way in which scientists carry out their research, and process and store their results. It is clear that as long as technology develops, the way scientists access and use articles will develop as well – the important question is: How and by how much? 20 years ago the only article format was paper, some 10-15 years ago the format became PDF, and now a new way of usage has been created by the introduction of tablets. Does that mean that we threw away paper and will now throw away desktop computers? No – we apply the format and way of use that is applicable at the moment of use. And the same will hold for PDF and online HTML: I think that there is a future for both. PDF will remain the preferred format for archiving and offline use, while online HTML will increasingly become the standard for online use, as it is so much richer and in tune with the ongoing developments in the regular research process.”

PDF and HTML – the pros

PDF

  • Consistent layout
  • Offline accessibility
  • Easy to store and organize
  • Similar to print version
  • Easy to print
  • Displays images well
  • Easy to share by email (when small)
  • Easy to annotate

HTML

  • Customized to device (incl. mobile)
  • Enriched and interactive content
  • Always latest version
  • Up-to-date and linked context
  • Linked with data repositories
  • Easy to search
  • Easy to share by link (also when large)
  • Fast access from lists
  • Includes supplementary material

* The questions were posed to a community of 500 researchers. Two surveys were conducted during January 2013. The first attracted 159 responses (a 31.8% response rate); the second attracted 122 responses (a 24.4% response rate).

Author biography

IJsbrand Jan Aalbersberg
After joining Elsevier in 1997, IJsbrand Jan served as Vice President of Technology at Elsevier Engineering Information (Hoboken, USA) from 1999 to 2002. As Technology Director in Elsevier Science and Technology (2002-2005), he was one of the initiators of Scopus, responsible for its publishing-technology connection, and subsequently focused on product development in Elsevier’s Corporate Markets division (2006-2009). He then took on the role of Vice President Content Innovation, which he held until 2012. In both that role and the position he now holds, he has striven to help scientists to communicate research in ways they weren’t able to do before. IJsbrand Jan holds a PhD in Theoretical Computer Science.

Archived comments

Jim Palmer says: June 27, 2013 at 8:10 pm
It is my understanding that a pdf is secure, that is it cannot contain a virus or Trojan horse. It is my understanding that this is not true for HTML. If this is correct, then I would suggest that this is a very important consideration that should be in the discussion.

IJsbrand Jan Aalbersberg says: July 2, 2013 at 7:55 am
Dear Professor Palmer,

Thank you for your comment and it is good that you bring up the issue of viruses and security. A very important topic indeed, and one I am happy to comment on.

Technically, a PDF could contain a virus or a Trojan horse, but Elsevier has taken various measures to prevent their occurrence, like using industry-standard PDF creation tools as well as employing validation and detection tools. We have also ruled out that our PDFs contain scripting, by adhering to the PDF/A standard. This high level of protection, however, only applies to our in-house generated PDFs, like the full-text article PDFs. If we post a PDF from an author (embedded as a supplementary file), then a virus or Trojan horse could potentially be included. Even though we have detection software in place to also scan for malicious elements in these files, a yet-unknown virus could slip through.

You are right that HTML could include JavaScript code that calls in viruses. However, in ScienceDirect we do control our own HTML, and we take good care that this situation doesn't happen. Though, when an author-provided HTML is submitted as a supplementary file, or when a reference link is followed to a web page outside of ScienceDirect, we cannot make any guarantee about any scripting that might possibly be encountered. And the same holds for a link that might look like it is going to download a PDF, but actually redirects through one or more intermediate steps that involve HTML pages.

Summarizing, though in general one could say that PDF files can also be harmful, (though probably in much lesser quantity than arbitrary HTML web pages), in the case of Elsevier's, and probably most other STM publishers', full-text articles, the risk of either a harmful PDF or HTML format is probably negligible.

Kind regards, IJsbrand Jan Aalbersberg

Dr Dover says: July 27, 2013 at 10:23 pm
I like pdf files because of the ease of access. But when I first started school I liked html files because they looked easier to read. The further I got in school the more annoyed I got by html because the writing looked more prepublished and unfinished, whether it was or not. Pulling a pdf off the world wide web is dangerous because you can download any amount of dangerous material but on a secure website like school or other specific databases it is a work of wonders. Though the access to such locations is minimal for most of the general public. Yet I cannot find any html anywhere on the world wide web, at least I can't.

G. Paterson says: September 24, 2013 at 2:41 am
I agree with the general view that both HTML and PDF are useful formats. My preference, however, is PDF and I would argue that many of the advantages of HTML are also features of PDF.

Enriched and interactive content: PDFs support the embedding or linking to multimedia, which includes video and audio. In addition, most modern PDFs provide web links to cited references (where available) and, provided the URLs are stable, can be easily linked with data repositories.

Easy to search: Most modern PDF readers have extensive and powerful search tools that can easily be used to efficiently search single documents, or folders with multiple PDFs. Searching multiple papers via HTML is not as straightforward.

Includes supplementary material: Several journals (e.g., PNAS) support PDFs that are the standalone paper, or are the paper with associated supplementary material.

File size will always be an issue for PDFs, particularly when they are heavily loaded with multimedia content. In this respect, HTML comes into its element by allowing users to selectively view content. The disadvantage of this is the need to always be online.

Prof. Dr. Claudia Quintella says: October 14, 2013 at 9:11 am
Both are important. I work with both but printing the manuscript is a very important point. For us researchers, to have the manuscript with us means we can also work without a computer, in all places. Time is shorter. This is, in my experience, the reason why good professional reviewers refuse to review manuscripts.
