eBook management: would you sacrifice some control to improve access?

Librarians, publishers, and systems vendors are moving to automated workflows to manage knowledge bases for online books and journals. The benefits of automated data sharing are great, but some libraries may be reluctant to switch to a system for knowledge base management that decouples it from acquisitions and electronic resource management.

In the last two years, Elsevier has participated in the KBART II working group, and has gone a step beyond KBART by developing systems for sharing a library’s ScienceDirect holdings data with the vendors who provide their catalog records, discovery services, and article linkers. Automated data sharing promises speed, accuracy, and far less work for librarians. To learn more about the practical details and benefits of automated data sharing, see “Reduced Workload Ahead” on the Elsevier blog and the recent Library Connect interview with Carlen Ruschoff, “When Less Is Really More” about using automatic data sharing with OCLC Local at the University of Maryland. More than 600 libraries around the world have activated automatic data sharing between ScienceDirect and their vendor knowledge bases, yet we are still very much in that early phase of technology adoption when obstacles and objections need to be overcome.

My experience has been concentrated on using automated data sharing to support the ScienceDirect MARC program that provides catalog records for our eBooks through OCLC at no cost to libraries. I have also been involved in supporting data sharing with discovery systems providers. In the past, libraries ordered MARC record sets from OCLC’s website or order forms that we created to select collections or individual books. Ordering the right collection was not easy; there are over 600 collections for ScienceDirect books, including different collections by subject and year, and deprecated collections that are no longer sold but are still maintained because some libraries purchased perpetual access to them. Worse yet, many collections have very similar names and there is no consistent practice around using unique identifiers for them. Even if the library selected the right collections, another layer of human error was in play behind the scenes in the maintenance of the collection title list used by OCLC. Under the new model, libraries set up data sharing to automatically update their WorldCat knowledge base with their ScienceDirect holdings; then they set their OCLC WorldShare Collection Manager profile to enable MARC deliveries for the “ScienceDirect All Books” collection and specify filtering based on their holdings. With automated data sharing enabled, Collection Manager knows immediately when a book is added to or removed from the library’s holdings. And, depending a bit on processing time and the update frequency chosen by the library, its catalog stays completely in sync with the set of books the library’s users can access on ScienceDirect. No more human error and far less work for everyone. What’s not to like?
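
For readers who think in code, the holdings-based filtering described above can be pictured roughly as follows. This is only an illustrative sketch: the record layout, the function name `filter_by_holdings`, and the ISBNs are all made up for the example and do not reflect the actual OCLC or ScienceDirect data formats.

```python
# Hypothetical data shapes; not the actual OCLC or ScienceDirect formats.

def filter_by_holdings(master_collection, held_isbns):
    """Keep only the records from the master collection that appear in the holdings."""
    held = set(held_isbns)
    return [record for record in master_collection if record["isbn"] in held]


# A tiny stand-in for the "ScienceDirect All Books" collection (made-up ISBNs).
all_books = [
    {"isbn": "9780000000011", "title": "Example Book A"},
    {"isbn": "9780000000028", "title": "Example Book B"},
    {"isbn": "9780000000035", "title": "Example Book C"},
]

# Holdings as they might arrive from the publisher's entitlement system.
library_holdings = ["9780000000011", "9780000000035"]

records_to_deliver = filter_by_holdings(all_books, library_holdings)
for record in records_to_deliver:
    print(record["isbn"], record["title"])
```

The point of the sketch is that the library never chooses among hundreds of named collections; it subscribes to one master list and lets its holdings decide which records it receives.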

The biggest obstacles for most libraries are the practical ones that you’d expect for a new technology. There is a setup process that is slow and a bit complicated; it can take two or three weeks to get the data flowing between ScienceDirect, OCLC, and your library catalog. If you’re not already a WorldShare Collection Manager user, there is a new system to learn. And the new system is not universal. While I believe that most big publishers will soon implement automated data sharing for library holdings, most have not, so the library must support two procedures. Luckily the procedure for automated data sharing is, indeed, automated. The librarians who have been kind enough to provide me with feedback report that, after the initial setup effort, the system works as advertised. When I spoke recently with the technical services and acquisitions team at the University of Maryland about how data sharing is working for their OCLC Local catalog, they confirmed that knowledge base updates for ScienceDirect now happen without human intervention.* Several procedures that were necessary with the collection-based model have simply been dropped. For example, UMD does not monitor publication announcements to know when current-year titles become available. They know that when a new book is published in a current-year collection they purchased, the knowledge base will be updated to include it, and it won’t show up until it is actually available to users. Head of Acquisitions, Angie Ohler, summed it up: “I really wish more publishers would embrace this idea. It would make sense for both sides. Why should we waste our precious resources doing and redoing the same work?” But she also noted that moving to automated data sharing has required a change in mindset.

The biggest objection to automated data sharing is that it separates discovery from purchasing. To some extent, it dis-integrates the integrated library system. Traditionally, librarians audited acquisitions against fulfillment, and an item went live in the catalog after a librarian had confirmed that the MARC delivery matched the order. Automated data sharing does not work that way. The holdings data flow regularly from our entitlement system to the vendor knowledge base for each library. This ensures that the knowledge base will match exactly the content that is available to the users. In each cycle, records added to the knowledge base or catalog may cover multiple purchases, different collections, and even trials. Similarly, when books are removed from the knowledge base or MARC delete records are delivered, no explanation is provided. No attempt is made to report how or when the library purchased the book, and no distinction is made between books that are owned in perpetuity and those that are on trial or available during a subscription period.
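
A rough sketch may help make the trade-off concrete. Assuming, purely for illustration, that each cycle's holdings can be treated as a set of identifiers, an update amounts to a comparison of two snapshots. The function name `diff_holdings` and the placeholder identifiers are hypothetical and do not describe the real entitlement feed.

```python
def diff_holdings(previous, current):
    """Compare two holdings snapshots and return (added, removed) identifiers."""
    previous, current = set(previous), set(current)
    added = sorted(current - previous)    # new records to load
    removed = sorted(previous - current)  # delete records to send
    return added, removed


# Placeholder identifiers standing in for ISBNs.
last_cycle = {"isbn-a", "isbn-b", "isbn-c"}
this_cycle = {"isbn-b", "isbn-c", "isbn-d", "isbn-e"}

added, removed = diff_holdings(last_cycle, this_cycle)
print("add:", added)
print("delete:", removed)
```

Notice what the result does not contain: nothing says whether "isbn-d" came from a purchase, a subscription, or a trial, or why "isbn-a" went away. That missing context is exactly what the traditional audit step used to supply.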

For cataloging, automated data sharing saves a great deal of time compared to the traditional processes whereby librarians were constantly checking the publishers’ title lists against MARC records received. The advantages for discovery knowledge bases are even more dramatic. Discovery depends on the accuracy of the knowledge base, and this meant that librarians had to recreate the library’s purchase history in the discovery system administrative interface; if you bought an eBook collection you had to then select it in your discovery system. Behind the scenes, it also meant that the discovery service had to maintain a title list for each collection from each publisher. In reality, discovery services often simplify publishers’ collections and their title lists are not always very accurate. The KBART II recommendation will make it much easier for discovery services to maintain accurate collection title lists, but I hope that this is just a transitional phase as more publishers and vendors switch to automated data sharing. KBART II makes less work for vendors, not for librarians. It makes it easier to maintain accurate collection title lists, but it still leaves librarians with the task of activating and deactivating each collection for each vendor and hoping that the result matches their holdings. The crushing amount of work placed on librarians to maintain the discovery knowledge base by selecting and de-selecting collections is probably the greatest source of error in discovery systems. Let’s get rid of it.

If librarians and publishers embrace automated data sharing and stop collecting collections, what is lost? As long as the knowledge base is accurate, the fact that a certain book belongs to a certain collection adds no value for the researcher running a search in the catalog or discovery system. Yet, the concern I hear most frequently from librarians about our MARC records is that they don’t show the collection. Collection membership was never listed in our MARC records, and would not be consistent with vendor-neutral cataloging practices. However, many libraries have added collection information to ScienceDirect MARC records to support both cataloging and acquisitions. I would argue that this information is no longer needed for cataloging and knowledge base management. For example, with automated data exchange you don’t need to attach a collection name to keep track of which books belong to a trial or an annual subscription so that you can remove them later. OCLC Collection Manager will send you delete records for books that are no longer part of your subscription. Based on my discussions with librarians, vendors, and other publishers, especially at the recent NISO forum on “The Future of Library Resource Discovery,” I believe the most critical outstanding issues relate to purchasing and electronic resource management. By separating discovery from purchasing, we lose some of the auditing and troubleshooting that was built into the old process. New systems that will hopefully also use automated data exchange may need to be developed to give libraries confidence that they have received the online books and journals they purchased and to manage access policies such as limits on simultaneous users.

There are unresolved issues related to discovery too, including how to identify the same book on various platforms to link users to the appropriate copy, and how knowledge bases should deal with the different permutations of book series which create a grey area between serial and monographic publications. I believe that the best approach here is to tackle these issues without falling back on old procedures that cannot realistically be maintained and extended. Librarians, publishers, and vendors should pursue automated data sharing, accept the separation of discovery and purchasing that it requires, and look for new solutions for the outstanding issues.

Many thanks to Nathan Putnam, Head of Metadata Services, Angie Ohler, Head of Acquisitions, and Rebecca Goldfinger, Continuing Resources Librarian, at the University of Maryland for taking time to meet with me and discuss the effect of data sharing on their work.

