Data on the Web

From Relations to Semistructured Data and XML


  • Serge Abiteboul
  • Peter Buneman
  • Dan Suciu

The Web is causing a revolution in how we represent, retrieve, and process information. Its growth has given us a universally accessible database, but in the form of a largely unorganized collection of documents. This is changing, thanks to the simultaneous emergence of new ways of representing data: from within the Web community, XML; and from within the database community, semistructured data. The convergence of these two approaches has rendered them nearly identical. Now, there is a concerted effort to develop effective techniques for retrieving and processing both kinds of data.

Data on the Web is the only comprehensive, up-to-date examination of these rapidly evolving retrieval and processing strategies, which are of critical importance for almost all Web- and data-intensive enterprises. This book offers detailed solutions to a wide range of practical problems while equipping you with a keen understanding of the fundamental issues (including data models, query languages, and schemas) involved in their design, implementation, and optimization. You'll find it to be compelling reading, whether your interest is that of a practitioner involved in a database-driven Web enterprise or a researcher in computer science or a related field.


Audience: information systems professionals and managers who want to build Web-based databases, large-scale Web publishers, database designers, and anyone in the database field (including undergraduate and graduate students) who wishes to investigate this area further.


Book information

  • Published: October 1999
  • ISBN: 978-1-55860-622-7

Table of Contents

Foreword
Acknowledgments
1 Introduction
  1.1 Audience
  1.2 Web Data and the Two Cultures
  1.3 Organization

I Data Model
2 A Syntax for Data
  2.1 Base types
  2.2 Representing Relational Databases
  2.3 Representing Object Databases
  2.4 Specification of syntax
  2.5 The Object Exchange Model, OEM
  2.6 Object databases
  2.7 Other representations
    2.7.1 ACeDB
  2.8 Terminology
  2.9 Bibliographic Remarks
3 XML
  3.1 Basic Syntax
    3.1.1 XML Elements
    3.1.2 XML Attributes
    3.1.3 Well-Formed XML Documents
  3.2 XML and Semistructured Data
    3.2.1 XML Graph Model
    3.2.2 XML References
    3.2.3 Order
    3.2.4 Mixing elements and text
    3.2.5 Other XML Constructs
  3.3 Document Type Declarations
    3.3.1 A Simple DTD
    3.3.2 DTD's as Grammars
    3.3.3 DTD's as Schemas
    3.3.4 Declaring Attributes in DTDs
    3.3.5 Valid XML Documents
    3.3.6 Limitations of DTD's as schemas
  3.4 Document Navigation
  3.5 DCD
  3.6 Paraphernalia
    3.6.1 RDF
    3.6.2 Stylesheets
    3.6.3 SAX and DOM
  3.7 Bibliographic Remarks

II Queries
4 Query Languages
  4.1 Path expressions
  4.2 A core language
    4.2.1 The basic syntax
  4.3 More on Lorel
    4.3.1 Less Essential Syntactic Sugaring
  4.4 UnQL
  4.5 Label and path variables
    4.5.1 Paths as Data
  4.6 Mixing with structured data
  4.7 Bibliographic Remarks
5 Query Languages for XML
  5.1 XML-QL
    5.1.1 Constructing New XML Data
    5.1.2 Processing Optional Elements with Nested Queries
    5.1.3 Grouping with Nested Queries
    5.1.4 Binding Elements and Contents
    5.1.5 Querying Attributes
    5.1.6 Joining Elements by Value
    5.1.7 Tag Variables
    5.1.8 Regular Path Expressions
    5.1.9 Order
  5.2 XSL
  5.3 Bibliographic Remarks
6 Interpretation and advanced features
  6.1 First-order interpretation
  6.2 Object creation
  6.3 Graphical languages
  6.4 Structural Recursion
    6.4.1 Structural recursion on trees
    6.4.2 XSL and Structural Recursion
    6.4.3 Bisimulation in Semistructured Data
    6.4.4 Structural recursion on cyclic data
  6.5 StruQL

III Types
7 Typing semistructured data
  7.1 What is typing good for?
    7.1.1 Browsing and querying data
    7.1.2 Optimizing query evaluation
    7.1.3 Improving storage
  7.2 Analyzing the problem
  7.3 Schema Formalisms
    7.3.1 Logic
    7.3.2 Datalog
    7.3.3 Simulation
    7.3.4 Comparison between datalog rules and simulation
  7.4 Extracting Schemas From Data
    7.4.1 Data Guides
    7.4.2 Extracting datalog rules from data
  7.5 Inferring Schemas from Queries
  7.6 Sharing, Multiplicity, and Order
    7.6.1 Sharing
    7.6.2 Attribute Multiplicity
    7.6.3 Order
  7.7 Path constraints
    7.7.1 Path constraints in semistructured data
    7.7.2 The constraint inference problem
  7.8 Bibliographic Remarks

IV Systems
8 Query Processing
  8.1 Architecture
  8.2 Semistructured Data Servers
    8.2.1 Storage
    8.2.2 Indexing
    8.2.3 Distributed Evaluation
  8.3 Mediators for Semistructured Data
    8.3.1 A Simple Mediator: Converting Relational Data to XML
    8.3.2 Mediators for Data Integration
  8.4 Incremental Maintenance of Semistructured Data
  8.5 Bibliographic Remarks
9 The Lore system
  9.1 Architecture
  9.2 Query processing and indexes
  9.3 Other aspects of Lore
    The Data Guide
    Managing External Data
    Proximity Search
    Views
    Dynamic OEM and Chorel
    Mixing Structured and Semistructured in Ozone
  9.4 Bibliographic Remarks
10 Strudel
  10.1 An Example
    10.1.1 Data Management
    10.1.2 Structure Management
    10.1.3 Management of the Graphical Presentation
  10.2 Advantages of Declarative Web Site Design
  10.3 Bibliographic Remarks
11 Database products supporting XML
  11.1 Architecture
  11.2 Storage
  11.3 Application Programming Interface
  11.4 Query language
  11.5 Scalability
  11.6 Bibliographic Remarks

Bibliography
Index
About the Authors