- Print ISBN 9780124045767
- Electronic ISBN 9780124047242
Principles of Big Data helps readers avoid the common mistakes that endanger all Big Data projects. By stressing simple, fundamental concepts, this book teaches readers how to organize large volumes of complex data, and how to achieve data permanence when the content of the data is constantly changing. General methods for data verification and validation, as specifically applied to Big Data resources, are stressed throughout the book. The book demonstrates how adept analysts can find relationships among data objects held in disparate Big Data resources, when the data objects are endowed with semantic support (i.e., organized in classes of uniquely identified data objects). Readers will learn how their data can be integrated with data from other resources, and how the data extracted from Big Data resources can be used for purposes beyond those imagined by the data creators.
• Learn general methods for specifying Big Data in a way that is understandable to humans and to computers.
• Avoid the pitfalls in Big Data design and analysis.
• Understand how to create and use Big Data safely and responsibly, within the laws, regulations, and ethical standards that apply to the acquisition, distribution, and integration of Big Data resources.
Audience: data managers, data analysts, statisticians
Definition of Big Data
Big Data Versus Small Data
Whence Comest Big Data?
The Most Common Purpose of Big Data is to Produce Small Data
Big Data Moves to the Center of the Information Universe
Chapter 1. Providing Structure to Unstructured Data
Chapter 2. Identification, Deidentification, and Reidentification
Features of an Identifier System
Registered Unique Object Identifiers
Really Bad Identifier Methods
Embedding Information in an Identifier: Not Recommended
Use Case: Hospital Registration
Chapter 3. Ontologies and Semantics
Classifications, the Simplest of Ontologies
Ontologies, Classes with Multiple Parents
Choosing a Class Model
Introduction to Resource Description Framework Schema
Common Pitfalls in Ontology Development
Chapter 4. Introspection
Knowledge of Self
eXtensible Markup Language
Introduction to Meaning
Namespaces and the Aggregation of Meaningful Assertions
Resource Description Framework Triples
Use Case: Trusted Time Stamp
Chapter 5. Data Integration and Software Interoperability
The Committee to Survey Standards
Specifications and Standards
Interfaces to Big Data Resources
Chapter 6. Immutability
"By stressing simple, fundamental concepts, this book teaches readers how to organize large volumes of complex data, and how to achieve data permanence when the content of the data is constantly changing. General methods for data verification and validation, as specifically applied to Big Data resources, are stressed throughout the book."--ODBMS.org, March 21, 2014
"The book is written in a colloquial style and is full of anecdotes, quotations from famous people, and personal opinions."--ComputingReviews.com, February 3, 2014
"The author has produced a sober, serious treatment of this emerging phenomenon, avoiding hype and gee-whiz cases in favor of concepts and mature advice. For example, the author offers ten distinctions between big data and small data, including such factors as goals, location, data structure, preparation, and longevity. This characterization provides much greater insight into the phenomenon than the standard 3V treatment (volume, velocity, and variety)."--ComputingReviews.com, October 3, 2013