Joe Celko's Data and Databases

Concepts in Practice


  • Joe Celko, Independent Consultant, Austin, Texas

Do you need an introductory book on data and databases? If the book is by Joe Celko, the answer is yes. Data and Databases: Concepts in Practice is the first introduction to relational database technology written especially for practicing IT professionals. If you work mostly outside the database world, this book will ground you in the concepts and overall framework you must master if your data-intensive projects are to be successful. If you're already an experienced database programmer, administrator, analyst, or user, it will let you take a step back from your work and examine the founding principles on which you rely every day-helping you to work smarter, faster, and problem-free.

Whatever your field or level of expertise, Data and Databases offers you the depth and breadth of vision for which Celko is famous. No one knows the topic as well as he, and no one conveys this knowledge as clearly, as effectively-or as engagingly. Filled with absorbing war stories and no-holds-barred commentary, this is a book you'll pick up again and again, both for the information it holds and for the distinctive style that marks it as genuine Celko.

View full description


Programmers and other developers working on database applications development.


Book information

  • Published: July 1999
  • ISBN: 978-1-55860-432-2

Table of Contents

01.0 Preface01.1 Organization of the Book01.2 Corrections and Future Editions02.0 The Nature of Data 02.1 Data versus Information02.2 Information versus Wisdom02.2.1 Innumeracy02.2.2 Bad Math 02.2.3 Causation and Correlation02.2.4 Testing the Model against Reality02.3 Models versus Reality02.3.1 Errors in Models 02.3.2 Assumptions about Reality03.0 Entities, Attributes, Values and Relationships03.1 Entities 03.1.1 Entities as Objects03.1.2 Entities as Relationships 03.2 Attributes03.3 Values03.4 Relationships 03.5 ER Modeling 03.5.1 Mixed Models03.6 FORM Modeling 04.0 Data Structures04.1 Sets04.2 Multi-sets04.3 Simple Sequential Files 04.4 Lists 04.5 Arrays04.6 Graphs 04.7 Trees 05.0 Relational Tables05.1 Subsets05.2 Union05.3 Intersection05.4 Set difference 05.5 Partitioning 05.5.1 Groups 05.5.2 Relational Division 05.6 Duplicates05.6.1 Allow Duplicates 05.6.2 Disallow Duplicates 05.6.3 Consolidate Duplicates 05.6.4 Uniqueness 05.6.5 Levels of Aggregation 05.7 VIEWs05.7.1 Updatable VIEWs06.0 Access Structures06.1 Indexes 06.1.1 Simple Indexes 06.2 Tree Structured Indexes 06.3 Covering Indexes 06.4 Hashing Functions06.4.1 Uniform Hashing Function06.4.2 Perfect Hashing Function06.4.3 Minimal Hashing Functions06.4.4 Multi-column Hashing Functions06.5 Inverted Files06.6 Bit Vector Indexes06.7 Mixed Access Methods06.8 Multiple Table Access Structures06.9 An Informal Survey of Database Products07.0 Numeric Data07.1 Tag Numbers or Absolute Scales 07.2 Cardinal Numbers 07.3 Ordinal Numbers07.3.1 Arithmetic with Ordinals, Cardinals and Tags 07.4.0 Computer Representations07.4.1 Exact Numeric Representations 07.4.2 Approximate Numeric Representations versus the Continuum 07.5 Zero, NULL and Math07.5.1 Zero, NULL and Division 07.5.2 Zero and Powers 07.6 Rounding and Truncating07.6.1 Applied To Individual Values 07.6.2 Applied To Sets of Values07.7 Addition and Summation are Different07.8 Exotic Numbers 07.8.1 Fractions 07.8.2 Repeating Decimals 07.8.3 Infinite Precision Numbers07.8.4 Complex Numbers07.8.5 Coordinates 08.0 Character String Data 08.1 National Character Sets08.2 Problems of String Equality08.2.2 Empty String versus NULL String08.3 Length 08.4 Concatenation08.5 Position08.6 Reversal 08.7 Pattern matching 08.8 Language as Data08.8.1 Syntax versus Semantics08.8.2 Computerized Language Translation 09.0 Logic and Databases 09.1 Boolean or Two Valued Logic 09.2 Multi-valued Logic09.2.2 SQL-92 and Three Valued Logic09.3 Fuzzy Sets, Logic and Math09.3.1 Fuzzy Logic operators 09.3.2 Fuzzy Math Operators 09.4 Constructionist Logic10.0 Temporal Data 10.1 Temporal Terminology10.1.1 Events10.1.2 Durations10.1.3 Periods10.1.4 Time in Databases10.2 A Short History of the Calendar 10.3 The Julian Date10.4 ISO Temporal Standards10.4.1 ISO Date Formats10.4.1.1 Calendar Date Format10.4.1.2 Ordinal Date Format Week Format 10.5 The Year 2000 Problem10.5.1 The Odometer Problem10.5.2 The Leap Year Problem10.5.3 The Millennium Problem10.5.4 Weird Dates10.5.5 Solutions10.6 ISO Time of Day 10.6.1 Local Time, lawful Time and UTC10.7 Notes on SQL-92 and Temporal Data10.7.1 Time Zones in SQL-92 10.7.2 the OVERLAPS Predicate 11.0 Textual Data 11.1 Terminology and the Basics11.2. Indexing Text11.3 Text Searching11.4 The grep() Search Tool11.5 ANSI and ISO Search Languages11.5.1 ANSI Common Command Language11.5.2 The ISO 8777 Language12.0 Exotic Data12.1 Nature of Data Manipulation12.2 Physical versus Perceptual Data12.2.1 Physical Data12.2.2 Perceptual Data 12.3 Stored versus Constructed Data12.4 SQL/MM13.0 Missing data 13.1 Types of missing Data 13.1.1 Unknown13.1.2 Not Applicable13.1.3 Missing 13.1.4 Not Classified13.1.5 Erroneous Data13.1.5.1 Illegal Results13.1.5.2 Error in Representation Limited Values13.2 The NULL in SQL14.0 Scales and Measurements14.1 Range and Granularity14.1 Scales and Measurements14.1.1. Range 14.1.2. Granularity and Precision 14.2 Types of Scales14.3 Scale Conversion14.4 Derived Units14.5 Punctuation and Standard Units14.6 General Guidelines for Using Scales in a Database15.0 Data Encoding Schemes15.1 Bad Encoding Schemes15.1.2 Characteristics of a Bad Encoding Scheme15.2 Encoding Scheme Types15.2.1 Enumeration Encoding15.2.2 Scale Encoding15.2.3 Abbreviation Encoding15.2.4 Algorithmic Encoding15.2.5 Hierarchical Encoding15.2.6 Vector Encoding15.2.6.1 Social Security Numbers 15.2.7 Concatenation Encoding15.3 General Guidelines for Designing Encoding Schemes15.3.1 Existing Encoding Standards15.3.2 Allow for Expansion15.3.3 Use Explicit Missing Values To Avoid Nulls 15.3.4 Translate Codes for the User15.3.5 Keep the Codes in the Database 16. Check Digits16.1 Error Detection versus Error Correction 16.2 Mod Functions and Casting Functions16.2.1 Mod Function16.2.2 Casting Functions16.3 Classes of Algorithms16.3.1 Weighted-Sum Algorithms16.3.2 Power-Sum Check Digits16.3.3 Bull Check Digits16.3.4 Dihedral Five Check Digit16.4 Check Digit Algorithms16.4.1 Casting Nines16.4.2 Dihedral Five Check Digits 16.4.3 Bull Function16.4.4 Power Function16.4.5 SKU Code Algorithm16.4.5 ISBN Function 16.4.6 Code 39 Algorithm17.0 The Basic Relational Model 17.1 Tables versus Files17.1.1 Tables as Entities17.1.2 Tables as Relationships 17.2 Rows versus Records17.3 Columns versus Fields17.4 Relationships among Tables within a database17.5 Codd's Rules for a Relational Database18.0 Keys 18.1 Types of keys 18.2 Intelligent Keys versus Surrogate Keys.18.2.1 Arguments against Intelligent Keys18.2.1.1 Autoincremented Surrogate Keys18.2.1.2 (MAX() + 1) Surrogate key generators18.2.1.2 Random Number Surrogate keys18.2.1.3 Surrogate Key Tables18.2.2 Arguments for Intelligent Keys18.2.2.1 Saving Space in the Database18.2.2.2 Verification of the Data Changing Surrogate Keys is Hard Getting New Keys Clustering Versus Random Distribution Standards Versus "Roll Your Own" Data Warehousing Replicated and Distributed Databases18.3 Simple versus Compound Keys18.3.1 Super keys and Overlapping Keys 18.3.2 Queries with Compound keys 19.0 Different Relational Models19.1 Chris Date = no duplicates, no NULLs19.2 E.F Codd, RM version I 19.3 E.F Codd, RM version II 19.4 SQL-92 = duplicates, one NULL19.5 Rick Snodgrass = Temporal SQL19.5 Roth & Korth = duplicates, one NULL, non-1NF Tables20.0 Basic Relational Operations20.1 Projection20.2 Restriction20.3 Computations20.3.1 Expressions Using Other Tables20.3.1 Simple Operators and Expressions 20.4 Joins20.4.1 Inner Join20.4.2 Outer Join20.4.2.1 Left Outer Join20.4.2.2 Right Outer Join20.4.2.3 Full Outer Join20.4.3 Self Join20.4.4 Union Join21.0 Transactions and Concurrency Control21.1 Sessions21.2 Transactions and ACID 21.3 Concurrency 21.3.1 The Three Phenomena 21.3.2 The Four Isolation Levels 21.4 Pessimistic Concurrency Control 21.5 Optimistic Concurrency Control 21.6 Logical Concurrency Control 21.7 Deadlock and Livelocks 22.0 Functional Dependencies22.1 Armstrong's Axioms22.2 Multi-valued Dependencies22.3 Mappings 23.0 Normalization23.1 Anomalies23.2 First Normal Form (1NF)23.2.1 Note on Repeated Groups 23.3 Second Normal Form (2NF)23.4 Third Normal Form (3NF)23.5 Boyce-Codd Normal Form (BCNF)23.6 Fourth Normal Form (4NF)23.7 Fifth Normal Form (5NF)23.8 Domain-Key Normal Form (DKNF)23.9 Functional Dependency Axioms23.10 Practical Hints for Normalization23.10.1 Case Tools for Normalization 24.0 Denormalization 24.1. Criteria for Denormalization24.2 Types of Denormalization 24.2.1 Pre-joined Tables24.2.2 Reports24.2.3 Mirrored Tables24.2.4 Table Splitting 24.2.5 Combined Tables24.2.6 Redundant Data24.2.7 Repeating Groups24.2.8 Derivable Data 24.2.9 Hierarchy Tables24.2.10. Overloaded Datatypes25.0 Metadata 25.1 Data25.1.1 Data Assets25.1.2 Data Engineering Assets25.1.3 Core Data 25.1.3 Data Management Assets25.2 Metadata Management25.2.1 Database Management Systems25.2.2 Data Dictionary25.2.3 Data Directory25.2.4 Data Encyclopedia25.2.5 Data Element Registry25.2.6 Repository25.3 Data Dictionary Environments25.3.1 Application Data Dictionary 25.3.2 Functional Area Data Dictionaries25.3.3 The ITS Data Dictionary/Registry25.4 NCITS L8 Standards25.4.1 Naming Data Elements25.4.1.1 Levels of Abstraction25.4.2 Registering Standards