Joe Celko

Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL

Perfectly intelligent programmers often struggle when forced to work with SQL. Why? Joe Celko believes the problem lies with their procedural programming mindset, which keeps them from taking full advantage of the power of declarative languages. The result is overly complex and inefficient code, not to mention lost productivity.This book will change the way you think about the problems you solve with SQL programs.. Focusing on three key table-based techniques, Celko reveals their power through detailed examples and clear explanations. As you master these techniques, you’ll find you are able to conceptualize problems as rooted in sets and solvable through declarative programming. Before long, you’ll be coding more quickly, writing more efficient code, and applying the full power of SQL

Audience
Data analysts and database developers from all backgrounds, regardless of which database technology they use; this is the market for all of Joe's other books, all our data modeling books in general as well. This includes database developers working on transactional (OLTP) systems as well as data warehouse design (OLAP systems). Unlike most of Joe's other books which are for very experienced SQL programmers who want to become gurus, this book has the widest possible audience of programmers new to SQL as well as those who are very experienced.

Paperback, 384 Pages

Published: January 2008

Imprint: Morgan Kaufmann

ISBN: 978-0-12-374137-0

Contents

  • Table of ContentsPreface xvii1 SQL Is Declarative, Not Procedural1.1 Different Programming Models1.2 Different Data Models1.2.1 Columns Are Not Fields1.2.2 Rows Are Not Records1.2.3 Tables Are Not Files 1.2.4 Relational Keys Are Not Record Locators 1.2.5 Kinds of Keys 1.2.6 Desirable Properties of Relational Keys 1.2.7 Unique But Not Invariant 1.3 Tables as Entities 1.4 Tables as Relationships1.5 Statements Are Not Procedures 1.6 Molecular, Atomic, and Subatomic Data Elements 1.6.1 Table Splitting 1.6.2 Column Splitting 1.6.3 Temporal Splitting 1.6.4 Faking Non-1NF Data 1.6.5 Molecular Data Elements 1.6.6 Isomer Data Elements 1.6.7 Validating a Molecule 2 Hardware, Data Volume, and Maintaining Databases 2.1 Parallelism 2.2 Cheap Main Storage 2.3 Solid-State Disk 2.4 Cheaper Secondary and Tertiary Storage 2.5 The Data Changed 2.6 The Mindset Has Not Changed 3 Data Access and Records 3.1 Sequential Access 3.1.1 Tape-Searching Algorithms 3.2 Indexes 3.2.1 Single-Table Indexes 3.2.2 Multiple-Table Indexes 3.2.3 Type of Indexes 3.3 Hashing 3.3.1 Digit Selection 3.3.2 Division Hashing 3.3.3 Multiplication Hashing 3.3.4 Folding 3.3.5 Table Lookups 3.3.6 Collisions 3.4 Bit Vector Indexes 3.5 Parallel Access 3.6 Row and Column Storage 3.6.1 Row-Based Storage 3.6.2 Column-Based Storage 3.7 JOIN Algorithms 3.7.1 Nested-Loop Join Algorithm 3.7.2 Sort-Merge Join Method 3.7.3 Hash Join Method 3.7.4 Shin’s Algorithm 4 Lookup Tables 4.1 Data Element Names 4.2 Multiparameter Lookup Tables 4.3 Constants Table 4.4 OTLT or MUCK Table Problems 4.5 Defi nition of a Proper Table 5 Auxiliary Tables 5.1 Sequence Table 5.1.1 Creating a Sequence Table 5.1.2 Sequence Constructor 5.1.3 Replacing an Iterative Loop 5.2 Permutations 5.2.1 Permutations via Recursion 5.2.2 Permutations via CROSS JOIN 5.3 Functions 5.3.1 Functions without a Simple Formula 5.4 Encryption via Tables 5.5 Random Numbers 5.6 Interpolation 6 Views 6.1 Mullins VIEW Usage Rules 6.1.1 Effi cient Access and Computations 6.1.2 Column Renaming 6.1.3 Proliferation Avoidance 6.1.4 The VIEW Synchronization Rule 6.2 Updatable and Read-Only VIEWs 6.3 Types of VIEWs 6.3.1 Single-Table Projection and Restriction 6.3.2 Calculated Columns 6.3.3 Translated Columns 6.3.4 Grouped VIEWs 6.3.5 UNIONed VIEWs 6.3.6 JOINs in VIEWs 6.3.7 Nested VIEWs 6.4 Modeling Classes with Tables 6.4.1 Class Hierarchies in SQL 6.4.2 Subclasses via ASSERTIONs and TRIGGERs 6.5 How VIEWs Are Handled in the Database System 6.5.1 VIEW Column List 6.5.2 VIEW Materialization 6.6 In-Line Text Expansion 6.7 WITH CHECK OPTION Clause 6.7.1 WITH CHECK OPTION as CHECK( ) Clause 6.8 Dropping VIEWs 6.9 Outdated Uses for VIEWs 6.9.1 Domain Support 6.9.2 Table Expression VIEWs 6.9.3 VIEWs for Table Level CHECK( ) Constraints 6.9.4 One VIEW per Base Table 7 Virtual Tables 7.1 Derived Tables 7.1.1 Column Naming Rules 7.1.2 Scoping Rules 7.1.3 Exposed Table Names 7.1.4 LATERAL() Clause 7.2 Common Table Expressions 7.2.1 Nonrecursive CTEs 7.2.2 Recursive CTEs 7.3 Temporary Tables 7.3.1 ANSI/ISO Standards 7.3.2 Vendors Models 7.4 The Information Schema 7.4.1 The INFORMATION_SCHEMA Declarations 7.4.2 A Quick List of VIEWS and Their Purposes 7.4.3 DOMAIN Declarations 7.4.4 Defi nition Schema 7.4.5 INFORMATION_SCHEMA Assertions 8 Complicated Functions via Tables 8.1 Functions without a Simple Formula 8.1.1 Encryption via Tables 8.2 Check Digits via Tables 8.2.1 Check Digits Defi ned 8.2.2 Error Detection versus Error Correction 8.3 Classes of Algorithms 8.3.1 Weighted-Sum Algorithms 8.3.2 Power-Sum Check Digits 8.3.3 Luhn Algorithm 8.3.4 Dihedral Five Check Digit 8.4 Declarations, Not Functions, Not Procedures 8.5 Data Mining for Auxiliary Tables 9 Temporal Tables 9.1 The Nature of Time 9.1.1 Durations, Not Chronons 9.1.2 Granularity 9.2 The ISO Half-Open Interval Model 9.2.1 Use of NULL for “Eternity” 9.2.2 Single Timestamp Tables 9.2.3 Overlapping Intervals 9.3 State Transition Tables 9.4 Consolidating Intervals 9.4.1 Cursors and Triggers 9.4.2 OLAP Function Solution 9.4.3 CTE Solution 9.5 Calendar Tables 9.5.1 Day of Week via Tables 9.5.2 Holiday Lists 9.5.3 Report Periods 9.5.4 Self-Updating Views 9.6 History Tables 9.6.1 Audit Trails 10 Scrubbing Data with Non-1NF Tables 10.1 Repeated Groups 10.1.1 Sorting within a Repeated Group 10.2 Designing Scrubbing Tables 10.3 Scrubbing Constraints 10.4 Calendar Scrubs 10.4.1 Special Dates 10.5 String Scrubbing 10.6 Sharing SQL Data 10.6.1 A Look at Data Evolution 10.6.2 Databases 10.7 Extract, Transform, and Load Products 10.7.1 Loading Data Warehouses 10.7.2 Doing It All in SQL 10.7.3 Extract, Load, and then Transform 11 Thinking in SQL 11.1 Warm-up Exercises 11.1.1 The Whole and Not the Parts 11.1.2 Characteristic Functions 11.1.3 Locking into a Solution Early 11.2 Heuristics 11.2.1 Put the Specification into a Clear Statement 11.2.2 Add the Words “Set of All…” in Front of the Nouns 11.2.3 Remove Active Verbs from the Problem Statement 11.2.4 You Can Still Use Stubs 11.2.5 Do Not Worry about Displaying the Data 11.2.6 Your First Attempts Need Special Handling 11.2.7 Do Not Be Afraid to Throw Away Your First Attempts at DDL 11.2.8 Save Your First Attempts at DML 11.2.9 Do Not Think with Boxes and Arrows 11.2.10 Draw Circles and Set Diagrams 11.2.11 Learn Your Dialect 11.2.12 Imagine that Your WHERE Clause Is “Super Amoeba” 11.2.13 Use the Newsgroups, Blogs, and Internet 11.3 Do Not Use BIT or BOOLEAN Flags in SQL 11.3.1 Flags Are at the Wrong Level 11.3.2 Flags Confuse Proper Attributes 12 Group Characteristics 12.1 Grouping Is Not Equality 12.2 Using Groups without Looking Inside 12.2.1 Semiset-Oriented Approach 12.2.2 Grouped Solutions 12.2.3 Aggregated Solutions 12.3 Grouping over Time 12.3.1 Piece-by-Piece Solution 12.3.2 Data as a Whole Solution 12.4 Other Tricks with HAVING Clauses 12.5 Groupings, Rollups, and Cubes 12.5.1 GROUPING SET Clause 12.5.2 The ROLLUP Clause 12.5.3 The CUBE Clause 12.5.4 A Footnote about Super Grouping 12.6 The WINDOW Clause 12.6.1 The PARTITION BY Clause 12.6.2 The ORDER BY Clause 12.6.3 The RANGE Clause 12.6.4 Programming Tricks 13 Turning Specifications into Code 13.1 Signs of Bad SQL 13.1.1 Is the Code Formatted Like Another Language? 13.1.2 Assuming Sequential Access 13.1.3 Cursors 13.1.4 Poor Cohesion 13.1.5 Table-Valued Functions 13.1.6 Multiple Names for the Same Data Element 13.1.7 Formatting in the Database 13.1.8 Keeping Dates in Strings 13.1.9 BIT Flags, BOOLEAN, and Other Computed Columns 13.1.10 Attribute Splitting Across Columns 13.1.11 Attribute Splitting Across Rows 13.1.12 Attribute Splitting Across Tables 13.2 Methods of Attack 13.2.1 Cursor-Based Solution 13.2.2 Semiset-Oriented Approach 13.2.3 Pure Set-Oriented Approach 13.2.4 Advantages of Set-Oriented Code 13.3 Translating Vague Specifications 13.3.1 Go Back to the DDL 13.3.2 Changing Specifications 14 Using Procedure and Function Calls 14.1 Clearing out Spaces in a String 14.1.1 Procedural Solution #1 14.1.2 Functional Solution #1 14.1.3 Functional Solution #2 14.2 The PRD( ) Aggregate Function 14.3 Long Parameter Lists in Procedures and Functions 14.3.1 The IN( ) Predicate Parameter Lists 15 Numbering Rows 15.1 Procedural Solutions 15.1.1 Reordering on a Numbering Column 15.2 OLAP Functions 15.2.1 Simple Row Numbering 15.2.2 RANK( ) and DENSE_RANK( ) 15.3 Sections 16 Keeping Computed Data 16.1 Procedural Solution 16.2 Relational Solution 16.3 Other Kinds of Computed Data 17 Triggers for Constraints 17.1 Triggers for Computations 17.2 Complex Constraints via CHECK( ) and CASE Constraints 17.3 Complex Constraints via VIEWs 17.3.1 Set-Oriented Solutions 17.4 Operations on VIEWs as Constraints 17.4.1 The Basic Three Operations 17.4.2 WITH CHECK OPTION Clause 17.4.3 WITH CHECK OPTION as CHECK( ) clause 17.4.4 How VIEWs Behave 17.4.5 UNIONed VIEWs 17.4.6 Simple INSTEAD OF Triggers 17.4.7 Warnings about INSTEAD OF Triggers 18 Procedural and Data Driven Solutions 18.1 Removing Letters in a String 18.1.1 The Procedural Solution 18.1.2 Pure SQL Solution 18.1.3 Impure SQL Solution 18.2 Two Approaches to Sudoku 18.2.1 Procedural Approach 18.2.2 Data-Driven Approach 18.2.3 Handling the Given Digits 18.3 Data Constraint Approach 18.4 Bin Packing Problems 18.4.1 The Procedural Approach 18.4.2 The SQL Approach 18.5 Inventory Costs over Time 18.5.1 Inventory UPDATE Statements 18.5.2 Bin Packing Returns Index

Advertisement

advert image