Structured Parallel Programming - 1st Edition - ISBN: 9780124159938, 9780123914439

Structured Parallel Programming

1st Edition

Patterns for Efficient Computation

Authors: Michael McCool James Reinders Arch Robison
Paperback ISBN: 9780124159938
eBook ISBN: 9780123914439
Imprint: Morgan Kaufmann
Published Date: 25th June 2012
Page Count: 432
Tax/VAT will be calculated at check-out
Compatible Not compatible
VitalSource PC, Mac, iPhone & iPad Amazon Kindle eReader
ePub & PDF Apple & PC desktop. Mobile devices (Apple & Android) Amazon Kindle eReader
Mobi Amazon Kindle eReader Anything else

Institutional Access


Structured Parallel Programming offers the simplest way for developers to learn patterns for high-performance parallel programming. Written by parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders, this book explains how to design and implement maintainable and efficient parallel algorithms using a composable, structured, scalable, and machine-independent approach to parallel computing. It presents both theory and practice, and provides detailed concrete examples using multiple programming models.

The examples in this book are presented using two of the most popular and cutting edge programming models for parallel programming: Threading Building Blocks, and Cilk Plus. These architecture-independent models enable easy integration into existing applications, preserve investments in existing code, and speed the development of parallel applications. Examples from realistic contexts illustrate patterns and themes in parallel algorithm design that are widely applicable regardless of implementation technology.

Software developers, computer programmers, and software architects will find this book extremely helpful.

Key Features

  • The patterns-based approach offers structure and insight that developers can apply to a variety of parallel programming models
  • Develops a composable, structured, scalable, and machine-independent approach to parallel computing
  • Includes detailed examples in both Cilk Plus and the latest Threading Building Blocks, which support a wide variety of computers


Software Developers, Computer Programmers, Software Architects

Table of Contents




Chapter 1. Introduction

1.1 Think Parallel

1.2 Performance

1.3 Motivation: Pervasive Parallelism

1.4 Structured Pattern-Based Programming

1.5 Parallel Programming Models

1.6 Organization of this Book

1.7 Summary

Chapter 2. Background

2.1 Vocabulary and Notation

2.2 Strategies

2.3 Mechanisms

2.4 Machine Models

2.5 Performance Theory

2.6 Pitfalls

2.7 Summary

PART I. Patterns

Chapter 3. Patterns

3.1 Nesting Pattern

3.2 Structured Serial Control Flow Patterns

3.3 Parallel Control Patterns

3.4 Serial Data Management Patterns

3.5 Parallel Data Management Patterns

3.6 Other Parallel Patterns

3.7 Non-Deterministic Patterns

3.8 Programming Model Support for Patterns

3.9 Summary

Chapter 4. Map

4.1 Map

4.2 Scaled Vector Addition (SAXPY)

4.3 Mandelbrot

4.4 Sequence of Maps versus Map of Sequence

4.5 Comparison of Parallel Models

4.6 Related Patterns

4.7 Summary

Chapter 5. Collectives

5.1 Reduce

5.2 Fusing Map and Reduce

5.3 Dot Product

5.4 Scan

5.5 Fusing Map and Scan

5.6 Integration

5.7 Summary

Chapter 6. Data Reorganization

6.1 Gather

6.2 Scatter

6.3 Converting Scatter to Gather

6.4 Pack

6.5 Fusing Map and Pack

6.6 Geometric Decomposition and Partition

6.7 Array of Structures vs. Structures of Arrays

6.8 Summary

Chapter 7. Stencil and Recurrence

7.1 Stencil

7.2 Implementing Stencil with Shift

7.3 Tiling Stencils for Cache

7.4 Optimizing Stencils for Communication

7.5 Recurrence

7.6 Summary

Chapter 8. Fork–Join

8.1 Definition

8.2 Programming Model Support for Fork–Join

8.3 Recursive Implementation of Map

8.4 Choosing Base Cases

8.5 Load Balancing

8.6 Complexity of Parallel Divide-and-Conquer

8.7 Karatsuba Multiplication of Polynomials

8.8 Cache Locality and Cache-Oblivious Algorithms

8.9 Quicksort

8.10 Reductions and Hyperobjects

8.11 Implementing Scan with Fork–Join

8.12 Applying Fork–Join to Recurrences

8.13 Summary

Chapter 9. Pipeline

9.1 Basic Pipeline

9.2 Pipeline with Parallel Stages

9.3 Implementation of a Pipeline

9.4 Programming Model Support for Pipelines

9.5 More General Topologies

9.6 Mandatory versus Optional Parallelism

9.7 Summary

PART II. Examples

Chapter 10. Forward Seismic Simulation

10.1 Background

10.2 Stencil Computation

10.3 Impact of Caches on Arithmetic Intensity

10.4 Raising Arithmetic Intensity with Space–Time Tiling

10.5 Cilk Plus Code

10.6 ArBB Implementation

10.7 Summary

Chapter 11. K-Means Clustering

11.1 Algorithm

11.2 K-Means with Cilk Plus

11.3 K-Means with TBB

11.4 Summary

Chapter 12. Bzip2 Data Compression

12.1 The Bzip2 Algorithm

12.2 Three-Stage Pipeline Using TBB

12.3 Four-Stage Pipeline Using TBB

12.4 Three-Stage Pipeline Using Cilk Plus

12.5 Summary

Chapter 13. Merge Sort

13.1 Parallel Merge

13.2 Parallel Merge Sort

13.3 Summary

Chapter 14. Sample Sort

14.1 Overall Structure

14.2 Choosing the Number of Bins

14.3 Binning

14.4 Repacking and Subsorting

14.5 Performance Analysis of Sample Sort

14.6 For C++ Experts

14.7 Summary

Chapter 15. Cholesky Factorization

15.1 Fortran Rules!

15.2 Recursive Cholesky Decomposition

15.3 Triangular Solve

15.4 Symmetric Rank Update

15.5 Where is the Time Spent?

15.6 Summary

APPENDIX A. Further Reading




APPENDIX E. Glossary




No. of pages:
© Morgan Kaufmann 2012
Morgan Kaufmann
eBook ISBN:
Paperback ISBN:

About the Author

Michael McCool

Michael McCool has research and application experience in the areas of data mining, computer graphics (specifically sampling, rasterization, texture hardware, antialiasing,shading, illumination, and visualization), medical imaging, signal and image processing, financial analysis,and languages and programming platforms for high productivity parallel computing. In order to commercialize research work into many-core computing platforms done while he was a professor at the University of Waterloo,in 2004 he co-founded RapidMind, which in 2009 was acquired by Intel. Currently he is a Software Architect with Intel working on Array Building Blocks and an Adjunct Associate Professor with the University of Waterloo. In addition to his university teaching, he has presented tutorials at Eurographics, SIGGRAPH, and SC on graphics and/or parallel computing.

Affiliations and Expertise

Software Architect, Intel Corporation

James Reinders

James Reinders is a senior engineer who joined Intel Corporation in 1989 and has contributed to projects including the world’s first TeraFLOP supercomputer (ASCI Red), as well as compilers and architecture work for a number of Intel processors and parallel systems. James has been a driver behind the development of Intel as a major provider of software development products, and serves as their chief software evangelist. James has published numerous articles, contributed to several books and is widely interviewed on parallelism. James has managed software development groups, customer service and consulting teams, business development and marketing teams. James is sought after to keynote on parallel programming, and is the author/co-author of three books currently in print including Structured Parallel Programming, published by Morgan Kaufmann in 2012.

Affiliations and Expertise

Director and Programming Model Architect, Intel Corporation

Arch Robison

Arch Robison is the architect of Threading Building Blocks.He was the lead developer for KAI C++, which established the state of the art for C++ optimization in the ‘90s. He has 12 software patents, 3 award winning entries in the International Obfuscated C Code Contest, and an Erdös number of 3. He is also the (uncredited) author of the TBB Tutorial and Design Patterns documents, as well as most of the TBB Reference manual, and wrote a chapter of the Intel Multi-core Programming book. He has presented TBB tutorials at OSCON, PADTAD, and HPCC07, and invited talks on TBB at LCPC, Texas A&M, U. Maryland, Indiana U., MIT, and UIUC. Finally, he has presented short classes (three 75 minute sessions) on parallel programming at UIUC.

Affiliations and Expertise

Senior Principal Engineer, Intel Corporation


Intel Recommended Reading List for Developers, 2nd Half 2013 – Books for Software Developers, Intel
Intel Recommended Reading List for Developers, 1st Half 2014 – Books for Software Developers, Intel


"I've been dreaming for a while of a modern accessible book that I could recommend to my threading-deprived colleagues and assorted enquirers to get them up to speed with the core concepts of multithreading as well as something that covers all the major current interesting implementations. Finally I have that book."

— Martin Watt, Principal Engineer, Dreamworks Animation