GPU Computing Gems Emerald Edition

1st Edition - January 13, 2011
  • Editor-in-Chief: Wen-mei Hwu
  • Hardcover ISBN: 9780123849885
  • eBook ISBN: 9780123849892

Description

GPU Computing Gems Emerald Edition offers practical techniques in parallel computing using graphics processing units (GPUs) to enhance scientific research. The first volume in Morgan Kaufmann's Applications of GPU Computing Series, this book offers the latest insights and research in computer vision, electronic design automation, and emerging data-intensive applications. It also covers life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, and video and image processing. This book is intended to help those facing the challenge of programming systems to use GPUs effectively to reach efficiency and performance goals. It offers developers a window into diverse application areas and the opportunity to gain insights from others' algorithm work that they may apply to their own projects. Readers will learn from leading researchers in parallel programming, who have gathered their solutions and experience in one volume under the guidance of expert area editors. Each chapter is written to be accessible to researchers from other domains, allowing knowledge to cross-pollinate across the GPU spectrum. Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution. The insights and ideas in the book, as well as its practical hands-on skills, can be put to use immediately. Computer programmers, software engineers, hardware engineers, and computer science students will find this volume a helpful resource. For useful source code discussed throughout the book, the editors invite readers to the following website:

Key Features

  • Covers the breadth of industry, from scientific simulation and electronic design automation to audio/video processing, medical imaging, computer vision, and more
  • Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution
  • Offers insights and ideas as well as practical "hands-on" skills you can immediately put to use

Readership

Computer programmers, software engineers, hardware engineers, computer science students

Table of Contents

  • Editors, Reviewers, and Authors

    Introduction

    Introduction

    Chapter 1. GPU-Accelerated Computation and Interactive Display of Molecular Orbitals

    1.1. Introduction, Problem Statement, and Context

    1.2. Core Method

    1.3. Algorithms, Implementations, and Evaluations

    1.4. Final Evaluation

    1.5. Future Directions

    Chapter 2. Large-Scale Chemical Informatics on GPUs

    2.1. Introduction, Problem Statement, and Context

    2.2. Core Methods

    2.3. Gaussian Shape Overlay: Parallelization and Arithmetic Optimization

    2.4. LINGO: Algorithmic Transformation and Memory Optimization

    2.5. Final Evaluation

    2.6. Future Directions

    Chapter 3. Dynamical Quadrature Grids

    3.1. Introduction

    3.2. Core Method

    3.3. Implementation

    3.4. Performance Improvement

    3.5. Future Work

    Chapter 4. Fast Molecular Electrostatics Algorithms on GPUs

    4.1. Introduction, Problem Statement, and Context

    4.2. Core Method

    4.3. Algorithms, Implementations, and Evaluations

    4.4. Final Evaluation

    4.5. Future Directions

    Chapter 5. Quantum Chemistry

    5.1. Problem Statement

    5.2. Core Technology and Algorithm

    5.3. The Key Insight on the Implementation—the Choice of Building Blocks

    5.4. Final Evaluation and Benefits

    5.5. Conclusions and Future Directions

    Chapter 6. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm

    6.1. Introduction, Problem Statement, and Context

    6.2. Core Methods

    6.3. Algorithms and Implementations

    6.4. Evaluation and Validation of Results, Total Benefits, and Limitations

    6.5. Future Directions

    Chapter 7. Leveraging the Untapped Computation Power of GPUs

    7.1. Background and Problem Statement

    7.2. Flux Calculation and Aggregation

    7.3. The GRASSY Platform

    7.4. Initial Testing

    7.5. Impact and Future Directions

    Chapter 8. Black Hole Simulations with CUDA

    8.1. Introduction

    8.2. The Post-Newtonian Approximation

    8.3. Numerical Algorithm

    8.4. GPU Implementation

    8.5. Performance Results

    8.6. GPU Supercomputing Clusters

    8.7. Statistical Results for Black Hole Inspirals

    8.8. Conclusion

    Chapter 9. Treecode and Fast Multipole Method for N-Body Simulation with CUDA

    9.1. Introduction

    9.2. Fast N-Body Simulation

    9.3. CUDA Implementation of the Fast N-Body Algorithms

    9.4. Improvements of Performance

    9.5. Detailed Description of the GPU Kernels

    9.6. Overview of Advanced Techniques

    9.7. Conclusions

    Chapter 10. Wavelet-Based Density Functional Theory Calculation on Massively Parallel Hybrid Architectures

    10.1. Introduction, Problem Statement, and Context

    10.2. Core Method

    10.3. Algorithms, Implementations, and Evaluations

    10.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations

    10.5. Conclusions and Future Directions

    Introduction

    Chapter 11. Accurate Scanning of Sequence Databases with the Smith-Waterman Algorithm

    11.1. Introduction, Problem Statement, and Context

    11.2. Core Method

    11.3. CUDA Implementation of the SW Algorithm for Identification of Homologous Proteins

    11.4. Discussion

    11.5. Final Evaluation

    Chapter 12. Massive Parallel Computing to Accelerate Genome-Matching

    12.1. Introduction, Problem Statement, and Context

    12.2. Core Methods

    12.3. Algorithms, Implementations, and Evaluations

    12.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations

    12.5. Future Directions

    Chapter 13. GPU-Supercomputer Acceleration of Pattern Matching

    13.1. Introduction, Problem Statement, and Context

    13.2. Core Method

    13.3. Algorithms, Implementations, and Evaluations

    13.4. Final Evaluation

    13.5. Future Direction

    Chapter 14. GPU Accelerated RNA Folding Algorithm

    14.1. Problem Statement

    14.2. Core Method

    14.3. Algorithms, Implementations, and Evaluations

    14.4. Final Evaluation

    14.5. Future Directions

    Chapter 15. Temporal Data Mining for Neuroscience

    15.1. Introduction

    15.2. Core Methodology

    15.3. GPU Parallelization: Algorithms and Implementations

    15.4. Experimental Results

    15.5. Discussion

    Introduction

    Chapter 16. Parallelization Techniques for Random Number Generators

    16.1. Introduction

    16.2. L'Ecuyer's Multiple Recursive Generator MRG32k3a

    16.3. Sobol Generator

    16.4. Mersenne Twister MT19937

    16.5. Performance Benchmarks

    Chapter 17. Monte Carlo Photon Transport on the GPU

    17.1. Physics of Photon Transport

    17.2. Photon Transport on the GPU

    17.3. The Complete System

    17.4. Results and Evaluation

    17.5. Future Directions

    Chapter 18. High-Performance Iterated Function Systems

    18.1. Problem Statement and Mathematical Background

    18.2. Core Technology

    18.3. Implementation

    18.4. Final Evaluation

    18.5. Conclusion

    Introduction

    Chapter 19. Large-Scale Machine Learning

    19.1. Introduction

    19.2. Core Technology

    19.3. GPU Algorithm and Implementation

    19.4. Improvements of Performance

    19.5. Conclusions and Future Work

    Chapter 20. Multiclass Support Vector Machine

    20.1. Introduction, Problem Statement, and Context

    20.2. Core Method

    20.3. Algorithms, Implementations, and Evaluations

    20.4. Final Evaluation

    20.5. Future Direction

    Chapter 21. Template-Driven Agent-Based Modeling and Simulation with CUDA

    21.1. Introduction, Problem Statement, and Context

    21.2. Final Evaluation and Validation of Results

    21.3. Conclusions, Benefits and Limitations, and Future Work

    Chapter 22. GPU-Accelerated Ant Colony Optimization

    22.1. Introduction, Problem Statement, and Context

    22.2. Core Method

    22.3. Algorithms, Implementations, and Evaluations

    22.4. Final Evaluation

    22.5. Future Direction

    Introduction

    Chapter 23. High-Performance Gate-Level Simulation with GP-GPUs

    23.1. Introduction

    23.2. Simulator Overview

    23.3. Compilation and Simulation

    23.4. Experimental Results

    23.5. Future Directions

    Chapter 24. GPU-Based Parallel Computing for Fast Circuit Optimization

    24.1. Introduction, Problem Statement, and Context

    24.2. Core Method

    24.3. Algorithms, Implementations, and Evaluations

    24.4. Final Evaluation

    24.5. Future Direction

    Introduction

    Chapter 25. Lattice Boltzmann Lighting Models

    25.1. Introduction, Problem Statement, and Context

    25.2. Core Methods

    25.3. Algorithms, Implementation, and Evaluation

    25.4. Final Evaluation

    25.5. Future Directions

    25.6. Derivation of the Diffusion Equation

    Chapter 26. Path Regeneration for Random Walks

    26.1. Introduction

    26.2. Path Tracing as Case Study

    26.3. Random Walks in Path Tracing

    26.4. Implementation Details

    26.5. Results

    26.6. Discussion

    Chapter 27. From Sparse Mocap to Highly Detailed Facial Animation

    27.1. System Overview

    27.2. Background

    27.3. Core Technology and Algorithms

    27.4. Future Directions

    Chapter 28. A Programmable Graphics Pipeline in CUDA for Order-Independent Transparency

    28.1. Introduction, Problem Statement, and Context

    28.2. Core Method

    28.3. Algorithms, Implementations, and Evaluations

    28.4. Final Evaluation

    28.5. Future Direction

    Introduction

    Chapter 29. Fast Graph Cuts for Computer Vision

    29.1. Introduction, Problem Statement, and Context

    29.2. Core Method

    29.3. Algorithms, Implementations, and Evaluations

    29.4. Final Evaluation and Validation of Results

    29.5. Multilabel Graph Cuts

    Chapter 30. Visual Saliency Model on Multi-GPU

    30.1. Introduction

    30.2. Visual Saliency Model

    30.3. GPU Implementation

    30.4. Results

    30.5. Conclusion

    Chapter 31. Real-Time Stereo on GPGPU Using Progressive Multiresolution Adaptive Windows

    31.1. Introduction, Problem Statement, and Context

    31.2. Core Method

    Chapter 32. Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU

    32.1. Introduction

    32.2. Methods

    32.3. Implementation

    32.4. Results and Discussion

    32.5. Conclusion and Future Work

    Chapter 33. Haar Classifiers for Object Detection with CUDA

    33.1. Introduction

    33.2. Viola-Jones Object Detection Retrospective

    33.3. Object Detection Pipeline with NVIDIA CUDA

    33.4. Benchmarking and Implementation Details

    33.5. Future Direction

    33.6. Conclusion

    Introduction

    Chapter 34. Experiences on Image and Video Processing with CUDA and OpenCL

    34.1. Introduction, Problem Statement, and Background

    34.2. Core Technology or Algorithm

    34.3. Key Insights from Implementation and Evaluation

    34.4. Final Evaluation

    Chapter 35. Connected Component Labeling in CUDA

    35.1. Introduction

    35.2. Core Algorithm

    35.3. CUDA Algorithm and Implementation

    35.4. Final Evaluation and Results

    Chapter 36. Image De-Mosaicing

    36.1. Introduction, Problem Statement, and Context

    36.2. Core Method

    36.3. Algorithms, Implementations, and Evaluations

    36.4. Final Evaluation

    Introduction

    Chapter 37. Efficient Automatic Speech Recognition on the GPU

    37.1. Introduction, Problem Statement, and Context

    37.2. Core Methods

    37.3. Algorithms, Implementations, and Evaluations

    37.4. Conclusion and Future Directions

    Chapter 38. Parallel LDPC Decoding

    38.1. Introduction, Problem Statement, and Context

    38.2. Core Technology

    38.3. Algorithms, Implementations, and Evaluations

    38.4. Final Evaluation

    38.5. Future Directions

    Chapter 39. Large-Scale Fast Fourier Transform

    39.1. Introduction

    39.2. Memory Hierarchy of GPU Clusters

    39.3. Large-Scale Fast Fourier Transform

    39.4. Algebraic Manipulation of Array Dimensions

    39.5. Performance Results

    39.6. Conclusion and Future Work

    Introduction

    Chapter 40. GPU Acceleration of Iterative Digital Breast Tomosynthesis

    40.1. Introduction

    40.2. Digital Breast Tomosynthesis

    40.3. Accelerating Iterative DBT using GPUs

    40.4. Conclusions

    Chapter 41. Parallelization of Katsevich CT Image Reconstruction Algorithm on Generic Multi-Core Processors and GPGPU

    41.1. Introduction, Problem, and Context

    41.2. Core Methods

    41.3. Algorithms, Implementations, and Evaluations

    41.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations

    41.5. Related Work

    41.6. Future Directions

    41.7. Summary

    Chapter 42. 3-D Tomographic Image Reconstruction from Randomly Ordered Lines with CUDA

    42.1. Introduction

    42.2. Core Methods

    42.3. Implementation

    42.4. Evaluation and Validation of Results, Total Benefits, and Limitations

    42.5. Future Directions

    Chapter 43. Using GPUs to Learn Effective Parameter Settings for GPU-Accelerated Iterative CT Reconstruction Algorithms

    43.1. Introduction, Problem Statement, and Context

    43.2. Core Method(s)

    43.3. Algorithms, Implementations, and Evaluations

    43.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations

    43.5. Future Directions

    Chapter 44. Using GPUs to Accelerate Advanced MRI Reconstruction with Field Inhomogeneity Compensation

    44.1. Introduction

    44.2. Core Method: Advanced Image Reconstruction Toolbox for MRI

    44.3. MRI Reconstruction Algorithms and Implementation on GPUs

    44.4. Final Results and Evaluation

    44.5. Conclusion and Future Directions

    Chapter 45. ℓ1 Minimization in ℓ1-SPIRiT Compressed Sensing MRI Reconstruction

    45.1. Introduction, Problem Statement, and Context

    45.2. Core Methods (High Level Description)

    45.3. Algorithms, Implementations, and Evaluations (Detailed Description)

    45.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations

    45.5. Discussion and Conclusion

    Chapter 46. Medical Image Processing Using GPU-Accelerated ITK Image Filters

    46.1. Introduction

    46.2. Core Methods

    46.3. Implementation

    46.4. Results

    46.5. Future Directions

    46.6. Acknowledgments

    Chapter 47. Deformable Volumetric Registration Using B-Splines

    47.1. Introduction

    47.2. An Overview of B-Spline Registration

    47.3. Implementation Details

    47.4. Results

    47.5. Conclusions

    Chapter 48. Multiscale Unbiased Diffeomorphic Atlas Construction on Multi-GPUs

    48.1. Introduction, Problem Statement, and Context

    48.2. Core Methods

    48.3. Algorithms, Implementations, and Evaluations

    48.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations

    48.5. Future Directions

    Chapter 49. GPU-Accelerated Brain Connectivity Reconstruction and Visualization in Large-Scale Electron Micrographs

    49.1. Introduction

    49.2. Core Methods

    49.3. Implementation

    49.4. Results

    49.5. Future Directions

    Chapter 50. Fast Simulation of Radiographic Images Using a Monte Carlo X-Ray Transport Algorithm Implemented in CUDA

    50.1. Introduction, Problem Statement, and Context

    50.2. Core Methods

    50.3. Algorithms, Implementations, and Evaluations

    50.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations

    50.5. Future Directions

    Index

Product details

  • No. of pages: 886
  • Language: English
  • Copyright: © Morgan Kaufmann 2011
  • Published: January 13, 2011
  • Imprint: Morgan Kaufmann

About the Editor-in-Chief

Wen-mei Hwu

Wen-mei W. Hwu is a professor and holds the Sanders-AMD Endowed Chair in the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. His research interests are in the area of architecture, implementation, compilation, and algorithms for parallel computing. He is the chief scientist of the Parallel Computing Institute and director of the IMPACT research group (www.impact.crhc.illinois.edu). He is a co-founder and CTO of MulticoreWare. For his contributions in research and teaching, he received the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the Tau Beta Pi Daniel C. Drucker Eminent Faculty Award, the ISCA Influential Paper Award, the IEEE Computer Society B. R. Rau Award, and the Distinguished Alumni Award in Computer Science of the University of California, Berkeley. He is a Fellow of IEEE and ACM. He directs the UIUC CUDA Center of Excellence and serves as one of the principal investigators of the NSF Blue Waters petascale computer project. Dr. Hwu received his Ph.D. degree in Computer Science from the University of California, Berkeley.

Affiliations and Expertise

CTO, MulticoreWare; Professor specializing in compiler design, computer architecture, microarchitecture, and parallel processing, University of Illinois at Urbana-Champaign, USA