CUDA Application Design and Development

As the computer industry retools to leverage massively parallel graphics processing units (GPUs), this book is designed to meet the needs of working software developers who need to understand GPU programming with CUDA and increase efficiency in their projects. CUDA Application Design and Development starts with an introduction to parallel computing concepts for readers with no previous parallel experience, and focuses on issues of immediate importance to working software developers: achieving high performance, maintaining competitiveness, analyzing CUDA benefits versus costs, and determining application lifespan.

The book then details the thought behind CUDA and teaches how to create, analyze, and debug CUDA applications. Throughout, the focus is on software engineering issues: how to use CUDA in the context of existing application code, with existing compilers, languages, software tools, and industry-standard API libraries.

Using an approach refined in a series of well-received articles at Dr. Dobb's Journal, author Rob Farber takes the reader step-by-step from fundamentals to implementation, moving from language theory to practical coding.

Audience

Software engineers, programmers, hardware engineers, advanced students

Paperback, 336 Pages

Published: October 2011

Imprint: Morgan Kaufmann

ISBN: 978-0-12-388426-8

Reviews

  • The book by Rob Farber on CUDA Application Design and Development is required reading for anyone who wants to understand and efficiently program CUDA for scientific and visual programming. It provides a hands-on exposure to the details in a readable and easy to understand form. Jack Dongarra, Innovative Computing Laboratory, EECS Department, University of Tennessee

    GPUs have the potential to take computational simulations to new levels of scale and detail. Many scientists are already realising these benefits, tackling larger and more complex problems that are not feasible on conventional CPU-based systems. This book provides the tools and techniques for anyone wishing to join these pioneers, in an accessible though thorough text that a budding CUDA programmer would do well to keep close to hand. Dr. George Beckett, EPCC, University of Edinburgh

    With his book, Farber takes us on a journey to the exciting world of programming multi-core processor machines with CUDA. Farber's pragmatic approach is effective in guiding the reader across challenges and their solutions. Farber's broader presentation of parallel programming with CUDA ranging from CUDA in Cloud and Cluster environments to CUDA for real problems and applications helps the reader learn about the unique opportunities this parallel programming language can offer to the scientific community. This book is definitely a must for students, teachers, and developers! Michela Taufer, Assistant Professor, Department of Computer and Information Sciences, University of Delaware

    Rob Farber has written an enlightening and accessible book on the application to CUDA for real research tasks, with an eye to developing scalable and distributed GPU applications.  He supplies clear and usable code examples combined with insight about _why_ one should use a particular approach.  This is an excellent book filled with practical advice for experienced CUDA programmers and ground-up guidance for beginners wondering if CUDA can accelerate their time to solution. Paul A. Navrátil, Manager, Visualization Software, Texas Advanced Computing Center

    The book provides a solid introduction to the CUDA programming language starting with the basics and progressively exposing the reader to advanced concepts through the well annotated implementation of real-world applications. It makes a first-rate presentation of CUDA, its use in the implementation of portable and efficient applications and the underlying architecture of GPGPU/CPU systems with particular emphasis on memory hierarchies. This is complemented by a thorough presentation both of the CUDA Tool Suite and of techniques for the parallelisation of applications. Farber's book is a valuable addition to the bookshelves of both the advanced and novice CUDA programmer. Francis Wray, Independent Consultant and Visiting Professor at the Faculty of Computing, Information Systems and Mathematics at the University of Kingston

    At a brisk pace, "CUDA Application Design and Development" will take one from the basics of CUDA programming to the level where real-time video processing becomes a stroll in the park. Along the way, the reader can get a clear understanding of how the hybrid CPU-GPU computing idea can be capitalized on, and how a 500-GPU configuration can be used in large scale machine learning problems.  Wasting no time on obscure issues of little relevance, the book provides an excellent account of the CUDA execution model, memory access issues, opportunities to increase parallelism in a program, and how advanced profiling can squeeze performance out of a code.  Rob provides a snapshot of everything that is relevant in CUDA based GPU computing in a style honed through a long series of Dr. Dobb’s articles that have delighted scores of CUDA programmers.  His followers will be delighted once again. Dan Negrut, Associate Professor, University of Wisconsin-Madison, NVIDIA CUDA Fellow


Contents

  • CHAPTER 1 First Programs and How to Think in CUDA

    Source Code and Wiki

    Distinguishing CUDA from Conventional Programming with a Simple Example

    Choosing a CUDA API

    Some Basic CUDA Concepts

    Understanding Our First Runtime Kernel

    Three Rules of GPGPU Programming

    Big-O Considerations and Data Transfers

    CUDA and Amdahl’s Law

    Data and Task Parallelism

    Hybrid Execution: Using Both CPU and GPU Resources

    Regression Testing and Accuracy

    Silent Errors

    Introduction to Debugging

    UNIX Debugging

    Windows Debugging with Parallel Nsight

    Summary

    CHAPTER 2 CUDA for Machine Learning and Optimization

    Modeling and Simulation

    Machine Learning and Neural Networks

    XOR: An Important Nonlinear Machine-Learning Problem

    Performance Results on XOR

    Performance Discussion

    Summary

    The C++ Nelder-Mead Template

    CHAPTER 3 The CUDA Tool Suite: Profiling a PCA/NLPCA Functor

    PCA and NLPCA

    Obtaining Basic Profile Information

    Gprof: A Common UNIX Profiler

    The NVIDIA Visual Profiler: Computeprof

    Parallel Nsight for Microsoft Visual Studio

    Tuning and Analysis Utilities (TAU)

    Summary

    CHAPTER 4 The CUDA Execution Model

    GPU Architecture Overview

    Warp Scheduling and TLP

    ILP: Higher Performance at Lower Occupancy

    Little’s Law

    CUDA Tools to Identify Limiting Factors

    Summary

    CHAPTER 5 CUDA Memory

    The CUDA Memory Hierarchy

    GPU Memory

    L2 Cache

    L1 Cache

    CUDA Memory Types

    Global Memory

    Summary

    CHAPTER 6 Efficiently Using GPU Memory

    Reduction

    Utilizing Irregular Data Structures

    Sparse Matrices and the CUSP Library

    Graph Algorithms

    SoA, AoS, and Other Structures

    Tiles and Stencils

    Summary

    CHAPTER 7 Techniques to Increase Parallelism

    CUDA Contexts Extend Parallelism

    Streams and Contexts

    Out-of-Order Execution with Multiple Streams

    Tying Data to Computation

    Summary

    CHAPTER 8 CUDA for All GPU and CPU Applications

    Pathways from CUDA to Multiple Hardware Backends

    Accessing CUDA from Other Languages

    Libraries

    CUBLAS

    CUFFT

    Summary

    CHAPTER 9 Mixing CUDA and Rendering

    OpenGL

    GLUT

    Introduction to the Files in the Framework

    Summary

    CHAPTER 10 CUDA in a Cloud and Cluster Environments

    The Message Passing Interface (MPI)

    How MPI Communicates

    Bandwidth

    Balance Ratios

    Considerations for Large MPI Runs

    Cloud Computing

    A Code Example

    Summary

    CHAPTER 11 CUDA for Real Problems

    Working with High-Dimensional Data

    PCA/NLPCA

    Force-Directed Graphs

    Monte Carlo Methods

    Molecular Modeling

    Quantum Chemistry

    Interactive Workflows

    A Plethora of Projects

    Summary

    CHAPTER 12 Application Focus on Live Streaming Video

    Topics in Machine Vision

    FFmpeg

    TCP Server

    Live Stream Application

    The simpleVBO.cpp File

    The callbacksVBO.cpp File

    Building and Running the Code

    The Future

    Summary

    Listing for simpleVBO.cpp
