CUDA Application Design and Development

1st Edition - October 8, 2011
Author: Rob Farber
Language: English
Paperback ISBN: 978-0-12-388426-8
eBook ISBN: 978-0-12-388432-9

As the computer industry retools to leverage massively parallel graphics processing units (GPUs), this book is designed to meet the needs of working software developers who need to understand GPU programming with CUDA and increase efficiency in their projects. CUDA Application Design and Development starts with an introduction to parallel computing concepts for readers with no previous parallel experience, and focuses on issues of immediate importance to working software developers: achieving high performance, maintaining competitiveness, analyzing CUDA benefits versus costs, and determining application lifespan.

The book then details the thought behind CUDA and teaches how to create, analyze, and debug CUDA applications. Throughout, the focus is on software engineering issues: how to use CUDA in the context of existing application code, with existing compilers, languages, software tools, and industry-standard API libraries.

Using an approach refined in a series of well-received articles in Dr. Dobb's Journal, author Rob Farber takes the reader step by step from fundamentals to implementation, moving from language theory to practical coding.

CHAPTER 1 First Programs and How to Think in CUDA

Source Code and Wiki

Distinguishing CUDA from Conventional Programming with a Simple Example

Choosing a CUDA API

Some Basic CUDA Concepts

Understanding Our First Runtime Kernel

Three Rules of GPGPU Programming

Big-O Considerations and Data Transfers

CUDA and Amdahl’s Law

Data and Task Parallelism

Hybrid Execution: Using Both CPU and GPU Resources

Regression Testing and Accuracy

Silent Errors

Introduction to Debugging

UNIX Debugging

Windows Debugging with Parallel Nsight

Summary

CHAPTER 2 CUDA for Machine Learning and Optimization

Modeling and Simulation

Machine Learning and Neural Networks

XOR: An Important Nonlinear Machine-Learning Problem

Performance Results on XOR

Performance Discussion

Summary

The C++ Nelder-Mead Template

CHAPTER 3 The CUDA Tool Suite: Profiling a PCA/NLPCA Functor

PCA and NLPCA

Obtaining Basic Profile Information

Gprof: A Common UNIX Profiler

The NVIDIA Visual Profiler: Computeprof

Parallel Nsight for Microsoft Visual Studio

Tuning and Analysis Utilities (TAU)

Summary

CHAPTER 4 The CUDA Execution Model

GPU Architecture Overview

Warp Scheduling and TLP

ILP: Higher Performance at Lower Occupancy

Little’s Law

CUDA Tools to Identify Limiting Factors

Summary

CHAPTER 5 CUDA Memory

The CUDA Memory Hierarchy

GPU Memory

L2 Cache

L1 Cache

CUDA Memory Types

Global Memory

Summary

CHAPTER 6 Efficiently Using GPU Memory

Reduction

Utilizing Irregular Data Structures

Sparse Matrices and the CUSP Library

Graph Algorithms

SoA, AoS, and Other Structures

Tiles and Stencils

Summary

CHAPTER 7 Techniques to Increase Parallelism

CUDA Contexts Extend Parallelism

Streams and Contexts

Out-of-Order Execution with Multiple Streams

Tying Data to Computation

Summary

CHAPTER 8 CUDA for All GPU and CPU Applications

Pathways from CUDA to Multiple Hardware Backends

Accessing CUDA from Other Languages

Libraries

CUBLAS

CUFFT

Summary

CHAPTER 9 Mixing CUDA and Rendering

OpenGL

GLUT

Introduction to the Files in the Framework

Summary

CHAPTER 10 CUDA in Cloud and Cluster Environments

The Message Passing Interface (MPI)

How MPI Communicates

Bandwidth

Balance Ratios

Considerations for Large MPI Runs

Cloud Computing

A Code Example

Summary

CHAPTER 11 CUDA for Real Problems

Working with High-Dimensional Data

PCA/NLPCA

Force-Directed Graphs

Monte Carlo Methods

Molecular Modeling

Quantum Chemistry

Interactive Workflows

A Plethora of Projects

Summary

CHAPTER 12 Application Focus on Live Streaming Video

Topics in Machine Vision

FFmpeg

TCP Server

Live Stream Application

The simpleVBO.cpp File

The callbacksVBO.cpp File

Building and Running the Code

The Future

Summary

Listing for simpleVBO.cpp