Parallel Programming with OpenACC - 1st Edition - ISBN: 9780124103979, 9780124104594

Parallel Programming with OpenACC

1st Edition

Author: Rob Farber
eBook ISBN: 9780124104594
Paperback ISBN: 9780124103979
Imprint: Morgan Kaufmann
Published Date: 14th October 2016
Page Count: 326

Parallel Programming with OpenACC is a modern, practical guide to implementing dependable computing systems. The book explains how anyone can use OpenACC to quickly ramp up application performance using high-level code directives called pragmas. The OpenACC directive-based programming model is designed to provide a simple yet powerful approach to programming accelerators without significant programming effort.
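As a rough illustration of what a directive-based approach looks like, here is a minimal sketch in C. It is not taken from the book: the `saxpy` function name and the chosen data clauses are assumptions made for this example. A single `#pragma acc parallel loop` line asks an OpenACC-capable compiler to offload the loop, and a compiler without OpenACC support simply ignores the pragma and runs the loop serially.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The function name and data clauses are assumptions for this sketch,
 * not code from the book. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* copyin: x is only read on the device; copy: y is read and written.
     * Without an OpenACC-enabled compiler this pragma is ignored and the
     * loop runs serially on the host with the same result. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Built with an OpenACC-enabled compiler (for example `nvc -acc` or `gcc -fopenacc`), the loop may be offloaded to an accelerator. Built without those flags, the same source stays ordinary serial C, which is the portability argument behind directives.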

Author Rob Farber, working with a team of expert contributors, demonstrates how to turn existing applications into portable GPU-accelerated programs that deliver immediate speedups. The book also helps users get the most from the latest NVIDIA and AMD GPUs plus multicore CPU architectures (and soon Intel® Xeon Phi™ as well). Downloadable example codes provide hands-on OpenACC experience for common problems in scientific, commercial, big-data, and real-time systems.

Topics include writing reusable code, asynchronous capabilities, using libraries, multicore clusters, and much more. Each chapter explains how a specific aspect of OpenACC technology fits, how it works, and the pitfalls to avoid. Throughout, the book demonstrates the use of simple working examples that can be adapted to solve application needs.

Key Features

  • Presents the simplest way to leverage GPUs to achieve application speedups
  • Shows how OpenACC works, including working examples that can be adapted for application needs
  • Allows readers to download source code and slides from the book's companion web page


Scientists, professional developers, and engineers looking to leverage GPU computing; students studying parallel programming

Table of Contents

  • Contributors
  • Foreword by Michael Wolfe
  • Preface
  • Acknowledgments
  • Chapter 1: From serial to parallel programming using OpenACC
    • Abstract
    • A Simple Data-Parallel Loop
    • A Simple Task-Parallel Example
    • Amdahl’s Law and Scaling
    • Parallel Execution and Race Conditions
    • Lock-Free Programming
    • Controlling Parallel Resources
    • Make Your Life Simple
  • Chapter 2: Profile-guided development with OpenACC
    • Abstract
    • Benchmark Code: Conjugate Gradient
    • Describe Parallelism
    • Describe Data Movement
    • Optimize Loops
    • Running in Parallel on Multicore
    • Summary
  • Chapter 3: Profiling performance of hybrid applications with Score-P and Vampir
    • Abstract
    • Performance Analysis Techniques and Terminology
    • Evolutionary Performance Improvement
    • A Particle-in-Cell Simulation of a Laser Driven Electron Beam
    • Preparing the Measurement Through Code Instrumentation
    • Recording Performance Information During the Application Run
    • Looking at a First Parallel PIConGPU Implementation
    • Freeing Up the Host Process
    • Optimizing GPU Kernels
    • Adding GPU Task Parallelism
    • Investigating OpenACC Run Time Events With Score-P and Vampir
    • Summary
  • Chapter 4: Pipelining data transfers with OpenACC
    • Abstract
    • Introduction to Pipelining
    • Example Code: Mandelbrot Generator
    • Pipelining Across Multiple Devices
    • Conclusions
  • Chapter 5: Advanced data management
    • Abstract
    • Unstructured Data Regions
    • Aggregate Types With Dynamic Data Members
    • C++ Class Data Management
    • Using Global and Module Variables in Routines
    • Using Device Only Data
    • Code Examples
    • Runtime Results
    • Summary
  • Chapter 6: Tuning OpenACC loop execution
    • Abstract
    • The Loop Construct
    • Basic Loop Optimization Clauses
    • Advanced Loop Optimization Clauses
    • Performance Results
    • Conclusion
  • Chapter 7: Multidevice programming with OpenACC
    • Abstract
    • Introduction
    • Three Ways to Program Multiple Devices With OpenACC
    • Example: Jacobi Solver for the 2D Poisson Equation
    • Domain Decomposition
    • Debugging and Profiling
    • Conclusion
  • Chapter 8: Using OpenACC for stencil and Feldkamp algorithms
    • Abstract
    • Introduction
    • Experimental Setup
    • Hybrid OpenMP/OpenACC
    • Summary
  • Chapter 9: Accelerating 3D wave equations using OpenACC
    • Abstract
    • Introduction
    • Code Example: Solving 3D Scalar Wave Equation
    • Converting Stack to Heap
    • Measuring Host Baseline Scalability
    • Using OpenACC Tools
    • Using OpenACC Data Directives
    • Targeting Multicore Systems With OpenACC
    • Summary
  • Chapter 10: The detailed development of an OpenACC application
    • Abstract
    • Introducing CloverLeaf
    • Development Platform: Cray XK6
    • Development of OpenACC CloverLeaf
    • Conclusion
    • Summary
    • For More Information
  • Chapter 11: GPU-accelerated molecular dynamics clustering analysis with OpenACC
    • Abstract
    • Acknowledgments
    • Introduction
    • Overview of MD Clustering Analysis
    • Hardware Architecture Considerations
    • Implementation
    • Performance Results
    • Summary and Conclusion
  • Chapter 12: Incrementally accelerating the RI-MP2 correlated method of electronic structure theory using OpenACC compiler directives
    • Abstract
    • Acknowledgments
    • Introduction
    • Theory
    • Implementation
    • Results
    • Summary and Conclusion
  • Chapter 13: Using OpenACC to port large legacy climate and weather modeling code to GPUs
    • Abstract
    • Introduction
    • Porting Approach: Step by Step
    • Performance Optimization
    • Results for the Radiation Parameterization
  • Index



About the Author

Rob Farber

Rob Farber has served as a scientist in Europe at the Irish Center for High-End Computing as well as at U.S. national labs in Los Alamos, Berkeley, and the Pacific Northwest. He has also been on the external faculty at the Santa Fe Institute, a consultant to Fortune 100 companies, and a co-founder of two computational startups that achieved liquidity events. He is the author of “CUDA Application Design and Development” as well as numerous articles and tutorials that have appeared in Dr. Dobb's Journal, Scientific Computing, The Code Project, and others.

Affiliations and Expertise

CEO/Publisher of, Wall Street Analyst, and consultant to scientific and commercial technology companies around the world.
