CUDA Fortran for Scientists and Engineers

Best Practices for Efficient CUDA Fortran Programming

1st Edition - September 11, 2013

  • Authors: Gregory Ruetsch, Massimiliano Fatica
  • eBook ISBN: 9780124169722
  • Paperback ISBN: 9780124169708

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can leverage the power of GPUs using Fortran, the familiar language of scientific computing and supercomputer performance benchmarking. The authors presume no prior parallel computing experience, and cover the basics along with best practices for efficient GPU computing using CUDA Fortran. To help you add CUDA Fortran to existing Fortran codes, the book explains how to understand the target GPU architecture, identify computationally intensive parts of the code, and modify the code to manage the data and parallelism and optimize performance. All of this is done in Fortran, without having to rewrite in another language. Each concept is illustrated with working examples, so you can immediately compare the performance of your own code against them.

Key Features

  • Leverage the power of GPU computing with PGI’s CUDA Fortran compiler
  • Gain insights from members of the CUDA Fortran language development team
  • Includes multi-GPU programming in CUDA Fortran, covering both peer-to-peer and message passing interface (MPI) approaches
  • Includes full source code for all the examples and several case studies
  • Download source code and slides from the book's companion website


Readership

Professional scientists and engineers whose research codes are in Fortran; students studying parallel programming using Fortran.

Table of Contents

  • Dedication



    Companion Site

    Part I: CUDA Fortran Programming

    Chapter 1. Introduction


    1.1 A brief history of GPU computing

    1.2 Parallel computation

    1.3 Basic concepts

    1.4 Determining CUDA hardware features and limits

    1.5 Error handling

    1.6 Compiling CUDA Fortran code

    Chapter 2. Performance Measurement and Metrics


    2.1 Measuring kernel execution time

    2.2 Instruction, bandwidth, and latency bound kernels

    2.3 Memory bandwidth

    Chapter 3. Optimization


    3.1 Transfers between host and device

    3.2 Device memory

    3.3 On-chip memory

    3.4 Memory optimization example: matrix transpose

    3.5 Execution configuration

    3.6 Instruction optimization

    3.7 Kernel loop directives

    Chapter 4. Multi-GPU Programming


    4.1 CUDA multi-GPU features

    4.2 Multi-GPU Programming with MPI

    Part II: Case Studies

    Chapter 5. Monte Carlo Method


    5.1 CURAND

    5.2 Computing π with CUF kernels

    5.3 Computing π with reduction kernels

    5.4 Accuracy of summation

    5.5 Option pricing

    Chapter 6. Finite Difference Method


    6.1 Nine-Point 1D finite difference stencil

    6.2 2D Laplace equation

    Chapter 7. Applications of Fast Fourier Transform


    7.1 CUFFT

    7.2 Spectral derivatives

    7.3 Convolution

    7.4 Poisson Solver

    Part III: Appendices

    Appendix A. Tesla Specifications

    Appendix B. System and Environment Management

    B.1 Environment variables

    B.2 nvidia-smi System Management Interface

    Appendix C. Calling CUDA C from CUDA Fortran

    C.1 Calling CUDA C libraries

    C.2 Calling User-Written CUDA C Code

    Appendix D. Source Code

    D.1 Texture memory

    D.2 Matrix transpose

    D.3 Thread- and instruction-level parallelism

    D.4 Multi-GPU programming

    D.5 Finite difference code

    D.6 Spectral Poisson Solver



Product details

  • No. of pages: 338
  • Language: English
  • Copyright: © Morgan Kaufmann 2013
  • Published: September 11, 2013
  • Imprint: Morgan Kaufmann
  • eBook ISBN: 9780124169722
  • Paperback ISBN: 9780124169708

About the Authors

Gregory Ruetsch

Greg Ruetsch is a Senior Applied Engineer at NVIDIA, where he works on CUDA Fortran and performance optimization of HPC codes. He holds a Bachelor’s degree in mechanical and aerospace engineering from Rutgers University and a Ph.D. in applied mathematics from Brown University. Prior to joining NVIDIA, he held research positions at Stanford University’s Center for Turbulence Research and Sun Microsystems Laboratories.

Affiliations and Expertise

Senior Applied Engineer, NVIDIA

Massimiliano Fatica

Massimiliano Fatica is the manager of the Tesla HPC Group at NVIDIA, where he works in the area of GPU computing (high-performance computing and clusters). He holds a laurea in Aeronautical Engineering and a Ph.D. in Theoretical and Applied Mechanics from the University of Rome “La Sapienza”. Prior to joining NVIDIA, he was a research staff member at Stanford University, where he worked at the Center for Turbulence Research and the Center for Integrated Turbulent Simulations on applications for the Stanford Streaming Supercomputer.

Affiliations and Expertise

Manager, Tesla HPC Group, NVIDIA

Ratings and Reviews

  • Hsin-YuKo, Mon Jan 07 2019

    Well written/planned book

    A very easy to read and useful book.