Programming Massively Parallel Processors
A Hands-on Approach

By
David Kirk, Chief Scientist, NVIDIA
Wen-mei Hwu, Professor, University of Illinois

Description
Multi-core processors are no longer the future of computing; they are the present-day reality. A typical mass-produced CPU features multiple processor cores, while a GPU (Graphics Processing Unit) may have hundreds or even thousands of cores. With the rise of multi-core architectures has come the need to teach advanced programmers a new and essential skill: how to program massively parallel processors. Programming Massively Parallel Processors: A Hands-on Approach shows students and professionals alike the basic concepts of parallel programming and GPU architecture. Various techniques for constructing parallel programs are explored in detail. Case studies demonstrate the development process, which begins with computational thinking and ends with effective and efficient parallel programs.

Audience
Advanced students, software engineers, programmers, hardware engineers

Contents
Chapter 1: Introduction
  GPUs as Parallel Computers
  Architecture of a Modern GPU
  Why More Speed or Parallelism?
  Parallel Programming Languages and Models
  Overarching Goals
  Organization of the Book

Chapter 2: History of GPU Computing
  2.1. Evolution of Graphics Pipelines
  The Era of Fixed Function Graphics Pipeline
  Evolution of Programmable Real-Time Graphics
  Unified Graphics and Computing Processors
  2.2. GPGPU: an Intermediate Step
  Scalable GPUs
  Recent Developments
  Future Trends

Chapter 3: Introduction to CUDA
  3.1. Data Parallelism
  3.2. CUDA Program Structure
  3.3. A Matrix-Matrix Multiplication Example
  3.4. Device Memories and Data Transfer
  3.5. Kernel Functions and Threading
  3.6. Summary
  Function Declarations
  Kernel Launch
  Predefined Variables
  Runtime API

Chapter 4: CUDA Threads
  4.1. CUDA Thread Organization
  4.2. More on BlockIdx and ThreadIdx
  4.3. Synchronization and Transparent Scalability
  4.4. Thread Assignment
  4.5. Thread Scheduling and Latency Tolerance
  4.6. Summary

Chapter 5: CUDA Memories
  5.1. Importance of Memory Access Efficiency
  5.2. CUDA Device Memory Types
  5.3. A Strategy for Reducing Global Memory Traffic
  5.4. Memory as a Limiting Factor to Parallelism
  5.5. Summary

Chapter 6: Performance Considerations
  6.1. More on Thread Execution
  6.2. Global Memory Bandwidth
  6.3. Dynamic Partitioning of SM Resources
  6.4. Data Prefetching
  6.5. Instruction Mix
  6.6. Thread Granularity
  6.7. Measured Performance and Summary

Chapter 7: Floating-Point Considerations
  7.1. Floating-Point Format
  Normalized representation of M
  Excess encoding of E
  7.2. Representable Numbers
  7.3. Special Bit Patterns and Precision
  7.4. Arithmetic Accuracy and Rounding
  7.5. Algorithm Considerations
  7.6. Summary

Chapter 8: Application Case Study I - Advanced MRI Reconstruction
  8.1. Application Background
  8.2. Iterative Reconstruction
  8.3. Computing F^H d
  Step 1: Determine the Kernel Parallelism Structure
  Step 2: Getting Around the Memory Bandwidth Limitation
  Step 3: Use Hardware Trigonometry Functions
  Step 4: Experimental Performance Testing
  8.4. Final Evaluation

Chapter 9: Application Case Study II - Molecular Visualization and Analysis
  9.1. Application Background
  9.2. A Simple Kernel Implementation
  9.3. Instruction Execution Efficiency
  9.4. Memory Coalescing
  9.5. Additional Performance Comparisons
  9.6. Using Multiple GPUs

Chapter 10: Parallel Programming and Computational Thinking
  10.1. Goals of Parallel Programming
  10.2. Problem Decomposition
  10.3. Algorithm Selection
  10.4. Computational Thinking

Chapter 11: A Brief Introduction to OpenCL
  11.1. Background
  11.2. Data Parallelism Model
  11.3. Device Architecture
  11.4. Kernel Functions
  11.5. Device Management and Kernel Launch
  11.6. Electrostatic Potential Map in OpenCL
  11.7. Summary

Chapter 12: Conclusion and Future Outlook
  12.1. Goals Revisited
  12.2. Memory Architecture Evolution
  12.3. Kernel Execution Control Evolution
  12.4. Core Performance
  12.5. Programming Environment
  12.6. A Bright Outlook

Appendix A: Matrix Multiplication Example Code
Appendix B: Speed and feed of current generation CUDA devices

Bibliographic details
Paperback, 256 pages, publication date: JAN-2010
ISBN-13: 978-0-12-381472-2
Imprint: Morgan Kaufmann

Price and Ordering
Price:
USD 69.95
EUR 50.95
GBP 42.99
Books and book-related electronic products are priced in US dollars (USD), euros (EUR), and British pounds (GBP). USD prices apply to the Americas and Asia Pacific. EUR prices apply in Europe and the Middle East. GBP prices apply to the UK and all other countries.

Last update: 20 Nov 2009