Heterogeneous Computing with OpenCL
2nd Edition
Revised OpenCL 1.2 Edition
Description
Heterogeneous Computing with OpenCL, Second Edition teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. It is the first textbook that presents OpenCL programming appropriate for the classroom and is intended to support a parallel programming course. Students will come away from this text with hands-on experience and significant knowledge of the syntax and use of OpenCL to address a range of fundamental parallel algorithms.
Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous future. Written by leaders in the parallel computing and OpenCL communities, Heterogeneous Computing with OpenCL explores memory spaces, optimization techniques, graphics interoperability, extensions, and debugging and profiling. It includes detailed examples throughout, plus additional online exercises and other supporting materials that can be downloaded at http://www.heterogeneouscompute.org/?page_id=7
This book will appeal to software engineers, programmers, hardware engineers, and students/advanced students.
Key Features
- Explains principles and strategies to learn parallel programming with OpenCL, from understanding the four abstraction models to thoroughly testing and debugging complete applications.
- Covers image processing, web plugins, particle simulations, video editing, performance optimization, and more.
- Shows how OpenCL maps to an example target architecture and explains some of the tradeoffs associated with mapping to various architectures.
- Addresses a range of fundamental programming techniques, with multiple examples and case studies that demonstrate OpenCL extensions for a variety of hardware platforms.
Readership
Software engineers, programmers, hardware engineers, students / advanced students
Table of Contents
Foreword to the Revised OpenCL 1.2 Edition
Foreword to the First Edition
Preface
Our Heterogeneous World
OpenCL
This Text
Acknowledgments
About the Authors
Chapter 1. Introduction to Parallel Programming
Introduction
OpenCL
The Goals of This Book
Thinking Parallel
Concurrency and Parallel Programming Models
Structure
Reference
Further Reading and Relevant Websites
Chapter 2. Introduction to OpenCL
Introduction
Platform and Devices
The Execution Environment
Memory Model
Writing Kernels
Full Source Code Example for Vector Addition
Vector Addition with C++ Wrapper
Summary
Reference
Chapter 3. OpenCL Device Architectures
Introduction
Hardware trade-offs
The architectural design space
Summary
References
Chapter 4. Basic OpenCL Examples
Introduction
Example Applications
Compiling OpenCL Host Applications
Summary
Chapter 5. Understanding OpenCL’s Concurrency and Execution Model
Introduction
Kernels, Work-Items, Workgroups, and the Execution Domain
OpenCL Synchronization: Kernels, Fences, and Barriers
Queuing and Global Synchronization
The Host-Side Memory Model
The Device-Side Memory Model
Summary
Chapter 6. Dissecting a CPU/GPU OpenCL Implementation
Introduction
OpenCL on an AMD Bulldozer CPU
OpenCL on the AMD Radeon HD7970 GPU
Memory Performance Considerations in OpenCL
Summary
References
Chapter 7. Data Management
Memory management
Data transfer in a discrete environment
Data placement in a shared-memory environment
Example application—work group reduction
References
Chapter 8. OpenCL Case Study: Convolution
Introduction
Convolution Kernel
Conclusions
Code Listings
Reference
Chapter 9. OpenCL Case Study: Histogram
Introduction
Choosing the Number of Workgroups
Choosing the Optimal Workgroup Size
Optimizing Global Memory Data Access Patterns
Using Atomics to Perform Local Histogram
Optimizing Local Memory Access
Local Histogram Reduction
The Global Reduction
Full Kernel Code
Performance and Summary
Chapter 10. OpenCL Case Study: Mixed Particle Simulation
Introduction
Overview of the Computation
GPU Implementation
CPU Implementation
Load Balancing
Performance and Summary
Kernel for Uniform Grid Creation
Kernels for Simulation
Chapter 11. OpenCL Extensions
Introduction
Overview of Extension Mechanism
Device Fission
Double Precision
References
Chapter 12. Foreign Lands: Plugging OpenCL In
Introduction
Beyond C and C++
Haskell OpenCL
Summary
References
Chapter 13. OpenCL Profiling and Debugging
Introduction
Profiling with events
AMD Accelerated Parallel Processing Profiler
AMD Accelerated Parallel Processing KernelAnalyzer
Walking through the AMD APP Profiler
Debugging OpenCL Applications
Overview of gDEBugger
AMD Printf Extension
Conclusion
Chapter 14. Performance Optimization of an Image Analysis Application
Introduction
Description of the algorithm
Migrating multithreaded CPU implementation to OpenCL
Performance optimization
Power and performance analysis
Conclusion
References
Index
Details
- No. of pages: 308
- Language: English
- Copyright: © Morgan Kaufmann 2013
- Published: 13th November 2012
- Imprint: Morgan Kaufmann
- Paperback ISBN: 9780124058941
- eBook ISBN: 9780124055209
About the Author
Benedict Gaster
Benedict R. Gaster is a software architect working on programming models for next-generation heterogeneous processors, in particular looking at high-level abstractions for parallel programming on the emerging class of processors that contain both CPUs and accelerators such as GPUs. Benedict has contributed extensively to OpenCL's design and has represented AMD at the Khronos Group open standard consortium. Benedict has a Ph.D. in computer science for his work on type systems for extensible records and variants.
Affiliations and Expertise
OpenCL Architect, AMD
Lee Howes
Lee Howes has spent the last two years working at AMD and currently focuses on programming models for the future of heterogeneous computing. Lee's interests lie in declaratively representing mappings of iteration domains to data and in communicating complicated architectural concepts and optimizations succinctly to a developer audience, both through programming model improvements and education. Lee has a Ph.D. in computer science from Imperial College London for work in this area.
Affiliations and Expertise
Member of Technical Staff, AMD
David Kaeli
David Kaeli received a BS and PhD in Electrical Engineering from Rutgers University, and an MS in Computer Engineering from Syracuse University. He is the Associate Dean of Undergraduate Programs in the College of Engineering and a Full Professor on the ECE faculty at Northeastern University, Boston, MA, where he directs the Northeastern University Computer Architecture Research Laboratory (NUCAR). Prior to joining Northeastern in 1993, Kaeli spent 12 years at IBM, the last 7 at the T.J. Watson Research Center, Yorktown Heights, NY.
Dr. Kaeli has co-authored more than 200 critically reviewed publications. His research spans a range of areas, from microarchitecture to back-end compilers and software engineering. He leads a number of research projects in the area of GPU computing. He presently serves as the Chair of the IEEE Technical Committee on Computer Architecture. Dr. Kaeli is an IEEE Fellow and a member of the ACM.
Affiliations and Expertise
Northeastern University, Boston, MA, USA
Perhaad Mistry
Perhaad Mistry works in AMD’s developer tools group at the Boston Design Center, focusing on developing debugging and performance profiling tools for heterogeneous architectures. He is presently focused on debugger architectures for upcoming shared-memory and discrete Graphics Processing Unit (GPU) platforms. Perhaad has been working on GPU architectures and parallel programming since CUDA 0.8 in 2007. He has enjoyed implementing medical imaging algorithms for GPGPU platforms and architecture-aware data structures for surgical simulators. Perhaad's present work focuses on the design of debuggers and architectural support for performance analysis for the next generation of applications that will target GPU platforms.
Perhaad received his PhD in Electrical and Computer Engineering from Northeastern University after 7 years of study, advised by Dr. David Kaeli, who leads the Northeastern University Computer Architecture Research Laboratory (NUCAR). Even after graduating, Perhaad remains a member of NUCAR and advises on research projects in performance analysis of parallel architectures. He received a BS in Electronics Engineering from the University of Mumbai and an MS in Computer Engineering from Northeastern University in Boston. He is presently based in Boston.
Affiliations and Expertise
Northeastern University, Boston, MA, USA
Dana Schaa
Dana Schaa received a BS in Computer Engineering from Cal Poly, San Luis Obispo, and an MS and PhD in Electrical and Computer Engineering from Northeastern University. He works on GPU architecture modeling at AMD, and his interests and expertise include memory systems, microarchitecture, performance analysis, and general-purpose computing on GPUs. His background includes the development of OpenCL-based medical imaging applications ranging from real-time visualization of 3D ultrasound to CT image reconstruction in heterogeneous environments. Dana married his wonderful wife Jenny in 2010, and they live together in San Jose with their charming cats.
Affiliations and Expertise
Northeastern University, Boston, MA, USA
Reviews
"With parallel computing now in the mainstream, this book provides an excellent reference on the state-of-the-art techniques in accelerating applications on CPU-GPU systems." --David A. Bader, Georgia Institute of Technology
"Intended for software architects and engineers, this guide to OpenCL examines potential uses and practical application of the cross platform programming language for heterogeneous computing. The work explores the use of OpenCL to design and produce scalable applications that have the ability to be optimized for processor core and GPU usage. Chapters cover an overview of OpenCL, basic examples, CPU/GPU implementation and extensions. Illustrations and sample code, as well as sections outlining case studies for the use of OpenCL in several common situations, are provided." --SciTech Book News
"I always enjoy reviewing later editions of a book…this book does not disappoint. It is definitely worth the time spent reading it." --ComputingReviews.com, 2013