Heterogeneous Computing with OpenCL 2.0

1st Edition - May 18, 2015
Authors: David R. Kaeli, Perhaad Mistry, Dana Schaa, Dong Ping Zhang
Language: English
Paperback ISBN: 978-0-12-801414-1
eBook ISBN: 978-0-12-801649-7

Heterogeneous Computing with OpenCL 2.0 teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully integrated Accelerated Processing Units (APUs). This fully revised edition covers the latest enhancements in OpenCL 2.0, including:

• Shared virtual memory to increase programming flexibility and reduce data transfers that consume resources
• Dynamic parallelism, which reduces processor load and avoids bottlenecks
• Improved imaging support and integration with OpenGL

Designed to work on multiple platforms, OpenCL will help you program more effectively for a heterogeneous future. Written by leaders in the parallel computing and OpenCL communities, this book explores memory spaces, optimization techniques, extensions, debugging, and profiling. Multiple case studies and examples illustrate high-performance algorithms, the distribution of work across heterogeneous systems, and embedded domain-specific languages, giving you hands-on OpenCL experience with a range of fundamental parallel algorithms.

List of Figures
List of Tables
Foreword
Acknowledgments
Chapter 1: Introduction
- Abstract
- 1.1 Introduction to Heterogeneous Computing
- 1.2 The Goals of This Book
- 1.3 Thinking Parallel
- 1.4 Concurrency and Parallel Programming Models
- 1.5 Threads and Shared Memory
- 1.6 Message-Passing Communication
- 1.7 Different Grains of Parallelism
- 1.8 Heterogeneous Computing with OpenCL
- 1.9 Book Structure
Chapter 2: Device architectures
- Abstract
- 2.1 Introduction
- 2.2 Hardware Trade-offs
- 2.3 The Architectural Design Space
- 2.4 Summary
Chapter 3: Introduction to OpenCL
- Abstract
- 3.1 Introduction
- 3.2 The OpenCL Platform Model
- 3.3 The OpenCL Execution Model
- 3.4 Kernels and the OpenCL Programming Model
- 3.5 OpenCL Memory Model
- 3.6 The OpenCL Runtime with an Example
- 3.7 Vector Addition Using an OpenCL C++ Wrapper
- 3.8 OpenCL for CUDA Programmers
- 3.9 Summary
Chapter 4: Examples
- Abstract
- 4.1 OpenCL Examples
- 4.2 Histogram
- 4.3 Image Rotation
- 4.4 Image Convolution
- 4.5 Producer-Consumer
- 4.6 Utility Functions
- 4.7 Summary
Chapter 5: OpenCL runtime and concurrency model
- Abstract
- 5.1 Commands and the Queuing Model
- 5.2 Multiple Command-Queues
- 5.3 The Kernel Execution Domain: Work-Items, Work-Groups, and NDRanges
- 5.4 Native and Built-In Kernels
- 5.5 Device-Side Queuing
- 5.6 Summary
Chapter 6: OpenCL host-side memory model
- Abstract
- 6.1 Memory Objects
- 6.2 Memory Management
- 6.3 Shared Virtual Memory
- 6.4 Summary
Chapter 7: OpenCL device-side memory model
- Abstract
- 7.1 Synchronization and Communication
- 7.2 Global Memory
- 7.3 Constant Memory
- 7.4 Local Memory
- 7.5 Private Memory
- 7.6 Generic Address Space
- 7.7 Memory Ordering
- 7.8 Summary
Chapter 8: Dissecting OpenCL on a heterogeneous system
- Abstract
- 8.1 OpenCL on an AMD FX-8350 CPU
- 8.2 OpenCL on the AMD Radeon R9 290X GPU
- 8.3 Memory Performance Considerations in OpenCL
- 8.4 Summary
Chapter 9: Case study: Image clustering
- Abstract
- 9.1 Introduction
- 9.2 The Feature Histogram on the CPU
- 9.3 OpenCL Implementation
- 9.4 Performance Analysis
- 9.5 Conclusion
Chapter 10: OpenCL profiling and debugging
- Abstract
- 10.1 Introduction
- 10.2 Profiling OpenCL Code Using Events
- 10.3 AMD CodeXL
- 10.4 Profiling Using CodeXL
- 10.5 Analyzing Kernels Using CodeXL
- 10.6 Debugging OpenCL Kernels Using CodeXL
- 10.7 Debugging Using printf
- 10.8 Summary
Chapter 11: Mapping high-level programming languages to OpenCL 2.0: A compiler writer’s perspective
- Abstract
- 11.1 Introduction
- 11.2 A Brief Introduction to C++ AMP
- 11.3 OpenCL 2.0 as a Compiler Target
- 11.4 Mapping Key C++ AMP Constructs to OpenCL
- 11.5 C++ AMP Compilation Flow
- 11.6 Compiled C++ AMP Code
- 11.7 How Shared Virtual Memory in OpenCL 2.0 Fits in
- 11.8 Compiler Support for Tiling in C++ AMP
- 11.9 Address Space Deduction
- 11.10 Data Movement Optimization
- 11.11 Binomial Options: A Full Example
- 11.12 Preliminary Results
- 11.13 Conclusion
Chapter 12: WebCL: Enabling OpenCL acceleration of Web applications
- Abstract
- 12.1 Introduction
- 12.2 Programming with WebCL
- 12.3 Synchronization
- 12.4 Interoperability with WebGL
- 12.5 Example Application
- 12.6 Security Enhancement
- 12.7 WebCL on the Server
- 12.8 Status and Future of WebCL
- Works Cited
Chapter 13: Foreign lands: Plugging OpenCL in
- Abstract
- 13.1 Introduction
- 13.2 Beyond C and C++
- 13.3 Haskell OpenCL
- 13.4 Summary
Index

David R. Kaeli

David Kaeli received a BS and PhD in Electrical Engineering from Rutgers University, and an MS in Computer Engineering from Syracuse University. He is the Associate Dean of Undergraduate Programs in the College of Engineering and a Full Professor on the ECE faculty at Northeastern University, Boston, MA, where he directs the Northeastern University Computer Architecture Research Laboratory (NUCAR). Prior to joining Northeastern in 1993, Kaeli spent 12 years at IBM, the last 7 at T.J. Watson Research Center, Yorktown Heights, NY.

Dr. Kaeli has co-authored more than 200 critically reviewed publications. His research spans a range of areas, from microarchitecture to back-end compilers and software engineering. He leads a number of research projects in the area of GPU computing. He presently serves as the Chair of the IEEE Technical Committee on Computer Architecture. Dr. Kaeli is an IEEE Fellow and a member of the ACM.

Affiliations and expertise

Northeastern University, Boston, MA, USA

Perhaad Mistry

Perhaad Mistry works in AMD’s developer tools group at the Boston Design Center, focusing on debugging and performance profiling tools for heterogeneous architectures. He is presently focused on debugger architectures for upcoming shared-memory and discrete Graphics Processing Unit (GPU) platforms. Perhaad has been working on GPU architectures and parallel programming since CUDA 0.8 in 2007. He has enjoyed implementing medical imaging algorithms for GPGPU platforms and architecture-aware data structures for surgical simulators. Perhaad's present work focuses on the design of debuggers and architectural support for performance analysis for the next generation of applications that will target GPU platforms.

Perhaad graduated after 7 years with a PhD in Electrical and Computer Engineering from Northeastern University, where he was advised by Dr. David Kaeli, who leads the Northeastern University Computer Architecture Research Laboratory (NUCAR). Even after graduating, Perhaad is still a member of NUCAR and advises on research projects on performance analysis of parallel architectures. He received a BS in Electronics Engineering from the University of Mumbai and an MS in Computer Engineering from Northeastern University in Boston. He is presently based in Boston.

Affiliations and expertise

Northeastern University, Boston, MA, USA

Dana Schaa

Dana Schaa received a BS in Computer Engineering from Cal Poly, San Luis Obispo, and an MS and PhD in Electrical and Computer Engineering from Northeastern University. He works on GPU architecture modeling at AMD, and has interests and expertise that include memory systems, microarchitecture, performance analysis, and general-purpose computing on GPUs. His background includes the development of OpenCL-based medical imaging applications ranging from real-time visualization of 3D ultrasound to CT image reconstruction in heterogeneous environments. Dana married his wonderful wife Jenny in 2010, and they live together in San Jose with their charming cats.

Affiliations and expertise

Northeastern University, Boston, MA, USA

Dong Ping Zhang

Affiliations and expertise

AMD, Sunnyvale, California, USA