Multicore and GPU Programming

An Integrated Approach

2nd Edition - February 9, 2022

  • Author: Gerassimos Barlas
  • Paperback ISBN: 9780128141205
  • eBook ISBN: 9780128141212

Description

Multicore and GPU Programming: An Integrated Approach, Second Edition offers broad coverage of the key parallel computing tools essential for multi-core CPU programming and many-core "massively parallel" computing. Using threads, OpenMP, MPI, CUDA, and other state-of-the-art tools, the book teaches the design and development of software that takes full advantage of modern computing platforms incorporating CPUs, GPUs, and other accelerators. Presenting material refined over more than two decades of teaching parallel computing, author Gerassimos Barlas eases the transition from sequential programming to parallel platforms with numerous examples, extensive case studies, and full source code. Readers will learn how to develop programs that run on distributed-memory machines using MPI, create multi-threaded applications with either libraries or directives, write optimized applications that balance the workload among the available computing resources, and profile and debug programs targeting parallel machines.
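As a taste of the book's distributed-memory material, the sketch below shows the shape of a minimal MPI program in C++. It is an illustrative example only, not taken from the book's sample code; the process count is chosen at launch time by mpirun.

    #include <mpi.h>
    #include <cstdio>

    // Minimal MPI "hello world": every process reports its rank.
    // Build:  mpic++ hello.cpp -o hello
    // Run:    mpirun -np 4 ./hello
    int main(int argc, char* argv[]) {
        MPI_Init(&argc, &argv);                // start the MPI runtime

        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's ID
        MPI_Comm_size(MPI_COMM_WORLD, &size);  // total number of processes

        std::printf("Hello from process %d of %d\n", rank, size);

        MPI_Finalize();                        // shut down the MPI runtime
        return 0;
    }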

Key Features

    • Includes comprehensive coverage of all major multi-core and many-core programming tools and platforms, including threads, OpenMP, MPI, CUDA, OpenCL and Thrust.
    • Covers the most recent versions of these tools and standards available at the time of publication.
    • Demonstrates parallel programming design patterns and examples of how different tools and paradigms can be integrated for superior performance.
    • Updates in the second edition include the use of the C++17 standard for all sample code (see the illustrative sketch after this list), a new chapter on concurrent data structures, a new chapter on OpenCL, and the latest research on load balancing.
    • Includes downloadable source code, examples and instructor support materials on the book’s companion website.
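
    Since the second edition standardizes on C++17, its sample code can lean on the standard library's own parallel facilities. The snippet below is an illustrative sketch (not drawn from the book) of a C++17 parallel reduction:

        #include <execution>
        #include <iostream>
        #include <numeric>
        #include <vector>

        int main() {
            std::vector<double> v(1'000'000, 0.5);

            // C++17 parallel algorithm: std::execution::par lets the
            // library spread the reduction across the available cores.
            double sum = std::reduce(std::execution::par,
                                     v.begin(), v.end(), 0.0);

            std::cout << "sum = " << sum << '\n';
            return 0;
        }

    With GCC, parallel execution policies typically require linking against Intel TBB (g++ -std=c++17 sum.cpp -ltbb).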

    Readership

    Graduate students in parallel computing courses covering both traditional and GPU computing, whether as a single course or a two-semester sequence; professionals and researchers seeking to master parallel computing.

    Table of Contents

    • Cover image
    • Title page
    • Table of Contents
    • Copyright
    • Dedication
    • List of tables
    • Bibliography
    • Preface
    • What is in this book
    • Using this book as a textbook
    • Software and hardware requirements
    • Sample code
    • Part 1: Introduction
    • Chapter 1: Introduction
    • 1.1. The era of multicore machines
    • 1.2. A taxonomy of parallel machines
    • 1.3. A glimpse of influential computing machines
    • 1.4. Performance metrics
    • 1.5. Predicting and measuring parallel program performance
    • Exercises
    • Bibliography
    • Chapter 2: Multicore and parallel program design
    • 2.1. Introduction
    • 2.2. The PCAM methodology
    • 2.3. Decomposition patterns
    • 2.4. Program structure patterns
    • 2.5. Matching decomposition patterns with program structure patterns
    • Exercises
    • Bibliography
    • Part 2: Programming with threads and processes
    • Chapter 3: Threads and concurrency in standard C++
    • 3.1. Introduction
    • 3.2. Threads
    • 3.3. Thread creation and initialization
    • 3.4. Sharing data between threads
    • 3.5. Design concerns
    • 3.6. Semaphores
    • 3.7. Applying semaphores in classical problems
    • 3.8. Atomic data types
    • 3.9. Monitors
    • 3.10. Applying monitors in classical problems
    • 3.11. Asynchronous threads
    • 3.12. Dynamic vs. static thread management
    • 3.13. Threads and fibers
    • 3.14. Debugging multi-threaded applications
    • Exercises
    • Bibliography
    • Chapter 4: Parallel data structures
    • 4.1. Introduction
    • 4.2. Lock-based structures
    • 4.3. Lock-free structures
    • 4.4. Closing remarks
    • Exercises
    • Bibliography
    • Chapter 5: Distributed memory programming
    • 5.1. Introduction
    • 5.2. MPI
    • 5.3. Core concepts
    • 5.4. Your first MPI program
    • 5.5. Program architecture
    • 5.6. Point-to-point communication
    • 5.7. Alternative point-to-point communication modes
    • 5.8. Non-blocking communications
    • 5.9. Point-to-point communications: summary
    • 5.10. Error reporting & handling
    • 5.11. Collective communications
    • 5.12. Persistent communications
    • 5.13. Big-count communications in MPI 4.0
    • 5.14. Partitioned communications
    • 5.15. Communicating objects
    • 5.16. Node management: communicators and groups
    • 5.17. One-sided communication
    • 5.18. I/O considerations
    • 5.19. Combining MPI processes with threads
    • 5.20. Timing and performance measurements
    • 5.21. Debugging, profiling, and tracing MPI programs
    • 5.22. The Boost.MPI library
    • 5.23. A case study: diffusion-limited aggregation
    • 5.24. A case study: brute-force encryption cracking
    • 5.25. A case study: MPI implementation of the master–worker pattern
    • Exercises
    • Bibliography
    • Chapter 6: GPU programming: CUDA
    • 6.1. Introduction
    • 6.2. CUDA's programming model: threads, blocks, and grids
    • 6.3. CUDA's execution model: streaming multiprocessors and warps
    • 6.4. CUDA compilation process
    • 6.5. Putting together a CUDA project
    • 6.6. Memory hierarchy
    • 6.7. Optimization techniques
    • 6.8. Graphs
    • 6.9. Warp functions
    • 6.10. Cooperative groups
    • 6.11. Dynamic parallelism
    • 6.12. Debugging CUDA programs
    • 6.13. Profiling CUDA programs
    • 6.14. CUDA and MPI
    • 6.15. Case studies
    • Exercises
    • Bibliography
    • Chapter 7: GPU and accelerator programming: OpenCL
    • 7.1. The OpenCL architecture
    • 7.2. The platform model
    • 7.3. The execution model
    • 7.4. The programming model
    • 7.5. The memory model
    • 7.6. Shared virtual memory
    • 7.7. Atomics and synchronization
    • 7.8. Work group functions
    • 7.9. Events and profiling OpenCL programs
    • 7.10. OpenCL and other parallel software platforms
    • 7.11. Case study: Mandelbrot set
    • Exercises
    • Part 3: Higher-level parallel programming
    • Chapter 8: Shared-memory programming: OpenMP
    • 8.1. Introduction
    • 8.2. Your first OpenMP program
    • 8.3. Variable scope
    • 8.4. Loop-level parallelism
    • 8.5. Task parallelism
    • 8.6. Synchronization constructs
    • 8.7. Cancellation constructs
    • 8.8. SIMD extensions
    • 8.9. Offloading to devices
    • 8.10. The loop construct
    • 8.11. Thread affinity
    • 8.12. Correctness and optimization issues
    • 8.13. A case study: sorting in OpenMP
    • 8.14. A case study: brute-force encryption cracking, combining MPI and OpenMP
    • Exercises
    • Bibliography
    • Chapter 9: High-level multi-threaded programming with the Qt library
    • 9.1. Introduction
    • 9.2. Implicit thread creation
    • 9.3. Qt's pool of threads
    • 9.4. Higher-level constructs – multi-threaded programming without threads!
    • Exercises
    • Bibliography
    • Chapter 10: The Thrust template library
    • 10.1. Introduction
    • 10.2. First steps in Thrust
    • 10.3. Working with Thrust datatypes
    • 10.4. Thrust algorithms
    • 10.5. Fancy iterators
    • 10.6. Switching device back-ends
    • 10.7. Thrust execution policies and asynchronous execution
    • 10.8. Case studies
    • Exercises
    • Bibliography
    • Part 4: Advanced topics
    • Chapter 11: Load balancing
    • 11.1. Introduction
    • 11.2. Dynamic load balancing: the Linda legacy
    • 11.3. Static load balancing: the divisible load theory approach
    • 11.4. DLTLib: a library for partitioning workloads
    • 11.5. Case studies
    • Exercises
    • Bibliography
    • Appendix A: Creating Qt programs
    • A.1. Using an IDE
    • A.2. The qmake utility
    • Appendix B: Running MPI programs: preparatory and configuration steps
    • B.1. Preparatory steps
    • B.2. Computing nodes discovery for MPI program deployment
    • Appendix C: Time measurement
    • C.1. Introduction
    • C.2. POSIX high-resolution timing
    • C.3. Timing in C++11
    • C.4. Timing in Qt
    • C.5. Timing in OpenMP
    • C.6. Timing in MPI
    • C.7. Timing in CUDA
    • Appendix D: Boost.MPI
    • D.1. Mapping from MPI C to Boost.MPI
    • Appendix E: Setting up CUDA
    • E.1. Installation
    • E.2. Issues with GCC
    • E.3. Combining CUDA with third-party libraries
    • Appendix F: OpenCL helper functions
    • F.1. Function readCLFromFile
    • F.2. Function isError
    • F.3. Function getCompilationError
    • F.4. Function handleError
    • F.5. Function setupDevice
    • F.6. Function setupProgramAndKernel
    • Appendix G: DLTlib
    • G.1. DLTlib functions
    • G.2. DLTlib files
    • Bibliography
    • Glossary
    • Bibliography
    • Index

    Product details

    • No. of pages: 1024
    • Language: English
    • Copyright: © Morgan Kaufmann 2022
    • Published: February 9, 2022
    • Imprint: Morgan Kaufmann
    • Paperback ISBN: 9780128141205
    • eBook ISBN: 9780128141212

    About the Author

    Gerassimos Barlas

    Gerassimos Barlas is a Professor in the Computer Science & Engineering Department at the American University of Sharjah, Sharjah, UAE. His research interests include parallel algorithms, the development of analysis and modeling frameworks for load balancing, and distributed video on-demand. Prof. Barlas has taught parallel computing for more than 12 years, has been involved with parallel computing since the early 1990s, and is active in the emerging field of Divisible Load Theory for parallel and distributed systems.

    Affiliations and Expertise

    Professor, Computer Science and Engineering Department, American University of Sharjah, UAE
