
Intel Xeon Phi Coprocessor High-Performance Programming

1st Edition

Authors: James Jeffers, James Reinders
Paperback ISBN: 9780124104143
eBook ISBN: 9780124104945
Imprint: Morgan Kaufmann
Published Date: 15th February 2013
Page Count: 432



Authors Jim Jeffers and James Reinders spent two years helping educate customers about the prototype and pre-production hardware before Intel introduced the first Intel Xeon Phi coprocessor. They have distilled their own experiences, coupled with insights from many expert customers, Intel field engineers, application engineers, and technical consulting engineers, to create this authoritative first book on the essentials of programming for this new architecture and these new products.

This book is useful even before you ever touch a system with an Intel Xeon Phi coprocessor. To ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system whether based on Intel Xeon processors, Intel Xeon Phi coprocessors, or other high performance microprocessors. Applying these techniques will generally increase your program performance on any system, and better prepare you for Intel Xeon Phi coprocessors and the Intel MIC architecture.

Key Features

    • A practical guide to the essentials of the Intel Xeon Phi coprocessor
    • Presents best practices for portable, high-performance computing and a familiar and proven threaded, scalar-vector programming model
    • Includes simple but informative code examples that explain the unique aspects of this new highly parallel, high-performance computational product
    • Covers wide vectors, many cores, many threads, and high-bandwidth cache/memory architecture


Readership

    Software engineers, high-performance and supercomputing developers, and scientific researchers in need of high-performance computing resources

    Table of Contents





    Chapter 1. Introduction

    Trend: more parallelism

    Why Intel® Xeon Phi™ coprocessors are needed

    Platforms with coprocessors

    The first Intel® Xeon Phi™ coprocessor

    Keeping the “Ninja Gap” under control

    Transforming-and-tuning double advantage

    When to use an Intel® Xeon Phi™ coprocessor

    Maximizing performance on processors first

    Why scaling past one hundred threads is so important

    Maximizing parallel program performance

    Measuring readiness for highly parallel execution

    What about GPUs?

    Beyond the ease of porting to increased performance

    Transformation for performance

    Hyper-threading versus multithreading

    Coprocessor major usage model: MPI versus offload

    Compiler and programming models

    Cache optimizations

    Examples, then details

    For more information

    Chapter 2. High Performance Closed Track Test Drive!

    Looking under the hood: coprocessor specifications

    Starting the car: communicating with the coprocessor

    Taking it out easy: running our first code

    Starting to accelerate: running more than one thread

    Pedal to the metal: hitting full speed using all cores

    Easing in to the first curve: accessing memory bandwidth

    High speed banked curve: maximizing memory bandwidth

    Back to the pit: a summary

    Chapter 3. A Friendly Country Road Race

    Preparing for our country road trip: chapter focus

    Getting a feel for the road: the 9-point stencil algorithm

    At the starting line: the baseline 9-point stencil implementation

    Rough road ahead: running the baseline stencil code

    Cobblestone street ride: vectors but not yet scaling

    Open road all-out race: vectors plus scaling

    Some grease and wrenches!: a bit of tuning


    For more information

    Chapter 4. Driving Around Town: Optimizing A Real-World Code Example

    Choosing the direction: the basic diffusion calculation

    Turn ahead: accounting for boundary effects

    Finding a wide boulevard: scaling the code

    Thunder road: ensuring vectorization

    Peeling out: peeling code from the inner loop

    Trying higher octane fuel: improving speed using data locality and tiling

    High speed driver certificate: summary of our high speed tour

    Chapter 5. Lots of Data (Vectors)

    Why vectorize?

    How to vectorize

    Five approaches to achieving vectorization

    Six step vectorization methodology

    Streaming through caches: data layout, alignment, prefetching, and so on

    Compiler tips

    Compiler options

    Compiler directives

    Use array sections to encourage vectorization

    Look at what the compiler created: assembly code inspection

    Numerical result variations with vectorization


    For more information

    Chapter 6. Lots of Tasks (not Threads)

    OpenMP, Fortran 2008, Intel® TBB, Intel® Cilk™ Plus, Intel® MKL

    OpenMP


    Fortran 2008

    Intel® TBB

    Cilk Plus


    For more information

    Chapter 7. Offload

    Two offload models

    Choosing offload vs. native execution

    Language extensions for offload

    Using pragma/directive offload

    Using offload with shared virtual memory

    About asynchronous computation

    About asynchronous data transfer

    Applying the target attribute to multiple declarations

    Performing file I/O on the coprocessor

    Logging stdout and stderr from offloaded code


    For more information

    Chapter 8. Coprocessor Architecture

    The Intel® Xeon Phi™ coprocessor family

    Coprocessor card design

    Intel® Xeon Phi™ coprocessor silicon overview

    Individual coprocessor core architecture

    Instruction and multithread processing

    Cache organization and memory access considerations


    Vector processing unit architecture

    Coprocessor PCIe system interface and DMA

    Coprocessor power management capabilities

    Reliability, availability, and serviceability (RAS)

    Coprocessor system management controller (SMC)



    For more information

    Chapter 9. Coprocessor System Software

    Coprocessor software architecture overview

    Coprocessor programming models and options

    Coprocessor software architecture components

    Intel® manycore platform software stack

    Linux support for Intel® Xeon Phi™ coprocessors

    Tuning memory allocation performance


    For more information

    Chapter 10. Linux on the Coprocessor

    Coprocessor Linux baseline

    Introduction to coprocessor Linux bootstrap and configuration

    Default coprocessor Linux configuration

    Changing coprocessor configuration

    The micctrl utility

    Adding software

    Coprocessor Linux boot process

    Coprocessors in a Linux cluster


    For more information

    Chapter 11. Math Library

    Intel Math Kernel Library overview

    Intel MKL and Intel compiler

    Coprocessor support overview

    Using the coprocessor in native mode

    Using automatic offload mode

    Using compiler-assisted offload

    Precision choices and variations


    For more information

    Chapter 12. MPI

    MPI overview

    Using MPI on Intel® Xeon Phi™ coprocessors

    Prerequisites (batteries not included)

    Offload from an MPI rank

    Using MPI natively on the coprocessor


    For more information

    Chapter 13. Profiling and Timing

    Event monitoring registers on the coprocessor

    Efficiency metrics

    Potential performance issues

    Intel® VTune™ Amplifier XE product

    Performance application programming interface

    MPI analysis: Intel Trace Analyzer and Collector



    For more information

    Chapter 14. Summary


    Additional resources

    Another book coming?

    Feedback appreciated




    © Morgan Kaufmann 2013

    About the Author

    James Jeffers

    Jim Jeffers was the primary strategic planner and one of the first full-time employees on the program that became Intel® MIC. He served as lead software engineering manager on the program and formed and launched the software development team. As the program evolved, he became the workloads (applications) and software performance team manager. He has some of the deepest insight into the market, architecture, and programming usages of the MIC product line. He has been a developer and development manager for embedded and high-performance systems for close to 30 years.

    Affiliations and Expertise

    Principal Engineer and Visualization Lead, Intel Corporation

    James Reinders

    James Reinders is a senior engineer who joined Intel Corporation in 1989 and has contributed to projects including the world’s first TeraFLOP supercomputer (ASCI Red), as well as compiler and architecture work for a number of Intel processors and parallel systems. James has been a driver behind the development of Intel as a major provider of software development products, and serves as their chief software evangelist. James has published numerous articles, contributed to several books, and is widely interviewed on parallelism. He has managed software development groups, customer service and consulting teams, and business development and marketing teams. He is sought after to keynote on parallel programming, and is the author or co-author of three books currently in print, including Structured Parallel Programming, published by Morgan Kaufmann in 2012.

    Affiliations and Expertise

    Director and Programming Model Architect, Intel Corporation


    Intel Recommended Reading List for Developers, 1st Half 2014 – Books for Software Developers, Intel


    "Read this book. Authors Jim Jeffers and James Reinders spent two years helping educate customers about the prototype and pre-production hardware before Intel introduced the first Intel Xeon Phi coprocessor. They have distilled their own experiences coupled with insights from many expert customers, to create this authoritative first book on the essentials of programming for this new architecture and these new products.", May 5, 2013
    "The authors…are uniquely experienced in software development for this new silicon. As a result, this book is the definitive programming reference for the 60+ core monster from Intel…highly readable and interlaced with lots of code examples.", April 2, 2013
    "This book belongs on the bookshelf of every HPC professional. Not only does it successfully and accessibly teach us how to use and obtain high performance on the Intel MIC architecture, it is about much more than that. It takes us back to the universal fundamentals of high-performance computing including how to think and reason about the performance of algorithms mapped to modern architectures, and it puts into your hands powerful tools that will be useful for years to come."
    Robert J. Harrison, Institute for Advanced Computational Science, Stony Brook University, from the Foreword
    "The book benefits software engineers, scientific researchers, and high performance and supercomputing developers in need of high-performance computing resources…I got my hands on a preliminary copy of the book back in November at SC12, and I can tell you that Jim and James did a great job.", April 1, 2013