Intel Xeon Phi Coprocessor High Performance Programming

1st Edition - February 11, 2013

  • Authors: James Jeffers, James Reinders
  • eBook ISBN: 9780124104945

Authors Jim Jeffers and James Reinders spent two years helping educate customers about the prototype and pre-production hardware before Intel introduced the first Intel Xeon Phi coprocessor. They have distilled their own experiences, together with insights from many expert customers, Intel Field Engineers, Application Engineers, and Technical Consulting Engineers, into this authoritative first book on the essentials of programming for this new architecture and these new products. The book is useful even before you touch a system with an Intel Xeon Phi coprocessor. To ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system, whether based on Intel Xeon processors, Intel Xeon Phi coprocessors, or other high-performance microprocessors. Applying these techniques will generally increase your program's performance on any system and better prepare you for Intel Xeon Phi coprocessors and the Intel MIC architecture.

Key Features

    • A practical guide to the essentials of the Intel Xeon Phi coprocessor
    • Presents best practices for portable, high-performance computing and a familiar and proven threaded, scalar-vector programming model
    • Includes simple but informative code examples that explain the unique aspects of this new highly parallel and high performance computational product
    • Covers wide vectors, many cores, many threads and high bandwidth cache/memory architecture


    Readership

    Software engineers, high-performance computing and supercomputing developers, and scientific researchers in need of high-performance computing resources

    Table of Contents

    • Foreword




      Chapter 1. Introduction

      Trend: more parallelism

      Why Intel® Xeon Phi™ coprocessors are needed

      Platforms with coprocessors

      The first Intel® Xeon Phi™ coprocessor

      Keeping the “Ninja Gap” under control

      Transforming-and-tuning double advantage

      When to use an Intel® Xeon Phi™ coprocessor

      Maximizing performance on processors first

      Why scaling past one hundred threads is so important

      Maximizing parallel program performance

      Measuring readiness for highly parallel execution

      What about GPUs?

      Beyond the ease of porting to increased performance

      Transformation for performance

      Hyper-threading versus multithreading

      Coprocessor major usage model: MPI versus offload

      Compiler and programming models

      Cache optimizations

      Examples, then details

      For more information

      Chapter 2. High Performance Closed Track Test Drive!

      Looking under the hood: coprocessor specifications

      Starting the car: communicating with the coprocessor

      Taking it out easy: running our first code

      Starting to accelerate: running more than one thread

      Pedal to the metal: hitting full speed using all cores

      Easing in to the first curve: accessing memory bandwidth

      High speed banked curve: maximizing memory bandwidth

      Back to the pit: a summary

      Chapter 3. A Friendly Country Road Race

      Preparing for our country road trip: chapter focus

      Getting a feel for the road: the 9-point stencil algorithm

      At the starting line: the baseline 9-point stencil implementation

      Rough road ahead: running the baseline stencil code

      Cobblestone street ride: vectors but not yet scaling

      Open road all-out race: vectors plus scaling

      Some grease and wrenches!: a bit of tuning


      For more information

      Chapter 4. Driving Around Town: Optimizing A Real-World Code Example

      Choosing the direction: the basic diffusion calculation

      Turn ahead: accounting for boundary effects

      Finding a wide boulevard: scaling the code

      Thunder road: ensuring vectorization

      Peeling out: peeling code from the inner loop

      Trying higher octane fuel: improving speed using data locality and tiling

      High speed driver certificate: summary of our high speed tour

      Chapter 5. Lots of Data (Vectors)

      Why vectorize?

      How to vectorize

      Five approaches to achieving vectorization

      Six step vectorization methodology

      Streaming through caches: data layout, alignment, prefetching, and so on

      Compiler tips

      Compiler options

      Compiler directives

      Use array sections to encourage vectorization

      Look at what the compiler created: assembly code inspection

      Numerical result variations with vectorization


      For more information

      Chapter 6. Lots of Tasks (not Threads)

      OpenMP, Fortran 2008, Intel® TBB, Intel® Cilk™ Plus, Intel® MKL


      Fortran 2008

      Intel® TBB

      Cilk Plus


      For more information

      Chapter 7. Offload

      Two offload models

      Choosing offload vs. native execution

      Language extensions for offload

      Using pragma/directive offload

      Using offload with shared virtual memory

      About asynchronous computation

      About asynchronous data transfer

      Applying the target attribute to multiple declarations

      Performing file I/O on the coprocessor

      Logging stdout and stderr from offloaded code


      For more information

      Chapter 8. Coprocessor Architecture

      The Intel® Xeon Phi™ coprocessor family

      Coprocessor card design

      Intel® Xeon Phi™ coprocessor silicon overview

      Individual coprocessor core architecture

      Instruction and multithread processing

      Cache organization and memory access considerations


      Vector processing unit architecture

      Coprocessor PCIe system interface and DMA

      Coprocessor power management capabilities

      Reliability, availability, and serviceability (RAS)

      Coprocessor system management controller (SMC)



      For more information

      Chapter 9. Coprocessor System Software

      Coprocessor software architecture overview

      Coprocessor programming models and options

      Coprocessor software architecture components

      Intel® manycore platform software stack

      Linux support for Intel® Xeon Phi™ coprocessors

      Tuning memory allocation performance


      For more information

      Chapter 10. Linux on the Coprocessor

      Coprocessor Linux baseline

      Introduction to coprocessor Linux bootstrap and configuration

      Default coprocessor Linux configuration

      Changing coprocessor configuration

      The micctrl utility

      Adding software

      Coprocessor Linux boot process

      Coprocessors in a Linux cluster


      For more information

      Chapter 11. Math Library

      Intel Math Kernel Library overview

      Intel MKL and Intel compiler

      Coprocessor support overview

      Using the coprocessor in native mode

      Using automatic offload mode

      Using compiler-assisted offload

      Precision choices and variations


      For more information

      Chapter 12. MPI

      MPI overview

      Using MPI on Intel® Xeon Phi™ coprocessors

      Prerequisites (batteries not included)

      Offload from an MPI rank

      Using MPI natively on the coprocessor


      For more information

      Chapter 13. Profiling and Timing

      Event monitoring registers on the coprocessor

      Efficiency metrics

      Potential performance issues

      Intel® VTune™ Amplifier XE product

      Performance application programming interface

      MPI analysis: Intel Trace Analyzer and Collector



      For more information

      Chapter 14. Summary


      Additional resources

      Another book coming?

      Feedback appreciated



    Product details

    • No. of pages: 432
    • Language: English
    • Copyright: © Morgan Kaufmann 2013
    • Published: February 11, 2013
    • Imprint: Morgan Kaufmann
    • eBook ISBN: 9780124104945

    About the Authors

    James Jeffers

    Jim Jeffers was the primary strategic planner and one of the first full-time employees on the program that became Intel® MIC. He served as lead SW Engineering Manager on the program and formed and launched the SW development team. As the program evolved, he became the workloads (applications) and SW performance team manager. He has some of the deepest insight into the market, architecture and programming usages of the MIC product line. He has been a developer and development manager for embedded and high performance systems for close to 30 years.

    Affiliations and Expertise

    Principal Engineer and Visualization Lead, Intel Corporation

    James Reinders

    James Reinders is a senior engineer who joined Intel Corporation in 1989 and has contributed to projects including the world’s first TeraFLOP supercomputer (ASCI Red), as well as compilers and architecture work for a number of Intel processors and parallel systems. James has been a driver behind the development of Intel as a major provider of software development products, and serves as their chief software evangelist. James has published numerous articles, contributed to several books and is widely interviewed on parallelism. James has managed software development groups, customer service and consulting teams, business development and marketing teams. James is sought after to keynote on parallel programming, and is the author/co-author of three books currently in print including Structured Parallel Programming, published by Morgan Kaufmann in 2012.

    Affiliations and Expertise

    Director and Programming Model Architect, Intel Corporation
