Parallel Programming with MPI


  • Peter Pacheco, University of San Francisco, USA

A hands-on introduction to parallel programming based on the Message-Passing Interface (MPI) standard, the de-facto industry standard adopted by major vendors of commercial parallel systems. This textbook/tutorial, based on the C language, contains many fully-developed examples and exercises. The complete source code for the examples is available in both C and Fortran 77. Students and professionals will find that the portability of MPI, combined with a thorough grounding in parallel programming principles, will allow them to program any parallel system, from a network of workstations to a parallel supercomputer.
View full description


Book information

  • Published: October 1996
  • ISBN: 978-1-55860-339-4


"...the detailed discussion of many complex and confusing issues makes the book an important information source for programmers developing large applications using MPI." -—L.M. Liebrock, ACM Computing Reviews

Table of Contents

ForewordPrefaceChapter 1 Introduction1.1 The Need for More Computational Power1.2 The Need for Parallel Computing1.3 The Bad News1.4 MPI1.5 The Rest of the Book1.6 Typographic ConventionsChapter 2 An Overview of Parallel Computing2.1 Hardware2.1.1 Flynn's Taxonomy2.1.2 The Classical von Neumann Machine2.1.3 Pipeline and Vector Architectures2.1.4 SIMD Systems2.1.5 General MIMD Systems2.1.6 Shared-Memory MIMD2.1.7 Distributed-Memory MIMD2.1.8 Communication and Routing2.2 Software Issues2.2.1 Shared-Memory Programming2.2.2 Message Passing2.2.3 Data-Parallel Languages2.2.4 RPC and Active Messages2.2.5 Data Mapping2.3 Summary2.4 References2.5 ExercisesChapter 3 Greetings!3.1 The Program3.2 Execution3.3 MPI3.3.1 General MPI Programs3.3.2 Finding Out about the Rest of the World3.3.3 Message: Data + Envelope3.3.4 Sending Messages3.4 Summary3.5 References3.6 Exercises3.7 Programming AssignmentChapter 4 An Application: Numerical Integration4.1 The Trapezoidal Rule4.2 Parallelizing the Trapezoidal Rule4.3 I/O on Parallel Systems4.4 Summary4.5 References4.6 Exercises4.7 Programming AssignmentsChapter 5 Collective Communication5.1 Tree-Structured Communication5.2 Broadcast5.3 Tags, Safety, Buffering, and Synchronization5.4 Reduce5.5 Dot Product5.6 Allreduce5.7 Gather and Scatter5.8 Summary5.10 References5.11 Exercises5.12 Programming AssignmentsChapter 6 Grouping Data for Communication6.1 The Count Parameter 6.2 Derived Types and MPI_Type_struct6.3 Other Derived Datatype Constructors6.4 Type Matching6.5 Pack/Unpack6.6 Deciding Which Method to Use6.7 Summary6.8 References6.9 Exercises6.10 Programming AssignmentsChapter 7 Communicators and Topologies7.1 Matrix Multiplication7.2 Fox's Algorithm7.3 Communicators7.4 Working with Groups, Contexts, and Communicators7.5 MPI_Comm_split7.6 Topologies7.7 MPI_Cart_sub7.8 Implementation of Fox's Algorithm7.9 Summary7.10 References7.11 Exercises7.12 Programming AssignmentsChapter 8 Dealing with I/O8.1 Dealing with stdin, stdout, and stderr8.1.1 Attribute caching8.1.2 Callback Functions8.1.3 Identifying the I/O process rank8.1.4 Caching an I/O Process Rank8.1.5 Retrieving the I/O Process Rank8.1.6 Reading from stdin8.1.7 Writing to stdout8.1.8 Writing to stderr and Error Checking8.2 Limited Access to stdin8.3 File I/O8.4 Array I/O8.4.1 Data Distributions8.4.2 Model Problem8.4.3 Distribution of the Input8.4.4 Derived Datatypes8.4.5 The Extent of a Derived Datatype8.4.6 The Input Code8.4.7 Printing the Array8.4.8 An Example8.5 Summary8.6 References8.7 Exercises8.8 Programming AssignmentsChapter 9 Debugging Your Program9.1 Quick Review of Serial Debugging9.1.1 Examine the Source Code9.1.2 Add Debugging Output9.1.3 Use a Debugger9.2 More on Serial debugging9.3 Parallel Debugging9.4 Nondeterminism9.5 An Example9.5.1 The Program?9.5.2 Debugging The Program9.5.3 A Brief Discussion of Parallel Debuggers9.5.4 The Old Standby: printf/fflush9.5.5 The Classical Bugs in Parallel Programs9.5.6 First Fix9.5.7 many parallel Programming Bugs are Really Serial Programming Bugs9.5.8 Different Systems, Different Errors9.5.9 Moving to Multiple Processes9.5.10 Confusion about I/O9.5.11 Finishing Up9.6 Error Handling in MPI9.7 Summary9.8 References9.9 Exercises9.10 Programming AssignmentsChapter 10 Design and Coding of Parallel Programs10.1 Data-Parallel Programs10.2 Jacobi's Method10.3 Parallel Jacobi's Method10.4 Coding Parallel Programs10.5 An Example: Sorting10.5.1 Main Program10.5.2 The "Input" Functions10.5.3 All-to-all Scatter/Gather10.5.4 Redistributing the Keys10.5.5 Pause to Clean Up10.5.6 Find_alltoall_send_params10.5.7 Finishing Up10.8 Summary10.7 References10.8 Exercises10.9 Programming AssignmentsChapter 11 Performance11.1 Serial Program Performance11.2 An Example: The Serial Trapezoidal Rule11.3 What about the I/O?11.4 Parallel Program Performance Analysis11.5 The Cost of Communication11.6 An Example: The Parallel Trapezoidal Rule11.7 Taking Timings11.8 Summary11.9 References11.10 Exercises11.11 Programming AssignmentsChapter 12 More on Performance12.1 Amdahl's Law12.2 Work and Overhead12.3 Sources of Overhead12.4 Scalability12.5 Potential Problems in Estimating Performance12.5.1 Networks of Workstations and Resource Contention12.5.2 Load Balancing and Idle Time12.5.3 Overlapping Communication and Computation12.5.4 Collective Communication12.6 Performance Evaluation Tools12.6.1 MPI's Profiling Interface12.6.2 Upshot12.7 Summary12.8 References12.9 Exercises12.10 Programming AssignmentsChapter 13 Advanced Point-to-Point Communication13.1 An Example: Coding Allgather13.1.1 Function Parameters13.1.2 Ring Pass Allgather13.2 Hypercubes13.2.1 Additional Issues in the Hypercube Exchange13.2.2 Details of the Hypercube Algorithm13.3 Send-receive13.4 Null Processes13.5 Nonblocking Communication13.5.1 Ring Allgather with Nonblocking Communication13.5.2 Hypercube Allgather with Nonblocking Communication13.6 Persistent Communication Requests13.7 Communication Modes13.7.1 Synchronous Mode13.7.2 Ready Mode13.7.3 Buffered Mode13.8 The Last Word on Point-to-Point Communication13.9 Summary13.10 References13.11 Exercises13.12 Programming AssignmentsChapter 14 Parallel Algorithms14.1 Designing a Parallel Algorithm14.2 Sorting14.3 Serial Bitonic Sort14.4 Parallel Bitonic Sort14.5 Tree Searches and Combinatorial Optimization14.6 Serial Tree Search14.7 Parallel Tree Search14.7.1 Par_dfs14.7.2 Service_requests14.7.3 Work_remains14.7.4 Distributed Termination Detection14.8 Summary14.9 References14.10 Exercises14.11 Programming AssignmentsChapter 15 Parallel Libraries15.1 Using Libraries: Pro and Con15.2 Using More than One Language15.3 ScaLAPACK15.4 An Example of a ScaLAPACK Program15.5 PETSc15.6 A PETSc Example15.7 Summary15.8 References15.9 Exercises15.10 Programming AssignmentsChapter 16 Wrapping Up16.1 Where to Go from Here16.2 The Future of MPIAppendix A Summary of MPI CommandsA.1 Point-to-Point Communication FunctionsA.1.1 Blocking Sends and ReceivesA.1.2 Communication ModesA.1.3 Buffer AllocationA.1.4 Nonblocking CommunicationA.1.5 Probe and CancelA.1.6 Persistent Communication RequestsA.1.7 Send-receiveA.2 Derived Datatypes and MPI_Pack/UnpackA.2.1 Derived DatatypesA.2.2 MPI_Pack and MPI_UnpackA.3 Collective Communication FunctionsA.3.1 Barrier and BroadcastA.3.2 Gather ScatterA.3.3 Reduction OperationsA.4 Groups, Contexts, and CommunicatorsA.4.1 Group ManagementA.4.2 Communicator ManagementA.4.3 Inter-communicatorsA.4.4 Attribute CachingA.5 Process TopologiesA.5.1 General Topology FunctionsA.5.2 Cartesian Topology ManagementA.5.3 Graph Topology ManagementA.6 Environmental ManagementA.6.1 Implementation InformationA.6.2 Error HandlingA.6.3 TimersA.6.4 StartupA.7 ProfilingA.8 ConstantsA.9 Type DefinitionsAppendix B MPI on the InternetB.1 Implementations of MPIB.2 The MPI FAQB.3 MPI Web PagesB.4 MPI NewsgroupB.5 MPI-2 and MPI-IOB.6 Parallel Programming with MPIBibliographyIndex