
Parallel Computing Works!

1st Edition

Authors: Geoffrey Fox, Roy Williams, Paul Messina
eBook ISBN: 9780080513515
Imprint: Morgan Kaufmann
Published Date: 28th June 2014
Page Count: 977

Description

A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics, and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant to future supercomputers, with particular emphasis on the hypercube architecture.

The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop algorithms for frequently used mathematical computations. They also devise performance models, measure the performance characteristics of several computers, and create a high-performance computing facility based exclusively on parallel computers. By addressing all issues involved in scientific problem solving, Parallel Computing Works! provides valuable insight into computational science for large-scale parallel architectures. For those in the sciences, the findings reveal the usefulness of an important experimental tool. Anyone in supercomputing and related computational fields will gain a new perspective on the potential contributions of parallelism. Includes over 30 full-color illustrations.

Table of Contents

Parallel Computing Works!
by Geoffrey C. Fox, Roy D. Williams, Paul C. Messina
  • Preface
  • 1 Introduction
    • 1.1 Introduction
    • 1.2 The National Vision for Parallel Computation
    • 1.3 Caltech Concurrent Computation Program
    • 1.4 How Parallel Computing Works
  • 2 Technical Backdrop
    • 2.2 Hardware
      • 2.2.1 Parallel Scientific Computers Before 1980
      • 2.2.2 Early 1980s
      • 2.2.3 Birth of the Hypercube
      • 2.2.4 Mid-1980s
      • 2.2.5 Late 1980s
      • 2.2.6 Parallel Systems-1992
    • 2.3 Software
      • 2.3.1 Languages and Compilers
      • 2.3.2 Tools
    • 2.4 Summary
  • 3 A Methodology for Computation
    • 3.1 Introduction
    • 3.2 The Process
    • 3.3 Complex Systems and Space-Time Structure
    • 3.4 The Temporal Properties of Complex Systems
    • 3.5 Spatial Properties of Complex Systems
    • 3.6 Compound Complex Systems
    • 3.7 Mapping Complex Systems
    • 3.8 Parallel Computing Works?
  • 4 Synchronous Applications I
    • 4.1 QCD and the Beginning of C3P
    • 4.2 Synchronous Applications
    • 4.3 Quantum Chromodynamics
      • 4.3.1 Introduction
      • 4.3.2 Monte Carlo
      • 4.3.3 QCD
      • 4.3.4 Lattice QCD
      • 4.3.5 Concurrent QCD Machines
      • 4.3.6 QCD on the Caltech Hypercubes
      • 4.3.7 QCD on the Connection Machine
      • 4.3.8 Status and Prospects
    • 4.4 Spin Models
      • 4.4.1 Introduction
      • 4.4.2 Ising Model
      • 4.4.3 Potts Model
      • 4.4.4 XY Model
      • 4.4.5 O(3) Model
    • 4.5 An Automata Model of Granular Materials
      • 4.5.1 Introduction
      • 4.5.2 Comparison to Particle Dynamics Models
      • 4.5.3 Comparison to Lattice Gas Models
      • 4.5.4 The Rules for the Lattice Grain Model
      • 4.5.5 Implementation on a Parallel Computer
      • 4.5.6 Simulations
      • 4.5.7 Conclusion
  • 5 Express and CrOS
    • 5.1 Multicomputer Operating Systems
    • 5.2 A "Packet" History of Message-passing Systems
      • 5.2.1 Prehistory
      • 5.2.2 Application-driven Development
      • 5.2.3 Collective Communication
      • 5.2.4 Automated Decomposition-whoami
      • 5.2.5 "Melting"-A Non-crystalline Problem
      • 5.2.6 The Mark III
      • 5.2.7 Host Programs
      • 5.2.8 A Ray Tracer - and an "Operating System"
      • 5.2.9 The Crystal Router
      • 5.2.10 Portability
      • 5.2.11 Express
      • 5.2.12 Other Message-passing Systems
      • 5.2.13 What Did We Learn?
      • 5.2.14 Conclusions
    • 5.3 Parallel Debugging
      • 5.3.1 Introduction and History
      • 5.3.2 Designing a Parallel Debugger
      • 5.3.3 Conclusions
    • 5.4 Parallel Profiling
      • 5.4.1 Missing the Point
      • 5.4.2 Visualization
      • 5.4.3 Goals in Performance Analysis
      • 5.4.4 Overhead Analysis
      • 5.4.5 Event Tracing
      • 5.4.6 Data Distribution Analysis
      • 5.4.7 CPU Usage Analysis
      • 5.4.8 Why So Many Separate Tools?
      • 5.4.9 Conclusions
  • 6 Synchronous Applications II
    • 6.1 Computational Issues in Synchronous Problems
    • 6.2 Convectively-Dominated Flows
      • 6.2.1 An Overview of the FCT Technique
      • 6.2.2 Mathematics and the FCT Algorithm
      • 6.2.3 Parallel Issues
      • 6.2.4 Example Problem
      • 6.2.5 Performance and Results
      • 6.2.6 Summary
    • 6.3 Magnetism
      • 6.3.1 Introduction
      • 6.3.2 The Computational Algorithm
      • 6.3.3 Parallel Implementation and Performance
      • 6.3.4 Physics Results
      • 6.3.5 Conclusions
    • 6.4 Phase Transitions
      • 6.4.1 The Case of h = J: Antiferromagnetic Transitions
      • 6.4.2 The Case of h = -J: Quantum XY Model and Topological Transition
    • 6.5 Surface Reconstruction
      • 6.5.1 Multigrid Method with Discontinuities
      • 6.5.2 Interacting Line Processes
      • 6.5.3 Generic Look-up Table and Specific Parametrization
      • 6.5.4 Pyramid on a 2D Mesh of Processors
      • 6.5.5 Results for Orientation Constraints
      • 6.5.6 Results for Depth Constraints
      • 6.5.7 Conclusions
    • 6.6 Character Recognition by Neural Nets
      • 6.6.1 MLP in General
      • 6.6.2 Character Recognition using MLP
      • 6.6.3 The Multiscale Technique
      • 6.6.4 Results
      • 6.6.5 Comments and Variants on the Method
    • 6.7 An Adaptive Multiscale Scheme
      • 6.7.1 Errors in Computing the Motion Field
      • 6.7.2 Adaptive Multiscale Scheme on a Multicomputer
      • 6.7.3 Conclusions
    • 6.8 Collective Stereopsis
  • 7 Independent Parallelism
    • 7.1 Embarrassingly Parallel Problem Structure
    • 7.2 Dynamically Triangulated Random Surfaces
      • 7.2.1 Introduction
      • 7.2.2 Discretized Strings
      • 7.2.3 Computational Aspects
      • 7.2.4 Performance of String Program
      • 7.2.5 Conclusion
    • 7.3 Numerical Study of High-Tc Spin Systems
    • 7.4 Statistical Gravitational Lensing
    • 7.5 Parallel Random Number Generators
    • 7.6 The GENESIS Project
      • 7.6.1 What is Computational Neurobiology?
      • 7.6.2 Parallel Computers?
      • 7.6.3 Problems with Most Present Day Parallel Computers
      • 7.6.4 What is GENESIS?
      • 7.6.5 Task Farming
      • 7.6.6 Distributed Modelling via the Postmaster Element
  • 8 Full Matrix Algorithms
    • 8.1 Full and Banded Matrix Algorithms
      • 8.1.1 Matrix Decomposition
      • 8.1.2 Basic Matrix Arithmetic
      • 8.1.3 Matrix Multiplication for Banded Matrices
      • 8.1.4 Systems of Linear Equations
      • 8.1.5 The Gauss-Jordan Method
      • 8.1.6 Other Matrix Algorithms
      • 8.1.7 Concurrent Linear Algebra Libraries
      • 8.1.8 Problem Structure
      • 8.1.9 Conclusions
    • 8.2 Quantum Mechanical Reactive Scattering
      • 8.2.1 Introduction
      • 8.2.2 Methodology
      • 8.2.3 Parallel Algorithm
      • 8.2.4 Results and Discussion
    • 8.3 Electron-Molecule Collisions
      • 8.3.1 Introduction
      • 8.3.2 The SMC Method and Its Implementation
      • 8.3.3 Parallel Implementation
      • 8.3.4 Performance
      • 8.3.5 Selected Results
      • 8.3.6 Conclusion
  • 9 Loosely Synchronous Problems
    • 9.1 Problem Structure
    • 9.2 Micromechanical Simulations
    • 9.3 Plasma Particle-in-Cell Simulation
      • 9.3.1 Introduction
      • 9.3.2 GCPIC Algorithm
      • 9.3.3 Electron Beam Plasma Instability
      • 9.3.4 Performance Results for 1D Electrostatic Code
      • 9.3.5 One-Dimensional Electromagnetic Code
      • 9.3.6 Dynamic Load Balancing
      • 9.3.7 Summary
    • 9.4 Computational Electromagnetics
    • 9.5 LU Factorization
      • 9.5.1 Introduction
      • 9.5.2 Design Overview
      • 9.5.3 Reduced-Communication Pivoting
      • 9.5.4 New Data Distributions
      • 9.5.5 Performance Versus Scattering
      • 9.5.6 Performance
      • 9.5.7 Conclusions
    • 9.6 Concurrent DASSL
      • 9.6.1 Introduction
      • 9.6.2 Mathematical Formulation
      • 9.6.3 proto-Cdyn - Simulation Layer
      • 9.6.4 Concurrent Formulation
      • 9.6.5 Chemical Engineering Example
      • 9.6.6 Conclusions
    • 9.7 Adaptive Multigrid
      • 9.7.1 Introduction
      • 9.7.2 The Basic Algorithm
      • 9.7.3 The Adaptive Algorithm
      • 9.7.4 The Concurrent Algorithm
      • 9.7.5 Summary
    • 9.8 Munkres Algorithm for Assignment
      • 9.8.1 Introduction
      • 9.8.2 The Sequential Algorithm
      • 9.8.3 The Concurrent Algorithm
    • 9.9 Optimization Methods for Neural Nets
      • 9.9.1 Deficiencies of Steepest Descent
      • 9.9.2 The "Bold Driver" Network
      • 9.9.3 The Broyden-Fletcher-Goldfarb-Shanno One-Step Memoryless Quasi-Newton Method
      • 9.9.4 Parallel Optimization
      • 9.9.5 Experiment: The Dichotomy Problem
      • 9.9.6 Experiment: Time Series Prediction
      • 9.9.7 Summary
  • 10 DIME Programming Environment
    • 10.1 DIME
      • 10.1.1 Applications and Extensions
      • 10.1.2 The Components of DIME
      • 10.1.3 Domain Definition
      • 10.1.4 Mesh Structure
      • 10.1.5 Refinement
      • 10.1.6 Load Balancing
      • 10.1.7 Summary
    • 10.2 DIMEFEM
      • 10.2.1 Memory Allocation
      • 10.2.2 Operations and Elements
      • 10.2.3 Navier-Stokes Solver
      • 10.2.4 Results
      • 10.2.5 Summary
  • 11 Load Balancing and Optimization
    • 11.1 Load Balancing as an Optimization Problem
      • 11.1.1 Load Balancing a Finite-Element Mesh
      • 11.1.2 The Optimization Problem and Physical Analogy
      • 11.1.3 Algorithms for Load Balancing
      • 11.1.4 Simulated Annealing
      • 11.1.5 Recursive Bisection
      • 11.1.6 Eigenvalue Recursive Bisection
      • 11.1.7 Testing Procedure
      • 11.1.8 Test Results
      • 11.1.9 Conclusions
    • 11.2 Physical Analogy
    • 11.3 Physical Optimization
    • 11.4 The Travelling Salesman Problem
      • 11.4.1 Background on Local Search Heuristics
      • 11.4.2 Background on Markov Chains and Simulated Annealing
      • 11.4.3 The New Algorithms-Large-Step Markov Chains
      • 11.4.4 Results
  • 12 Irregular LS Problems
    • 12.1 Irregular Loosely Synchronous Problems are Hard
    • 12.2 The Electrosensory System of the Fish
      • 12.2.1 Physical Model
      • 12.2.2 Mathematical Theory
      • 12.2.3 Results
      • 12.2.4 Summary
    • 12.3 Transonic Flow
      • 12.3.1 Compressible Flow Algorithm
      • 12.3.2 Adaptive Refinement
      • 12.3.3 Examples
      • 12.3.4 Performance
      • 12.3.5 Summary
    • 12.4 Tree Codes for N-body Simulations
      • 12.4.1 Oct-Trees
      • 12.4.2 Computing Forces
      • 12.4.3 Parallelism in Tree Codes
      • 12.4.4 Acquiring Locally Essential Data
      • 12.4.5 Comments on Performance
    • 12.5 Fast Vortex Algorithm
      • 12.5.1 Vortex Methods
      • 12.5.2 Fast Algorithms
      • 12.5.3 Hypercube Implementation
      • 12.5.4 Efficiency of Parallel Implementation
      • 12.5.5 Results
    • 12.6 Cluster Algorithms for Spin Models
      • 12.6.1 Monte Carlo Calculations of Spin Models
      • 12.6.2 Cluster Algorithms
      • 12.6.3 Parallel Cluster Algorithms
      • 12.6.4 Self-labelling
      • 12.6.5 Global Equivalencing
      • 12.6.6 Other Algorithms
      • 12.6.7 Summary
    • 12.7 Sorting
      • 12.7.1 The Merge Strategy
      • 12.7.2 The Bitonic Algorithm
      • 12.7.3 Shellsort or Diminishing Increment Algorithm
      • 12.7.4 Quicksort or Samplesort Algorithm
    • 12.8 Hierarchical Tree-Structures
      • 12.8.1 Introduction
      • 12.8.2 Adaptive Structures
      • 12.8.3 Tree as Grid
      • 12.8.4 Conclusion
  • 13 Data Parallel C and Fortran
    • 13.1 High-Level Languages
      • 13.1.1 High Performance Fortran Perspective
      • 13.1.2 Problem Architecture and Message-Passing Fortran
      • 13.1.3 Problem Architecture and Fortran 77
    • 13.2 A Software Tool
      • 13.2.1 Is Any Assistance Really Needed?
      • 13.2.2 Overview of the Tool
      • 13.2.3 Dependence-based Data Partitioning
      • 13.2.4 Mapping Data to Processors
      • 13.2.5 Communication Analysis and Performance Improvement Transformations
      • 13.2.6 Communication Analysis Algorithm
      • 13.2.7 Static Performance Estimator
      • 13.2.8 Conclusion
    • 13.3 Fortran 90 Experiments
    • 13.4 Optimizing Compilers by Neural Networks
    • 13.5 ASPAR
      • 13.5.1 Degrees of Difficulty
      • 13.5.2 Various Parallelizing Technologies
      • 13.5.3 The Local View
      • 13.5.4 The "Global" View
      • 13.5.5 Global Strategies
      • 13.5.6 Dynamic Data Distribution
      • 13.5.7 Conclusions
    • 13.6 Coherent Parallel C
    • 13.7 Hierarchical Memory
  • 14 Asynchronous Applications
    • 14.1 Asynchronous Problems
    • 14.2 Melting in Two Dimensions
      • 14.2.1 Problem Description
      • 14.2.2 Solution Method
      • 14.2.3 Concurrent Update Procedure
      • 14.2.4 Performance Analysis
    • 14.3 Computer Chess
      • 14.3.1 Sequential Computer Chess
      • 14.3.2 Parallel Computer Chess: The Hardware
      • 14.3.3 Parallel Alpha-Beta Pruning
      • 14.3.4 Load Balancing
      • 14.3.5 Speedup Measurements
      • 14.3.6 Real-time Graphical Performance Monitoring
      • 14.3.7 Speculation
      • 14.3.8 Summary
  • 15 High-Level Asynchronous Software
    • 15.1 Asynchronous Software Paradigms
    • 15.2 MOOSE II
      • 15.2.1 Design of MOOSE
      • 15.2.2 Dynamic Load-Balancing Support
      • 15.2.3 What We Learned
    • 15.3 Time Warp
  • 16 The Zipcode Message-Passing System
    • 16.1 Overview of Zipcode
    • 16.2 Low-Level Primitives
      • 16.2.1 CE/RK Overview
      • 16.2.2 Interface with the CE/RK system
      • 16.2.3 CE Functions
      • 16.2.4 RK Calls
      • 16.2.5 Zipcode Calls
    • 16.3 High-Level Primitives
      • 16.3.1 Invoices
      • 16.3.2 Packing and Unpacking
      • 16.3.3 The Packed-Message Functions
      • 16.3.4 Fortran Interface
    • 16.4 Details of Execution
      • 16.4.1 Initialization/Termination
      • 16.4.2 Process Creation/Destruction
    • 16.5 Conclusions
  • 17 MOVIE
    • 17.1 Introduction
      • 17.1.1 The Beginning
      • 17.1.2 Towards the MOVIE System
      • 17.1.3 Current Status and Outlook
    • 17.2 System Overview
      • 17.2.1 The MOVIE System in a Nutshell
      • 17.2.2 MovieScript as Virtual Machine Language
      • 17.2.3 Data-Parallel Computing
      • 17.2.4 Model for MIMD-parallelism
      • 17.2.5 Distributed Computing
      • 17.2.6 Object Orientation
      • 17.2.7 Integrated Visualization Model
      • 17.2.8 "In Large" Extensibility Model
      • 17.2.9 CASE Tools
      • 17.2.10 Planned MOVIE Applications
    • 17.3 Map Separates
      • 17.3.1 Problem Specification
      • 17.3.2 Test Case
      • 17.3.3 Segmentation via RGB Clustering
      • 17.3.4 Comparison with JPL Neural Net Results
      • 17.3.5 Edge Detection via Zero Crossing
      • 17.3.6 Towards the Map Expert System
      • 17.3.7 Summary
  • 18 Simulation and Analysis
    • 18.1 MetaProblems and MetaSoftware
      • 18.1.1 Applications
      • 18.1.2 Asynchronous versus Loosely Synchronous?
      • 18.1.3 Software for Compound Problems
    • 18.2 ISIS: An Interactive Seismic Imaging System
      • 18.2.1 Introduction
      • 18.2.2 Concepts of Interactive Imaging
      • 18.2.3 Geologist-As-Analyst
      • 18.2.4 Why Interactive Imaging?
      • 18.2.5 System Design
      • 18.2.6 Performance Considerations
      • 18.2.7 Trace Manager
      • 18.2.8 Display Manager
      • 18.2.9 User Interface
      • 18.2.10 Computation
      • 18.2.11 Prototype System
      • 18.2.12 Conclusions
    • 18.3 Parallel Simulations that Emulate Function
      • 18.3.1 The Basic Simulation Structure
      • 18.3.2 The Run-time Environment-the Centaur Operating System
      • 18.3.3 SDI Simulation Evolution
      • 18.3.4 Simulation Framework and Synchronization Control
    • 18.4 Multitarget Tracking
      • 18.4.1 Nature of the Problem
      • 18.4.2 Tracking Techniques
      • 18.4.3 Algorithm Overview
      • 18.4.4 Two-dimensional Mono Tracking
      • 18.4.5 Three-dimensional Tracking
  • 19 Parallel Computing in Industry
    • 19.1 Motivation
    • 19.2 Examples of Industrial Applications
  • 20 Computational Science
    • 20.1 Lessons
    • 20.2 Computational Science
  • Appendices
  • A C3P Reports
  • B Selected Biographic Information
  • Bibliography
  • Index

Details

No. of pages:
977
Language:
English
Copyright:
© Morgan Kaufmann 1994
Published:
28th June 2014
Imprint:
Morgan Kaufmann
eBook ISBN:
9780080513515

About the Author

Geoffrey Fox

Geoffrey Fox is a Distinguished Professor of Informatics, Computing and Physics and Associate Dean of Graduate Studies and Research in the School of Informatics and Computing, Indiana University. He previously taught and led research groups at Caltech and Syracuse University. He received his Ph.D. from Cambridge University, U.K. Fox is well known for his comprehensive work and extensive publications in parallel architecture, distributed programming, grid computing, web services, and Internet applications. His book on Grid Computing (coauthored with F. Berman and Tony Hey) is widely used by the research community. He has supervised over 60 Ph.D. students in physics, computer science, and engineering.

Affiliations and Expertise

Indiana University, USA

Roy Williams

Paul Messina