
Parallel Computing Works!

1st Edition

Authors: Geoffrey Fox, Roy Williams, Paul Messina
eBook ISBN: 9780080513515
Imprint: Morgan Kaufmann
Published Date: 28th June 2014
Page Count: 977

Description

A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics, and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant to future supercomputers, with particular emphasis on the hypercube architecture.

The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop algorithms for frequently used mathematical computations. They also devise performance models, measure the performance characteristics of several computers, and create a high-performance computing facility based exclusively on parallel computers. By addressing all issues involved in scientific problem solving, Parallel Computing Works! provides valuable insight into computational science for large-scale parallel architectures. For those in the sciences, the findings reveal the usefulness of an important experimental tool. Anyone in supercomputing and related computational fields will gain a new perspective on the potential contributions of parallelism. Includes over 30 full-color illustrations.

Table of Contents

Parallel Computing Works!
by Geoffrey C. Fox, Roy D. Williams, Paul C. Messina
  • Preface
  • 1 Introduction
    • 1.1 Introduction
    • 1.2 The National Vision for Parallel Computation
    • 1.3 Caltech Concurrent Computation Program
    • 1.4 How Parallel Computing Works
  • 2 Technical Backdrop
    • 2.2 Hardware
      • 2.2.1 Parallel Scientific Computers Before 1980
      • 2.2.2 Early 1980s
      • 2.2.3 Birth of the Hypercube
      • 2.2.4 Mid-1980s
      • 2.2.5 Late 1980s
      • 2.2.6 Parallel Systems-1992
    • 2.3 Software
      • 2.3.1 Languages and Compilers
      • 2.3.2 Tools
    • 2.4 Summary
  • 3 A Methodology for Computation
    • 3.1 Introduction
    • 3.2 The Process
    • 3.3 Complex Systems and Space-Time Structure
    • 3.4 The Temporal Properties of Complex Systems
    • 3.5 Spatial Properties of Complex Systems
    • 3.6 Compound Complex Systems
    • 3.7 Mapping Complex Systems
    • 3.8 Parallel Computing Works?
  • 4 Synchronous Applications I
    • 4.1 QCD and the Beginning of C3P
    • 4.2 Synchronous Applications
    • 4.3 Quantum Chromodynamics
      • 4.3.1 Introduction
      • 4.3.2 Monte Carlo
      • 4.3.3 QCD
      • 4.3.4 Lattice QCD
      • 4.3.5 Concurrent QCD Machines
      • 4.3.6 QCD on the Caltech Hypercubes
      • 4.3.7 QCD on the Connection Machine
      • 4.3.8 Status and Prospects
    • 4.4 Spin Models
      • 4.4.1 Introduction
      • 4.4.2 Ising Model
      • 4.4.3 Potts Model
      • 4.4.4 XY Model
      • 4.4.5 O(3) Model
    • 4.5 An Automata Model of Granular Materials
      • 4.5.1 Introduction
      • 4.5.2 Comparison to Particle Dynamics Models
      • 4.5.3 Comparison to Lattice Gas Models
      • 4.5.4 The Rules for the Lattice Grain Model
      • 4.5.5 Implementation on a Parallel Computer
      • 4.5.6 Simulations
      • 4.5.7 Conclusion
  • 5 Express and CrOS
    • 5.1 Multicomputer Operating Systems
    • 5.2 A "Packet" History of Message-passing Systems
      • 5.2.1 Prehistory
      • 5.2.2 Application-driven Development
      • 5.2.3 Collective Communication
      • 5.2.4 Automated Decomposition-whoami
      • 5.2.5 "Melting"-A Non-crystalline Problem
      • 5.2.6 The Mark III
      • 5.2.7 Host Programs
      • 5.2.8 A Ray Tracer - and an "Operating System"
      • 5.2.9 The Crystal Router
      • 5.2.10 Portability
      • 5.2.11 Express
      • 5.2.12 Other Message-passing Systems
      • 5.2.13 What Did We Learn?
      • 5.2.14 Conclusions
    • 5.3 Parallel Debugging
      • 5.3.1 Introduction and History
      • 5.3.2 Designing a Parallel Debugger
      • 5.3.3 Conclusions
    • 5.4 Parallel Profiling
      • 5.4.1 Missing the Point
      • 5.4.2 Visualization
      • 5.4.3 Goals in Performance Analysis
      • 5.4.4 Overhead Analysis
      • 5.4.5 Event Tracing
      • 5.4.6 Data Distribution Analysis
      • 5.4.7 CPU Usage Analysis
      • 5.4.8 Why So Many Separate Tools?
      • 5.4.9 Conclusions
  • 6 Synchronous Applications II
    • 6.1 Computational Issues in Synchronous Problems
    • 6.2 Convectively-Dominated Flows
      • 6.2.1 An Overview of the FCT Technique
      • 6.2.2 Mathematics and the FCT Algorithm
      • 6.2.3 Parallel Issues
      • 6.2.4 Example Problem
      • 6.2.5 Performance and Results
      • 6.2.6 Summary
    • 6.3 Magnetism
      • 6.3.1 Introduction
      • 6.3.2 The Computational Algorithm
      • 6.3.3 Parallel Implementation and Performance
      • 6.3.4 Physics Results
      • 6.3.5 Conclusions
    • 6.4 Phase Transitions
      • 6.4.1 The Case of h = J: Antiferromagnetic Transitions
      • 6.4.2 The Case of h = -J: Quantum XY Model and Topological Transition
    • 6.5 Surface Reconstruction
      • 6.5.1 Multigrid Method with Discontinuities
      • 6.5.2 Interacting Line Processes
      • 6.5.3 Generic Look-up Table and Specific Parametrization
      • 6.5.4 Pyramid on a 2D Mesh of Processors
      • 6.5.5 Results for Orientation Constraints
      • 6.5.6 Results for Depth Constraints
      • 6.5.7 Conclusions
    • 6.6 Character Recognition by Neural Nets
      • 6.6.1 MLP in General
      • 6.6.2 Character Recognition using MLP
      • 6.6.3 The Multiscale Technique
      • 6.6.4 Results
      • 6.6.5 Comments and Variants on the Method
    • 6.7 An Adaptive Multiscale Scheme
      • 6.7.1 Errors in Computing the Motion Field
      • 6.7.2 Adaptive Multiscale Scheme on a Multicomputer
      • 6.7.3 Conclusions
    • 6.8 Collective Stereopsis
  • 7 Independent Parallelism
    • 7.1 Embarrassingly Parallel Problem Structure
    • 7.2 Dynamically Triangulated Random Surfaces
      • 7.2.1 Introduction
      • 7.2.2 Discretized Strings
      • 7.2.3 Computational Aspects
      • 7.2.4 Performance of String Program
      • 7.2.5 Conclusion
    • 7.3 Numerical Study of High-Tc Spin Systems
    • 7.4 Statistical Gravitational Lensing
    • 7.5 Parallel Random Number Generators
    • 7.6 The GENESIS Project
      • 7.6.1 What is Computational Neurobiology?
      • 7.6.2 Parallel Computers?
      • 7.6.3 Problems with Most Present Day Parallel Computers
      • 7.6.4 What is GENESIS?
      • 7.6.5 Task Farming
      • 7.6.6 Distributed Modelling via the Postmaster Element
  • 8 Full Matrix Algorithms
    • 8.1 Full and Banded Matrix Algorithms
      • 8.1.1 Matrix Decomposition
      • 8.1.2 Basic Matrix Arithmetic
      • 8.1.3 Matrix Multiplication for Banded Matrices
      • 8.1.4 Systems of Linear Equations
      • 8.1.5 The Gauss-Jordan Method
      • 8.1.6 Other Matrix Algorithms
      • 8.1.7 Concurrent Linear Algebra Libraries
      • 8.1.8 Problem Structure
      • 8.1.9 Conclusions
    • 8.2 Quantum Mechanical Reactive Scattering
      • 8.2.1 Introduction
      • 8.2.2 Methodology
      • 8.2.3 Parallel Algorithm
      • 8.2.4 Results and Discussion
    • 8.3 Electron-Molecule Collisions
      • 8.3.1 Introduction
      • 8.3.2 The SMC Method and Its Implementation
      • 8.3.3 Parallel Implementation
      • 8.3.4 Performance
      • 8.3.5 Selected Results
      • 8.3.6 Conclusion
  • 9 Loosely Synchronous Problems
    • 9.1 Problem Structure
    • 9.2 Micromechanical Simulations
    • 9.3 Plasma Particle-in-Cell Simulation
      • 9.3.1 Introduction
      • 9.3.2 GCPIC Algorithm
      • 9.3.3 Electron Beam Plasma Instability
      • 9.3.4 Performance Results for 1D Electrostatic Code
      • 9.3.5 One-Dimensional Electromagnetic Code
      • 9.3.6 Dynamic Load Balancing
      • 9.3.7 Summary
    • 9.4 Computational Electromagnetics
    • 9.5 LU Factorization
      • 9.5.1 Introduction
      • 9.5.2 Design Overview
      • 9.5.3 Reduced-Communication Pivoting
      • 9.5.4 New Data Distributions
      • 9.5.5 Performance Versus Scattering
      • 9.5.6 Performance
      • 9.5.7 Conclusions
    • 9.6 Concurrent DASSL
      • 9.6.1 Introduction
      • 9.6.2 Mathematical Formulation
      • 9.6.3 proto-Cdyn - Simulation Layer
      • 9.6.4 Concurrent Formulation
      • 9.6.5 Chemical Engineering Example
      • 9.6.6 Conclusions
    • 9.7 Adaptive Multigrid
      • 9.7.1 Introduction
      • 9.7.2 The Basic Algorithm
      • 9.7.3 The Adaptive Algorithm
      • 9.7.4 The Concurrent Algorithm
      • 9.7.5 Summary
    • 9.8 Munkres Algorithm for Assignment
      • 9.8.1 Introduction
      • 9.8.2 The Sequential Algorithm
      • 9.8.3 The Concurrent Algorithm
    • 9.9 Optimization Methods for Neural Nets
      • 9.9.1 Deficiencies of Steepest Descent
      • 9.9.2 The "Bold Driver" Network
      • 9.9.3 The Broyden-Fletcher-Goldfarb-Shanno One-Step Memoryless Quasi-Newton Method
      • 9.9.4 Parallel Optimization
      • 9.9.5 Experiment: The Dichotomy Problem
      • 9.9.6 Experiment: Time Series Prediction
      • 9.9.7 Summary
  • 10 DIME Programming Environment
    • 10.1 DIME
      • 10.1.1 Applications and Extensions
      • 10.1.2 The Components of DIME
      • 10.1.3 Domain Definition
      • 10.1.4 Mesh Structure
      • 10.1.5 Refinement
      • 10.1.6 Load Balancing
      • 10.1.7 Summary
    • 10.2 DIMEFEM
      • 10.2.1 Memory Allocation
      • 10.2.2 Operations and Elements
      • 10.2.3 Navier-Stokes Solver
      • 10.2.4 Results
      • 10.2.5 Summary
  • 11 Load Balancing and Optimization
    • 11.1 Load Balancing as an Optimization Problem
      • 11.1.1 Load Balancing a Finite-Element Mesh
      • 11.1.2 The Optimization Problem and Physical Analogy
      • 11.1.3 Algorithms for Load Balancing
      • 11.1.4 Simulated Annealing
      • 11.1.5 Recursive Bisection
      • 11.1.6 Eigenvalue Recursive Bisection
      • 11.1.7 Testing Procedure
      • 11.1.8 Test Results
      • 11.1.9 Conclusions
    • 11.2 Physical Analogy
    • 11.3 Physical Optimization
    • 11.4 The Travelling Salesman Problem
      • 11.4.1 Background on Local Search Heuristics
      • 11.4.2 Background on Markov Chains and Simulated Annealing
      • 11.4.3 The New Algorithms-Large-Step Markov Chains
      • 11.4.4 Results
  • 12 Irregular LS Problems
    • 12.1 Irregular Loosely Synchronous Problems are Hard
    • 12.2 The Electrosensory System of the Fish
      • 12.2.1 Physical Model
      • 12.2.2 Mathematical Theory
      • 12.2.3 Results
      • 12.2.4 Summary
    • 12.3 Transonic Flow
      • 12.3.1 Compressible Flow Algorithm
      • 12.3.2 Adaptive Refinement
      • 12.3.3 Examples
      • 12.3.4 Performance
      • 12.3.5 Summary
    • 12.4 Tree Codes for N-body Simulations
      • 12.4.1 Oct-Trees
      • 12.4.2 Computing Forces
      • 12.4.3 Parallelism in Tree Codes
      • 12.4.4 Acquiring Locally Essential Data
      • 12.4.5 Comments on Performance
    • 12.5 Fast Vortex Algorithm
      • 12.5.1 Vortex Methods
      • 12.5.2 Fast Algorithms
      • 12.5.3 Hypercube Implementation
      • 12.5.4 Efficiency of Parallel Implementation
      • 12.5.5 Results
    • 12.6 Cluster Algorithms for Spin Models
      • 12.6.1 Monte Carlo Calculations of Spin Models
      • 12.6.2 Cluster Algorithms
      • 12.6.3 Parallel Cluster Algorithms
      • 12.6.4 Self-labelling
      • 12.6.5 Global Equivalencing
      • 12.6.6 Other Algorithms
      • 12.6.7 Summary
    • 12.7 Sorting
      • 12.7.1 The Merge Strategy
      • 12.7.2 The Bitonic Algorithm
      • 12.7.3 Shellsort or Diminishing Increment Algorithm
      • 12.7.4 Quicksort or Samplesort Algorithm
    • 12.8 Hierarchical Tree-Structures
      • 12.8.1 Introduction
      • 12.8.2 Adaptive Structures
      • 12.8.3 Tree as Grid
      • 12.8.4 Conclusion
  • 13 Data Parallel C and Fortran
    • 13.1 High-Level Languages
      • 13.1.1 High Performance Fortran Perspective
      • 13.1.2 Problem Architecture and Message-Passing Fortran
      • 13.1.3 Problem Architecture and Fortran 77
    • 13.2 A Software Tool
      • 13.2.1 Is Any Assistance Really Needed?
      • 13.2.2 Overview of the Tool
      • 13.2.3 Dependence-based Data Partitioning
      • 13.2.4 Mapping Data to Processors
      • 13.2.5 Communication Analysis and Performance Improvement Transformations
      • 13.2.6 Communication Analysis Algorithm
      • 13.2.7 Static Performance Estimator
      • 13.2.8 Conclusion
    • 13.3 Fortran 90 Experiments
    • 13.4 Optimizing Compilers by Neural Networks
    • 13.5 ASPAR
      • 13.5.1 Degrees of Difficulty
      • 13.5.2 Various Parallelizing Technologies
      • 13.5.3 The Local View
      • 13.5.4 The "Global" View
      • 13.5.5 Global Strategies
      • 13.5.6 Dynamic Data Distribution
      • 13.5.7 Conclusions
    • 13.6 Coherent Parallel C
    • 13.7 Hierarchical Memory
  • 14 Asynchronous Applications
    • 14.1 Asynchronous Problems
    • 14.2 Melting in Two Dimensions
      • 14.2.1 Problem Description
      • 14.2.2 Solution Method
      • 14.2.3 Concurrent Update Procedure
      • 14.2.4 Performance Analysis
    • 14.3 Computer Chess
      • 14.3.1 Sequential Computer Chess
      • 14.3.2 Parallel Computer Chess: The Hardware
      • 14.3.3 Parallel Alpha-Beta Pruning
      • 14.3.4 Load Balancing
      • 14.3.5 Speedup Measurements
      • 14.3.6 Real-time Graphical Performance Monitoring
      • 14.3.7 Speculation
      • 14.3.8 Summary
  • 15 High-Level Asynchronous Software
    • 15.1 Asynchronous Software Paradigms
    • 15.2 MOOSE II
      • 15.2.1 Design of MOOSE
      • 15.2.2 Dynamic Load-Balancing Support
      • 15.2.3 What We Learned
    • 15.3 Time Warp
  • 16 The Zipcode Message-Passing System
    • 16.1 Overview of Zipcode
    • 16.2 Low-Level Primitives
      • 16.2.1 CE/RK Overview
      • 16.2.2 Interface with the CE/RK system
      • 16.2.3 CE Functions
      • 16.2.4 RK Calls
      • 16.2.5 Zipcode Calls
    • 16.3 High-Level Primitives
      • 16.3.1 Invoices
      • 16.3.2 Packing and Unpacking
      • 16.3.3 The Packed-Message Functions
      • 16.3.4 Fortran Interface
    • 16.4 Details of Execution
      • 16.4.1 Initialization/Termination
      • 16.4.2 Process Creation/Destruction
    • 16.5 Conclusions
  • 17 MOVIE
    • 17.1 Introduction
      • 17.1.1 The Beginning
      • 17.1.2 Towards the MOVIE System
      • 17.1.3 Current Status and Outlook
    • 17.2 System Overview
      • 17.2.1 The MOVIE System in a Nutshell
      • 17.2.2 MovieScript as Virtual Machine Language
      • 17.2.3 Data-Parallel Computing
      • 17.2.4 Model for MIMD-parallelism
      • 17.2.5 Distributed Computing
      • 17.2.6 Object Orientation
      • 17.2.7 Integrated Visualization Model
      • 17.2.8 "In Large" Extensibility Model
      • 17.2.9 CASE Tools
      • 17.2.10 Planned MOVIE Applications
    • 17.3 Map Separates
      • 17.3.1 Problem Specification
      • 17.3.2 Test Case
      • 17.3.3 Segmentation via RGB Clustering
      • 17.3.4 Comparison with JPL Neural Net Results
      • 17.3.5 Edge Detection via Zero Crossing
      • 17.3.6 Towards the Map Expert System
      • 17.3.7 Summary
  • 18 Simulation and Analysis
    • 18.1 MetaProblems and MetaSoftware
      • 18.1.1 Applications
      • 18.1.2 Asynchronous versus Loosely Synchronous?
      • 18.1.3 Software for Compound Problems
    • 18.2 ISIS: An Interactive Seismic Imaging System
      • 18.2.1 Introduction
      • 18.2.2 Concepts of Interactive Imaging
      • 18.2.3 Geologist-As-Analyst
      • 18.2.4 Why Interactive Imaging?
      • 18.2.5 System Design
      • 18.2.6 Performance Considerations
      • 18.2.7 Trace Manager
      • 18.2.8 Display Manager
      • 18.2.9 User Interface
      • 18.2.10 Computation
      • 18.2.11 Prototype System
      • 18.2.12 Conclusions
    • 18.3 Parallel Simulations that Emulate Function
      • 18.3.1 The Basic Simulation Structure
      • 18.3.2 The Run-time Environment-the Centaur Operating System
      • 18.3.3 SDI Simulation Evolution
      • 18.3.4 Simulation Framework and Synchronization Control
    • 18.4 Multitarget Tracking
      • 18.4.1 Nature of the Problem
      • 18.4.2 Tracking Techniques
      • 18.4.3 Algorithm Overview
      • 18.4.4 Two-dimensional Mono Tracking
      • 18.4.5 Three-dimensional Tracking
  • 19 Parallel Computing in Industry
    • 19.1 Motivation
    • 19.2 Examples of Industrial Applications
  • 20 Computational Science
    • 20.1 Lessons
    • 20.2 Computational Science
  • Appendices
  • A C3P Reports
  • B Selected Biographic Information
  • Bibliography
  • Index

Details

No. of pages:
977
Language:
English
Copyright:
© Morgan Kaufmann 1994
Published:
28th June 2014
Imprint:
Morgan Kaufmann
eBook ISBN:
9780080513515

About the Author

Geoffrey Fox

Geoffrey Fox is a Distinguished Professor of Informatics, Computing and Physics and Associate Dean of Graduate Studies and Research in the School of Informatics and Computing, Indiana University. He previously taught and led research groups at Caltech and Syracuse University. He received his Ph.D. from Cambridge University, U.K. Fox is well known for his comprehensive work and extensive publications in parallel architecture, distributed programming, grid computing, web services, and Internet applications. His book on Grid Computing (coauthored with F. Berman and Tony Hey) is widely used by the research community. He has supervised over 60 Ph.D. students in physics, computer science, and engineering.

Affiliations and Expertise

Indiana University, USA

Roy Williams

Paul Messina