Scalable Shared-Memory Multiprocessing - 1st Edition - ISBN: 9781558603158, 9781483296012

Scalable Shared-Memory Multiprocessing

1st Edition

Authors: Daniel Lenoski, Wolf-Dietrich Weber
eBook ISBN: 9781483296012
Imprint: Morgan Kaufmann
Published Date: 1st June 1995
Page Count: 341




Dr. Lenoski and Dr. Weber have experience with both leading-edge research and the practical issues involved in implementing large-scale parallel systems. They were key contributors to the architecture and design of the DASH multiprocessor. Currently, they are involved in commercializing scalable shared-memory technology.

Table of Contents

Scalable Shared-Memory Multiprocessing
by Daniel E. Lenoski and Wolf-Dietrich Weber
    Part 1 General Concepts
    Chapter 1 Multiprocessing and Scalability
      1.1 Multiprocessor Architecture
        1.1.1 Single versus Multiple Instruction Streams
        1.1.2 Message-Passing versus Shared-Memory Architectures
      1.2 Cache Coherence
        1.2.1 Uniprocessor Caches
        1.2.2 Multiprocessor Caches
      1.3 Scalability
        1.3.1 Scalable Interconnection Networks
        1.3.2 Scalable Cache Coherence
        1.3.3 Scalable I/O
        1.3.4 Summary of Hardware Architecture Scalability
        1.3.5 Scalability of Parallel Software
      1.4 Scaling and Processor Grain Size
      1.5 Chapter Conclusions

    Chapter 2 Shared-Memory Parallel Programs
      2.1 Basic Concepts
      2.2 Parallel Application Set
        2.2.1 MP3D
        2.2.2 Water
        2.2.3 PTHOR
        2.2.4 LocusRoute
        2.2.5 Cholesky
        2.2.6 Barnes-Hut
      2.3 Simulation Environment
        2.3.1 Basic Program Characteristics
      2.4 Parallel Application Execution Model
      2.5 Parallel Execution under a PRAM Memory Model
      2.6 Parallel Execution with Shared Data Uncached
      2.7 Parallel Execution with Shared Data Cached
      2.8 Summary of Results with Different Memory System Models
      2.9 Communication Behavior of Parallel Applications
      2.10 Communication-to-Computation Ratios
      2.11 Invalidation Patterns
        2.11.1 Classification of Data Objects
        2.11.2 Average Invalidation Characteristics
        2.11.3 Basic Invalidation Patterns for Each Application
        2.11.4 MP3D
        2.11.5 Water
        2.11.6 PTHOR
        2.11.7 LocusRoute
        2.11.8 Cholesky
        2.11.9 Barnes-Hut
        2.11.10 Summary of Individual Invalidation Distributions
        2.11.11 Effect of Problem Size
        2.11.12 Effect of Number of Processors
        2.11.13 Effect of Finite Caches and Replacement Hints
        2.11.14 Effect of Cache Line Size
        2.11.15 Invalidation Patterns Summary
      2.12 Chapter Conclusions

    Chapter 3 System Performance Issues
      3.1 Memory Latency
      3.2 Memory Latency Reduction
        3.2.1 Nonuniform Memory Access (NUMA)
        3.2.2 Cache-Only Memory Architecture (COMA)
        3.2.3 Direct Interconnect Networks
        3.2.4 Hierarchical Access
        3.2.5 Protocol Optimizations
        3.2.6 Latency Reduction Summary
      3.3 Latency Hiding
        3.3.1 Weak Consistency Models
        3.3.2 Prefetch
        3.3.3 Multiple-Context Processors
        3.3.4 Producer-Initiated Communications
        3.3.5 Latency Hiding Summary
      3.4 Memory Bandwidth
        3.4.1 Hot Spots
        3.4.2 Synchronization Support
      3.5 Chapter Conclusions

    Chapter 4 System Implementation
      4.1 Scalability of System Costs
        4.1.1 Directory Storage Overhead
        4.1.2 Sparse Directories
        4.1.3 Hierarchical Directories
        4.1.4 Summary of Directory Storage Overhead
      4.2 Implementation Issues and Design Correctness
        4.2.1 Unbounded Number of Requests
        4.2.2 Distributed Memory Operations
        4.2.3 Request Starvation
        4.2.4 Error Detection and Fault Tolerance
        4.2.5 Design Verification
      4.3 Chapter Conclusions

    Chapter 5 Scalable Shared-Memory Systems
      5.1 Directory-Based Systems
        5.1.1 DASH
        5.1.2 Alewife
        5.1.4 IEEE Scalable Coherent Interface
        5.1.5 Convex Exemplar
      5.2 Hierarchical Systems
        5.2.1 Encore GigaMax
        5.2.2 ParaDiGM
        5.2.3 Data Diffusion Machine
        5.2.4 Kendall Square Research KSR-1 and KSR-2
      5.3 Reflective Memory Systems
        5.3.1 Plus
        5.3.2 Merlin and Sesame
      5.4 Non-Cache Coherent Systems
        5.4.1 NYU Ultracomputer
        5.4.2 IBM RP3 and BBN TC2000
        5.4.3 Cray Research T3D
      5.5 Vector Supercomputer Systems
        5.5.1 Cray Research Y-MP C90
        5.5.2 Tera Computer MTA
      5.6 Virtual Shared-Memory Systems
        5.6.1 Ivy and Munin/Treadmarks
        5.6.2 J-Machine
        5.6.3 MIT/Motorola *T and *T-NG
      5.7 Chapter Conclusions

    Part 2 Experience with DASH
    Chapter 6 DASH Prototype System
      6.1 System Organization
        6.1.1 Cluster Organization
        6.1.2 Directory Logic
        6.1.3 Interconnection Network
      6.2 Programmer's Model
      6.3 Coherence Protocol
        6.3.1 Nomenclature
        6.3.2 Basic Memory Operations
        6.3.3 Prefetch Operations
        6.3.4 DMA/Uncached Operations
      6.4 Synchronization Protocol
        6.4.1 Granting Locks
        6.4.2 Fetch&Op Variables
        6.4.3 Fence Operations
      6.5 Protocol General Exceptions
      6.6 Chapter Conclusions

    Chapter 7 Prototype Hardware Structures
      7.1 Base Cluster Hardware
        7.1.1 SGI Multiprocessor Bus (MPBUS)
        7.1.2 SGI CPU Board
        7.1.3 SGI Memory Board
        7.1.4 SGI I/O Board
      7.2 Directory Controller
      7.3 Reply Controller
      7.4 Pseudo-CPU
      7.5 Network and Network Interface
      7.6 Performance Monitor
      7.7 Logic Overhead of Directory-Based Coherence
      7.8 Chapter Conclusions

    Chapter 8 Prototype Performance Analysis
      8.1 Base Memory Performance
        8.1.1 Overall Memory System Bandwidth
        8.1.2 Other Memory Bandwidth Limits
        8.1.3 Processor Issue Bandwidth and Latency
        8.1.4 Interprocessor Latency
        8.1.5 Summary of Memory System Bandwidth and Latency
      8.2 Parallel Application Performance
        8.2.1 Application Run-time Environment
        8.2.2 Application Speedups
        8.2.3 Detailed Case Studies
        8.2.4 Application Speedup Summary
      8.3 Protocol Effectiveness
        8.3.1 Base Protocol Features
        8.3.2 Alternative Memory Operations
      8.4 Chapter Conclusions

    Part 3 Future Trends
    Chapter 9 TeraDASH
      9.1 TeraDASH System Organization
        9.1.1 TeraDASH Cluster Structure
        9.1.2 Intracluster Operations
        9.1.3 TeraDASH Mesh Network
        9.1.4 TeraDASH Directory Structure
      9.2 TeraDASH Coherence Protocol
        9.2.1 Required Changes for the Scalable Directory Structure
        9.2.2 Enhancements for Increased Protocol Robustness
        9.2.3 Enhancements for Increased Performance
      9.3 TeraDASH Performance
        9.3.1 Access Latencies
        9.3.2 Potential Application Speedup
      9.4 Chapter Conclusions

    Chapter 10 Conclusions and Future Directions
      10.1 SSMP Design Conclusions
      10.2 Current Trends
      10.3 Future Trends

    Appendix Multiprocessor Systems


© Morgan Kaufmann 1995

About the Author

Daniel Lenoski

Dr. Lenoski has experience with both leading-edge research and the practical issues involved in implementing large-scale parallel systems. He was a key contributor to the architecture and design of the DASH multiprocessor. Currently, he is involved in commercializing scalable shared-memory technology.

Wolf-Dietrich Weber

Dr. Weber has experience with both leading-edge research and the practical issues involved in implementing large-scale parallel systems. He was a key contributor to the architecture and design of the DASH multiprocessor. Currently, he is involved in commercializing scalable shared-memory technology.
