Wednesday, July 20, 2011

parallel programming tutorial

Is Parallel Programming Hard, And, If So, What Can You Do About It?


Contents
1 Introduction
1.1 Historic Parallel Programming Difficulties
1.2 Parallel Programming Goals
1.2.1 Performance
1.2.2 Productivity
1.2.3 Generality
1.3 Alternatives to Parallel Programming
1.3.1 Multiple Instances of a Sequential Application
1.3.2 Make Use of Existing Parallel Software
1.3.3 Performance Optimization
1.4 What Makes Parallel Programming Hard?
1.4.1 Work Partitioning
1.4.2 Parallel Access Control
1.4.3 Resource Partitioning and Replication
1.4.4 Interacting With Hardware
1.4.5 Composite Capabilities
1.4.6 How Do Languages and Environments Assist With These Tasks?
1.5 Guide to This Book
1.5.1 Quick Quizzes
1.5.2 Sample Source Code

2 Hardware and its Habits
2.1 Overview
2.1.1 Pipelined CPUs
2.1.2 Memory References
2.1.3 Atomic Operations
2.1.4 Memory Barriers
2.1.5 Cache Misses
2.1.6 I/O Operations
2.2 Overheads
2.2.1 Hardware System Architecture
2.2.2 Costs of Operations
2.3 Hardware Free Lunch?
2.3.1 3D Integration
2.3.2 Novel Materials and Processes
2.3.3 Special-Purpose Accelerators
2.3.4 Existing Parallel Software
2.4 Software Design Implications

3 Tools of the Trade
3.1 Scripting Languages
3.2 POSIX Multiprocessing
3.2.1 POSIX Process Creation and Destruction
3.2.2 POSIX Thread Creation and Destruction
3.2.3 POSIX Locking
3.2.4 POSIX Reader-Writer Locking
3.3 Atomic Operations
3.4 Linux-Kernel Equivalents to POSIX Operations
3.5 The Right Tool for the Job: How to Choose?

4 Counting
4.1 Why Isn’t Concurrent Counting Trivial?
4.2 Statistical Counters
4.2.1 Design
4.2.2 Array-Based Implementation
4.2.3 Eventually Consistent Implementation
4.2.4 Per-Thread-Variable-Based Implementation
4.2.5 Discussion
4.3 Approximate Limit Counters
4.3.1 Design
4.3.2 Simple Limit Counter Implementation
4.3.3 Simple Limit Counter Discussion
4.3.4 Approximate Limit Counter Implementation
4.3.5 Approximate Limit Counter Discussion
4.4 Exact Limit Counters
4.4.1 Atomic Limit Counter Implementation
4.4.2 Atomic Limit Counter Discussion
4.4.3 Signal-Theft Limit Counter Design
4.4.4 Signal-Theft Limit Counter Implementation
4.4.5 Signal-Theft Limit Counter Discussion
4.5 Applying Specialized Parallel Counters
4.6 Parallel Counting Discussion

5 Partitioning and Synchronization Design
5.1 Partitioning Exercises
5.1.1 Dining Philosophers Problem
5.1.2 Double-Ended Queue
5.1.3 Partitioning Example Discussion
5.2 Design Criteria
5.3 Synchronization Granularity
5.3.1 Sequential Program
5.3.2 Code Locking
5.3.3 Data Locking
5.3.4 Data Ownership
5.3.5 Locking Granularity and Performance
5.4 Parallel Fastpath
5.4.1 Reader/Writer Locking
5.4.2 Hierarchical Locking
5.4.3 Resource Allocator Caches
5.5 Performance Summary

6 Locking
6.1 Staying Alive
6.1.1 Deadlock
6.1.2 Livelock and Starvation
6.1.3 Unfairness
6.1.4 Inefficiency
6.2 Types of Locks
6.2.1 Exclusive Locks
6.2.2 Reader-Writer Locks
6.2.3 Beyond Reader-Writer Locks
6.3 Locking Implementation Issues
6.3.1 Sample Exclusive-Locking Implementation Based on Atomic Exchange
6.3.2 Other Exclusive-Locking Implementations
6.4 Lock-Based Existence Guarantees
6.5 Locking: Hero or Villain?

7 Data Ownership

8 Deferred Processing
8.1 Reference Counting
8.1.1 Implementation of Reference-Counting Categories
8.1.2 Linux Primitives Supporting Reference Counting
8.1.3 Counter Optimizations
8.2 Sequence Locks
8.3 Read-Copy Update (RCU)
8.3.1 Introduction to RCU
8.3.2 RCU Fundamentals
8.3.3 RCU Usage
8.3.4 RCU Linux-Kernel API
8.3.5 “Toy” RCU Implementations
8.3.6 RCU Exercises

9 Applying RCU
9.1 RCU and Per-Thread-Variable-Based Statistical Counters
9.1.1 Design
9.1.2 Implementation
9.1.3 Discussion
9.2 RCU and Counters for Removable I/O Devices

10 Validation: Debugging and Analysis
10.1 Tracing
10.2 Assertions
10.3 Static Analysis
10.4 Probability and Heisenbugs
10.5 Profiling
10.6 Differential Profiling
10.7 Performance Estimation

11 Data Structures
11.1 Lists
11.2 Computational Complexity and Performance
11.3 Design Tradeoffs
11.4 Protection
11.5 Bits and Bytes
11.6 Hardware Considerations

12 Advanced Synchronization
12.1 Avoiding Locks
12.2 Memory Barriers
12.2.1 Memory Ordering and Memory Barriers
12.2.2 If B Follows A, and C Follows B, Why Doesn’t C Follow A?
12.2.3 Variables Can Have More Than One Value
12.2.4 What Can You Trust?
12.2.5 Review of Locking Implementations
12.2.6 A Few Simple Rules
12.2.7 Abstract Memory Access Model
12.2.8 Device Operations
12.2.9 Guarantees
12.2.10 What Are Memory Barriers?
12.2.11 Locking Constraints
12.2.12 Memory-Barrier Examples
12.2.13 The Effects of the CPU Cache
12.2.14 Where Are Memory Barriers Needed?
12.3 Non-Blocking Synchronization
12.3.1 Simple NBS
12.3.2 Hazard Pointers
12.3.3 Atomic Data Structures
12.3.4 “Macho” NBS

13 Ease of Use
13.1 Rusty Scale for API Design
13.2 Shaving the Mandelbrot Set

14 Time Management

15 Conflicting Visions of the Future
15.1 The Future of CPU Technology Ain’t What it Used to Be
15.1.1 Uniprocessor Über Alles
15.1.2 Multithreaded Mania
15.1.3 More of the Same
15.1.4 Crash Dummies Slamming into the Memory Wall
15.2 Transactional Memory
15.2.1 I/O Operations
15.2.2 RPC Operations
15.2.3 Memory-Mapping Operations
15.2.4 Multithreaded Transactions
15.2.5 Extra-Transactional Accesses
15.2.6 Time Delays
15.2.7 Locking
15.2.8 Reader-Writer Locking
15.2.9 Persistence
15.2.10 Dynamic Linking and Loading
15.2.11 Debugging
15.2.12 The exec() System Call
15.2.13 RCU
15.2.14 Discussion
15.3 Shared-Memory Parallel Functional Programming
15.4 Process-Based Parallel Functional Programming
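
To give a small taste of the material listed under Chapter 3 ("Tools of the Trade"), here is a minimal sketch, not taken from the book, of POSIX thread creation, destruction, and locking: two threads increment a shared counter under a pthread_mutex_t. The thread count and iteration count are arbitrary values chosen only for illustration.

/* Minimal illustration of POSIX threads and locking
 * (cf. Sections 3.2.2 and 3.2.3).  Build with: gcc -pthread counter.c */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 2          /* arbitrary: number of worker threads */
#define NLOOPS   1000000    /* arbitrary: increments per thread */

static unsigned long counter;                          /* shared state */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
	for (int i = 0; i < NLOOPS; i++) {
		pthread_mutex_lock(&lock);             /* serialize updates */
		counter++;
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);            /* wait for completion */

	/* With locking, the result is deterministic: NTHREADS * NLOOPS. */
	printf("counter = %lu\n", counter);
	return 0;
}

Without the mutex, the two threads' read-modify-write sequences would interleave and increments would be lost, which is exactly the kind of problem Chapter 4 ("Counting") then attacks with more scalable designs.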
