Wednesday, July 20, 2011

parallel programming tutorial

Is Parallel Programming Hard, And, If So, What Can You Do About It?


Contents
1 Introduction
1.1 Historic Parallel Programming Difficulties
1.2 Parallel Programming Goals
1.2.1 Performance
1.2.2 Productivity
1.2.3 Generality
1.3 Alternatives to Parallel Programming
1.3.1 Multiple Instances of a Sequential Application
1.3.2 Make Use of Existing Parallel Software
1.3.3 Performance Optimization
1.4 What Makes Parallel Programming Hard?
1.4.1 Work Partitioning
1.4.2 Parallel Access Control
1.4.3 Resource Partitioning and Replication
1.4.4 Interacting With Hardware
1.4.5 Composite Capabilities
1.4.6 How Do Languages and Environments Assist With These Tasks?
1.5 Guide to This Book
1.5.1 Quick Quizzes
1.5.2 Sample Source Code

2 Hardware and its Habits
2.1 Overview
2.1.1 Pipelined CPUs
2.1.2 Memory References
2.1.3 Atomic Operations
2.1.4 Memory Barriers
2.1.5 Cache Misses
2.1.6 I/O Operations
2.2 Overheads
2.2.1 Hardware System Architecture
2.2.2 Costs of Operations
2.3 Hardware Free Lunch?
2.3.1 3D Integration
2.3.2 Novel Materials and Processes
2.3.3 Special-Purpose Accelerators
2.3.4 Existing Parallel Software
2.4 Software Design Implications

3 Tools of the Trade
3.1 Scripting Languages
3.2 POSIX Multiprocessing
3.2.1 POSIX Process Creation and Destruction
3.2.2 POSIX Thread Creation and Destruction
3.2.3 POSIX Locking
3.2.4 POSIX Reader-Writer Locking
3.3 Atomic Operations
3.4 Linux-Kernel Equivalents to POSIX Operations
3.5 The Right Tool for the Job: How to Choose?

4 Counting
4.1 Why Isn’t Concurrent Counting Trivial?
4.2 Statistical Counters
4.2.1 Design
4.2.2 Array-Based Implementation
4.2.3 Eventually Consistent Implementation
4.2.4 Per-Thread-Variable-Based Implementation
4.2.5 Discussion
4.3 Approximate Limit Counters
4.3.1 Design
4.3.2 Simple Limit Counter Implementation
4.3.3 Simple Limit Counter Discussion
4.3.4 Approximate Limit Counter Implementation
4.3.5 Approximate Limit Counter Discussion
4.4 Exact Limit Counters
4.4.1 Atomic Limit Counter Implementation
4.4.2 Atomic Limit Counter Discussion
4.4.3 Signal-Theft Limit Counter Design
4.4.4 Signal-Theft Limit Counter Implementation
4.4.5 Signal-Theft Limit Counter Discussion
4.5 Applying Specialized Parallel Counters
4.6 Parallel Counting Discussion

5 Partitioning and Synchronization Design
5.1 Partitioning Exercises
5.1.1 Dining Philosophers Problem
5.1.2 Double-Ended Queue
5.1.3 Partitioning Example Discussion
5.2 Design Criteria
5.3 Synchronization Granularity
5.3.1 Sequential Program
5.3.2 Code Locking
5.3.3 Data Locking
5.3.4 Data Ownership
5.3.5 Locking Granularity and Performance
5.4 Parallel Fastpath
5.4.1 Reader/Writer Locking
5.4.2 Hierarchical Locking
5.4.3 Resource Allocator Caches
5.5 Performance Summary

6 Locking
6.1 Staying Alive
6.1.1 Deadlock
6.1.2 Livelock and Starvation
6.1.3 Unfairness
6.1.4 Inefficiency
6.2 Types of Locks
6.2.1 Exclusive Locks
6.2.2 Reader-Writer Locks
6.2.3 Beyond Reader-Writer Locks
6.3 Locking Implementation Issues
6.3.1 Sample Exclusive-Locking Implementation Based on Atomic Exchange
6.3.2 Other Exclusive-Locking Implementations
6.4 Lock-Based Existence Guarantees
6.5 Locking: Hero or Villain?

7 Data Ownership

8 Deferred Processing
8.1 Reference Counting
8.1.1 Implementation of Reference-Counting Categories
8.1.2 Linux Primitives Supporting Reference Counting
8.1.3 Counter Optimizations
8.2 Sequence Locks
8.3 Read-Copy Update (RCU)
8.3.1 Introduction to RCU
8.3.2 RCU Fundamentals
8.3.3 RCU Usage
8.3.4 RCU Linux-Kernel API
8.3.5 “Toy” RCU Implementations
8.3.6 RCU Exercises

9 Applying RCU
9.1 RCU and Per-Thread-Variable-Based Statistical Counters
9.1.1 Design
9.1.2 Implementation
9.1.3 Discussion
9.2 RCU and Counters for Removable I/O Devices

10 Validation: Debugging and Analysis
10.1 Tracing
10.2 Assertions
10.3 Static Analysis
10.4 Probability and Heisenbugs
10.5 Profiling
10.6 Differential Profiling
10.7 Performance Estimation

11 Data Structures
11.1 Lists
11.2 Computational Complexity and Performance
11.3 Design Tradeoffs
11.4 Protection
11.5 Bits and Bytes
11.6 Hardware Considerations

12 Advanced Synchronization
12.1 Avoiding Locks
12.2 Memory Barriers
12.2.1 Memory Ordering and Memory Barriers
12.2.2 If B Follows A, and C Follows B, Why Doesn’t C Follow A?
12.2.3 Variables Can Have More Than One Value
12.2.4 What Can You Trust?
12.2.5 Review of Locking Implementations
12.2.6 A Few Simple Rules
12.2.7 Abstract Memory Access Model
12.2.8 Device Operations
12.2.9 Guarantees
12.2.10 What Are Memory Barriers?
12.2.11 Locking Constraints
12.2.12 Memory-Barrier Examples
12.2.13 The Effects of the CPU Cache
12.2.14 Where Are Memory Barriers Needed?
12.3 Non-Blocking Synchronization
12.3.1 Simple NBS
12.3.2 Hazard Pointers
12.3.3 Atomic Data Structures
12.3.4 “Macho” NBS

13 Ease of Use
13.1 Rusty Scale for API Design
13.2 Shaving the Mandelbrot Set

14 Time Management

15 Conflicting Visions of the Future
15.1 The Future of CPU Technology Ain’t What it Used to Be
15.1.1 Uniprocessor Über Alles
15.1.2 Multithreaded Mania
15.1.3 More of the Same
15.1.4 Crash Dummies Slamming into the Memory Wall
15.2 Transactional Memory
15.2.1 I/O Operations
15.2.2 RPC Operations
15.2.3 Memory-Mapping Operations
15.2.4 Multithreaded Transactions
15.2.5 Extra-Transactional Accesses
15.2.6 Time Delays
15.2.7 Locking
15.2.8 Reader-Writer Locking
15.2.9 Persistence
15.2.10 Dynamic Linking and Loading
15.2.11 Debugging
15.2.12 The exec() System Call
15.2.13 RCU
15.2.14 Discussion
15.3 Shared-Memory Parallel Functional Programming
15.4 Process-Based Parallel Functional Programming
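
To give a small taste of the material listed under Chapter 3 ("Tools of the Trade"), here is a minimal sketch, not taken from the book, of POSIX thread creation, destruction, and locking: two threads increment a shared counter under a pthread_mutex_t. The thread count and iteration count are arbitrary values chosen only for illustration.

/* Minimal illustration of POSIX threads and locking
 * (cf. Sections 3.2.2 and 3.2.3).  Build with: gcc -pthread counter.c */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 2          /* arbitrary: number of worker threads */
#define NLOOPS   1000000    /* arbitrary: increments per thread */

static unsigned long counter;                          /* shared state */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
	for (int i = 0; i < NLOOPS; i++) {
		pthread_mutex_lock(&lock);             /* serialize updates */
		counter++;
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);            /* wait for completion */

	/* With locking, the result is deterministic: NTHREADS * NLOOPS. */
	printf("counter = %lu\n", counter);
	return 0;
}

Without the mutex, the two threads' read-modify-write sequences would interleave and increments would be lost, which is exactly the kind of problem Chapter 4 ("Counting") then attacks with more scalable designs.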
