Contents


Chapter 1 Overview of Parallel Programming ...... 1

1.1 Overview of Parallelism ...... 1

1.2 How to Measure Computing Speed ...... 3

1.3 Fundamentals of Parallel Computing Systems ...... 6

1.3.1 Flynn's Taxonomy ...... 6

1.3.2 Shared-Memory Systems and Message-Passing Systems ...... 8

1.3.3 Common Parallel Computing Systems ...... 10

1.3.4 Interconnection Networks ...... 15

1.3.5 Multilevel Memory Hierarchy ...... 16

1.4 Classification of Parallel Programming Languages/Interfaces ...... 17

1.5 Floating-Point Formats ...... 19

1.6 Example Programs ...... 21

1.6.1 Matrix Multiplication ...... 21

1.6.2 Reduction and Scan ...... 22

1.7 Summary ...... 26

Exercises ...... 27

Chapter 2 Parallel Programming for Shared-Memory Systems ...... 28

2.1 Parallel Models in Shared-Memory Systems ...... 28

2.1.1 Overview of Multithreaded Parallelism ...... 29

2.1.2 Concepts of Synchronization and Mutual Exclusion ...... 30

2.2 OpenMP Programming ...... 31

2.2.1 Overview ...... 31

2.2.2 Basic OpenMP Commands ...... 33

2.2.3 Work-Sharing Constructs and Their Combinations ...... 35

2.2.4 Synchronization and Mutual Exclusion Between Threads ...... 40

2.2.5 Common Clauses ...... 43

2.2.6 OpenMP Example Program: Computing Pi with a Series ...... 51

2.2.7 The task Construct ...... 52

2.3 Pthreads Programming ...... 57

2.3.1 Introduction to Pthreads ...... 57

2.3.2 Thread Creation and Termination ...... 57

2.3.3 Thread Mutual Exclusion ...... 63

2.3.4 Pthreads Example Program: Computing Pi with a Series ...... 67

2.3.5 Thread Synchronization ...... 69

2.3.6 Pthreads Example Program: Producer-Consumer ...... 76

2.3.7 Thread Deadlock and Lock Granularity ...... 79

2.4 New Programming Languages/Interfaces for Multicore Systems ...... 82

2.4.1 Cilk and Cilk++ ...... 82

2.4.2 TBB ...... 85

2.5 Summary ...... 88

Exercises ...... 88

Chapter 3 Parallel Programming for Message-Passing Systems ...... 90

3.1 Introduction to MPI ...... 90

3.1.1 What Is MPI? ...... 90

3.1.2 Parallel Modes of MPI ...... 91

3.1.3 A Simple MPI Program ...... 92

3.1.4 The Basic MPI Environment ...... 93

3.1.5 Communicators, Process Groups, and Process Ranks ...... 95

3.1.6 MPI Data Types ...... 96

3.1.7 Introduction to MPI Communication ...... 98

3.2 Point-to-Point Communication ...... 99

3.2.1 Standard Communication Mode ...... 100

3.2.2 Buffered Communication Mode ...... 104

3.2.3 Synchronous Communication Mode ...... 106

3.2.4 Ready Communication Mode ...... 106

3.2.5 Summary of the Four Communication Modes ...... 107

3.2.6 Combined Send and Receive ...... 108

3.2.7 Nonblocking Communication ...... 109

3.3 Collective Communication ...... 117

3.3.1 Overview of Collective Communication ...... 117

3.3.2 Data Broadcast: MPI_Bcast ...... 118

3.3.3 Data Scatter: MPI_Scatter ...... 119

3.3.4 Data Gather: MPI_Gather ...... 121

3.3.5 All-Gather: MPI_Allgather ...... 123

3.3.6 All-to-All Exchange: MPI_Alltoall ...... 124

3.3.7 Reduction: MPI_Reduce ...... 126

3.3.8 All-Reduce: MPI_Allreduce ...... 130

3.3.9 Scan: MPI_Scan ...... 130

3.3.10 Barrier: MPI_Barrier ...... 131

3.4 An MPI Example Program ...... 132

3.4.1 Numerical Integration ...... 132

3.4.2 A Pi Calculation Program Based on Numerical Integration ...... 133

3.4.3 MPI Wall-Clock Time ...... 134

3.5 Process Groups and Communicators ...... 135

3.5.1 Group Management ...... 136

3.5.2 Communicator Management ...... 138

3.5.3 Intercommunicators ...... 140

3.6 MPI and Multithreading ...... 141

3.6.1 How to Use Multithreading in MPI Programs ...... 141

3.6.2 An MPI+OpenMP Example Program ...... 142

3.6.3 Analysis and Discussion ...... 144

3.7 Process Topologies ...... 145

3.7.1 Introduction to Process Topologies ...... 145

3.7.2 Creating Process Topologies ...... 146

3.7.3 Communication Functions Related to Process Topologies ...... 149

3.8 PGAS Programming and Languages ...... 150

3.9 Job Management Systems and Their Use ...... 156

3.9.1 Introduction to Job Management Systems ...... 156

3.9.2 Introduction to Slurm ...... 156

3.9.3 Running Programs as Jobs in Slurm ...... 158

3.9.4 Slurm Job Scripts ...... 160

3.9.5 Running Programs in Other Ways in Slurm ...... 161

3.9.6 Common Slurm Commands ...... 162

3.10 Summary ...... 166

Exercises ...... 167

Chapter 4 Parallel Programming for Heterogeneous Systems ...... 169

4.1 Overview of Heterogeneous System Programming ...... 169

4.2 CUDA Programming for NVIDIA GPUs ...... 170

4.2.1 Overview of CUDA ...... 170

4.2.2 Hello World Program: The Basic Form of a CUDA Program ...... 172

4.2.3 Adding Two Integers: CPU-GPU Data Exchange ...... 173

4.2.4 Vector Sum Program: CUDA Multithreading ...... 176

4.2.5 CUDA Thread Organization ...... 177

4.2.6 CUDA Memory Hierarchy and Variable Qualifiers ...... 181

4.2.7 Function Qualifiers ...... 184

4.2.8 CUDA Streams ...... 185

4.2.9 Performance Optimization ...... 192

4.2.10 CUDA Unified Memory Space ...... 197

4.2.11 Using Multiple GPUs ...... 198

4.3 OpenCL Programming ...... 200

4.3.1 Overview of OpenCL ...... 200

4.3.2 Execution Flow of OpenCL Programs and Related APIs ...... 202

4.3.3 OpenCL Example Program 1: Vector Sum ...... 211

4.3.4 OpenCL Execution Model and Thread Organization ...... 215

4.3.5 OpenCL Memory Hierarchy ...... 218

4.3.6 OpenCL Example Program 2: Matrix Multiplication ...... 220

4.4 Athread Programming for Sunway Processors ...... 222

4.4.1 Introduction to Sunway Processors and Their Programming ...... 222

4.4.2 Hello World Program: The Basic Form of an Athread Program ...... 223

4.4.3 Local Storage Space Attributes of Athread Variables ...... 225

4.4.4 Athread Master-Slave Core Programming Interface ...... 225

4.4.5 Athread Register Communication ...... 229

4.4.6 Cannon's Parallel Matrix Multiplication in Athread ...... 230

4.5 OpenACC Programming ...... 234

4.5.1 Overview of OpenACC ...... 234

4.5.2 OpenACC Syntax ...... 234

4.5.3 OpenACC Loop Parallelism ...... 237

4.5.4 OpenACC Programming on Sunway Processors ...... 238

4.6 Summary ...... 246

Exercises ...... 246

Chapter 5 Performance Optimization of Parallel Programs ...... 248

5.1 Amdahl's Law ...... 248

5.2 Main Factors Affecting Performance ...... 250

5.2.1 Parallel Overhead ...... 250

5.2.2 Load Balancing ...... 251

5.2.3 Parallel Granularity ...... 252

5.2.4 Parallel Partitioning ...... 252

5.2.5 Dependencies ...... 253

5.2.6 Locality ...... 254

5.3 Scalability of Parallel Programs and Performance Optimization Methods ...... 255

5.3.1 What Is the Scalability of a Parallel Program? ...... 255

5.3.2 An Important Principle for Ensuring the Scalability of Parallel Programs: Independent Computation Blocks ...... 256

5.3.3 Impact of Data Partitioning on Performance and Scalability ...... 259

5.3.4 Other Common Performance Optimization Methods ...... 264

5.4 The PCAM Parallel Design Method ...... 266

5.4.1 Partitioning ...... 266

5.4.2 Communication ...... 268

5.4.3 Agglomeration ...... 270

5.4.4 Mapping ...... 271

5.5 Summary ...... 272

Exercises ...... 272

Chapter 6 Typical Parallel Application Algorithms ...... 274

6.1 Matrix Multiplication ...... 274

6.1.1 Block-Based Parallel Matrix Multiplication ...... 274

6.1.2 An Improved Block Matrix Multiplication: Cannon's Algorithm ...... 275

6.1.3 Dedicated Hardware for Matrix Multiplication: Systolic Arrays ...... 277

6.2 Direct Solution of Linear Systems ...... 279

6.2.1 Introduction to Linear Systems and Their Solution Methods ...... 279

6.2.2 Solving Triangular Systems by Back Substitution ...... 281

6.2.3 Gaussian Elimination ...... 281

6.2.4 LU Decomposition Algorithm ...... 282

6.2.5 Parallel LU Decomposition: Row-Interleaved Strip Partitioning and Block-Cyclic Distribution ...... 285

6.3 Iterative Solution of Linear Systems ...... 286

6.3.1 Classical Iterative Methods ...... 286

6.3.2 The Conjugate Gradient Method ...... 289

6.3.3 Iterative Solution Example: Solving Partial Differential Equations ...... 295

6.3.4 Discussion of Parallelism in Several Iterative Methods ...... 298

6.3.5 Compressed Data Formats for Sparse Matrices ...... 299

6.4 Quicksort ...... 301

6.5 Fast Fourier Transform ...... 303

6.5.1 Algorithm Background ...... 303

6.5.2 Algorithm Principles ...... 303

6.5.3 Converting the Recursive Algorithm to an Iterative Algorithm ...... 306

6.5.4 Parallel Algorithm ...... 307

6.6 Basic Linear Algebra Libraries and Software Packages ...... 309

6.6.1 The Linear Algebra Library BLAS ...... 309

6.6.2 The Linear Algebra Package LAPACK ...... 312

6.7 Summary ...... 314

Exercises ...... 314

Appendix A English Abbreviations ...... 316

References ...... 318