Parallel Programming
Table of Contents
OpenMP
- Thread level
- Shared memory
- Implicit (data distribution)
- Poor scalability
Data-Sharing Rules
OpenMP does not impose any restrictions to prevent data races on shared variables; this is the programmer's responsibility.
- shared
  - there exists one instance of this variable, which is shared among all threads
- private
  - each thread in a team of threads has its own local copy of the private variable
- Implicit Rules
  - The data-sharing attribute of variables declared outside the parallel region is usually shared
  - The loop iteration variables, however, are private by default
  - Variables declared locally within the parallel region are private
- Explicit rules
  - Shared
    - The shared(list) clause declares that all the variables in list are shared
    - Shared variables introduce an overhead, because one instance of a variable is shared between multiple threads. Therefore, it is often best to minimize the number of shared variables when good performance is desired.
  - Private
    - The private(list) clause declares that all the variables in list are private
    - When a variable is declared private, OpenMP replicates this variable and assigns its local copy to each thread
    - The behavior of private variables is sometimes unintuitive: even if a private variable has a value before the parallel region, its value is undefined at the beginning of the parallel region, and it is undefined again after the parallel region
  - Default
    - The default(shared) clause sets the default data-sharing attribute of all variables to shared
    - The default(none) clause forces a programmer to explicitly specify the data-sharing attributes of all variables
- Rule No. 1
  - always write parallel regions with the default(none) clause
  - declare private variables inside parallel regions whenever possible
http://jakascorner.com/blog/2016/06/omp-data-sharing-attributes.html
size_t count_mp(const vector<int>& v) {
    size_t n = v.size(), cnt = 0;
    #pragma omp parallel for shared(n) reduction(+:cnt)
    for (size_t i = 0; i < n; ++i) /* i is private by default */
        cnt += count(v[i]);        /* count() is a user-supplied per-element function */
    return cnt;
}
MPI (multi-process)
- Process level
- Distributed memory
- Explicit (data distribution)
- Good scalability
Conclusion
OpenMP uses shared memory, which means it only fits SMP and DSM machines, not clusters.
- MPI runs on all kinds of machines, but its programming model is complex:
  - the application problem must be analyzed, partitioned, and mapped onto a distributed set of processes;
  - the two main problems, communication latency and load imbalance, must be solved.
- Debugging MPI programs is troublesome
- MPI programs have poor reliability: if one process fails, the whole program fails
The quality of a parallel algorithm mainly depends on how well it handles communication latency and load imbalance.
Compared with OpenMP and MPI, MapReduce's advantages are:
- automatic parallelization
- fault tolerance
- low barrier to entry
SMP: Symmetric multi-processing: shared bus and memory, single operating-system image. Scalable in software but not in hardware.
DSM: Distributed shared memory: an extension of SMP. Physically distributed memory; single address space; non-uniform memory access; single operating-system image.