MPI comes up a lot in distributed systems, so here are some quick notes from working through the basics of a tutorial.
Communicator: a communicator defines a group of processes that can send messages to one another. Within this group, each process is assigned a number called its rank.
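To make communicator and rank concrete, here is a minimal sketch (not from the original notes): every process queries its own rank and the size of the default communicator MPI_COMM_WORLD.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &world_size); /* number of processes in the communicator */

    printf("rank %d of %d\n", world_rank, world_size);

    MPI_Finalize();
    return 0;
}
```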
The sender first writes the data it wants to send into a buffer. That buffer can be a region of memory pointed to by a pointer of the corresponding MPI_Datatype; when calling Send, the pointer is cast to void *.
MPI_Send( void* data, int count, MPI_Datatype datatype, int destination, int tag, MPI_Comm communicator)
MPI_Recv( void* data, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm communicator, MPI_Status* status)
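A minimal point-to-point sketch (not from the original notes), assuming exactly two processes; the value 42 and tag 0 are arbitrary choices:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int number;
    if (rank == 0) {
        number = 42;                      /* data lives in a local buffer */
        MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);      /* status not needed here */
        printf("rank 1 received %d\n", number);
    }

    MPI_Finalize();
    return 0;
}
```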
Note MPI_STATUS_IGNORE. If an MPI_Status argument is supplied to MPI_Recv (say an MPI_Status named stat), MPI fills it with some information, mainly the following three things: stat.MPI_SOURCE, which can be accessed directly; stat.MPI_TAG; and the received count, obtained with
MPI_Get_count( MPI_Status* status, MPI_Datatype datatype, int* count)
The count variable is the total number of datatype elements that were received.
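A sketch (not from the original notes) of reading these status fields after a wildcard receive, assuming at least two processes; the tag 7 and the 3-element payload are made-up values:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        int data[3] = {1, 2, 3};
        MPI_Send(data, 3, MPI_INT, 0, 7, MPI_COMM_WORLD);
    } else if (rank == 0) {
        int buf[10];                      /* room for at most 10 ints */
        MPI_Status stat;
        MPI_Recv(buf, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &stat);

        int count;
        MPI_Get_count(&stat, MPI_INT, &count);  /* elements actually received */
        printf("got %d ints from rank %d with tag %d\n",
               count, stat.MPI_SOURCE, stat.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}
```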
This raises a question: why do we need these three pieces of information? The count in MPI_Recv is the maximum number of elements of the given datatype to receive, whereas the count in MPI_Status is the number of elements actually received. MPI_ANY_TAG means accepting a message with any tag, and in that case the only way to tell which tag the received message carries is the information in the status. MPI_ANY_SOURCE means accepting a message from any sender, and likewise the only way to tell which sender the message came from is the status. Finally, calling MPI_Recv requires providing a buffer to store the incoming message, but we often do not know how large the message is until it actually arrives, so we first call Probe to probe for it, and then call MPI_Recv to actually receive the message.
MPI_Probe( int source, int tag, MPI_Comm comm, MPI_Status* status)
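A sketch (not from the original notes) of the probe-then-receive pattern, assuming rank 1 sends an int array whose size rank 0 does not know in advance; the size 5 is arbitrary:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        int n = 5;
        int *data = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) data[i] = i;
        MPI_Send(data, n, MPI_INT, 0, 0, MPI_COMM_WORLD);
        free(data);
    } else if (rank == 0) {
        MPI_Status stat;
        MPI_Probe(1, 0, MPI_COMM_WORLD, &stat);  /* peek at the incoming message */

        int n;
        MPI_Get_count(&stat, MPI_INT, &n);       /* how many ints is it? */
        int *buf = malloc(n * sizeof(int));      /* allocate just enough space */
        MPI_Recv(buf, n, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d ints\n", n);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}
```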
MPI_Barrier(MPI_Comm communicator): this is simply the barrier from the BSP model.
One final note about synchronization: always remember that every collective communication call you make is synchronizing. In other words, if you cannot get all processes to complete MPI_Barrier, you cannot complete any collective call either. If you call MPI_Barrier without making sure that every process also calls it, the program will just sit idle. This is confusing for beginners, so watch out for this kind of problem.
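As a sketch (not from the original notes), here is the usual barrier-plus-MPI_Wtime timing pattern; every rank must reach each MPI_Barrier, so a barrier placed inside a branch that some ranks skip would hang:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);          /* line all ranks up before timing */
    double t0 = MPI_Wtime();

    /* ... the communication being timed would go here ... */

    MPI_Barrier(MPI_COMM_WORLD);          /* wait until every rank is done */
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("elapsed: %f s\n", t1 - t0);

    MPI_Finalize();
    return 0;
}
```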
MPI_Bcast( void* data, int count, MPI_Datatype datatype, int root, MPI_Comm communicator)
Whether a process is the sender or a receiver, it calls the same MPI_Bcast. This is different from point-to-point communication.
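A minimal broadcast sketch (not from the original notes), with rank 0 as the root and 100 as an arbitrary value:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0)
        value = 100;                      /* only the root fills the buffer */

    /* every rank makes the same call; afterwards all ranks hold the value */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d has value %d\n", rank, value);

    MPI_Finalize();
    return 0;
}
```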
Question: what is the purpose of the second MPI_Barrier in compare_bcast.c?
The difference between MPI_Bcast and MPI_Scatter: Bcast sends the same data to every process, while Scatter splits the send buffer into chunks and hands each process a different chunk (see the sketch after the signature).
MPI_Scatter( void* send_data, int send_count, MPI_Datatype send_datatype, void* recv_data, int recv_count, MPI_Datatype recv_datatype, int root, MPI_Comm communicator)
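A sketch (not from the original notes) contrasting Scatter with Bcast: the root's array is cut into one-element chunks and each rank receives a different chunk, whereas a broadcast would have given every rank the whole array. It assumes the array length equals the number of processes.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *send_buf = NULL;
    if (rank == 0) {                      /* only the root needs send_data */
        send_buf = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) send_buf[i] = i * i;
    }

    int my_chunk;
    MPI_Scatter(send_buf, 1, MPI_INT,     /* send one int to each rank */
                &my_chunk, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d got %d\n", rank, my_chunk);

    if (rank == 0) free(send_buf);
    MPI_Finalize();
    return 0;
}
```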
MPI_Reduce( void* send_data, void* recv_data, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm communicator)
MPI_Allreduce( void* send_data, void* recv_data, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm communicator)
MPI_Allreduce is analogous to MPI_Allgather: the ordinary MPI_Gather places the result on a single process, while MPI_Allgather returns the result to all processes so that every process can access it. MPI_Allreduce is the same, making the result of the reduce accessible to all processes.
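A sketch (not from the original notes) of Reduce versus Allreduce: each rank contributes its rank number, MPI_Reduce leaves the sum only on the root, and MPI_Allreduce leaves it on every rank.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int local = rank, root_sum = 0, all_sum = 0;

    /* sum lands only in root_sum on rank 0 */
    MPI_Reduce(&local, &root_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    /* sum lands in all_sum on every rank */
    MPI_Allreduce(&local, &all_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("root_sum = %d (only meaningful on the root)\n", root_sum);
    printf("rank %d sees all_sum = %d\n", rank, all_sum);

    MPI_Finalize();
    return 0;
}
```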
The applications so far either talk to one process or talk to all the processes, using only the single default communicator. As the program grows, we may need to communicate with only a subset of the processes, which is why groups are introduced, each group corresponding to its own communicator. How do we create multiple communicators?
MPI_Comm_split( MPI_Comm comm, int color, int key, MPI_Comm* newcomm)
MPI_Comm_split creates new communicators by “splitting” a communicator into a group of sub-communicators based on the input values color and key.
The first argument, comm, is the communicator that will be used as the basis for the new communicators. This could be MPI_COMM_WORLD, but it could be any other communicator as well.
The second argument, color, determines to which new communicator each process will belong. All processes which pass in the same value for color are assigned to the same communicator. If the color is MPI_UNDEFINED, that process won’t be included in any of the new communicators.
The third argument, key, determines the ordering (rank) within each new communicator. The process which passes in the smallest value for key will be rank 0, the next smallest will be rank 1, and so on. If there is a tie, the process that had the lower rank in the original communicator will be first.
When you print things out in an MPI program, each process has to send its output back to the place where you launched your MPI job before it can be printed to the screen. This tends to mean that the ordering gets jumbled so you can’t ever assume that just because you print things in a specific rank order, that the output will actually end up in the same order you expect. The output was just rearranged here to look nice.
MPI has a limited number of objects that it can create at a time and not freeing your objects could result in a runtime error if MPI runs out of allocatable objects.
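A sketch (not from the original notes) along the lines of a row-splitting example: ranks with the same color end up in the same sub-communicator, ordered by key, and the sub-communicator is freed when it is no longer needed. The rows-of-4 grouping is an arbitrary choice.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int color = world_rank / 4;           /* group ranks into "rows" of 4 */
    MPI_Comm row_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &row_comm);

    int row_rank, row_size;
    MPI_Comm_rank(row_comm, &row_rank);
    MPI_Comm_size(row_comm, &row_size);
    printf("world %d/%d -> row %d/%d (color %d)\n",
           world_rank, world_size, row_rank, row_size, color);

    MPI_Comm_free(&row_comm);             /* release the communicator object */
    MPI_Finalize();
    return 0;
}
```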
MPI_Init or MPI_Init_thread?
int MPI_Init_thread(int *argc, char *((*argv)[]), int required, int *provided)
int MPI::Init_thread(int& argc, char**& argv, int required)
int MPI::Init_thread(int required)
required: the desired level of thread support. Possible values: MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED, and MPI_THREAD_MULTIPLE.
Vendors may provide (implementation dependent) means to specify the level(s) of thread support available when the MPI program is started, e.g., with arguments to mpiexec. This will affect the outcome of calls to MPI_INIT and MPI_INIT_THREAD.
Suppose, for example, that an MPI program has been started so that only MPI_THREAD_MULTIPLE is available. Then MPI_INIT_THREAD will return provided = MPI_THREAD_MULTIPLE, irrespective of the value of required; a call to MPI_INIT will also initialize the MPI thread support level to MPI_THREAD_MULTIPLE.
Suppose, on the other hand, that an MPI program has been started so that all four levels of thread support are available. Then, a call to MPI_INIT_THREAD will return provided = required; on the other hand, a call to MPI_INIT will initialize the MPI thread support level to MPI_THREAD_SINGLE. In other words, when the required argument is MPI_THREAD_SINGLE, the effect is the same as calling MPI_Init.
https://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node165.htm
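A minimal sketch (not from the original notes) of requesting a thread level and checking what the implementation actually provides; MPI_THREAD_MULTIPLE here is just an example request:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE) {
        /* the library granted a lower level than requested */
        printf("warning: only thread level %d provided\n", provided);
    }

    MPI_Finalize();
    return 0;
}
```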