Getting Started with MPI

MPI comes up all the time in distributed systems, so here is a quick pass over its basic usage, written up as notes.

Tutorial

Communicator: a communicator defines a group of processes that can send messages to one another. Each process in the group is assigned a number, called its rank.
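
For instance, a minimal sketch that prints each process's rank in the default communicator MPI_COMM_WORLD (launched with something like mpirun -n 4 ./a.out; the variable names world_rank/world_size are my own and are reused in the fragments below):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);                      // set up the MPI environment

    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);  // how many processes are in the communicator
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);  // this process's rank within it

    printf("rank %d of %d\n", world_rank, world_size);

    MPI_Finalize();                              // tear down the MPI environment
    return 0;
}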

Point-to-point communication

The sender first writes the data it wants to send into a buffer; this buffer can be a region of memory pointed to by a pointer of the corresponding MPI_Datatype, and that typed pointer is cast to void * when calling Send.

MPI_Send(
    void* data,
    int count,
    MPI_Datatype datatype,
    int destination,
    int tag,
    MPI_Comm communicator)
MPI_Recv(
    void* data,
    int count,
    MPI_Datatype datatype,
    int source,
    int tag,
    MPI_Comm communicator,
    MPI_Status* status)
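
A minimal send/receive sketch between rank 0 and rank 1 (a fragment, continuing from the first sketch above and assuming at least two processes):

int number;
if (world_rank == 0) {
    number = 42;                                 // arbitrary payload
    // send one MPI_INT to rank 1 with tag 0
    MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (world_rank == 1) {
    // receive one MPI_INT from rank 0 with tag 0, ignoring the status
    MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank 1 received %d\n", number);
}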

Notes

  • The source argument of MPI_Recv can be MPI_ANY_SOURCE, meaning a message from any sender is accepted; MPI_Send's destination, on the other hand, presumably cannot be used to send a message to an arbitrary destination.
  • The status argument of MPI_Recv hasn't been needed so far; to ignore it, just pass MPI_STATUS_IGNORE.

If an MPI_Status argument is supplied when calling MPI_Recv, say one named stat, the following information is filled in, chiefly these three fields:

  • The rank of the sender: accessed via stat.MPI_SOURCE;
  • The tag of the message: stat.MPI_TAG;
  • The length of the message: this cannot be read directly from a field of stat; it must be retrieved with the following call:
MPI_Get_count(
    MPI_Status* status,
    MPI_Datatype datatype,
    int* count)

The count variable is the total number of datatype elements that were received.
At this point a question arises: why are these three pieces of information needed?

  • Doesn't MPI_Recv already have a count parameter?
    In fact, the count in MPI_Recv is the maximum number of elements of type datatype to receive, whereas the count obtained from MPI_Status is how many elements were actually received.
  • Doesn't MPI_Recv already have a tag parameter?
    In the usual case, tag is a fixed value, but you can also pass MPI_ANY_TAG to accept messages with any tag; then the only way to tell which tag a received message carries is the information in the status.
  • Similarly, MPI_Recv can specify MPI_ANY_SOURCE to accept messages from any sender; then the only way to tell which sender a message came from is likewise the status.
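
A fragment sketching how the status fields resolve this ambiguity (the buffer size of 100 is an arbitrary choice):

MPI_Status stat;
int buf[100];
// accept up to 100 ints from any sender with any tag
MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);

int received;
MPI_Get_count(&stat, MPI_INT, &received);        // how many were actually received (<= 100)
printf("got %d ints from rank %d with tag %d\n", received, stat.MPI_SOURCE, stat.MPI_TAG);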

Because MPI_Recv needs a buffer to store the incoming message, and often we don't know how large the message is until it arrives, we first call MPI_Probe to inspect it, and only then call MPI_Recv to actually receive the message.

MPI_Probe(
    int source,
    int tag,
    MPI_Comm comm,
    MPI_Status* status)
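
A fragment sketching the probe-then-receive pattern (assuming <stdlib.h> is included for malloc, and that a message from rank 0 with tag 0 is on its way):

MPI_Status stat;
// block until a message from rank 0 with tag 0 is pending, without receiving it
MPI_Probe(0, 0, MPI_COMM_WORLD, &stat);

int count;
MPI_Get_count(&stat, MPI_INT, &count);           // size of the pending message, in MPI_INTs

int* buf = (int*)malloc(count * sizeof(int));    // now we can allocate exactly enough space
MPI_Recv(buf, count, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
free(buf);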

Collective communication

MPI_Barrier(MPI_Comm communicator): this is exactly the barrier from the BSP model.

One last thing to note about synchronization: always remember that every collective communication call you make is synchronizing. In other words, if you can't get every process to complete MPI_Barrier, then you can't complete any collective call either. If you call MPI_Barrier without making sure all processes also call it, the program will hang. This is confusing for beginners, so watch out for this kind of problem.

MPI_Bcast

MPI_Bcast(
    void* data,
    int count,
    MPI_Datatype datatype,
    int root,
    MPI_Comm communicator)

Senders and receivers alike call the same MPI_Bcast. This is different from point-to-point communication.
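
A fragment illustrating this: every rank makes the identical call, and the root argument decides who the sender is:

int value = 0;
if (world_rank == 0) value = 123;                // only the root holds the data beforehand
// root and receivers all make the same call; rank 0 is the root here
MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
// afterwards value == 123 on every rank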

Question: what is the purpose of the second MPI_Barrier in compare_bcast.c?

MPI Scatter, Gather, and Allgather

Differences between MPI_Bcast and MPI_Scatter:

  • Bcast sends the same data to the other processes, whereas Scatter sends a different slice of the data to each process, i.e. every process gets only a portion of the data;
  • Bcast sends to the other processes (i.e. everyone except itself), whereas Scatter sends to all processes in the communicator, itself included. (I'm not sure about the first half of this sentence.) My reasoning: Bcast presumably sends to every process other than the root; otherwise, since it provides only a single data pointer to one buffer, having the root also send to itself would be pointless, and data would have to act as send buffer and receive buffer at the same time, which is impossible. Scatter, by contrast, provides two buffers, because the root needs to send and receive simultaneously.
MPI_Scatter(
    void* send_data,
    int send_count,
    MPI_Datatype send_datatype,
    void* recv_data,
    int recv_count,
    MPI_Datatype recv_datatype,
    int root,
    MPI_Comm communicator)
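
A fragment sketching a scatter of two ints per process (per_proc and the values are arbitrary choices of mine; assumes <stdlib.h>):

const int per_proc = 2;
int* send_buf = NULL;
if (world_rank == 0) {
    // only the root needs the full send buffer
    send_buf = (int*)malloc(per_proc * world_size * sizeof(int));
    for (int i = 0; i < per_proc * world_size; i++)
        send_buf[i] = i;
}

int recv_slice[2];
// every process, the root included, receives its own slice
MPI_Scatter(send_buf, per_proc, MPI_INT, recv_slice, per_proc, MPI_INT, 0, MPI_COMM_WORLD);

if (world_rank == 0) free(send_buf);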

MPI Reduce and Allreduce

MPI_Reduce(
    void* send_data,
    void* recv_data,
    int count,
    MPI_Datatype datatype,
    MPI_Op op,
    int root,
    MPI_Comm communicator)
MPI_Allreduce(
    void* send_data,
    void* recv_data,
    int count,
    MPI_Datatype datatype,
    MPI_Op op,
    MPI_Comm communicator)

MPI_Allreduce is to MPI_Reduce what MPI_Allgather is to MPI_Gather: a plain MPI_Gather collects the result into a single process, while MPI_Allgather returns the result to all processes, so every process can access it. Likewise, MPI_Allreduce makes the reduction result accessible to all processes.
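
A fragment contrasting the two, summing one float per rank:

float local = (float)world_rank;                 // each process contributes one value
float total;

// with Reduce, only the root (rank 0) ends up with the sum
MPI_Reduce(&local, &total, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

// with Allreduce, every process ends up with the sum; note there is no root argument
MPI_Allreduce(&local, &total, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);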

Groups and Communicators

The examples so far either talk to one process or talk to all the processes, using only the single default communicator. As programs grow larger, we may need to communicate with only a subset of the processes, which is where groups come in; each group corresponds to a communicator. So how do we create multiple communicators?

MPI_Comm_split(
    MPI_Comm comm,
    int color,
    int key,
    MPI_Comm* newcomm)

MPI_Comm_split creates new communicators by “splitting” a communicator into a group of sub-communicators based on the input values color and key.

The first argument, comm, is the communicator that will be used as the basis for the new communicators. This could be MPI_COMM_WORLD, but it could be any other communicator as well.

The second argument, color, determines to which new communicator each process will belong. All processes which pass in the same value for color are assigned to the same communicator. If the color is MPI_UNDEFINED, that process won’t be included in any of the new communicators.

The third argument, key, determines the ordering (rank) within each new communicator. The process which passes in the smallest value for key will be rank 0, the next smallest will be rank 1, and so on. If there is a tie, the process that had the lower rank in the original communicator will be first.

When you print things out in an MPI program, each process has to send its output back to the place where you launched your MPI job before it can be printed to the screen. This tends to mean that the ordering gets jumbled so you can’t ever assume that just because you print things in a specific rank order, that the output will actually end up in the same order you expect. The output was just rearranged here to look nice.

MPI has a limited number of objects that it can create at a time and not freeing your objects could result in a runtime error if MPI runs out of allocatable objects.
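
A fragment sketching a split into rows of 4 processes (the row width of 4 is arbitrary), including the freeing just mentioned:

// processes with the same color end up in the same sub-communicator
int color = world_rank / 4;
MPI_Comm row_comm;
MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &row_comm);

int row_rank, row_size;
MPI_Comm_rank(row_comm, &row_rank);              // rank within the new communicator
MPI_Comm_size(row_comm, &row_size);

MPI_Comm_free(&row_comm);                        // release it so MPI can reuse the object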

Additional questions

  1. When initializing MPI: MPI_Init or MPI_Init_thread?
int MPI_Init_thread(int *argc, char *((*argv)[]), int required, int *provided)
int MPI::Init_thread(int& argc, char**& argv, int required) 
int MPI::Init_thread(int required)
  • argc and argv are optional: in C this is done by passing NULL for them, and in C++ there are simply two extra overloads of MPI::Init_thread (shown above).
  • required: the desired level of thread support. Possible values:

    • MPI_THREAD_SINGLE: Only one thread will execute.
    • MPI_THREAD_FUNNELED: The process may be multi-threaded, but only the main thread will make MPI calls (all MPI calls are "funneled" to the main thread).
    • MPI_THREAD_SERIALIZED: The process may be multi-threaded, and multiple threads may make MPI calls, but only one at a time: MPI calls are not made concurrently from two distinct threads (all MPI calls are "serialized").
    • MPI_THREAD_MULTIPLE: Multiple threads may call MPI, with no restrictions.
  • The call returns in provided information about the actual level of thread support that will be provided by MPI. It can be one of the four values listed above.

Vendors may provide (implementation dependent) means to specify the level(s) of thread support available when the MPI program is started, e.g., with arguments to mpiexec. This will affect the outcome of calls to MPI_INIT and MPI_INIT_THREAD.

Suppose, for example, that an MPI program has been started so that only MPI_THREAD_MULTIPLE is available. Then MPI_INIT_THREAD will return provided = MPI_THREAD_MULTIPLE, irrespective of the value of required; a call to MPI_INIT will also initialize the MPI thread support level to MPI_THREAD_MULTIPLE.

Suppose, on the other hand, that an MPI program has been started so that all four levels of thread support are available. Then, a call to MPI_INIT_THREAD will return provided = required; on the other hand, a call to MPI_INIT will initialize the MPI thread support level to MPI_THREAD_SINGLE. That is, when required is MPI_THREAD_SINGLE, MPI_Init_thread behaves the same as MPI_Init.
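
A fragment sketching the usual pattern of requesting a level and checking what was actually provided (the four constants are monotonically ordered, so < works; assumes <stdio.h> for fprintf):

int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE) {
    // the library cannot support fully concurrent MPI calls; bail out
    fprintf(stderr, "got thread support level %d only\n", provided);
    MPI_Abort(MPI_COMM_WORLD, 1);
}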

https://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node165.htm
