[mvapich-discuss] mvapich2 Connection to Self Rejected

Hari Subramoni subramoni.1 at osu.edu
Tue May 2 09:01:42 EDT 2017


Hi Melissa,

In MVAPICH2, a process should be able to establish loopback connections
with itself.

Are you using the default version of MVAPICH2 installed on Comet? If not,
you should build MVAPICH2 with the OFA-IB-CH3 interface. The following
userguide link has more information on how to build MVAPICH2 this way.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3a-userguide.html#x1-120004.4
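
For reference, a default build of MVAPICH2 selects the OFA-IB-CH3 interface,
so a minimal build with the Intel compilers looks roughly like the sketch
below (the install prefix is a placeholder; please follow the exact steps in
the userguide section linked above):

  ./configure --prefix=$HOME/sw/mvapich2 CC=icc CXX=icpc FC=ifort
  make -j 4
  make install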

MVAPICH2 also has support for SLURM. The following userguide link has more
information on how to configure MVAPICH2 to run with SLURM.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3a-userguide.html#x1-100004.3.2
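
As a rough sketch (these flags are quoted from memory, so please verify them
against the section linked above; the install prefix and process count are
placeholders):

  ./configure --with-pm=slurm --with-pmi=pmi2 --prefix=$HOME/sw/mvapich2-slurm
  make -j 4
  make install

  # then launch through SLURM, for example:
  srun --mpi=pmi2 -n 4 ./a.out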

Can you please share the output of mpiname -a? This will tell us how the
MVAPICH2 build you're using was configured.

Would it be possible to share your program so that we can try things
locally?

Regards,
Hari.

On Mon, May 1, 2017 at 4:28 PM, Melissa Romanus <melissa.romanus at rutgers.edu> wrote:

> Hi All,
>
> First, apologies if you received this twice; I sent it to
> discuss at mpich.org by mistake before being directed here.
>
> I am writing to ask for some help with mvapich2 on SDSC Comet. I am using
> the Intel compilers with mvapich2, and the scheduling system on Comet is
> Slurm. The code appears to be seg-faulting inside MPI_Comm_dup, but prior
> to that it seems to reject a connection request to “self” (i.e., from a
> node's IP address to that same address).
>
> The modules loaded are:
>
> $ module list
>
> Currently Loaded Modulefiles:
>   1) intel/2013_sp1.2.144   2) mvapich2_ib/2.1        3) gnutools/2.69
>
> I am attempting to use the ib0 interface to create the InfiniBand
> connections. They are created manually in one of my programs, because I am
> issuing the lowest-level RDMA transfer commands directly.
>
> In my job script, I am launching 3 different applications. I am *not*
> using slurm --multi-prog, because I cannot share an MPI communicator. I
> am instead using 3 different srun commands in a single job script.
> Unfortunately, there's no flexibility in how I launch my jobs.
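>
> For concreteness, the job script is structured roughly like the sketch
> below (the application names, process counts, and the backgrounding of the
> three sruns are simplified placeholders, not the real script):
>
>   #!/bin/bash
>   #SBATCH --nodes=4
>
>   # three separate MPI applications launched from one job script
>   srun -n 16 ./application_1 &
>   srun -n 8  ./application_2 &
>   srun -n 8  ./application_3 &
>   wait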
>
> Using OpenMPI, I can set the MCA parameters to allow connections from self
> at the byte-transfer layer (i.e., OMPI_MCA_btl="self,openib") and tell
> Slurm that I would like to use --mpi=pmi2. The specific error is of the
> following format: Connection Rejected peer# 0 (198.202.118.238) to
> peer# 0 (198.202.118.238).
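>
> In other words, the working OpenMPI launches look roughly like this (the
> process count and binary name are placeholders):
>
>   export OMPI_MCA_btl="self,openib"
>   srun --mpi=pmi2 -n 16 ./application_1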
>
> After reading through some old mailing lists, I think I want the
> --with-device=ch3:nemesis:ib configure option in some capacity, but I'm
> not sure whether that would be enough to allow the connection from the
> node to itself. Is the self connection inherently a TCP connection? When
> using mvapich2, I still launch with srun and do not specify the --mpi
> option. I tried using mpiexec.hydra explicitly as well, but that failed too.
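>
> Concretely, the MVAPICH2 launches I have tried look roughly like this (the
> counts and names are again placeholders):
>
>   srun -n 16 ./application_1            # no --mpi option given
>   mpiexec.hydra -n 16 ./application_1   # explicit Hydra launcher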
>
> The backtraces I obtain all seem to indicate that the problem is happening
> at the MPI level. I even tried just putting a barrier after getting the
> node name, and that crashes at the MPI_Barrier call as well.
>
>   #include <mpi.h>
>   #include <stdio.h>
>
>   int main(int argc, char **argv)
>   {
>       MPI_Init(&argc, &argv);
>       MPI_Comm comm = MPI_COMM_WORLD;
>
>       int rank, nprocs;
>       MPI_Comm_rank(comm, &rank);
>       MPI_Comm_size(comm, &nprocs);
>
>       /* Report which node each rank landed on. */
>       char nodename[256];
>       int nodename_length;
>       MPI_Get_processor_name(nodename, &nodename_length);
>       printf("%s: I am rank %d of %d\n", nodename, rank, nprocs);
>
>       MPI_Barrier(comm);   /* this is the call that crashes */
>
>       MPI_Finalize();
>       return 0;
>   }
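>
> The test above is compiled and launched roughly as follows (the source
> file name is a placeholder):
>
>   mpicc -o rank_test rank_test.c
>   srun -n 2 ./rank_test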
>
> When this occurs, I seem to encounter the problem described in the FAQ
> where all ranks get rank 0 and nprocs=1 (according to the printf):
> https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_All_my_processes_get_rank_0
>
> However, I suspect it is related to the fact that the node cannot
> establish a connection with itself. Is there a way to tell mvapich2 to
> allow the self connection, as you can with OpenMPI?
>
> If you suspect this is not the reason for my errors, please let me know. I
> can provide more specific details of my run and errors. Any help you can
> provide is greatly appreciated.
>
> Thank you,
> -Melissa