[mvapich-discuss] mpiexec and mpirun_rsh as non Root issues

Diego Humberto Kalegari kalegari at lactec.org.br
Wed Aug 17 12:04:12 EDT 2011


Hello Jonathan

I shared the MVAPICH2 installation from dvse-cluster to all nodes, and I added its bin directory to the $PATH on all nodes.
MVAPICH2 is installed under /home/MPI, so the directory is /home/MPI/bin:

 echo $PATH
/home/l0626/bin:/home/MPI/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin
l0626 at n01:~>
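
For reference, this is how the PATH is set up on each node (a minimal sketch, assuming bash and the /home/MPI install prefix above; mpirun_rsh starts remote processes over non-interactive ssh, so the setting also needs to live in a file those shells read, such as ~/.bashrc):

```shell
# Prepend the shared MVAPICH2 bin directory for the current shell...
export PATH=/home/MPI/bin:$PATH
# ...and persist it so non-interactive ssh shells (the ones mpirun_rsh
# uses to launch remote processes) also pick it up:
echo 'export PATH=/home/MPI/bin:$PATH' >> ~/.bashrc
```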

These are the `ssh hostname` results for some of my nodes:

l0626 at n01:~> ssh n03 hostname
n03

l0626 at n02:~> ssh n03 hostname
n03
l0626 at n02:~>


Running the following command as root works:
mpirun_rsh -np 11 -hostfile hosts ./DETest 55 1000 10000 sequence.txt 10 0.8 0

Fatal error in MPI_Init:
Other MPI error

Fatal error in MPI_Init:
Other MPI error

cannot create cq
cannot create cq
Fatal error in MPI_Init:
Other MPI error

but as l0626, for example, it does not; the errors above are what I get. If I run it on a single node only, it works.
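
One difference worth checking between root and l0626 is the locked-memory limit, since InfiniBand has to register (lock) memory to create completion queues, and root is typically unlimited while normal users often default to a small value. A quick comparison, assuming bash and the n03 hostname from above:

```shell
# Locked-memory limit for the current user; "unlimited" (or a large
# value) is what the InfiniBand verbs layer needs to create queues.
ulimit -l
# Also check the limit a non-interactive ssh login gets, since that is
# the environment mpirun_rsh's remote processes actually run in:
ssh n03 ulimit -l 2>/dev/null || echo "could not reach n03"
```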

This command also works as root:

mpirun -hosts n01:24,n03:24,n04:24,n05:24,n06:24,n07:24,n08:24,n09:24,n10:24,n11:24 -np 240 ./DETest 13 2300 1000 sequence.txt 230 0.8 0 

but not as l0626; below are the logs:

Initializing MPI
Initializing MPI
Fatal error in MPI_Init:
Other MPI error

Fatal error in MPI_Init:
Other MPI error

Thanks,

I really appreciate your help.

Diego

________________________________________
From: Jonathan Perkins [perkinjo at cse.ohio-state.edu]
Sent: Wednesday, August 17, 2011 11:52 AM
To: Diego Humberto Kalegari
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] mpiexec and mpirun_rsh as non Root issues

Hello,

To determine what may be wrong, I'll just ask that you double-check a
few things.  Are you just trying to run a 2-process job?  If so, are
you able to log in as that user on the host named dvse-cluster and
then `ssh second_hostname'?  And have you installed mvapich2 in the
same location on all machines, or do you use a shared filesystem?
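
If it helps, something like the following (a sketch assuming bash, passwordless ssh, and a few of the hostnames from your mail) runs through these checks in one go:

```shell
# For each node: 1. does passwordless ssh work from the launch host,
# and 2. is mvapich2's mpirun_rsh visible at the same location there?
for h in n01 n03 n04; do
    if path=$(ssh -o BatchMode=yes "$h" which mpirun_rsh 2>/dev/null); then
        echo "$h: mpirun_rsh at $path"
    else
        echo "$h: ssh failed or mpirun_rsh not in PATH"
    fi
done
```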

If any of these checks fail, please send back the full failure message,
including the command that caused it.

On Wed, Aug 17, 2011 at 10:12 AM, Diego Humberto Kalegari
<kalegari at lactec.org.br> wrote:
> Hello All,
>
> I'm trying to set up an environment with MVAPICH2.
>
> I installed it and made all the ssh configuration required to avoid prompting for a user password. When I ssh to another system as any user, it logs in automatically.
>
> I was able to run mpiexec and mpirun_rsh successfully as root. But when I try to run them as another user, any other user on my system, it fails with the following:
>
> Fatal error in MPI_Init:
> Other MPI error
>
> [mpiexec at dvse-cluster] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
> [mpiexec at dvse-cluster] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at dvse-cluster] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:179): error waiting for event
> [mpiexec at dvse-cluster] main (./ui/mpich/mpiexec.c:397): process manager error waiting for completion
>
> Could someone please provide some support?
>
> Best Regards
>
> Diego
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>



--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


