[mvapich-discuss] trouble running an MPI job over RoCE with mvapich2-1.6 shipped with OFED-1.5.3.2

Devesh Sharma devesh28 at gmail.com
Thu Feb 21 10:43:57 EST 2013


On Thu, Feb 21, 2013 at 9:06 PM, Devesh Sharma <devesh28 at gmail.com> wrote:

> Hi Jonathan and Devendar, thanks for the quick response.
>
> MV2_USE_RoCE=1 is given in section 6.12 of the mvapich2-1.8.1 user guide; I
> had taken it from there.
> I have installed xterm (it was not there) and also changed the parameter
> names as suggested. I am now hitting the following output:
>
> [root at neo01 IMB-3.2]# /usr/mpi/gcc/mvapich2-1.6/bin/mpirun_rsh -ssh
> -debug -np 2 MV2_USE_RDMAOE=1 MV2_USE_RDMA_CM=1 -hostfile
> /opt/Work/hostfile /bin/hostname
>

>>>> The host file needs to be given before the MV2_* parameters.
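A minimal sketch of the corrected invocation, assuming the same install and
hostfile paths shown above and the 1.6-era parameter name MV2_USE_RDMAOE: the
MV2_* settings use the "=" form and sit after -hostfile, just before the binary.

    # hypothetical corrected command; adjust the paths for your own setup
    /usr/mpi/gcc/mvapich2-1.6/bin/mpirun_rsh -ssh -np 2 \
        -hostfile /opt/Work/hostfile \
        MV2_USE_RDMAOE=1 MV2_USE_RDMA_CM=1 \
        /usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2/IMB-MPI1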


> Without hostfile option, hostnames must be specified on command line.
> usage: mpirun_rsh [-v] [-sg group] [-rsh|-ssh] [-debug] -[tv] [-xterm]
> [-show] [-legacy] -np N(-hostfile hfile | h1 h2 ... hN) a.out args |
> -config configfile (-hostfile hfile | h1 h2 ... hN)]
> Where:
>         sg         => execute the processes as different group ID
>         rsh        => to use rsh for connecting
>         ssh        => to use ssh for connecting
>         debug      => run each process under the control of gdb
>         tv         => run each process under the control of totalview
>         xterm      => run remote processes under xterm
>         show       => show command for remote execution but don't run it
>         legacy     => use old startup method (1 ssh/process)
>         np         => specify the number of processes
>         h1 h2...   => names of hosts where processes should run
> or      hostfile   => name of file containing hosts, one per line
>         a.out      => name of MPI binary
>         args       => arguments for MPI binary
>         config     => name of file containing the exe information: each
> line has the form -n numProc : exe args
>
> [root at neo01 IMB-3.2]#
>
> One basic doubt I have: I remember that, a long time back, as part of MPI
> setup I used to create a password file in the user's home directory, in
> which we used to specify a passcode. Is that still a requirement?
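An aside on this point: the passcode file described above sounds like the
MPD-era ~/.mpd.conf secretword setup rather than something mpirun_rsh uses;
what mpirun_rsh -ssh does generally rely on is password-less ssh between the
nodes. A minimal sketch with OpenSSH, using a hypothetical second node name
neo02:

    # run once on the launching node; assumes OpenSSH on all nodes
    ssh-keygen -t rsa        # accept the defaults, empty passphrase
    ssh-copy-id neo02        # repeat for every compute node
    ssh neo02 hostname       # should print the hostname without asking for a password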
>
>
>
> On Thu, Feb 21, 2013 at 8:38 PM, Devendar Bureddy <
> bureddy at cse.ohio-state.edu> wrote:
>
>> Hi Devesh
>>
>> Can you make sure of the following things?
>>
>> - In mvapich2-1.6, the run-time parameter for RoCE support
>> was MV2_USE_RDMAOE. It was later renamed to MV2_USE_RoCE in mvapich2-1.8.
>>
>>  - I'm not sure if this is a copy/paste issue. The way to specify run-time
>> parameters is "<param_name>=<param_value>":
>>    MV2_USE_RoCE-1  ===>  MV2_USE_RoCE=1
>>    MV2_USE_RDMA_CM-1  ===> MV2_USE_RDMA_CM=1
>>
>>
>> -Devendar
>>
>> On Thu, Feb 21, 2013 at 9:35 AM, Devesh Sharma <devesh28 at gmail.com> wrote:
>>
>>> Hi list,
>>>
>>> I am trying to run a simple MPI job over a 2-node cluster with RoCE
>>> adapters and OFED-1.5.3.2. I am facing the following error. Please help.
>>>
>>> [root at neo01 IMB-3.2]# /usr/mpi/gcc/mvapich2-1.6/bin/mpirun_rsh -ssh
>>> -debug -np 2 MV2_USE_RoCE-1 MV2_USE_RDMA_CM-1 -hostfile /opt/Work/hostfile
>>> /usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2/IMB-MPI1
>>> execv: No such file or directory
>>> /usr/bin/xterm -e /usr/bin/ssh -q MV2_USE_RoCE-1 cd
>>> /usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2; /usr/bin/env
>>> MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1 MPISPAWN_MPIRUN_HOST=neo01
>>> MPIRUN_RSH_LAUNCH=1 MPISPAWN_CHECKIN_PORT=53250 MPISPAWN_MPIRUN_PORT=53250
>>> MPISPAWN_NNODES=2 MPISPAWN_GLOBAL_NPROCS=2 MPISPAWN_MPIRUN_ID=23270
>>> MPISPAWN_ARGC=3 MPISPAWN_ARGV_0=/usr/bin/gdb
>>> MPDMAN_KVS_TEMPLATE=kvs_255_neo01_23270 MPISPAWN_LOCAL_NPROCS=1
>>> MPISPAWN_ARGV_1=-hostfile MPISPAWN_ARGV_2=/opt/Work/hostfile
>>> MPISPAWN_ARGV_3=/usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2/IMB-MPI1
>>> MPISPAWN_GENERIC_ENV_COUNT=0  MPISPAWN_ID=0
>>> MPISPAWN_WORKING_DIR=/usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2
>>> MPISPAWN_MPIRUN_RANK_0=0 MPISPAWN_VIADEV_DEFAULT_PORT_0=-1
>>> /usr/mpi/gcc/mvapich2-1.6/bin/mpispawn 0 execv: No such file or directory
>>> (null) I��H��|5 (null)
>>> /usr/bin/xterm -e /usr/bin/ssh -q MV2_USE_RDMA_CM-1 cd
>>> /usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2; /usr/bin/env
>>> MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1 MPISPAWN_MPIRUN_HOST=neo01
>>> MPIRUN_RSH_LAUNCH=1 MPISPAWN_CHECKIN_PORT=53250 MPISPAWN_MPIRUN_PORT=53250
>>> MPISPAWN_NNODES=2 MPISPAWN_GLOBAL_NPROCS=2 MPISPAWN_MPIRUN_ID=23270
>>> MPISPAWN_ARGC=3 MPISPAWN_ARGV_0=/usr/bin/gdb
>>> MPDMAN_KVS_TEMPLATE=kvs_255_neo01_23270 MPISPAWN_LOCAL_NPROCS=1
>>> MPISPAWN_ARGV_1=-hostfile MPISPAWN_ARGV_2=/opt/Work/hostfile
>>> MPISPAWN_ARGV_3=/usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2/IMB-MPI1
>>> MPISPAWN_GENERIC_ENV_COUNT=0  MPISPAWN_ID=1
>>> MPISPAWN_WORKING_DIR=/usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2
>>> MPISPAWN_MPIRUN_RANK_0=1 MPISPAWN_VIADEV_DEFAULT_PORT_0=-1
>>> /usr/mpi/gcc/mvapich2-1.6/bin/mpispawn 0 (null) I��H��|5 (null)
>>> child_handler: Error in init phase...wait for cleanup! (0/2mpispawn
>>> connections)
>>> child_handler: Error in init phase...wait for cleanup! (0/2mpispawn
>>> connections)
>>>
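The "execv: No such file or directory" lines above come from the -debug
option, which tries to start each rank under gdb inside /usr/bin/xterm (and,
as confirmed later in the thread, xterm was not installed); the misplaced
MV2_USE_RoCE-1 argument is also being handed to ssh as if it were a host
name. A quick sanity check for the debug path, with the install command
assuming a RHEL/CentOS-style node:

    # -debug runs each rank under gdb inside xterm, so both must exist on every node
    which xterm gdb
    # if either is missing, install it (the package manager is an assumption)
    yum install -y xterm gdb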
>>> -Best Regards
>>>  Devesh
>>>
>>>
>>>
>>
>>
>> --
>> Devendar
>
>
>
>
>



-- 
Please don't print this e-mail unless you really need to - this will
preserve trees on planet earth.