[mvapich-discuss] troubles in running MPI job over RoCE, with mvapich2-1.6 shipped with OFED1.5.3.2

Devesh Sharma devesh28 at gmail.com
Thu Feb 21 10:36:05 EST 2013


Hi Jonathan and Devendar, thanks for the quick response.

MV2_USE_RoCE-1 is given in section 6.12 of the mvapich2-1.8.1 user guide; I
took it from there.
I have installed xterm (it was not there) and also changed the parameter
names as suggested. I am now hitting the following output:

[root@neo01 IMB-3.2]# /usr/mpi/gcc/mvapich2-1.6/bin/mpirun_rsh -ssh -debug
-np 2 MV2_USE_RDMAOE=1 MV2_USE_RDMA_CM=1 -hostfile /opt/Work/hostfile
/bin/hostname
Without hostfile option, hostnames must be specified on command line.
usage: mpirun_rsh [-v] [-sg group] [-rsh|-ssh] [-debug] -[tv] [-xterm]
[-show] [-legacy] -np N(-hostfile hfile | h1 h2 ... hN) a.out args |
-config configfile (-hostfile hfile | h1 h2 ... hN)]
Where:
        sg         => execute the processes as different group ID
        rsh        => to use rsh for connecting
        ssh        => to use ssh for connecting
        debug      => run each process under the control of gdb
        tv         => run each process under the control of totalview
        xterm      => run remote processes under xterm
        show       => show command for remote execution but don't run it
        legacy     => use old startup method (1 ssh/process)
        np         => specify the number of processes
        h1 h2...   => names of hosts where processes should run
or      hostfile   => name of file containing hosts, one per line
        a.out      => name of MPI binary
        args       => arguments for MPI binary
        config     => name of file containing the exe information: each
line has the form -n numProc : exe args

[root@neo01 IMB-3.2]#
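
Reading the usage text above again, my guess is that the environment
settings have to go after the -np/-hostfile options and just before the
binary, so I will try the ordering below next (the hostfile contents are
only what I assume they should look like, with neo02 standing in for my
second node's name):

/opt/Work/hostfile (one host per line, as the usage text says):
    neo01
    neo02

/usr/mpi/gcc/mvapich2-1.6/bin/mpirun_rsh -ssh -np 2 \
    -hostfile /opt/Work/hostfile \
    MV2_USE_RDMAOE=1 MV2_USE_RDMA_CM=1 /bin/hostname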

One basic doubt I have: I remember that, a long time back, as part of the
MPI setup I used to create a password file in the user's home directory,
and in that file we used to specify a passcode. Is that still a requirement?
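
If I remember right, the file I am thinking of is probably the old MPD
launcher's ~/.mpd.conf with an MPD_SECRETWORD entry. Since mpirun_rsh does
not go through MPD, I assume the only real requirement now is password-less
ssh between the nodes, roughly along these lines (neo02 is again just a
stand-in for my second node's name); please correct me if that is wrong:

    ssh-keygen -t rsa          # accept defaults, empty passphrase
    ssh-copy-id root@neo02     # or append ~/.ssh/id_rsa.pub to authorized_keys on neo02
    ssh neo02 hostname         # should complete without asking for a password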


On Thu, Feb 21, 2013 at 8:38 PM, Devendar Bureddy <
bureddy at cse.ohio-state.edu> wrote:

> Hi Devesh
>
> Can you make sure of the following things?
>
> - In mvapich2-1.6, the run-time parameter for RoCE support
> was MV2_USE_RDMAOE. It was renamed later, in mvapich2-1.8.
>
>  - I'm not sure if this is a copy/paste issue. The way to specify run-time
> parameters is "<param_name>=<param_value>":
>    MV2_USE_RoCE-1  ===>  MV2_USE_RoCE=1
>    MV2_USE_RDMA_CM-1  ===> MV2_USE_RDMA_CM=1
>
>
> -Devendar
>
> On Thu, Feb 21, 2013 at 9:35 AM, Devesh Sharma <devesh28 at gmail.com> wrote:
>
>> Hi list,
>>
>> I am trying to run a simple MPI job over a 2-node cluster with RoCE
>> adapters and OFED-1.5.3.2. I am facing the following error. Please help.
>>
>> [root@neo01 IMB-3.2]# /usr/mpi/gcc/mvapich2-1.6/bin/mpirun_rsh -ssh
>> -debug -np 2 MV2_USE_RoCE-1 MV2_USE_RDMA_CM-1 -hostfile /opt/Work/hostfile
>> /usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2/IMB-MPI1
>> execv: No such file or directory
>> /usr/bin/xterm -e /usr/bin/ssh -q MV2_USE_RoCE-1 cd
>> /usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2; /usr/bin/env
>> MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1 MPISPAWN_MPIRUN_HOST=neo01
>> MPIRUN_RSH_LAUNCH=1 MPISPAWN_CHECKIN_PORT=53250 MPISPAWN_MPIRUN_PORT=53250
>> MPISPAWN_NNODES=2 MPISPAWN_GLOBAL_NPROCS=2 MPISPAWN_MPIRUN_ID=23270
>> MPISPAWN_ARGC=3 MPISPAWN_ARGV_0=/usr/bin/gdb
>> MPDMAN_KVS_TEMPLATE=kvs_255_neo01_23270 MPISPAWN_LOCAL_NPROCS=1
>> MPISPAWN_ARGV_1=-hostfile MPISPAWN_ARGV_2=/opt/Work/hostfile
>> MPISPAWN_ARGV_3=/usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2/IMB-MPI1
>> MPISPAWN_GENERIC_ENV_COUNT=0  MPISPAWN_ID=0
>> MPISPAWN_WORKING_DIR=/usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2
>> MPISPAWN_MPIRUN_RANK_0=0 MPISPAWN_VIADEV_DEFAULT_PORT_0=-1
>> /usr/mpi/gcc/mvapich2-1.6/bin/mpispawn 0 execv: No such file or directory
>> (null) I��H��|5 (null)
>> /usr/bin/xterm -e /usr/bin/ssh -q MV2_USE_RDMA_CM-1 cd
>> /usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2; /usr/bin/env
>> MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1 MPISPAWN_MPIRUN_HOST=neo01
>> MPIRUN_RSH_LAUNCH=1 MPISPAWN_CHECKIN_PORT=53250 MPISPAWN_MPIRUN_PORT=53250
>> MPISPAWN_NNODES=2 MPISPAWN_GLOBAL_NPROCS=2 MPISPAWN_MPIRUN_ID=23270
>> MPISPAWN_ARGC=3 MPISPAWN_ARGV_0=/usr/bin/gdb
>> MPDMAN_KVS_TEMPLATE=kvs_255_neo01_23270 MPISPAWN_LOCAL_NPROCS=1
>> MPISPAWN_ARGV_1=-hostfile MPISPAWN_ARGV_2=/opt/Work/hostfile
>> MPISPAWN_ARGV_3=/usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2/IMB-MPI1
>> MPISPAWN_GENERIC_ENV_COUNT=0  MPISPAWN_ID=1
>> MPISPAWN_WORKING_DIR=/usr/mpi/gcc/mvapich2-1.6/tests/IMB-3.2
>> MPISPAWN_MPIRUN_RANK_0=1 MPISPAWN_VIADEV_DEFAULT_PORT_0=-1
>> /usr/mpi/gcc/mvapich2-1.6/bin/mpispawn 0 (null) I��H��|5 (null)
>> child_handler: Error in init phase...wait for cleanup! (0/2mpispawn
>> connections)
>> child_handler: Error in init phase...wait for cleanup! (0/2mpispawn
>> connections)
>>
>> -Best Regards
>>  Devesh
>>
>> --
>> Please don't print this E-mail unless you really need to - this will
>> preserve trees on planet earth.
>>
>
>
> --
> Devendar




-- 
Please don't print this E-mail unless you really need to - this will
preserve trees on planet earth.