[mvapich-discuss] port problem using MVAPICH2

Matthew Koop koop at cse.ohio-state.edu
Mon Dec 17 21:40:01 EST 2007


Stephan,

I'm trying to understand the setup you have here -- let me know if this is
correct: You have 4 nodes, each with 2 dual-core processors (4 cores per
node). Are you running mpdboot from a compute node or from somewhere else?
It looks like you may be launching from a Mac. If you are launching from
a machine that is not a compute node, can you try booting the ring from a
compute node instead?

Try a simpler command from one of the compute nodes:

mpdboot -n 4 -f <path to hostsfile>
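
Once the ring is booted, one way to confirm that all four daemons joined
is to count the hosts that mpdtrace reports. The following is only a
minimal sketch: ring_complete is a hypothetical helper name, not part of
MPD, and it assumes mpdtrace prints one host per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPD): read a host list on stdin,
# one host per line, and compare its length with the expected ring size.
ring_complete() {
    expected=$1
    # Count non-empty lines; "|| true" keeps a zero count from aborting
    # scripts that run under "set -e" (grep exits 1 when nothing matches).
    actual=$(grep -c . || true)
    echo "hosts in ring: $actual (expected $expected)"
    [ "$actual" -eq "$expected" ]
}

# Intended use on a compute node, after mpdboot has returned:
#   mpdtrace | ring_complete 4 || echo "ring is incomplete"
```

If the count is lower than expected, shutting the partial ring down with
mpdallexit before retrying avoids leaving stale daemons behind.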

The failure in initializing the malloc library is likely due to starting
the MPD ring from a node that is not in the compute cluster (by default
mpdboot uses the local node as well as those in the hostfile).
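
Since the launching machine becomes part of the ring, it can help to
check whether that machine is actually listed in the hostfile before
booting. The sketch below is an illustration under stated assumptions:
list_hosts and host_listed are hypothetical helper names, and the
hostfile is assumed to hold one host per line, optionally followed by
":ncpus", with blank lines and "#" comments ignored.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MPD).

# Print the host names found in a hostfile, one per line.
# Assumed format: "host" or "host:ncpus"; "#" starts a comment.
list_hosts() {
    sed -e 's/#.*//' \
        -e 's/^[[:space:]]*//' \
        -e '/^$/d' \
        -e 's/[[:space:]:].*$//' "$1"
}

# Succeed if host name $1 appears in hostfile $2 (exact, whole-line match).
host_listed() {
    list_hosts "$2" | grep -Fqx -- "$1"
}

# Intended use before booting the ring:
#   host_listed "$(hostname)" <path to hostsfile> \
#       || echo "warning: launching from a machine not in the hostfile"
```

The comparison is a plain string match, so a node that reports itself as
n01.local will not match a hostfile entry of n01; adjust the names on one
side or the other if your cluster mixes short and fully qualified names.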

Let us know if this helps. If not, can you send the contents of the
hostfile you are using as well?

Thanks,

Matt

On Tue, 11 Dec 2007, Stephan Gerber wrote:

> Dear MVAPICH users and developers,
>
> I have some problems using MVAPICH2.
> To start with, MVAPICH(1) and Open MPI are both running on our InfiniBand cluster, but neither scales as well as I had hoped.
> So I want to use MVAPICH2 to achieve better results (hopefully...).
> My system is a dual-Opteron cluster with 4 nodes; each node has 2 processors, each with two cores.
> I tried booting with:
>  mpdboot --totalnum=4 -1 --file=/Users/gerber/mac/mpd.hosts --rsh=ssh --verbose --ncpus=16
> In this case I end up with the following error message:
> mpdboot_n01.local (handle_mpd_output 373): from mpd on n01, invalid port info:
> Does anyone know what the problem might be and how to solve it?
> If I use the --chkup(only) option I see that only one node out of four is up!?
>
> If I don't use the option --totalnum=4, mpdboot works fine, but after running mpdtrace I still see that only one host is up...
>
> For the second boot approach I tried starting mpiexec, but I end up with the following error:
> rank 3 in job 1  n01.local_39320   caused collective abort of all ranks
>   exit status of rank 3: return code 13
> [rdma_iba_init.c:91] Error initializing MVAPICH2 malloc library
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(230): Initialization failed
> MPID_Init(81)........: channel initialization failed
> (unknown)(): Other MPI error[gerber at n01]
>
> Any help would be appreciated!
> Thanks in advance,
> Best regards,
> Stephan
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


