[mvapich-discuss] port problem using MVAPICH2

Stephan Gerber StephanGerber at gmx.net
Tue Dec 11 10:37:04 EST 2007


Dear MVAPICH users and developers,

I am having some problems using MVAPICH2.
To start with, MVAPICH(1) and Open MPI are both running on our InfiniBand cluster, but neither scales as well as I would like.
So I want to use MVAPICH2 to achieve better results (hopefully...).
My system is a dual-Opteron cluster with 4 nodes; each node has 2 processors, and each processor has two cores.
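
To make the layout concrete, here is a small sketch of the arithmetic, which also prints what a per-node hosts file could look like. It assumes the `host:ncpus` line format for the MPD hosts file and assumes the nodes are named n01 to n04 (only n01 appears in my output below); the function names are purely illustrative.

```shell
#!/bin/sh
# Cluster layout as described above.
NODES=4
SOCKETS_PER_NODE=2
CORES_PER_SOCKET=2

# Cores available on a single node.
cores_per_node() {
    echo $((SOCKETS_PER_NODE * CORES_PER_SOCKET))
}

# Cores available across the whole cluster.
total_cores() {
    echo $((NODES * $(cores_per_node)))
}

# Print one "host:ncpus" line per node.
# ASSUMPTION: host names n02..n04 are guessed from n01.
hosts_file() {
    i=1
    while [ "$i" -le "$NODES" ]; do
        printf 'n%02d:%d\n' "$i" "$(cores_per_node)"
        i=$((i + 1))
    done
}

echo "cores per node: $(cores_per_node)"
echo "total cores:    $(total_cores)"
hosts_file
```

So the 16 in my command below is the total for the whole cluster, while each node on its own has 4 cores.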
I tried booting with:
 mpdboot --totalnum=4 -1 --file=/Users/gerber/mac/mpd.hosts --rsh=ssh --verbose --ncpus=16
In this case I end up with the following error message:
mpdboot_n01.local (handle_mpd_output 373): from mpd on n01, invalid port info:
Does anyone know what the problem might be and how to solve it?
If I use the --chkup(only) option, I see that only one node out of four is up!?

If I don't use the option --totalnum=4, mpdboot works fine, but after running mpdtrace I still see that only one host is up...

For the second boot approach, I tried starting mpiexec, but I end up with the following error:
rank 3 in job 1  n01.local_39320   caused collective abort of all ranks
  exit status of rank 3: return code 13
[rdma_iba_init.c:91] Error initializing MVAPICH2 malloc library
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(230): Initialization failed
MPID_Init(81)........: channel initialization failed
(unknown)(): Other MPI error[gerber at n01]

Any help would be appreciated!
Thanks in advance.
Best regards,
Stephan


  

