[mvapich-discuss] port problem using MVAPICH2

Stephan Gerber StephanGerber at gmx.net
Mon Dec 17 06:12:48 EST 2007


Dear MVAPICH users and developers,

I am not sure whether my last mail failed to reach the list or whether simply nobody could answer my question, so I am trying again.

I am using OFED 1.2, which ships with MVAPICH, Open MPI and MVAPICH2.
The system was preconfigured, so I did not have to install anything myself. All test cases and examples run fine, but:

When I try to use the three MPI implementations in my own application, I succeed with MVAPICH and Open MPI but still have problems with MVAPICH2.
My application's library, which uses the MPI libraries, compiles without errors, but I am stuck at the very first steps with MVAPICH2.

(My system is a dual-Opteron cluster with 4 nodes; each node has 2 processors with two cores each, i.e. 16 cores in total.)
I tried booting with:
 mpdboot --totalnum=4 -1 --file=/Users/gerber/mac/mpd.hosts --rsh=ssh --verbose --ncpus=16
(Of course I also tried the simpler command mentioned in the user guide.)
In this case I end up with the following error message:
mpdboot_n01.local (handle_mpd_output 373): from mpd on n01, invalid port info:
Does anyone know what the problem might be and how to solve it?
If I use the --chkup / --chkuponly option, I see that only one node out of four is up, although all other tests with the IB tools look OK.
I am not sure whether I am missing a library, an include path, or something else.

If I do not use the option --totalnum=4, mpdboot works fine, but mpdtrace afterwards still shows only one host up.
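
To make the "only one host up" observation easy to reproduce, here is a minimal sketch of the comparison between the hosts file and the ring. It is not part of MPD: the function names count_hosts and check_ring are made up for illustration, and it assumes the hosts file has one host per line (optionally followed by ":ncpus") with "#" starting a comment. The ring listing is read from standard input, so the logic can be tried without a running ring.

```shell
#!/bin/sh
# Hypothetical helper (not an MPD tool): count host entries in an mpd.hosts-style file.
# Assumption: one host per line, optional ":ncpus" suffix, "#" starts a comment.
count_hosts() {
    # Strip comments, then count the lines that still contain text.
    sed 's/#.*//' "$1" | awk 'NF > 0 { n++ } END { print n + 0 }'
}

# Hypothetical helper: compare an expected host count with a ring listing on stdin
# (one host name per line, as printed by mpdtrace).
check_ring() {
    expected=$1
    # Count the non-empty lines of the listing.
    up=$(awk 'NF > 0 { n++ } END { print n + 0 }')
    if [ "$up" -eq "$expected" ]; then
        echo "ok: $up of $expected hosts in ring"
    else
        echo "mismatch: $up of $expected hosts in ring"
    fi
}
```

It could be used as: mpdtrace | check_ring "$(count_hosts /Users/gerber/mac/mpd.hosts)". Note that the two counts can legitimately differ by one if the local host is part of the ring but not listed in the hosts file.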

As a second approach I tried starting the job with mpiexec, but I end up with the following error:
rank 3 in job 1  n01.local_39320   caused collective abort of all ranks
  exit status of rank 3: return code 13
[rdma_iba_init.c:91] Error initializing MVAPICH2 malloc library
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(230): Initialization failed
MPID_Init(81)........: channel initialization failed
(unknown)(): Other MPI error

So I guess something is wrong with the malloc library, or am I missing some environment variables?


Any help would be really appreciated!

Thanks in advance.
Best regards,
Stephan


  

