[mvapich-discuss] problem w/MVAPICH in the frames of Gen1

Sayantan Sur surs at cse.ohio-state.edu
Fri Aug 4 10:20:56 EDT 2006


Mikhail,

Mikhail Kuzminsky wrote:

> In message from Dhabaleswar Panda <panda at cse.ohio-state.edu> (Thu, 3 
> Aug 2006 12:32:45 -0400 (EDT)):
>
>> Mikhail - Thanks for your note. Since you are trying MVAPICH with 
>> IBGD-1.8.0, let me suggest that you
>> contact Mellanox people regarding this problem.
>> You are also using a very old version of MVAPICH (0.9.5).
>
>
> There is no difference when using your latest MVAPICH-0.9.8;
> for example, after mpicc -noshlib -o cpi cpi.c:
>
> mpirun_rsh -rsh -np 1 c5ws1.chem.ac.ru ./cpi
> [0] Abort: Cannot allocate PD (Invalid Virtual Address) at line 745 in file viainit.c
> mpirun: executable version 0 does not match our version 3.
> done.

Thanks for trying out our latest version on your cluster. Clearly, the 
problem in both versions stems from the inability of VAPI (the 
underlying IB layer) to create a protection domain. The error indicates 
that when MVAPICH calls VAPI_alloc_pd(), the call does not return 
success.

My hunch is that your IB installation is not correct. In particular, the 
kernel modules that support IB might not be working properly with your 
kernel. Could you please verify with Mellanox that the kernel you are 
using is in fact supported by IBGD? Pasha, any thoughts?

You could also run a benchmark that uses only VAPI, such as 
`perf_main', to check whether it fails with the same error.

>
> The only advantage of 0.9.8 here is that it installs the correct paths
> in mpif77/mpif90/mpicc etc.
>
> BTW, what does the message about a "mismatch" of the executable version mean here?

The "mismatch" message can be ignored. The child process fails to 
allocate the protection domain and aborts, but the main `mpirun_rsh' 
process is still waiting for that child to send its launcher version 
number. Because the child exits before sending it, `mpirun_rsh' reads an 
invalid version number (0 in your output) and prints this message. If you 
use the `mpd' based launcher (Section 5.3 of our user guide), you will not 
get this message, as that launcher handles exiting processes more gracefully.
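To illustrate the failure mode described above, here is a minimal Python sketch under a simplified, assumed protocol: the parent waits to read a 4-byte version number from a forked child over a pipe, and a short read leaves a zero-filled buffer. The pipe protocol, the function names (launch_and_read_version, check_version) and OUR_VERSION = 3 are hypothetical, chosen only to mirror the log output; this is not the actual mpirun_rsh code.

```python
import os
import struct

# Hypothetical launcher protocol version, chosen to mirror "our version 3" in the log.
OUR_VERSION = 3
VERSION_FMT = "<i"  # assumed wire format: 4-byte little-endian int


def launch_and_read_version(child_reports_version: bool) -> int:
    """Fork a child that either reports its version or aborts early,
    and return the version number the parent ends up with."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(r)
        if child_reports_version:
            os.write(w, struct.pack(VERSION_FMT, OUR_VERSION))
            os._exit(0)
        # Simulate the abort (e.g. a failed PD allocation): exit without writing.
        os._exit(1)

    # Parent: read up to 4 bytes, stopping at EOF if the child has died.
    os.close(w)
    size = struct.calcsize(VERSION_FMT)
    data = b""
    while len(data) < size:
        chunk = os.read(r, size - len(data))
        if not chunk:  # EOF: the child exited before sending its version
            break
        data += chunk
    os.close(r)
    os.waitpid(pid, 0)

    # A naive launcher that treats a short read as a zero-filled buffer
    # sees version 0 instead of noticing that the child is gone.
    return struct.unpack(VERSION_FMT, data.ljust(size, b"\0"))[0]


def check_version(version: int) -> str:
    """Compare the child's reported version with the launcher's own."""
    if version != OUR_VERSION:
        return (f"mpirun: executable version {version} "
                f"does not match our version {OUR_VERSION}.")
    return "ok"


if __name__ == "__main__":
    print(check_version(launch_and_read_version(child_reports_version=True)))
    print(check_version(launch_and_read_version(child_reports_version=False)))
```

Run on a POSIX system, the first call prints "ok" and the second prints the same mismatch line as in your log. A launcher that treats EOF before a full version number as a failed child, rather than as a version number, would avoid the misleading message.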

Thanks,
Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs