[mvapich-discuss] problem w/MVAPICH in the frames of Gen1
Sayantan Sur
surs at cse.ohio-state.edu
Fri Aug 4 10:20:56 EDT 2006
Mikhail,
Mikhail Kuzminsky wrote:
> In message from Dhabaleswar Panda <panda at cse.ohio-state.edu> (Thu, 3
> Aug 2006 12:32:45 -0400 (EDT)):
>
>> Mikhail - Thanks for your note. Since you are trying MVAPICH with
>> IBGD-1.8.0, let me suggest that you
>> contact Mellanox people regarding this problem.
>> You are also using a very old version of MVAPICH (0.9.5).
>
>
> There is no difference when using your latest MVAPICH-0.9.8; for example,
> after building with mpicc -noshlib -o cpi cpi.c, I get:
>
> mpirun_rsh -rsh -np 1 c5ws1.chem.ac.ru ./cpi
> [0] Abort: Cannot allocate PD (Invalid Virtual Address) at line 745 in
> file viainit.c
> mpirun: executable version 0 does not match our version 3.
> done.
Thanks for trying out our latest version on your cluster. Clearly, the
problem in both versions stems from the inability of VAPI (the
underlying IB layer) to create a protection domain. The error indicates
that when MVAPICH calls VAPI_alloc_pd(), the call does not return
success.
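For readers unfamiliar with this kind of message, here is one common C pattern that produces an abort line of the same shape: check a return code, decode it to a readable string, and tag it with the rank and the call site (__FILE__/__LINE__). This is a hypothetical illustration, not MVAPICH's or VAPI's code: ret_t, RET_OK, RET_EINVAL_VA, ret_str(), format_abort(), CHECK_RET() and stub_alloc_pd() are all invented names, and the stub simply fails where the real code would call VAPI_alloc_pd().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: these are NOT the real VAPI types or codes. */
typedef int ret_t;
#define RET_OK        0
#define RET_EINVAL_VA 1 /* hypothetical "Invalid Virtual Address" code */

/* Map a (hypothetical) return code to a readable string. */
static const char *ret_str(ret_t rc) {
    switch (rc) {
    case RET_OK:        return "Success";
    case RET_EINVAL_VA: return "Invalid Virtual Address";
    default:            return "Unknown error";
    }
}

/* Build "[rank] Abort: what (decoded rc) at line N in file F" into buf.
   Returns the snprintf() result (the untruncated message length). */
static int format_abort(char *buf, size_t len, int rank, const char *what,
                        ret_t rc, int line, const char *file) {
    return snprintf(buf, len, "[%d] Abort: %s (%s) at line %d in file %s",
                    rank, what, ret_str(rc), line, file);
}

/* Evaluate a call once; on failure print the abort line and exit. */
#define CHECK_RET(rank, call, what)                                   \
    do {                                                              \
        ret_t rc_ = (call);                                           \
        if (rc_ != RET_OK) {                                          \
            char msg_[256];                                           \
            format_abort(msg_, sizeof msg_, (rank), (what), rc_,      \
                         __LINE__, __FILE__);                         \
            fprintf(stderr, "%s\n", msg_);                            \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

/* Hypothetical stub in place of VAPI_alloc_pd(): always fails. */
static ret_t stub_alloc_pd(int *pd_out) {
    (void)pd_out; /* never filled in, since the "allocation" fails */
    return RET_EINVAL_VA;
}
```

With these definitions, CHECK_RET(0, stub_alloc_pd(&pd), "Cannot allocate PD"); prints a line like the one in the report above (with its own file and line) and exits, which is why such a message points straight at the failing VAPI call.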
My hunch is that your IB installation is not correct. In particular, the
kernel modules that support IB might not be working well with your
kernel. Could you please verify with Mellanox that the kernel you are
using is in fact supported by IBGD? Pasha, any thoughts?
You may also try running some benchmarks that use only VAPI, such as
`perf_main', to check whether they fail with the same error.
>
> The only advantage of 0.9.8 in this respect is that it installs with the
> correct paths in mpif77/mpif90/mpicc, etc.
>
> BTW, what does the message about the "mismatch" of the executable version
> mean here?
The "mismatch" message can be ignored. Basically, the process fails to
allocate the protection domain; however, the main `mpirun_rsh' process
is still waiting for the child process to send its launcher version
number. Due to the child process's untimely demise, `mpirun_rsh' never
receives a valid version number (hence the "version 0" in your output)
and prints this message. If you use the `mpd'-based launcher (Section 5.3
of our user guide), you will not get this message, as that launcher
handles exiting processes more gracefully.
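A minimal sketch of that failure mode, under loud assumptions: the post does not describe how mpirun_rsh actually talks to its children, so this models the channel as a pipe and the version as a raw int. launcher_read_version(), report_version() and LAUNCHER_VERSION are hypothetical names, not MVAPICH code. If the child exits before writing, read() returns 0 (end of file); a launcher that ignores that return value is left holding its zero-initialized variable, i.e. "version 0".

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define LAUNCHER_VERSION 3 /* hypothetical; matches "our version 3" above */

/* Fork a child that writes its version into a pipe unless die_early is set
   (imitating a process that aborted in VAPI_alloc_pd()), then read it back.
   Returns the version the parent ends up with:
     careful == 0: read()'s return value is ignored, so an early exit
                   leaves the zero-initialized version at 0;
     careful == 1: returns -1 if fewer than sizeof(int) bytes arrived.
   Returns -2 if pipe() or fork() fails. */
static int launcher_read_version(int die_early, int careful) {
    int fds[2];
    if (pipe(fds) != 0)
        return -2;
    pid_t pid = fork();
    if (pid < 0)
        return -2;
    if (pid == 0) {                  /* child */
        close(fds[0]);
        if (!die_early) {
            int v = LAUNCHER_VERSION;
            ssize_t w = write(fds[1], &v, sizeof v);
            (void)w;
        }
        _exit(die_early ? 1 : 0);    /* exiting closes the write end */
    }
    close(fds[1]);                   /* parent keeps only the read end */
    int version = 0;
    ssize_t n = read(fds[0], &version, sizeof version);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (careful && n != (ssize_t)sizeof version)
        return -1;                   /* child died before reporting */
    return version;
}

/* Print a diagnostic for a value returned by launcher_read_version(). */
static void report_version(int version) {
    if (version == -2)
        fprintf(stderr, "launcher: pipe() or fork() failed\n");
    else if (version == -1)
        fprintf(stderr, "launcher: child exited before sending its version\n");
    else if (version != LAUNCHER_VERSION)
        fprintf(stderr, "mpirun: executable version %d does not match "
                        "our version %d.\n", version, LAUNCHER_VERSION);
}
```

Here launcher_read_version(1, 0) yields 0, reproducing the misleading "version 0" mismatch, while launcher_read_version(1, 1) yields -1, so report_version() can say that the child died rather than blame a version mismatch.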
Thanks,
Sayantan.
--
http://www.cse.ohio-state.edu/~surs