[mvapich-discuss] error associated with MPI_BCAST and MPI_SEND?

amith rajith mamidala mamidala at cse.ohio-state.edu
Fri Mar 21 12:18:21 EDT 2008


Hi Raymond,

I think from your description that some of the nodes might not have been
set up properly for the maximum amount of locked memory. You can set this
up by following Section 7.2.3 of the MVAPICH user guide:

Edit the file /etc/security/limits.conf and enter
the following line:

* soft memlock <phys mem size in KB>

where <phys mem size in KB> is the MemTotal value reported by
/proc/meminfo.
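
For example, on a node whose /proc/meminfo reports a MemTotal of
16435456 kB (a made-up figure here; substitute the value your own nodes
report), the entry would be:

* soft memlock 16435456

You can check the value on each node with:

grep MemTotal /proc/meminfo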

In addition, you need to add the following line to /etc/init.d/sshd and
then restart sshd:

ulimit -l <phys mem size in KB>
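
Using the same made-up 16435456 kB figure, the top of /etc/init.d/sshd
would gain a line like this (the exact layout of the init script varies
by distribution; the important point is that the ulimit runs before
sshd itself is launched):

ulimit -l 16435456

After restarting sshd (for example with "/etc/init.d/sshd restart"), you
can confirm the new limit by ssh-ing into the node and running
"ulimit -l"; it should print the value you set rather than the small
default.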

Thanks,
Amith

 On Fri, 21 Mar 2008, Raymond Richardson wrote:

> Hi all,
>
> I'm hoping one of you developers out there may have seen behavior of
> the sort that's plaguing me at present.  I'm running a large, complex
> numerical weather prediction model that is written in Fortran and uses MPI
> for parallelization.  I'm running it on an Opteron cluster under Linux,
> using InfiniBand, MVAPICH 0.9.9, and the PathScale 3.1 Fortran
> compiler.
>
> What I'm seeing is intermittent crashes with the following error:
>
> [cm.c: line 142]Couldn't create RC QP
>
> Compiling my code with things like trapuv and checkbounds doesn't reveal any
> problems.  Doing things like changing the order of nodes in my node list, or
> adding print statements, will change when and where these crashes happen.
> They seem to be associated with MPI_BCAST and MPI_SEND calls.  I have an
> older cluster using MPICH and Myrinet, and the code runs fine there.  Does
> this mean anything to anyone out there?
>
> Thanks a lot,
>
> Ray Richardson
>
