[mvapich-discuss] Node crashes when all memory is used

Jimmy Tang jcftang at gmail.com
Mon Jun 19 07:37:14 EDT 2006


Hi Christopher,

On 6/19/06, Christopher Rowley <crowl055 at uottawa.ca> wrote:
> I'm running a cluster of Opterons with Fedora Core 5. We have Topspin HCAs
> and Topspin 120 switches. We're using MVAPICH.gen2 to run a computational
> chemistry program called VASP. The memory requirements are extremely high
> (60 GB), and occasionally exceed what is available on the nodes we're running
> on. When this happens, the program is killed, but in the process, the first
> node on the list of hosts will crash (it remains pingable, but with no
> connectivity or keyboard response). We don't see this behavior with vanilla
> MPICH 1.2.7. Is there a known issue with exceeding the total available
> memory with MVAPICH?

Out of curiosity, which compiler are you using? We had some similar
problems with a lattice QCD code (though it doesn't use as much memory
as VASP would in most cases), where if we turned off the
"LAZY_MEMORY_DEREGISTER" option in MVAPICH or if we turned off -O2 or
higher optimisations in our compiler (PathScale), everything seemed to
work okay again.

I don't know if that will help, but since the symptoms that we saw are
similar to what you are seeing, it might.

Jimmy.

-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin.
http://www.tchpc.tcd.ie/

