[mvapich-discuss] Crashes with mpirun_rsh in MVAPICH2 1.5

Thompson, Matthew A. (GSFC-610.1)[SCIENCE APPLICATIONS INTL CORP] matthew.thompson at nasa.gov
Tue Aug 10 11:50:14 EDT 2010


Thompson, Matthew A. (GSFC-610.1)[SCIENCE APPLICATIONS INTL CORP] wrote:
> In order to better compare a test box I build on with a larger cluster
> that's using MVAPICH2 and mpirun_rsh, I recently downloaded version 1.5
> and tried to build it on the test node. This is a single, dual Nehalem
> node with no specialized networking hardware running RHEL 5.5.

Following a suggestion from a colleague more knowledgeable than I, I
built MVAPICH2 with all the debugging options I could glean from
./configure --help. The configure line was:

./configure --with-device=ch3:sock --prefix=$HOME/mvapich2-debug \
    --enable-error-checking=all --enable-error-messages=all \
    --enable-g=all --disable-fast \
    CC=pgcc FC=pgfortran F77=pgfortran CXX=pgcpp >& configure.log

After a successful make I now get the following error from mpirun_rsh:

> ~/mvapich2-debug/bin/mpirun_rsh -np 2 -hostfile host_file_name ./hellow
Assertion failed in file mpidi_pg.c at line 293: ((pg)->ref_count) == 0
internal ABORT - process 0
[cli_0]: aborting job:
internal ABORT - process 0
Assertion failed in file mpidi_pg.c at line 293: ((pg)->ref_count) == 0
internal ABORT - process 0
[cli_1]: aborting job:
internal ABORT - process 0
MPI process (rank: 0) terminated unexpectedly on janus
Exit code -5 signaled from janus
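
For completeness, ./hellow is nothing exotic, just a minimal MPI hello
world; as far as I can tell it is essentially the stock examples/hellow.c
from the source tree, compiled with the installed mpicc wrapper. A rough
sketch of it (illustrative, not the exact shipped file):

/* Minimal MPI hello world, roughly the stock examples/hellow.c
 * (a sketch for anyone trying to reproduce this). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();                        /* shut down the MPI runtime */
    return 0;
}

Note that neither rank's hello line shows up in the output above, though
that could simply be lost when the job aborts.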

Hopefully this helps you figure out the issues I'm having.

Matt
-- 
Matthew Thompson, SAIC, Sr Scientific Software Engr
NASA GSFC,  Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD  20771
Phone: 301-614-6712               Fax: 301-614-6246

