[mvapich-discuss] Bus error from mpirun_rsh

Dhabaleswar Panda panda at cse.ohio-state.edu
Sun Nov 9 13:26:52 EST 2008


> Following is a minimal trace from a core file dropped by mpirun_rsh
> while attempting to run an application on 120 nodes of an SGI ICE
> system.  The application and MVAPICH were both compiled with the
> Intel-10.1.015 compiler suite.  Does this look familiar to anyone?

We have not seen this error before. Could you send us a full backtrace
for it? The trace quoted below shows only frame #0 in main().
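
In case it helps, here is a minimal sketch of one way to pull more detail
out of the core file you already have. The binary path and core file name
are taken from your message; the file names bt_cmds.gdb and backtrace.txt
are only placeholders chosen for this example, so adjust them as needed.

```shell
#!/bin/sh
# Paths from the original report; override via the environment if they differ.
MPIRUN=${MPIRUN:-/u/dkokron/play/mvapich-1.1rc1/bin/mpirun_rsh}
CORE=${CORE:-core.10742}
CMDS=${CMDS:-bt_cmds.gdb}   # placeholder name for the gdb command file

# Write the gdb commands to a file so they can be replayed in batch mode.
cat > "$CMDS" <<'EOF'
bt full
info registers
x/8i $pc
info sharedlibrary
EOF

# Run gdb non-interactively only if both the core file and gdb are available.
if [ -r "$CORE" ] && command -v gdb >/dev/null 2>&1; then
    gdb -batch -x "$CMDS" "$MPIRUN" "$CORE" > backtrace.txt 2>&1
    echo "wrote backtrace.txt"
else
    echo "core file or gdb not found; command file left in $CMDS"
fi
```

If mpirun_rsh was built without debug symbols, the output will probably
still be sparse. In that case, rebuilding it with -g and reproducing the
crash should give a more useful trace.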

Do you also see this error with the latest MVAPICH trunk version? A couple
of fixes have gone into the codebase since RC1 was released, and we are
currently testing the trunk for the final release. You can get the nightly
tarball of the latest trunk from the MVAPICH download page.

Thanks,

DK

> Some system config
> Linux p4fe1 2.6.16.60-0.27schamp-nasa #1 SMP Sat Sep 13 20:37:07 UTC
> 2008 x86_64 x86_64 x86_64 GNU/Linux
>
> Command line
> mpirun_rsh -ssh -np 480 -hostfile machinefile VIADEV_USE_SHMEM_COLL=0
> VIADEV_CLUSTER_SIZE=MEDIUM ./Application.x
>
> which mpirun_rsh
> /u/dkokron/play/mvapich-1.1rc1/bin/mpirun_rsh
>
> p4fe1.dkokron 269> gdb -c core.10742
> /u/dkokron/play/mvapich-1.1rc1/bin/mpirun_rsh
> GNU gdb 6.6
> Copyright (C) 2006 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "x86_64-suse-linux"...
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Reading symbols from /nasa/intel/cce/10.1.015/lib/libimf.so...done.
> Loaded symbols for /nasa/intel/cce/10.1.015/lib/libimf.so
> Reading symbols from /lib64/libm.so.6...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libgcc_s.so.1...done.
> Loaded symbols for /lib64/libgcc_s.so.1
> Reading symbols from /lib64/libc.so.6...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/libdl.so.2...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Core was generated by `mpirun_rsh -ssh -np 480 -hostfile machinefile
> VIADEV_USE_SHMEM_COLL=0 VIADEV_CL'.
> Program terminated with signal 7, Bus error.
> #0  0x0000000000402b15 in main ()
>
> --
> Dan Kokron
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


