[mvapich-discuss] mvapich-0.9.8 Bus error

Sayantan Sur surs at cse.ohio-state.edu
Tue Jan 30 11:43:31 EST 2007


Hi Rene,

> I would really like to get mvapich working on my cluster.  Is there any way
> that I can get mvapich to generate a more verbose output other than a "bus
> error" message? I would like to get some type of debug info or message that
> might give me a better hint as to what might be the problem.

Sorry to hear that the problems persist. Let's try to get some debug
info from your run. Here are some things we could try:

1) Compile MVAPICH with the -g flag so that the binaries carry debug
symbols. You can add this flag to the CFLAGS variable in
make.mvapich.gen2. Then enable core dumps: if you are using bash, run
"ulimit -c unlimited". Add that line to your .bashrc, apply it in the
current shell as well, and then launch the job with mpirun_rsh. Once a
core dump is generated, could you send us the backtrace?

$ # edit ~/.bashrc to include "ulimit -c unlimited"
$ . ~/.bashrc
$ mpirun_rsh -np 4 n1 n2 n1 n2 ./a.out
$ gdb -c core.XXX ./a.out
(gdb) bt

This will tell us exactly where the "bus error" happens.

2) Alternatively, could you run cpi on 4 different nodes? Your earlier
example ran all four processes on a single node. This will help us
narrow down the problem further.
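
If it helps, here is a minimal sketch of both checks, assuming a POSIX
shell. The function names core_limit_status and cpi_cmd are made up for
illustration, n1..n4 are placeholder hostnames, and ./a.out is assumed to
be the compiled cpi.c. The second function only prints the mpirun_rsh
command line so that it can be reviewed before launching.

```shell
#!/bin/sh
# Sketch only: n1..n4 are placeholder hostnames, ./a.out is the compiled cpi.c.

# Step 1: report whether a core-file size limit allows a core dump.
# Pass the output of "ulimit -c" as the argument.
core_limit_status() {
    if [ "$1" = "0" ]; then
        echo "core dumps disabled"
    else
        echo "core dumps enabled (limit: $1)"
    fi
}

# Step 2: build the mpirun_rsh command line, one process per host argument,
# and warn on stderr if any hostname is repeated.
cpi_cmd() {
    distinct=$(printf '%s\n' "$@" | sort -u | wc -l | tr -d ' ')
    if [ "$distinct" -ne "$#" ]; then
        echo "warning: only $distinct distinct host(s) for $# processes" >&2
    fi
    echo "mpirun_rsh -np $# $* ./a.out"
}

core_limit_status "$(ulimit -c)"
cpi_cmd n1 n2 n3 n4
```

The limit that matters is the one seen by the processes started on each
node, so it is worth running "ulimit -c" through the same remote shell
that mpirun_rsh uses, not only in your login session.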

Thanks,
Sayantan.

> 
> Thanks for any help on this.
> Rene
> 
> 
> 
> On 1/19/07 11:14 AM, "Sayantan Sur" <surs at cse.ohio-state.edu> wrote:
> 
> > Hello Rene,
> > 
> >>> 2. What is the application you are running on the 8 nodes? Can you
> >>> verify if cpi runs fine on 8 nodes?
> >> 
> >> Some of the nodes I have are 2 CPU dual core nodes.  All I am trying to run
> >> is the cpi.c code that is in the mvapich examples directory.
> >> 
> >> If I log into one of the 2CPU dual core nodes and I try to run the cpi.c
> >> code here is what I get.
> >  ...
> >> mpi/mvapich> mpirun_rsh -np 4 compute-01-02-ib compute-01-02-ib
> >> compute-01-02-ib compute-01-02-ib ./a.out
> >> Bus error
> >> Bus error
> >> Bus error
> >> Bus error
> > 
> > Thanks for your response. It seems to me that this is a system related
> > issue, since even the basic `cpi' test is not able to run on 4
> > processes. I have a feeling that your system vendor would be able to
> > resolve this the fastest.
> > 
> > Thanks,
> > Sayantan.
> 
> 

-- 
http://www.cse.ohio-state.edu/~surs

