[mvapich-discuss] (no subject)

Adam M. Smith amsmith at lanl.gov
Mon Jul 3 19:52:37 EDT 2006


Hi All,

I'm trying to get ParaView working on a Linux cluster.  I have confirmed
that, for any session with np>1, none of the processes gets past MPI_Init
(and ParaView seems to be passing the right args). [backtrace at end of
message]

The MVAPICH User Guide has a section about applications that do not get
past MPI_Init, and I have covered some of the basic tests it lists.
That is...

- I'm using ssh with keys and have tested connecting from each node to
every other without entering a password
- I have confirmed that the hostnames supplied to mpirun_rsh match those
used in /etc/hosts on all machines
- I can run ibv_rc_pingpong and similar tests without any problem at all,
naming the hosts exactly as I do when running ParaView with mpirun_rsh
(whee! cpi is fast)
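
For anyone wanting to repeat the first two checks mechanically, here is a
minimal sketch in Python. It is not from the User Guide; the function
names and the example host names (node01, node02) are made up for
illustration. It assumes OpenSSH, whose BatchMode=yes option makes ssh
fail instead of prompting for a password, and it assumes the machine
running the script can itself reach every node by key.

```python
import itertools
import subprocess


def ssh_pair_commands(hosts):
    """Build one command per ordered (src, dst) pair: log in to src, then hop to dst."""
    cmds = []
    for src, dst in itertools.permutations(hosts, 2):
        # BatchMode=yes disables password prompts, so a missing key shows up
        # as a non-zero exit status instead of a hang.
        cmds.append(["ssh", "-o", "BatchMode=yes", src,
                     "ssh", "-o", "BatchMode=yes", dst, "true"])
    return cmds


def failing_pairs(hosts, timeout=15):
    """Run every pairwise hop and return the (src, dst) pairs that did not succeed."""
    bad = []
    for cmd in ssh_pair_commands(hosts):
        try:
            rc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout).returncode
        except (subprocess.TimeoutExpired, OSError):
            rc = -1
        if rc != 0:
            bad.append((cmd[3], cmd[7]))  # positions of src and dst in the command
    return bad


def missing_from_hosts(hosts_text, names):
    """Return the names that do not appear in the given /etc/hosts contents."""
    known = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        known.update(fields[1:])                # first field is the address
    return [name for name in names if name not in known]


# Example use (hypothetical host names):
#   print(failing_pairs(["node01", "node02"]))
#   print(missing_from_hosts(open("/etc/hosts").read(), ["node01", "node02"]))
```

An empty list from both calls would correspond to the two checks above
passing. Note that missing_from_hosts only inspects the file it is given,
so it would have to be run on each machine in turn.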

(I can't find perf_main, one of the tests mentioned for VAPI.)

What else can I try?  What does this mean?

Importantly:
- other programs we built here use mvapich over our network fine (though
I'm not extremely familiar with their configuration)
- the other tools using mvapich seem to use the same shared libraries as
the one I built
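
The "seem to use the same shared libraries" point could be checked more
firmly by comparing ldd output for two binaries. The following is only a
sketch under assumptions: it expects the usual "name => path (0xaddr)"
line format that ldd prints on Linux, the helper names are invented here,
and it says nothing about libraries that were linked statically.

```python
import re

# Matches "libfoo.so.1 => /path/libfoo.so.1 (0x...)" and "libfoo.so.1 => not found".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(.*?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$")


def parse_ldd(output):
    """Map each library name in ldd output to the path it resolved to."""
    libs = {}
    for line in output.splitlines():
        m = _LDD_LINE.match(line)
        if m:  # lines without "=>" (e.g. the dynamic loader) are skipped
            libs[m.group(1)] = m.group(2)
    return libs


def differing_libs(ldd_a, ldd_b):
    """Return {name: (path_in_a, path_in_b)} for libraries that resolve differently."""
    a, b = parse_ldd(ldd_a), parse_ldd(ldd_b)
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b))
            if a.get(name) != b.get(name)}


# Example use: capture `ldd <binary>` for each program into a string, e.g.
#   out_a = subprocess.run(["ldd", path_a], capture_output=True, text=True).stdout
# and pass the two strings to differing_libs().
```

An empty result means every dynamically linked library resolves to the
same path in both programs; a library missing from one side shows up with
None in that position.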




(gdb) r
Starting program: /usr/local2/users/pugmire/paraview/bin/paraview
[Thread debugging using libthread_db enabled]
[New Thread 46912528646976 (LWP 10581)]

Program received signal SIGTERM, Terminated. <-- I did this!
[Switching to Thread 46912528646976 (LWP 10581)]
0x00002aaaab6b48eb in __lll_mutex_lock_wait () from
/lib64/tls/libpthread.so.0
(gdb) bt
#0  0x00002aaaab6b48eb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x00007ffffff07b08 in ?? ()
#2  0x0000000000001386 in ?? ()
#3  0x00002aaaab6b1877 in pthread_mutex_lock ()
   from /lib64/tls/libpthread.so.0
#4  0x00002aaaac9b4d1e in __mthca_reg_mr (pd=0x0, addr=0x1, length=64669200,
    hca_va=64669176, access=64673272) at verbs.c:127
#5  0x00002aaaac9b2631 in mthca_free_av (ah=0x3dab060) at ah.c:172
#6  0x00002aaaac9b4c09 in mthca_destroy_ah (ah=0x3dab140) at mthca.h:242
#7  0x0000000001ae97aa in viadev_post_recv ()
#8  0x0000000001aeed39 in init_mpi_connections ()
#9  0x0000000001aee007 in MPID_VIA_Init ()
#10 0x0000000001ae667b in MPID_Init ()
#11 0x0000000001ad2216 in MPIR_Init ()
#12 0x0000000001ad2119 in PMPI_Init ()
#13 0x0000000001397e30 in vtkPVMain::Initialize ()
#14 0x0000000000b836ba in MyMain ()
#15 0x0000000000b83797 in main ()
(gdb)

