[mvapich-discuss] my application hangs up depending on node number

Abhinav Vishnu vishnu at cse.ohio-state.edu
Thu Feb 23 12:42:13 EST 2006


Michael,

Thanks for using MVAPICH and reporting the problem.

> I have an application to run on an 8-node cluster.
> I have a very strange problem:
> if I specify the node count as 4 or 8, the application
> hangs at the beginning; if I specify the node count as 2, 3,
> 5, 6, or 7, the application runs to completion.

Are you able to run perf_main between node 4 and node 8?
In addition, we would like to know the point at which the application hangs:
does it get past MPI_Init?
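
To confirm exactly which process counts hang, a small wrapper along the
following lines may help. This is only a sketch, assuming a recent Python 3
is available on the launch node. It rebuilds the mpirun_rsh command line
quoted below for a given -np value and runs it under a timeout. The function
names and the timeout are hypothetical choices, not part of MVAPICH, and a
run that legitimately takes longer than the timeout would be misreported as
a hang.

```python
import subprocess

# Paths copied from the launch command quoted below; adjust for your site.
MPIRUN = "/home/deform/3d/v60/image/mvapich/bin/mpirun_rsh"
HOSTFILE = "/usr/rels/mvapich/share/machines/machines.LINUX"
APP = "/home/deform/3d/v60/image/EXE/DEF_SIM_P4P_INFINIBAND.EXE"


def build_command(np):
    """Return the mpirun_rsh argument list for `np` processes."""
    return [MPIRUN, "-rsh", "-hostfile", HOSTFILE, "-np", str(np), APP]


def classify_run(cmd, timeout_s):
    """Run cmd; return 'hang' if it exceeds timeout_s, else 'exit <code>'."""
    try:
        # On timeout, subprocess.run kills the child and raises TimeoutExpired.
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hang"
    return "exit %d" % result.returncode


def sweep(counts, timeout_s):
    """Map each process count to 'hang' or 'exit <code>' (hypothetical helper)."""
    return {np: classify_run(build_command(np), timeout_s) for np in counts}
```

For example, sweep(range(2, 9), 300) would try 2 through 8 processes with a
five-minute limit each. Killing mpirun_rsh on a timeout may leave application
processes behind on the remote nodes, so please check with ps before the
next run.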

We would like to get more details about the state in which the
application hangs. In the meantime, may I request that you upgrade to IBGD
stack 1.8.1/1.8.2 and upgrade MVAPICH to mvapich-0.9.6-122 from the nowlab
download page?

Please keep us updated on your findings.

Thanks,

-- Abhinav

>
> Can anyone point me in the right direction on how to solve this problem?
>
> I am using mvapich-0.9.6-121/Mellanox IB Gold Distribution (IBGD) v1.7.0.
> mli@sftc001:/home/mli> uname -a
> Linux sftc001 2.6.10-suse92-i4smp #62 SMP Thu Mar 31 12:03:47 EST 2005
> i686 i686 i386 GNU/Linux
> mli@sftc001:/home/mli> cat /etc/issue
>
> Welcome to SuSE Linux 9.2 (i586) - Kernel \r (\l).
>
>
>
> Here is how I start my application:
>
> mli@sftc001:/home/mli/PROBLEM/tmp1>
> /home/deform/3d/v60/image/mvapich/bin/mpirun_rsh -rsh -hostfile
> /usr/rels/mvapich/share/machines/machines.LINUX -np 4
> /home/deform/3d/v60/image/EXE/DEF_SIM_P4P_INFINIBAND.EXE
>
> I've ps/grep-ed my application:
>
> node#   process#
> 2        7
> 3        8
> 4        9
> 5       10
> 6       11
> 7       12
> 8       13
>
> The attached file t.txt has more detailed output of ps/grep command.
>
> Best regards.
> Michael Li


