[mvapich-discuss] Running code hanged using MVAPICH

Soon-Heum Ko floydfan at innogrid.com
Wed Oct 22 00:54:05 EDT 2008


Hi,


I'm working at KISTI Supercomputing Center, Korea, as a member of SUN support team.

I am having some trouble using MVAPICH, so I would like to ask about our MVAPICH installation options.

Recently, we built a supercomputer consisting of Sun Blade 6048 nodes (4 quad-core Barcelona CPUs per node) connected by a Voltaire ISR 2012 InfiniBand switch. The admins installed MVAPICH on this system, and I tested the operation of the MVAPICH library.

While running various codes, I found that some codes do not work with MVAPICH even though they run without any trouble under other MPI libraries. In particular, these codes work well with MVAPICH when I use several processors within a single node, but they fail as soon as two or more nodes cooperate. Specifically, the code hangs at the first MPI communication call (i.e., when I use 16 processors on one node it works well, but when I use 16 processors across 2 nodes it hangs in the MPI communication routine).
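To be concrete, the communication at which the applications hang is of the simplest kind. Stripped of everything else, it looks like the sketch below (the file name mpi_ping.c is only illustrative, not one of our actual codes): rank 1 sends a single integer to rank 0 as the very first communication after MPI_Init.

/* mpi_ping.c -- minimal illustrative sketch of the communication pattern at
 * which the larger applications hang when the ranks span two nodes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        value = 42;  /* one integer = 4 bytes */
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
        printf("rank 0 received %d from rank 1\n", value);
    }

    MPI_Finalize();
    return 0;
}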

The mysterious thing is that in some simple codes the MPI communication routines cause no trouble for inter-node communication, while some complex codes with a large memory footprint show communication trouble even when they transfer only 4 bytes of data (one integer).
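In other words, a bare test of the pattern above tends to run fine between nodes, while the same 4-byte transfer hangs once the process has a large working set. A hypothetical variant that mimics this, touching a large buffer (the 1 GB size here is an arbitrary illustration) before the single-integer exchange, would look like:

/* mpi_ping_bigmem.c -- illustrative variant: each rank first allocates and
 * touches a large buffer, then performs the same single-integer exchange,
 * mimicking the memory-hungry applications that hang on the 4-byte
 * inter-node transfer. The buffer size is arbitrary. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BIG_BYTES (1024L * 1024L * 1024L)  /* 1 GB per rank, arbitrary */

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Status status;
    char *big;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Allocate and touch a large working set, as a real solver would. */
    big = malloc(BIG_BYTES);
    if (big == NULL) {
        fprintf(stderr, "rank %d: allocation failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    memset(big, 0, BIG_BYTES);

    if (rank == 1) {
        value = 42;  /* still only 4 bytes on the wire */
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
        printf("rank 0 received %d from rank 1\n", value);
    }

    free(big);
    MPI_Finalize();
    return 0;
}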

My guess is that this happens for one of the following reasons:
 - An MVAPICH problem that is already fixed in the latest version (note that we currently use MVAPICH version 1.0.1).
 - An MVAPICH problem that has not yet been reported or fixed.
 - Our installation options, which are as follows:
    ./configure --with-device=ch_gen2 --with-arch=LINUX -prefix=/usr/local/mvapich \
    --enable-shared --enable-static --enable-debug --enable-sharedlib \
    --enable-cxx --enable-f77 --enable-f90 --enable-f90modules \
    --with-romio --without-mpe
 - Bugs in the users' codes, which I do not believe is the case, since the same codes run fine with other MPI libraries.

Do you have any ideas or comments? If you know the reason, please let me know.

Thank you in advance.


Best regards,
Jeff



Soon-Heum Ko,
Ph.D, Computational Fluid Dynamics
Parallel Optimization Analyst, 
SUN Support Team (InnoGrid) at KISTI

