[mvapich-discuss] mvapich can't run cross nodes when defining _SMP_

Terry terry.ccchang at gmail.com
Fri Jul 20 13:56:45 EDT 2007


Dear Abhinav,

Yes, this is VERY strange.

My IB cards are the SilverStorm HCAs (PCI Express, 4X SDR):
CA type: MT25208 (MT23108 compat mode)
Number of ports: 2
Firmware version: 4.8.200
Hardware version: a0

My IB switch is the InfiniCon Systems (SilverStorm) InfinIO3032 Switch.
I don't remember its firmware version (3.10?),
but I am sure I have updated it to the latest available version.

I don't see anything unusual in /var/log/messages or dmesg.
I use the command
"mpirun_rsh -np 2 -hostfile ./hosts ./myprog"
to run the program.
When the program hangs in MPI_Init(),
"top" on the master node shows it using 100% CPU,
while "ps" on the slave node shows it in the sleep state.
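The report contrasts a cross-node run (n1 n2), which hangs, with a single-node run (n1 n1), which works. Here is a minimal sketch of hostfiles for those two cases. It assumes mpirun_rsh reads one host name per line and starts one process per line, so a repeated name puts both processes on the same node. The helper name write_hostfile and the file names hosts.cross and hosts.local are hypothetical. Only n1 and n2 come from the original report.

```shell
#!/bin/sh
# Hypothetical helper: write one host name per line to the given file.
# Usage: write_hostfile FILE HOST [HOST...]
write_hostfile() {
    out=$1
    shift
    : > "$out"                      # create or truncate the file
    for h in "$@"; do
        printf '%s\n' "$h" >> "$out"
    done
}

# Cross-node case (reported to hang in MPI_Init() with -D_SMP_ -D_SMP_RNDV_).
write_hostfile hosts.cross n1 n2

# Single-node case (reported to complete successfully with the same build).
write_hostfile hosts.local n1 n1
```

The two cases can then be launched the same way as above, e.g. "mpirun_rsh -np 2 -hostfile ./hosts.cross ./myprog" versus "mpirun_rsh -np 2 -hostfile ./hosts.local ./myprog".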

By the way, when I compile the trunk version of 0.9.9,
compilation fails in mpid/ch_gen2_multirail/viainit.c.
Adding the following line to viainit.c lets it compile:
#include <malloc.h>
This problem does not occur with the 0.9.9 release, however.
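If you hit the same trunk build failure, this small sketch applies the #include <malloc.h> workaround without adding the line twice. It assumes that placing the include at the very top of the file is acceptable, because the message does not say where the line was added. The function name add_malloc_include is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prepend "#include <malloc.h>" to a C source file
# unless an identical include line is already present (safe to rerun).
add_malloc_include() {
    src=$1
    if grep -q '^#include <malloc.h>' "$src"; then
        return 0                    # already patched; nothing to do
    fi
    tmp="$src.tmp.$$"
    { printf '#include <malloc.h>\n'; cat "$src"; } > "$tmp" && mv "$tmp" "$src"
}

# Example (path from the report, relative to the MVAPICH source tree):
# add_malloc_include mpid/ch_gen2_multirail/viainit.c
```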

Terry

----- Original Message ----- 
From: "Abhinav Vishnu" <vishnu at cse.ohio-state.edu>
To: "Terry" <terry.ccchang at gmail.com>
Cc: <mvapich-discuss at cse.ohio-state.edu>
Sent: Saturday, July 21, 2007 1:07 AM
Subject: Re: [mvapich-discuss] mvapich can't run cross nodes when defining _SMP_


> Hi Terry,
>> Dear Abhinav,
>>
>> In fact, I have tried versions 0.9.7, 0.9.8, and 0.9.9 (release, branches,
>> and trunk), with both the "make.mvapich.gen2" and
>> "make.mvapich.gen2_multirail" scripts.
>> I still face the same problem: the program cannot run across nodes.
>> When I undefine those two SMP options, everything works fine.
>> I don't know what is happening.
>>
>
> This is quite strange. All these MVAPICH versions have gone through
> rigorous testing with a variety of MPI benchmarks, and we did not see
> this problem.
>>
>> By the way, my main hardware configuration is:
>> Intel Xeon E5335 x 2 x 2 nodes
>> Fully-Buffered DDR2 667 2GB x 8 x 2 nodes.
>> Could you give me more suggestions?
>>
> Thanks for providing information about your platform. Could you also
> provide the following information:
>
> 1. The InfiniBand card you are using (Mellanox/PathScale/...?) and the
> firmware version on the card. Please also let us know the switch
> version.
>
> 2. Is there anything relevant in /var/log/messages or dmesg?
>
> Thanks much,
>
> :- Abhinav
>
>> Terry
>>
>> ----- Original Message -----
>> From: "Abhinav Vishnu" <vishnu at cse.ohio-state.edu>
>> To: "Terry" <terry.ccchang at gmail.com>
>> Cc: <mvapich-discuss at cse.ohio-state.edu>
>> Sent: Friday, July 20, 2007 8:52 PM
>> Subject: Re: [mvapich-discuss] mvapich can't run cross nodes when defining _SMP_
>>
>>
>>> Hi Terry,
>>>
>>> Thanks for trying MVAPICH and reporting the problem.
>>> We have tried this combination with MVAPICH 0.9.9 using the Intel
>>> MPI Benchmarks and did not see this problem. Could you let us know
>>> which MPI benchmark you are using to try out MVAPICH?
>>>
>>> Some changes have been made to the multi-rail script since MVAPICH
>>> 0.9.9, and these have been checked into the trunk and a later version
>>> (0.9.9+psm). Could you try this version and let us know whether you
>>> still face the same problem?
>>>
>>> Thanks,
>>>
>>> :- Abhinav
>>>
>>>> My environment:
>>>> CentOS 4.5
>>>> Intel Compiler 10.0.023
>>>> OFED 1.2
>>>> MVAPICH 0.9.9
>>>> I build MVAPICH using "make.mvapich.gen2_multirail"
>>>> and modify the following lines:
>>>> IBHOME=${IBHOME:-/usr/local/ofed}
>>>> IBHOME_LIB=${IBHOME_LIB:-/usr/local/ofed/lib64}
>>>> PREFIX=${PREFIX:-/usr/local/mvapich-0.9.9}
>>>> export CC=${CC:-icc}
>>>> export CXX=${CXX:-icpc}
>>>> export F77=${F77:-ifort}
>>>> If CFLAGS contains "-D_SMP_ -D_SMP_RNDV_",
>>>> MVAPICH can't run across nodes (e.g., n1 n2):
>>>> on n1 the process keeps running,
>>>> but on n2 it stays in the sleep state.
>>>> However, it runs successfully on a single node (e.g., n1 n1 or n2 n2).
>>>> If I undefine "-D_SMP_ -D_SMP_RNDV_",
>>>> MVAPICH runs across all nodes successfully.
>>>> Can anyone tell me why I can't use the SMP options?
>>>> Please help me solve this problem. Thanks.
>>>>  ------------------------------------------------------------------------
>>>>
>>>>
>>>> _______________________________________________
>>>> mvapich-discuss mailing list
>>>> mvapich-discuss at cse.ohio-state.edu
>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>
>>>
> 

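The build-script lines modified in the original report (IBHOME, IBHOME_LIB, PREFIX, CC, CXX, F77) use the shell's ${VAR:-default} expansion. Because of that, the defaults can also be overridden from the environment instead of by editing the script. The sketch below shows this standard POSIX behavior. The function name show_ibhome and the path /opt/ofed are hypothetical.

```shell
#!/bin/sh
# ${VAR:-default} expands to $VAR when it is set and non-empty,
# and to the default when it is unset or empty.
show_ibhome() {
    IBHOME=${IBHOME:-/usr/local/ofed}
    printf '%s\n' "$IBHOME"
}

unset IBHOME
show_ibhome                 # unset: prints /usr/local/ofed

IBHOME=/opt/ofed            # hypothetical alternative install location
show_ibhome                 # set: prints /opt/ofed

IBHOME=
show_ibhome                 # empty is treated like unset: prints /usr/local/ofed
```

Applied to the build, something like "IBHOME=/usr/local/ofed IBHOME_LIB=/usr/local/ofed/lib64 PREFIX=/usr/local/mvapich-0.9.9 CC=icc CXX=icpc F77=ifort ./make.mvapich.gen2_multirail" should pick up the same settings. This assumes the script does not reset these variables before the quoted lines.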

