[mvapich-discuss] program hanged using mvapich with large number of processes

Weimin Wang wmwang at gmail.com
Thu Jan 28 10:21:05 EST 2010


That is great!

After copying those files (libraries and header files) to the public
directory and recompiling mvapich2, I can start the program on any node.

Thank you.

Yours,
Weimin

On Thu, Jan 28, 2010 at 9:25 PM, Jonathan Perkins
<perkinjo at cse.ohio-state.edu> wrote:
> On Thu, Jan 28, 2010 at 07:11:27PM +0800, Weimin Wang wrote:
>> On Sat, Jan 23, 2010 at 10:18 PM, Dhabaleswar Panda wrote:
>> > You are using the uDAPL interface of MVAPICH2 stack. All our designs and
>> > developments with latest features are taking place on the
>> > most-commonly-used OpenFabrics-Gen2 (IB/iWARP) interface. You should start
>> > using this interface to get the best performance and scalability on your
>> > cluster. You can use this interface and let us know whether you see the
>> > problem or not.
>>
>> Hello, Dhabaleswar,
>> I have solved the problem by following your advice. I recompiled mvapich2
>> with the gen2 option and can now start 80 processes. Thank you very much!
>>
>> However, I have another question. When I run the job on node73, which is
>> the user login node, everything is fine. However, I get an error on the
>> other nodes:
>>
>> wmwang@node2:~/test> /data02/home/wmwang/test/mvapich2/bin/mpicc -o cpi cpi.c
>> /usr/bin/ld: cannot find -lrdmacm
>>
>> This may be because librdmacm is not in a public directory visible to all
>> nodes. I have installed librdmacm in the public directory. Here is my
>> question: how can I add library directories for mvapich2?
>
> If librdmacm is installed at the same location on every node as on the
> node where you built mvapich2, this problem should not happen.  You may
> want to check that the librdmacm.so* files exist on the other machines.
>
> To answer your question, you can use `--with-ib-include' and
> `--with-ib-libpath' options at configure time if you need to use
> infiniband libraries that are not found in the default system locations.
>
> Example:
>    ./configure --with-ib-include=/opt/ofed/include --with-ib-libpath=/opt/ofed/lib64
>
> --
> Jonathan Perkins
> http://www.cse.ohio-state.edu/~perkinjo
>
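
For anyone who finds this thread later: the check Jonathan suggests (do the
librdmacm.so* files exist on a given node?) could be scripted roughly as
below. This is only a sketch. The function name find_librdmacm and the list
of directories are my own assumptions, not part of MVAPICH2 or OFED, so
adjust the paths to wherever your libraries actually live.

```shell
#!/bin/sh
# Hypothetical helper (not part of MVAPICH2/OFED): print the first
# librdmacm.so* file found in the given directories, or return non-zero.
find_librdmacm() {
    for dir in "$@"; do
        for f in "$dir"/librdmacm.so*; do
            # If the glob matched nothing, $f is the literal pattern
            # and the -e test below fails.
            if [ -e "$f" ]; then
                echo "$f"
                return 0
            fi
        done
    done
    return 1
}

# Assumed search locations; /opt/ofed/lib64 is taken from the configure
# example in this thread, the others are common system defaults.
if lib=$(find_librdmacm /usr/lib64 /usr/lib /opt/ofed/lib64); then
    echo "found: $lib"
else
    echo "librdmacm not found on $(uname -n)"
fi
```

Running this on each compute node shows which ones are missing the library.
If it turns up only in a non-default directory, that directory is what you
would pass to `--with-ib-libpath' when configuring.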
