[mvapich-discuss] MV2_USE_LAZY_MEM_UNREGISTER and memory usage?

Xie Min xmxmxie at gmail.com
Sat Mar 28 11:25:17 EDT 2009


2009/3/28 Matthew Koop <koop at cse.ohio-state.edu>:
>
>> One of our parallel applications has a problem: it often segfaults.
>> But when we set
>> MV2_ON_DEMAND_THRESHOLD to a value equal to the number of tasks, it
>> runs successfully.
>>
>> So we set only one environment variable, MV2_ON_DEMAND_THRESHOLD=1024,
>> and ran the HPL test again
>> about 10 times (128 * 8 = 1024 tasks; "RES" of each task is about
>> 1.4GB). Each HPL test runs for about 12 minutes.
>> All tests ran normally, and the "port_rcv_packets" counter of the IB card
>> increased continually while the tests were running.
>>
>> If on-demand connection is used, there seem to be some pthread lock
>> operations in vbuf.c. I am not sure whether these are
>> related to the deadlock.
>
> I'm glad that things seem to be working now in terms of the memory usage.
> How large are these application runs that normally segfault (numbers of
> processes)?
>
One of our applications segfaults at a scale of 128 tasks;
after setting MV2_ON_DEMAND_THRESHOLD=128, it runs OK.
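
To make the workaround concrete, here is a minimal Python sketch of tying
the threshold to the job size. The helper name on_demand_env is
hypothetical (it is not part of MVAPICH2). It also assumes, as I understand
the MVAPICH2 documentation, that on-demand connection setup is used only
when the job has more processes than MV2_ON_DEMAND_THRESHOLD, so a
threshold equal to the task count keeps it off.

```python
import os


def on_demand_env(num_tasks, base_env=None):
    """Return a copy of the environment with MV2_ON_DEMAND_THRESHOLD set
    to the task count (hypothetical helper, not part of MVAPICH2)."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be a positive integer")
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: on-demand connections are used only when the job size
    # exceeds this threshold, so threshold == task count avoids them.
    env["MV2_ON_DEMAND_THRESHOLD"] = str(num_tasks)
    return env


if __name__ == "__main__":
    # 128 tasks is the scale at which our application segfaulted.
    print(on_demand_env(128, base_env={})["MV2_ON_DEMAND_THRESHOLD"])
```

The returned dictionary can be passed as the env argument of
subprocess.run when starting the job launcher.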

> Can you try the following ENVs?
>
> MV2_CM_RECV_BUFFERS=8192
> MV2_CM_TIMEOUT=250
>
We will try to find some time and node resources to test these ENVs.
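
For reference, a rough Python sketch of how such a test run could be
launched. It assumes the mpirun_rsh launcher, which takes NAME=VALUE pairs
before the executable. The function name build_launch_command, the hostfile
name "hosts", and the binary "./xhpl" are placeholders, not our actual
setup.

```python
import shlex

# Values suggested earlier in this thread.
SUGGESTED_ENVS = {
    "MV2_CM_RECV_BUFFERS": "8192",
    "MV2_CM_TIMEOUT": "250",
}


def build_launch_command(num_tasks, hostfile, program, envs=None):
    """Build an mpirun_rsh argument list with NAME=VALUE pairs placed
    before the executable (hostfile and program are placeholders)."""
    envs = SUGGESTED_ENVS if envs is None else envs
    cmd = ["mpirun_rsh", "-np", str(num_tasks), "-hostfile", hostfile]
    # Each setting is passed as a single NAME=VALUE argument.
    cmd += [f"{name}={value}" for name, value in envs.items()]
    cmd.append(program)
    return cmd


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(shlex.join(build_launch_command(1024, "hosts", "./xhpl")))
```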

Thanks.

> We're also taking a further look within the code at possible issues.
>
> Thanks,
>
> Matt
>

