[mvapich-discuss] mvapich2-2.0.1 performs very poorly on old mthca0 nodes when using multiple CPUs per node.

khaled hamidouche hamidouc at cse.ohio-state.edu
Tue Mar 31 20:02:45 EDT 2015


Hi Limin,

Good to know that your issue is fixed with this parameter. 2.0.1 introduces
a feature that automatically binds processes using information from HWLOC.
However, it seems that on your system HWLOC is not able to gather this
information correctly, which is probably because the HCA is too old.
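
For reference, here is a minimal C sketch (not taken from MVAPICH2, just an
illustration of the kind of query the binding code performs through the
hwloc API) that prints what hwloc detects on a node:

  /* check_topo.c - print what hwloc detects on this node.
     Build: cc check_topo.c -o check_topo -lhwloc */
  #include <stdio.h>
  #include <hwloc.h>

  int main(void)
  {
      hwloc_topology_t topo;
      hwloc_topology_init(&topo);   /* create the topology object   */
      hwloc_topology_load(topo);    /* detect the machine topology  */

      /* HWLOC_OBJ_SOCKET is the hwloc 1.x name (newer hwloc releases
         call it HWLOC_OBJ_PACKAGE). */
      printf("sockets=%d cores=%d PUs=%d\n",
             hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET),
             hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE),
             hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));

      hwloc_topology_destroy(topo);
      return 0;
  }

If the reported counts look wrong on the mthca0 nodes, that would be
consistent with the detection problem described above.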

BTW, can you please let us know your hardware configuration:
1) How many CPU sockets does your system have?
2) How many HCAs do you have, and how are they attached with respect to the
socket slots?
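
For reference, running the following on one of the mthca0 nodes should give
us this information (lstopo ships with hwloc, ibv_devinfo with libibverbs):

  lstopo          # sockets/cores and where each PCI device (HCA) is attached
  ibv_devinfo     # lists the HCAs (hca_id: mthca0, mlx4_0, ...)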

Thanks

On Tue, Mar 31, 2015 at 5:52 PM, Limin Gu <lgu at penguincomputing.com> wrote:

> Hi Khaled,
>
> Yes, MV2_CPU_MAPPING helped! It performs OK now.
>
> Will this be fixed in a later release, or do I have to use explicit binding
> on that system from now on?
> that system from now on?
>
> Thank you so much!
>
> Limin
>
> On Tue, Mar 31, 2015 at 5:08 PM, khaled hamidouche <
> hamidouc at cse.ohio-state.edu> wrote:
>
>> Hi Limin,
>>
>> With 2.0.1, can you please try explicit binding of the processes
>> using MV2_CPU_MAPPING=0,1,2 ....
>>
>> Please refer to this section
>> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc2-userguide.html#x1-17000011.13
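>>
>> For example, with two ranks per node the processes could be pinned to
>> cores 0 and 1 on each node (this is just a sketch using the
>> colon-separated per-rank form from that section; adjust the core IDs to
>> your topology):
>>
>>   mpirun_rsh -np 4 n0 n0 n1 n1 MV2_CPU_MAPPING=0:1 ./osu_alltoall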
>>
>>
>> Please let us know if this helps.
>>
>> Thanks
>>
>> On Tue, Mar 31, 2015 at 4:05 PM, Limin Gu <lgu at penguincomputing.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I have encountered a problem with mvapich2-2.0.1.
>>>
>>> I use the OSU benchmarks for performance testing on nodes that have old
>>> Mellanox cards (mthca0). On those nodes mvapich2-2.0.0 performs fine, but
>>> mvapich2-2.0.1, compiled the same way, performs horribly: it takes 1000
>>> times longer than mvapich2-2.0.0. The problem happens when I use more
>>> than one CPU per node.
>>>
>>> For example, "mpirun_rsh -np 2 n0 n1 ./osu_alltoall" runs OK with both
>>> mvapich2-2.0.0 and mvapich2-2.0.1, but "mpirun_rsh -np 4 n0 n0 n1 n1
>>> ./osu_alltoall" runs OK with mvapich2-2.0.0 and horribly with
>>> mvapich2-2.0.1.
>>>
>>> I also tried nodes with a newer Mellanox card (mlx4_0); there,
>>> mvapich2-2.0.1 performs OK with multiple CPUs per node.
>>>
>>> Does anyone else have the same problem? Is this problem related to
>>> hardware?
>>>
>>> Thanks!
>>>
>>> Limin
>>>
>>>
>>
>