[mvapich-discuss] mvapich2 2.3a on ppc64le

Jamil Appa jamil.appa at zenotech.com
Fri Jun 30 04:04:14 EDT 2017


Hi Sourav

   My application is a hybrid MPI/OpenMP application, hence the NUMA binding
strategy. I will follow your advice as a workaround for the 8-threads-per-core
case.

    When I run at 4 threads per core I don't see any errors, but can I
assume that the binding is failing silently in this case and that I should
always use the explicit mapping?

  Also, I have just rerun the osu_scatter case. I don't see the ptmalloc
error that Hari mentioned, but I do get a warning:

 mpiexec -env MV2_NUM_HCAS 1 -env MV2_NUM_PORTS 1 -n 2 osu_scatter
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such
file or directory
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such
file or directory

 numactl -H
available: 2 nodes (0,8)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 262144 MB
node 0 free: 254348 MB
node 8 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106
107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
126 127
node 8 size: 262144 MB
node 8 free: 258470 MB
node distances:
node   0   8
  0:  10  40
  8:  40  10

Note that the NUMA nodes are numbered 0 and 8.
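Because the node IDs are non-contiguous, deriving the CPU ranges from the
actual "numactl -H" output may be safer than assuming nodes 0 and 1. A rough
sketch (a hypothetical helper, not part of MVAPICH2; it assumes each node's
CPU list sits on a single line and forms a contiguous range):

```python
# Hypothetical sketch (not part of MVAPICH2): build an MV2_CPU_MAPPING
# string, one colon-separated CPU range per NUMA node, from the kind of
# output "numactl -H" prints. Assumes each node's CPU list is on a
# single line and forms a contiguous range.
import re

def build_mv2_cpu_mapping(numactl_output):
    ranges = []
    for line in numactl_output.splitlines():
        m = re.match(r"node \d+ cpus: (.+)", line)
        if m:
            cpus = [int(c) for c in m.group(1).split()]
            ranges.append("{0}-{1}".format(min(cpus), max(cpus)))
    return ":".join(ranges)

# Abbreviated stand-in for the output above:
sample = """available: 2 nodes (0,8)
node 0 cpus: 0 1 2 3
node 0 size: 262144 MB
node 8 cpus: 64 65 66 67
node 8 size: 262144 MB"""

print(build_mv2_cpu_mapping(sample))  # 0-3:64-67
```

(On a real system the "node N cpus:" lines wrap, so they would need to be
joined before parsing.)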

I will rerun my application with the explicit mapping and let you know if I
see the ptmalloc warning.

 Thanks

 Jamil


On Thu, 29 Jun 2017 at 21:25 Sourav Chakraborty <
chakraborty.52 at buckeyemail.osu.edu> wrote:

> Hi Jamil,
>
> The combination MV2_CPU_BINDING_LEVEL=numanode
> MV2_CPU_BINDING_POLICY=scatter is not currently supported.
>
> If you want to bind the processes to individual cores, you can use MV2_CPU_BINDING_LEVEL=core
> MV2_CPU_BINDING_POLICY=scatter
>
> If you want to bind the processes to numanodes, you can specify it in the
> following fashion: MV2_CPU_MAPPING=0-79:80-159
> MV2_CPU_BINDING_POLICY=scatter
> This setting will work for 2 processes. For more processes, you can modify
> the mapping according to the userguide:
> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3a-userguide.html#x1-620006.5.2
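The mapping suggested above can be computed rather than hand-typed. A minimal
shell sketch, assuming two NUMA nodes of 80 hardware threads each (the
8-threads-per-core case); all values are illustrative only:

```shell
# Sketch only: compute the colon-separated per-node CPU ranges that
# MV2_CPU_MAPPING expects, assuming 2 NUMA nodes with 80 hardware
# threads each (adjust to the real topology, e.g. from "numactl -H").
THREADS_PER_NODE=80
MAPPING="0-$((THREADS_PER_NODE - 1)):${THREADS_PER_NODE}-$((2 * THREADS_PER_NODE - 1))"
echo "$MAPPING"   # 0-79:80-159

# Illustrative invocation, mirroring the suggestion above:
# mpiexec -env MV2_CPU_MAPPING "$MAPPING" \
#         -env MV2_CPU_BINDING_POLICY scatter -n 2 ./osu_scatter
```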
>
> Thanks,
> Sourav
>
>
>
> On Thu, Jun 29, 2017 at 8:07 AM, Hari Subramoni <subramoni.1 at osu.edu>
> wrote:
>
>> Thanks for getting back quickly. It is surprising that you are facing the
>> registration cache issue with osu microbenchmarks.
>>
>> We will take a look at it and get back to you shortly.
>>
>> Thx,
>> Hari.
>>
>> On Thu, Jun 29, 2017 at 8:03 AM, Jamil Appa <jamil.appa at zenotech.com>
>> wrote:
>>
>>> Hi Hari
>>>
>>>  Thanks for the quick reply. I get this error running any of the osu
>>> benchmarks that are distributed with mvapich2 as well as my own application.
>>>
>>>   The applications work if I have 4 threads per core, so a total of 80
>>> threads per node.  It looks like there is a limit on the maximum number of
>>> threads, fixed by the size of the affinity mask.
>>>
>>>  Let me know if you want me to run with different switches to generate
>>> more output.
>>>
>>>  Jamil
>>>
>>>
>>> On Thu, 29 Jun 2017 at 12:49 Hari Subramoni <subramoni.1 at osu.edu> wrote:
>>>
>>>> Hello,
>>>>
>>>> Sorry to hear that you are facing issues. These are two separate issues
>>>> actually. Could you please let us know what program you are running? That
>>>> will help us narrow the issue down further.
>>>>
>>>> Thx,
>>>> Hari.
>>>>
>>>>
>>>> On Jun 29, 2017 6:49 AM, "Jamil Appa" <jamil.appa at zenotech.com> wrote:
>>>>
>>>> Hi
>>>>
>>>>     I am trying to use mvapich2 2.3a on a 2-node ppc64le system with 8
>>>> threads per core (160 threads per node).
>>>>
>>>>     There appears to be a bug in ptmalloc that prevents correct startup,
>>>> related to setting CPU affinity.
>>>>
>>>>    WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing
>>>> without InfiniBand registration cache support.
>>>> Warning! : Core id -1 does not exist on this architecture!
>>>> CPU Affinity is undefined
>>>> Error parsing CPU mapping string
>>>> INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range)
>>>> in MPIDI_CH3I_set_affinity:2673
>>>> Warning! : Core id -1 does not exist on this architecture!
>>>> CPU Affinity is undefined
>>>> Error parsing CPU mapping string
>>>> INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range)
>>>> in MPIDI_CH3I_set_affinity:2673
>>>> [cli_2]: aborting job:
>>>> Fatal error in PMPI_Init_thread:
>>>> Other MPI error, error stack:
>>>> MPIR_Init_thread(490):
>>>> MPID_Init(386).......:
>>>>
>>>> [cli_0]: aborting job:
>>>> Fatal error in PMPI_Init_thread:
>>>> Other MPI error, error stack:
>>>> MPIR_Init_thread(490):
>>>> MPID_Init(386).......:
>>>>
>>>>  mpiexec -env MV2_NUM_HCAS 1 -env MV2_NUM_PORTS 1 -env
>>>> MV2_USE_THREAD_WARNING 0 -env MV2_SHOW_HCA_BINDING 0 -env
>>>> MV2_CPU_BINDING_LEVEL numanode -env MV2_CPU_BINDING_POLICY scatter
>>>>
>>>>   cat /etc/redhat-release
>>>> Red Hat Enterprise Linux Server release 7.2 (Maipo)
>>>>
>>>>  uname -a
>>>>
>>>>  Linux gpu02.cluster 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29
>>>> 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
>>>>
>>>>
>>>>
>>>>
>>>> *Jamil Appa* | Co-Founder and Director | Zenotech
>>>> Tel: +44 (0)7747 606 788
>>>> Email: jamil.appa at zenotech.com
>>>> Web: www.zenotech.com <http://www.zenotech.com/>
>>>>
>>>> Company Registration No : 07926926 | VAT No : 128198591
>>>>
>>>> Registered Office : 1 Larkfield Grove, Chepstow, Monmouthshire, NP16
>>>> 5UF, UK
>>>>
>>>> Address : Bristol & Bath Science Park, Dirac Cres, Emersons Green,
>>>> Bristol BS16 7FR
>>>>
>>>> _______________________________________________
>>>> mvapich-discuss mailing list
>>>> mvapich-discuss at cse.ohio-state.edu
>>>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>
>>>>
>>>>
>>
>