[mvapich-discuss] (no subject)

Mingzhe Li li.2192 at osu.edu
Sat Jan 9 19:30:53 EST 2016


Hi Benedikt,

You are welcome. The window size for each process can be different. If
700 MB of shared memory is all you need per node, you could specify a
window size of 0 for some MPI processes. Is this possible for your application?

Thanks,
Mingzhe

On Sat, Jan 9, 2016 at 5:13 PM, Brandt, Benedikt B <benbra at gatech.edu>
wrote:

> Hi Mingzhe
>
> Thanks a lot for your reply! Yes, I did execute MPI_Win_allocate_shared
> on each MPI process, and the shared memory is about 700 MB.
> I have attached the relevant code snippet at the
> end of this mail. From the documentation I read that
> MPI_Win_allocate_shared is a collective call, so I do have to call it
> from every process that is supposed to use the shared memory,
> right?
>
> I guess the question I am truly asking is: Is there a way to use MPI
> shared memory so that the shared memory is accessible to all
> processes but counted only once (like threads in OpenMP)?
>
> Thanks a lot
>
> Benedikt
>
> ===== Code sample below =====
>
>
>     CALL MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, &
>                              0, MPI_INFO_NULL, hostcomm, ierr)
>     CALL MPI_Comm_rank(hostcomm, hostrank, ierr)
>
>     allocate(arrayshape(4))
>     arrayshape = (/ nroot1, nroot1, nroot1, nroot1 /)
>     if (hostrank == 0) then
>         windowsize = int(nroot1**4, MPI_ADDRESS_KIND) * &
>                      8_MPI_ADDRESS_KIND   ! *8 since there are 8 bytes in a double
>     else
>         windowsize = 0_MPI_ADDRESS_KIND
>     end if
>     disp_unit = 1
>
>     CALL MPI_Win_allocate_shared(windowsize, disp_unit, &
>                                  MPI_INFO_NULL, hostcomm, baseptr, win, ierr)
>     CALL MPI_Win_allocate_shared(windowsize, disp_unit, &
>                                  MPI_INFO_NULL, hostcomm, baseptr2, win2, ierr)
>
>     ! Obtain the location of the memory segment
>     if (hostrank /= 0) then
>         CALL MPI_Win_shared_query(win, 0, windowsize, disp_unit, &
>                                   baseptr, ierr)
>         CALL MPI_Win_shared_query(win2, 0, windowsize, disp_unit, &
>                                   baseptr2, ierr)
>     end if
>
>     ! baseptr can now be associated with a Fortran pointer
>     ! and thus used to access the shared data
>     CALL C_F_POINTER(baseptr, matrix_elementsy, arrayshape)
>     CALL C_F_POINTER(baseptr2, matrix_elementsz, arrayshape)
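>
> For reference, a minimal self-contained version of this pattern might look
> like the sketch below (the declarations, MPI initialization, synchronization,
> and the placeholder value of nroot1 are illustrative additions; only the
> window allocation and query calls mirror the snippet above):
>
>     program shared_window_demo
>         use mpi
>         use, intrinsic :: iso_c_binding, only: c_ptr, c_f_pointer
>         implicit none
>
>         integer, parameter :: nroot1 = 10        ! placeholder problem size
>         integer :: ierr, hostcomm, hostrank, disp_unit, win
>         integer(kind=MPI_ADDRESS_KIND) :: windowsize
>         type(c_ptr) :: baseptr
>         double precision, pointer :: matrix_elementsy(:,:,:,:)
>         integer :: arrayshape(4)
>
>         call MPI_Init(ierr)
>         ! One communicator per shared-memory node
>         call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
>                                  MPI_INFO_NULL, hostcomm, ierr)
>         call MPI_Comm_rank(hostcomm, hostrank, ierr)
>
>         arrayshape = (/ nroot1, nroot1, nroot1, nroot1 /)
>         if (hostrank == 0) then
>             ! Only rank 0 backs the window with memory; everyone else passes 0
>             windowsize = int(nroot1**4, MPI_ADDRESS_KIND) * 8_MPI_ADDRESS_KIND
>         else
>             windowsize = 0_MPI_ADDRESS_KIND
>         end if
>         disp_unit = 1
>
>         call MPI_Win_allocate_shared(windowsize, disp_unit, MPI_INFO_NULL, &
>                                      hostcomm, baseptr, win, ierr)
>         if (hostrank /= 0) then
>             ! Ask where rank 0's segment starts
>             call MPI_Win_shared_query(win, 0, windowsize, disp_unit, &
>                                       baseptr, ierr)
>         end if
>         call c_f_pointer(baseptr, matrix_elementsy, arrayshape)
>
>         if (hostrank == 0) matrix_elementsy = 0.0d0
>         call MPI_Win_fence(0, win, ierr)      ! make the initialization visible
>
>         call MPI_Win_free(win, ierr)
>         call MPI_Finalize(ierr)
>     end program shared_window_demo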
>
>
> ________________________________
> From: mingzhe0908 at gmail.com <mingzhe0908 at gmail.com> on behalf of Mingzhe
> Li <li.2192 at osu.edu>
> Sent: Saturday, January 9, 2016 11:34 AM
> To: Brandt, Benedikt B
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re:
>
> Hi Benedikt,
>
> Thanks for your note. Did you allocate around 700 MB of shared memory with
> MPI_Win_allocate_shared for each MPI process? If that's the case, the memory
> consumption will be the same as using malloc for each MPI process.
>
> Thanks,
> Mingzhe
>
> On Fri, Jan 8, 2016 at 11:01 AM, Brandt, Benedikt B <benbra at gatech.edu>
> wrote:
> Please excuse the terrible formatting of my last mail. This was the
> first time I submitted to this list. Here is a well formatted
> version:
>
> Dear mvapich community
>
> I am currently testing the MPI-3 shared memory routines for use in our
> application. The goal is to reduce the memory footprint of our
> application per node.
>
> The code seems to work, but I get the following odd behavior when I
> monitor the memory usage:
>
> TLDR: Shared memory that is "touched" (read or written) by an MPI
> process counts towards that process's real memory (RSS, RES) value. If
> every process accesses the whole shared memory (= the data), the memory
> consumption as seen by top (or other monitoring tools) is the same as
> if every process had its own copy of the data.
>
> If we run this job on a cluster with a job scheduler and resource
> manager, our jobs will be aborted if we expect the shared memory to
> count only once. So how can we work around this problem? How could a
> resource manager (or the operating system) correctly determine memory
> consumption?
>
> === Long version: ===
>
> Running our code compiled with mvapich (2.1)  and ifort (15) on one
> node, I see the following memory footprint right after starting the
> program:
>
> PID   USER      PR  NI  VIRT  RES  SHR S %CPU  %MEM   TIME+  COMMAND
> 47708 bbrandt6  20   0  746m  14m 6064 R 100.0  0.0   0:22.57 exa
> 47707 bbrandt6  20   0  746m  14m 6164 R 100.0  0.0   0:22.56 exa
> 47709 bbrandt6  20   0  746m  14m 6020 R 100.0  0.0   0:22.58 exa
> 47710 bbrandt6  20   0  746m  14m 6056 R 100.0  0.0   0:22.55 exa
> 47711 bbrandt6  20   0  746m  14m 6072 R 100.0  0.0   0:22.57 exa
>
>
> This is as expected, since we allocate about 700 MB of shared memory
> using MPI_Win_allocate_shared. After copying the data into the shared
> memory, it looks like this:
>
>
> PID   USER      PR  NI  VIRT  RES  SHR S %CPU  %MEM   TIME+  COMMAND
> 47711 bbrandt6  20   0  746m  17m 6216 R 100.0  0.0   3:01.03 exa
> 47708 bbrandt6  20   0  746m  17m 6212 R 99.6  0.0   2:40.07 exa
> 47707 bbrandt6  20   0  746m 612m 600m R 99.3  0.9   3:01.33 exa
> 47709 bbrandt6  20   0  746m  17m 6164 R 98.6  0.0   3:06.72 exa
> 47710 bbrandt6  20   0  746m  17m 6200 R 98.6  0.0   2:43.91 exa
>
> Again just as expected: one process copied the data and now has a
> memory footprint of 746m VIRT and 612m RES. Now the other processes
> start accessing the data and we get:
>
> PID   USER      PR  NI  VIRT  RES  SHR S %CPU  %MEM   TIME+  COMMAND
> 47709 bbrandt6  20   0  785m 214m 165m R 100.0  0.3   3:49.37 exa
> 47707 bbrandt6  20   0  785m 653m 602m R 100.0  1.0   3:43.93 exa
> 47708 bbrandt6  20   0  785m 214m 166m R 100.0  0.3   3:23.03 exa
> 47710 bbrandt6  20   0  785m 214m 166m R 100.0  0.3   3:26.86 exa
> 47711 bbrandt6  20   0  785m 214m 166m R 100.0  0.3   3:44.01 exa
>
> which increases to 787m VIRT and 653m RES for all processes once they have
> accessed all the data in the shared memory. So the memory footprint is
> just as large as if every process held its own copy of the data. At
> this point it seems like we haven't saved any memory at all. We might
> have gained speed and bandwidth, but using the shared memory did not
> reduce the memory footprint of our application.
>
> If we run this job on a cluster with a job scheduler and resource
> manager, our jobs will be aborted if we expect the shared memory to
> count only once. So how can we work around this problem? Is the cause
> of this problem that mvapich runs separate processes, so the shared
> memory counts fully towards each of them, whereas OpenMP runs a single
> process with multiple threads, so the shared memory counts only once?
> How could a resource manager (or the operating system) correctly
> determine memory consumption?
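>
> (One number that does split shared pages fairly is the proportional set
> size, PSS, that Linux reports per mapping in /proc/<pid>/smaps: each
> shared page is charged 1/N to each of the N processes mapping it. The
> short Fortran sketch below is purely illustrative and not part of our
> application; it just sums the Pss entries of the calling process.)
>
>     program print_pss
>         use, intrinsic :: iso_fortran_env, only: int64
>         implicit none
>         integer :: unit, ios
>         integer(int64) :: kb, total_kb
>         character(len=256) :: line
>
>         total_kb = 0_int64
>         open(newunit=unit, file='/proc/self/smaps', action='read', &
>              status='old', iostat=ios)
>         if (ios /= 0) stop 'could not open /proc/self/smaps'
>         do
>             read(unit, '(A)', iostat=ios) line
>             if (ios /= 0) exit
>             if (line(1:4) == 'Pss:') then      ! e.g. "Pss:     123 kB"
>                 read(line(5:), *) kb           ! value is reported in kB
>                 total_kb = total_kb + kb
>             end if
>         end do
>         close(unit)
>         print '(A,I0,A)', 'PSS = ', total_kb, ' kB'
>     end program print_pss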
>
> === end long version ===
>
> Any thoughts and comments are truly appreciated.
>
> Thanks a lot
>
> Benedikt
>