[mvapich-discuss] MVAPICH2-2.3b hwloc topology issue

Doug Johnson djohnson at osc.edu
Tue Dec 19 20:21:27 EST 2017


Hi Sourav,

Sorry for the slow reply. This patch has fixed the problem for us.

Best,
Doug

Sourav Chakraborty <chakraborty.52 at buckeyemail.osu.edu> writes:

> Hi Doug,
>
> The attached patch should fix the issue for Hydra. It'll be included in the upcoming MVAPICH2-2.3rc1 release.
>
> Thanks,
> Sourav
>
> On Tue, Nov 28, 2017 at 3:59 PM, Doug Johnson <djohnson at osc.edu> wrote:
>
>  Hi Sourav,
>
>  I do not see the problem when using mpirun_rsh, but I do when using
>  mpiexec hydra.
>
>  Doug
>
>  Sourav Chakraborty <chakraborty.52 at buckeyemail.osu.edu> writes:
>
>  > Hi Doug,
>  >
>  > The xml file is created by MVAPICH2 during startup. The error could be due to some permission issue while
>  > accessing the /tmp directory.
>  >
>  > We tried out the same build options with 2.3b on Owens but were not able to reproduce the issue.
>  > Please let us know if the error persists.
>  >
>  > -bash-4.2$ ./bin/mpichversion
>  > MVAPICH2 Version: 2.3b
>  > MVAPICH2 Release date: Thu Aug 10 22:00:00 EST 2017
>  > MVAPICH2 Device: ch3:mrail
>  > MVAPICH2 configure: --prefix=/users/PZS0622/osc1006/devel/mvapich2/install --enable-shared --with-mpe
>  > --enable-romio --enable-mpit-pvars=mv2 --disable-option-checking --with-file-system=ufs+nfs+gpfs
>  > --with-pbs=/opt/torque --with-pbs-lib=/opt/torque/lib64 --with-pbs-include=/opt/torque/include CC=icc
>  > CXX=icpc FC=ifort
>  > MVAPICH2 CC: icc -DNDEBUG -DNVALGRIND -O2
>  > MVAPICH2 CXX: icpc -DNDEBUG -DNVALGRIND -O2
>  > MVAPICH2 F77: ifort -L/lib -L/lib -O2
>  > MVAPICH2 FC: ifort -O2
>  >
>  > -bash-4.2$ ./bin/mpirun_rsh -np 56 -hostfile $PBS_NODEFILE MV2_HCA_AWARE_PROCESS_MAPPING=0
>  > ./libexec/osu-micro-benchmarks/mpi/startup/osu_init
>  > # OSU MPI Init Test v5.3.2
>  > nprocs: 56, min: 885 ms, max: 934 ms, avg: 905 ms
>  > -bash-4.2$
>  >
>  > Thanks,
>  > Sourav
>  >
>  > On Mon, Nov 27, 2017 at 2:26 PM, Doug Johnson <djohnson at osc.edu> wrote:
>  >
>  > Hi,
>  >
>  > I'm testing 2.3b and have encountered a problem with the new hwloc
>  > topology routines in src/mpid/ch3/channels/common/src/affinity/hwloc_bind.c.
>  > Simple tests fail to start, hanging due to missing files under /tmp.
>  > The strace command shows the following.
>  >
>  > access("/tmp/mv2-hwloc-kvs_28818_0-o0433.ten.osc.edu-6758.xml", F_OK) = -1 ENOENT (No such file or directory)
>  >
>  > The xml file is missing on all the nodes where I'm attempting to launch
>  > the job. Are there possibly dependencies that I'm missing for this file
>  > to be created successfully? (don't have time to step through the code.)
>  >
>  > I was able to continue my tests successfully by setting
>  > MV2_BCAST_HWLOC_TOPOLOGY=0.
>  >
>  > Output from mpichversion looks like the following.
>  >
>  > MVAPICH2 Version: 2.3b
>  > MVAPICH2 Release date: Thu Aug 10 22:00:00 EST 2017
>  > MVAPICH2 Device: ch3:mrail
>  > MVAPICH2 configure: --prefix=/opt/mvapich2/intel/17.0/2.3b --enable-shared --with-mpe --enable-romio
>  > --enable-mpit-pvars=mv2 --disable-option-checking --with-file-system=ufs+nfs+gpfs --with-pbs=/opt/torque
>  > --with-pbs-lib=/opt/torque/lib64 --with-pbs-include=/opt/torque/include
>  > MVAPICH2 CC: icc -DNDEBUG -DNVALGRIND -O2
>  > MVAPICH2 CXX: icpc -DNDEBUG -DNVALGRIND -O2
>  > MVAPICH2 F77: ifort -L/lib -L/lib -O2
>  > MVAPICH2 FC: ifort -O2
>  >
>  > Best,
>  > Doug
>  > _______________________________________________
>  > mvapich-discuss mailing list
>  > mvapich-discuss at cse.ohio-state.edu
>  > http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


