[mvapich-discuss] MPI communication problem with mvapich2-1.8a1p1

Jonathan Perkins perkinjo at cse.ohio-state.edu
Fri Jan 27 18:11:28 EST 2012


I'm not sure if this is related to some interaction with
-mcmodel=medium or not.  Does this happen with both sets of options?  I'll
try to reproduce this build failure, but can you still send a trace of
the processes when they are hanging?
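
For reference, the 'BREG' errors quoted below come from the CPUID inline
assembly in hwloc's include/private/cpuid.h: GCC refuses a "=b" constraint
whenever it has reserved %rbx/%ebx for its own use, which -fPIC on 32-bit
and, apparently, some -mcmodel=medium builds can trigger.  Purely as an
illustration of the usual workaround (swap %rbx manually instead of asking
for it via "=b"), a minimal, hypothetical sketch might look like this; it
is not the hwloc code itself:

  #include <stdio.h>

  /* Hypothetical sketch, not hwloc's code: run CPUID on x86-64 without a
   * "=b" constraint by swapping %rbx with a scratch register around the
   * instruction, so the compiler never has to hand out %rbx itself. */
  static void cpuid_no_b_constraint(unsigned leaf, unsigned *eax,
                                    unsigned *ebx, unsigned *ecx,
                                    unsigned *edx)
  {
      __asm__("xchg %%rbx, %q1\n\t"   /* free %rbx for CPUID             */
              "cpuid\n\t"
              "xchg %%rbx, %q1"       /* restore %rbx, grab CPUID's EBX  */
              : "=a" (*eax), "=&r" (*ebx), "=c" (*ecx), "=d" (*edx)
              : "0" (leaf), "2" (0U));
  }

  int main(void)
  {
      unsigned a, b, c, d;
      cpuid_no_b_constraint(0, &a, &b, &c, &d);
      /* Leaf 0 returns the vendor string in EBX, EDX, ECX. */
      printf("max leaf %u, vendor %.4s%.4s%.4s\n",
             a, (char *)&b, (char *)&d, (char *)&c);
      return 0;
  }

If that pattern builds cleanly with your -mcmodel=medium flags, it would at
least confirm that the failure is specific to the constraint hwloc uses
rather than to CPUID itself.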

Use your build options, but replace ``--enable-fast'' with ``--disable-fast
--enable-g=dbg''.
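
Once a rebuild with those flags is in place and the IMB run hangs again,
attaching gdb to a couple of the stuck ranks on a worker node should give
us usable backtraces.  The exact invocation below is just a suggestion
(assuming gdb is installed and <pid> is the process ID of a hanging
IMB-MPI1 rank):

  gdb -p <pid> -batch -ex 'set pagination off' -ex 'thread apply all bt'

Collecting that output from two or three ranks is usually enough to see
where the processes are stuck.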

On Fri, Jan 27, 2012 at 6:01 PM, Nirmal Seenu <nirmal at fnal.gov> wrote:
> I am getting the following error during make with the options that you
> mentioned:
>
> make[4]: Entering directory
> `/usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc'
> Making all in src
> make[5]: Entering directory
> `/usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/src'
>  CC     topology.lo
>  CC     traversal.lo
>  CC     distances.lo
>  CC     topology-synthetic.lo
>  CC     bind.lo
>  CC     cpuset.lo
>  CC     misc.lo
>  CC     topology-xml.lo
>  CC     topology-linux.lo
>  CC     topology-x86.lo
> topology-x86.c: In function 'look_proc':
> /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/cpuid.h:54:
> error: can't find a register in class 'BREG' while reloading 'asm'
> [the same error is repeated ten times]
> make[5]: *** [topology-x86.lo] Error 1
> make[5]: Leaving directory
> `/usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/src'
> make[4]: *** [all-recursive] Error 1
> make[4]: Leaving directory
> `/usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc'
> make[3]: *** [all-recursive] Error 1
> make[3]: Leaving directory `/usr/local/src/mvapich2-1.8a1p1/src/pm/hydra'
> make[2]: *** [all-redirect] Error 1
> make[2]: Leaving directory `/usr/local/src/mvapich2-1.8a1p1/src/pm'
> make[1]: *** [all-redirect] Error 2
> make[1]: Leaving directory `/usr/local/src/mvapich2-1.8a1p1/src'
> make: *** [all-redirect] Error 2
>
>
> I am able to build with the options that I mentioned in my previous email
> though.
>
> Nirmal
>
>
> On 01/27/2012 04:38 PM, Jonathan Perkins wrote:
>>
>> Please try the following...
>> ./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc --enable-fast
>> --enable-f77 --enable-fc --enable-cxx --enable-romio --enable-mpe
>>
>> If you would like to try to provide stack traces to us, use...
>> ./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc --disable-fast
>> --enable-g=dbg --enable-f77 --enable-fc --enable-cxx --enable-romio
>> --enable-mpe
>>
>> On Fri, Jan 27, 2012 at 5:31 PM, Nirmal Seenu<nirmal at fnal.gov>  wrote:
>>>
>>> Hi,
>>>
>>> I doubt that the options used to build MVAPICH2 are the problem here, as
>>> the remote MPI processes launch successfully and do a little bit of
>>> communication before they hang.
>>>
>>> I use the same options to build mvapich2-1.2p1, mvapich2-1.5,
>>> mvapich2-1.6rc2 and mvapich2-1.6-r4751, and they all work fine.
>>>
>>> What options do I need when building MVAPICH2 so that the mpiexec
>>> launcher uses the TM interface to launch MPI jobs?
>>>
>>> Nirmal
>>>
>>>
>>> On 01/27/2012 03:53 PM, Jonathan Perkins wrote:
>>>>
>>>>
>>>> Hello Nirmal, sorry to hear that you're having trouble.  Let me
>>>> suggest that you remove some of the options that you've specified at
>>>> the configure step.  We no longer support MPD so you should remove the
>>>> --enable-pmiport and --with-pm=mpd options.  I actually think it'll be
>>>> simpler for you to remove more options and then only add an option back
>>>> once things are working and you find that you need it.
>>>>
>>>> Please try the following configuration for MVAPICH2 and let us know if
>>>> you still have trouble or not.
>>>> ./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc --enable-fast
>>>> --enable-f77 --enable-fc --enable-cxx --enable-romio --enable-mpe
>>>>
>>>> On Fri, Jan 27, 2012 at 3:57 PM, Nirmal Seenu<nirmal at fnal.gov>    wrote:
>>>>>
>>>>>
>>>>> I am having trouble running the Intel MPI Benchmark (IMB_3.2.3; I run
>>>>> IMB-MPI1 without any options) on the latest version, MVAPICH2-1.8a1p1.
>>>>>
>>>>> The MPI processes get launched properly on the worker nodes, but the
>>>>> benchmark hangs within a few seconds of the launch and doesn't make any
>>>>> progress. I checked the InfiniBand fabric and everything is healthy. We
>>>>> mount Lustre over native IB on all the worker nodes and the Lustre
>>>>> mounts are healthy as well.
>>>>>
>>>>> This is reproducible with MVAPICH2 compiled with GCC and with PGI
>>>>> compiler 11.7 as well.
>>>>>
>>>>> Details about the installation:
>>>>>
>>>>> The worker nodes run RHEL 5.3 with the latest kernel,
>>>>> 2.6.18-274.17.1.el5, and we use the InfiniBand drivers that are
>>>>> distributed as part of the kernel.
>>>>>
>>>>> The GCC build of MVAPICH2 was compiled with the following compiler:
>>>>> gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50)
>>>>>
>>>>> The following options were used to compile MVAPICH2 and MPIEXEC:
>>>>>
>>>>> export CC=gcc
>>>>> export CXX=g++
>>>>> export F77=gfortran
>>>>> export FC=gfortran
>>>>>
>>>>> export CFLAGS=-mcmodel=medium
>>>>> export CXXFLAGS=-mcmodel=medium
>>>>> export FFLAGS=-mcmodel=medium
>>>>> export FCFLAGS=-mcmodel=medium
>>>>> export LDFLAGS=-mcmodel=medium
>>>>>
>>>>> MVAPICH2:
>>>>> ./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc --enable-fast
>>>>> --enable-f77 --enable-fc --enable-cxx --enable-romio --enable-pmiport
>>>>> --enable-mpe --with-pm=mpd --with-pmi=simple --with-thread-package
>>>>> --with-hwloc
>>>>>
>>>>> MPIEXEC:
>>>>> ./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc
>>>>> --with-pbs=/usr/local/pbs
>>>>> --with-mpicc=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpicc
>>>>> --with-mpicxx=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpicxx
>>>>> --with-mpif77=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpif77
>>>>> --with-mpif90=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpif90
>>>>> --disable-mpich-gm
>>>>> --disable-mpich-p4 --disable-mpich-rai --with-default-comm=pmi
>>>>>
>>>>> I was able to run the Intel MPI Benchmark using the following versions
>>>>> of MVAPICH2 that were compiled with the same version of gcc:
>>>>> mvapich2-1.2p1
>>>>> mvapich2-1.5
>>>>> mvapich2-1.6rc2
>>>>> mvapich2-1.6-r4751
>>>>>
>>>>> I will be more than happy to provide more details if needed. Thanks in
>>>>> advance for looking into this problem.
>>>>>
>>>>> Nirmal
>>>>> _______________________________________________
>>>>> mvapich-discuss mailing list
>>>>> mvapich-discuss at cse.ohio-state.edu
>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


