[mvapich-discuss] MPI communication problem with mvapich2-1.8a1p1

Nirmal Seenu nirmal@fnal.gov
Fri Jan 27 19:02:28 EST 2012


I couldn't launch the MPI processes with the version that was built with 
the --disable-fast --enable-g=dbg options; it fails with the following 
error message:

[nirmal@cci001 ~]$ export PATH=/usr/local/mvapich2-1.8a1p1-gcc-test/bin:$PATH
[nirmal@cci001 mvapich2-1.8a1p1-gcc-test]$ mpiexec ./IMB-MPI1
Assertion failed in file mpid_vc.c at line 840: *max_id_p >= 0
[cli_0]: aborting job:
internal ABORT - process 0
Assertion failed in file mpid_vc.c at line 840: *max_id_p >= 0
[cli_4]: aborting job:
internal ABORT - process 0
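
(If a debugger session on this abort would help, one rough sketch is the
usual trick of putting each rank under gdb in its own xterm; this assumes
X forwarding to the worker nodes and a -g build of IMB-MPI1:

mpiexec -n 2 xterm -e gdb -ex run ./IMB-MPI1

and then "bt" in whichever xterm hits the assertion.)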


The other version, which was built with --enable-fast, consistently 
hangs while running IMB after it completes PingPong and Sendrecv on 2 
and 4 processes successfully:

[nirmal@cci001 ~]$ export PATH=/usr/local/mvapich2-1.8a1p1-gcc/bin:$PATH
[nirmal@cci001 ~]$ cd run-imb/mvapich2-1.8a1p1-gcc
[nirmal@cci001 mvapich2-1.8a1p1-gcc]$ mpiexec ./IMB-MPI1
#---------------------------------------------------
#    Intel (R) MPI Benchmark Suite V3.2.3, MPI-1 part
#---------------------------------------------------
# Date                  : Fri Jan 27 17:49:15 2012
# Machine               : x86_64
# System                : Linux
# Release               : 2.6.18-274.17.1.el5
# Version               : #1 SMP Tue Jan 10 16:13:44 EST 2012
# MPI Version           : 2.2
# MPI Thread Environment:
...
...
...
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 60 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
...
...
       2097152           20      3723.20      3726.20      3724.86      1073.48
       4194304           10      7400.30      7424.88      7414.66      1077.46
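
To collect the traces Jonathan asked for, something like the following on
each node where IMB-MPI1 is hung should work (a rough sketch; it assumes
gdb and pgrep are installed on the worker nodes):

for pid in $(pgrep IMB-MPI1); do
    gdb -batch -p $pid -ex 'thread apply all bt' > /tmp/bt.$pid 2>&1
done

and I can send the resulting backtraces along.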


Nirmal

On 1/27/2012 5:11 PM, Jonathan Perkins wrote:
> I'm not sure if this is related to some interaction with
> -mcmodel=medium or not.  This happens with both sets of options?  I'll
> try to reproduce this build failure but can you still send a trace of
> the processes when they are hanging?
>
> Use your build options but replace ``--enable-fast'' with
> ``--disable-fast --enable-g=dbg''.
>
> On Fri, Jan 27, 2012 at 6:01 PM, Nirmal Seenu <nirmal@fnal.gov> wrote:
>> I am getting the following error during make with the options that you
>> mentioned:
>>
>> make[4]: Entering directory
>> `/usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc'
>> Making all in src
>> make[5]: Entering directory
>> `/usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/src'
>>   CC     topology.lo
>>   CC     traversal.lo
>>   CC     distances.lo
>>   CC     topology-synthetic.lo
>>   CC     bind.lo
>>   CC     cpuset.lo
>>   CC     misc.lo
>>   CC     topology-xml.lo
>>   CC     topology-linux.lo
>>   CC     topology-x86.lo
>> topology-x86.c: In function 'look_proc':
>> /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/cpuid.h:54:
>> error: can't find a register in class 'BREG' while reloading 'asm'
>> /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/cpuid.h:54:
>> error: can't find a register in class 'BREG' while reloading 'asm'
>> /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/cpuid.h:54:
>> error: can't find a register in class 'BREG' while reloading 'asm'
>> /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/cpuid.h:54:
>> error: can't find a register in class 'BREG' while reloading 'asm'
>> /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/cpuid.h:54:
>> error: can't find a register in class 'BREG' while reloading 'asm'
>> /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/cpuid.h:54:
>> error: can't find a register in class 'BREG' while reloading 'asm'
>> /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/cpuid.h:54:
>> error: can't find a register in class 'BREG' while reloading 'asm'
>> /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/cpuid.h:54:
>> error: can't find a register in class 'BREG' while reloading 'asm'
>> /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/cpuid.h:54:
>> error: can't find a register in class 'BREG' while reloading 'asm'
>> /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/cpuid.h:54:
>> error: can't find a register in class 'BREG' while reloading 'asm'
>> make[5]: *** [topology-x86.lo] Error 1
>> make[5]: Leaving directory
>> `/usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc/src'
>> make[4]: *** [all-recursive] Error 1
>> make[4]: Leaving directory
>> `/usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc'
>> make[3]: *** [all-recursive] Error 1
>> make[3]: Leaving directory `/usr/local/src/mvapich2-1.8a1p1/src/pm/hydra'
>> make[2]: *** [all-redirect] Error 1
>> make[2]: Leaving directory `/usr/local/src/mvapich2-1.8a1p1/src/pm'
>> make[1]: *** [all-redirect] Error 2
>> make[1]: Leaving directory `/usr/local/src/mvapich2-1.8a1p1/src'
>> make: *** [all-redirect] Error 2
>>
>>
>> I am able to build with the options that I mentioned in my previous
>> email, though.
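>>
>> In case it helps narrow things down, a quick (untested) check of whether
>> -mcmodel=medium by itself breaks the hwloc cpuid code might be to compile
>> just that one file by hand, with and without the flag:
>>
>> cd /usr/local/src/mvapich2-1.8a1p1/src/pm/hydra/tools/topo/hwloc/hwloc
>> gcc -Iinclude -Isrc -c src/topology-x86.c                  # baseline
>> gcc -mcmodel=medium -Iinclude -Isrc -c src/topology-x86.c  # with the flag
>>
>> (The include paths here are a guess based on the error paths above.)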
>>
>> Nirmal
>>
>>
>> On 01/27/2012 04:38 PM, Jonathan Perkins wrote:
>>>
>>> Please try the following...
>>> ./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc --enable-fast
>>> --enable-f77 --enable-fc --enable-cxx --enable-romio --enable-mpe
>>>
>>> If you would like to try and provide stack traces to us use...
>>> ./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc --disable-fast
>>> --enable-g=dbg --enable-f77 --enable-fc --enable-cxx --enable-romio
>>> --enable-mpe
>>>
>>> On Fri, Jan 27, 2012 at 5:31 PM, Nirmal Seenu <nirmal@fnal.gov> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I doubt that the options used to build MVAPICH2 are the problem here,
>>>> as the remote MPI processes launch successfully and do a little bit of
>>>> communication before they hang.
>>>>
>>>> I used the same options to build mvapich2-1.2p1, mvapich2-1.5,
>>>> mvapich2-1.6rc2 and mvapich2-1.6-r4751, and they all work fine.
>>>>
>>>> What options do I need in the MVAPICH2 build for the mpiexec launcher
>>>> to use the TM interface to launch MPI jobs?
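>>>>
>>>> (I'm guessing Hydra can pick PBS up through its resource-manager kit,
>>>> perhaps with something like:
>>>>
>>>> mpiexec -rmk pbs ./IMB-MPI1
>>>>
>>>> but I'm not sure whether that needs extra configure options.)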
>>>>
>>>> Nirmal
>>>>
>>>>
>>>> On 01/27/2012 03:53 PM, Jonathan Perkins wrote:
>>>>>
>>>>>
>>>>> Hello Nirmal, sorry to hear that you're having trouble.  Let me
>>>>> suggest that you remove some of the options that you've specified at
>>>>> the configure step.  We no longer support MPD so you should remove the
>>>>> --enable-pmiport and --with-pm=mpd options.  I actually think it'll be
>>>>> simpler for you to remove most options and then only add an option
>>>>> back if you need it once things are working.
>>>>>
>>>>> Please try the following configuration for MVAPICH2 and let us know if
>>>>> you still have trouble or not.
>>>>> ./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc --enable-fast
>>>>> --enable-f77 --enable-fc --enable-cxx --enable-romio --enable-mpe
>>>>>
>>>>> On Fri, Jan 27, 2012 at 3:57 PM, Nirmal Seenu <nirmal@fnal.gov> wrote:
>>>>>>
>>>>>>
>>>>>> I am having trouble running the Intel MPI Benchmark (IMB 3.2.3, where
>>>>>> I run IMB-MPI1 without any options) on the latest version of
>>>>>> MVAPICH2-1.8a1p1.
>>>>>>
>>>>>> The MPI processes get launched properly on the worker nodes, but the
>>>>>> benchmark hangs within a few seconds after the launch and doesn't
>>>>>> make any progress. I checked the InfiniBand fabric and everything is
>>>>>> healthy. We mount Lustre over native IB on all the worker nodes and
>>>>>> the Lustre mounts are healthy as well.
>>>>>>
>>>>>> This is reproducible on MVAPICH2 compiled with GCC and with PGI
>>>>>> compiler 11.7 as well.
>>>>>>
>>>>>> Details about the installation:
>>>>>>
>>>>>> The worker nodes run RHEL 5.3 with the latest kernel,
>>>>>> 2.6.18-274.17.1.el5, and we use the InfiniBand drivers that are
>>>>>> distributed as part of the kernel.
>>>>>>
>>>>>> The MVAPICH2 GCC version was compiled with the following compiler:
>>>>>> gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50)
>>>>>>
>>>>>> The following options were used to compile MVAPICH2 and mpiexec:
>>>>>>
>>>>>> export CC=gcc
>>>>>> export CXX=g++
>>>>>> export F77=gfortran
>>>>>> export FC=gfortran
>>>>>>
>>>>>> export CFLAGS=-mcmodel=medium
>>>>>> export CXXFLAGS=-mcmodel=medium
>>>>>> export FFLAGS=-mcmodel=medium
>>>>>> export FCFLAGS=-mcmodel=medium
>>>>>> export LDFLAGS=-mcmodel=medium
>>>>>>
>>>>>> MVAPICH2:
>>>>>> ./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc --enable-fast
>>>>>> --enable-f77 --enable-fc --enable-cxx --enable-romio --enable-pmiport
>>>>>> --enable-mpe --with-pm=mpd --with-pmi=simple --with-thread-package
>>>>>> --with-hwloc
>>>>>>
>>>>>> MPIEXEC:
>>>>>> ./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc
>>>>>> --with-pbs=/usr/local/pbs
>>>>>> --with-mpicc=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpicc
>>>>>> --with-mpicxx=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpicxx
>>>>>> --with-mpif77=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpif77
>>>>>> --with-mpif90=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpif90
>>>>>> --disable-mpich-gm
>>>>>> --disable-mpich-p4 --disable-mpich-rai --with-default-comm=pmi
>>>>>>
>>>>>> I was able to run the Intel MPI Benchmark using the following
>>>>>> versions of MVAPICH2, which were compiled with the same version of
>>>>>> gcc:
>>>>>> mvapich2-1.2p1
>>>>>> mvapich2-1.5
>>>>>> mvapich2-1.6rc2
>>>>>> mvapich2-1.6-r4751
>>>>>>
>>>>>> I will be more than happy to provide more details if needed. Thanks in
>>>>>> advance for looking into this problem.
>>>>>>
>>>>>> Nirmal
>>>>>> _______________________________________________
>>>>>> mvapich-discuss mailing list
>>>>>> mvapich-discuss@cse.ohio-state.edu
>>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>

