[mvapich-discuss] mvapich2_munmap

Krishna Chaitanya Kandalla kandalla at cse.ohio-state.edu
Thu Dec 3 22:37:11 EST 2009


Burlen,
         Sorry to hear that the problem persists even with mvapich2-1.4.
Can you please reconfigure and rebuild the library with the
configure-time flag --disable-registration-cache? This turns the
registration cache off completely, so the library will use the default
memory-related functions.
         It's very surprising that your application is failing inside
MPI_Init itself. We have tested the release version with Intel
compilers, but we have not seen such an issue before. Can you also give
us some more information about the compiler version, operating system
and anything related to your hardware?

Thanks,
Krishna

burlen wrote:
> Hi Krishna,
>
> I built mvapich2-1.4 today. Bad news, man: I got the same problem.
>
> With mvapich2-1.4 the program crashes right away with a segfault, and the
> stack is very similar to the one from the mvapich2-1.2p1 build (see below).
> Both builds used an Intel compiler (just to be sure to mention it). The
> stack shows that a call to free() initiated the issue. Any ideas?
>
>    Program received signal SIGSEGV, Segmentation fault.
>    [Switching to Thread 46912874878096 (LWP 28347)]
>    0x00002aaaaaddffcf in find_and_free_dregs_inside ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>    (gdb) where
>    #0  0x00002aaaaaddffcf in find_and_free_dregs_inside ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>    #1  0x00002aaaaadcd73b in mvapich2_mem_unhook ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>    #2  0x00002aaaaadcd77a in mvapich2_munmap ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>    #3  0x00002aaaaf3cc37c in munmap ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libicet_mpi.so
>    #4  0x00002aaaaadcd78f in mvapich2_munmap ()
>
>    ... repeated mvapich2_munmap, munmap sequence
>
>    #16567 0x00002aaaaf3cc37c in munmap ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libicet_mpi.so
>    #16568 0x00002aaaaadcd78f in mvapich2_munmap ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>    #16569 0x00002aaaaf3cc37c in munmap ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libicet_mpi.so
>    #16570 0x00002aaaaadcd78f in mvapich2_munmap ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>    #16571 0x00002aaaaadc7ad5 in free ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>    #16572 0x00002aaaaadd686a in MPIDI_CH3I_SMP_init ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>    #16573 0x00002aaaaae49d24 in MPIDI_CH3_Init ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>    #16574 0x00002aaaaae0b3fd in MPID_Init ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>    #16575 0x00002aaaaae33d40 in MPIR_Init_thread ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>    #16576 0x00002aaaac6118ff in PMPI_Init ()
>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerManager.so
>    #16577 0x00002aaaab5230a8 in vtkPVMain::Initialize (argc=0x7fffffffdb00,
>        argv=0x7fffffffdab0)
>        at /u/burlen/ParaView/ParaView3-3.7/Servers/Filters/vtkPVMain.cxx:107
>    #16578 0x00000000004027bd in main (argc=3, argv=0x7fffffffdbf8)
>        at /u/burlen/ParaView/ParaView3-3.7/Servers/Executables/pvserver.cxx:30
>
>
>
> Krishna Chaitanya Kandalla wrote:
>> I am guessing that as long as you use the right InfiniBand-related
>> paths, everything should be fine. You can build mvapich2-1.4rc1
>> locally instead, and for that you won't need any sudo permissions.
>>
>> Krishna
>>
>> burlen wrote:
>>> Right, I did say that; sorry for the confusion. When you said that, I
>>> wondered/hoped you might have seen something else that suggested the
>>> wrong library was linked in. I am all for upgrading to the latest,
>>> but I'm not a sysadmin on this system and I don't know the details
>>> of the hardware. So if I build the new release with the same
>>> configure options that were used for the current build, will the
>>> InfiniBand stuff just work? Or do I need access to drivers
>>> etc.? I've never built mvapich before :)
>>>
>>> Krishna Chaitanya Kandalla wrote:
>>>> Burlen,
>>>> In your first mail, you had mentioned:
>>>> > I have this strange situation when running paraview on a
>>>> > particular build/install/revision of mvapich.
>>>>
>>>> So, I concluded that you were using mvapich and not mvapich2. But
>>>> it's still not very clear why you are seeing a seg-fault
>>>> inside the function find_and_free_dregs_inside() with this flag on. I can
>>>> think of a few options to move ahead. You can try out the 1.4
>>>> version of mvapich2 that we released a few weeks ago; 1.2p1 is
>>>> quite old. If you get the same failure even with 1.4, would it be
>>>> possible for you to point us to where this application can be found
>>>> so that we can reproduce it on our cluster?
>>>>
>>>> Thanks,
>>>> Krishna
>>>>
>>>>
>>>>
>>>>
>>>> burlen wrote:
>>>>> I get the same problem (as initially reported) using 
>>>>> VIADEV_USE_DREG_CACHE, but for sure it's mvapich2.
>>>>>
>>>>> Krishna Chaitanya Kandalla wrote:
>>>>>> Burlen,
>>>>>>          I just noticed that you are using MVAPICH and not 
>>>>>> MVAPICH2. The equivalent flag on MVAPICH is 
>>>>>> VIADEV_USE_DREG_CACHE. So, please set this flag to 0 instead of 
>>>>>> the MV2_* flag. I am sorry for the confusion.
>>>>>>
>>>>>> Thanks,
>>>>>> Krishna
>>>>>>
>>>>>> burlen wrote:
>>>>>>> OK, I didn't use mpirun_rsh before because it doesn't pass
>>>>>>> some of the environment vars through. So with the mpirun_rsh method,
>>>>>>> without the MV2_USE_LAZY_MEM_UNREGISTER flag, I get the same
>>>>>>> result as before, but with it set to 0 I now get a segfault:
>>>>>>>
>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>> [Switching to Thread 46912793699472 (LWP 24718)]
>>>>>>> 0x00002aaaaadae366 in find_and_free_dregs_inside ()
>>>>>>>    from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>
>>>>>>> (gdb) where
>>>>>>> #0  0x00002aaaaadae366 in find_and_free_dregs_inside ()
>>>>>>>    from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>> Cannot access memory at address 0x7fffedb06ff0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Krishna Chaitanya Kandalla wrote:
>>>>>>>> Burlen,
>>>>>>>>           In MVAPICH2, we use mpirun_rsh for job launch.
>>>>>>>>           So, for the default configuration, you would be doing
>>>>>>>> something like:
>>>>>>>>
>>>>>>>> mpirun_rsh -np 1 pvserver --server-port=50001 --use-offscreen-rendering
>>>>>>>>
>>>>>>>>           But, to turn off this memory optimization feature,
>>>>>>>> you can do:
>>>>>>>>
>>>>>>>> mpirun_rsh -np 1 MV2_USE_LAZY_MEM_UNREGISTER=0 pvserver --server-port=50001 --use-offscreen-rendering
>>>>>>>>
>>>>>>>>           Please let us know if there is any difference in
>>>>>>>> behavior between these two cases.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Krishna
>>>>>>>>
>>>>>>>> burlen wrote:
>>>>>>>>> Maybe it was a coincidence that it seemed to die faster...
>>>>>>>>>
>>>>>>>>> r50i1n14:~$export MV2_USE_LAZY_MEM_UNREGISTER=0
>>>>>>>>> r50i1n14:~$mpiexec -np 1 pvserver --server-port=50001 --use-offscreen-rendering
>>>>>>>>>
>>>>>>>>> is that right?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Krishna Chaitanya Kandalla wrote:
>>>>>>>>>> Burlen,
>>>>>>>>>>          That's very strange. With this flag set to 0, one of
>>>>>>>>>> our memory optimizations is turned off and our memory
>>>>>>>>>> footprint should actually get better. Can you also let us
>>>>>>>>>> know how you are running the job? This flag should appear
>>>>>>>>>> before the name of the executable that you are trying to run.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Krishna
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> burlen wrote:
>>>>>>>>>>> Hi Krishna, I tried it, but it didn't seem to help. Now the
>>>>>>>>>>> available RAM was exhausted very quickly, way faster than
>>>>>>>>>>> before. The node quickly became unresponsive, gdb never
>>>>>>>>>>> finished starting, and the job was killed.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Krishna Chaitanya Kandalla wrote:
>>>>>>>>>>>> Burlen,
>>>>>>>>>>>>          Can you run your application with the run-time
>>>>>>>>>>>> flag MV2_USE_LAZY_MEM_UNREGISTER=0? This might lead to
>>>>>>>>>>>> slightly poorer performance, but it can help us narrow down
>>>>>>>>>>>> the problem.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Krishna
>>>>>>>>>>>>
>>>>>>>>>>>> burlen wrote:
>>>>>>>>>>>>> I have this strange situation when running paraview on a 
>>>>>>>>>>>>> particular build/install/revision of mvapich. Shortly 
>>>>>>>>>>>>> after paraview starts up it hangs, and watching in top I 
>>>>>>>>>>>>> see memory grow before it's killed for using too much. 
>>>>>>>>>>>>> Attaching a debugger I see what looks like an infinite 
>>>>>>>>>>>>> recursion. It's only happened to me using this particular 
>>>>>>>>>>>>> build of mvapich which happens to be the only one on this 
>>>>>>>>>>>>> system.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just curious if anyone has seen anything like this before?
>>>>>>>>>>>>>
>>>>>>>>>>>>>    (gdb) where
>>>>>>>>>>>>>    #0  0x00002aaaaadbb25b in avlfindex ()
>>>>>>>>>>>>>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>>>    #1  0x00002aaaaadae427 in find_and_free_dregs_inside ()
>>>>>>>>>>>>>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>>>    #2  0x00002aaaaad9d1f9 in mvapich2_mem_unhook ()
>>>>>>>>>>>>>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>>>    #3  0x00002aaaaad9d244 in mvapich2_munmap ()
>>>>>>>>>>>>>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>>>    #4  0x00002aaaadfa88c6 in munmap ()
>>>>>>>>>>>>>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libicet_mpi.so
>>>>>>>>>>>>>    #5  0x00002aaaaad9d259 in mvapich2_munmap ()
>>>>>>>>>>>>>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>>>
>>>>>>>>>>>>>    ...
>>>>>>>>>>>>>
>>>>>>>>>>>>>    #73059 0x00002aaaadfa88c6 in munmap ()
>>>>>>>>>>>>>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libicet_mpi.so
>>>>>>>>>>>>>    #73060 0x00002aaaaad9d259 in mvapich2_munmap ()
>>>>>>>>>>>>>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>>>    #73061 0x00002aaaaad979a1 in free ()
>>>>>>>>>>>>>       from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>>>    #73062 0x00002aaaae441e7e in icetResizeBuffer (size=91607685)
>>>>>>>>>>>>>       at /u/burlen/ParaView/ParaView3-3.7/Utilities/IceT/src/ice-t/context.c:129
>>>>>>>>>>>>>
>>>>>>>>>>>>> mvapich info:
>>>>>>>>>>>>> Version:          1.2p1
>>>>>>>>>>>>> Compiled with:    Intel version 11.0.074
>>>>>>>>>>>>> Configured with:  --prefix=/nasa/mvapich2/1.2p1/intel --enable-f77 --enable-f90
>>>>>>>>>>>>>                   --enable-cxx --enable-mpe --enable-romio --enable-threads=multiple
>>>>>>>>>>>>>                   --with-rdma=gen2
>>>>>>>>>>>>>                   CFLAGS = -fPIC
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> mvapich-discuss mailing list
>>>>>>>>>>>>> mvapich-discuss at cse.ohio-state.edu
>>>>>>>>>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss 

