[mvapich-discuss] mvapich2_munmap
burlen
burlen.loring at gmail.com
Thu Dec 3 21:33:55 EST 2009
Hi Krishna,
I built mvapich2-1.4 today. Bad news, man: I got the same problem.
With mvapich2-1.4 the program crashes right off with a segfault and a
very similar stack to the mvapich2-1.2p1 build (see below). Both
builds used an Intel compiler (just to be sure to mention it). The
stack shows that a call to free() initiated the issue. Any ideas?
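(For context on why free() ends up in MPI code at all: as I
understand it, MVAPICH2 wraps free()/munmap() so it can lazily
deregister pinned InfiniBand buffers from its registration cache,
which appears to be what find_and_free_dregs_inside does. A rough
conceptual sketch in C follows; the names and data structures are
my guesses, not the actual MVAPICH2 internals:

    /* Conceptual sketch only: names and structures here are made up,
     * not MVAPICH2 source. The library caches regions it has
     * registered (pinned) with the HCA; when the app releases memory,
     * the wrapped release path must first deregister any cached
     * regions that fall inside the released range. */
    #include <stdlib.h>

    typedef struct dreg { char *addr; size_t len; struct dreg *next; } dreg;
    static dreg *dreg_cache;  /* stand-in for the real AVL tree (avlfindex) */

    static void sketch_free_dregs_inside(char *buf, size_t len)
    {
        dreg **d = &dreg_cache;
        while (*d) {
            if ((*d)->addr >= buf && (*d)->addr + (*d)->len <= buf + len) {
                dreg *dead = *d;
                *d = dead->next;  /* unlink the cache entry ... */
                /* ... ibv_dereg_mr() on the region would happen here */
                free(dead);
            } else {
                d = &(*d)->next;
            }
        }
    }

So a free() that releases an mmap'd arena can legitimately walk back
into the MPI library, which matches the trace below, where free() is
reached from inside MPI_Init itself.)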
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 46912874878096 (LWP 28347)]
0x00002aaaaaddffcf in find_and_free_dregs_inside ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
(gdb) where
#0  0x00002aaaaaddffcf in find_and_free_dregs_inside ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
#1  0x00002aaaaadcd73b in mvapich2_mem_unhook ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
#2  0x00002aaaaadcd77a in mvapich2_munmap ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
#3  0x00002aaaaf3cc37c in munmap ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libicet_mpi.so
#4  0x00002aaaaadcd78f in mvapich2_munmap ()
... repeated mvapich2_munmap / munmap sequence ...
#16567 0x00002aaaaf3cc37c in munmap ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libicet_mpi.so
#16568 0x00002aaaaadcd78f in mvapich2_munmap ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
#16569 0x00002aaaaf3cc37c in munmap ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libicet_mpi.so
#16570 0x00002aaaaadcd78f in mvapich2_munmap ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
#16571 0x00002aaaaadc7ad5 in free ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
#16572 0x00002aaaaadd686a in MPIDI_CH3I_SMP_init ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
#16573 0x00002aaaaae49d24 in MPIDI_CH3_Init ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
#16574 0x00002aaaaae0b3fd in MPID_Init ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
#16575 0x00002aaaaae33d40 in MPIR_Init_thread ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
#16576 0x00002aaaac6118ff in PMPI_Init ()
   from /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerManager.so
#16577 0x00002aaaab5230a8 in vtkPVMain::Initialize (argc=0x7fffffffdb00, argv=0x7fffffffdab0)
   at /u/burlen/ParaView/ParaView3-3.7/Servers/Filters/vtkPVMain.cxx:107
#16578 0x00000000004027bd in main (argc=3, argv=0x7fffffffdbf8)
   at /u/burlen/ParaView/ParaView3-3.7/Servers/Executables/pvserver.cxx:30
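If I'm reading the trace right, the loop is: mvapich2_munmap (in the
MPI code linked into libvtkPVServerCommon.so) calls what it thinks is
the real munmap, the dynamic linker resolves that to another
interposed munmap in libicet_mpi.so, and that one calls back into
mvapich2_munmap, bouncing back and forth until the stack overflows.
That would suggest two copies of the memory hooks ended up in
different shared libraries. For reference, here's a minimal sketch of
the usual interposition pattern and the kind of re-entry guard that
breaks such a loop; this is the generic idiom, not MVAPICH2's actual
code:

    /* Generic RTLD_NEXT interposition idiom with a re-entry guard;
     * illustrative only, not MVAPICH2 source. Build into a shared
     * library and link with -ldl. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <sys/mman.h>

    int munmap(void *addr, size_t len)
    {
        static int (*real_munmap)(void *, size_t);
        static __thread int in_hook;  /* guard against re-entry */

        if (!real_munmap)  /* find the next munmap in link order */
            real_munmap = (int (*)(void *, size_t))dlsym(RTLD_NEXT, "munmap");

        if (!in_hook) {
            in_hook = 1;
            /* the registration-cache flush (mvapich2_mem_unhook /
             * find_and_free_dregs_inside) would go here */
            in_hook = 0;
        }
        return real_munmap(addr, len);
    }

The trace looks like the failure mode where neither safeguard is in
effect: with duplicate wrappers in separate .so files whose onward
calls go through ordinary PLT symbol lookup, each wrapper's "real"
munmap can resolve to the other wrapper, and they recurse exactly as
above. That's my guess, anyway; the MVAPICH folks would know better.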
Krishna Chaitanya Kandalla wrote:
> I am guessing that as long as you use the right InfiniBand-related
> paths, everything should be fine. You can build mvapich2-1.4rc1
> locally instead, and for that you won't need any sudo permissions.
>
> Krishna
>
> burlen wrote:
>> Right, I did say that; sorry for the confusion. When you said that, I
>> wondered/hoped you might have seen something else suggesting the
>> wrong library was linked in. I'm all for upgrading to the latest,
>> but I'm not a sysadmin on this system and I don't know the details
>> of the hardware. So if I build the new release with the same
>> configure options that were used for the current build, will the
>> InfiniBand stuff just work? Or do I need access to drivers,
>> etc.? I've never built mvapich before :)
>>
>> Krishna Chaitanya Kandalla wrote:
>>> Burlen,
>>> In your first mail, you had mentioned:
>>> > I have this strange situation when running paraview on a
>>> particular build/install/revision of mvapich.
>>>
>>> So, I concluded that you were using mvapich and not mvapich2. But
>>> it's still not very clear why you are seeing a seg-fault inside
>>> the function find_and_free_dregs() with this flag on. I can think
>>> of a few options to move ahead. You can try the 1.4 version of
>>> mvapich2 that we released a few weeks ago; 1.2p1 is quite old. If
>>> you get the same failure even with 1.4, would it be possible for you
>>> to point us to where this application can be found, so that we can
>>> reproduce it on our cluster?
>>>
>>> Thanks,
>>> Krishna
>>>
>>>
>>>
>>>
>>> burlen wrote:
>>>> I get the same problem (as initially reported) using
>>>> VIADEV_USE_DREG_CACHE, but for sure it's mvapich2.
>>>>
>>>> Krishna Chaitanya Kandalla wrote:
>>>>> Burlen,
>>>>> I just noticed that you are using MVAPICH and not
>>>>> MVAPICH2. The equivalent flag in MVAPICH is VIADEV_USE_DREG_CACHE,
>>>>> so please set this flag to 0 instead of the MV2_* flag. I am
>>>>> sorry for the confusion.
>>>>>
>>>>> Thanks,
>>>>> Krishna
>>>>>
>>>>> burlen wrote:
>>>>>> OK, I didn't use mpirun_rsh before because it doesn't pass
>>>>>> some of the environment vars through. With the mpirun_rsh method,
>>>>>> without the MV2_USE_LAZY_MEM_UNREGISTER flag, I get the same
>>>>>> result as before, but with it set to 0 I now get a segfault:
>>>>>>
>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>> [Switching to Thread 46912793699472 (LWP 24718)]
>>>>>> 0x00002aaaaadae366 in find_and_free_dregs_inside () from
>>>>>> /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>> (gdb) where
>>>>>> #0 0x00002aaaaadae366 in find_and_free_dregs_inside () from
>>>>>> /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>> Cannot access memory at address 0x7fffedb06ff0
>>>>>>
>>>>>>
>>>>>>
>>>>>> Krishna Chaitanya Kandalla wrote:
>>>>>>> Burlen,
>>>>>>> In MVAPICH2, we use the mpirun_rsh feature for job launch.
>>>>>>> So, for the default configuration, you would be doing
>>>>>>> something like:
>>>>>>>
>>>>>>> mpirun_rsh -np 1 pvserver --server-port=50001
>>>>>>> --use-offscreen-rendering
>>>>>>>
>>>>>>> But, to turn off this memory optimization feature, you
>>>>>>> can do:
>>>>>>> mpirun_rsh -np 1 MV2_USE_LAZY_MEM_UNREGISTER=0 pvserver
>>>>>>> --server-port=50001 --use-offscreen-rendering
>>>>>>>
>>>>>>> Please let us know if there is any
>>>>>>> difference in the behavior across these two cases.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Krishna
>>>>>>>
>>>>>>> burlen wrote:
>>>>>>>> Maybe it was a coincidence that it seemed to die faster...
>>>>>>>>
>>>>>>>> r50i1n14:~$export MV2_USE_LAZY_MEM_UNREGISTER=0
>>>>>>>> r50i1n14:~$mpiexec -np 1 pvserver --server-port=50001
>>>>>>>> --use-offscreen-rendering
>>>>>>>>
>>>>>>>> is that right?
>>>>>>>>
>>>>>>>>
>>>>>>>> Krishna Chaitanya Kandalla wrote:
>>>>>>>>> Burlen,
>>>>>>>>> That's very strange. With this flag set to 0, one of
>>>>>>>>> our memory optimizations is turned off, and our memory
>>>>>>>>> footprint should actually get better. Can you also let us
>>>>>>>>> know how you are running the job? This flag should appear
>>>>>>>>> before the name of the executable that you are trying to run.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Krishna
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> burlen wrote:
>>>>>>>>>> Hi Krishna, I tried it, but it didn't seem to help. Now the
>>>>>>>>>> available RAM was exhausted very quickly, way faster than
>>>>>>>>>> before. The node quickly became unresponsive, gdb never
>>>>>>>>>> finished starting, and the job was killed.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Krishna Chaitanya Kandalla wrote:
>>>>>>>>>>> Burlen,
>>>>>>>>>>> Can you run your application with the run-time flag
>>>>>>>>>>> MV2_USE_LAZY_MEM_UNREGISTER=0? This might lead to slightly
>>>>>>>>>>> poorer performance, but it can help us narrow down the problem.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Krishna
>>>>>>>>>>>
>>>>>>>>>>> burlen wrote:
>>>>>>>>>>>> I have this strange situation when running ParaView on a
>>>>>>>>>>>> particular build/install/revision of mvapich. Shortly after
>>>>>>>>>>>> ParaView starts up it hangs, and watching in top I see
>>>>>>>>>>>> memory grow before it's killed for using too much.
>>>>>>>>>>>> Attaching a debugger, I see what looks like infinite
>>>>>>>>>>>> recursion. It has only happened to me using this particular
>>>>>>>>>>>> build of mvapich, which happens to be the only one on this
>>>>>>>>>>>> system.
>>>>>>>>>>>>
>>>>>>>>>>>> Just curious if anyone has seen anything like this before?
>>>>>>>>>>>>
>>>>>>>>>>>> (gdb) where
>>>>>>>>>>>> #0  0x00002aaaaadbb25b in avlfindex () from
>>>>>>>>>>>> /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>> #1  0x00002aaaaadae427 in find_and_free_dregs_inside () from
>>>>>>>>>>>> /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>> #2  0x00002aaaaad9d1f9 in mvapich2_mem_unhook () from
>>>>>>>>>>>> /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>> #3  0x00002aaaaad9d244 in mvapich2_munmap () from
>>>>>>>>>>>> /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>> #4  0x00002aaaadfa88c6 in munmap () from
>>>>>>>>>>>> /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libicet_mpi.so
>>>>>>>>>>>> #5  0x00002aaaaad9d259 in mvapich2_munmap () from
>>>>>>>>>>>> /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>> ...
>>>>>>>>>>>> #73059 0x00002aaaadfa88c6 in munmap () from
>>>>>>>>>>>> /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libicet_mpi.so
>>>>>>>>>>>> #73060 0x00002aaaaad9d259 in mvapich2_munmap () from
>>>>>>>>>>>> /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>> #73061 0x00002aaaaad979a1 in free () from
>>>>>>>>>>>> /u/burlen/apps/PV3-3.7-D-IV/lib/paraview-3.7/libvtkPVServerCommon.so
>>>>>>>>>>>> #73062 0x00002aaaae441e7e in icetResizeBuffer (size=91607685) at
>>>>>>>>>>>> /u/burlen/ParaView/ParaView3-3.7/Utilities/IceT/src/ice-t/context.c:129
>>>>>>>>>>>>
>>>>>>>>>>>> mvapich info:
>>>>>>>>>>>> Version: 1.2p1
>>>>>>>>>>>> Compiled with: Intel version 11.0.074
>>>>>>>>>>>> Configured with: --prefix=/nasa/mvapich2/1.2p1/intel
>>>>>>>>>>>> --enable-f77 --enable-f90 --enable-cxx --enable-mpe
>>>>>>>>>>>> --enable-romio --enable-threads=multiple --with-rdma=gen2
>>>>>>>>>>>> CFLAGS = -fPIC
>>>>>>>>>>>>