[mvapich-discuss] Hang in free

Hashmi, Jahanzeb hashmi.29 at buckeyemail.osu.edu
Wed Mar 25 13:04:13 EDT 2020


We discussed this issue with the user, and setting LD_PRELOAD=<path-to-libmpi.so> before the launch command fixed the hang without any performance penalty. The problem arises when an application uses a custom memory allocator instead of the glibc one, which interferes with MVAPICH2's memory registration cache. More details are available in the MVAPICH2 user guide (Section 9.1):

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3.3-userguide.html
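
For anyone finding this thread later, here is a minimal sketch of how the launch line can be put together. It assumes the job is started with mpirun_rsh, which accepts NAME=VALUE settings before the executable. The process count, the hostfile name "hosts", the binary name ./setsm, and the helper function itself are placeholders for illustration. The libmpi.so path is the one that appears in the backtrace from the original report, so adjust it to your installation.

```python
import shlex


def mpirun_rsh_command(nprocs, hostfile, app, env=None):
    """Build an mpirun_rsh argument list (hypothetical helper).

    NAME=VALUE settings are placed after the launcher options and
    before the executable.
    """
    argv = ["mpirun_rsh", "-np", str(nprocs), "-hostfile", hostfile]
    for name, value in (env or {}).items():
        argv.append(f"{name}={value}")
    argv.append(app)
    return argv


# Path taken from the backtrace in this thread; adjust for your system.
LIBMPI = "/opt/mvapich2/intel/19.0/2.3.3/lib/libmpi.so.12"

# "hosts" and "./setsm" are assumed names, not taken from the report.
cmd = mpirun_rsh_command(4, "hosts", "./setsm", {"LD_PRELOAD": LIBMPI})

# Print the command line so it can be pasted into a job script.
print(shlex.join(cmd))
```

With other launchers the variable may instead need to be exported in the job script or passed through the launcher's own environment option.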


Regards,

Jahanzeb

________________________________
From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> on behalf of Khuvis, Samuel <skhuvis at osc.edu>
Sent: Tuesday, March 24, 2020 4:10 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] Hang in free


Hi,



A code I am working on hangs with MVAPICH2 on multiple nodes, but not with Intel MPI or with MVAPICH2 on a single node. Based on the backtrace below, it seems to be hanging inside a call to free(). I found a thread from 2009 (http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-December/002654.html) that suggested setting MV2_USE_LAZY_MEM_UNREGISTER=0 at runtime. This seems to prevent the hang, but it could potentially hurt performance. Is there a better solution for this issue? Let me know if you need any further information about the code.
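
To make the performance concern concrete, here is a toy model of a registration cache. It is only an illustration under simplifying assumptions, not MVAPICH2's actual implementation, and the class and function names are made up. With lazy unregistration a buffer stays registered after a send and is reused by later sends. With it disabled, every send pays for a fresh registration. The free() method shows why the library has to see deallocations at all: a freed buffer must be dropped from the cache.

```python
class ToyRegistrationCache:
    """Toy model only: tracks which buffers are currently 'registered'."""

    def __init__(self, lazy_unregister):
        self.lazy_unregister = lazy_unregister
        self.registered = set()
        self.registrations = 0  # number of (expensive) registrations performed

    def send(self, buffer_id):
        # Register the buffer if it is not already in the cache.
        if buffer_id not in self.registered:
            self.registrations += 1
            self.registered.add(buffer_id)
        # Without lazy unregistration, deregister right after use.
        if not self.lazy_unregister:
            self.registered.discard(buffer_id)

    def free(self, buffer_id):
        # A freed buffer must leave the cache, otherwise the entry is stale.
        self.registered.discard(buffer_id)


def count_registrations(lazy_unregister, sends=100):
    """Send the same buffer repeatedly and count registrations."""
    cache = ToyRegistrationCache(lazy_unregister)
    for _ in range(sends):
        cache.send("buf")
    return cache.registrations


print(count_registrations(True))   # 1
print(count_registrations(False))  # 100
```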





#0  0x00002acec0eaa831 in find_and_free_dregs_inside () from /opt/mvapich2/intel/19.0/2.3.3/lib/libmpi.so.12
#1  0x00002acec0ed98bc in mvapich2_munmap () from /opt/mvapich2/intel/19.0/2.3.3/lib/libmpi.so.12
#2  0x00002acec0edaf19 in _int_free () from /opt/mvapich2/intel/19.0/2.3.3/lib/libmpi.so.12
#3  0x00002acec0edcb2a in free () from /opt/mvapich2/intel/19.0/2.3.3/lib/libmpi.so.12
#4  0x000000000044e7e1 in VerticalLineLocus_blunder (proinfo=0xbfe698910011c2ee, nccresult=0x2ace00000007, MagImages=0x2ace0011c2d6, DEM_resolution=0,
    im_resolution=6.9526732747253049e-310, RPCs=0x58ce2e, Imagesizes_ori=0x3fffe0, Imagesizes=0x40b1800000000000, Images=0x7ffcc8959168, Template_size=96 '`',
    Size_Grid2D=..., param=..., GridPts=0x7ffcc8959038, Grid_wgs=0x7ffcc8959068, GridPT3=0x7ffcc8959070, NumofIAparam=144 '\220', ImageAdjust=0x7ffcc8959078,
    Pyramid_step=152 '\230', Startpos=0x7ffcc8959020, save_filepath=0x7ffcc89590a0 "", tile_row=168 '\250', tile_col=176 '\260', iteration=184 '\270',
    bl_count=192 '\300', Boundary=0x7ffcc895903c, ori_images=0x7ffcc8959040, blunder_selected_level=-929722300, bblunder=200) at setsm_code.cpp:14584
#5  0x00002acec1c71d43 in __kmp_invoke_microtask () from /opt/intel/19.0.5/compilers_and_libraries_2019/linux/lib/intel64_lin/libiomp5.so
#6  0x00002acec1c0163f in __kmp_invoke_task_func (gtid=-1058373488) at ../../src/kmp_runtime.cpp:7426
#7  0x00002acec1c0065c in __kmp_launch_thread (this_thr=0x2acec0ea8090 <vma_compare_search>) at ../../src/kmp_runtime.cpp:6041
#8  0x00002acec1c722fb in _INTERNAL_26_______src_z_Linux_util_cpp_cabc1a3b::__kmp_launch_worker (thr=0x2acec0ea8090 <vma_compare_search>)
    at ../../src/z_Linux_util.cpp:586
#9  0x00002acec215edd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00002acec247102d in clone () from /lib64/libc.so.6





Thanks,

Samuel Khuvis
Scientific Applications Engineer
Ohio Supercomputer Center (OSC) <https://osc.edu/>
A member of the Ohio Technology Consortium <https://oh-tech.org/>
1224 Kinnear Road, Columbus, Ohio 43212
Office: (614) 292-5178 • Fax: (614) 292-7168

