[mvapich-discuss] ch3:psm MPI-3 RMA limited to $MV2_SHMEM_COLL_NUM_COMM windows

Mingzhe Li li.2192 at osu.edu
Fri Mar 28 13:43:20 EDT 2014


Hi Jeff,

Thanks for reporting. We will take a look and get back to you.

Mingzhe


On Thu, Mar 27, 2014 at 5:04 PM, Jeff Hammond <jeff.science at gmail.com> wrote:

> I cannot allocate more MPI-3 RMA windows than the value of
> MV2_SHMEM_COLL_NUM_COMM.  This breaks ARMCI-MPI and thus NWChem.  I do
> not see this problem with non-PSM builds of MVAPICH2, although I have
> not updated my Mellanox builds within the last few weeks, so a recent
> change may have broken this in all builds.
>
> I am working on the MVAPICH2 svn trunk.  Please let me know when this
> issue is resolved so I can support NWChem users running on QLogic
> IB.
>
> Thanks,
>
> Jeff
>
> [jhammond@blogin2 tests]$ export MV2_SHMEM_COLL_NUM_COMM=1000
> [jhammond@blogin2 tests]$ /home/jhammond/MPI/gcc482-mv2trunk/bin/mpiexec -n 4 ./test_malloc
> Starting ARMCI memory allocation test with 4 processes
>  + allocation 0
>  + allocation 1
>  + allocation 2
>  + allocation 3
>  + allocation 4
>  + allocation 5
>  + allocation 6
>  + allocation 7
>  + allocation 8
>  + allocation 9
>  + allocation 10
>  + allocation 11
>  + allocation 12
>  + allocation 13
>  + allocation 14
>  + allocation 15
>  + allocation 16
>  + allocation 17
>  + allocation 18
>  + allocation 19
>  + allocation 20
>  + allocation 21
>  + allocation 22
>  + allocation 23
>  + allocation 24
>  + allocation 25
>  + allocation 26
>  + allocation 27
>  + allocation 28
>  + allocation 29
>  + allocation 30
>  + allocation 31
>  + allocation 32
>  + allocation 33
>  + allocation 34
>  + allocation 35
>  + allocation 36
>  + allocation 37
>  + allocation 38
>  + allocation 39
>  + allocation 40
>  + allocation 41
>  + allocation 42
>  + allocation 43
>  + allocation 44
>  + allocation 45
>  + allocation 46
>  + allocation 47
>  + allocation 48
>  + allocation 49
>  + allocation 50
>  + allocation 51
>  + allocation 52
>  + allocation 53
>  + allocation 54
>  + allocation 55
>  + allocation 56
>  + allocation 57
>  + allocation 58
>  + allocation 59
>  + allocation 60
>  + allocation 61
>  + allocation 62
>  + allocation 63
>  + allocation 64
>  + allocation 65
>  + allocation 66
>  + allocation 67
>  + allocation 68
>  + allocation 69
>  + allocation 70
>  + allocation 71
>  + allocation 72
>  + allocation 73
>  + allocation 74
>  + allocation 75
>  + allocation 76
>  + allocation 77
>  + allocation 78
>  + allocation 79
>  + allocation 80
>  + allocation 81
>  + allocation 82
>  + allocation 83
>  + allocation 84
>  + allocation 85
>  + allocation 86
>  + allocation 87
>  + allocation 88
>  + allocation 89
>  + allocation 90
>  + allocation 91
>  + allocation 92
>  + allocation 93
>  + allocation 94
>  + allocation 95
>  + allocation 96
>  + allocation 97
>  + allocation 98
>  + allocation 99
>  + free 0
>  + free 1
>  + free 2
>  + free 3
>  + free 4
>  + free 5
>  + free 6
>  + free 7
>  + free 8
>  + free 9
>  + free 10
>  + free 11
>  + free 12
>  + free 13
>  + free 14
>  + free 15
>  + free 16
>  + free 17
>  + free 18
>  + free 19
>  + free 20
>  + free 21
>  + free 22
>  + free 23
>  + free 24
>  + free 25
>  + free 26
>  + free 27
>  + free 28
>  + free 29
>  + free 30
>  + free 31
>  + free 32
>  + free 33
>  + free 34
>  + free 35
>  + free 36
>  + free 37
>  + free 38
>  + free 39
>  + free 40
>  + free 41
>  + free 42
>  + free 43
>  + free 44
>  + free 45
>  + free 46
>  + free 47
>  + free 48
>  + free 49
>  + free 50
>  + free 51
>  + free 52
>  + free 53
>  + free 54
>  + free 55
>  + free 56
>  + free 57
>  + free 58
>  + free 59
>  + free 60
>  + free 61
>  + free 62
>  + free 63
>  + free 64
>  + free 65
>  + free 66
>  + free 67
>  + free 68
>  + free 69
>  + free 70
>  + free 71
>  + free 72
>  + free 73
>  + free 74
>  + free 75
>  + free 76
>  + free 77
>  + free 78
>  + free 79
>  + free 80
>  + free 81
>  + free 82
>  + free 83
>  + free 84
>  + free 85
>  + free 86
>  + free 87
>  + free 88
>  + free 89
>  + free 90
>  + free 91
>  + free 92
>  + free 93
>  + free 94
>  + free 95
>  + free 96
>  + free 97
>  + free 98
>  + free 99
> Test complete: PASS.
>
> [jhammond@blogin2 tests]$ export MV2_SHMEM_COLL_NUM_COMM=100
> [jhammond@blogin2 tests]$ /home/jhammond/MPI/gcc482-mv2trunk/bin/mpiexec -n 4 ./test_malloc
> Starting ARMCI memory allocation test with 4 processes
>  + allocation 0
>  + allocation 1
>  + allocation 2
>  + allocation 3
>  + allocation 4
>  + allocation 5
>  + allocation 6
>  + allocation 7
>  + allocation 8
>  + allocation 9
>  + allocation 10
>  + allocation 11
>  + allocation 12
>  + allocation 13
>  + allocation 14
>  + allocation 15
>  + allocation 16
>  + allocation 17
>  + allocation 18
>  + allocation 19
>  + allocation 20
>  + allocation 21
>  + allocation 22
>  + allocation 23
>  + allocation 24
>  + allocation 25
>  + allocation 26
>  + allocation 27
>  + allocation 28
>  + allocation 29
>  + allocation 30
>  + allocation 31
>  + allocation 32
>  + allocation 33
>  + allocation 34
>  + allocation 35
>  + allocation 36
>  + allocation 37
>  + allocation 38
>  + allocation 39
>  + allocation 40
>  + allocation 41
>  + allocation 42
>  + allocation 43
>  + allocation 44
>  + allocation 45
>  + allocation 46
>  + allocation 47
>  + allocation 48
>  + allocation 49
>  + allocation 50
>  + allocation 51
>  + allocation 52
>  + allocation 53
>  + allocation 54
>  + allocation 55
>  + allocation 56
>  + allocation 57
>  + allocation 58
>  + allocation 59
>  + allocation 60
>  + allocation 61
>  + allocation 62
>  + allocation 63
>  + allocation 64
>  + allocation 65
>  + allocation 66
>  + allocation 67
>  + allocation 68
>  + allocation 69
>  + allocation 70
>  + allocation 71
>  + allocation 72
>  + allocation 73
>  + allocation 74
>  + allocation 75
>  + allocation 76
>  + allocation 77
>  + allocation 78
>  + allocation 79
>  + allocation 80
>  + allocation 81
>  + allocation 82
>  + allocation 83
>  + allocation 84
>  + allocation 85
>  + allocation 86
>  + allocation 87
>  + allocation 88
>  + allocation 89
>  + allocation 90
>  + allocation 91
>  + allocation 92
>  + allocation 93
>  + allocation 94
>  + allocation 95
>  + allocation 96
>  + allocation 97
>  + allocation 98
>  + allocation 99
>
> test_malloc:116454 terminated with signal 11 at PC=2ae87e750a0e SP=7fffbdc4aa80.  Backtrace:
> test_malloc:116456 terminated with signal 11 at PC=2b432f382a0e SP=7fffd024a820.  Backtrace:
> test_malloc:116455 terminated with signal 11 at PC=2b8d22b46a0e SP=7fff916b67c0.  Backtrace:
> test_malloc:116457 terminated with signal 11 at PC=2b1aeb8aea0e SP=7fff6490d480.  Backtrace:
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2b432f382a0e]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2ae87e750a0e]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2b432f373869]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2b1aeb8aea0e]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2b8d22b46a0e]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2b1aeb89f869]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2ae87e741869]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2b432f37a4ef]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2b8d22b37869]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2b432f462a01]
> ./test_malloc[0x4023ca]
> ./test_malloc[0x401f7a]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2ae87e7484ef]
> ./test_malloc[0x401ee8]
> ./test_malloc[0x401dfc]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2b1aeb8a64ef]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2b8d22b3e4ef]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
> ./test_malloc[0x401bc9]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2ae87e830a01]
> ./test_malloc[0x4023ca]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2b1aeb98ea01]
> ./test_malloc[0x4023ca]
> ./test_malloc[0x401f7a]
> ./test_malloc[0x401ee8]
> ./test_malloc[0x401dfc]
> ./test_malloc[0x401f7a]
> ./test_malloc[0x401ee8]
> ./test_malloc[0x401dfc]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2b8d22c26a01]
> ./test_malloc[0x4023ca]
> ./test_malloc[0x401f7a]
> ./test_malloc[0x401ee8]
> ./test_malloc[0x401dfc]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
> ./test_malloc[0x401bc9]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
> ./test_malloc[0x401bc9]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
> ./test_malloc[0x401bc9]
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 116456 RUNNING AT blogin2
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
>
>
> [jhammond@blogin2 tests]$ export MV2_SHMEM_COLL_NUM_COMM=10
> [jhammond@blogin2 tests]$ /home/jhammond/MPI/gcc482-mv2trunk/bin/mpiexec -n 4 ./test_malloc
> Starting ARMCI memory allocation test with 4 processes
>  + allocation 0
>  + allocation 1
>  + allocation 2
>  + allocation 3
>  + allocation 4
>  + allocation 5
>  + allocation 6
>  + allocation 7
>  + allocation 8
>  + allocation 9
>  + allocation 10
>
> test_malloc:116472 terminated with signal 11 at PC=2ba320df5a0e SP=7fff1d1d1c10.  Backtrace:
> test_malloc:116473 terminated with signal 11 at PC=2adc34339a0e SP=7fff87299800.  Backtrace:
> test_malloc:116474 terminated with signal 11 at PC=2ba62c9dca0e SP=7fff5457a550.  Backtrace:
> test_malloc:116475 terminated with signal 11 at PC=2b6069991a0e SP=7ffff4ba8b70.  Backtrace:
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2b6069991a0e]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2ba320df5a0e]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2adc34339a0e]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2ba62c9dca0e]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2b6069982869]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2b60699894ef]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2ba320de6869]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2adc3432a869]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2ba62c9cd869]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2b6069a71a01]
> ./test_malloc[0x4023ca]
> ./test_malloc[0x401f7a]
> ./test_malloc[0x401ee8]
> ./test_malloc[0x401dfc]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
> ./test_malloc[0x401bc9]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2ba62c9d44ef]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2ba320ded4ef]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2adc343314ef]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2ba320ed5a01]
> ./test_malloc[0x4023ca]
> ./test_malloc[0x401f7a]
> ./test_malloc[0x401ee8]
> ./test_malloc[0x401dfc]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2adc34419a01]
> ./test_malloc[0x4023ca]
> ./test_malloc[0x401f7a]
> ./test_malloc[0x401ee8]
> ./test_malloc[0x401dfc]
> /home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2ba62cabca01]
> ./test_malloc[0x4023ca]
> ./test_malloc[0x401f7a]
> ./test_malloc[0x401ee8]
> ./test_malloc[0x401dfc]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
> ./test_malloc[0x401bc9]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
> ./test_malloc[0x401bc9]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
> ./test_malloc[0x401bc9]
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 116475 RUNNING AT blogin2
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
>
> --
> Jeff Hammond
> jeff.science at gmail.com
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

