[mvapich-discuss] ch3:psm MPI-3 RMA limited to $MV2_SHMEM_COLL_NUM_COMM windows

Jeff Hammond jeff.science at gmail.com
Thu Mar 27 17:04:41 EDT 2014


I cannot allocate more RMA windows than the value of
MV2_SHMEM_COLL_NUM_COMM; once that limit is reached, the processes
segfault inside MPI_Win_allocate (see the runs below).  This breaks
ARMCI-MPI and thus NWChem.  I do not see this problem with non-PSM
builds of MVAPICH2, although I have not updated the Mellanox builds
within the last few weeks, so a recent change may have broken all
builds.

I am working from the MVAPICH2 svn trunk.  Please let me know when this
issue is resolved so that I can support NWChem users running on QLogic
IB.

Thanks,

Jeff

[jhammond at blogin2 tests]$ export MV2_SHMEM_COLL_NUM_COMM=1000
[jhammond at blogin2 tests]$ /home/jhammond/MPI/gcc482-mv2trunk/bin/mpiexec -n 4 ./test_malloc
Starting ARMCI memory allocation test with 4 processes
 + allocation 0
 + allocation 1
 + allocation 2
 + allocation 3
 + allocation 4
 + allocation 5
 + allocation 6
 + allocation 7
 + allocation 8
 + allocation 9
 + allocation 10
 + allocation 11
 + allocation 12
 + allocation 13
 + allocation 14
 + allocation 15
 + allocation 16
 + allocation 17
 + allocation 18
 + allocation 19
 + allocation 20
 + allocation 21
 + allocation 22
 + allocation 23
 + allocation 24
 + allocation 25
 + allocation 26
 + allocation 27
 + allocation 28
 + allocation 29
 + allocation 30
 + allocation 31
 + allocation 32
 + allocation 33
 + allocation 34
 + allocation 35
 + allocation 36
 + allocation 37
 + allocation 38
 + allocation 39
 + allocation 40
 + allocation 41
 + allocation 42
 + allocation 43
 + allocation 44
 + allocation 45
 + allocation 46
 + allocation 47
 + allocation 48
 + allocation 49
 + allocation 50
 + allocation 51
 + allocation 52
 + allocation 53
 + allocation 54
 + allocation 55
 + allocation 56
 + allocation 57
 + allocation 58
 + allocation 59
 + allocation 60
 + allocation 61
 + allocation 62
 + allocation 63
 + allocation 64
 + allocation 65
 + allocation 66
 + allocation 67
 + allocation 68
 + allocation 69
 + allocation 70
 + allocation 71
 + allocation 72
 + allocation 73
 + allocation 74
 + allocation 75
 + allocation 76
 + allocation 77
 + allocation 78
 + allocation 79
 + allocation 80
 + allocation 81
 + allocation 82
 + allocation 83
 + allocation 84
 + allocation 85
 + allocation 86
 + allocation 87
 + allocation 88
 + allocation 89
 + allocation 90
 + allocation 91
 + allocation 92
 + allocation 93
 + allocation 94
 + allocation 95
 + allocation 96
 + allocation 97
 + allocation 98
 + allocation 99
 + free 0
 + free 1
 + free 2
 + free 3
 + free 4
 + free 5
 + free 6
 + free 7
 + free 8
 + free 9
 + free 10
 + free 11
 + free 12
 + free 13
 + free 14
 + free 15
 + free 16
 + free 17
 + free 18
 + free 19
 + free 20
 + free 21
 + free 22
 + free 23
 + free 24
 + free 25
 + free 26
 + free 27
 + free 28
 + free 29
 + free 30
 + free 31
 + free 32
 + free 33
 + free 34
 + free 35
 + free 36
 + free 37
 + free 38
 + free 39
 + free 40
 + free 41
 + free 42
 + free 43
 + free 44
 + free 45
 + free 46
 + free 47
 + free 48
 + free 49
 + free 50
 + free 51
 + free 52
 + free 53
 + free 54
 + free 55
 + free 56
 + free 57
 + free 58
 + free 59
 + free 60
 + free 61
 + free 62
 + free 63
 + free 64
 + free 65
 + free 66
 + free 67
 + free 68
 + free 69
 + free 70
 + free 71
 + free 72
 + free 73
 + free 74
 + free 75
 + free 76
 + free 77
 + free 78
 + free 79
 + free 80
 + free 81
 + free 82
 + free 83
 + free 84
 + free 85
 + free 86
 + free 87
 + free 88
 + free 89
 + free 90
 + free 91
 + free 92
 + free 93
 + free 94
 + free 95
 + free 96
 + free 97
 + free 98
 + free 99
Test complete: PASS.

[jhammond at blogin2 tests]$ export MV2_SHMEM_COLL_NUM_COMM=100
[jhammond at blogin2 tests]$ /home/jhammond/MPI/gcc482-mv2trunk/bin/mpiexec -n 4 ./test_malloc
Starting ARMCI memory allocation test with 4 processes
 + allocation 0
 + allocation 1
 + allocation 2
 + allocation 3
 + allocation 4
 + allocation 5
 + allocation 6
 + allocation 7
 + allocation 8
 + allocation 9
 + allocation 10
 + allocation 11
 + allocation 12
 + allocation 13
 + allocation 14
 + allocation 15
 + allocation 16
 + allocation 17
 + allocation 18
 + allocation 19
 + allocation 20
 + allocation 21
 + allocation 22
 + allocation 23
 + allocation 24
 + allocation 25
 + allocation 26
 + allocation 27
 + allocation 28
 + allocation 29
 + allocation 30
 + allocation 31
 + allocation 32
 + allocation 33
 + allocation 34
 + allocation 35
 + allocation 36
 + allocation 37
 + allocation 38
 + allocation 39
 + allocation 40
 + allocation 41
 + allocation 42
 + allocation 43
 + allocation 44
 + allocation 45
 + allocation 46
 + allocation 47
 + allocation 48
 + allocation 49
 + allocation 50
 + allocation 51
 + allocation 52
 + allocation 53
 + allocation 54
 + allocation 55
 + allocation 56
 + allocation 57
 + allocation 58
 + allocation 59
 + allocation 60
 + allocation 61
 + allocation 62
 + allocation 63
 + allocation 64
 + allocation 65
 + allocation 66
 + allocation 67
 + allocation 68
 + allocation 69
 + allocation 70
 + allocation 71
 + allocation 72
 + allocation 73
 + allocation 74
 + allocation 75
 + allocation 76
 + allocation 77
 + allocation 78
 + allocation 79
 + allocation 80
 + allocation 81
 + allocation 82
 + allocation 83
 + allocation 84
 + allocation 85
 + allocation 86
 + allocation 87
 + allocation 88
 + allocation 89
 + allocation 90
 + allocation 91
 + allocation 92
 + allocation 93
 + allocation 94
 + allocation 95
 + allocation 96
 + allocation 97
 + allocation 98
 + allocation 99

test_malloc:116454 terminated with signal 11 at PC=2ae87e750a0e SP=7fffbdc4aa80.  Backtrace:

test_malloc:116456 terminated with signal 11 at PC=2b432f382a0e SP=7fffd024a820.  Backtrace:

test_malloc:116455 terminated with signal 11 at PC=2b8d22b46a0e SP=7fff916b67c0.  Backtrace:

test_malloc:116457 terminated with signal 11 at PC=2b1aeb8aea0e SP=7fff6490d480.  Backtrace:
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2b432f382a0e]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2ae87e750a0e]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2b432f373869]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2b1aeb8aea0e]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2b8d22b46a0e]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2b1aeb89f869]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2ae87e741869]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2b432f37a4ef]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2b8d22b37869]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2b432f462a01]
./test_malloc[0x4023ca]
./test_malloc[0x401f7a]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2ae87e7484ef]
./test_malloc[0x401ee8]
./test_malloc[0x401dfc]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2b1aeb8a64ef]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2b8d22b3e4ef]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
./test_malloc[0x401bc9]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2ae87e830a01]
./test_malloc[0x4023ca]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2b1aeb98ea01]
./test_malloc[0x4023ca]
./test_malloc[0x401f7a]
./test_malloc[0x401ee8]
./test_malloc[0x401dfc]
./test_malloc[0x401f7a]
./test_malloc[0x401ee8]
./test_malloc[0x401dfc]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2b8d22c26a01]
./test_malloc[0x4023ca]
./test_malloc[0x401f7a]
./test_malloc[0x401ee8]
./test_malloc[0x401dfc]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
./test_malloc[0x401bc9]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
./test_malloc[0x401bc9]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
./test_malloc[0x401bc9]

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 116456 RUNNING AT blogin2
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================


[jhammond at blogin2 tests]$ export MV2_SHMEM_COLL_NUM_COMM=10
[jhammond at blogin2 tests]$ /home/jhammond/MPI/gcc482-mv2trunk/bin/mpiexec -n 4 ./test_malloc
Starting ARMCI memory allocation test with 4 processes
 + allocation 0
 + allocation 1
 + allocation 2
 + allocation 3
 + allocation 4
 + allocation 5
 + allocation 6
 + allocation 7
 + allocation 8
 + allocation 9
 + allocation 10

test_malloc:116472 terminated with signal 11 at PC=2ba320df5a0e SP=7fff1d1d1c10.  Backtrace:

test_malloc:116473 terminated with signal 11 at PC=2adc34339a0e SP=7fff87299800.  Backtrace:

test_malloc:116474 terminated with signal 11 at PC=2ba62c9dca0e SP=7fff5457a550.  Backtrace:

test_malloc:116475 terminated with signal 11 at PC=2b6069991a0e SP=7ffff4ba8b70.  Backtrace:
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2b6069991a0e]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2ba320df5a0e]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2adc34339a0e]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(+0xb9a0e)[0x2ba62c9dca0e]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2b6069982869]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2b60699894ef]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2ba320de6869]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2adc3432a869]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPIDI_CH3U_Win_allocate+0x79)[0x2ba62c9cd869]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2b6069a71a01]
./test_malloc[0x4023ca]
./test_malloc[0x401f7a]
./test_malloc[0x401ee8]
./test_malloc[0x401dfc]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
./test_malloc[0x401bc9]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2ba62c9d44ef]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2ba320ded4ef]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(MPID_Win_allocate+0x9f)[0x2adc343314ef]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2ba320ed5a01]
./test_malloc[0x4023ca]
./test_malloc[0x401f7a]
./test_malloc[0x401ee8]
./test_malloc[0x401dfc]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2adc34419a01]
./test_malloc[0x4023ca]
./test_malloc[0x401f7a]
./test_malloc[0x401ee8]
./test_malloc[0x401dfc]
/home/jhammond/MPI/gcc482-mv2trunk/lib/libmpich.so.12(PMPI_Win_allocate+0x221)[0x2ba62cabca01]
./test_malloc[0x4023ca]
./test_malloc[0x401f7a]
./test_malloc[0x401ee8]
./test_malloc[0x401dfc]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
./test_malloc[0x401bc9]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
./test_malloc[0x401bc9]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b6ea1ed1d]
./test_malloc[0x401bc9]

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 116475 RUNNING AT blogin2
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

-- 
Jeff Hammond
jeff.science at gmail.com


