[mvapich-discuss] mvapich2 error
Bharat
mbkumar at gmail.com
Sat Sep 27 16:37:43 EDT 2008
Hi All,
After several days of trying various things, I am posting my problem. We
have a 16-node cluster (dual-processor, quad-core Intel Xeon, 16 GB RAM per
node) interconnected with InfiniBand. I am using mvapich2-1.2RC2, and I
am running an application compiled with ifort 10.1.017 and Intel MKL
10.0.1.014 (ScaLAPACK and BLACS taken from the Intel libraries). The program
runs fine for some time and then stops with an error message like this:
siesta: ==============================
Begin CG move = 15
==============================
siesta: iscf   Eharris(eV)      E_KS(eV)   FreeEng(eV)   dDmax   Ef(eV)
siesta:    1  -110464.5442  -110476.9339  -110477.1312  0.1268  -4.4928
siesta:    2  -110507.6684  -110459.2304  -110459.4392  0.3223  -5.8411
siesta:    3  -110463.9960  -110472.4056  -110472.5206  0.0867  -4.6470
Fatal error in MPI_Bcast:
Message truncated, error stack:
MPI_Bcast(1144)...................: MPI_Bcast(buf=0x20c0fe0, count=1,
dtype=USER<vector>, root=2, comm=0xc4000006) failed
MPIR_Bcast(228)...................:
MPIDI_CH3U_Receive_data_found(254): Message from rank 0 and tag 2
truncated; 31744 bytes received but buffer size is 1600
rank 5 in job 27 master_39065 caused collective abort of all ranks
exit status of rank 5: killed by signal 9
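For anyone reading the log, the numbers in the truncation message above can be pulled out with a short script. This is only a log-reading sketch: the helper name parse_truncation is hypothetical, and the pattern assumes the message wording shown in this particular error stack.

```python
import re

# Pattern for the truncation line as it appears in the error stack above.
# \s+ is used so the match survives the line wrap in the pasted log.
TRUNC_RE = re.compile(
    r"Message from rank (\d+) and tag (\d+)\s+truncated;\s+"
    r"(\d+) bytes received but buffer size is (\d+)"
)


def parse_truncation(log_text):
    """Return sender rank, tag, and byte counts from a truncation message, or None."""
    m = TRUNC_RE.search(log_text)
    if m is None:
        return None
    rank, tag, received, buffer_size = map(int, m.groups())
    return {
        "sender_rank": rank,
        "tag": tag,
        "bytes_received": received,
        "buffer_bytes": buffer_size,
        # How many bytes did not fit into the posted receive buffer.
        "excess_bytes": received - buffer_size,
    }


log = """MPIDI_CH3U_Receive_data_found(254): Message from rank 0 and tag 2
truncated; 31744 bytes received but buffer size is 1600"""

info = parse_truncation(log)
print(info)
```

The incoming message (31744 bytes) is much larger than the buffer the receiving rank posted (1600 bytes). In MPI this kind of truncation is typically seen when ranks disagree on the count or datatype passed to the same collective call, so that is one thing worth checking in the application; the script itself only reads the log and does not establish the cause.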
I tried different compiler flags and also tried gfortran, but the problem
is still present, so I think the error is related to mvapich2. I am new
to mvapich2, so could someone please help me solve this issue?
I did only a default install of mvapich2 (i.e., ./configure CC=... F90=...,
make, make install). Do I have to set any environment variables? I used
the -heap_arrays option when compiling to work around a stack size issue.
The output of ibstatus is:
Infiniband device 'mthca0' port 1 status:
default gid: fe80:0000:0000:0000:0002:c902:0027:da55
base lid: 0x13
sm lid: 0x13
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 20 Gb/sec (4X DDR)
The output of ibv_devinfo is:
hca_id: mthca0
fw_ver: 1.2.0
node_guid: 0002:c902:0027:da54
sys_image_guid: 0002:c902:0027:da57
vendor_id: 0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id: MT_03B0150002
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 19
port_lid: 19
port_lmc: 0x00
Thanks,
Bharat