[mvapich-discuss] mvapich2 error
Lei Chai
chai.15 at osu.edu
Mon Sep 29 21:14:06 EDT 2008
Hi Bharat,
Thanks for reporting the problem. Since we don't have a license for
siesta, we are not able to run it on our cluster. Could you try the
following and let us know the results:
- Use the option MV2_USE_SHMEM_COLL=0
e.g. $ mpirun_rsh -np N -hostfile ./hosts MV2_USE_SHMEM_COLL=0 ./prog
- Try to run the program with MPICH2-1.0.7, since mvapich2-1.2rc2 is
based on MPICH2-1.0.7
This will help us get more insight into the problem.
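For background on the "Message truncated" error in the quoted log: the MPI standard requires that the (count, datatype) pair passed to MPI_Bcast on every rank has the same type signature as at the root, so each non-root buffer (count x type size) must hold exactly what the root sends. A rank whose buffer is smaller fails with a truncation error. The sketch below models only that size arithmetic in plain Python, with no MPI library. It assumes 8-byte doubles, and the vector layouts (62 blocks of 64 and 20 blocks of 10) are hypothetical, picked only to reproduce the 31744 and 1600 byte counts. They are not siesta's actual datatypes.

```python
# Plain-Python model of the size check behind MPI's "Message truncated"
# error. No MPI library is used; the layouts below are hypothetical.

DOUBLE_SIZE = 8  # assumed bytes per double precision element


def vector_type_size(count: int, blocklength: int, base_size: int = DOUBLE_SIZE) -> int:
    """Bytes of data described by MPI_Type_vector(count, blocklength, stride, base).

    Only the elements themselves count; the gaps implied by the stride do
    not, which is what MPI_Type_size reports for such a type.
    """
    return count * blocklength * base_size


def bcast_recv_ok(root_bytes: int, recv_count: int, recv_type_size: int) -> bool:
    """True if a non-root buffer (count x type size) can hold the root's data."""
    return root_bytes <= recv_count * recv_type_size


# Hypothetical mismatch: the root describes 62 blocks of 64 doubles,
# while a receiving rank built its vector type with 20 blocks of 10.
root_bytes = 1 * vector_type_size(62, 64)  # bytes the root broadcasts
recv_type = vector_type_size(20, 10)       # bytes one receive element holds

print(root_bytes, recv_type, bcast_recv_ok(root_bytes, 1, recv_type))
# prints: 31744 1600 False
```

If every rank builds the vector type from the same dimensions, `bcast_recv_ok(root_bytes, 1, vector_type_size(62, 64))` returns True. A mismatch like the one modeled here can come either from the application or from a bug in the library's collective path, which is what the two experiments above are meant to tell apart.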
Thanks,
Lei
Bharat wrote:
> Hi All,
>
> After several days of trying various things, I am posting my problem.
> We have a 16-node cluster of dual-processor, quad-core Intel Xeon nodes
> with 16 GB RAM per node, interconnected with InfiniBand. I am using
> mvapich2-1.2RC2, and I am running an application compiled with ifort
> 10.1.017 and Intel MKL 10.0.1.014 (ScaLAPACK and BLACS taken from the
> Intel libraries). The program runs fine for some time and then stops with
> an error message like this:
>
> siesta: ==============================
> Begin CG move = 15
> ==============================
>
>
> siesta: iscf Eharris(eV) E_KS(eV) FreeEng(eV) dDmax Ef(eV)
> siesta: 1 -110464.5442 -110476.9339 -110477.1312 0.1268 -4.4928
> siesta: 2 -110507.6684 -110459.2304 -110459.4392 0.3223 -5.8411
> siesta: 3 -110463.9960 -110472.4056 -110472.5206 0.0867 -4.6470
> Fatal error in MPI_Bcast:
> Message truncated, error stack:
> MPI_Bcast(1144)...................: MPI_Bcast(buf=0x20c0fe0, count=1,
> dtype=USER<vector>, root=2, comm=0xc4000006) failed
> MPIR_Bcast(228)...................:
> MPIDI_CH3U_Receive_data_found(254): Message from rank 0 and tag 2
> truncated; 31744 bytes received but buffer size is 1600
> rank 5 in job 27 master_39065 caused collective abort of all ranks
> exit status of rank 5: killed by signal 9
>
> I tried different compiler flags and also tried gfortran, but the
> problem persists, so I suspect the error is related to mvapich2.
> I am new to mvapich2, so could someone please help me solve this issue?
> I did only a default install of mvapich2 (i.e., ./configure CC=...
> F90=..., make, make install). Do I have to set any environment
> variables? I compiled with the -heap_arrays option to work around
> the stack size issue.
> The output of ibstatus is
>
> Infiniband device 'mthca0' port 1 status:
> default gid: fe80:0000:0000:0000:0002:c902:0027:da55
> base lid: 0x13
> sm lid: 0x13
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 20 Gb/sec (4X DDR)
>
> The output of ibv_devinfo is
> hca_id: mthca0
> fw_ver: 1.2.0
> node_guid: 0002:c902:0027:da54
> sys_image_guid: 0002:c902:0027:da57
> vendor_id: 0x02c9
> vendor_part_id: 25204
> hw_ver: 0xA0
> board_id: MT_03B0150002
> phys_port_cnt: 1
> port: 1
> state: PORT_ACTIVE (4)
> max_mtu: 2048 (4)
> active_mtu: 2048 (4)
> sm_lid: 19
> port_lid: 19
> port_lmc: 0x00
>
>
> Thanks,
> Bharat
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss