[mvapich-discuss] Help problem MPI_Bcast fails on np=8 with 8MB buffer

Terrence.LIAO at total.com Terrence.LIAO at total.com
Thu Aug 21 17:55:56 EDT 2008


Dear mvapich,

I get a core dump when calling MPI_Bcast(buffer, n, MPI_DOUBLE, ...) with
n = 1024*1024, i.e. an 8 MB buffer, on np=8 across 8 compute nodes. I have
NO problem when using np=7. I am using mvapich-1.0 (the Feb 28 2008
download) on an AMD cluster: quad-core, dual-socket nodes with 16 GB of
memory and 4x DDR IB. mvapich was built with the PGI 7.1 compiler. Below is
the gdb output. Any suggestions on how to fix this problem?
Thank you very much.  -- Terrence


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182894245856 (LWP 18383)]
0x00000036d80723e3 in memcpy () from /lib64/tls/libc.so.6
(gdb) where
#0  0x00000036d80723e3 in memcpy () from /lib64/tls/libc.so.6
#1  0x0000000000449c09 in MPID_VIA_self_start (buf=0x2a96546010, len=8388608, src_lrank=0, tag=2,
    context_id=0, shandle=0x57a1e8) at viasend.c:276
#2  0x000000000044c205 in MPID_IsendContig (comm_ptr=0x5a2060, buf=0x2a96546010, len=8388608,
    src_lrank=0, tag=2, context_id=0, dest_grank=0, msgrep=MPID_MSGREP_RECEIVER, request=0x57a1e8,
    error_code=0x7fbfffe66c) at mpid_send.c:84
#3  0x0000000000435cfd in MPID_IsendDatatype (comm_ptr=0x5a2060, buf=0x2a96546010, count=1048576,
    dtype_ptr=0x56ac60, src_lrank=0, tag=2, context_id=0, dest_grank=0, request=0x57a1e8,
    error_code=0x7fbfffe66c) at mpid_hsend.c:129
#4  0x0000000000443215 in PMPI_Isend (buf=0x2a96546010, count=1048576, datatype=11, dest=0, tag=2,
    comm=91, request=0x7fbfffe710) at isend.c:97
#5  0x0000000000444710 in PMPI_Sendrecv (sendbuf=0x2a96546010, sendcount=1048576, sendtype=11,
    dest=0, sendtag=2, recvbuf=0x2a96d4bc00, recvcount=1048576, recvtype=11, source=0, recvtag=2,
    comm=91, status=0x7fbfffe820) at sendrecv.c:95
#6  0x000000000041c355 in intra_shmem_Bcast_Large (buffer=0x2a96546010, count=1048576,
    datatype=0x56ac60, nbytes=8388608, root=0, comm=0x5a2060) at intra_fns_new.c:1704
#7  0x000000000041b6b4 in intra_Bcast_Large (buffer=0x2a96546010, count=1048576, datatype=0x56ac60,
    nbytes=8388608, root=0, comm=0x5a2060) at intra_fns_new.c:1309
#8  0x000000000041b157 in intra_newBcast (buffer=0x2a96546010, count=1048576, datatype=0x56ac60,
    root=0, comm=0x5a2060) at intra_fns_new.c:1117
#9  0x0000000000412008 in PMPI_Bcast (buffer=0x2a96546010, count=1048576, datatype=11, root=0,
    comm=91) at bcast.c:122
#10 0x00000000004042de in main (argc=2, argv=0x7fbfffee98) at large-mpi_bcast_test.c:159
(gdb)



