[mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.

Mike Heinz michael.heinz at qlogic.com
Thu Jul 16 15:06:54 EDT 2009


Krishna - just to be clear, this isn't just a problem with the bw program; it happens with every MPI program we've tested on these machines. For example, when testing osu_bw, it hangs in Waitall(), polling the CQ, similar to the way bw hung in Barrier().
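
For reference, the pattern it is stuck in is just the usual windowed transfer that a bandwidth test does. Here is a minimal sketch (my own simplification, not the actual osu_bw source; the window and message sizes are arbitrary):

    /* Simplified bandwidth-test loop: rank 0 posts a window of nonblocking
     * sends, rank 1 posts matching receives, and both wait in MPI_Waitall. */
    #include <mpi.h>
    #include <stdlib.h>

    #define WINDOW  64
    #define MSGSIZE (1 << 20)

    int main(int argc, char **argv)
    {
        int rank, i;
        char *buf;
        MPI_Request req[WINDOW];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = malloc((size_t)WINDOW * MSGSIZE);

        if (rank == 0) {
            /* Sender: post a window of nonblocking sends to rank 1. */
            for (i = 0; i < WINDOW; i++)
                MPI_Isend(buf + (size_t)i * MSGSIZE, MSGSIZE, MPI_CHAR, 1, 100,
                          MPI_COMM_WORLD, &req[i]);
            /* On the affected node pairs, this Waitall never returns. */
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
        } else if (rank == 1) {
            /* Receiver: post matching nonblocking receives from rank 0. */
            for (i = 0; i < WINDOW; i++)
                MPI_Irecv(buf + (size_t)i * MSGSIZE, MSGSIZE, MPI_CHAR, 0, 100,
                          MPI_COMM_WORLD, &req[i]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }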

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Krishna Chaitanya Kandalla [mailto:kandalla at cse.ohio-state.edu]
Sent: Thursday, July 16, 2009 1:04 PM
To: Mike Heinz
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.

Mike,
Can you also try out mvapich2-1.4 RC1? We have added a bunch
of enhancements and bug fixes in this version.

Thanks,
Krishna

Mike Heinz wrote:
> mvapich-1.1.0-3355.src.rpm
>
> mvapich2-1.2p1-1.src.rpm
>
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> -----Original Message-----
> From: Krishna Chaitanya Kandalla [mailto:kandalla at cse.ohio-state.edu]
> Sent: Thursday, July 16, 2009 12:34 PM
> To: Mike Heinz
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
>
> Mike,
>           Can you also let us know the version numbers of the mvapich2
> and mvapich1 stacks that you are using?
>
> Thanks,
> Krishna
>
> Mike Heinz wrote:
>
>> Krishna,
>>
>> What I'm saying is that if I run the program between A & D or A & C it works, but if I run it between A & B it silently hangs, never making progress. Meanwhile, I can run the same program between C & B and C & A, but runs between C & D silently hang without making progress. This problem only occurs with mvapich2, not with mvapich1 or Open MPI. All other InfiniBand operations appear to be working normally.
>>
>> This behavior is repeatable for those two pairs of machines (A & B, and C & D), but has not been seen on any other machines on the fabric, and we have not seen it on any other fabric. If I had to guess, there's some kind of timing hole being exposed under very narrow conditions.
>>
>> The fabric in question is actually used to test software before we release it, so it contains a mix of Linux distros, but all machines are x86_64.
>>
>> For the stack traces I sent you, node 0 is an 8-way Xeon E5320 at 1.86 GHz and node 1 is a 2-way Opteron running at 2.4 GHz.
>>
>> I realize the symptoms are quite bizarre - we've had several InfiniBand developers and testers investigating this for a couple of weeks now - but I was hoping you might be able to suggest a line of investigation.
>>
>> --
>> Michael Heinz
>> Principal Engineer, Qlogic Corporation
>> King of Prussia, Pennsylvania
>>
>> -----Original Message-----
>> From: Krishna Chaitanya Kandalla [mailto:kandalla at cse.ohio-state.edu]
>> Sent: Wednesday, July 15, 2009 7:19 PM
>> To: Mike Heinz
>> Cc: mvapich-discuss at cse.ohio-state.edu; Todd Rimmer
>> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
>>
>> Mike,
>>         I guess I had mistakenly started the job on 3 processes earlier
>> and it hung. When run with 2 processes (the way it is supposed to be
>> run), it executes correctly on our machines. Can you give us some more
>> information about your hardware? You were speaking about reachability
>> issues between certain pairs of nodes. I am guessing that you are
>> running tests on either:
>> 1. Nodes "A" and "D", or
>> 2. Nodes "B" and "C"
>>
>>        Also,
>>  >  "A" can't run mvapich2 programs with machine "B", and machine "C"
>> can't run programs with machine "D"
>>
>>        What exactly is the error message that you see in this case?
>>
>> Thanks,
>> Krishna
>>
>> Krishna Chaitanya Kandalla wrote:
>>
>>
>>> Mike,
>>>           Thank you for providing the source code. I am able to
>>> reproduce the hang on our cluster, as well. I will look into the issue.
>>>
>>> Thanks,
>>> Krishna
>>>
>>> Mike Heinz wrote:
>>>
>>>
>>>> I was wondering about that - I passed the parameter in a param file,
>>>> using the -param argument to mpirun_rsh. I just tried passing it
>>>> inline as well; here are the results:
>>>>
>>>> mpiexec -env MV2_USE_SHMEM_COLL 0 -np 2
>>>> /opt/iba/src/mpi_apps/bandwidth/bw 10 10
>>>>
>>>> Node 0:
>>>>
>>>> Loaded symbols for /lib64/libnss_files.so.2
>>>> 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt ()
>>>>    from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1
>>>> (gdb) where
>>>> #0  0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt ()
>>>>    from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1
>>>> #1  0x00002aaaaab17536 in MPIDI_CH3I_Progress (is_blocking=1, state=0x1)
>>>>     at ch3_progress.c:174
>>>> #2  0x00002aaaaab98e14 in PMPI_Recv (buf=0xc50000, count=4,
>>>>     datatype=1275068673, source=1, tag=101, comm=1140850688,
>>>> status=0x601520)
>>>>     at recv.c:156
>>>> #3  0x0000000000400ea8 in main (argc=3, argv=0x7ffffe2de508) at bw.c:91
>>>>
>>>>
>>>> Node 1:
>>>>
>>>> (gdb) where
>>>> #0  0x00002b9af218cd80 in mthca_poll_cq (ibcq=0xf5de80, ne=1,
>>>>     wc=0x7fffb9786a60) at src/cq.c:470
>>>> #1  0x00002b9af14ee2a8 in MPIDI_CH3I_MRAILI_Cq_poll (
>>>>     vbuf_handle=0x7fffb9786b78, vc_req=0xf55d00, receiving=0,
>>>> is_blocking=1)
>>>>     at /usr/include/infiniband/verbs.h:934
>>>> #2  0x00002b9af14ef2e5 in MPIDI_CH3I_MRAILI_Waiting_msg (vc=0xf55d00,
>>>>     vbuf_handle=0x7fffb9786b78, blocking=1) at ibv_channel_manager.c:468
>>>> #3  0x00002b9af14a8304 in MPIDI_CH3I_read_progress
>>>> (vc_pptr=0x7fffb9786b80,
>>>>     v_ptr=0x7fffb9786b78, is_blocking=<value optimized out>)
>>>>     at ch3_read_progress.c:158
>>>> #4  0x00002b9af14a7f44 in MPIDI_CH3I_Progress (is_blocking=1,
>>>>     state=<value optimized out>) at ch3_progress.c:202
>>>> #5  0x00002b9af14ec60e in MPIC_Wait (request_ptr=0xfc7978) at
>>>> helper_fns.c:269
>>>> #6  0x00002b9af14eca03 in MPIC_Sendrecv (sendbuf=0x0, sendcount=0,
>>>>     sendtype=1275068685, dest=0, sendtag=1, recvbuf=0x0, recvcount=0,
>>>>     recvtype=1275068685, source=0, recvtag=1, comm=1140850688,
>>>> status=0x1)
>>>>     at helper_fns.c:125
>>>> #7  0x00002b9af149b07a in MPIR_Barrier (comm_ptr=<value optimized out>)
>>>>     at barrier.c:82
>>>> #8  0x00002b9af149b698 in PMPI_Barrier (comm=1140850688) at
>>>> barrier.c:446
>>>> #9  0x0000000000400ea3 in main (argc=3, argv=0x7fffb9786e88) at bw.c:81
>>>>
>>>> bw.c is the old "bandwidth" benchmark. It looks like it actually gets
>>>> out of MPI_Init() in this case, but then one side is waiting at a
>>>> barrier while the other has already gone past the barrier. I've
>>>> attached a copy of the program.
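>>>>
>>>> For what it's worth, the shape of the code around those two line numbers
>>>> is roughly the following (my own simplified sketch, with the count and tag
>>>> values read off the gdb trace; the attached bw.c is the real source):
>>>>
>>>>     /* Minimal illustration of the observed state: both ranks hit a
>>>>      * barrier, then rank 0 waits in MPI_Recv for data from rank 1. */
>>>>     #include <mpi.h>
>>>>
>>>>     int main(int argc, char **argv)
>>>>     {
>>>>         int rank;
>>>>         char buf[4];
>>>>         MPI_Status status;
>>>>
>>>>         MPI_Init(&argc, &argv);
>>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>
>>>>         MPI_Barrier(MPI_COMM_WORLD);   /* rank 1 is parked here (bw.c:81)    */
>>>>         if (rank == 0)                 /* rank 0 is already past the barrier */
>>>>             MPI_Recv(buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, &status);  /* bw.c:91 */
>>>>         else if (rank == 1)
>>>>             MPI_Send(buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD);
>>>>
>>>>         MPI_Finalize();
>>>>         return 0;
>>>>     }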
>>>>
>>>>
>>>> --
>>>> Michael Heinz
>>>> Principal Engineer, Qlogic Corporation
>>>> King of Prussia, Pennsylvania
>>>> -----Original Message-----
>>>> From: Krishna Chaitanya Kandalla [mailto:kandalla at cse.ohio-state.edu]
>>>> Sent: Wednesday, July 15, 2009 3:42 PM
>>>> To: Mike Heinz
>>>> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging
>>>> a problem that only affects a few machines in our cluster.
>>>>
>>>> Mike,
>>>> That's a little surprising. Setting this variable to 0 ensures that a
>>>> particular flag is set to 0. This flag is supposed to guard the piece
>>>> of code that does the 2-level communicator creation. Just out of
>>>> curiosity, can you also let me know the command that you are using to
>>>> launch the job? The env variables need to be set before the
>>>> executable is specified. If MV2_USE_SHMEM_COLL=0 appears after the
>>>> executable name, the job launcher might not pick it up.
>>>>
>>>> Thanks,
>>>> Krishna
>>>>
>>>>
>>>>
>>>>
>>>> Mike Heinz wrote:
>>>>
>>>>
>>>>
>>>>> Krishna, thanks for the suggestion - but setting MV2_USE_SHMEM_COLL
>>>>> to zero did not seem to change the stack trace much:
>>>>>
>>>>> Node 0:
>>>>>
>>>>> 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll (vbuf_handle=0x7fffcb46d698,
>>>>>     vc_req=0x0, receiving=0, is_blocking=1) at ibv_channel_manager.c:529
>>>>> 529         for (; i < rdma_num_hcas; ++i) {
>>>>> (gdb) where
>>>>> #0  0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll (
>>>>>     vbuf_handle=0x7fffcb46d698, vc_req=0x0, receiving=0, is_blocking=1)
>>>>>     at ibv_channel_manager.c:529
>>>>> #1  0x00002aaaaab177fa in MPIDI_CH3I_read_progress (vc_pptr=0x7fffcb46d6a0,
>>>>>     v_ptr=0x7fffcb46d698, is_blocking=1) at ch3_read_progress.c:143
>>>>> #2  0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1,
>>>>>     state=<value optimized out>) at ch3_progress.c:202
>>>>> #3  0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800)
>>>>>     at helper_fns.c:269
>>>>> #4  0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x10993a80, sendcount=2,
>>>>>     sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x10993a88, recvcount=2,
>>>>>     recvtype=1275069445, source=1, recvtag=7, comm=1140850688,
>>>>>     status=0x7fffcb46d820) at helper_fns.c:125
>>>>> #5  0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=<value optimized out>,
>>>>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0x10993a80,
>>>>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0)
>>>>>     at allgather.c:192
>>>>> #6  0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>>>>     sendcount=2, sendtype=1275069445, recvbuf=0x10993a80, recvcount=2,
>>>>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>>>>> #7  0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, key=0,
>>>>>     newcomm=0x2aaaaae1c2f4) at comm_split.c:196
>>>>> #8  0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2,
>>>>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>>>>> #9  0x00002aaaaab6877d in PMPI_Init (argc=0x7fffcb46db3c, argv=0x7fffcb46db30)
>>>>>     at init.c:146
>>>>> #10 0x0000000000400b2f in main (argc=3, argv=0x7fffcb46dc78) at bw.c:27
>>>>>
>>>>> Node 1:
>>>>>
>>>>> MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48,
>>>>>     is_blocking=1) at ch3_read_progress.c:143
>>>>> 143         type = MPIDI_CH3I_MRAILI_Cq_poll(v_ptr, NULL, 0, is_blocking);
>>>>> (gdb) where
>>>>> #0  MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48,
>>>>>     is_blocking=1) at ch3_read_progress.c:143
>>>>> #1  0x00002afc9fb21f44 in MPIDI_CH3I_Progress (is_blocking=1,
>>>>>     state=<value optimized out>) at ch3_progress.c:202
>>>>> #2  0x00002afc9fb6660e in MPIC_Wait (request_ptr=0x2afc9fd242a0)
>>>>>     at helper_fns.c:269
>>>>> #3  0x00002afc9fb66a03 in MPIC_Sendrecv (sendbuf=0xf77028, sendcount=2,
>>>>>     sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf77020, recvcount=4,
>>>>>     recvtype=1275069445, source=0, recvtag=7, comm=1140850688,
>>>>>     status=0x7fff0b10bcd0) at helper_fns.c:125
>>>>> #4  0x00002afc9fb08ddb in MPIR_Allgather (sendbuf=<value optimized out>,
>>>>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0xf77020,
>>>>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2afc9fd26c80)
>>>>>     at allgather.c:192
>>>>> #5  0x00002afc9fb09a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>>>>     sendcount=2, sendtype=1275069445, recvbuf=0xf77020, recvcount=2,
>>>>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>>>>> #6  0x00002afc9fb4591b in PMPI_Comm_split (comm=1140850688, color=1, key=0,
>>>>>     newcomm=0x2afc9fd26d94) at comm_split.c:196
>>>>> #7  0x00002afc9fb478f4 in create_2level_comm (comm=1140850688, size=2,
>>>>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>>>>> #8  0x00002afc9fb730a5 in PMPI_Init (argc=0x7fff0b10bfec, argv=0x7fff0b10bfe0)
>>>>>     at init.c:146
>>>>> #9  0x0000000000400bcf in main (argc=3, argv=0x7fff0b10c128) at bw.c:27
>>>>>
>>>>> Any suggestions would be appreciated.
>>>>>
>>>>> --
>>>>> Michael Heinz
>>>>> Principal Engineer, Qlogic Corporation
>>>>> King of Prussia, Pennsylvania
>>>>>
>>>>> From: kris.c1986 at gmail.com [mailto:kris.c1986 at gmail.com] On
>>>>> Behalf Of Krishna Chaitanya
>>>>> Sent: Tuesday, July 14, 2009 6:39 PM
>>>>> To: Mike Heinz
>>>>> Cc: Todd Rimmer; mvapich-discuss at cse.ohio-state.edu;
>>>>> mpich2-dev at mcs.anl.gov
>>>>> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in
>>>>> debugging a problem that only affects a few machines in our cluster.
>>>>>
>>>>> Mike,
>>>>> The hang seems to be occurring when the MPI library is trying to
>>>>> create the 2-level communicator during the init phase. Can you try
>>>>> running the test with MV2_USE_SHMEM_COLL=0
>>>>> (<http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-16000011.74>)?
>>>>> This will ensure that a flat communicator is used for the subsequent
>>>>> MPI calls. This might help us isolate the problem.
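>>>>>
>>>>> (For context: the 2-level communicator creation is essentially an
>>>>> MPI_Comm_split of MPI_COMM_WORLD into one communicator per node, plus a
>>>>> leader communicator. The sketch below only illustrates that idea with
>>>>> standard MPI calls and a hypothetical hostname_color() helper; it is not
>>>>> the actual create_2level_comm code.)
>>>>>
>>>>>     #include <mpi.h>
>>>>>     #include <string.h>
>>>>>     #include <unistd.h>
>>>>>
>>>>>     /* Hypothetical helper: derive a non-negative color from the hostname
>>>>>      * so that ranks on the same node get the same color. A real
>>>>>      * implementation exchanges full hostnames to rule out collisions. */
>>>>>     static int hostname_color(void)
>>>>>     {
>>>>>         char name[256];
>>>>>         unsigned int h = 5381;
>>>>>         size_t i;
>>>>>
>>>>>         gethostname(name, sizeof(name));
>>>>>         name[sizeof(name) - 1] = '\0';
>>>>>         for (i = 0; i < strlen(name); i++)
>>>>>             h = h * 33u + (unsigned char)name[i];   /* simple string hash */
>>>>>         return (int)(h & 0x7fffffff);
>>>>>     }
>>>>>
>>>>>     /* Split COMM_WORLD into per-node (shared-memory) communicators. */
>>>>>     static void make_node_comm(MPI_Comm *node_comm)
>>>>>     {
>>>>>         int rank;
>>>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>         MPI_Comm_split(MPI_COMM_WORLD, hostname_color(), rank, node_comm);
>>>>>     }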
>>>>>
>>>>> Thanks,
>>>>> Krishna
>>>>>
>>>>> On Tue, Jul 14, 2009 at 5:04 PM, Mike Heinz
>>>>> <michael.heinz at qlogic.com> wrote:
>>>>>
>>>>> We're having a very odd problem with our fabric, where, out of the
>>>>> entire cluster, machine "A" can't run mvapich2 programs with machine
>>>>> "B", and machine "C" can't run programs with machine "D" - even
>>>>> though "A" can run with "D" and "B" can run with "C" - and the rest
>>>>> of the fabric works fine.
>>>>>
>>>>> 1) There are no IB errors anywhere on the fabric that I can find,
>>>>> and the machines in question all work correctly with mvapich1 and
>>>>> low-level IB tests.
>>>>>
>>>>> 2) The problem occurs whether using mpd or rsh.
>>>>>
>>>>> 3) If I attach to the running processes, both machines appear to be
>>>>> waiting for a read operation to complete. (See below)
>>>>>
>>>>> Can anyone make a suggestion on how to debug this?
>>>>>
>>>>> Stack trace for node 0:
>>>>>
>>>>> #0  0x000000361160abb5 in pthread_spin_lock () from /lib64/libpthread.so.0
>>>>> #1  0x00002aaaab08fb6c in mthca_poll_cq (ibcq=0x2060980, ne=1,
>>>>>     wc=0x7fff9d835900) at src/cq.c:468
>>>>> #2  0x00002aaaaab5d8d8 in MPIDI_CH3I_MRAILI_Cq_poll (
>>>>>     vbuf_handle=0x7fff9d8359d8, vc_req=0x0, receiving=0, is_blocking=1)
>>>>>     at /usr/include/infiniband/verbs.h:934
>>>>> #3  0x00002aaaaab177fa in MPIDI_CH3I_read_progress (vc_pptr=0x7fff9d8359e0,
>>>>>     v_ptr=0x7fff9d8359d8, is_blocking=1) at ch3_read_progress.c:143
>>>>> #4  0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1,
>>>>>     state=<value optimized out>) at ch3_progress.c:202
>>>>> #5  0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800)
>>>>>     at helper_fns.c:269
>>>>> #6  0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x217fc50, sendcount=2,
>>>>>     sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x217fc58, recvcount=2,
>>>>>     recvtype=1275069445, source=1, recvtag=7, comm=1140850688,
>>>>>     status=0x7fff9d835b60) at helper_fns.c:125
>>>>> #7  0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=<value optimized out>,
>>>>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0x217fc50,
>>>>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0)
>>>>>     at allgather.c:192
>>>>> #8  0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>>>>     sendcount=2, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2,
>>>>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>>>>> #9  0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, key=0,
>>>>>     newcomm=0x2aaaaae1c2f4) at comm_split.c:196
>>>>> #10 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2,
>>>>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>>>>> #11 0x00002aaaaab6877d in PMPI_Init (argc=0x7fff9d835e7c, argv=0x7fff9d835e70)
>>>>>     at init.c:146
>>>>> #12 0x0000000000400b2f in main (argc=3, argv=0x7fff9d835fb8) at bw.c:27
>>>>>
>>>>> Stack trace for node 1:
>>>>>
>>>>> #0  0x00002ac3cbdac2d2 in MPIDI_CH3I_read_progress (vc_pptr=0x7fffdee81020,
>>>>>     v_ptr=0x7fffdee81018, is_blocking=1) at ch3_read_progress.c:143
>>>>> #1  0x00002ac3cbdabf44 in MPIDI_CH3I_Progress (is_blocking=1,
>>>>>     state=<value optimized out>) at ch3_progress.c:202
>>>>> #2  0x00002ac3cbdf060e in MPIC_Wait (request_ptr=0x2ac3cbfae2a0)
>>>>>     at helper_fns.c:269
>>>>> #3  0x00002ac3cbdf0a03 in MPIC_Sendrecv (sendbuf=0xf79028, sendcount=2,
>>>>>     sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf79020, recvcount=4,
>>>>>     recvtype=1275069445, source=0, recvtag=7, comm=1140850688,
>>>>>     status=0x7fffdee811a0) at helper_fns.c:125
>>>>> #4  0x00002ac3cbd92ddb in MPIR_Allgather (sendbuf=<value optimized out>,
>>>>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0xf79020,
>>>>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2ac3cbfb0c80)
>>>>>     at allgather.c:192
>>>>> #5  0x00002ac3cbd93a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>>>>     sendcount=2, sendtype=1275069445, recvbuf=0xf79020, recvcount=2,
>>>>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>>>>> #6  0x00002ac3cbdcf91b in PMPI_Comm_split (comm=1140850688, color=1, key=0,
>>>>>     newcomm=0x2ac3cbfb0d94) at comm_split.c:196
>>>>> #7  0x00002ac3cbdd18f4 in create_2level_comm (comm=1140850688, size=2,
>>>>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>>>>> #8  0x00002ac3cbdfd0a5 in PMPI_Init (argc=0x7fffdee814bc, argv=0x7fffdee814b0)
>>>>>     at init.c:146
>>>>> #9  0x0000000000400bcf in main (argc=3, argv=0x7fffdee815f8) at bw.c:27
>>>>>
>>>>> --
>>>>> Michael Heinz
>>>>> Principal Engineer, Qlogic Corporation
>>>>> King of Prussia, Pennsylvania
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> mvapich-discuss mailing list
>>>>> mvapich-discuss at cse.ohio-state.edu
>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> In the middle of difficulty, lies opportunity
>>>>>
>>>>>
>>>>>
>>>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>>
>>>
>>>
>>
>>
>
>
>


