[mvapich-discuss] mvapich2 IB problems with transfers over ~10KB.
Mike Houston
mhouston at graphics.stanford.edu
Mon Apr 2 15:03:59 EDT 2007
wei huang wrote:
> Hi Mike,
>
> Could you please run the IMB tests with the -DCHECK option (compile IMB
> with this flag)? This checks all collective operations with data verification.
>
Will do. I need to finish some large runs on the cluster with the
workarounds before I can run this.
> Also, what exactly do you mean by ``cannot reliably'' send messages? Do you
> see data corruption, or are there other error symptoms?
>
The messages get corrupted or are corrupting data in other windows
(windows as MPI_Win). We don't see this behavior with mpich2 over GigE
or IPoIB. The latter seems to be what we are generally seeing. This may
well be a bug on our side. We don't have issues with other MPI
implementations, but we could just be getting lucky.
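One way to tell corrupted messages apart from application bugs is to fill each send buffer with a deterministic pattern and check it on the receiving side. The following is only a minimal sketch of that idea. The helper names `fill_pattern` and `first_mismatch` and the pattern itself are hypothetical, and nothing here is taken from IMB or the OSU benchmarks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte expected at position i for a given seed.
 * The pattern repeats every 256 bytes, so it will not catch data that
 * is displaced by an exact multiple of 256 bytes. */
static uint8_t pattern_byte(uint32_t seed, size_t i)
{
    return (uint8_t)((seed * 31u + i * 7u) & 0xFFu);
}

/* Fill the send buffer before handing it to the MPI library. */
static void fill_pattern(uint8_t *buf, size_t len, uint32_t seed)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern_byte(seed, i);
}

/* On the receiver: return the index of the first wrong byte,
 * or len if the whole buffer matches the expected pattern. */
static size_t first_mismatch(const uint8_t *buf, size_t len, uint32_t seed)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if (buf[i] != pattern_byte(seed, i))
            return i;
    return len;
}
```

Using a different seed per message (for example the tag or an iteration counter) makes it easier to see whether a bad buffer holds stale data from an earlier transfer or data that belongs to another window.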
> Thanks.
>
> Regards,
> Wei Huang
>
> 774 Dreese Lab, 2015 Neil Ave,
> Dept. of Computer Science and Engineering
> Ohio State University
> OH 43210
> Tel: (614)292-8501
>
>
> On Mon, 2 Apr 2007, Mike Houston wrote:
>
>
>> wei huang wrote:
>>
>>> Hi Mike,
>>>
>>> Thanks for letting us know about the problem. However, to help us understand
>>> what is going on, would you please let us know the following?
>>>
>>> 1) Which version of mvapich2 are you using? The latest release version now
>>> should be mvapich2-0.9.8.
>>>
>>>
>> Yes, 0.9.8
>>
>>> 2) Could you actually try running osu_benchmarks and see if they all pass
>>> on your system? The benchmarks are distributed with the package and are in
>>> the `osu_benchmarks' directory. You should not experience problems with
>>> them if your systems are set up correctly.
>>>
>>>
>> I'll give these a go, but they look like they don't verify results.
>>
>>> Thanks.
>>>
>>> Regards,
>>> Wei Huang
>>>
>>> 774 Dreese Lab, 2015 Neil Ave,
>>> Dept. of Computer Science and Engineering
>>> Ohio State University
>>> OH 43210
>>> Tel: (614)292-8501
>>>
>>>
>>>
>>>
>>>> ---------- Forwarded message ----------
>>>> Date: Sun, 01 Apr 2007 03:03:31 -0700
>>>> From: Mike Houston <mhouston at graphics.stanford.edu>
>>>> To: mvapich-discuss at cse.ohio-state.edu
>>>> Subject: [mvapich-discuss] mvapich2 IB problems with transfers over ~10KB.
>>>>
>>>> We've hit an odd snag with using mvapich2. We can't seem to reliably
>>>> send messages > 10KB. If we break up all large messages into 8KB blocks
>>>> and send, things work just fine, but as expected, performance is awful.
>>>> Under mpich2 with GigE and IPoIB, large messages seem to work just
>>>> fine. Both MPI_Send and MPI_Put seem to exhibit the same behavior. I
>>>> should note that the one oddity of our system implementation is that we
>>>> have a posted MPI_IRecv waiting while doing the large transfers.
>>>> Open-MPI flips out when we do this, even in tcp mode.
>>>>
>>>> We have PCI-X SDR 4X boards, running the latest IB Gold release (1.8.3)
>>>> on top of the latest RHEL4 SMP x86 kernel (32-bit). The boards have
>>>> slightly older firmware, 3.3.3, but I'm hesitant to flash up unless
>>>> there are known issues with that firmware... We built using the
>>>> defaults in make.mvapich2.vapi. Any suggestions on where to look or
>>>> what to update? It seems *very* odd that large transfers aren't working
>>>> for us...
>>>>
>>>> Thanks!
>>>>
>>>> -Mike
>>>> _______________________________________________
>>>> mvapich-discuss mailing list
>>>> mvapich-discuss at cse.ohio-state.edu
>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>
>>>>
>>>>
>>>
>>>
>
>
>
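The 8KB workaround mentioned in the original report (splitting every large message into small blocks) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used on the cluster. The callback type `send_fn`, the function `send_in_chunks`, and the in-memory `sink` are hypothetical names. In a real program the callback would wrap a blocking call such as MPI_Send.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback standing in for a blocking send such as MPI_Send.
 * It must return 0 on success and nonzero on failure. */
typedef int (*send_fn)(const void *data, size_t len, void *ctx);

/* Send len bytes in pieces of at most chunk bytes each.
 * Returns the number of send calls made, or -1 if a send failed. */
static int send_in_chunks(const void *data, size_t len, size_t chunk,
                          send_fn do_send, void *ctx)
{
    assert(chunk > 0);
    const unsigned char *p = data;
    int calls = 0;
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;  /* last piece may be short */
        if (do_send(p, n, ctx) != 0)
            return -1;
        p += n;
        len -= n;
        calls++;
    }
    return calls;
}

/* Stand-in transport for trying the sketch without MPI:
 * appends every piece to a memory buffer. */
struct sink {
    unsigned char *dst;
    size_t used;
};

static int sink_send(const void *data, size_t len, void *ctx)
{
    struct sink *s = ctx;
    memcpy(s->dst + s->used, data, len);
    s->used += len;
    return 0;
}
```

With a chunk size of 8192, a 20000-byte message goes out as three sends (8192, 8192 and 3616 bytes). Each of those pays the per-message overhead separately, which fits the poor performance described in the report.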