[mvapich-discuss] mvapich2 IB problems with transfers over ~10KB.

Mike Houston mhouston at graphics.stanford.edu
Mon Apr 2 15:03:59 EDT 2007



wei huang wrote:
> Hi Mike,
>
> Could you please run the IMB tests with the -DCHECK option (compile IMB
> with this flag)? This checks all collective operations with data verification.
>   
Will do.  I need to finish some large runs on the cluster with the 
workarounds before I can run this.
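
For reference, the workaround mentioned here is the one described in the original report quoted further down: splitting every large message into 8KB blocks. A minimal sketch of that splitting logic follows. It is not the code from our application. The names send_fn, count_bytes and send_in_blocks are hypothetical, and the transport is passed in as a callback so the sketch builds without an MPI installation; in a real program the callback would presumably wrap a call such as MPI_Send.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport callback: sends `len` bytes starting at `data`
 * and returns 0 on success. In an MPI program this would wrap something
 * like MPI_Send(data, (int)len, MPI_BYTE, dest, tag, comm). */
typedef int (*send_fn)(const void *data, size_t len, void *ctx);

/* Example callback for illustration only: it sends nothing and just adds
 * the block length to the size_t counter that ctx points to. */
int count_bytes(const void *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
    return 0;
}

/* Send `total` bytes as consecutive blocks of at most `block` bytes.
 * Returns the number of blocks sent, or -1 if the callback fails. */
long send_in_blocks(const void *buf, size_t total, size_t block,
                    send_fn send, void *ctx)
{
    const unsigned char *p = buf;
    long nblocks = 0;

    assert(block > 0);
    while (total > 0) {
        size_t n = total < block ? total : block;  /* last block may be short */
        if (send(p, n, ctx) != 0)
            return -1;
        p += n;
        total -= n;
        nblocks++;
    }
    return nblocks;
}
```

With an 8192-byte block size, a 20000-byte buffer goes out as three blocks (8192, 8192 and 3616 bytes). The receiver has to post one matching receive per block, which is part of why performance suffers.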
> Also, what exactly do you mean by ``cannot reliably'' send messages? Do you
> see data corruption, or are there other error symptoms?
>   
The messages get corrupted or corrupt data in other windows (windows as 
in MPI_Win).  We don't see this behavior with mpich2 over GigE or IPoIB.  
The latter seems to be what we are generally seeing.  This may well be a 
bug on our side; we don't have issues with other MPI versions, but we 
could be getting lucky.
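
One way to tell corruption in transit from stray writes into a neighbouring window is to fill each buffer with a known pattern before the transfer and check it afterwards. The sketch below is only an illustration of that idea, not the check that IMB or our code performs. The names expected_byte, fill_pattern and first_mismatch are hypothetical, and the MPI_Send or MPI_Put call that would sit between filling and checking is left out so the sketch builds on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expected value of the byte at offset i for a given seed. The value
 * varies with both the seed and the offset, so data that lands at the
 * wrong offset or comes from a buffer with a different seed will
 * usually fail the check. */
static unsigned char expected_byte(uint32_t seed, size_t i)
{
    return (unsigned char)((seed + i + (i >> 8)) & 0xffu);
}

/* Fill buf with the pattern for `seed` (e.g. one seed per rank or window). */
void fill_pattern(unsigned char *buf, size_t len, uint32_t seed)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = expected_byte(seed, i);
}

/* Return the offset of the first byte that does not match the pattern
 * for `seed`, or len if every byte matches. */
size_t first_mismatch(const unsigned char *buf, size_t len, uint32_t seed)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if (buf[i] != expected_byte(seed, i))
            return i;
    return len;
}
```

Giving each window its own seed and checking all of them after every transfer, not only the target, would show whether a large transfer is overwriting memory outside the window it was aimed at.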
> Thanks.
>
> Regards,
> Wei Huang
>
> 774 Dreese Lab, 2015 Neil Ave,
> Dept. of Computer Science and Engineering
> Ohio State University
> OH 43210
> Tel: (614)292-8501
>
>
> On Mon, 2 Apr 2007, Mike Houston wrote:
>
>   
>> wei huang wrote:
>>     
>>> Hi Mike,
>>>
>>> Thanks for letting us know about the problem. However, to help us better
>>> understand what is going on, would you please let us know the following?
>>>
>>> 1) Which version of mvapich2 are you using? The latest release version now
>>> should be mvapich2-0.9.8.
>>>
>>>       
>> Yes, 0.9.8
>>     
>>> 2) Could you actually try running osu_benchmarks and see if they all pass
>>> on your system? The benchmarks are distributed with the package and are in
>>> the `osu_benchmarks' directory. You should not experience problems with
>>> them if your systems are set up correctly.
>>>
>>>       
>> I'll give these a go, but they look like they don't verify results.
>>     
>>> Thanks.
>>>
>>> Regards,
>>> Wei Huang
>>>
>>> 774 Dreese Lab, 2015 Neil Ave,
>>> Dept. of Computer Science and Engineering
>>> Ohio State University
>>> OH 43210
>>> Tel: (614)292-8501
>>>
>>>
>>>
>>>       
>>>> ---------- Forwarded message ----------
>>>> Date: Sun, 01 Apr 2007 03:03:31 -0700
>>>> From: Mike Houston <mhouston at graphics.stanford.edu>
>>>> To: mvapich-discuss at cse.ohio-state.edu
>>>> Subject: [mvapich-discuss] mvapich2 IB problems with transfers over ~10KB.
>>>>
>>>> We've hit an odd snag with using mvapich2.  We can't seem to reliably
>>>> send messages > 10KB.  If we break up all large messages into 8KB blocks
>>>> and send, things work just fine, but as expected, performance is awful.
>>>> Under mpich2 with GigE and IPoIB, large messages seem to work just
>>>> fine.  Both MPI_Send and MPI_Put seem to exhibit the same behavior.  I
>>>> should note that the one oddity of our system implementation is that we
>>>> have a posted MPI_Irecv waiting while doing the large transfers.
>>>> Open-MPI flips out when we do this, even in tcp mode.
>>>>
>>>> We have PCI-X SDR 4X boards, running the latest IB Gold release (1.8.3)
>>>> on top of the latest RHEL4 SMP x86 kernel (32-bit).  The boards have
>>>> slightly older firmware, 3.3.3, but I'm hesitant to flash up unless
>>>> there are known issues with that firmware...  We built using the
>>>> defaults in make.mvapich2.vapi.  Any suggestions on where to look or
>>>> what to update?  It seems *very* odd that large transfers aren't working
>>>> for us...
>>>>
>>>> Thanks!
>>>>
>>>> -Mike
>>>> _______________________________________________
>>>> mvapich-discuss mailing list
>>>> mvapich-discuss at cse.ohio-state.edu
>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>
>>>>
>>>>         
>>>
>>>       
>
>
>   

