[mvapich-discuss] mvapich2 1.7rc1 and 2GB transfers

Devendar Bureddy bureddy at cse.ohio-state.edu
Thu Sep 1 11:26:46 EDT 2011


Hi Tibor

Sorry about that. Unfortunately, the earlier patch was generated against
the latest development code base. Can you please try the attached patch
instead?

Thanks
Devendar

On Thu, Sep 1, 2011 at 11:07 AM, Tibor Pausz
<pausz at th.physik.uni-frankfurt.de> wrote:
> Hi Devendar,
>
> I can't apply the patch to 1.7rc1. The hunks are all rejected.
>
> Best regards,
> Tibor
>
>
> On 18.08.2011 22:59, Devendar Bureddy wrote:
>> Hi Tibor
>>
>> Thanks for your sample program. There is an integer overflow inside
>> the library. Can you please apply the attached patch and try again?
>> Please do a "make clean" and "make && make install" after applying
>> the patch.
>>
>> BTW, you have declared a large buffer (buffer2) but do not use it in
>> your sample program.
>>
>> Thanks
>> Devendar
>>
>> On Thu, Aug 18, 2011 at 10:26 AM, Tibor Pausz
>> <pausz at th.physik.uni-frankfurt.de> wrote:
>>> Hi Devendar,
>>>
>>> I have included the small program which I used.
>>>
>>> Best regards,
>>> Tibor
>>>
>>>
>>> On 16.08.2011 16:01, Devendar Bureddy wrote:
>>>> Hi Tibor
>>>>
>>>> Thanks for using MVAPICH2. In our testing we are able to transfer
>>>> messages larger than 2GB. The internal warning message you got
>>>> indicates a mismatch between the send and receive buffer sizes.
>>>> Can you please share a program that reproduces the problem, so that
>>>> it is easier for us to debug further?
>>>>
>>>> Thanks in advance
>>>> Devendar
>>>>
>>>> On Tue, Aug 16, 2011 at 4:13 AM, Tibor Pausz
>>>> <pausz at th.physik.uni-frankfurt.de> wrote:
>>>>> Hi there
>>>>>
>>>>> I have installed mvapich2 1.7rc1 with these configure options:
>>>>> ./configure --enable-fc --enable-cxx --with-hwloc
>>>>> --with-slurm=.../slurm/2.2.7 --with-rdma=gen2 --enable-shared
>>>>> --enable-sharedlibs=gcc --with-xrc
>>>>> Compiler: Intel 11.1; OS: Scientific Linux 5.5
>>>>>
>>>>> Now I'm trying to transfer arrays larger than 2GB with MPI_Ssend,
>>>>> and I get the following message:
>>>>> Warning! Rndv Receiver is expecting 0 Bytes But, is receiving 0 Bytes
>>>>>
>>>>> After that the program just hangs. I have tried both "int" and "long"
>>>>> as the type of the "count" argument, but I always get the same warning.
>>>>>
>>>>> Best regards,
>>>>> Tibor
>>>>> _______________________________________________
>>>>> mvapich-discuss mailing list
>>>>> mvapich-discuss at cse.ohio-state.edu
>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>>
>>>
-------------- next part --------------
Index: src/mpid/ch3/include/mpidpkt.h
===================================================================
--- src/mpid/ch3/include/mpidpkt.h	(revision 4874)
+++ src/mpid/ch3/include/mpidpkt.h	(working copy)
@@ -234,7 +234,7 @@
 #if defined(MPID_USE_SEQUENCE_NUMBERS)
     MPID_Seqnum_t seqnum;
 #endif /* defined(MPID_USE_SEQUENCE_NUMBERS) */
-    int recv_sz;
+    MPIDI_msg_sz_t recv_sz;
     MPIDI_CH3I_MRAILI_RNDV_INFO_DECL
 #else /* defined(_OSU_MVAPICH_) */
     MPIDI_CH3_Pkt_type_t type;
Index: src/mpid/ch3/src/ch3u_rndv.c
===================================================================
--- src/mpid/ch3/src/ch3u_rndv.c	(revision 4874)
+++ src/mpid/ch3/src/ch3u_rndv.c	(working copy)
@@ -348,7 +348,7 @@
     MPID_Request * sreq;
     int mpi_errno = MPI_SUCCESS;
 #if defined(_OSU_MVAPICH_)
-    int recv_size;
+    MPIDI_msg_sz_t recv_size;
     int i;
     extern int MPIDI_CH3_Rndv_transfer(MPIDI_VC_t *, MPID_Request *, MPID_Request *,
                                        MPIDI_CH3_Pkt_rndv_clr_to_send_t *,
Index: src/mpid/ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c
===================================================================
--- src/mpid/ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c	(revision 4874)
+++ src/mpid/ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c	(working copy)
@@ -491,7 +491,8 @@
                         sreq->dev.iov[0].MPID_IOV_LEN);
 
             {
-                int i = 0, total_len = 0;
+                int i = 0;
+                size_t  total_len = 0;
                 for (i = 0; i < n_iov; i++) {
                     total_len += (iov[i].MPID_IOV_LEN);
                 }
Index: src/mpid/ch3/channels/mrail/src/gen2/ibv_recv.c
===================================================================
--- src/mpid/ch3/channels/mrail/src/gen2/ibv_recv.c	(revision 4874)
+++ src/mpid/ch3/channels/mrail/src/gen2/ibv_recv.c	(working copy)
@@ -412,7 +412,7 @@
 
     *nb = 0;
     for (i = req->dev.iov_offset; i < n_iov; i++) {
-        if (len_avail >= (int) iov[i].MPID_IOV_LEN
+        if (len_avail >= (MPIDI_msg_sz_t) iov[i].MPID_IOV_LEN
             && iov[i].MPID_IOV_LEN != 0) {
             MPIU_Memcpy(iov[i].MPID_IOV_BUF, data_buf, iov[i].MPID_IOV_LEN);
             data_buf = (void *) ((uintptr_t) data_buf + iov[i].MPID_IOV_LEN);
Index: src/mpid/ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_post.h
===================================================================
--- src/mpid/ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_post.h	(revision 4874)
+++ src/mpid/ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_post.h	(working copy)
@@ -201,7 +201,7 @@
 int MPIDI_CH3I_MRAILI_Eager_send(   struct MPIDI_VC* vc,
                                     MPID_IOV * iov,
                                     int n_iov,
-                                    int len,
+                                    size_t len,
                                     int * num_bytes_ptr,
                                     vbuf **buf_handle);
 
Index: src/mpid/ch3/channels/mrail/src/gen2/ibv_send.c
===================================================================
--- src/mpid/ch3/channels/mrail/src/gen2/ibv_send.c	(revision 4874)
+++ src/mpid/ch3/channels/mrail/src/gen2/ibv_send.c	(working copy)
@@ -256,11 +256,6 @@
 
     /* Calculate_IOV_len(iov, n_iov, len); */
 
-    if (len > VBUF_BUFFER_SIZE)
-    {
-        len = VBUF_BUFFER_SIZE;
-    }
-
     avail   = len;
     PACKET_SET_RDMA_CREDIT(header, vc);
     *num_bytes_ptr = 0;
@@ -843,7 +838,7 @@
 int MPIDI_CH3I_MRAILI_Eager_send(MPIDI_VC_t * vc,
                                  MPID_IOV * iov,
                                  int n_iov,
-                                 int pkt_len,
+                                 size_t pkt_len,
                                  int *num_bytes_ptr,
                                  vbuf **buf_handle)
 {
@@ -854,7 +849,11 @@
 
     /* first we check if we can take the RDMA FP */
     if(MPIDI_CH3I_MRAILI_Fast_rdma_ok(vc, pkt_len)) {
+    
         *num_bytes_ptr = pkt_len;
+        if (pkt_len > VBUF_BUFFER_SIZE) {
+            *num_bytes_ptr  = VBUF_BUFFER_SIZE;
+        }
         MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3I_MRAILI_EAGER_SEND);
         return MPIDI_CH3I_MRAILI_Fast_rdma_send_complete(vc, iov,
                 n_iov, num_bytes_ptr, buf_handle);
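
The hunk in ch3_rndvtransfer.c above widens the total_len accumulator from
int to size_t. The following is a minimal, standalone sketch of why that
matters; it is not MVAPICH2 code. The struct demo_iov and both function
names are hypothetical, uint32_t stands in for a 32-bit accumulator (so the
wraparound is well defined in C, unlike signed int overflow), and uint64_t
stands in for a wider type such as the size_t used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one I/O vector entry (not MPID_IOV). */
struct demo_iov {
    uint64_t len; /* length of this segment in bytes */
};

/* Sum the segment lengths in a 32-bit accumulator.
 * Unsigned arithmetic wraps modulo 2^32, so totals of 4 GiB or more
 * silently lose their high bits. */
uint32_t total_len_32(const struct demo_iov *iov, int n_iov)
{
    uint32_t total = 0;
    assert(n_iov >= 0);
    for (int i = 0; i < n_iov; i++) {
        total += (uint32_t) iov[i].len;
    }
    return total;
}

/* Same loop with a 64-bit accumulator: the full total is preserved. */
uint64_t total_len_wide(const struct demo_iov *iov, int n_iov)
{
    uint64_t total = 0;
    assert(n_iov >= 0);
    for (int i = 0; i < n_iov; i++) {
        total += iov[i].len;
    }
    return total;
}
```

With two segments of 2 GiB each, total_len_32 returns 0 while
total_len_wide returns 4294967296. For small inputs the two functions
agree, which is why this kind of truncation only shows up with very large
messages.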

