[mvapich-discuss] Use of mvapich with Ethernet

Sashi Balasingam sashibala2 at yahoo.com
Thu Mar 12 16:40:34 EDT 2015


The node uses an Intel i350 Ethernet adapter, which doesn't seem to have RoCE support.
So, given that I have to use the standard TCP/IP mode in MPI, will there be any issues with the MPI broadcast features? I recently saw a paper that reported issues with using IP multicast for MPI_Bcast.
Thanks,
Bala
       From: "mvapich-discuss-request at cse.ohio-state.edu" <mvapich-discuss-request at cse.ohio-state.edu>
 To: mvapich-discuss at cse.ohio-state.edu 
 Sent: Thursday, March 12, 2015 4:07 AM
 Subject: mvapich-discuss Digest, Vol 111, Issue 18
   
Send mvapich-discuss mailing list submissions to
    mvapich-discuss at cse.ohio-state.edu

To subscribe or unsubscribe via the World Wide Web, visit
    http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
or, via email, send a message with subject or body 'help' to
    mvapich-discuss-request at cse.ohio-state.edu

You can reach the person managing the list at
    mvapich-discuss-owner at cse.ohio-state.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of mvapich-discuss digest..."


Today's Topics:

  1. Re: mvapich2 mpirun error (Panda, Dhabaleswar)
  2. Re: Use of mvapich with Ethernet (Panda, Dhabaleswar)
  3. Re: BLCR+MAVPICH2 (hljgqz)


----------------------------------------------------------------------

Message: 1
Date: Thu, 12 Mar 2015 07:32:34 +0000
From: "Panda, Dhabaleswar" <panda at cse.ohio-state.edu>
To: "karthika.kumar at wipro.com" <karthika.kumar at wipro.com>,
    "mvapich-discuss at cse.ohio-state.edu"
    <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: Re: [mvapich-discuss] mvapich2 mpirun error
Message-ID:
    <AD990D5965069C48BC0A42D6AC0FE903621708AF at CIO-KRC-D1MBX05.osuad.osu.edu>
    
Content-Type: text/plain; charset="iso-8859-1"

Hi,

You are using MVAPICH2 1.8, which was released in April 2012 (almost three years ago). Unfortunately,
we are not able to extend support to such an old version.

Please update your installation to the latest MVAPICH2 2.0.1 GA or MVAPICH2 2.1rc1 and let us know if you
encounter any issue. We will be happy to take a look at it.

Thanks,

DK

________________________________
From: mvapich-discuss-bounces at cse.ohio-state.edu on behalf of karthika.kumar at wipro.com [karthika.kumar at wipro.com]
Sent: Thursday, March 12, 2015 2:43 AM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] mvapich2 mpirun error

Hi Team,

I am trying to build MVAPICH2 with the Intel compilers. It builds successfully, but I get an error when running sample MPI programs. Below are the steps I followed during installation and the error I get. Please let me know where I am going wrong and help me find a solution.

CC=icc FC=ifort ./configure --prefix=/apps/mvapich2-intel; make; make install

/apps/mvapich2-intel/bin/mpicc hello.c
/apps/mvapich2-intel/bin/mpiexec -np 2 ./a.out

Error :


=====================================================================================
=  BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=  EXIT CODE: 11
=  CLEANING UP REMAINING PROCESSES
=  YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)



MVAPICH2 version : mvapich2-1.8
OS version : CentOS 5.6



Thanks in advance!


Karthika




------------------------------

Message: 2
Date: Thu, 12 Mar 2015 07:36:16 +0000
From: "Panda, Dhabaleswar" <panda at cse.ohio-state.edu>
To: Sashi Balasingam <sashibala2 at yahoo.com>,
    "mvapich-discuss at cse.ohio-state.edu"
    <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: Re: [mvapich-discuss] Use of mvapich with Ethernet
Message-ID:
    <AD990D5965069C48BC0A42D6AC0FE903621708E6 at CIO-KRC-D1MBX05.osuad.osu.edu>
    
Content-Type: text/plain; charset="iso-8859-1"

Which 1 Gb adapter with RDMAoE are you planning to use? Currently, RoCE support in
MVAPICH2 and MVAPICH2-X is available for Mellanox adapters only.

DK
________________________________
From: mvapich-discuss-bounces at cse.ohio-state.edu on behalf of Sashi Balasingam [sashibala2 at yahoo.com]
Sent: Thursday, March 12, 2015 12:47 AM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] Use of mvapich with Ethernet

I am planning on using a 1 Gb Ethernet connection with MVAPICH 2.x for a multi-node compute cluster with up to 200 cores in total. Just a few quick questions:

1. Do any of the recent MVAPICH releases support RDMAoE with 1 Gb Ethernet links, or is it only supported with 10 GbE?
2. Are the MPI broadcast primitives, such as MPI_Bcast() and MPI_Scatterv(), reliable for use in this configuration?

3. Would the latest MVAPICH be the best choice for deployment?

4. Are there any issues with MPI usage when 'bonding' two 1 GbE links within a node for higher bandwidth?

Appreciate any related responses.

Bala

------------------------------

Message: 3
Date: Thu, 12 Mar 2015 19:07:35 +0800
From: hljgqz <15776869853 at 163.com>
To: Jian Lin <lin.2180 at osu.edu>
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] BLCR+MAVPICH2
Message-ID: <676ec462.2877b.14c0dab0b77.Coremail.15776869853 at 163.com>
Content-Type: text/plain; charset="UTF-8"

Hi, thank you for your help. Here are more details:

[root at node3 bin]# mpiname -a
MVAPICH2 2.1rc1 Thu Dec 18 20:00:00 EDT 2014 ch3:mrail

Compilation
CC: gcc    -DNDEBUG -DNVALGRIND -O2
CXX: g++  -DNDEBUG -DNVALGRIND -O2
F77: gfortran -L/lib -L/lib  -O2
FC: gfortran  -O2

Configuration
--enable-ckpt --disable-shared

Yesterday I inserted sleep(30) into cpi; today, however, I ran the NPB3.3-MPI program bt.A.4:

[root at node3 NPB3.3-MPI]# mpirun_rsh -np 4 -hostfile hosts MV2_DEBUG_FT_VERBOSE=1 bin/bt.A.4

On another console I ran cr_checkpoint -p <PID> to try to take a checkpoint, but I can't checkpoint mpirun_rsh.
Here are the messages from the mpirun_rsh console:
[node3:mpirun_rsh][CR_Callback] Unexpected results from 0: ""
[node3:mpirun_rsh][CR_Callback] Some processes failed to checkpoint. Abort checkpoint...
At 2015-03-12 03:53:55, "Jian Lin" <lin.2180 at osu.edu> wrote:
>Hi, 
>
>Thanks for your note.
>
>Are you using the cpi program that comes with MPICH without any modification?
>This program runs very fast, and there may not be enough time to
>capture a snapshot. When you try to checkpoint a job that has already
>completed, cr_checkpoint will print errors like the ones you posted.
>
>Besides the output of "mpiname -a" and the output with
>"MV2_DEBUG_FT_VERBOSE=1", can you please also provide the last few lines
>of dmesg output after the error occurs? It will be helpful for us to
>understand what happens.
>
>On Wed, 11 Mar 2015 14:55:09 +0000
>Jonathan Perkins <perkinjo at cse.ohio-state.edu> wrote:
>
>> Hi thanks for your note.
>> 
>> Can you provide us the output of mpiname -a?  Also can you rerun the
>> job(s) but also set MV2_DEBUG_FT_VERBOSE equal to 1?
>> 
>> On Wed, Mar 11, 2015 at 8:35 AM hljgqz <15776869853 at 163.com> wrote:
>> 
>> > Dear all,
>> >    I have a problem using Checkpoint/Restart with mvapich2-2.1.
>> > My cluster nodes use CentOS 6.6 x86_64 and Mellanox InfiniBand. BLCR
>> > is installed correctly, and I can use it to checkpoint normal programs.
>> > I configured mvapich2 with ./configure --enable-ckpt
>> > --disable-shared.
>> >    However, I can't checkpoint when I use mpirun_rsh -np 4 -hostfile
>> > hosts ./cpi (or other MPI programs like lu.A.4). When the program
>> > finished, this came up:
>> > [root at node3 node0]# cr_checkpoint -p 3366
>> > - chkpt_watchdog: 'mpirun_rsh' (tgid/pid 3366/3367) exited with
>> > code 0 during checkpoint
>> > - chkpt_watchdog: 'mpirun_rsh' (tgid/pid 3366/3371) exited with
>> > code 0 during checkpoint
>> > - chkpt_watchdog: 'mpirun_rsh' (tgid/pid 3366/3366) exited with
>> > code 1 during checkpoint
>> > - chkpt_watchdog: 'mpirun_rsh' (tgid/pid 3366/3368) exited with
>> > code 1 during checkpoint
>> > - chkpt_watchdog: 'mpirun_rsh' (tgid/pid 3366/3369) exited with
>> > code 1 during checkpoint
>> > Checkpoint failed: no processes checkpointed
>> >
>> >  And if I use mpiexec -n 4 ./cpi, I can run cr_checkpoint to get a
>> > context file, but I can't restart it. Here is the output:
>> > [root at node3 node0]# cr_restart context.3436
>> > [mpiexec at node3] HYDT_dmxu_poll_wait_for_event
>> > (tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents &
>> > ~POLLIN & ~POLLOUT & ~POLLHUP & ~POLLERR)) failed
>> > [mpiexec at node3] HYD_pmci_wait_for_completion
>> > (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
>> > [mpiexec at node3] main (ui/mpich/mpiexec.c:344): process manager error
>> > waiting for completion
>> >
>
>
>
>-- 
>Jian Lin
>http://linjian.org



------------------------------


End of mvapich-discuss Digest, Vol 111, Issue 18
************************************************


  