[mvapich-discuss] collectives fail under mvapich2-1.0 (fwd)

Edmund Sumbar esumbar at ualberta.ca
Mon Oct 1 15:12:35 EDT 2007


amith rajith mamidala wrote:
> We were able to run the 12 process test for collectives on 3 nodes.
> Can you provide us some details as to how the processes were launched?
> e.g. block, cyclic, or any other distribution.

Hi Amith,

I've been running the tests as batch jobs
through Torque/Maui using the mpiexec
program.  All parameters are defaults,
as far as I know.  A typical job script is:

   #!/bin/bash
   #PBS -S /bin/bash
   #PBS -l nodes=3:ppn=4
   #PBS -l pvmem=1gb
   #PBS -W x=QOS:test
   test=coll
   size=3x4
   mpiexec=/usr/local/mpiexec/bin/mpiexec
   skampi=/scratch/esumbar/mpi-test.d/skampi/mvapich/skampi-5.0.1-r0191/skampi
   cd $PBS_O_WORKDIR
   $mpiexec $skampi -i ${test}.ski -o ${test}_ib-${size}.sko
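
On the block-versus-cyclic question, the slot layout that Torque hands
to the job can be inspected from inside the script.  The following is
only a sketch.  It assumes that $PBS_NODEFILE holds one line per
allocated process slot, and that mpiexec assigns ranks in the order the
hosts appear in that file (the latter is an assumption, not something
confirmed here).  The function names summarize_nodefile and
placement_order are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a Torque node file.
# Both default to $PBS_NODEFILE when no file argument is given.

# Print "host slots" for each host listed in the node file.
summarize_nodefile() {
    local nodefile="${1:-$PBS_NODEFILE}"
    # Sort first so that uniq sees identical host names on adjacent lines.
    sort "$nodefile" | uniq -c | awk '{ printf "%s %d\n", $2, $1 }'
}

# Crude guess at the slot ordering: "block" if any two consecutive
# lines name the same host, "cyclic" if none do, "single" if the file
# has fewer than two lines.
placement_order() {
    local nodefile="${1:-$PBS_NODEFILE}"
    awk '
        NR > 1 && $1 == prev { same++ }
        { prev = $1 }
        END {
            if (NR < 2)        print "single"
            else if (same > 0) print "block"
            else               print "cyclic"
        }' "$nodefile"
}
```

Calling summarize_nodefile and placement_order just before the mpiexec
line would record the layout in the job's standard output.  With
nodes=3:ppn=4 one would expect three hosts with four slots each.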


Mpiexec details...

   $ /usr/local/mpiexec/bin/mpiexec --version
   Version 0.81, configure options: '--with-pbs=/opt/torque'
     '--with-default-comm=mpich2-pmi' '--prefix=/usr/local/mpiexec'


System...

   $ uname -a
   Linux 2.6.21-smp #1 SMP Tue Aug 7 12:45:20 MDT 2007 GNU/Linux


Routing table...

   $ netstat -r
   Kernel IP routing table
   Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
   255.255.255.255 *               255.255.255.255 UH        0 0          0 eth0
   10.0.6.0        *               255.255.255.0   U         0 0          0 ib0
   129.128.125.0   *               255.255.255.0   U         0 0          0 eth1
   192.168.44.0    *               255.255.255.0   U         0 0          0 vmnet8
   192.168.43.0    *               255.255.255.0   U         0 0          0 vmnet1
   10.0.0.0        *               255.255.0.0     U         0 0          0 eth0
   224.0.0.0       *               240.0.0.0       U         0 0          0 eth0
   default         gateway.nic.ual 0.0.0.0         UG        0 0          0 eth1


Please let me know if you need further info.

Is there a diagnostic mode that can be
enabled?

Could there be some MVAPICH2 parameter
that needs adjusting from its default
value?

-- 
Ed[mund [Sumbar]]
AICT Research Support, Univ of Alberta

