[mvapich-discuss] Issue with mpi_alltoall on 64 nodes or more

Rick Warner rick at microway.com
Wed Apr 26 14:27:11 EDT 2006


Adding DISABLE_RDMA_ALLTOALL=1 made no difference; the run was just as slow.
We also tried splitting the machines list so that 16 systems from each leaf
switch are used.  With that configuration it runs properly about 90% of the
time and only occasionally takes multiple seconds to complete.
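
A machines list of that shape could be generated with a short script along
these lines.  This is only a sketch: the switch names, the host names, and the
count of four leaf switches are hypothetical placeholders, and it assumes the
hostfile format is one host name per line.

```python
def pick_hosts(hosts_by_switch, per_switch=16):
    """Return the first `per_switch` hosts from each leaf switch.

    `hosts_by_switch` maps a switch name to the list of hosts cabled to it.
    Switches are visited in sorted name order so the output is reproducible.
    """
    selected = []
    for switch in sorted(hosts_by_switch):
        hosts = hosts_by_switch[switch]
        if len(hosts) < per_switch:
            raise ValueError(
                f"{switch} has only {len(hosts)} hosts, need {per_switch}"
            )
        selected.extend(hosts[:per_switch])
    return selected


if __name__ == "__main__":
    # Hypothetical layout: 4 leaf switches with 22 hosts each.
    # Replace with the real switch-to-host mapping of the cluster.
    layout = {
        f"leaf{s}": [f"node{s}-{i:02d}" for i in range(22)]
        for s in range(4)
    }
    # One host name per line; redirect the output into the machines file.
    for host in pick_hosts(layout, per_switch=16):
        print(host)
```

Redirecting the output to a file (for example `mf`) gives a 64-entry list that
can be passed to mpirun_rsh with -hostfile.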

On Wednesday 26 April 2006 01:48, Sayantan Sur wrote:
> Hello Rick,
>
> * On Apr,1 Rick Warner<rick at microway.com> wrote :
> > Hello all,
> >  We are experiencing a problem on a medium-sized InfiniBand cluster (89
> > nodes).  mpi_alltoall on 64 or more nodes takes an excessively long time.
> >  On 63 nodes, it completes in a fraction of a second.  On 64, it takes
> > about 20 seconds.
>
> Thanks for your report to the group. Could you please try running the
> Alltoall program like this:
>
> $ mpirun_rsh -np 64 -hostfile mf DISABLE_RDMA_ALLTOALL=1 ./a.out
>
> If you could report the result of this back, it will help us in
> narrowing down the problem.
>
> Thanks,
> Sayantan.

-- 
Richard Warner
Lead Systems Integrator
Microway, Inc
(508)732-5517

