[mvapich-discuss] mpiexec: unable to post a write of the barrier command (fwd)

Rafael Arco Arredondo rafaarco at ugr.es
Mon Jun 9 03:01:26 EDT 2008


Hi Wei,

Thanks for your reply. We'll try the next release when it's
available.

We have to use smpd in our scenario. We use Sun Grid Engine to launch
the MPI processes, and so far it doesn't support mpd.

Best regards,

Rafa

On Thu, 05-06-2008 at 23:36 -0400, wei huang wrote:
> Hi Rafael,
> 
> smpd is not the default launcher for mpich2, on which our mvapich2 is
> based. Thus, there can be issues and instability with it. We strongly
> recommend using mpd-based startup.
> 
> Is there any specific reason that you want to use daemonless startup?
> 
> FYI, we are working on a new mvapich2 release which will have our own
> daemonless startup support. It will be available in a couple of weeks. Maybe
> you can use that once it is released.
> 
> Thanks.
> 
> -- Wei
> 
> > ---------- Forwarded message ----------
> > Date: Thu, 05 Jun 2008 10:48:20 +0200
> > From: Rafael Arco Arredondo <rafaarco at ugr.es>
> > To: mvapich-discuss at cse.ohio-state.edu
> > Subject: [mvapich-discuss] mpiexec: unable to post a write of the barrier
> >     command
> >
> > Hello,
> >
> > We are having some issues with MVAPICH2-1.0.2 and the OFA drivers for
> > InfiniBand. The compilation process of MVAPICH2 ends successfully, and
> > the applications compile with (apparently) no problems with mpicc.
> > However, mpiexec fails when programs are executed on more than one
> > computer. In particular, MPI_Finalize reports an error which comes from
> > MPIDI_CH3I_RMDA_finalize. The error occurs both for OFA-Gen2 and uDAPL.
> > We are using daemonless smpd.
> >
> > Here is the command executed and its output:
> > mpiexec -rsh -nopm -n 10 -machinefile ./hosts /home/rafaarco/mmul
> >
> > Task 0 of 10
> > Task 3 of 10
> > mpi_matrix_mult_slave()
> > Task 4 of 10
> > mpi_matrix_mult_slave()
> > Task 5 of 10
> > mpi_matrix_mult_slave()
> > Task 7 of 10
> > mpi_matrix_mult_slave()
> > Task 8 of 10
> > mpi_matrix_mult_slave()
> > Task 9 of 10
> > mpi_matrix_mult_slave()
> > Task 1 of 10
> > mpi_matrix_mult_slave()
> > Task 2 of 10
> > mpi_matrix_mult_slave()
> > Task 6 of 10
> > mpi_matrix_mult_slave()
> > mpi_matrix_mult_master()
> > Exiting task 1 of 10
> > Exiting task 2 of 10
> > Exiting task 3 of 10
> > Exiting task 4 of 10
> > Exiting task 5 of 10
> > Exiting task 6 of 10
> > Exiting task 7 of 10
> > Exiting task 8 of 10
> > Exiting task 9 of 10
> > Time: 3.258242
> > Exiting task 0 of 10
> > [0] unable to post a write of the barrier command.
> > [0] PMI_Barrier failed.
> > Fatal error in MPI_Finalize:
> > Other MPI error, error stack:
> > MPI_Finalize(234)............: MPI_Finalize failed
> > MPI_Finalize(154)............:
> > MPID_Finalize(132)...........:
> > MPIDI_CH3_Finalize(87).......: MPI_Finalize failed
> > MPIDI_CH3_Finalize(70).......:
> > MPIDI_CH3I_RMDA_finalize(736): PMI_Barrier returned -1
> >
> > Any clues about what the problem may be?
> >
> > Thanks in advance,
> >
> > Rafa
> >
> > --
> > Rafael Arco Arredondo
> > Centro de Servicios de Informática y Redes de Comunicaciones
> > Campus de Fuentenueva - Edificio Mecenas
> > Universidad de Granada
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
> >
> 
-- 
Rafael Arco Arredondo
Centro de Servicios de Informática y Redes de Comunicaciones
Campus de Fuentenueva - Edificio Mecenas
Universidad de Granada
E-18071 Granada Spain
Tel: +34 958 241010   Ext:31114   E-mail: rafaarco at ugr.es


