[mvapich-discuss] hydra errors

Jonathan Perkins perkinjo at cse.ohio-state.edu
Mon Jun 4 08:17:36 EDT 2012


Thanks for the reply. Is there a way we can reproduce this? What happens
when you use mpirun_rsh? Have you tried mvapich2-1.8?
On Jun 4, 2012 8:09 AM, "Walid" <walid.shaari at gmail.com> wrote:

> Dear Jonathan,
>
> It is 1.7, compiled with Intel 10 for application compatibility and system
> reasons. The configuration is below:
>
>  /usr/local/mpi/mvapich2/intel10/1.7/bin/mpiname -o
> Configuration
> --prefix=/usr/local/mpi/mvapich2/intel10/1.7 --with-device=ch3:psm
> --enable-g=dbg --enable-romio --enable-debuginfo
> -with-file-system=panfs+nfs+ufs --with-psm-include=/usr/include
> --with-psm=/usr/lib64
>
> On 4 June 2012 02:05, Jonathan Perkins <perkinjo at cse.ohio-state.edu> wrote:
>
>> Thanks for the report Walid.  Can you tell us the version of MVAPICH2
>> being used and whether or not this is reproducible with the
>> OSU-Micro-Benchmarks?  Providing the configuration options used to build
>> MVAPICH2 as well as architecture information of the machines this is
>> being run on may be helpful as well.
>>
>> On Sun, Jun 03, 2012 at 01:56:55PM +0300, Walid wrote:
>> > Dear all,
>> >
>> > One of the users has reported that almost all of his jobs die when he
>> > runs them using mvapich2. Below are the error messages. He is using a
>> > simple call:
>> >
>> >     mpirun program program options > output file
>> >
>> > I have asked him to use --stdout=output file and mpiexec.hydra. He has
>> > not come back to me yet on whether that was successful, but I wanted to
>> > see if these errors have been seen before.
>> >
>> >
>> > [mpiexec at plci340] stdoe_cb (./ui/utils/uiu.c:309): assert (!closed) failed
>> > porgram.err:[mpiexec at plci340] control_cb (./pm/pmiserv/pmiserv_cb.c:306): error in the UI defined callback
>> > H.err:[mpiexec at plci340] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>> > SW.err:[mpiexec at plci340] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
>> > SW.err:[mpiexec at plci340] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
>> >
>> > Test.err:[mpiexec at ulca103] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
>> > SW_Test.err:[mpiexec at ulca103] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>> > SW_Test.err:[mpiexec at ulca103] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
>> > _SW_Test.err:[mpiexec at ulca103] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
>> >
>> > _2.err:[mpiexec at plch419] stdoe_cb (./ui/utils/uiu.c:309): assert (!closed) failed
>> > VG_2.err:[mpiexec at plch419] control_cb (./pm/pmiserv/pmiserv_cb.c:306): error in the UI defined callback
>> > _2.err:[mpiexec at plch419] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>> > VG_2.err:[mpiexec at plch419] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
>> > VG_2.err:[mpiexec at plch419] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
>> >
>> > VG_CO_2.err:[mpiexec at plch374] stdoe_cb (./ui/utils/uiu.c:309): assert (!closed) failed
>> > CO_2.err:[mpiexec at plch374] control_cb (./pm/pmiserv/pmiserv_cb.c:306): error in the UI defined callback
>> >
>> >  ion available (required by /red/ct2/GP/GP_plch.exe)
>> >
>> > [mpiexec at plch416] stdoe_cb (./ui/utils/uiu.c:309): assert (!closed) failed
>> > [mpiexec at plch416] control_cb (./pm/pmiserv/pmiserv_cb.c:306): error in the UI defined callback
>> > [mpiexec at plch416] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>> > [mpiexec at plch416] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
>> > [mpiexec at plch416] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
>> >
>> >
>> > thank you,
>> >
>> > Walid
>>
>>
>>
>> --
>> Jonathan Perkins
>> http://www.cse.ohio-state.edu/~perkinjo
>>
>
>
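
For anyone triaging output like the quoted Hydra errors, the following is a
minimal Python sketch for pulling the host, function, source file, and line
number out of each record, so failures can be grouped per node. It is not
part of MVAPICH2 or Hydra. The name parse_hydra_errors and the regular
expression are assumptions based only on the lines quoted in this thread,
and the sample text is copied from the report. The pattern accepts both
"mpiexec@host" and the archive's "mpiexec at host" spelling.

```python
import re
from collections import Counter

# Assumed record shape, taken from the quoted errors:
#   [mpiexec@HOST] FUNC (FILE:LINE): message
# The list archive rewrites "@" as " at ", so both forms are accepted.
HYDRA_RECORD = re.compile(
    r"\[mpiexec(?:@| at )(?P<host>[^\]\s]+)\]\s+"
    r"(?P<func>\w+)\s+"
    r"\((?P<file>[^:()\s]+):(?P<line>\d+)\)"
)


def parse_hydra_errors(text):
    """Return a list of (host, func, file, line) tuples found in text."""
    # Drop leading ">" quote markers, then rejoin wrapped lines so that a
    # record split across several mail lines still matches as one unit.
    cleaned = re.sub(r"^[> ]+", "", text, flags=re.MULTILINE)
    flat = " ".join(cleaned.split())
    return [
        (m["host"], m["func"], m["file"], int(m["line"]))
        for m in HYDRA_RECORD.finditer(flat)
    ]


if __name__ == "__main__":
    sample = """\
>> > [mpiexec at plci340] stdoe_cb (./ui/utils/uiu.c:309): assert (!closed)
>> failed
>> > porgram.err:[mpiexec at plci340] control_cb
>> (./pm/pmiserv/pmiserv_cb.c:306):
>> > error in the UI defined callback
"""
    records = parse_hydra_errors(sample)
    # Count how many error records each host produced.
    per_host = Counter(host for host, _, _, _ in records)
    for host, count in per_host.items():
        print(host, count)
```

Grouping by host this way only summarizes where mpiexec reported the
failure; it does not say anything about the underlying cause.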