[mvapich-discuss] MPD related error

yogeshwar sonawane yogyas at gmail.com
Sat Jul 5 04:23:47 EDT 2008


Hi all,

I am trying to run 64 processes using MVAPICH2-1.0.1-uDAPL on 8 nodes.
Every node has 8 cores/cpus.

Out of the 64, sometimes one or more processes get killed or exit. On a
node that ends up with fewer than 8 processes running, the following
message appears in /var/log/messages:

Jul  4 13:23:05 pn02 mpdman: pn02_mpdman_12: mpd_uncaught_except_tb handling:
  exceptions.AttributeError: 'int' object has no attribute 'send_dict_msg'
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdman.py  652  handle_lhs_input
        self.ring.rhsSock.send_dict_msg(msg)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdlib.py  743  handle_active_streams
        handler(stream,*args)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdman.py  481  run
        rv = self.streamHandler.handle_active_streams(timeout=5.0)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py  1408  launch_mpdman_via_fork
        mpdman.run()
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py  1325  run_one_cli
        (manPid,toManSock) = self.launch_mpdman_via_fork(msg,man_env)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py  1199  do_mpdrun
        self.run_one_cli(lorank,msg)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py  854  handle_lhs_input
        self.do_mpdrun(msg)
    /home/htdg
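
As far as I can tell, the exception itself is the generic Python error
raised when an attribute that should hold a socket object still holds a
plain integer. The sketch below only reproduces that class of error. The
names FakeSock, FakeRing and forward are hypothetical, and the idea that
rhsSock starts out as an integer placeholder is my assumption; I have not
confirmed that mpdman.py actually works this way.

```python
class FakeSock:
    """Hypothetical stand-in for a socket wrapper that can send dict messages."""

    def __init__(self):
        self.sent = []

    def send_dict_msg(self, msg):
        # Record the message instead of writing it to a real socket.
        self.sent.append(msg)


class FakeRing:
    """Hypothetical ring object; NOT the real MPD ring class."""

    def __init__(self):
        # Assumption: the right-hand-side socket slot holds an int
        # placeholder until a real connection is established.
        self.rhsSock = 0


def forward(ring, msg):
    """Try to forward msg to the right-hand neighbour; report what happened."""
    try:
        ring.rhsSock.send_dict_msg(msg)
        return "sent"
    except AttributeError as err:
        # Calling a method on the int placeholder ends up here.
        return str(err)


ring = FakeRing()
before = forward(ring, {"cmd": "ping"})  # rhsSock is still an int
ring.rhsSock = FakeSock()
after = forward(ring, {"cmd": "ping"})   # rhsSock is now a real object
print(before)
print(after)
```

If something like this is what happens, a message would be arriving on the
left-hand side while the right-hand-side connection is missing, but that is
only a guess from the traceback.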

Can anybody give me some more information about this?
Is this some kind of setup or configuration issue on the nodes?

Thanks,
Yogeshwar

