[mvapich-discuss] problem running mpd as daemon with mvapich2 0.9.8

Matthew Koop koop at cse.ohio-state.edu
Wed Dec 20 15:50:20 EST 2006


> [testm at master ~]$ mpirun -np 1 ./cpi
> cannot create cq
> Failed to Initialize HCA type
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(230): Initialization failed
> MPID_Init(81)........: channel initialization failed
> (unknown)(): Other MPI errorrank 0 in job 1  master.cl.slac.stanford.edu_4268
> caused collective abort of all ranks
>   exit status of rank 0: return code 13
>
>
> The strange thing is that running "service mpd stop" on the nodes,
> then "service mpd restart" on the master and then "service mpd start" on the
> nodes fixes the problem.  After restarting the mpd service, regular users can
> successfully run jobs.  I've investigated the problem, but can't seem to
> isolate it.  Any clues?

Rick,

The "cannot create cq" error suggests to me that the memory locking limit
was not set high enough when the MPD daemons were first started. Processes
launched through MPD inherit the daemon's limits, so a low limit at that
point carries over to every job. My guess is that by the time you restart
the daemons by hand, the limit has already been raised in your environment.
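
One way to check this guess is to print the locked-memory limit that a
shell actually sees. The lines below are only a sketch and use nothing
beyond the shell's built-in ulimit; the values are reported in kilobytes,
or as "unlimited".

```shell
#!/bin/bash
# Print the soft and hard locked-memory limits of the current shell.
# Child processes inherit these values from whatever started them.
echo "memlock soft limit: $(ulimit -S -l)"
echo "memlock hard limit: $(ulimit -H -l)"
```

Running this once from a login shell and once through the daemons (for
example "mpirun -np 1 sh -c 'ulimit -l'", assuming sh is available on the
nodes) should show whether the MPD-launched processes see a lower limit.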

Can you try adding 'ulimit -l unlimited' within your init script for MPD
and see if that solves your issue?
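
As a rough sketch, the addition to the init script could look like the
fragment below. The function name raise_memlock is made up for this
example, and where exactly it belongs depends on how your MPD init script
is laid out; the only requirement is that it runs before the line that
starts the daemon.

```shell
#!/bin/bash
# raise_memlock is a hypothetical helper name, not part of MPD or MVAPICH2.
raise_memlock() {
    # Lift the locked-memory limit for this shell and everything it starts.
    if ! ulimit -l unlimited 2>/dev/null; then
        echo "warning: could not raise memlock limit (still $(ulimit -l))" >&2
        return 1
    fi
    return 0
}

# Call this before the daemon is started so that mpd inherits the new limit.
raise_memlock || echo "continuing with the current memlock limit" >&2
# ... the existing line in the init script that starts mpd follows here ...
```

Init scripts normally run as root, so raising the limit should succeed
there; the warning branch only matters if the fragment is tried from an
unprivileged shell.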

Thanks,
Matt


