[mvapich-discuss] Re: confirm 43b1c24702cc8a5af97208dd4d163c4ca64ede44

Hari Subramoni subramon at cse.ohio-state.edu
Tue Apr 20 22:51:53 EDT 2010


Hi,

It looks like one of the standard memory operations failed on some of the
nodes. This is very strange given that it passes on other nodes of
your cluster. Are all machines running the same version of OFED/Linux?

Also, could you please check how much memory applications are
allowed to lock? You can do this by verifying that the
/etc/security/limits.conf file on every system contains the following
lines:

# End of file
* soft memlock unlimited
* hard memlock unlimited
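
If it helps, here is a minimal sketch (not part of MVAPICH2) for checking
what limit a process actually sees on a node. It assumes Python is
available on the nodes; the helper names format_limit and memlock_limits
are made up for illustration. It only reads the limits of the process that
runs it, so it is best started the same way as the MPI job, since processes
started by a launcher or daemon may not get the same limits as a login
shell.

```python
import resource


def format_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    # Both values should read 'unlimited' if the limits.conf lines took effect.
    print(f"soft memlock: {format_limit(soft)}")
    print(f"hard memlock: {format_limit(hard)}")
```

If either value is a finite number of bytes on a node, the limits.conf
change has probably not taken effect for that process.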

Thx,
Hari.

On Tue, 20 Apr 2010, Battalgazi YILDIRIM wrote:

> Hi,
>
> I am using mvapich2 and can run my program successfully on up to 64
> processors (using gcc 4.1.2 on Red Hat 4.1.2-44), but if I try 128 or
> 256 processors, I get the following error from each processor:
>
> I hope that you can help me out,
>
> Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(311)..: Initialization failed
> MPID_Init(191).........: channel initialization failed
> MPIDI_CH3_Init(156)....:
> MPIDI_CH3I_CM_Init(993): Error initializing MVAPICH2 MPIU_Malloc library
> Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(311)..: Initialization failed
> MPID_Init(191).........: channel initialization failed
> MPIDI_CH3_Init(156)....:
> MPIDI_CH3I_CM_Init(993): Error initializing MVAPICH2 MPIU_Malloc library
>


