[mvapich-discuss] Mvapich2 terminates with errors for > 64 procs
Mehmet
mbelgin at gmail.com
Thu Sep 29 16:53:07 EDT 2011
Hi Everyone,
I am using two 48-core nodes to try out MVAPICH2, compiled with gcc 4.4.5
on RHEL6. I noticed that even a simple hello_world fails with more than 64
processes. If you try a generic mpirun_rsh -np 96 ... this is what you will
get:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(413): Initialization failed
(unknown)(): Other MPI error
After some hair pulling, I found this could be due to on-demand connection
management failing. I know that disabling "registration caching" when
compiling MVAPICH2 could cause this, but it is enabled by default and I did
not use any flags to disable it. I cannot think of anything else to check
and would very much appreciate your help.
I tried to work around the problem by raising the on-demand threshold
(MV2_ON_DEMAND_THRESHOLD=96). It made some difference, letting the code run
a little longer, but it eventually crashes with the same errors.
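For reference, here is a rough sketch of how the threshold can be passed on
the mpirun_rsh command line. The hostfile path and the binary name are
placeholders (assumptions for illustration, not the actual setup), and the
script only assembles and prints the command rather than launching anything:

```shell
#!/bin/sh
# Dry run: build and print the launch command instead of executing it.
NP=96                  # total MPI processes (2 nodes x 48 cores)
HOSTFILE=./hosts       # placeholder: the hostfile path is an assumption
BINARY=./hello_world   # placeholder: the binary name is an assumption

# mpirun_rsh accepts NAME=VALUE environment settings before the executable.
CMD="mpirun_rsh -np $NP -hostfile $HOSTFILE MV2_ON_DEMAND_THRESHOLD=$NP $BINARY"
echo "$CMD"
```

To actually launch, run the printed command directly on a node where
mpirun_rsh and the hostfile are available.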
Have you ever seen this happening? Any thoughts?
Thanks in advance!
-Mehmet