[mvapich-discuss] WRF 3.4.1 hangs at mpi_finalize with mvapich2-1.9rc1
Devendar Bureddy
bureddy at cse.ohio-state.edu
Thu May 9 14:45:53 EDT 2013
Hi Parker,
We would like to reproduce the hang (with the 73-day simulation) to analyze it
further. Can you please provide additional details such as the WRF
configuration, the I/O configuration, and run-time details (number of
processes, run-time parameters, etc.)?
The additional messages in the 1-day run appear because some objects
(communicators, groups, etc.) are not freed by the application. I think these
messages should not be a major concern here.
-Devendar
On Thu, May 9, 2013 at 1:18 PM, Parker Norton <parker.norton at gmail.com> wrote:
> Hello,
>
> I have been successfully using mvapich2-1.5.1 with the Weather Research
> and Forecasting (WRF) Model version 3.4.1. Recently our cluster OS was
> updated, which meant recompiling the WRF model and the libraries it
> depends on. I chose to use mvapich2-1.9rc1 for the parallel library. I
> successfully used the Intel compilers (version 11.1) to compile the
> software.
>
> However, when I run the WRF model I see the following behavior. For long
> runs (73-day simulations), the log file indicates that the model completed
> successfully, but the run then hangs and never terminates. When I turn on
> additional debugging output for the model, it appears to be hanging in the
> MPI_Finalize call.
>
> When I perform a 1-day run of the same model, it completes and terminates
> successfully, but I get the following additional messages in my log output:
> leaked context IDs detected: mask=0x2b5524fd3260 mask[0]=0x1ffffff
> In direct memory block for handle type GROUP, 2 handles are still allocated
> In direct memory block for handle type ATTR, 2 handles are still allocated
> In direct memory block for handle type KEYVAL, 1 handles are still allocated
> In direct memory block for handle type COMM, 7 handles are still allocated
>
> I found a discussion at
> http://www.nacad.ufrj.br/online/sgi/007-3773-018/sgi_html/ch10.html#Z1175712035tls
> that indicated MPI_Finalize hangs are usually related to unmatched or
> incomplete send/receive requests. During my searches I was not able to find
> any discussions of others experiencing this problem with WRF.
>
> I also tried compiling mvapich2-1.7a2 to see whether that version would work
> correctly with WRF, but it exhibits the same behavior.
>
> I was able to copy the mvapich2-1.5.1 library binaries I had been using on
> the old system onto the new system and get them to work. With this rather
> dated version of the mvapich2 library, the WRF model runs without any
> problems or additional error/warning messages.
>
> At this point I am able to run my WRF model with the older version of
> mvapich2, but I would like to take advantage of the improvements and bug
> fixes in the newer versions.
>
> The system I am on uses InfiniBand to connect the nodes. The configure
> line I used is:
>
> ./configure --prefix=/usr/local/mvapich2-1.9rc1-intel11 \
>     --enable-shared --enable-g=all --enable-error-messages=all \
>     F77="ifort" FC="ifort" CC="icc" CXX="icpc"
>
> Results from mpichversion:
> MVAPICH2 Version: 1.9rc1
> MVAPICH2 Release date: Tue Apr 16 12:35:17 EDT 2013
> MVAPICH2 Device: ch3:mrail
> MVAPICH2 configure: --prefix=/usr/local/mvapich2-1.9rc1-intel11 --enable-shared --enable-g=all --enable-error-messages=all
> MVAPICH2 CC: icc -g -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 CXX: icpc -g -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 F77: gfortran -L/lib -L/lib -g -O2
> MVAPICH2 FC: ifort -g -O2
>
>
> Any help or insights that could be offered in figuring this out would be
> appreciated. Please let me know if you have further questions.
>
> Parker Norton
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
--
Devendar