[mvapich-discuss] Problem with mvapich2 + blcr

Raghu rajachan at cse.ohio-state.edu
Fri Oct 31 08:48:08 EDT 2014


Hi Ramy,

For #1, the MVAPICH2_Sync_Checkpoint() call is exposed via mpi.h itself, so
you need not include another header, or link to any other library. I see
that you are installing to a non-standard prefix, but later using the
default mpicc in your path when running the test. Are you exporting $PATH
and $LD_LIBRARY_PATH appropriately? Can you send me the output of the
following:

$ mpiname
and
$ /home/gad/install_mvapich2-2.1a/bin/mpiname

I am guessing that your test run is picking up the system's default build
of MVAPICH2, which might not have been configured with checkpointing
support.
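
As a rough sketch of that check (the helper name use_mvapich2_prefix is made up for illustration and is not part of MVAPICH2; the prefix is the one from the configure line in the quoted message), you could put that install first on both search paths, see which mpicc the shell then resolves, and check whether that install's mpi.h mentions the checkpoint call:

```shell
# Hypothetical helper (not part of MVAPICH2): prepend an install prefix
# to PATH and LD_LIBRARY_PATH so its wrappers and libraries are found first.
use_mvapich2_prefix() {
    prefix="$1"
    PATH="$prefix/bin${PATH:+:$PATH}"
    LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

# Prefix taken from the configure line in the quoted message.
MV2_PREFIX=/home/gad/install_mvapich2-2.1a
use_mvapich2_prefix "$MV2_PREFIX"

# Show which mpicc the shell now resolves; it should live under $MV2_PREFIX/bin.
command -v mpicc || echo "mpicc not found on PATH"

# Check whether this install's mpi.h mentions the checkpoint call.
if grep -q MVAPICH2_Sync_Checkpoint "$MV2_PREFIX/include/mpi.h" 2>/dev/null; then
    echo "mpi.h mentions MVAPICH2_Sync_Checkpoint"
else
    echo "MVAPICH2_Sync_Checkpoint not found in $MV2_PREFIX/include/mpi.h"
fi
```

If mpicc resolves to a path outside the prefix, or the symbol is missing from mpi.h, the compile is most likely not using the checkpoint-enabled build.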

For #2, I quickly tested this locally, and it works as expected. I suspect
it is related to the point above: your run may be using a build that does
not have checkpointing support enabled. Can you retry your runs explicitly
using the install in your prefix?

$ /home/gad/install_mvapich2-2.1a/bin/mpicc testcp.cpp -o testcp
$ /home/gad/install_mvapich2-2.1a/bin/mpiexec.hydra.....

If case #2 fails even after you verify that you are using the correct
install, can you try it once with mpirun_rsh (MVAPICH2's recommended
launcher)?
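
A minimal sketch of what such a launch line might look like, printed as a dry run instead of executed. The process count and the host file name ./hosts are assumptions for illustration; only the prefix and the testcp binary come from this thread:

```shell
# Assumed values for illustration only: prefix from the configure line,
# two processes, and a host file named ./hosts (one hostname per line).
MV2_PREFIX=/home/gad/install_mvapich2-2.1a
NP=2
HOSTFILE=./hosts

# Build the launch command using the full path to this install's mpirun_rsh,
# so the system default launcher cannot be picked up by accident.
LAUNCH_CMD="$MV2_PREFIX/bin/mpirun_rsh -np $NP -hostfile $HOSTFILE ./testcp"

# Dry run: print the command instead of executing it.
echo "$LAUNCH_CMD"
```

Using the full path here follows the same reasoning as for mpicc: it rules out a different MVAPICH2 build earlier in $PATH.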



Raghu

On Fri, Oct 31, 2014 at 3:53 AM, Gad, Ramy <gad at uni-mainz.de> wrote:

>  Hi,
>
>
>  I have installed mvapich2 V2.0 and V2.1a with this configuration.
>
>
>  ====
>
> ./configure --prefix=/home/gad/install_mvapich2-2.1a
> --with-ib-libpath=/global/packages/libibverbs-pd/lib --enable-ckpt
> --with-blcr=/opt/blcr --enable-checkpointing --with-hydra-ckpointlib=blcr
>
> ====
>
>
>  I have BLCR installed on my system and its kernel modules are loaded.
>
>
>  ====
>
> gad@pandora1:/home/gad$ lsmod | grep blcr
> blcr                  115465  0
> blcr_imports           10683  1 blcr
> gad@pandora1:/home/gad$
> gad@pandora1:/home/gad$ echo $LD_LIBRARY_PATH
> :/home/gad/install_mvapich2-2.0/lib:/opt/blcr/lib
> ====
>
>
>  The problems are:
>
>
>  1- While compiling a program with application-initiated synchronous
> checkpointing (using MVAPICH2_Sync_Checkpoint()), I get the following error
> message: undefined reference to `MVAPICH2_Sync_Checkpoint'. Is there any
> header file I need to include, or any library I need to link with?
>
>
>
>  ====
>
> gad@pandora1:/home/gad/mvapich2-2.1a_test$ cat testcp.cpp
> #include "mpi.h"
>     #include <unistd.h>
>     #include <stdio.h>
>
>
>
>     int main(int argc,char *argv[])
>     {
>         MPI_Init(&argc,&argv);
>         printf("Computation\n");
>         sleep(5);
>         MPI_Barrier(MPI_COMM_WORLD);
>         MVAPICH2_Sync_Checkpoint();
>         MPI_Barrier(MPI_COMM_WORLD);
>         printf("Computation\n");
>         sleep(5);
>         MPI_Finalize();
>         return 0;
>     }
> gad@pandora1:/home/gad/mvapich2-2.1a_test$ mpicc testcp.cpp -o testcp
> testcp.cpp: In function ‘int main(int, char**)’:
> testcp.cpp:13: error: ‘MVAPICH2_Sync_Checkpoint’ was not declared in this scope
> ====
>
>
>
>  2- When I try to checkpoint an MPI program with cr_checkpoint and
> restore it with cr_restart, I get the following error:
>
> ====
>
> gad@pandora1:/home/gad$ cr_checkpoint -p 2120
>                         //2120 is the PID of mpirun process
>
> gad@pandora1:/home/gad$ cr_restart context.2120
> [mpiexec@pandora1] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP & ~POLLERR)) failed
> [mpiexec@pandora1] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
> [mpiexec@pandora1] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
>
> ====
>
>
>  I can see that a context file is only generated for the mpirun process
> "context.2120", however there are no context files generated for the MPI
> processes.
>
>  Please can you help me with this problem so that MVAPICH2 checkpointing
> works with BLCR.
>
>
>   Best Regards,
>
>  Ramy Gad
>  Johannes Gutenberg - Universität Mainz
>  Zentrums für Datenverarbeitung (ZDV)
>
> Anselm-Franz-von-Bentzel-Weg 12
> 55128 Mainz
>  Germany
>  E-Mail: gad at uni-mainz.de
>  Office Phone: +49-6131-39-26437
>
>