[mvapich-discuss] Problems with MPI_Publish_name in MVAPICH2....
Shucai Xiao
shucai at vt.edu
Thu Jun 23 16:23:45 EDT 2011
[Skipped content of type multipart/alternative]
-------------- next part --------------
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.7a2-install/bin/mpicc testPub.c
[scxiao at gpu0031 connect]$ ./sync.sh
[scxiao at gpu0031 connect]$ ~/software/trunk-r8710-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0031.cluster
lookup name: "tag#0$description#"#RANK:00000000(000001c2:006c004d:00000001)#"$P"
serve name: "MyTest"
Here1
^CCtrl-C caught... cleaning up processes
[scxiao at gpu0031 connect]$ ~/software/trunk-r8710-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031,gpu0032 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0032.cluster
lookup name: "tag#0$description#"#RANK:00000000(000001c2:0014004b:00000001)#"$P"
serve name: "MyTest"
Here1
^CCtrl-C caught... cleaning up processes
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.7a2-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031,gpu0032 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0032.cluster
Error in lookup name: "Invalid service name (see MPI_Publish_name)"
Here1
[gpu0032.cluster:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
[proxy:0:0 at gpu0031.cluster] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
[proxy:0:0 at gpu0031.cluster] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at gpu0031.cluster] main (./pm/pmiserv/pmip.c:222): demux engine error waiting for event
[mpiexec at gpu0031.cluster] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec at gpu0031.cluster] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:179): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] main (./ui/mpich/mpiexec.c:397): process manager error waiting for completion
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.7a2-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0031.cluster
Error in lookup name: "Invalid service name (see MPI_Publish_name)"
Here1
[gpu0031.cluster:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
-------------- next part --------------
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* for gethostname() and sleep() */

int main(int argc, char *argv[])
{
    int errs = 0;
    char port_name[MPI_MAX_PORT_NAME], port_name_out[MPI_MAX_PORT_NAME];
    char serv_name[256];
    char hostName[256];
    int merr;
    MPI_Comm comm;
    MPI_Status status;
    char errmsg[MPI_MAX_ERROR_STRING];
    int msglen;
    int rank;
    int data = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Print the host each rank landed on, as seen in the transcripts. */
    gethostname(hostName, sizeof(hostName));
    printf("hostName = %s\n", hostName);

    //strcpy(port_name, "otherhost:122");
    strcpy(serv_name, "MyTest");
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    if (rank == 0)
    {
        /* Server side: open a port and publish it under the service name. */
        MPI_Open_port(MPI_INFO_NULL, port_name);
        merr = MPI_Publish_name(serv_name, MPI_INFO_NULL, port_name);
        if (merr)
        {
            errs++;
            MPI_Error_string(merr, errmsg, &msglen);
            printf("Error in publishing name: \"%s\"\n", errmsg);
            fflush(stdout);
        }

        MPI_Barrier(MPI_COMM_WORLD);   /* name is published; let rank 1 look it up */
        MPI_Barrier(MPI_COMM_WORLD);   /* rank 1 has finished the lookup */

        merr = MPI_Unpublish_name(serv_name, MPI_INFO_NULL, port_name);
        if (merr)
        {
            errs++;
            MPI_Error_string(merr, errmsg, &msglen);
            printf("Error in unpublishing name: \"%s\"\n", errmsg);
            fflush(stdout);
        }

        printf("Here1\n");
        merr = MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
        printf("Here2\n");
        if (merr != MPI_SUCCESS)
        {
            printf("accept error = %d\n", merr);
        }
        data = 100;
        MPI_Send(&data, 1, MPI_INT, 0, 0, comm);
    }
    else
    {
        /* Client side: wait for the name to be published, then look it up. */
        MPI_Barrier(MPI_COMM_WORLD);
        merr = MPI_Lookup_name(serv_name, MPI_INFO_NULL, port_name_out);
        if (merr)
        {
            errs++;
            MPI_Error_string(merr, errmsg, &msglen);
            printf("Error in lookup name: \"%s\"\n", errmsg);
            fflush(stdout);
        }
        else
        {
            printf("lookup name: \"%s\"\n", port_name_out);
            printf("serve name: \"%s\"\n", serv_name);
            // if (strcmp(port_name, port_name_out))
            // {
            //     errs++;
            //     printf("Lookup name returned the wrong value (%s)\n", port_name_out);
            //     fflush(stdout);
            // }
        }
        MPI_Barrier(MPI_COMM_WORLD);
        sleep(3);   /* give rank 0 time to reach MPI_Comm_accept */
        MPI_Comm_connect(port_name_out, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
        MPI_Recv(&data, 1, MPI_INT, 0, 0, comm, &status);
        printf("data = %d\n", data);
    }
    printf("rank = %d\n", rank);
    MPI_Finalize();
    return 0;
}
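A note on the failing cross-host runs: since MPI_Lookup_name reports "Lookup failed for service name MyTest" only when a second host is involved, one thing worth checking is whether the Hydra name server is actually running and reachable on the host passed to -nameserver. A minimal sketch of the launch sequence, assuming a hydra_nameserver binary ships alongside mpiexec.hydra in the same install (the paths below mirror the transcripts and may differ on your system):

```shell
# Start the Hydra name server on the lookup host (gpu0031) first.
# mpiexec.hydra's -nameserver option only points at an existing server;
# it does not start one.
~/software/mvapich2-1.7a2-install/bin/hydra_nameserver &

# Then launch the test with dynamic process management enabled and
# both hosts resolving the same name server.
~/software/mvapich2-1.7a2-install/bin/mpiexec.hydra \
    -env MV2_SUPPORT_DPM=1 \
    -nameserver gpu0031 \
    -hosts gpu0031,gpu0032 -np 2 ./a.out
```

If the name server is missing or firewalled off from gpu0032, the publish on rank 0 can appear to succeed locally while the lookup from the remote host fails exactly as shown above.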
-------------- next part --------------
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.6-install/bin/mpicc testPub.c
[scxiao at gpu0031 connect]$ ./sync.sh
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.6-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0031.cluster
lookup name: "tag#0$description#"#RANK:00000000(000001c2:007c004d:00000001)#"$"
serve name: "MyTest"
Here1
Here2
rank = 0
data = 100
rank = 1
[scxiao at gpu0031 connect]$ ~/software/trunk-r8710-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0031.cluster
lookup name: "tag#0$description#"#RANK:00000000(000001c2:0024004e:00000001)#"$"
serve name: "MyTest"
Here1
Here2
rank = 0
data = 100
rank = 1
------------------ The above cases work ------------------
[scxiao at gpu0031 connect]$ ~/software/trunk-r8710-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031,gpu0032 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0032.cluster
Error in lookup name: "Invalid service name (see MPI_Publish_name), error stack:
MPID_NS_Lookup(185): Lookup failed for service name MyTest"
Here1
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
[proxy:0:0 at gpu0031.cluster] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:912): assert (!closed) failed
[proxy:0:0 at gpu0031.cluster] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at gpu0031.cluster] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
[mpiexec at gpu0031.cluster] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec at gpu0031.cluster] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:189): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] main (./ui/mpich/mpiexec.c:397): process manager error waiting for completion
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.6-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031,gpu0032 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0032.cluster
Error in lookup name: "Invalid service name (see MPI_Publish_name), error stack:
MPID_NS_Lookup(185): Lookup failed for service name MyTest"
Here1
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
[proxy:0:0 at gpu0031.cluster] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
[proxy:0:0 at gpu0031.cluster] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at gpu0031.cluster] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event
[mpiexec at gpu0031.cluster] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec at gpu0031.cluster] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] main (./ui/mpich/mpiexec.c:385): process manager error waiting for completion
------------------ The two cases above do not work ------------------