[mvapich-discuss] Problems with MPI_Publish_name in MVAPICH2....

Shucai Xiao shucai at vt.edu
Thu Jun 23 16:23:45 EDT 2011


Skipped content of type multipart/alternative
-------------- next part --------------
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.7a2-install/bin/mpicc testPub.c 
[scxiao at gpu0031 connect]$ ./sync.sh 
[scxiao at gpu0031 connect]$ ~/software/trunk-r8710-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0031.cluster
lookup name: "tag#0$description#"#RANK:00000000(000001c2:006c004d:00000001)#"$P"
serve name: "MyTest"
Here1
^CCtrl-C caught... cleaning up processes
[scxiao at gpu0031 connect]$ ~/software/trunk-r8710-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031,gpu0032 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0032.cluster
lookup name: "tag#0$description#"#RANK:00000000(000001c2:0014004b:00000001)#"$P"
serve name: "MyTest"
Here1
^CCtrl-C caught... cleaning up processes
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.7a2-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031,gpu0032 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0032.cluster
Error in lookup name: "Invalid service name (see MPI_Publish_name)"
Here1
[gpu0032.cluster:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
[proxy:0:0 at gpu0031.cluster] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
[proxy:0:0 at gpu0031.cluster] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at gpu0031.cluster] main (./pm/pmiserv/pmip.c:222): demux engine error waiting for event
[mpiexec at gpu0031.cluster] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec at gpu0031.cluster] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:179): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] main (./ui/mpich/mpiexec.c:397): process manager error waiting for completion
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.7a2-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0031.cluster
Error in lookup name: "Invalid service name (see MPI_Publish_name)"
Here1
[gpu0031.cluster:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

-------------- next part --------------
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
	int errs = 0;
	char port_name[MPI_MAX_PORT_NAME], port_name_out[MPI_MAX_PORT_NAME];
	char serv_name[256];
	int merr, mclass;
	MPI_Comm comm;
	MPI_Status status;
	char errmsg[MPI_MAX_ERROR_STRING];
	int msglen;
	int rank;
	int data = 0;

	MPI_Init(&argc, &argv);
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);

	//strcpy(port_name, "otherhost:122");
	strcpy(serv_name, "MyTest");
	MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

	/* Rank 0 acts as the server: open a port, publish it under serv_name,
	 * then accept a connection on it. */
	if (rank == 0)
	{
		MPI_Open_port(MPI_INFO_NULL, port_name);
		merr = MPI_Publish_name(serv_name, MPI_INFO_NULL, port_name);
		if (merr)
		{
			errs++;
			MPI_Error_string(merr, errmsg, &msglen);
			printf("Error in publishing_name:\"%s\"\n", errmsg);
			fflush(stdout);
		}

		/* First barrier: the name is published, clients may look it up.
		 * Second barrier: clients are done with the lookup, safe to unpublish. */
		MPI_Barrier(MPI_COMM_WORLD);
		MPI_Barrier(MPI_COMM_WORLD);

		merr = MPI_Unpublish_name(serv_name, MPI_INFO_NULL, port_name);
		if (merr)
		{
			errs++;
			MPI_Error_string(merr, errmsg, &msglen);
			printf("Error in unpublishing_name:\"%s\"\n", errmsg);
			fflush(stdout);
		}

		printf("Here1\n");
		merr = MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
		printf("Here2\n");
		if (merr != MPI_SUCCESS)
		{
			printf("accept error = %d\n", merr);
		}
		data = 100;
		MPI_Send(&data, 1, MPI_INT, 0, 0, comm);
	}
	else
	{
		/* Remaining ranks act as clients: wait for the name to be published,
		 * look it up, then connect to the returned port. */
		MPI_Barrier(MPI_COMM_WORLD);
		merr = MPI_Lookup_name(serv_name, MPI_INFO_NULL, port_name_out);
		if (merr)
		{
			errs++;
			MPI_Error_string(merr, errmsg, &msglen);
			printf("Error in lookup name: \"%s\"\n", errmsg);
			fflush(stdout);
		}
		else
		{
			printf("lookup name: \"%s\"\n", port_name_out);
			printf("serve name: \"%s\"\n", serv_name);
//			if (strcmp(port_name, port_name_out))
//			{
//				errs++;
//				printf("Lookup name returned the wrong value (%s)\n", port_name_out);
//				fflush(stdout);
//			}
		}

		MPI_Barrier(MPI_COMM_WORLD);
		sleep(3);	/* give rank 0 time to reach MPI_Comm_accept */

		MPI_Comm_connect(port_name_out, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
		MPI_Recv(&data, 1, MPI_INT, 0, 0, comm, &status);
		printf("data = %d\n", data);
	}

	printf("rank = %d\n", rank);

	MPI_Finalize();

	return 0;
}
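
For comparison, below is a minimal sketch (untested against these particular MVAPICH2 builds) that exercises the same MPI_Comm_accept/MPI_Comm_connect handshake but distributes the port string with MPI_Bcast over MPI_COMM_WORLD instead of going through MPI_Publish_name/MPI_Lookup_name. Running it across gpu0031/gpu0032 can help separate a name-service problem from a connect/accept problem.

#include "mpi.h"
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
	char port_name[MPI_MAX_PORT_NAME];
	MPI_Comm comm;
	int rank, data = 0;

	MPI_Init(&argc, &argv);
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);

	if (rank == 0)
		MPI_Open_port(MPI_INFO_NULL, port_name);

	/* Distribute the port string directly instead of publishing it. */
	MPI_Bcast(port_name, MPI_MAX_PORT_NAME, MPI_CHAR, 0, MPI_COMM_WORLD);

	if (rank == 0)
	{
		/* Server side: accept a connection and send one integer. */
		MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
		data = 100;
		MPI_Send(&data, 1, MPI_INT, 0, 0, comm);
		MPI_Close_port(port_name);
	}
	else if (rank == 1)
	{
		/* Client side: connect to the broadcast port and receive. */
		MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
		MPI_Recv(&data, 1, MPI_INT, 0, 0, comm, MPI_STATUS_IGNORE);
		printf("data = %d\n", data);
	}

	if (rank <= 1)
		MPI_Comm_disconnect(&comm);
	MPI_Finalize();
	return 0;
}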

-------------- next part --------------
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.6-install/bin/mpicc testPub.c 
[scxiao at gpu0031 connect]$ ./sync.sh 
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.6-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031 -np 2 ./a.out 
hostName = gpu0031.cluster
hostName = gpu0031.cluster
lookup name: "tag#0$description#"#RANK:00000000(000001c2:007c004d:00000001)#"$"
serve name: "MyTest"
Here1
Here2
rank = 0
data = 100
rank = 1
[scxiao at gpu0031 connect]$ ~/software/trunk-r8710-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0031.cluster
lookup name: "tag#0$description#"#RANK:00000000(000001c2:0024004e:00000001)#"$"
serve name: "MyTest"
Here1
Here2
rank = 0
data = 100
rank = 1
------------------The above cases work----------------------------------------------------

[scxiao at gpu0031 connect]$ ~/software/trunk-r8710-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031,gpu0032 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0032.cluster
Error in lookup name: "Invalid service name (see MPI_Publish_name), error stack:
MPID_NS_Lookup(185): Lookup failed for service name MyTest"
Here1

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
[proxy:0:0 at gpu0031.cluster] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:912): assert (!closed) failed
[proxy:0:0 at gpu0031.cluster] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at gpu0031.cluster] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
[mpiexec at gpu0031.cluster] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec at gpu0031.cluster] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:189): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] main (./ui/mpich/mpiexec.c:397): process manager error waiting for completion
[scxiao at gpu0031 connect]$ ~/software/mvapich2-1.6-install/bin/mpiexec.hydra -env MV2_SUPPORT_DPM=1 -nameserver gpu0031 -hosts gpu0031,gpu0032 -np 2 ./a.out
hostName = gpu0031.cluster
hostName = gpu0032.cluster
Error in lookup name: "Invalid service name (see MPI_Publish_name), error stack:
MPID_NS_Lookup(185): Lookup failed for service name MyTest"
Here1

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
[proxy:0:0 at gpu0031.cluster] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
[proxy:0:0 at gpu0031.cluster] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at gpu0031.cluster] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event
[mpiexec at gpu0031.cluster] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec at gpu0031.cluster] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for completion
[mpiexec at gpu0031.cluster] main (./ui/mpich/mpiexec.c:385): process manager error waiting for completion
----------------------------------------- The above two cases do not work --------------------------------------------------

