[mvapich-discuss] problem w/MVAPICH in the frames of Gen1
Mikhail Kuzminsky
kus at free.net
Fri Aug 4 09:36:02 EDT 2006
In message from Dhabaleswar Panda <panda at cse.ohio-state.edu> (Thu, 3
Aug 2006 12:32:45 -0400 (EDT)):
>Mikhail - Thanks for your note. Since you are trying MVAPICH with
>IBGD-1.8.0, let me suggest that you
>contact Mellanox people regarding this problem.
>You are also using a very old version of MVAPICH (0.9.5).
Using your latest MVAPICH-0.9.8 makes no difference.
For example, after mpicc -noshlib -o cpi cpi.c :
mpirun_rsh -rsh -np 1 c5ws1.chem.ac.ru ./cpi
[0] Abort: Cannot allocate PD (Invalid Virtual Address) at line 745 in file viainit.c
mpirun: executable version 0 does not match our version 3.
done.
The only advantage of 0.9.8 in this respect is that it installs with the
right paths in mpif77/mpif90/mpicc etc.
BTW, what does the message about a "mismatch" of the executable version
mean here?
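
A possible reading of the "mismatch" message, offered as an assumption and
not as a statement about the MVAPICH source: the strace output quoted below
shows mpirun_rsh opening a listening TCP socket and accepting a connection
after fork(), so each started process presumably connects back and reports
a protocol version. If the process aborts in viainit.c before it sends that
value, the launcher would read nothing useful and report a mismatch as a
side effect of the abort. The sketch below imitates this with a local
socket pair; the 4-byte integer wire format and the names LAUNCHER_VERSION,
check_version and simulate are hypothetical.

```python
import socket
import struct

# Value taken from the error text ("our version 3"); everything else
# about the handshake is an assumption made for illustration.
LAUNCHER_VERSION = 3


def check_version(conn):
    """Read one 4-byte integer from the child and compare it."""
    data = conn.recv(4)
    # A child that exits before writing leaves nothing to read,
    # which this sketch treats as version 0.
    version = struct.unpack("!i", data)[0] if len(data) == 4 else 0
    if version != LAUNCHER_VERSION:
        return ("mpirun: executable version %d does not match our version %d."
                % (version, LAUNCHER_VERSION))
    return None


def simulate(child_sends):
    """Run the handshake; child_sends=None models a child that aborts early."""
    launcher, child = socket.socketpair()
    try:
        if child_sends is not None:
            child.sendall(struct.pack("!i", child_sends))
        child.close()  # the child is gone either way
        return check_version(launcher)
    finally:
        launcher.close()


if __name__ == "__main__":
    print(simulate(3))     # healthy child
    print(simulate(None))  # child aborted before reporting its version
```

If this reading is right, the mismatch line is a consequence of the PD
allocation failure and not a separate problem.
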
Yours
Mikhail
>You have
>indicated a reasoning behind this. However, as you know, IB and the
>related stacks are getting updated very frequently. We have just
>released 0.9.8. Thus, it will be better if you start using the latest
>version and the latest stacks.
>
>DK
>
>
>> MVAPICH-0.9.4 from the Gen1 IBGD-1.6.1 Mellanox toolset worked very
>> well for us. About a year ago we upgraded to IBGD-1.8.0 with
>> MVAPICH-0.9.5, but since then we ran only applications that use the
>> TCP/IP-over-IB stack, and I did not verify MPI (the standard IBGD
>> installation process reported that everything was OK) :-(
>>
>> But now I have found that MVAPICH-0.9.5 in IBGD-1.8.0 does not work
>> for me :-( ! Any mpirun_rsh invocation (for any MPI application, from
>> our own programs to the standard tests) produces messages like the
>> following:
>>
>> /home/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpirun_rsh -rsh -np 2 -hostfile mf /home/local/ibgd/mpi/osu/gcc/tests/osu-tests/bw 1000 16 > testmpi 2>&1
>>
>> [1] Abort: Cannot allocate PD (Invalid Virtual Address) at line 688 in file viainit.c
>> [0] Abort: Cannot allocate PD (Invalid Virtual Address)mpirun:
>> executable version 1 does not match our version 3.
>> at line 688 in file viainit.c
>>
>> I also installed mvapich-0.9.5 from IBGD-1.8.0 "manually", i.e. not as
>> part of the standard IBGD installation. The reason was that after the
>> IBGD installation I found wrong paths in the mpif77/mpif90/mpirun etc.
>> scripts (they lack the leading "prefix" part of the full paths). But
>> after manually installing mvapich-0.9.5 for the Intel and PathScale
>> compilers I found the same wrong paths, and the same problem as
>> described above :-(
>>
>> Could you please help me solve this problem?
>>
>> Some (maybe stupid) ideas about things that look strange to me, based
>> on strace output (I am also including the strace output below):
>>
>> 1) I have no LD_LIBRARY_PATH set for the user; according to the README
>> it does not look necessary, but ld.so.cache was searched
>>
>> 2) There is an attempt to access //home/local/.../mvapich.conf,
>> i.e. with 2 slashes instead of one at the beginning?
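
Both observations above can be checked outside MPI. The dynamic loader
normally consults /etc/ld.so.cache whether or not LD_LIBRARY_PATH is set,
and on Linux a path with two leading slashes resolves to the same file as
the single-slash form (POSIX leaves the double-slash case
implementation-defined). The searches under /home/SGE/lib and
/usr/local/ifort/lib in the trace below may also mean that LD_LIBRARY_PATH,
or an RPATH in the binary, is set after all; that is an inference, not
something the trace proves. A minimal Python sketch, in which the helper
names library_search_dirs and same_target are made up for illustration:

```python
import os


def library_search_dirs(environ=None):
    """Return the non-empty entries of LD_LIBRARY_PATH, or [] if it is unset."""
    env = os.environ if environ is None else environ
    # Empty entries are skipped here to keep the listing readable.
    return [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]


def same_target(path_a, path_b):
    """True if both paths name the same existing file or directory."""
    return os.path.samefile(path_a, path_b)


if __name__ == "__main__":
    print("LD_LIBRARY_PATH entries:", library_search_dirs())
    cwd = os.getcwd()
    # "/" + cwd gives the doubled leading slash seen in the access() call.
    print("double slash resolves to the same directory:",
          same_target("/" + cwd, cwd))
```

If the second check holds on the affected node, the doubled slash in the
mvapich.conf path is cosmetic, and the ENOENT there only says that the
file itself does not exist.
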
>>
>> We use old software versions because we run the (now ancient) SuSE
>> Professional 9.0 for x86-64 with a 2.4.21 SMP kernel; this environment
>> is required by restrictions of some binary applications we use.
>>
>> Yours
>> Mikhail Kuzminsky
>> Zelinsky Institute of Organic Chemistry
>> Moscow
>>
>> strace /home/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpirun_rsh -rsh -np 2 -hostfile mf /home/local/ibgd/mpi/osu/gcc/tests/osu-tests/bw 1000 1
>>
>> ===================================== strace output =================
>> execve("/home/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpirun_rsh", ["/home/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpirun_rsh", "-rsh", "-np", "2", "-hostfile", "mf", "/home/local/ibgd/mpi/osu/gcc/tests/osu-tests/bw", "1000", "16"], [/* 69 vars */]) = 0
>> uname({sys="Linux", node="c5ws1", ...}) = 0
>> brk(0) = 0x505b80
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a9556b000
>> open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
>> open("/home/SGE/lib/lx24-amd64/tls/x86_64/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
>> <a set of similar lines was skipped>
>> stat("/home/SGE/lib/lx24-amd64", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>> open("/usr/local/ifort/lib/tls/x86_64/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
>> stat("/usr/local/ifort/lib/tls/x86_64", 0x7fbfffe640) = -1 ENOENT (No such file or directory)
>> <a set of similar messages was skipped>
>> open("/etc/ld.so.cache", O_RDONLY) = 3
>> fstat(3, {st_mode=S_IFREG|0644, st_size=117044, ...}) = 0
>> mmap(NULL, 117044, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2a9556c000
>> close(3) = 0
>> open("/lib64/libc.so.6", O_RDONLY) = 3
>> read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\335\1"..., 640) = 640
>> fstat(3, {st_mode=S_IFREG|0755, st_size=1534814, ...}) = 0
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a95589000
>> mmap(NULL, 2365888, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x2a9566d000
>> mprotect(0x2a95791000, 1169856, PROT_NONE) = 0
>> mmap(0x2a9586d000, 253952, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x100000) = 0x2a9586d000
>> mmap(0x2a958ab000, 14784, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2a958ab000
>> close(3) = 0
>> munmap(0x2a9556c000, 117044) = 0
>> access("//home/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/etc/mvapich.conf", R_OK) = -1 ENOENT (No such file or directory)
>> brk(0) = 0x505b80
>> brk(0x526b80) = 0x526b80
>> brk(0) = 0x526b80
>> brk(0x527000) = 0x527000
>> open("mf", O_RDONLY) = 3
>> fstat(3, {st_mode=S_IFREG|0644, st_size=34, ...}) = 0
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a9556c000
>> read(3, "c5ws1.chem.ac.ru\nc5ws1.chem.ac.r"..., 4096) = 34
>> close(3) = 0
>> munmap(0x2a9556c000, 4096) = 0
>> getcwd("/home/kus/an/examples", 256) = 22
>> uname({sys="Linux", node="c5ws1", ...}) = 0
>> socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
>> bind(3, {sa_family=0xeee0 /* AF_??? */, sa_data="\0\0\0\0\0\0000\223X\225*\0\0\0"}, 16) = 0
>> getsockname(3, {sa_family=AF_INET, sin_port=htons(39601), sin_addr=inet_addr("0.0.0.0")}, [12884901904]) = 0
>> listen(3, 2) = 0
>> rt_sigaction(SIGHUP, {0x4033e0, [HUP], SA_RESTART|0x4000000}, {SIG_DFL}, 8) = 0
>> rt_sigaction(SIGINT, {0x4033e0, [INT], SA_RESTART|0x4000000}, {SIG_DFL}, 8) = 0
>> rt_sigaction(SIGTSTP, {0x4035d0, [TSTP], SA_RESTART|0x4000000}, {SIG_DFL}, 8) = 0
>> rt_sigaction(SIGCHLD, {0x403640, [CHLD], SA_RESTART|0x4000000}, {SIG_DFL}, 8) = 0
>> rt_sigaction(SIGALRM, {0x4035f0, [ALRM], SA_RESTART|0x4000000}, {SIG_DFL}, 8) = 0
>> alarm(1000) = 0
>> getpid() = 19113
>> fork() = 19114
>> brk(0) = 0x527000
>> brk(0) = 0x527000
>> brk(0x526000) = 0x526000
>> brk(0) = 0x526000
>> getpid() = 19113
>> fork() = 19115
>> accept(3, [0] Abort: Cannot allocate PD (Invalid Virtual Address) at line 688 in file viainit.c
>> {sa_family=AF_INET, sin_port=htons(39604), sin_addr=inet_addr("192.168.0.21")}, [12884901904]) = 4
>> --- SIGCHLD (Child exited) @ 0 (0) ---
>> alarm(10) = 1000
>> wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 19114
>> wait4(-1, [1] Abort: Cannot allocate PD (Invalid Virtual Address) at line 688 in file viainit.c
>> [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 19115
>> alarm(0) = 10
>> exit_group(0) = ?
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at mail.cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>