[Columbus] parallel CI having troubles
Thomas Mueller
th.mueller at fz-juelich.de
Thu Sep 8 10:33:47 EDT 2005
On Tue, 6 Sep 2005, Andy Young wrote:
> Hello COLUMBUS users and developers!
> I've attached the outcome of a parallel test job that
> was tarred with Professor Mueller's COLUMBUS
> distribution. I've left all of the input and output
> intact, with the exception of some binary files. I
> noted that I removed them in the "REMOVED_FILES" file.
> The error is listed as a "lately reading sequential
> formatted external IO". I'm not sure what to make of
> that. My setup is 8 nodes over Myrinet, and I built
> GA on MPICH.
> Thanks for any ideas or comments!
> -Andy
>
Andy,
although process 0 has generated all the integral files
etc., and they are in fact in the WORK directory, some
processes cannot access these files.
Hence, run a simple test with the job on a single
node:
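Before rerunning, it can help to probe the file-access problem directly. The sketch below is not a COLUMBUS tool; the function name, host names, and paths are examples. On a real cluster the commented `ssh` line is the actual check; locally it only tests the path from the current node.

```shell
# Hedged sketch: confirm that a given host can see the WORK directory.
check_work_dir() {
  host=$1
  work=$2
  # On a real cluster you would run: ssh "$host" test -d "$work"
  if test -d "$work"; then
    echo "$host: $work OK"
  else
    echo "$host: $work MISSING"
  fi
}

# Example invocations with placeholder host names and paths:
check_work_dir node001 /tmp
check_work_dir node002 /nonexistent/WORK
```

Any host reporting MISSING would explain the failed sequential formatted read, since pciudg.x expects the integral files to be visible from every process.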
control.run
niter = 1
hermit
mcscf
pciudg
hosts=node001
nproc=2
This should run.
BTW, what does your install.config file look like?
It is essential for any mpirun that the startup command
is correct and that the node on which the integrals and
related files are stored corresponds to the node on which
MPI process 0 starts.
To check that, you might try
control.run
niter = 1
hermit
mcscf
pciudg
hosts=node001:node001
nproc=1:1
Upon starting, a message such as
pcallprog startup1:
gacom:MPICH
pscript:
mpi_startup:mpirun -np _NPROC_ -machinefile _HOSTS_ _EXE_ _EXEOPTS_
pcallprog startup2:
1 1 0 0
0 0 entering section hosts && nproc
mpirun -np 2 -machinefile machines
/home/columbus/Columbus.hlischka/Columbus/Columbus/pciudg.x -m 40000000
>> /bigscr/problem1/parallel.test.jhu/runc.error 2>&1
should appear, and you can check the machines file in the
WORK directory for correctness.
In your case the ordering matches the control.run file
(as it should be).
cat machines
node001
node002
node003
node004
node005
node006
node007
node008
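That ordering check can also be done mechanically. A minimal sketch, assuming the file names from this post (control.run with a `hosts=` line, and a machines file in the WORK directory); the two-node contents below are small stand-ins for the real eight-node files:

```shell
# Hedged sketch: compare the machines file ordering against the
# hosts= line of control.run (example files created in a scratch dir).
tmp=$(mktemp -d)
printf 'hosts=node001:node002\n' > "$tmp/control.run"
printf 'node001\nnode002\n' > "$tmp/machines"

# Expand the colon-separated hosts= value into one host per line.
expected=$(grep '^hosts=' "$tmp/control.run" | cut -d= -f2 | tr ':' '\n')
if [ "$expected" = "$(cat "$tmp/machines")" ]; then
  echo "machines ordering matches control.run"
else
  echo "ORDER MISMATCH between control.run and machines"
fi
```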
Now, in ciudgin you specified
&input
NTYPE = 0,
GSET = 0,
DAVCOR =10,
NCOREL = 22
NROOT = 1
IVMODE = 3
NBKITR = 1
NVBKMN = 1
NVBKMX = 4
RTOLBK = 1e-3,
NITER = 10
NVCIMN = 1
NVCIMX = 4
RTOLCI = 1e-3,
IDEN = 1
CSFPRN = 10,
MAXSEG = 20
nseg0x = 1,1,2,2
nseg2x = 1,1,2,2
nseg1x = 1,1,2,2
nseg3x = 1,1,2,2
nsegd = 1,1,2,2
nseg4x = 1,1,2,2,
c2ex0ex=0
c3ex1ex=0
cdg4ex=1
fileloc=1,1,3,3,3,3,2
finalv=-1
finalw=-1
&end
fileloc=1,1,3,3,3,3,2
meaning that the integral files are to be distributed
from node 0 to the local disks of the other nodes.
Again, it is worthwhile checking which directory the
spawned processes end up in. This depends upon the MPI
version and configuration.
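One way to observe this: on a real cluster you could launch a trivial command through the same mpirun invocation, e.g. `mpirun -np 2 -machinefile machines sh -c 'echo "$(hostname): $(pwd)"'`, and each rank reports its host and working directory. The sketch below imitates the idea with plain local subshells (no MPI needed), so the behaviour can be tried anywhere; `/tmp` stands in for the WORK directory:

```shell
# Hedged local stand-in for the per-rank working-directory check:
# each "rank" is just a subshell that moves to an example directory
# and reports where it ended up, mirroring what the mpirun one-liner
# above would show for real MPI processes.
for rank in 0 1; do
  ( cd /tmp && echo "rank $rank: $(pwd)" )
done
```

If the reported directories differ from where the integral files live, that mismatch would produce exactly the kind of failed sequential read seen here.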
Regards,
Thomas
_______________________________________________________________
Dr. Thomas Mueller
Central Institute for Applied Mathematics
Research Centre Juelich
D-52425 Juelich
Tel. 0049-2461-61-3175
FAX 0049-2461-61-6656
_______________________________________________________________