[Columbus] parallel CI having troubles

Thomas Mueller th.mueller at fz-juelich.de
Thu Sep 8 10:33:47 EDT 2005


On Tue, 6 Sep 2005, Andy Young wrote:

> Hello COLUMBUS users and developers!
> I've attached the outcome of a parallel test job that
> was tarred with Professor Mueller's COLUMBUS
> distribution.  I've left all of the input and output
> intact, with the exception of some binary files.  I
> noted that I removed them in the "REMOVED_FILES" file.
> The error is listed as a "lately reading sequential
> formatted external IO".  I'm not sure what to make of
> that.  My setup is 8 nodes over Myrinet, and I built
> GA on MPICH.
> Thanks for any ideas or comments!
> -Andy 
> 

 Andy,

 although process 0 has generated all the integral files
 etc. and they are in fact in the WORK directory, some
 processes cannot access these files.

 Hence, run a simple test with the job on a single
 node:

 control.run

niter = 1 
hermit
mcscf
pciudg
hosts=node001
nproc=2

 This should run.
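Independently of that test, you can verify directly whether every
node can actually see the files in the WORK directory. A minimal
sketch, assuming passwordless ssh to the nodes, that the machines
file is in the current directory, and substituting the path of
your own WORK directory for the hypothetical WORKDIR placeholder:

 # hypothetical check: list the WORK directory from every node
 WORKDIR=/path/to/your/WORK      # adjust to your setup
 for node in `cat machines`; do
     echo "--- $node ---"
     ssh $node "ls $WORKDIR" || echo "$node cannot access $WORKDIR"
 done

Any node that fails here will also fail when pciudg.x tries to
read the integral files.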

BTW, what does your install.config file look like?


It is essential for any mpirun invocation that the startup
command is correct and that the node on which the integrals
and all the related files are stored is the node on which
MPI process 0 starts.

To check this, you might try

 control.run

niter = 1
hermit
mcscf 
pciudg
hosts=node001:node001             
nproc=1:1


Upon startup, a message such as

pcallprog startup1:
 gacom:MPICH
 pscript:
 mpi_startup:mpirun -np _NPROC_ -machinefile _HOSTS_  _EXE_ _EXEOPTS_
pcallprog startup2:
 1 1 0 0
 0 0
entering section hosts && nproc
mpirun -np 2 -machinefile machines
/home/columbus/Columbus.hlischka/Columbus/Columbus/pciudg.x -m 40000000
>> /bigscr/problem1/parallel.test.jhu/runc.error 2>&1 

should appear, and you can check the machines file in the WORK
directory for correctness.

In your case the ordering is the same as in the control.run file
(as it should be).

cat machines

node001 
node002 
node003 
node004 
node005 
node006 
node007 
node008 
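To confirm on which node each MPI process actually starts (in
particular, that process 0 runs on the node holding the integral
files), you can launch a trivial command through the same startup
line. A minimal sketch, assuming your mpirun accepts -machinefile
as in the output above:

 mpirun -np 2 -machinefile machines /bin/hostname

Each process prints the name of the node it runs on.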



Now, in ciudgin you indicated

 &input 
 NTYPE = 0, 
 GSET = 0,
  DAVCOR =10,
 NCOREL = 22 
 NROOT = 1
 IVMODE = 3 
 NBKITR = 1
 NVBKMN = 1
 NVBKMX = 4
 RTOLBK = 1e-3,
 NITER = 10
 NVCIMN = 1
 NVCIMX = 4
 RTOLCI = 1e-3,
 IDEN  = 1 
 CSFPRN = 10, 
 MAXSEG = 20
 nseg0x = 1,1,2,2
 nseg2x = 1,1,2,2
 nseg1x = 1,1,2,2
 nseg3x = 1,1,2,2
 nsegd = 1,1,2,2
 nseg4x = 1,1,2,2,
 c2ex0ex=0
 c3ex1ex=0
 cdg4ex=1
 fileloc=1,1,3,3,3,3,2
 finalv=-1
 finalw=-1
 &end   


 fileloc=1,1,3,3,3,3,2

 meaning that the integral files are to be distributed
 from node 0 to the local disks of the other nodes.

 Again, it is worthwhile checking in which directory the
 spawned processes end up. This depends upon the MPI version
 and configuration.
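 A quick way to check is to spawn a trivial command that reports
 its working directory. A minimal sketch, again assuming the
 -machinefile startup shown above and /bin/pwd on every node:

  mpirun -np 8 -machinefile machines /bin/pwd

 Each spawned process prints its current working directory, so
 you can see whether the processes end up where you expect them
 to place and find their local files.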
 
 Regards,

 Thomas

_______________________________________________________________

Dr. Thomas Mueller

Central Institute for Applied Mathematics
(Zentralinstitut fuer Angewandte Mathematik)
Forschungszentrum Juelich
D-52425 Juelich
Tel. 0049-2461-61-3175
FAX  0049-2461-61-6656

_______________________________________________________________
