[mvapich-discuss] Problem with NPB-2.4/mvapich2/BLCR
sunway qilu
sunwaycn at gmail.com
Tue Oct 23 06:04:11 EDT 2007
I'm testing mvapich2 with BLCR, but the results are not correct.
Would you please help me?
Many thanks!
This is my environment:
OS : Linux Kernel 2.6.42
C/Fortran : intel C/C++/Fortran 10.0.0.23
mvapich2 : mvapich2-trunk-2007-10-22
BLCR : 0.6.1
Program: NPB-2.4
These are my test steps:
1. $ mpdboot -n 3
2. $ cat cfg
cn22
cn22
cn22
cn22
cn23
cn23
cn23
cn23
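The cfg file above lists each host once per process slot, so 8 ranks land as 4 on cn22 and 4 on cn23. A minimal Python sketch that produces the same listing, assuming 4 slots per host (the helper name machinefile_lines is hypothetical and not part of mvapich2):

```python
# Hypothetical helper: one machinefile entry per process slot on each host.
def machinefile_lines(hosts, slots_per_host):
    return [host for host in hosts for _ in range(slots_per_host)]

# Host names and slot count are taken from the cfg listing above.
lines = machinefile_lines(["cn22", "cn23"], 4)
print("\n".join(lines))
```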
3. Normal test; the result is good.
$ mpirun -machinefile ./cfg -np 8 ./lu.A.8
NAS Parallel Benchmarks 2.4 -- LU Benchmark
Size: 64x 64x 64
Iterations: 250
Number of processes: 8
Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Time step 220
Time step 240
Time step 250
Verification being performed for class A
Accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.7790210760669E+03 0.7790210760669E+03 0.1386387341159E-13
2 0.6340276525969E+02 0.6340276525969E+02 0.5603404937070E-14
3 0.1949924972729E+03 0.1949924972729E+03 0.9036993778374E-14
4 0.1784530116042E+03 0.1784530116042E+03 0.3185343769198E-15
5 0.1838476034946E+04 0.1838476034946E+04 0.1187280792767E-13
Comparison of RMS-norms of solution error
1 0.2996408568547E+02 0.2996408568547E+02 0.1185657295234E-14
2 0.2819457636500E+01 0.2819457636500E+01 0.1370326007271E-13
3 0.7347341269878E+01 0.7347341269877E+01 0.7373944071964E-14
4 0.6713922568778E+01 0.6713922568778E+01 0.7937342832911E-15
5 0.7071531568839E+02 0.7071531568839E+02 0.1185656063379E-13
Comparison of surface integral
0.2603092560489E+02 0.2603092560489E+02 0.2729609951429E-15
Verification Successful
LU Benchmark Completed.
Class = A
Size = 64x 64x 64
Iterations = 250
Time in seconds = 17.72
Total processes = 8
Compiled procs = 8
Mop/s total = 6733.74
Mop/s/process = 841.72
Operation type = floating point
Verification = SUCCESSFUL
Version = 2.4
Compile date = 23 Oct 2007
Compile options:
MPIF77 = mpif90
FLINK = mpif90
FMPI_LIB = (none)
FMPI_INC = (none)
FFLAGS = -O3
FLINKFLAGS = (none)
RAND = (none)
Please send the results of this run to:
NPB Development Team
Internet: npb at nas.nasa.gov
If email is not available, send this to:
MS T27A-1
NASA Ames Research Center
Moffett Field, CA 94035-1000
Fax: 650-604-3957
4. While lu.A.8 is running (4.1), checkpoint it (4.2). lu.A.8 continues (4.3), and the
result is good.
4.1 $ mpirun -machinefile ./cfg -np 8 ./lu.A.8
NAS Parallel Benchmarks 2.4 -- LU Benchmark
Size: 64x 64x 64
Iterations: 250
Number of processes: 8
Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
...
4.2 $ mv2_checkpoint
PID USER TT COMMAND %CPU VSZ START CMD
7968 yangshj pts/0 mpirun 0.0 14672 17:25 mpirun -machinefile ./cfg -np 8 ./lu.A.8
Enter PID to checkpoint or Control-C to exit: 7968
Checkpointing PID 7968
Checkpoint file: context.7968
4.3 $ mpirun -machinefile ./cfg -np 8 ./lu.A.8
NAS Parallel Benchmarks 2.4 -- LU Benchmark
Size: 64x 64x 64
Iterations: 250
Number of processes: 8
Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Time step 220
Time step 240
Time step 250
Verification being performed for class A
Accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.7790210760669E+03 0.7790210760669E+03 0.1386387341159E-13
2 0.6340276525969E+02 0.6340276525969E+02 0.5603404937070E-14
3 0.1949924972729E+03 0.1949924972729E+03 0.9036993778374E-14
4 0.1784530116042E+03 0.1784530116042E+03 0.3185343769198E-15
5 0.1838476034946E+04 0.1838476034946E+04 0.1187280792767E-13
Comparison of RMS-norms of solution error
1 0.2996408568547E+02 0.2996408568547E+02 0.1185657295234E-14
2 0.2819457636500E+01 0.2819457636500E+01 0.1370326007271E-13
3 0.7347341269878E+01 0.7347341269877E+01 0.7373944071964E-14
4 0.6713922568778E+01 0.6713922568778E+01 0.7937342832911E-15
5 0.7071531568839E+02 0.7071531568839E+02 0.1185656063379E-13
Comparison of surface integral
0.2603092560489E+02 0.2603092560489E+02 0.2729609951429E-15
Verification Successful
LU Benchmark Completed.
Class = A
Size = 64x 64x 64
Iterations = 250
Time in seconds = 18.78
Total processes = 8
Compiled procs = 8
Mop/s total = 6352.76
Mop/s/process = 794.10
Operation type = floating point
Verification = SUCCESSFUL
Version = 2.4
Compile date = 23 Oct 2007
Compile options:
MPIF77 = mpif90
FLINK = mpif90
FMPI_LIB = (none)
FMPI_INC = (none)
FFLAGS = -O3
FLINKFLAGS = (none)
RAND = (none)
Please send the results of this run to:
NPB Development Team
Internet: npb at nas.nasa.gov
If email is not available, send this to:
MS T27A-1
NASA Ames Research Center
Moffett Field, CA 94035-1000
Fax: 650-604-3957
5. Restart PID 7968 from the checkpoint. The result then contains "NaN" (5.1), and
sometimes "FAILURE:" and "UNSUCCESSFUL" (5.2).
5.1 $ cr_restart context.7968
mpiexec_cn21 (mpiexec 335): mpiexec: Restarting
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Time step 220
Time step 240
Time step 250
Verification being performed for class A
Accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 NaN 0.7790210760669E+03 NaN
2 NaN 0.6340276525969E+02 NaN
3 NaN 0.1949924972729E+03 NaN
4 NaN 0.1784530116042E+03 NaN
5 NaN 0.1838476034946E+04 NaN
Comparison of RMS-norms of solution error
1 NaN 0.2996408568547E+02 NaN
2 NaN 0.2819457636500E+01 NaN
3 NaN 0.7347341269877E+01 NaN
4 NaN 0.6713922568778E+01 NaN
5 NaN 0.7071531568839E+02 NaN
Comparison of surface integral
NaN 0.2603092560489E+02 NaN
Verification Successful
LU Benchmark Completed.
Class = A
Size = 64x 64x 64
Iterations = 250
Time in seconds = 66.11
Total processes = 8
Compiled procs = 8
Mop/s total = 1804.50
Mop/s/process = 225.56
Operation type = floating point
Verification = SUCCESSFUL
Version = 2.4
Compile date = 23 Oct 2007
Compile options:
MPIF77 = mpif90
FLINK = mpif90
FMPI_LIB = (none)
FMPI_INC = (none)
FFLAGS = -O3
FLINKFLAGS = (none)
RAND = (none)
Please send the results of this run to:
NPB Development Team
Internet: npb at nas.nasa.gov
If email is not available, send this to:
MS T27A-1
NASA Ames Research Center
Moffett Field, CA 94035-1000
Fax: 650-604-3957
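Note that the run above prints NaN for every computed value yet still reports "Verification Successful". One plausible explanation, not confirmed here against the NPB 2.4 source, is that the check flags a value only when its difference is greater than epsilon. Any ordered comparison with NaN is false under IEEE 754, so a NaN would slip through such a test. A minimal Python sketch of that effect (naive_check and nan_safe_check are hypothetical names, not NPB routines):

```python
import math

EPSILON = 1.0e-8  # "Accuracy setting for epsilon" from the log

def naive_check(diff, eps=EPSILON):
    """Fail only when diff > eps. NaN > eps is False, so NaN passes."""
    return not (diff > eps)

def nan_safe_check(diff, eps=EPSILON):
    """Pass only when diff is a real number no larger than eps."""
    return (not math.isnan(diff)) and diff <= eps

nan = float("nan")
print(naive_check(nan))                    # True: NaN wrongly passes
print(nan_safe_check(nan))                 # False: NaN is rejected
print(nan_safe_check(1.386387341159e-14))  # True: a good value still passes
```

If that is what happens here, the "SUCCESSFUL" line in 5.1 cannot be trusted, and the NaN columns are the real result of the restarted run.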
5.2 $ mpirun -machinefile ./cfg -np 8 ./lu.A.8
NAS Parallel Benchmarks 2.4 -- LU Benchmark
Size: 64x 64x 64
Iterations: 250
Number of processes: 8
Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Time step 220
Time step 240
Time step 250
Verification being performed for class A
Accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
FAILURE: 1 0.7790355334612E+03 0.7790210760669E+03 0.1855841227478E-04
FAILURE: 2 0.6340489955249E+02 0.6340276525969E+02 0.3366245600758E-04
FAILURE: 3 0.1949964027466E+03 0.1949924972729E+03 0.2002884068547E-04
FAILURE: 4 0.1784563048837E+03 0.1784530116042E+03 0.1845460320509E-04
FAILURE: 5 0.1838499810682E+04 0.1838476034946E+04 0.1293230623563E-04
Comparison of RMS-norms of solution error
FAILURE: 1 0.2996451081467E+02 0.2996408568547E+02 0.1418795824413E-04
FAILURE: 2 0.2819496132217E+01 0.2819457636500E+01 0.1365358930094E-04
FAILURE: 3 0.7347450238213E+01 0.7347341269877E+01 0.1483098878912E-04
FAILURE: 4 0.6714013230847E+01 0.6713922568778E+01 0.1350359173032E-04
FAILURE: 5 0.7071607035800E+02 0.7071531568839E+02 0.1067194005931E-04
Comparison of surface integral
FAILURE: 0.2603109553197E+02 0.2603092560489E+02 0.6527892352571E-05
Verification failed
LU Benchmark Completed.
Class = A
Size = 64x 64x 64
Iterations = 250
Time in seconds = 17.15
Total processes = 8
Compiled procs = 8
Mop/s total = 6956.73
Mop/s/process = 869.59
Operation type = floating point
Verification = UNSUCCESSFUL
Version = 2.4
Compile date = 22 Oct 2007
Compile options:
MPIF77 = mpif90
FLINK = mpif90
FMPI_LIB = (none)
FMPI_INC = (none)
FFLAGS = -O3
FLINKFLAGS = (none)
RAND = (none)
Please send the results of this run to:
NPB Development Team
Internet: npb at nas.nasa.gov
If email is not available, send this to:
MS T27A-1
NASA Ames Research Center
Moffett Field, CA 94035-1000
Fax: 650-604-3957
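The third column in the verification tables above appears to be the relative difference between the computed value (first column) and the reference value (second column), which is then compared against epsilon = 1e-8. A minimal Python sketch that reproduces the first FAILURE row under that assumption (rel_diff is a hypothetical helper name):

```python
EPSILON = 1.0e-8  # "Accuracy setting for epsilon" from the log

def rel_diff(computed, reference):
    """Relative difference of computed against reference."""
    return abs((computed - reference) / reference)

# First row of "Comparison of RMS-norms of residual" in run 5.2.
computed = 0.7790355334612e+03
reference = 0.7790210760669e+03

d = rel_diff(computed, reference)
print(f"{d:.4e}", d > EPSILON)
```

Under this reading, the 5.2 run differs from the reference by roughly 1e-5, about three orders of magnitude above epsilon, while the good runs differ by about 1e-14.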