The run-time trace facility may be used to debug situations where bad data are generated in a program run, but the
origin of the bad data is unclear. It may also be used to remove numerical drift when comparing runs of a program under
two different compilers or in different computing environments.
FPT instruments the program code, or selected components of the code, to capture all left-hand-side
scalar quantities to file. For example, consider a program which
models a cannon ball with square-law drag.
FPT modifies this code to capture the outputs of every statement.
The routine trace_start_sub_program logs the start of a program,
subroutine or function. The FPT library routines trace_r4_data, trace_i4_data etc.
write the left-hand-side quantities to file. These routines are self-initialising - the log file is created on the
first call if it does not already exist.
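The self-initialising behaviour can be illustrated with a minimal sketch. This is a Python analogue written for illustration only, not the FPT library: the class `TraceLog` and its method `trace_data` are hypothetical names, and the one-value-per-line layout is an assumption.

```python
import os
import tempfile


class TraceLog:
    """Hypothetical stand-in for a self-initialising trace routine."""

    def __init__(self, path):
        self.path = path
        self._file = None  # nothing is opened until the first call

    def trace_data(self, value):
        if self._file is None:
            # Append mode creates the file if it does not already exist.
            self._file = open(self.path, "a")
        self._file.write(f"{value!r}\n")  # one quantity per line
        self._file.flush()

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None


log_path = os.path.join(tempfile.mkdtemp(), "trace.log")
log = TraceLog(log_path)
created_before_first_call = os.path.exists(log_path)
log.trace_data(9.81)
log.trace_data(42)
log.close()
```

The caller never opens the log explicitly, which is what allows trace calls to be inserted anywhere in a program without a separate set-up step.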
The output shows entries to sub-programs, and the left-hand-side quantities,
written one to each line. In the trace output for the cannon ball example, the main loop, starting with the copy of hddot to p_hddot and
ending with the computation of the new value of x, can clearly be seen.
When a program crashes or runs incorrectly it is often possible to use a debugger to
see bad data at the site of the crash. It is sometimes difficult to find where the bad data have come from.
The FPT run-time trace facility may be used for tracing arithmetic errors.
The program is instrumented for run-time trace (please see the FPT Reference Manual page
for a description of the procedure) and a trace file is generated. The bad data may be found in the trace file and then, by searching backwards,
their origin may be identified.
Note that the trace output files can become very large, so we
recommend that only a small sub-set of the files in a large program should be
instrumented.
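One way to locate the origin of arithmetic errors in a large trace can be sketched as follows. The sketch is in Python for illustration only, and it rests on assumptions: the trace layout used here (a name followed by a value, one entry per line) is hypothetical and is not the documented FPT format, and the function name `first_non_finite` is invented. It finds the earliest NaN or infinite value, which is a candidate origin because later bad values may simply have been derived from it.

```python
import math


def first_non_finite(trace_lines):
    """Return (line_number, text) of the first NaN or infinite value, or None."""
    for number, text in enumerate(trace_lines, start=1):
        fields = text.split()
        if not fields:
            continue
        try:
            # Assumed layout: the value is the last field on the line.
            value = float(fields[-1])
        except ValueError:
            continue  # not a numeric entry, e.g. a sub-program entry record
        if not math.isfinite(value):
            return number, text
    return None


sample = [
    "ENTER cannon",
    "v 120.0",
    "drag 0.0",
    "hddot NaN",
    "h NaN",
]
print(first_non_finite(sample))
```

In this invented sample the bad value in `h` is a consequence of the earlier bad value in `hddot`, so the search reports the `hddot` line.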
The same program may produce significantly different results when it is built with different compilers or with different levels of
optimisation under the same compiler. The differences may be due only to numerical drift. This occurs when different systems choose
different orders of execution, or different variables to store in processor registers, with the result that there are small differences
in rounding errors. These differences integrate and eventually affect the results. However, differences may also be due to compiler
bugs or to coding errors which behave differently in different environments.
The WinFPT run-time trace facility, and the library of support routines distributed with WinFPT, are used to analyse this issue.
Suppose that we wish to compare runs under two compilers, for example, gfortran and ifort. We want to know whether differences between
the runs are due to coding errors or just to numerical drift. The procedure is as follows:
- The program is instrumented to capture a run-time trace.
- It is built under ifort and run. A trace file is generated.
- It is built under gfortran.
It would now be possible to run the program again and compare the two trace files. This is usually not
practical. The trace files drift apart because of numerical drift, and any differences due, for example, to coding errors are
hidden amongst the large number of differences due to drift. Instead:
- In the second run, under gfortran, the same subroutines which captured the trace of the first run read the trace file and compare
every value computed by gfortran with the value computed by ifort. If the values are the same, no action is taken. If the values
differ by more than a criterion amount, the difference is reported. The values computed in the second run are then overwritten
by the values from the first run. This prevents the accumulation of numerical drift so that the runs do not drift apart.
The run-time trace files record a unique index which identifies each trace routine call. These indices are used to detect the
situation where the two program runs follow different paths. If this occurs, the second run terminates at once, with a report of
the point at which the two runs diverge.
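The capture, compare and overwrite scheme can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the FPT implementation: the names `Trace`, `TraceDivergence` and `data` are hypothetical, the trace is held in memory rather than in a file, and the relative difference is assumed to be measured against the value from the first run.

```python
class TraceDivergence(Exception):
    """Raised when the second run follows a different path from the first."""


class Trace:
    def __init__(self, reference=None, rel_criterion=0.01, abs_criterion=1e-4):
        self.reference = reference  # None means capture mode (first run)
        self.records = []           # (call index, value) pairs captured
        self.position = 0           # next reference record to compare
        self.reports = []           # differences found in the second run
        self.rel_criterion = rel_criterion
        self.abs_criterion = abs_criterion

    def data(self, index, value):
        if self.reference is None:
            self.records.append((index, value))
            return value
        if self.position >= len(self.reference):
            raise TraceDivergence(f"no reference record for call {index}")
        ref_index, ref_value = self.reference[self.position]
        self.position += 1
        if ref_index != index:
            # The two runs have taken different paths: stop at once.
            raise TraceDivergence(f"expected call {ref_index}, got {index}")
        difference = abs(value - ref_value)
        if (difference > self.abs_criterion
                and difference > self.rel_criterion * abs(ref_value)):
            self.reports.append((index, value, ref_value))
        # Overwrite with the first-run value so that drift cannot accumulate.
        return ref_value


def run(trace, increment):
    x = 0.0
    for _ in range(5):
        x = trace.data(1, x + increment)  # call index 1 identifies this statement
    return x


first = Trace()
run(first, 0.1)
second = Trace(reference=first.records)
result = run(second, 0.1 + 1e-9)  # tiny perturbation stands in for drift
```

Because every value in the second run is replaced by the value from the first, the perturbation never grows beyond a single step and nothing is reported. A genuine error would exceed the criteria at the statement where it first appears.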
This technique has proved to be very powerful in detecting:
The detailed behaviour in the second, comparison, run may be refined by writing an optional configuration file.
This file specifies:
- The criteria for comparing real numbers. Two criteria are specified, a relative criterion difference and an absolute criterion
difference. By default, the relative criterion difference is 1% and the absolute criterion difference is 0.0001. A real number is reported
as different only if the difference exceeds both criteria. The requirement for an absolute criterion difference prevents the report of spurious
differences when values are close to zero.
- Whether integer and logical values are to be overwritten when differences are detected. Some programs use integers to store
file and database handles which are always different on different runs. If these are overwritten, the file or database handling
may fail.
- The location of the trace file. These files may become very large and it may be necessary to store them on external devices.
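The two-criterion test for real numbers can be sketched as follows, using the default values stated above. This is a Python illustration only. The function name `differs` and its parameter names are hypothetical, and measuring the relative difference against the value from the first run is an assumption, since the text does not say which value is used as the base.

```python
def differs(reference, value, rel_criterion=0.01, abs_criterion=0.0001):
    """Report a difference only when both criteria are exceeded."""
    difference = abs(value - reference)
    exceeds_absolute = difference > abs_criterion
    # Assumption: the relative criterion is scaled by the first-run value.
    exceeds_relative = difference > rel_criterion * abs(reference)
    return exceeds_absolute and exceeds_relative


print(differs(100.0, 100.5))  # within 1%, so not reported
print(differs(100.0, 102.0))  # exceeds both criteria, so reported
print(differs(1e-7, 3e-7))    # large relative change, but tiny in absolute terms
```

The third case shows why the absolute criterion is needed: the value has tripled, but both numbers are effectively zero, so a report would be spurious.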
Copyright ©1995 to 2015 Software Validation Ltd. All rights reserved.