Emulating Real Arithmetic

Emulating Real Arithmetic in Fortran

Version 1.0 25-Nov-17

This page describes the changes to an existing Fortran program which must be made in order to emulate the real and complex arithmetic. Except for a few special cases, all of these changes are made automatically by fpt.

Why Emulate Real Arithmetic?

There are (at least) two important applications of emulated arithmetic:

Experiments on precision and on real number formats.
Tracking the units and dimensions of variables as a program runs.

Precision: Most computer systems use IEEE 4-byte real numbers with 23 mantissa bits, and 8-byte real numbers with 52 mantissa bits. If the numbers and the arithmetic operators are emulated it is possible to change the numbers of mantissa bits and thus the precisions. Please see the page Testing Real Precision or the paper Testing the Numerical Precisions Required to Execute Real World Programs [1] for a detailed description.

Units and Dimensions: When the values of variables are computed in arithmetic expressions, the units and dimensions of the variables must conform. Thus, for example, if a is computed as a = b + c then a, b and c must be measured in the same units. In the expression x = y * z the units of x must be the units of y multiplied by the units of z. When numbers and arithmetic are emulated it is possible to attach a units attribute to every quantity and to track the units through program execution. The units may be checked for consistency, and the relationships between them may be found.

Why Fortran?

The Fortran language supports the two critical features required for emulation of arithmetic:

user-defined derived types;
the overloading of operators to operate on the derived types.

Fortran is used extensively in engineering design and simulation and in climatology. These applications make heavy use of of real variables. Validation of the consistency of units and dimensions may be very important.

What Version of Fortran?

The intention is that emulation of derived types may be used to analyse or validate existing programs. The tools and procedures described here should work on all variants of Fortran from extended FORTRAN 77 to at least Fortran 2008. These pages refer to language elements which include COMMON blocks, EQUIVALENCE statements, ENCODE and DECODE statements and constructs like STRUCTURE, MAP and UNION . Please do not infer that the author recommends any of these usages.

Emulating Real Arithmetic

The first step in emulating real arithmetic is the replacement of all REAL and COMPLEX declarations by declarations of derived types, and the overloading of all arithmetic and relational operators, and of all intrinsic functions, to operate on the derived types. In the studies described here the derived types and overloads are written in a Fortran module. A USE statement for the module is automatically inserted in every top-level compilation unit. Except for a few special cases noted below, all of the changes required for emulation are made automatically.

Note that the objects which must be changed to the emulated derived types may be scalar variables, arrays, functions, function interfaces, components of derived types and even, in legacy codes, fields of structures. The declarations may be written in many different ways. fpt carries out a complete static semantic analysis of a program and should handle all possible representations.

The Emulation Derived Types

The derived types used in emulation may have only a single component, of the same type and kind as the original real or complex number. This was sufficient, for example, in the study of real precision described here. In other applications the emulated types may require two or more components. In all cases in this study, the emulated types always contain one component which is a real or complex number of the same kind as the original type, with the name value (The precision of the number may be manipulated). This component may be used:

in I/O statements;
as an argument to external routines, particularly those written in other languages;
for inspection by a symbolic debugger.

The emulated types must be SEQUENCE derived types because the emulated variables may be in COMMON blocks.

Examples of the types are shown here.

The Overload Functions and Subroutines

A module function or subroutine must be created for each overload. There are a large number of these routines. For example, the overload of + requires routines for: unary + for each kind of REAL object and each kind of COMPLEX object; for addition of each kind of REAL object to each kind of REAL, COMPLEX and INTEGER object; and addition of each kind of COMPLEX object to each kind of REAL, COMPLEX and INTEGER object. Overloading functions are non-communative. Therefore, with 2 REAL kinds and 4 INTEGER kinds there are 96 functions for the + operator alone. The emulation module is described here.

Literal Values as Actual Arguments

The formal real and complex arguments to subroutines and functions (The arguments written in the sub-program code) will be changed to the corresponding emulation derived types. Variables passed to the subroutines and functions will also be changed, and no inconsistency will occur. Literal values (numbers) and arithmetic expressions passed as arguments must be changed. Literal numbers are therefore enclosed in type constructors, and expressions are enclosed in wrapper functions which return the value components of the expression results.

For example, if the original code contains the limiter function:

REAL*4 FUNCTION limiter(a,a_min,a_max) REAL*4 a,a_min,a_max limiter = MAX(a_min,MIN(a,a_max)) END FUNCTION limiter

This is converted to use the emulation types:

TYPE (em_real_k4) FUNCTION limiter(a,a_min,a_max) TYPE (em_real_k4)a,a_min,a_max limiter = MAX(a_min,MIN(a,a_max)) END FUNCTION limiter

When the function is used in the original code, for example, in the test program t_em:

! Test of literal passed to a function with real arguments a = 1.0 a_min = 6.1 a_max = 7.0 r(1) = limiter(1.0,a_min,a_max) r(2) = limiter(a,6.2,a_max) r(3) = limiter(a,a_min,7.0)

the literal arguments must be converted for emulation:

! Test of literal passed to a function with real arguments a = 1.0 a_min = 6.1 a_max = 7.0 r(1) = limiter(em_real_k4(1.0,23),a_min,a_max) r(2) = limiter(a,em_real_k4(6.2,24),a_max) r(3) = limiter(a,a_min,em_real_k4(7.0,25))

Converting All Literal Real and Complex Numbers

In some applications of emulation, most of the real and complex literal numbers in the original code may be left unchanged. The precision of the numbers, for example, may be manipulated when they are used in assignment or arithmetic statements in the subroutines and functions which overload the operators. In other cases, particularly in studies of units and dimensions, it is useful to change all of the real and complex numbers to objects of the emulation derived types. For example, if the same real number is added to the values held in two different variables, the two values in the variables must have the same units and dimensions.

The system which converts a program automatically for emulation therefore has a switch to convert all literal numbers.

If all literal numbers are converted, the number of overloads of assignment and of the arithmetic and relational operators is reduced. For example, we no longer need to provide an overload to add native real numbers to emulated real numbers. However, it is still necessary to overload addition of each kind of real number to every kind of integer.

Further Issues in Emulation

In most programs, and in some emulation applications, it is not sufficient simply to insert the USE statement for the emulation module and to change the declarations. Additional complications arise in:

Fortran parameter values and data specifications;
I/O statements;
TRANSFER functions;
Calls to external libraries;
Memory-mapped constructs (COMMON and EQUIVALENCE);
Initialisation of the emulation system.

Parameter Values and Data Specifications

Suppose the original code contains a statement:

REAL(KIND=kr8) :: c = 2.99792458D8

It is not sufficient simply to change this to:

TYPE(em_real_k8) :: c = 2.99792458D8

where em_real_k8 is our emulated REAL kind. The problem is that the overload mechanism calls a subroutine at run-time, and this subroutine is not available during compilation. The compiler has no rule to convert the REAL(KIND=kr8) value 2.99792458D8 to an object of the derived type em_real_k8.

The statement must be re-written with a type constructor for the value. This constructor must contain all of the components of the emulation type. Thus, if the emulation type has 2 components, the second of which is an integer which identifies the instance of the type, the statement might be written:

TYPE(em_real_k8) :: c = em_real_k8(2.99792458D8,173)

Note that if the value is an expression, the entire expression must be enclosed in the type constructor. If the expression contains other Fortran parameters, the value component of any parameter must be specified. Thus, the statements:

REAL(KIND=kr8) :: pi = 4*ATAN(1.0_kr8) REAL(KIND=kr8) :: tpi = 2*pi

are rewritten:

TYPE(em_real_k8) :: pi = em_real_k8(4*ATAN(1.0_kr8),174) TYPE(em_real_k8) :: tpi = em_real_k8(2*pi%value,175)

This change must be made to all REAL and COMPLEX literal values in PARAMETER and DATA statements and in data specifications in declaration statements. Note that these may be written in several different ways. The issue is described here.

I/O Statements: WRITE, PRINT, TYPE and ENCODE

Emulation Types with a Single Component

If each emulation type has only a single component which is a scalar of the same type and kind as the original real or complex object, the component will be written in exactly the same way as the original object. No change to the WRITE statement is required. This applies if the object is a scalar variable, an array or even an object of a derived type with a real or complex component. The values of the emulated objects will be written in exactly the same way as the original values.

The complications described below apply only to the situation where the emulation types have two or more components.

Scalar Variables

Suppose that the original code contains a formatted WRITE statement:

WRITE(8,'(1F8.2)')x

where x is a real scalar variable. In the emulation code x is of the emulation derived type. If the emulation type has two or more components, the WRITE statement will attempt to write all of the components. In a formatted write this will often fail because the format will not match. In a * format write or an unformatted write, additional data will be written. In either case, the I/O statement must be modified.

In the present study, the components of the emulated types which contain the original numeric value are named value. The formatted WRITE statement shown above is rewritten:

WRITE(8,'(1F8.2)')x%value

For multi-component emulation types similar changed must be made to all WRITE statements which write real or complex objects.

Arithmetic Expressions

WRITE statements may output the results of arithmetic expressions, for example:

WRITE(8,'(1F8.2)')x+y

where x and y are real variables in the original code. In the emulated code, the quantity x+y is an object of the emulated type. If this has only one component no problem occurs. If it has two or more components the WRITE statement fails as described above.

For correct execution of the WRITE statement it would be sufficient to rewrite the statement:

WRITE(8,'(1F8.2)')x%value+y%value

However, if emulation is used to track units and dimensions of variables this would lose the information that x is added to y and must therefore have the same units. The emulation module therefore contains a set of wrapper functions which return the value components of each REAL and COMPLEX emulated type. The statement is re-written:

WRITE(8,'(1F8.2)')em_real_k4_value(x+y)

Whole Arrays, Array Sections and Implied DO loops

WRITE statements may output entire arrays or array sections and may contain implied DO loops. For example:

WRITE(8,'(8F8.2)')xa(1:8) WRITE(8,'(8F8.2)')(xa(i),i=1,8)

These do not cause a problem. The above statements are rewritten:

WRITE(8,'(8F8.2)')xa(1:8)%value WRITE(8,'(8F8.2)')(xa(i)%value,i=1,8)

WRITE Sub-statements in IF Statements

If WRITE occurs as a sub-statement of an IF statement, for example:

IF (report_flag) WRITE(*,*)x1

where x1 is a real variable in the original code, it is handled in exactly the same way as in a WRITE statement.

Derived Types with REAL or COMPLEX Components

The original code may contain derived types with components which are real or complex. Entire objects of these derived types may (rarely) occur in I/O statements, for example:

TYPE labelled_real CHARACTER(LEN=32) :: label REAL(KIND=kr8) :: number END TYPE labelled_real TYPE (labelled_real) :: attitude(3) : : WRITE(8,'(3(A32,4X,E12.3))') (attitude(i),i=1,3)

At present, automatic conversion of the WRITE is not attempted in this situation and instances are reported.

PRINT, TYPE, Internal WRITE and ENCODE

These statements are handled in the same way as WRITE.

I/O Statements: READ, ACCEPT and DECODE

Emulation Types with a Single Component

If each emulation type has only a single component which is a scalar of the same type and kind as the original real or complex object, the component will be read with the same rules and behaviour as the original object. No change to the READ statement is required. This applies if the object is a scalar variable, an array of even an object of a derived type with a real or complex component. The values of the emulated objects will be read in exactly the same way as the original values.

The changes to the I/O statements described below are required only when the emulation types have two or more components.

Reading Scalar Objects

If the original code contains a READ statement, for example:

READ (8,*) x

where x is a scalar real variable, x will become an object of the corresponding emulation real derived type. The read statement must be changed to read the value component of x:

READ (8,*) x%value

However, this change alone is not sufficient. The problem is that the second and subsequent components of the derived type must be initialised. It is possible to specify an initialisation value for a type component in the derived type declaration. For example:

TYPE em_real_k4 SEQUENCE REAL (KIND=kr4) :: value INTEGER (KIND=ki4) :: i_precision = 0 END TYPE em_real_k4

where the component i_precision is a handle which indicates, in this case, the precision of the emulated type, and the value 0 specifies a default state. This is the strategy used by Dawson and Düben 2016 [2] in studies of reduced precision. The problem with this strategy is that it is illegal to specify a default initialiser for an object in a COMMON block. A large proportion of the codes we hope to study use COMMON blocks. Therefore the strategy is not used here. Instead, the original read statement is rewritten:

READ (8,*) x%value CALL em_init_r_k4(x,176)

The subroutine em_init_r_k4 is one of a set of initialisation subroutines, one for each emulated data type and kind, which initialises the second and subsequent components of the derived type. The second argument is a unique integer identifier which indicates where the initialisation has occurred.

Whole Arrays, Array Sections, and Implied DO Loops

The derived type initialisation subroutines are elemental. They may therefore operate on whole arrays and array sections.

If a READ statement contains implied DO loops DO-ENDDO constructs are written to initialise any variables of the emulation derived types within them.

Thus, for example, the original code:

REAL(8) :: ya(2,3,2) : : READ(8,*) ya : READ(8,*) ya(:,1:2,2) : READ(8,*) (((ya(i,j,k),i=1,2),j=1,3),k=1,2)

is re-written:

TYPE (em_real_k8) :: ya(2,3,2) : : READ(8,*) ya%value CALL em_init_r_k8(ya,177) : READ(8,*) ya(:,1:2,2)%value CALL em_init_r_k8(ya(:,1:2,2),178) : READ(8,*) (((ya(i,j,k)%value,i=1,2),j=1,3),k=1,2) DO k = 1,2 DO j = 1,3 DO i = 1,2 CALL em_init_r_k8(ya(i,j,k),179) ENDDO ENDDO ENDDO

READ Sub-statements in IF Statements

If READ occurs as a sub-statement of an IF statement, for example:

IF (init_flag) READ(8,*)x1

where x1 is a real variable in the original code, it is not possible to write the emulation derived type initialisation statement immediately after the IF statement because the READ sub-statement may not be executed.

The code is rewritten:

IF (init_flag) THEN READ(8,*)x1%value CALL em_init_r_k4(x1,180) ENDIF

If the original IF statement is labelled:

If the label is unused or is the target of a GOTO the label remains on the IF statement.
If the label is the end of a DO loop it is moved to a CONTINUE statement after the ENDIF.
If the label is both the target of a GOTO and the end of a DO loop (And, yes, we have seen this!) a new label is created for the DO loop.

Alternate Return Labels in READ Statements

READ statements may contain END= and ERR= alternate return destinations. In these cases it is assumed that no data are read when the alternate return is executed. The alternate return mechanism jumps around the emulation derived type initialisation statement(s) without any further modification of the code.

Derived Types with REAL or COMPLEX Components

As in the case of WRITE statements, instances where entire derived types in the original code occur in READ statements are reported but are not yet converted.

TRANSFER Intrinsic Functions

The TRANSFER intrinsic function may operate on a range of different data types and kinds, and it is not easy to create a system to overload it. Instead, when the TRANSFER operates on real or complex objects in the original code:

If the emulation types each contain only a single component or the original types and kinds no change is required to TRANSFER functions.

If the emulation types contain multiple components:

If the source or mold argument to a TRANSFER function in the original code is real or complex, the argument is changed to use the value component of the emulation type.
If a real or complex variable receives the output of a TRANSFER function, the variable is treated in the same way as the input variable in a READ statement. The assignment statement is followed by an initialisation statement for the variable of the emulated derived type.

For example, the original code:

REAL(4) :: xa(256) INTEGER(4) :: buffer(256) : : buffer = TRANSFER(xa,buffer) : : xa = TRANSFER(buffer,xa)

becomes:

TYPE (em_real_k4) :: xa(256) INTEGER(4) :: buffer(256) : : buffer = TRANSFER(xa%value,buffer) : : xa%value = TRANSFER(buffer,xa%value) CALL em_init_r_k4(xa,181)

Calls to External Libraries

Any routine which is not processed by fpt to change the real and complex declarations is considered to be external. The formal arguments (the arguments in the sub-program code) of these routines are not converted to the emulation types, and the actual arguments must match the formal arguments.

Emulation Types with a Single Component

If the emulation types have only a single component then no modification to the call would be required for the data to be passed correctly. Variables of the emulation types resolve to objects of the original types and kinds which could be read or written to by the external routines. However, this will fail in two circumstances:

If an INTERFACE block for the external routine is visible to the caller, then the INTERFACE block must specify the arguments either as of the emulation types or as the original real or complex types. In the first case, all literal numbers in arguments must be converted to emulation types, and in the second case, all variables must be changed to specify the value component.
Some debugging environments will trap the type mis-match between an emulation type in an actual argument and a real or complex type in a formal argument at run-time (FTN95 is expected to do this).

For these reasons, single and multi-component emulation types are handled in the same way when data are passed to external routines.

INTERFACE Blocks

If the interface to the external routine is specified in an INTERFACE block visible to the caller, the data type declarations in the INTERFACE block should match the formal arguments of the external routines, and the value components passed as the actual arguments. Therefore, the declarations of real and complex arguments in INTERFACE blocks for external routines are not changed to emulation types.

The INTENT of the Arguments

Input, output and input-output arguments to the external routines must be handled in different ways. fpt will find the intent of the arguments from any INTERFACE block anywhere in the code which is processed. If no intent is specified for an argument, the argument is assumed to be INTENT(INOUT) and this may lead to inefficiency and loss of information.

Input Arguments

If an object of an emulation type is passed as an input argument to an external routine, only the value component is passed. If an expression is passed, the entire expression is enclosed in the appropriate wrapper routine which returns the value of the emulation type. Thus, for example, if xa is a real array, x and y are real variables and network_pack is an external routine with a real input argument, the original code:

CALL network_pack(xa,512) CALL network_pack(x+y,1)

becomes:

CALL network_pack(xa%value,512) CALL network_pack(em_real_k4_value(x+y),1)

Output and Input-Output Arguments

If a real or complex object is passed as an argument to an external routine, the code is changed for emulation to pass the value component, and a call to the initialisation routine for the emulated type is written after the statement. As with I/O and TRANSFER statements, IF statements are converted to IF-THEN-ENDIF constructs and labels are moved or changed if necessary. Thus, if xa is a real, kind=4 array and network_unpack has a real, kind=4 output argument, the code:

CALL network_unpack(xa,512) : IF (unpack_flag) CALL network_unpack(xa,512) : DO 200,i=1,4 200 CALL network_unpack(xa(1+128*(i-1):128*i),128)

becomes:

CALL network_unpack(xa%value,512) CALL em_init_r_k4(xa,182) : IF (unpack_flag) THEN CALL network_unpack(xa%value,512) CALL em_init_r_k4(xa,183) ENDIF : DO 200,i=1,4 CALL network_unpack(xa(1+128*(i-1):128*i)%value,128) CALL em_init_r_k4(xa(1+128*(i-1):128*i),184) 200 CONTINUE

Arguments of Unknown Intent

If the intent of an argument is unknown, the intent may be assumed to be input-output. This could result in loss of information, particularly when tracking units and dimensions. A proposed scheme (Not yet implemented) to deal with this case is:

The scalar value component, or an exclusive OR checksum of the value components of an array, is stored before calling the external routine.
The value or checksum is compared after the call.
The initialisation routine is called only if the value or checksum has changed.

External Functions

External functions which return real or complex values in the original code are enclosed in type constructors to convert the returns to emulation types. The type declarations of these functions are not changed to declarations of the emulation types. The original real or complex declarations (or implicit typing) is retained.

External functions may return objects of derived types which contain real or complex components. At present, no attempt is made to change the code automatically, and this situation is reported.

Memory-mapped constructs, COMMON and EQUIVALENCE

EQUIVALENCE, and the implied EQUIVALENCE which occurs when different variables and arrays are placed at the same addresses in COMMON blocks should not cause a problem if:

The emulation types have only a single component.
The objects at the same addresses are of the same data type and kind.
Complex variables or arrays are equivalenced to real variables or arrays of the same kind.

No changes to the code are required in these situations.

In all other cases where real or complex variables or arrays are equivalenced to objects of different data types, emulation by multi-component derived types will require the code to be changed. This has not been attempted automatically and the situation is reported.

Initialisation of the Emulation System

Optionally, a call to an initialisation subroutine may be inserted as the first executable statement in each main program in the emulation project (Usually there is only one, but multi-program multiprocessor systems are supported). The initialisation is specific to the emulation application. For example, in the analysis of numerical precision described here, the initialisation routine read the numbers of mantissa bits for the emulation types from file.

The fpt Commands to Support Emulation

EMULATE REAL ARITHMETIC

This command makes the changes described above. The behaviour may be modified as described below.

INITIALISE EMULATED ARITHMETIC

By default, fpt does not insert the initialisation routine for emulation at the start of the main program. This command makes the insertion. The statement inserted is:

CALL initialise_emulated_arithmetic

Insertion may be suppressed by the command DO NOT INITIALISE EMULATED ARITHMETIC The keyword INITIALISE may be spelled INITIALIZE.

EMULATE REALS TO INITIALISE ON INPUT

By default, fpt will initialise emulated objects on input if the emulation types have two or more components. It will not initialise them if there is only a single component. This command may be used to change this behaviour. The command EMULATE REALS NOT TO INITIALISE ON INPUT suppresses the behaviour. The keyword INITIALISE may be spelled INITIALIZE.

EMULATE ALL REAL LITERAL NUMBERS

By default, fpt will not change literal real and complex values in the code unless they occur as arguments to sub-programs which have been converted to accept the emulation derived types. This command converts all literal real and complex numbers in the code to the corresponding emulation types. It is expected to be used in studies of units and dimensions.

CHECK EQUIVALENCE

This command may be used to check for mixed EQUIVALENCE which will interfere with the emulation process.

References

[1] Collins J. 2017, "Testing the Numerical Precisions Required to Execute Real World Programs", IJSEA Vol 8, No 2, March 2017.

[2] Dawson, A. and Düben, P. D, 2016, "rpe v5: An emulator for reduced floating-point precision in large numerical simulations", Geoscientific Model Development Discussions, November 2016, doi: 10.5194/gmd-2016-247.