|
|
|
|
FPGAs are Field programmable Gate Arrays. They have a far higher density of
programmable gates than more conventional computing systems, but a lower clock
speed. They offer enormous potential for fine-grain parallel computing,
where the results of individual computations are almost immediately available
to other computing elements. This is quite unlike the coarse-grain parallel
computing used in modern supercomputers, where there are significant delays in
transferring data between different computing elements.
Multi-core PC chips are now widely available. A new generation of parallel
chips is under development where there may be as many as 64 parallel cores in a
closely coupled system. These devices are far less parallel than FPGAs but are
faster and may offer similar scope for fine-grain parallelism.
|
|
|
Algorithms for fine-grain parallel computing were developed by a member of
SimCon in 1983 for the
Applied Dynamics International AD10 simulation computer.
This machine was by far the fastest real-time simulation system available in the mid
1980s, and was used in the design of Space Shuttle, comunications satellites, nuclear
powerplant, helicopters and a range of military hardware. The algorithms were implemented
as a stand-alone optimiser and instruction scheduler. They are applicable
to Fortran or C programming for FPGAs and multicore systems. Research is in progress in
collaboration with Liverpool Hope University and The University of Cape Town to apply
the algorithms to these devices.
|
|
|
"Hydra", the first implementation of the parallel algorithmes for an FPGA is a pipeline of
processes made up of:
WinFPT: WinFPT replaces separate variable lives with different variables
(For a description of variables lives, please see the description of this issue in the
research on physical units and dimensions. Essentially,
variables which are assigned more than once are replaced by separate variables, each with only
a single assignment). WinFPT expands sub-programs in-line and unrolls DO loops. It cuts
long expressons into "triples" - expressions of the form a = b + c, a = b * c etc.;
WinFPT API: Code written in the WinFPT API outputs the triples in a format
appropriate for the parallel optimiser;
Hydra Optimiser: The Hydra optimiser, derived from the original system for the AD10,
schedules the instructions in an optimised parallel sequence;
Hydra Image Builder: The image builder constructs the load image for the FPGA.
The FPGA program consists of an interpreter which reads the load image and executes
groups of operations in parallel. The FPGA program is data driven. It is a single generic
code which will execute any program. Compilation of a Fortran code through the pipeline does
not require any further coding within the FPGA.
The Hydra pipeline is described in detail in the paper
"APPRASE: Automatic parallelisation of Fortran to run on an FPGA" presented at the SCSC 2008
conference in Edinburgh.
|
|
|
"Fynbos", is a successful working system. The FPGA code was developed by Jane Wyngaard at The University of Cape Town.
It is described in Wyngaard J, Inggs M, Collins J and Farrimond B, 2013,
"Towards a many-core architecture for HPC", 23rd International Conference on Field Programmable Logic and Applications (FPL).
The software pipeline is largely as described above, except that the image builder is now integrated into the optimiser code.
Experiments were carried out in the detailed architecture of the FPGA engine. In particular, division is expensive in FPGA
program space, and the number of divisions in a program tends to be small. Therefore division was implemented in only a sub-set of
the system cores.
|
|
|
FPGAs are most efficiently programmed in fixed point arithmetic. Real numbers
are represented by scaled fractions. Each real value, x, is assigned a maximum absolute
value, scale(x), and is represented internally by an integer which is calculated
as X = 2**(n-1) * x / scale(x) where n is the number of bits used in the scaled fraction.
Usually, n would be 32 bits, but in an FPGA this is not necessarily the case.
The AD10 was also a fixed-point machine. A system was developed to maintain
the coefficients and data structures required to support the scaled-fraction
arithmetic. For example, where:
a = b + c
and a, b and c have different scales,
a set of coefficients must be generated to correct the values so that the effective
calculation would be:
A = ( B + C * ( scale(c) / scale(b) ) ) * ( scale(b) / scale(a) )
where A, B and C are the scaled fractions
representing a, b and c.
This system will be applied in the FPGA programs.
|
|
|
|
|
Copyright ©1995 to 2015 Software Validation Ltd. All rights reserved. |
|
|