Asynchronous Circuit and System Design Group
Asynchronous Open-Source DLX Processor (ASPIDA)
Synthesis Procedure Summary
General information
This page summarizes the options that we used for the synthesis of the
asynchronous DLX. The target device is a Xilinx Spartan-IIE, specifically
the XC2S200E. We restricted the maximum fanout of the nets to 100 and
allowed register duplication. The optimization effort targets area and the
design is flattened, which means that the design hierarchy is not
preserved.
Asynchronous DLX
Implementing a de-synchronized circuit requires a different approach
than implementing a synchronous one. The absence of a clock and the
presence of special asynchronous elements in the circuit introduce a
number of difficulties that need to be dealt with.
Delay elements are necessary in a de-synchronized circuit in order to
match the delay of combinational logic present between a pair of
latches. In the timed de-synchronization model a latch should only be
enabled after data have arrived at its inputs, thus delay elements
matching the delay of combinational logic blocks are used to enforce
correct timing, as shown in the figure below.
As data propagate only during the valid phase of the handshake between
controllers and not during the return-to-zero phase, asymmetric delay
elements are used. The figure below shows a possible implementation of
an asymmetric delay element.
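To illustrate why an asymmetric element suits a four-phase handshake, the following is a minimal behavioral sketch in Python. It assumes one common structure, a chain of AND gates in which every gate also sees the element input, and a unit gate delay; the function name and the sample waveform are hypothetical and are not taken from the DLX sources.

```python
def asymmetric_delay(samples, stages):
    """Discrete-time model of an AND-gate chain with unit gate delay.

    Assumed structure: stage k computes AND(output of stage k-1, input).
    A rising input has to ripple through every stage, while a falling
    input forces all stages low after a single gate delay.
    """
    state = [0] * stages              # current outputs of the AND gates
    out = []
    for x in samples:
        out.append(state[-1])         # element output is the last stage
        prev = [x] + state[:-1]       # stage 0 is fed by the input itself
        state = [p & x for p in prev]
    return out


# Input rises at step 1 and falls at step 6.
wave = [0, 1, 1, 1, 1, 1, 0, 0]
print(asymmetric_delay(wave, stages=3))
```

In this model the rising edge is delayed by as many steps as there are stages, while the falling edge is delayed by one step only, which matches the requirement that only the valid phase be slowed down.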
DLX Controllers
The figure below shows the implementation of
the DLX controllers. These controllers correspond to an
implementation of the semi-decoupled four-phase handshake protocol.
The figure below depicts an implementation,
with static CMOS gates, of a pair of controllers (even and odd) for a
fragment of data-path. The figure also shows the marked graphs modeling
the behavior of each controller. The only difference between the two
controllers is the initial marking, which determines the reset logic
(signal RST).
Resetting the controllers is crucial for correct behavior. In this case, the even
latches are transparent and the odd latches opaque in the initial
state. With this strategy, only the odd latches must be reset in the
data-path. The implementation also assumes a relative timing constraint
(arc Ro- ---> Ri+) that can be easily met with the actual design.
The controllers also include a delay that must be matched with the
delay of the combinational logic incremented by the pulse width of the
latch control signal. The implementation of the symmetric (pulse) and
the asymmetric (matched) delay elements of the controllers for the DLX
is shown in the figure below. The pulse delay consists of a chain of an
even number of inverters, and the matched delay of a chain of AND gates,
where the longest path of the matched delay is equal to the
corresponding delay of the critical path of the combinational logic.
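The sizing rule above can be sketched as a small calculation. This is only an illustration under stated assumptions: every gate in a chain is taken to contribute the same fixed delay, and all numeric values and function names below are hypothetical rather than measured on the DLX.

```python
import math


def and_gates_needed(comb_delay_ns, pulse_width_ns, gate_delay_ns):
    """AND gates required so that the matched delay covers the
    combinational delay plus the latch-control pulse width."""
    target = comb_delay_ns + pulse_width_ns
    return math.ceil(target / gate_delay_ns)


def inverters_needed(pulse_width_ns, inv_delay_ns):
    """Inverters required for the pulse delay, rounded up to an even
    count so that the chain does not invert the signal."""
    n = math.ceil(pulse_width_ns / inv_delay_ns)
    return n if n % 2 == 0 else n + 1


# Hypothetical figures: 18 ns critical path, 2 ns pulse, 1.5 ns per AND gate.
print(and_gates_needed(18.0, 2.0, 1.5))
# Hypothetical figures: 2 ns pulse, 0.7 ns per inverter.
print(inverters_needed(2.0, 0.7))
```

On an FPGA the per-gate delay depends on placement and routing, so in practice the chain length would be checked against the post-route timing report rather than derived from a single constant.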
DLX Synthesis
The main problem was to instruct the tool not to optimize out the chain
of AND gates and the chain of inverters, which the tool considers
redundant logic.
This problem was finally solved by adding ``KEEP'' constraints to the
wires that connect the gates of the delay element. This constraint has
the effect that the wires are kept through the mapping despite their
redundancy, and the gates that the wires originate from are kept too.
One limitation of this constraint is that when both inputs of a
two-input gate are the same wire, a ``KEEP'' constraint conflict is
produced. The only way to get around this conflict is to convince the
tool that the two inputs are not exactly the same wire. This has been
achieved by adding two inverters before one of the two inputs. ``KEEP''
constraints have to be added to the wires that connect the extra
inverters as well, otherwise they will be optimized out. For more
information on the constraints in Xilinx devices, refer to the Xilinx
software manuals.
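As an illustration of the step above, the following Python sketch emits one ``KEEP'' line per internal wire of a delay chain in the UCF form `NET "name" KEEP;`. The net naming scheme (`<prefix>_w<i>`) and the prefix used in the example are hypothetical; the real names depend on how the delay element is written in the HDL and on how the tool names nets after synthesis.

```python
def delay_chain_nets(prefix, stages):
    """Names of the wires between consecutive gates of a delay chain.

    The <prefix>_w<i> scheme is an assumption made for this example.
    """
    return [f"{prefix}_w{i}" for i in range(stages)]


def keep_constraints(net_names):
    """One UCF KEEP constraint line per net."""
    return [f'NET "{name}" KEEP;' for name in net_names]


# Hypothetical matched delay of the decode stage with four internal wires.
for line in keep_constraints(delay_chain_nets("id_matched_delay", 4)):
    print(line)
```

Generating the lines from the chain length keeps the constraint file in step with the delay element when the number of gates is retuned.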
Our asynchronous DLX has five pipeline stages, which are separated by
flip-flops. Asynchronous controllers provide the flip-flops with the
appropriate signals so that the data move safely from one pipeline
stage to the other. In order for these signals to arrive at the
flip-flops at approximately the same moment, we used low skew lines.
However, this approach proved to be insufficient for the Instruction
Decode and Execute stages, which happen to be the two stages with
the most combinational logic. The signals from the controllers had to
be reinforced by buffers.
The datapath optimization was performed by optimizing each pipeline
stage independently. Two timing groups have been created for each pair
of adjacent pipeline stages. The first group contains all the flip-flops
of the pipeline stage where the signals to be analyzed originate. The
second group contains all the flip-flops of the next pipeline stage.
After applying ``FROM-TO'' constraints to the generated groups, a timing
analysis is performed on the paths between the two groups. The results
of the static timing analysis represent the delay of the datapath. This
delay is the minimum delay that the delay elements must have in order
for the circuit to operate correctly.
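The grouping described above can be sketched as a small generator of UCF lines, using the `INST ... TNM` and `TIMESPEC ... FROM ... TO` forms. The stage names, instance patterns, group names, and the 20 ns requirement are all hypothetical placeholders; the exact constraint syntax should be checked against the Xilinx constraints documentation for the tool version in use.

```python
def stage_pair_constraints(src_stage, dst_stage, requirement_ns):
    """UCF lines defining two flip-flop groups and a FROM-TO constraint.

    Assumes (hypothetically) that all flip-flops of a stage live under an
    instance whose name matches "<stage>/*".
    """
    src_grp = f"{src_stage}_ffs"
    dst_grp = f"{dst_stage}_ffs"
    return [
        f'INST "{src_stage}/*" TNM = "{src_grp}";',
        f'INST "{dst_stage}/*" TNM = "{dst_grp}";',
        f'TIMESPEC "TS_{src_stage}_to_{dst_stage}" = '
        f'FROM "{src_grp}" TO "{dst_grp}" {requirement_ns} ns;',
    ]


# Hypothetical names for five pipeline stages; one constraint set per
# pair of adjacent stages.
stages = ["if_stage", "id_stage", "ex_stage", "mem_stage", "wb_stage"]
for src, dst in zip(stages, stages[1:]):
    print("\n".join(stage_pair_constraints(src, dst, 20)))
```

The worst-case delay reported for each FROM-TO path is then the lower bound for the matched delay of the corresponding controller.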
It is necessary to emphasize that the delay elements were excluded from
the optimization of the datapath for two reasons. The first reason is
that the delay elements do not belong to the datapath and the second
reason is that we are not interested in minimizing the delay of a delay
element, but rather bounding it within predefined limits. The synthesis
directive for the exclusion of the delay elements from the timing
optimization is the ``TIG'' constraint (a.k.a. Timing IGnore).
VGA Controller
The clock of the VGA controller has half the frequency of the clock
that the board provides. The board clock has a period of 20 ns (50 MHz)
and the VGA controller needs a period of 40 ns (25 MHz). In order to create the
appropriate frequency for the VGA controller, we used a clock divider
that has as input the clock that the board provides and outputs the
clock for the VGA controller. The clock divider is configured to divide
the clock frequency by a factor of 2. For more information on the use
of clock dividers in Xilinx devices, refer to the Xilinx
software manuals.
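The effect of dividing by 2 can be checked with a short behavioral model. This sketch assumes a simple toggle on every rising edge of the input clock; it is not meant to describe the actual divider resource used on the device, and the 10 ns sample step is chosen only so that the board clock period comes out at 20 ns.

```python
def divide_by_two(clock_samples):
    """Toggle the output on every rising edge of the sampled input clock."""
    out, q, prev = [], 0, 0
    for c in clock_samples:
        if prev == 0 and c == 1:   # rising edge detected
            q ^= 1
        out.append(q)
        prev = c
    return out


# One sample per 10 ns: the input repeats every 2 samples (20 ns, 50 MHz).
board_clock = [0, 1, 0, 1, 0, 1, 0, 1]
print(divide_by_two(board_clock))
```

The output pattern repeats every 4 samples, that is every 40 ns, which corresponds to the 25 MHz clock required by the VGA controller.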
The clock signal of the VGA controller has been configured as a low
skew line. No other synthesis attributes have been applied to the VGA
controller.