Asynchronous Open-Source DLX Processor (ASPIDA)
ASPIDA DLX Processor FAQ
The ASPIDA DLX Processor implements an asynchronous IP of the DLX Instruction Set Architecture (ISA) with incorporated support for ISA conversion so it can be easily converted to any RISC ISA. A design flow that is based on existing EDA tools for all design steps is used in order to produce a portable netlist and to distribute all the intermediate HDL files used for high-level and gate-level design. The final product is technology-independent and timing-independent and in a form suitable for integration using only standard, industrial tools and flows, with no dependence on asynchronous tools and specific knowledge of asynchronous design for potential end users. Both an FPGA and an ASIC versions have been finished and tested successfully.
Synchronous circuits typically consist of a set of combinational logic clouds seperated by registers. The combinational logic clouds perform predefined computations and pass the results to a neighbouring combinational logic cloud through the registers. The registers store the results of the computations preventing overwriting of the data. The registers are controlled by a global clock, which dictates when the registers are to accept a new value and store it.
An asynchronous circuit consists of a set of combinational logic clouds seperated by latches. Unlike synchronous circuits, the latches of the asynchronous circuits are not controlled by a global clock, but by a set of controllers. These controllers control the flow of the data in the circuit, by performing handshakes, using a communication protocol with their neighbouring controllers.
A handshake is a communication protocol which was originally devised for shared buses. The most commonly used handshake mechanisms are the two-phase and four-phase ones.
In the four-phase handshake the request and acknowledge signals return to zero to complete the handshake. Data change when the handshake is complete.
In the two-phase handshake the data change whenever both the request and the acknowledge change polarity.
A de-synchronized circuit is an asynchronous circuit which has been produced by a synchronous one after removing the global clock and replacing it with a set of controllers. The set of controllers performs the flow control locally, as it is done in asynchronous circuits.
FPGA stands for "Field Programmable Gate Array". The array can be programmed by a special software, in order to implement a circuit. It can be programmed in a lab, as the only requirements are the FPGA, the programming software and the circuit described in a hardware description language. They are reusable and a low cost solution for the implementation of digital circuits.
ASIC stands for "Application Specific Integrated Circuit". An ASIC, unlike FPGAs, cannot be implemented in a lab, but only in a fabrication facility. The fabrication procedure is complicated and expensive. Once manufactured, an ASIC cannot be altered. ASIC implementation is essential in high end implementations.
You only need the FPGA, the programming software and a computer. You also need to have described the circuit in a hardware description language. The two heavyweight FPGA manufacturers are Xilinx and Altera.
Hardware description languages are high level languages which describe the functionality of the circuit and how it is supposed to be implemented. There is a wide range of HDLs, the most commong being Verilog and VHDL.
You need to specify the circuit in a HDL, and then use special software in order to implement it. You have to follow the ASIC design flow to the end and then send the design to a fabrication facility.
A design flow consists of all of the steps that need to be followed for a design to be implemented. It typically consists of the description of the circuit in an HDL, the synthesis, the placement and the routing. After that, the design can be sent for fabrication.
DLX is a processor introduced by Hennesey and Patterson. It is widely used for educational purposes and appears in a range of textbooks. It has five pipeline stages and contains two basic building blocks, an ALU and a register file.
A pipeline is a way of making concurrent computations, which increases the speed of the overall execution, compared to a serial computation. In a pipeline, all computational logic blocks, which have something to do, make computations at the same time.
In the DLX pipeline, there are 5 stages: Instruction Fetch (IF), Instruction Decode (ID), Execution (EXE), Memory Access (MEM) and Register Writeback (WB). After the initial filling stages, all 5 stages are active executing 5 successive instructions at a time.
In a synchronous pipeline, the register stages also exchange their results at the same time, in predefined time slots. In an asynchronous circuit, they make agreements with their neighbouring combinational logic clouds, when to exchange results.
Our ASIC implementation of DLX, can run as fast as 77MHz both in the synchronous and the asynchronous implementation. Our FPGA implementation runs at 13.7 MHz.
Yes, you can download our implementation of DLX from our web site or our FTP site. Your are welcome to implement DLX on your FPGA, but for the ASIC version, you need to send it to a fabrication facility, which is not free of charge.
DLX is not commercially availabe. It is protected from the General Public License. It is freely available under this license.
There is a wide range of FPGAs, from very low end to very high end. The price of these devices is defined according to their capabilities. In our implementation, we used a very low cost FPGA, which is available for no more than $100. You can find a complete pricelist at www.digilentinc.com.
Both Xilinx and Altera offer free versions of their programming software, which include all the essential functionality for the implemetation of a design, even our DLX. If you want a full version, you need to contact the vendors for a pricelist.
You can contact Christos Sotiriou for more information regarding this project. Take a look at our Asynchronous group web site!
Sure, there is a C compiler, which provides DLX assembly. You just need to download this compiler from our FTP site and use the appropriate link libraries.
Yes, the ASIC version has been manufactured and tested. The tests have been a success and we now have a lot of fully working ASIC ASPIDA chips. Here's a photo (click it for a larger one). The two left-side boxes are the Instruction Cache and the Data Cache.
You can also see a photo during the chip testing. More photos are available here.
You have to follow a synthesis, place and route flow. The flow we used in our implementation is this one.
If you want to implement the synchronous version, download the source files here and follow the synthesis and place and route procedure of the Xilinx ISE tool. If you want to implement the de-synchronized version, download the source files here and follow the same synthesis and place and route procedure as above.
If you just want to simulate the placed and routed DLXs, download the Verilog and SDF files: de-synchronized or synchronous.
Finally, if you want to just download the DLXs to an FPGA, download the appropriate bitfile: de-synchronized or synchronous.
The de-synchronizing controllers that are used in the FPGA version of DLS are semi-decoupled controllers.
They consist of a "master" part and a "slave" part and the verilog description can be found here.
The complete de-synchronizing controller of DLX includes the "master" and "slave" parts along with a pulse delay and a matched delay. The pulse delay is just an inverter chain. A verilog description of the pulse delay can be found here. The "KEEP" constraints are essential, otherwise the ISE software will optimize out the inverters.
A verilog description of the matched delay can be found here. The synthesis attributes are essential so that the chain of AND-gates, which implement the assymetric matched delay, are not optimized out by the tool.
The complete controller verilog description can be found here.
Sure, you can download movies which show the DLX executing "the game of life" here. The DLX runs a program which fills memory locations with a bit-mapped matrix of the game screen. In the demo board, a VGA controller is displaying the memory matrix onto the monitor.
You can also see our lab setup.
More photos available here.
Created by the ASYNC group of the ICS-FORTH