
The 5-minute introduction to FSMDs for practitioners

A design approach that is widely used by HLS (high-level synthesis) tools, but not really advertised loud and proud to HLS users, is the Finite-State Machine with Datapath, aka FSMD. Even the Wikipedia entry on FSMDs is really sketchy. FSMDs are the primary approach for dealing with generic, control-flow-dominated code in an HLS context.

An FSMD is a microarchitectural paradigm for implementing non-programmable/hardwired processors with their control unit and datapath combined/merged. In FSMDs, the datapath actions are embedded within the actual next-state and output logic decoders of your FSM description. From an RTL (Register Transfer Level) point of view, an FSM comprises:

  • a current-state logic process that updates the state and register storage
  • a next-state logic process that calculates the state to transition to next
  • an output logic process that produces the circuit’s outputs.
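To make the three processes and the embedded datapath concrete, here is a minimal sketch in Python (not VHDL, and not HercuLeS output), assuming a subtraction-based GCD as the example. The names `Regs`, `next_state_logic`, `output_logic`, `run_gcd` and the states `IDLE`, `CALC`, `DONE` are hypothetical. The datapath actions (loading and subtracting `a` and `b`) sit inside the state decoding of the next-state logic, while the loop body plays the role of the current-state process that loads every register on each clock edge.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Regs:
    """Contents of all registers: the FSM state plus datapath registers."""
    state: str
    a: int
    b: int

def next_state_logic(reg: Regs, start: bool, a_in: int, b_in: int) -> Regs:
    """Combinational: compute the next value of every register from the
    current values. Datapath actions are embedded in the state decoding."""
    nxt = reg  # default: every register keeps its current value
    if reg.state == "IDLE":
        if start:
            nxt = Regs("CALC", a_in, b_in)        # load the operands
    elif reg.state == "CALC":
        if reg.a == reg.b:
            nxt = replace(reg, state="DONE")      # result is ready
        elif reg.a > reg.b:
            nxt = replace(reg, a=reg.a - reg.b)   # datapath action: a - b
        else:
            nxt = replace(reg, b=reg.b - reg.a)   # datapath action: b - a
    elif reg.state == "DONE":
        nxt = replace(reg, state="IDLE")
    return nxt

def output_logic(reg: Regs) -> tuple[bool, int]:
    """Combinational: outputs depend only on the current register values."""
    return reg.state == "DONE", reg.a

def run_gcd(a_in: int, b_in: int, max_cycles: int = 1000) -> tuple[int, int]:
    """Clock the FSMD until DONE; return (gcd, clock edges taken).
    Assumes positive inputs."""
    reg = Regs("IDLE", 0, 0)  # reset values
    for cycle in range(max_cycles):
        done, result = output_logic(reg)
        if done:
            return result, cycle
        # Current-state process: on the clock edge all registers load
        # the values computed by the next-state logic.
        reg = next_state_logic(reg, start=True, a_in=a_in, b_in=b_in)
    raise RuntimeError("FSMD did not reach DONE within max_cycles")

print(run_gcd(12, 18))  # (6, 4)
```

Here `run_gcd(12, 18)` returns `(6, 4)`: the result becomes visible after four clock edges (operand load, two subtractions, equality test). Inputs must be positive; with a zero operand the subtraction never makes progress, which `max_cycles` guards against.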

[NOTE: There is an excellent write-up on alternative FSM description styles in VHDL by Douglas J. Smith that you can consult; if you are targeting Xilinx FPGAs, any recent XST manual provides good advice for casual RTL coding of FSMs.]

Let’s look at FSMDs as HercuLeS high-level synthesis (http://www.nkavvadias.com/hercules/) treats them. The HercuLeS manual is available at http://www.nkavvadias.com/hercules-reference-manual/hercules-refman.pdf, and a relevant book chapter can be downloaded from http://cdn.intechweb.org/pdfs/29207.pdf if you want to go beyond these five minutes.

HercuLeS’ FSMDs are based on Prof. Gajski’s and Pong P. Chu’s work, mostly on some of their books and published papers. When I started my work on HercuLeS, I borrowed a couple of Gajski’s books from the local library and actually bought two of P.P. Chu’s works; his book RTL Hardware Design Using VHDL is highly relevant. Gajski’s work on SpecC and the classic TRs (technical reports such as Modeling Custom Hardware in VHDL) from his group were, at some point, my night (by the bed) and day (by the desk) reading…

I believe Vivado HLS (aka AutoESL/xPilot) and the others do the same thing, following a very similar approach, with one key difference in how the actual RTL FSMD code is presented. Their datapath code is implemented with concurrent assignments, and there are lots of control and status signals going in and out of the next-state logic decoder. In contrast, I prefer datapath actions embedded within the state decoding; this produces slightly slower and marginally larger hardware overall, but the user’s intention is much clearer in the RTL and much easier to grasp and follow.

In an FSMD, the key notion is understanding how the _reg and _next signals work, since together they represent a register: _reg is its currently accessible value, and _next is the value that is going to be written into that register at the next active clock edge. Essentially, _reg and _next are what you would see when probing the register’s output and input ports, respectively, at any time.
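A small Python sketch shows why the _reg/_next distinction matters, assuming a hypothetical state that swaps two registers (the function names are made up for illustration). Every _next value is computed from _reg values only, and all registers load simultaneously at the clock edge, so `a_next = b_reg; b_next = a_reg` swaps the registers without a temporary. Sequential, variable-style assignment behaves differently.

```python
def swap_state(a_reg: int, b_reg: int) -> tuple[int, int]:
    """One clock cycle of a hypothetical SWAP state in FSMD style."""
    # Next-state logic: each _next is computed from _reg values only.
    a_next = b_reg
    b_next = a_reg
    # At the clock edge both registers load their _next values at once.
    return a_next, b_next

def variable_style(a: int, b: int) -> tuple[int, int]:
    """Sequential assignment (like a VHDL variable or C code): the second
    statement already sees the updated value of a."""
    a = b
    b = a
    return a, b

print(swap_state(3, 5))      # (5, 3): a true swap
print(variable_style(3, 5))  # (5, 5): the old value of a is lost
```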

Following the basic principles of Pong P. Chu, every register is associated with a _reg and a _next signal. Some advice:

  1. Have a _reg and _next version for each register as declared signals in VHDL code.
  2. In each state, read all needed _reg signals and assign all needed _next ones.
  3. Do not reassign the same _next version of a register within a single FSMD state.
  4. You can totally avoid variables in your code. Not all tools provide equally mature support for synthesizing code with variables.
  5. Operation chaining is possible, but it requires that you write _next versions and read them in the same state. These are then plain wires and do not implement registers. Again, you cannot use (for writing) the same _next version more than once in the same state.
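The difference between a registered and a chained implementation of `y = (a + b) * c` can be sketched in Python as follows; the functions, the intermediate `t` and the state names are hypothetical, and the cycle counts assume one clock edge per FSMD state. In the chained version, `t_next` is written once and then read within the same state, so it acts as a plain wire between the adder and the multiplier.

```python
def unchained(a_reg: int, b_reg: int, c_reg: int) -> tuple[int, int]:
    """Two states, S1 and S2; returns (y, cycles)."""
    # State S1: read a_reg, b_reg; assign t_next.
    t_next = a_reg + b_reg
    t_reg = t_next              # clock edge 1: t is stored in a register
    # State S2: read t_reg, c_reg; assign y_next.
    y_next = t_reg * c_reg
    y_reg = y_next              # clock edge 2
    return y_reg, 2

def chained(a_reg: int, b_reg: int, c_reg: int) -> tuple[int, int]:
    """A single state S1; returns (y, cycles)."""
    # t_next is written once and read in the same state, so it is a wire
    # feeding the multiplier directly (operation chaining).
    t_next = a_reg + b_reg
    y_next = t_next * c_reg
    y_reg = y_next              # clock edge 1
    return y_reg, 1

print(unchained(2, 3, 4))  # (20, 2)
print(chained(2, 3, 4))    # (20, 1)
```

Chaining saves a state (and a cycle) at the cost of a longer combinational path through both the adder and the multiplier, which can lower the achievable clock frequency.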

At some point I developed a technique for automatically modifying VHDL FSMD code to add controlled operation chaining. It only uses a lexer; to read more about it, see Section III.E of http://www.nkavvadias.com/publications/kavvadias_asap12_cr.pdf.

If you are curious to dig deeper into HercuLeS, you can read http://www.nkavvadias.com/publications/hercules-pci13.pdf; a journal paper has been accepted for publication and will soon be available. I had to say it!

METATOR – A look into processor synthesis

Over the last few months, I have been slowly moving back to my main interests: EDA tools (as a developer and as a user), FPGA application engineering and, last but not least, processor design. After a 5-year hiatus, I have started revamping (and modernizing) my own environment, developed as an outcome of my PhD work on application-specific instruction-set processors (ASIPs). The flow was based on SUIF/Machine-SUIF (compiler), SALTO (assembly-level transformations) and ArchC (architecture description language for producing binary tools and simulators). It was a highly successful flow that allowed me (along with my custom instruction generator YARDstick) to explore processor configurations and extensions within seconds or minutes.

I have been thinking about what’s next. We have tools that help the designer (the processor design engineer per se) speed up development. Still, the processor must be designed explicitly. What would go beyond the state of the art is not having to design the golden model of the processor at all.

What I am proposing is an application-specific processor synthesis tool that goes beyond the state of the art: a model generator that produces the high-level description of the processor based only on application analysis and user-defined constraints. And for the fun of it, let’s codename it METATOR, because I tend to watch too much Supernatural these days, and METATOR (messenger) is a possible meaning of METATRON, an angelic being from the Apocrypha with a human past. So think of METATOR as an upgrade (spiritual or not) to the current status of both academic and commercial ASIP design tools.

The Context, the Problem and its Solution

ASIPs are tuned for cost-effective execution of targeted application sets. An ASIP design flow involves profiling, architecture exploration, generation and selection of functionalities, and synthesis of the corresponding hardware, while allowing the user to make certain decisions.

The state of the art in ASIP synthesis includes commercial efforts from Synopsys, which has accumulated three relevant portfolios: the ARC configurable processor cores, Processor Designer (previously LISATek) and the IP Designer nML-based tools (previously Target Compiler Technologies). Other players are ASIPmeister by ASIP Solutions (site down?), Lissom/CodAL by Codasip, and the academic TCE and NISC toolsets. Apologies if I have missed any other ASIP technology provider!

The key differentiation point of METATOR against existing approaches is that ASIP synthesis should not require the explicit definition of a processor model by a human developer. The solution implies developing a novel scheme that, by means of graph similarity extraction, derives a common-denominator architectural model from a given set of user applications (accounting for high-level constraints and requirements) that are intended to be executed on the generated processor. From this automatically generated model, an RTL description, verification IP and a programming toolchain would be produced as part of an automated targeting process; hence the “meta-”: a generated model generating models!

 

Conceptual ASIP Synthesis Flow

METATOR would accept as input the so-called algorithmic soup (a narrow set of applications) and generate the ADL (Architecture Description Language) description of the processor. My first target would be ArchC, but this could also expand to the dominant ADLs, LISA 2.0 and nML.

METATOR would rely upon HercuLeS high-level synthesis technology and the YARDstick profiling and custom instruction generation environment. In the past, YARDstick has been used to generate custom instructions (CIs) for ByoRISC (Build Your Own RISC) soft-core processors. ByoRISC is a configurable in-order RISC design that allows the execution of multiple-input, multiple-output custom instructions and achieves higher performance than typical VLIW architectures. The CIs for ByoRISC were generated by YARDstick, whose purpose is to perform application analysis on targeted code, identify application hotspots, extract custom instructions and evaluate their potential impact on code performance for ByoRISC.
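Since YARDstick is a profiling environment, the hotspot-identification step can be illustrated generically. The sketch below is not YARDstick’s actual algorithm: it assumes per-basic-block execution counts and cycle estimates (the `Block` record, the `hotspots` function and the sample profile are all hypothetical) and selects the most expensive blocks until a chosen fraction of total execution time is covered. Those blocks would then be the natural places to search for custom instruction candidates.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    exec_count: int       # how many times the basic block ran (profile data)
    cycles_per_exec: int  # estimated cycles per execution of the block

    def cost(self) -> int:
        return self.exec_count * self.cycles_per_exec

def hotspots(blocks: list[Block], coverage: float = 0.8) -> list[str]:
    """Rank blocks by total cycles and return the fewest top-ranked blocks
    whose cumulative cost reaches `coverage` of the total."""
    total = sum(b.cost() for b in blocks)
    if total == 0:
        return []
    selected: list[str] = []
    acc = 0
    for b in sorted(blocks, key=Block.cost, reverse=True):
        if acc >= coverage * total:
            break
        selected.append(b.name)
        acc += b.cost()
    return selected

# Hypothetical profile of a small loop kernel.
profile = [
    Block("init", 1, 50),
    Block("loop_body", 1000, 12),
    Block("loop_test", 1000, 2),
    Block("exit", 1, 10),
]
print(hotspots(profile))       # ['loop_body']
print(hotspots(profile, 0.9))  # ['loop_body', 'loop_test']
```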

Conclusion

To sum up, METATOR is a thought experiment in ASIP synthesis technology. It would automatically generate a full-fledged processor and toolchain merely from its usage intent, expressed as indicative targeted application sets.

A few words on HercuLeS high-level synthesis

HercuLeS is a new high-level synthesis tool marketed by Ajax Compilers (http://www.ajaxcompilers.com). HercuLeS has been in development since 2009 and it seems that now is the proper time to hit the market :) Full disclosure: I’m the main (read: sole) developer of HercuLeS.

A free evaluation of HercuLeS is available. You can grab it by sending me an email (see either ajaxcompilers.com or nkavvadias.com for contact details).

HercuLeS is based on the following flow: C -> GIMPLE -> N-Address Code -> VHDL.

HercuLeS is extensible, since frontends, analyses and optimization passes can be added by third parties. At the moment, HercuLeS is bundled with a number of external modules for analyses and optimizations at the C, NAC (N-Address Code, its textual IR), Graphviz and VHDL levels. It produces vendor-independent code, so the generated HDL descriptions are (in principle) synthesizable to either FPGA or ASIC targets.

It should be noted that certain things are still missing from HercuLeS, and there is ongoing work to support them in the future. This is inevitable since our resources are somewhat limited. For instance, there is no Verilog backend yet.

We are looking to establish close communication with our users. Our users provide inspiration, and their requests drive future development. Criticism is welcome at Ajax Compilers :)