Category Archives: Processors

YARDstick: custom processor development toolset

YARDstick is a design automation tool for custom processor development flows that focuces on the hard part: generating and evaluating application-specific hardware extensions. YARDstick is a powerful building block for ASIP (Application-Specific Instruction-set Processor) development, since it integrates application analysis, ultra-fast algorithms for custom instruction generation and selection with user-defined compiler intermediate representations. It integrates retargetable compiler features for the targeted IRs/architectures.

Features

Important features of YARDstick are as follows:

  • retargetable to used-defined IRs by machine description
  • can be targeted to low-level compiler IRs, assembly-level representations of
    virtual machines, or assembly code for existing processors
  • fully parameterized custom instruction generation and selection engine
  • code selector for multiple-input multiple-output patterns
  • virtual register assignment for virtual machine targets
  • an extensive set of backends including:
    • assembly code emitter
    • C backend
    • visualization backend for Graphviz (http://www.graphviz.org)
    • visualization backend for VCG (http://rw4.cs.uni-sb.de/~sander/html/gsvcg1.html) (or aiSee: http://www.absint.com/aisee/)
    • an XML format backend amenable to graph rewriting.

YARDstick comes along with a cross-platform GUI written in Tcl/Tk 8.5 (http://wiki.tcl.tk).

Aim

The ultimate goal of YARDstick is to liberate the designer’s development infrastructure from compiler and simulator idiosyncrasies. With YARDstick, the ASIP designer is empowered
with the freedom of specifying the target architecture of choice and adding new implementations of analyses and custom instruction generation/selection methods.

Typically, 2x to 15x speedups for benchmark applications (ANSI C optimized source code) can be fully automatically obtained by using YARDstick depending on the target architecture. Speedups are evaluated against a scalar RISC architecture.

Detailed look

  1. Analysis engines generating both static and dynamic statistics:
    1. Data types
    2. Operation-level statistics
    3. Basic block statistics (ranking)
    4. Performance estimations with/without custom instructions.
  2. Generation of CDFGs (Control-Data Flow Graphs).
  3. Backend engines:
    1. ANSI C
    2. dot (Graphviz)
    3. VCG (GDL, aiSee)
    4. XML (GGX for the AGG graph rewriting tool)
    5. Retargetable assembly emitter for entire translation units
    6. CDFG formats for various RTL synthesis tools
  4. Custom instruction engines:
    1. Full-parameterized MIMO custom instruction generation algorithm
    2. Fast heuristic
    3. Configurable number of inputs
    4. Configurable number of outputs
  5. Custom instruction selection:
    1. Based on priority metrics
  6. Graph (and graph-subgraph) isomorphism features for eliminating redundant patterns. Multiple algorithms supported.
  7. Visualization of custom instructions, basic blocks, control-flow graphs and control-data flow graphs (basic block nodes expanded to their constituent instructions).
  8. Basic retargetable compiler features:
    1. Code selector for MIMO instructions (tested with large cases).
    2. Virtual register assignment (allocation for a VM).
    3. Hard register allocator in the works.
  9. Miscellaneous features:
    1. single constant multiplication optimizer
    2. elimination of false data-dependences in assembly-level CDFGs.
    3. beautification options for visualization
    4. interfacing (co-operation) with external tools such as peephole optimizers, profilers, code generators etc.
    5. features related to the custom processor architecture named ByoRISC

Benchmarks

Here is a list of application benchmarks that have been tested with YARDstick:

  • ADPCM encoder and decoder (typically: 4x speedup)
  • Video processing kernels: full-search block-matching motion estimation,
    logarithmic search motion estimation, motion compensation
  • Image processing kernels: steganography (hide/uncover), edge detection, matrix
    multiplication
  • Cryptographic kernels: crc32, rc5, raiden (7x speedup, 12x for unrolled
    version)

At the YARDstick homepage (http://www.nkavvadias.com/yardstick/ ) you can find
some additional materials:

Supporting testimonial to the ArchC infrastructure (2004)

[I was an early PhD student at the time and was quite impressed with ArchC had to offer; I still am]

What ArchC is all about

ArchC is an architecture description language that exploits the already mature SystemC legacy. It targets the automated generation of a) functional and cycle-accurate instruction set simulators and at a later stage of b) tools from the entire application development toolchain for a custom processor (compiler, assembler, linker, debugger, etc).

Simulator generation is at a very good and usable state, however needs to be extended towards the system-level (bus generation ?). Toolset generation which naturally comes next, requires more hard effort and therefore the appropriate support of the open-source community. In my personal opinion, compiler generation is so hard that it can only be possible through funding. Other issues as automated bus generation are also very HARD and need support.

Strengths of ArchC

Important points regarding ArchC are:

  • Nice integration with SystemC. The ArchC user codes instruction decoding, hardware resources, etc in small ArchC files and the corresponding SystemC are automatically generated, which removes heavy burden from the user.
  • Unique approach to the design of a retargetable system call library. This makes possible to benchmark real-sized applications with realistic demands, i.e. many input arguments. Many of the other simulators can’t cope with that.
  • Ease of introducing resource elements, e.g. multi-banked memories.
  • Reuse possibilities for existing SystemC codes.
  • Models of 3 processors, 2 of them with complete retargeted GNU toolchains! Plus binaries of very popular benchmarks (MediaBench, [MediaBench-II is here], MiBench).

How has ArchC helped me in my research

  • Ease of designing variations of a base processor. Most of my papers require tradeoff analysis on alternative architectures.
  • My “drproc” model is a 5-pipeline stage RISC processor with specialized instructions for data-reuse transformations [work of Catthoor et al, IMEC]. I originally had coded an assembler (~few days) and the entire processor in plain SystemC (~month). When I found out about ArchC, it required only 3-4 8-hour days to deliver to my colleagues a much more robust cycle-accurate simulator!
  • I have ran reasonably sized applications (e.g. one sourceforge application, of 25K C lines with intensive use of pointers) with no problem on the R3000 and MIPS models of ArchC.

What needs to be done

NOTE: In the most part, the roadmap has been completed!

This work will be incomplete unless the “ArchC roadmap” is fulfilled. While I believe the compiler retargeting is a very hard issue, [addressing UNICAMP team] I am certain that your team is experienced in DSP compilation techniques (I have read some of the papers). Certainly, assembler generation, and bus wrapper generation (e.g. for OCP) plus some more models are awaited by many users, including myself.

Thank you www.archc.org for all the (free) fish!

METATOR – A look into processor synthesis

These last few months, I have been slowly moving back to my main interests, EDA tools (as a developer and as a user), FPGA application engineering, and last but not least processor design. After a 5-year hiatus I have started revamping (and modernizing) my own environment, developed as an outcome of my PhD work on application-specific instruction-set processors (ASIPs). The flow was based on SUIF/Machine-SUIF (compiler), SALTO (assembly-level transformations) and ArchC (architecture description language for producing binary tools and simulators). It was a highly-successful flow that allowed me (along with my custom instruction generator YARDstick) to explore configurations and extensions of processors within seconds or minutes.

I have been thinking about what’s next. We have tools to assist the designer (the processor design engineer per se) to speedup his/her development. Still, the processor must be designed explicitly. What would go beyond the state-of-the-art is not to have to design the golden model of the processor at all.

What I am proposing is an application-specific processor synthesis tool that goes beyond the state-of-the-art. A model generator for producing the high-level description of the processor, based only on application analysis and user-defined constraints. And for the fun of it, let’s codename it METATOR, because I tend to watch too much Supernatural these days, and METATOR (messenger) is a possible meaning for METATRON, an angelic being from the Apocrypha with a human past. So think of METATOR as an upgrade (spiritual or not) to the current status of both academic and commercial ASIP design tools.

The Context, the Problem and its Solution

ASIPs are tuned for cost-effective execution of targeted application sets. An ASIP design flow involves profiling, architecture exploration, generation and selection of functionalities and synthesis of the corresponding hardware while enabling the user taking certain decisions.

The state-of-the-art in ASIP synthesis includes commercial efforts from Synopsys which has accumulated three relevant portfolios: the ARC configurable processor cores, Processor Designer (previously LISATek) and the IP Designer nML-based tools (previously Target Compiler Technologies); ASIPmeister by ASIP Solutions (site down?), Lissom/CodAL by Codasip, and the academic TCE and NISC toolsets. Apologies if I have missed any other ASIP technology provider!

The key differentiation point of METATOR against existing approaches is that ASIP synthesis should not require the explicit definition of a processor model by a human developer. The solution implies the development of a novel scheme for the extraction of a common denominator architectural model from a given set of user applications (accounting for high-level constraints and requirements) that are intended to be executed on the generated processor by the means of graph similarity extraction. From this automatically generated model, an RTL description, verification IP and a programming toolchain would be produced as part of an automated targeting process, in like “meta-“: a generated model generating models!.

 

Conceptual ASIP Synthesis Flow

METATOR would accept as input the so-called algorithmic soup (narrow set of applications) and generate the ADL (Architecture Description Language) description of the processor. My first aim would be for ArchC but this could also expand to the dominant ADLs, LISA 2.0 and nML.

METATOR would rely upon HercuLeS high-level synthesis technology and the YARDstick profiling and custom instruction generation environment. In the past, YARDstick has been used for generating custom instructions (CIs) for ByoRISC (Build Your Own RISC) soft-core processors. ByoRISC is a configurable in-order RISC design, allowing the execution of multiple-input, multiple-output custom instructions and achieving higher performance than typical VLIW architectures. CIs for ByoRISC where generated by YARDstick, which purpose is to perform application analysis on targeted codes, identify application hotspots, extract custom instructions and evaluate their potential impact on code performance for ByoRISC.

                                                                                                                Conclusion

To sum this up, METATOR is a mind experiment in ASIP synthesis technology. It automatically generates a full-fledged processor and toolchain merely from its usage intent, expressed as indicative targeted application sets.