I’ve spent the last couple of days running head-to-head comparisons of Xilinx Vivado HLS against HercuLeS on HLS-generated digital circuits (from input C code).
I believe that HercuLeS lived up to the challenge; it is competitive with Vivado HLS. The reader should take into account that:
- Both tools have been used (almost) out-of-the-box. Vivado HLS was configured with no bufg inclusion, and in “out_of_context” mode. This means that neither clock buffers nor I/O pins were routed.
- HercuLeS does not (yet) customize the generated HDL to better fit specific architectural features (DSP blocks, embedded SRL units).
- Vivado HLS had some TOTAL FAILURES on relatively simple codes such as a perfect number detector (positive integers equal to the sum of their proper divisors), a 1D wavelet code, and an Easter date calculation. It seems that Vivado HLS has a hard time with integer modulo/remainder. The codes are available to anyone interested.
The following table provides a summary of the results:
# | Benchmark | Description | VHLS LUTs | VHLS Regs | VHLS TET (ns) | HercuLeS LUTs | HercuLeS Regs | HercuLeS TET (ns) | Comment
1 | arraysum | Array sum | 102 | 132 | 26.5 | 103 | 63 | 73.3 |
2 | bitrev | Bit reversal | 67 | 39 | 72.0 | 42 | 40 | 11.6 |
3 | edgedet | Edge detection | 246 | 130 | 1636.3 | 680 | 361 | 1606.4 | 1 BRAM for VHLS
4 | fibo | Fibonacci series | 138 | 131 | 60.2 | 137 | 197 | 102.7 |
5 | fir | FIR filter | 102 | 52 | 833.4 | 217 | 140 | 2729.4 |
6 | gcd | Greatest common divisor | 210 | 98 | 35.2 | 128 | 93 | 75.9 |
7 | icbrt | Cubic root approximation | 239 | 207 | 260.6 | 365 | 201 | 400.5 |
8 | popcount | Population count | 45 | 65 | 19.4 | 53 | 102 | 26.1 |
9 | sieve | Prime sieve of Eratosthenes | 525 | 595 | 6108.4 | 565 | 523 | 3869.5 | 1 BRAM for VHLS
10 | sierpinski | Sierpinski triangle | 88 | 163 | 11326.5 | 230 | 200 | 16224.9 |
NOTES:
- Measurements were obtained for the KC705 development board device: xc7k325t-ffg900-2
- TET is Total Execution Time in ns.
- VHLS is a shortened form for Vivado HLS.
- Vivado HLS 2013.1 was used.
- Bold denotes smaller area and lower execution time.
- Italic denotes an inconclusive comparison.
- For the cases of edgedet and sieve, VHLS identifies a BRAM; HercuLeS does not. In these cases, HercuLeS saves a BRAM while VHLS saves on LUTs and FFs (Registers).
Overall, HercuLeS wins about 30% of the comparisons and Vivado HLS about 70%. Not too bad for a tool like HercuLeS, which produces generic, portable, vendor-independent code. I estimate that the HercuLeS development effort is around 1-5% of that of Vivado HLS.
I believe that HercuLeS will do much better in the out-of-the-box experience (which is of high importance in order to draw more software-minded engineers into the game) in the near future.
Both HercuLeS and Vivado HLS have optimization features (e.g. loop unrolling). HercuLeS applies optimizations by using a source-to-source C code optimizer. Vivado HLS mostly resorts to end-user directives. These coding aspects will be taken into account in a followup comparison; they also yield a much more extensive solution space.
A nice, simple evaluation of these HLS tools. I am a PhD student currently studying debug for HLS. Thus, the Vivado HLS failures you reported are interesting to me. At what point in the flow did Vivado HLS fail? If you still have it, I am interested in looking at the source for these benchmarks. Thanks.
Dear Mr. Monson,
thank you for your comment. I dug around and found one of the benchmarks.
Even with Vivado HLS 2013.2 (the latest that I have), SystemC-VHDL cosimulation does not work.
Please find the benchmark’s code below; compile with -DTEST to include the test driver:
/*
* Filename: perfect.c
* Purpose : C implementation of a naive algorithm for detecting perfect
* numbers. A perfect (positive integer) number is equal to the sum of
* its divisors. The first members of this sequence are:
* 6, 28, 496, 8128.
* Author : Nikolaos Kavvadias (C) 2010, 2011, 2012, 2013, 2014
* Date : 17-Apr-2010
* Revision: 0.3.0 (17/04/10)
* Initial version.
*/
#ifdef TEST
#include <stdio.h>
#include <stdlib.h>
#endif

void perfect(unsigned int value, unsigned int *isperfect)
{
  unsigned int factorsum = 1, i;

  /* Accumulate every divisor of value in the range 2..value/2. */
  for (i = 2; i <= value/2; i++) {
    if (value % i == 0) {
      factorsum += i;
    }
  }

  if (factorsum == value) {
    *isperfect = 1;
  } else {
    *isperfect = 0;
  }
}

#ifdef TEST
int main(void)
{
  FILE *fp;
  unsigned int i;
  unsigned int result;

  fp = fopen("out.dat", "w");
  for (i = 2; i <= 65535; i++) {
    perfect(i, &result);
    /* Log only a few selected inputs to keep the output file small. */
    if (i <= 6 || i == 28 || i == 496 || i == 8128) {
      fprintf(fp, "%08x %08x\n", i, result);
    }
  }
  fclose(fp);

  printf("Comparing against reference data \n");
  if (system("diff -w out.dat perfect_test_data.txt")) {
    fprintf(stdout, "*******************************************\n");
    fprintf(stdout, "FAIL: Output DOES NOT match the golden output\n");
    fprintf(stdout, "*******************************************\n");
    return 1;
  } else {
    fprintf(stdout, "*******************************************\n");
    fprintf(stdout, "PASS: The output matches the golden output!\n");
    fprintf(stdout, "*******************************************\n");
    return 0;
  }
}
#endif