Title | aprof (ALMA profiler) |
Author | Nikolaos Kavvadias |
Contact | nkavv@uop.gr |
Website | http://www.nkavvadias.com |
Release Date | 06 May 2013 |
Version | 0.4.0 |
Rev. history | |
v0.1.0 | 31-07-2012 Draft/preliminary binary release of nac2c, the compiled simulator of aprof. |
v0.2.0 | 31-08-2012 Source release for the 1st increment of nac2c. |
v0.3.0 | 30-11-2012 Binary release for the 1st draft release of aprof. nac2c is now considered a component of aprof. |
v0.4.0 | 06-05-2013 Added tutorial section in README. |
"aprof" (ALMA profiler) is a performance and resource utilization estimation tool. For obtaining these measures, "aprof" implements an abstract machine with unlimited resources. It accepts input specification in either the NAC (N-Address Code) intermediate representation or ALMA IR (ANSI C) form. "aprof" produces two basic outcomes, a) the number of dynamic abstract machine cycles and b) basic block operation schedule that indicates resource utilization for a given application.
"aprof" consists of the following components:
The current NAC specification is detailed in the corresponding reference manual found in the /doc subdirectory in HTML and PDF form.
aprof releases use the aprof-[src|lin|win]-yymmdd.tar.bz2 naming convention.
To use aprof, a Linux or Windows installation is required. For Windows, Cygwin is recommended (though optional), since it significantly eases the use of aprof.
In any case, standard Unix/Linux tools are expected:
Boehm's garbage collector is also required, but is included both in source and compiled form (binary releases only) within the /thirdparty subdirectory.
For Windows:
Cygwin will then be set up in the C:\cygwin directory of your Windows OS.
For Linux:
There is no actual installation procedure; the user should just extract the aprof-[lin|win]-yymmdd.tar.bz2 binary release archive to a local directory. Usual choices include C:/cygwin/home/user for Windows (no Cygwin) users and /home/user for Windows Cygwin/Linux users, where user is the name of the current user.
Then, change directory to /home/user/aprof. On Cygwin for instance, type:
Set up the APROFTOP environment variable:
The location of the garbage collector is adjusted accordingly in the corresponding makefiles.
You may add the /aprof/bin directory to your path:
This subsection is relevant only to the source releases of aprof (aprof-src-yymmdd.tar.bz2). To build aprof from sources the following are required:
The aprof distribution includes the following files. Files and/or directories denoted by a capital S are available in source releases of aprof. Similarly, a capital B denotes files/directories present solely in binary releases:
/aprof | Top-level directory |
COPYRIGHT | aprof (binary or source code) license. |
S build.sh | Build script for aprof (Windows). |
S build-a.sh | Build script for aprof and gc (Windows). |
S build-lin.sh | Build script for aprof (Linux). |
S build-lin-a.sh | Build script for aprof and gc (Linux). |
S clean.sh | Cleans up the /bin and /src subdirectories. |
env.sh | Script to setup the environment. |
B /aprof/bin | Binaries' directory |
fixnac.exe | fixnac executable for either Windows or Linux. |
meascycles.exe | meascycles executable for either Windows or Linux. |
nac2c.exe | nac2c executable for either Windows or Linux. |
nacinsbbcount.exe | nacinsbbcount executable for either Windows or Linux. |
nacparser.exe | nacparser executable for either Windows or Linux. |
nactoglobal.exe | nactoglobal executable for either Windows or Linux. |
cygwin1.dll | Cygwin API DLL (not required with a Cygwin setup). |
/aprof/doc | Documentation |
README | This file. |
README.html | HTML version of README. |
README.pdf | PDF version of README. |
nac-refman.txt | Reference manual for the NAC programming language. |
nac-refman.html | HTML version of the above. |
nac-refman.pdf | PDF version of the above. |
S /aprof/src | Main source directory |
/aprof/src/instrument | "instrument" directory |
Makefile | Makefile for Windows Cygwin and Linux. |
build.sh | Bash script for building the TXL applications. |
fixnac.c | Applies additional fixes to an instrumented NAC file. |
nac.Grm | TXL grammar for NAC. |
nacinsbbcount.txl | Inserts basic block counters in NAC programs. |
nacparser.txl | NAC parser and pretty-printer. |
nactoglobal.txl | Moves all declarations to the earliest possible site. |
/aprof/src/libnac | "libnac" directory |
Makefile | Makefile for Windows Cygwin. |
Makefile.linux | Makefile for Linux. |
attrgraph.[c|h] | Attributed graphs API. |
cdfa.[c|h] | Control and data flow analyses API (includes SSA). |
cga.[c|h] | Call graph API (mainly SSA). |
datastructs.h | Basic data structures and enums. |
emit.[c|h] | Emitters for graph representations. |
genansic.[c|h] | ANSI C code generation routines. |
genmacros.h | General purpose C macros. |
graph.[c|h] | Graph manipulation API. |
item.[c|h] | CDFG (Control-Data Flow Graph) items API. |
list.[c|h] | Doubly-linked list and iterators API. |
machine.[c|h] | Machine parameters for the NAC abstract machine. |
lexer.patch | Patch for the NAC lexer (lex.nac.c). |
nac.[c|h] | NAC (N-Address Code) manipulation API. |
nac.[l|y] | Lexer and parser for the NAC programming language. |
sched.[c|h] | Scheduling (naive, ASAP) API. |
symtab.[c|h] | Symbol table API. |
utils.[c|h] | Various utility functions. |
/aprof/src/nac2c | "nac2c" directory |
Makefile | Makefile for Windows Cygwin. |
Makefile.linux | Makefile for Linux. |
nac2c.c | Driver code and option parsing for nac2c. |
/aprof/src/prof | "prof" directory |
Makefile | Makefile for Windows Cygwin and Linux. |
build.sh | Bash script for building the TXL applications. |
countbbs.awk | Counts the number of BBs in a NAC translation unit. |
meascycles.c | Counts the number of abstract machine cycles spent. |
/aprof/tests | Test suite directory |
*.0.nac | The aprof test suite. Includes 30 applications, each in the corresponding subdirectory: (binarysearch, bitrev, bubblesort, cordic, divider, editdist, fact, factr, fibo, fibor, fir, fixsqrt, frac, gcd, knapsack, loop1, mandel, matmult, minimal, mips, multiply, perfect, popcount, sieve, smithwaterman, sobel, tak, thornapprox, xorshift, yuv2rgba). |
*.c | Reference C implementation for test suite, used for generating reference data. |
clean-tests.sh | Clean the debris in all /tests subdirectories. |
run-aprof.sh | Run the entire test suite. |
run-aprof-app.sh | Run a single application from test suite. |
thorn.pgm | PGM image required for running the thornapprox benchmark. |
/aprof/thirdparty | Third-party source/binaries directory |
B /gc | Garbage collector binaries for Windows Cygwin. |
B /gc-linux | Garbage collector binaries for Linux. |
B /gc-mingw | Garbage collector binaries for Windows MingW. |
/src | Source code versions of the garbage collector. |
The basic usage of nac2c follows the syntax:
The translated C representation of input.nac is written to a series of output files named input<i>_nac.c, one for each NAC-level procedure, where input<i> is the name of the corresponding procedure. Pre-existing files are overwritten.
`options` is one or more of the following:
The basic usage of fixnac follows the syntax:
Additional fixes are applied to the instrumented input.nac, such as the addition of the declaration of the _BB global array (globalvar) that stores BB execution frequencies.
`options` is one or more of the following:
The basic usage of meascycles follows the syntax:
It reads input.nac, which is assumed to be uninstrumented, the input_prof.txt profiling report file and the corresponding input_sched.txt scheduling data file. It then reports the total number of dynamic abstract machine cycles in the following form:
as a C-based long long int (64-bit signed integer).
Executables generated from TXL source files share a common invocation style:
This scheme applies to the nacinsbbcount, nacparser and nactoglobal executables.
The countbbs.awk script generates a textual report named bbs.txt that stores the total number of basic blocks in the given NAC translation unit. countbbs is invoked as follows:
The basic tests under the /tests subdirectory can be exercised by running the corresponding test script:
Alternatively, each application can be tested separately using the run-aprof-app.sh script, e.g. as follows for the case of the fibo benchmark:
Running a benchmark called app, which consists of one or more procedures (each denoted proc below), can generate the following files when the appropriate options are used:
ansic.mk | Makefile for GCC or LLVM compilation. |
bbs.txt | Total number of BBs in the NAC translation unit. |
builtin_names.txt | Name listing of builtin (black box) functions. |
proc.dot | CDFG representation in Graphviz for procedure proc. |
proc.dot.png | Visualization of the Graphviz CDFG for procedure proc. |
proc_cfg.dot | CFG representation in Graphviz for procedure proc. |
proc_cfg.dot.png | Visualization of the Graphviz CFG for procedure proc. |
app_cg.dot | Call graph representation in Graphviz for app. |
app_cg.dot.png | Visualization of the Graphviz call graph for app. |
app.nac | Working NAC representation of the application. |
app.exe | Executable generated by the C implementation of app. |
app_test_data.txt | Reference test data generated by app.exe. |
app_prof.txt | Basic block profiling report. |
app_sched.txt | Scheduling report (number of static cycles per BB). |
main.c | Generated C code containing the main() function. |
main.h | Header/interface file for the generated files. |
proc_nac.c | Backend C code generated from the corresponding NAC. |
procedure_names.txt | Name listing of the procedures used in app. |
This section provides detailed information on the actual process of profiling. First, in order to profile an application which is assumed to be contained in a single NAC translation unit, two files are required:
As a test vehicle, the iterative implementation of a factorial computation will be used, namely the fact application. Thus, the corresponding initial files are fact.0.nac and fact.c.
The contents of fact.0.nac are as follows:
    procedure fact (in s32 n, out s32 y)
    {
      localvar s32 res;
      localvar s32 x;
      localvar s32 i;
    L0005:
      x <= mov n;
      res <= ldc 1;
      i <= ldc 1;
      D_1363 <= jmpun;
    D_1362:
      res <= mul res, i;
      i <= add i, 1;
      D_1363 <= jmpun;
    D_1363:
      D_1362, D_1364 <= jmple i, x;
    D_1364:
      y <= mov res;
    }
Since NAC is a relatively low-level language, a high-level language frontend would have to be used for profiling larger applications. In this sense, fact.c would serve as input to a C frontend producing NAC output.
The reference fact.c has the following contents:
    #ifdef TEST
    #include <stdio.h>
    #endif

    int fact(int n)
    {
      int res, x, i;
      x = n;
      res = 1;
      for (i = 1; i <= x; i++) {
        res = res * i;
      }
      return res;
    }

    #ifdef TEST
    int main()
    {
      int i;
      int result;
      for (i = 0; i <= 13; i++) {
        result = fact(i);
        printf("%08x %08x\n", i, result);
      }
      return 0;
    }
    #endif
Scripting is the most suitable way to automate the profiling process. The aprof distribution contains reference profiling scripts; specifically, run-aprof-app.sh can be used.
The rest of this guide details, as a series of steps, the approach taken by this script. The $APROFTOP environment variable holds the path to the top-level directory of aprof.
Assuming that gcc is used as the host compiler, the following command generates the corresponding executable:
Then, the reference data can be generated:
The contents of fact_test_data.txt are input and output values for n and y=fact(n) in hexadecimal form:
    00000000 00000001
    00000001 00000001
    00000002 00000002
    00000003 00000006
    00000004 00000018
    00000005 00000078
    00000006 000002d0
    00000007 000013b0
    00000008 00009d80
    00000009 00058980
    0000000a 00375f00
    0000000b 02611500
    0000000c 1c8cfc00
    0000000d 7328cc00
A working copy of the NAC program is then created by copying fact.0.nac to fact.nac:
The following bash script variable
is used for maintaining the number of basic blocks in the NAC translation unit.
An AWK script, countbbs.awk, counts the basic blocks in the entire translation unit. It does so by enumerating the labels in the NAC program, since every NAC basic block has an explicit label:
Then, bbs.txt is processed to obtain the number of basic blocks:
    # Process the bbs.txt file.
    bbsfile="bbs.txt"
    while read -r bbs; do
      num_bbs="${bbs}"
    done < ${bbsfile}
A while loop is used so that bbs.txt can still be processed if it holds several basic block counts, as in a multi-translation-unit application (currently unsupported by most features of aprof). Note that, as written, num_bbs keeps only the last value read.
The nacinsbbcount TXL pass inserts profiling code for dynamic basic block counting in NAC programs:
A usual setup for TXL options is:
Then, fixnac is invoked to add bookkeeping code: the declaration of the _BB global array, its initialization, and the specification of the maximum number of basic blocks in the program.
The resulting fact.nac representation is as follows:
    globalvar u64 _BB[4]={0,0,0,0};
    procedure fact(in s32 n,out s32 y)
    {
      localvar s32 res;
      localvar s32 x;
      localvar s32 i;
      localvar u32 _temp_addr;
      localvar u32 _temp_data;
    L0005:
      _temp_addr <= ldc 0;
      _temp_data <= load _BB,_temp_addr;
      _temp_data <= add _temp_data,1;
      _BB <= store _temp_data,_temp_addr;
      x <= mov n;
      res <= ldc 1;
      i <= ldc 1;
      D_1363 <= jmpun;
    D_1362:
      _temp_addr <= ldc 1;
      _temp_data <= load _BB,_temp_addr;
      _temp_data <= add _temp_data,1;
      _BB <= store _temp_data,_temp_addr;
      res <= mul res,i;
      i <= add i,1;
      D_1363 <= jmpun;
    D_1363:
      _temp_addr <= ldc 2;
      _temp_data <= load _BB,_temp_addr;
      _temp_data <= add _temp_data,1;
      _BB <= store _temp_data,_temp_addr;
      D_1362,D_1364 <= jmple i,x;
    D_1364:
      _temp_addr <= ldc 3;
      _temp_data <= load _BB,_temp_addr;
      _temp_data <= add _temp_data,1;
      _BB <= store _temp_data,_temp_addr;
      y <= mov res;
    }
The profiling process is based on the generation of a compiled simulator for the NAC program. This is accomplished with the nac2c decompiler, which is applied to the original form of the application (fact.0.nac), since the static schedule must be extracted from the initial, uninstrumented form of the application.
Either the sequential or the ASAP scheduler can be used; they model a sequential and an intra-block parallel abstract machine, respectively.
First, a static scheduling extraction run of nac2c must be performed.
To enable the sequential scheduler, use the following:
The ASAP scheduler requires at least pseudo-SSA (Static Single Assignment) form, so it is enabled as follows:
Then, nac2c generates a multitude of files, which have been detailed in Section 5.
The generated fact_sched.txt file is then passed to a second run of nac2c, the profiling run:
fact_sched.txt contains the estimated static cycles per basic block:
5 4 2 2
aprof proceeds with the second run of nac2c:
Optionally, the Graphviz (*.dot) representation of each NAC procedure can be visualized using the following snippet:
    procfile="procedure_names.txt"
    while read -r app2; do
      echo "Creating CDFG view for ${app2}"
      dot -Tpng -O ${app2}.dot
    done < ${procfile}
In this step, the generated ansic.mk Makefile must be run in order to build main.exe, the compiled simulator for the examined application, fact.
This run produces fact_prof.txt which contains dynamic basic block counts:
14 91 105 14
Finally, meascycles is used for combining the dynamic basic block counts written in fact_prof.txt with the static cycle estimates which are found in fact_sched.txt:
As a result, the profiling estimate is produced in the standard output. For instance, the sequential scheduler produces:
while the ASAP scheduler computes the following:
You may contact me for further questions/suggestions/corrections at: