Using PTLsim Benchmarks on the x86-64 Cluster

PTLsim Full System Example Benchmark

The following benchmark kit is provided as an example of how to set up a full system benchmark for PTLsim/X. For this test suite, PTLsim was configured to match the AMD K8 (Opteron / Athlon 64) microarchitecture as closely as possible before running a demanding client-server networked benchmark (rsync of a large file set with ssh as the transport). PTLsim's results across multiple statistics were then compared with a real Athlon 64's performance counters.

Our results using this benchmark kit are provided in the ISPASS 2007 paper "PTLsim: A Cycle Accurate Full System x86-64 Microarchitectural Simulator".

Files in the benchmark kit:

ptlsim-k8-test.img.bz2
Disk image (Xen compatible ext2 format) containing the complete benchmark and scripts to run it as soon as the system boots. This must be decompressed (via bunzip2) before use.
ptlsim-k8-test-source.tar.bz2
PTLsim source code, the ptlsim-k8-test Xen configuration, and the make-timelapse-xxx scripts used to re-create the colorful graphs in the paper.

To rebuild the source code, first copy smtcore-amd-k8.h to smtcore.h, and copy dcache-amd-k8.h to dcache.h, then run make.
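
The copy-then-build step can be wrapped in a small shell helper. This is a minimal sketch, not part of the kit: the function name select_k8_config is hypothetical, and it assumes both -amd-k8.h headers sit in the directory you pass in (the text above does not say where in the source tree they live).

```shell
#!/bin/sh
# Hypothetical helper: select the AMD K8 core and data cache models by
# copying the K8-specific headers over the generic names.
# Assumption: both *-amd-k8.h files are in the directory given as $1.
select_k8_config() {
    src_dir="${1:-.}"
    for name in smtcore dcache; do
        k8="$src_dir/$name-amd-k8.h"
        if [ ! -f "$k8" ]; then
            echo "missing $k8" >&2
            return 1
        fi
        cp "$k8" "$src_dir/$name.h" || return 1
    done
}

# Typical use from the top of the unpacked source tree:
#   select_k8_config . && make
```

The helper stops before touching anything else if either header is missing, so make is never run against a half-selected configuration.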

The domain may be started with the included script run-domain ptlsim-k8-test.

Using the PTLsim SPEC 2000 Benchmark Suite

SPEC CPU 2000 Benchmarks

To help users get up and running with PTLsim in a benchmarking environment, we have prepared the majority of the SPEC CPU 2000 1.1 benchmark suite for use with PTLsim. Our test suite differs from the SPEC suite in several ways:

We use a set of custom Makefiles to ensure the benchmarks are built with the optimal compiler options and run under PTLsim rather than natively.
We've patched the benchmark source code by inserting calls to ptlcall_switch_to_sim() at trigger points at the top of each benchmark's main loop. The advantages of trigger points are discussed in the FAQ.
Scripts are included for distributing large benchmark runs across multiple nodes in a Linux cluster if desired.

IMPORTANT NOTE: SPEC does not allow us to distribute their copyrighted datasets and some benchmark programs. Therefore, we distribute our changes as a set of patches and scripts instead. You will need access to the original SPEC CPU 2000 files for the following to work. Hopefully some day SPEC will adopt a more open source friendly licensing policy, but at the present time, this is our only option.

Presently the 25 included benchmarks are:

SPECint 2000: gzip, vpr, gcc, mcf, crafty, eon, bzip2, twolf, perlbmk, parser, gap, vortex
SPECfp 2000: wupwise, swim, mgrid, applu, mesa, art, equake, apsi, sixtrack, ammp, fma3d, lucas, facerec

Steps for building and using the benchmarks:

Get PTLsim from the download page, either by downloading the stable version or checking out the Subversion repository.
Enter the spec2000 directory:

cd ptlsim
cd spec2000
Copy files from your SPEC 2000 distribution. The following script needs the path to the benchspec directory at the root of your SPEC CD or distribution. It copies SPEC proprietary files into the PTLsim specific SPEC tree and patches the code with our PTLsim trigger points. The patches assume you have SPEC CPU 2000 v1.1, not some later version.

get-from-spec /path/to/SPEC2000-CD/benchspec
Build the benchmarks. You need gcc 3.4.x and, in particular, gfortran 4.x installed for this to work: many of the benchmarks are written in Fortran 90, which earlier gcc versions cannot compile.

make
Set PTLsim configuration for all benchmarks, and run!

setconfig "-logfile logfile -stats ptlsim.stats -trigger -exitend -stopinsns 200m"
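
The steps above can be collected into one shell function. The sketch below is illustrative only: run and prepare_spec2000 are hypothetical names, and the commands are written exactly as they appear in the steps (you may need a ./ prefix if the spec2000 directory is not on your PATH). By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the build steps as one function. By default it only prints the
# commands (DRY_RUN=1); set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_spec2000() {
    benchspec="$1"   # path to the benchspec directory of your SPEC copy
    run cd ptlsim/spec2000 || return 1
    run get-from-spec "$benchspec" || return 1
    run make || return 1
    run setconfig "-logfile logfile -stats ptlsim.stats -trigger -exitend -stopinsns 200m"
}

# Dry run: prints the four commands without executing anything.
prepare_spec2000 /path/to/SPEC2000-CD/benchspec
```

Each step is chained with || return 1 so that, in real mode, a failed copy or build stops the sequence before setconfig is applied.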

Compiler Options and Settings

All options for compiling and running the benchmarks are in spec2000/Makefile.config. All benchmarks are available in both 32-bit x86 and 64-bit x86-64 format. The binaries are named accordingly (e.g. spec2000/gzip/gzip-32bit and spec2000/gzip/gzip-64bit), but the setmode script will link them to their real names (e.g. spec2000/gzip/gzip). If you compile the benchmark suite from source, the correct links will automatically be set up.
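
To make the effect of the linking concrete, the sketch below creates the same kind of link by hand. It is not the setmode script, whose arguments and implementation are not shown here; link_mode is a hypothetical helper, and it assumes every benchmark follows the gzip naming pattern (directory name equals binary name).

```shell
#!/bin/sh
# Illustration only: NOT the setmode script itself.
# Points <dir>/<name> at <name>-32bit or <name>-64bit via a symlink.
# Assumption: the binary is named after its benchmark directory.
link_mode() {
    bench_dir="$1"   # e.g. spec2000/gzip
    bits="$2"        # 32 or 64
    name="$(basename "$bench_dir")"
    target="$name-${bits}bit"
    if [ ! -e "$bench_dir/$target" ]; then
        echo "no such binary: $bench_dir/$target" >&2
        return 1
    fi
    # Relative link, so the tree can be moved as a whole.
    ln -sf "$target" "$bench_dir/$name"
}

# Example: link_mode spec2000/gzip 64
```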

In our reference test suite, most benchmarks were compiled using gcc 3.4.4 and statically linked to the default libraries (glibc, libm, etc.) shipped with SuSE 9.3 (either 32-bit x86 with generic i686 optimizations, or 64-bit x86-64 with AMD K8 optimizations). All Fortran benchmarks were compiled with gcc 4.0.2 instead, since this is the only available Fortran 90 compatible compiler. If you build the benchmarks from source, your environment may differ.

The compiler options used for all benchmarks were:

-static -DBENCHMARK -DSPEC_CPU2000 -O3 -march=k8|pentium4 -funroll-loops -fno-trapping-math -mfpmath=sse -funit-at-a-time -ffast-math -fprefetch-loop-arrays -mfpmath=sse -include ../../ptlcalls.h

Notice that the options compile specifically for high performance SSE2 math instead of x87 (the default on 32-bit x86) and schedule the 32-bit binary for Pentium 4, while the 64-bit binary is scheduled for AMD K8.

Setting the benchmark configuration

Go to the ptlsim/spec2000 directory. The setconfig script is used to apply the same PTLsim configuration options to all benchmarks. Here is a good example:

setconfig -logfile logfile -stats ptlsim.stats -trigger -stopinsns 200m -exitend

All the options above and more are described in the PTLsim documentation.

With this configuration, each benchmark in our test suite runs for 200 million x86 instructions while logging information to logfile and saving the statistics tree to ptlsim.stats and ptlsim.stats.txt, all under the benchmark directory (e.g. spec2000/gzip/logfile).

It also starts execution at the trigger point we have inserted for you (i.e. at the top of the program's main loop, after initialization). More information on trigger points is in the PTLsim Manual. Finally, the benchmark is terminated after simulation, rather than being allowed to continue in native mode (since some SPEC benchmarks will run for many hours if not stopped).

Running All Benchmarks on a Cluster

The runbench-cluster Perl script runs all 25 benchmarks in our test suite by automatically distributing the load evenly across all machines and CPUs in a Linux cluster so as to minimize total run time.

NOTE: The runbench-cluster script we distribute is configured for our internal x86-64 cluster. You will need to modify the list of cluster nodes and CPU counts to fit your own configuration.

The syntax is given in the following example:

runbench-cluster gzip vpr gcc mcf ...

This script divides up the listed benchmarks into parallel sub-lists. It then uses ssh to connect to each machine and invokes the runbench-local helper script to run a sub-list of a few benchmarks sequentially on each CPU in each cluster machine. Technically this is done by entering the directory for each benchmark and executing make run.
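
The division into sub-lists can be pictured as a round-robin split. The sketch below is an assumption based on the sample output in this section, which is consistent with round-robin assignment; the real runbench-cluster Perl script may distribute work differently. The function name distribute and the "slot" wording are hypothetical.

```shell
#!/bin/sh
# Round-robin split of a benchmark list over a number of CPU slots.
# Assumption: mirrors the pattern visible in the sample output; this is
# not the actual runbench-cluster logic.
distribute() {
    slots="$1"
    shift
    i=0
    while [ "$i" -lt "$slots" ]; do
        line="slot $i:"
        n=0
        for bench in "$@"; do
            # Benchmark number n goes to slot (n mod slots).
            if [ $((n % slots)) -eq "$i" ]; then
                line="$line $bench"
            fi
            n=$((n + 1))
        done
        echo "$line"
        i=$((i + 1))
    done
}

distribute 8 gzip vpr gcc mcf crafty eon bzip2 twolf perlbmk parser
```

With 8 slots, the first two lines printed are "slot 0: gzip perlbmk" and "slot 1: vpr parser", matching the start of the sub-lists for Thread 0 and Thread 1 in the sample output.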

The runbench-cluster script waits until all remote ssh sessions terminate (one ssh session per CPU per machine) before finishing.
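
The launch-then-wait pattern described here can be sketched in plain shell. In this illustration sh -c stands in for the real ssh call to runbench-local, whose exact arguments are not documented on this page, and wait_all is a hypothetical name.

```shell
#!/bin/sh
# Start one background job per argument, then wait for each PID in turn
# and report its exit status. `sh -c` is a stand-in for the real
# `ssh <node> runbench-local ...` invocation (arguments not shown here).
wait_all() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        if wait "$pid"; then
            echo "pid $pid: OK, result 0"
        else
            # $? still holds the exit status returned by wait.
            echo "pid $pid: FAILED, result $?"
            failed=1
        fi
    done
    return "$failed"
}

# Example with two trivial jobs that both succeed:
wait_all "exit 0" "exit 0"
```

Waiting on each PID individually, rather than calling a bare wait, is what lets the caller report a per-job result.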

For your convenience, runbench-cluster-all will just invoke runbench-cluster with the entire list of available benchmarks.

Here's an example of the output you will see:

yourst [tidalwave /project/ptlsim/spec2000] runbench-cluster-all
runbench-cluster: Running 25 benchmarks on 4 nodes with 2 threads per node...
Thread 0 on tidalwave cpu 0: [gzip] [perlbmk] [equake] => PID 27645
Thread 1 on typhoon cpu 0: [vpr] [parser] [apsi] => PID 27646
Thread 2 on tsunami cpu 0: [gcc] [wupwise] [sixtrack] => PID 27648
Thread 3 on tornado cpu 0: [mcf] [swim] [ammp] => PID 27650
Thread 4 on tidalwave cpu 1: [crafty] [mgrid] => PID 27651
Thread 5 on typhoon cpu 1: [eon] [applu] => PID 27654
Thread 6 on tsunami cpu 1: [bzip2] [mesa] => PID 27656
Thread 7 on tornado cpu 1: [twolf] [art] => PID 27658
Waiting on 8 threads:
Thread 0 (pid 27645) on tidalwave cpu 0...OK (pid 27645), result 0
Thread 1 (pid 27646) on typhoon cpu 0...OK (pid 27646), result 0
Thread 2 (pid 27648) on tsunami cpu 0...OK (pid 27648), result 0
Thread 3 (pid 27650) on tornado cpu 0...OK (pid 27650), result 0
Thread 4 (pid 27651) on tidalwave cpu 1...OK (pid 27651), result 0
Thread 5 (pid 27654) on typhoon cpu 1...OK (pid 27654), result 0
Thread 6 (pid 27656) on tsunami cpu 1...OK (pid 27656), result 0
Thread 7 (pid 27658) on tornado cpu 1...OK (pid 27658), result 0
Done!

You can press Ctrl+C to abort the benchmark runs.

To run a single benchmark without the cluster scripts, go to its directory (e.g. spec2000/gzip) and type make run. You can also type make run in the top-level spec2000 directory, but this will run all benchmarks sequentially instead of in parallel.

Files Generated

In each benchmark directory (e.g. ptlsim/spec2000/gzip), you will find several files:

run.log contains the output normally printed to the console, if any.
logfile is PTLsim's log file with the usual information, including how many cycles and commits have executed so far (useful for checking progress).
ptlsim.stats contains the binary format PTLsim statistics tree.
ptlsim.stats.txt contains the textual representation of the PTLsim statistics tree. See the PTLsim User's Guide and Reference for more information.

These names will differ if you used setconfig to change the filenames.
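
After a large run it can be handy to check that every benchmark directory actually contains these files. The sketch below is a hypothetical helper (check_outputs is not part of the distribution); it assumes the default file names listed above and treats every subdirectory of the given root as a benchmark directory.

```shell
#!/bin/sh
# Report which of the default output files are missing from each
# subdirectory of a spec2000 tree. Hypothetical helper.
# Assumptions: default file names; every subdirectory is a benchmark.
check_outputs() {
    root="$1"   # e.g. ptlsim/spec2000
    status=0
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        for f in run.log logfile ptlsim.stats ptlsim.stats.txt; do
            # Only test for existence: run.log may legitimately be empty.
            if [ ! -e "$dir$f" ]; then
                echo "${dir}${f}: missing"
                status=1
            fi
        done
    done
    return "$status"
}

# Example: check_outputs ptlsim/spec2000
```

The function returns non-zero if anything is missing, so it can gate a later post-processing step.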