


Performance Counters

PTLsim maintains hundreds of performance and statistical counters and data points as it simulates user code. Section 8 describes the basic mechanisms and data structures PTLsim uses to collect these data, and explains how to extend the existing set of collection points.

This section lists all of the performance counters present in PTLsim by default. The subsections below are arranged as a hierarchical tree that mirrors how the data are represented in PTLsim's data store. The types of data collected closely match the performance counters available on modern Intel and AMD x86 processors, as described in their respective reference manuals.
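Because the counters form a tree, any counter can be addressed by a slash-separated path from the snapshot root, such as ooocore/dcache/load. The following is a minimal sketch of that idea, assuming one snapshot is modeled as nested Python dictionaries. The dictionary layout, the helper name `lookup`, and the leaf counter names (`hits`, `misses`) are hypothetical and are not PTLsim's actual data store format or API. Only the branch names come from this appendix.

```python
from collections.abc import Mapping
from typing import Any


def lookup(tree: Mapping, path: str) -> Any:
    """Resolve a slash-separated path such as 'ooocore/dcache/load'.

    Returns either a leaf value or a whole subtree. Raises KeyError
    naming the first path component that does not exist.
    """
    node: Any = tree
    for part in path.strip("/").split("/"):
        if not isinstance(node, Mapping) or part not in node:
            raise KeyError(f"no such node {part!r} in path {path!r}")
        node = node[part]
    return node


# Hypothetical snapshot: branch names follow this appendix, while the
# leaf counter names and values are invented for illustration.
snapshot = {
    "summary": {"snapshot_uuid": 0, "snapshot_name": ""},
    "ooocore": {
        "cycles": 1000,
        "dcache": {"load": {"hits": 900, "misses": 100}},
    },
}

print(lookup(snapshot, "ooocore/dcache/load/hits"))  # 900
```

Returning subtrees as well as leaves lets the same helper fetch a whole branch, for example every load unit counter at once.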

General

As described in Section 8, PTLsim maintains a hierarchical tree of statistical data, defined in stats.h. The data store may contain many snapshots of this tree, numbered starting at 0. The final snapshot, taken just before simulation completes, is labeled "final". Each snapshot branch contains all of the data structures described in the following subsections. Snapshots are enabled with the -snapshot-cycles configuration option (Section 10.3). If snapshots are disabled, only the "0" and "final" snapshots are provided.
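With periodic snapshots enabled, activity within a single interval can be recovered by subtracting the corresponding counters of two consecutive snapshots. The following is a minimal sketch that assumes every numeric counter is cumulative from the start of simulation. This appendix does not state that assumption. The sketch also models each snapshot as nested Python dictionaries. The helper name `delta` and the counter path commit/insns are hypothetical.

```python
from collections.abc import Mapping
from typing import Any

# Leaves that identify a snapshot rather than count events; copied, not subtracted.
IDENTITY_KEYS = frozenset({"snapshot_uuid", "snapshot_name"})


def is_number(x: Any) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def delta(later: Mapping, earlier: Mapping) -> dict:
    """Per-interval counters (later - earlier), computed leaf by leaf.

    Assumes counters are cumulative since simulation start. A counter
    missing from `earlier` is treated as 0; identity and non-numeric
    leaves are copied unchanged from `later`.
    """
    out: dict = {}
    for key, value in later.items():
        prev = earlier.get(key)
        if isinstance(value, Mapping):
            out[key] = delta(value, prev if isinstance(prev, Mapping) else {})
        elif key in IDENTITY_KEYS or not is_number(value):
            out[key] = value
        else:
            out[key] = value - (prev if is_number(prev) else 0)
    return out


# Two hypothetical consecutive snapshots.
s1 = {"summary": {"snapshot_uuid": 1}, "ooocore": {"cycles": 1000, "commit": {"insns": 800}}}
s2 = {"summary": {"snapshot_uuid": 2}, "ooocore": {"cycles": 2500, "commit": {"insns": 1400}}}
print(delta(s2, s1))
# {'summary': {'snapshot_uuid': 2}, 'ooocore': {'cycles': 1500, 'commit': {'insns': 600}}}
```

If PTLsim resets some counters at each snapshot instead of accumulating them, those counters should be excluded from the subtraction.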

Summary

The summary toplevel branch summarizes information about the simulation run across all cores:

summary: general information

snapshot_uuid: the unique ID of this snapshot. Despite the name, this is a sequence number rather than a UUID in the usual sense: numbering starts at 0 and increases with each snapshot taken.

snapshot_name: name of this snapshot, if any. Named snapshots can be taken by calling ptlcall_snapshot() from within the virtual machine, or with the -snapshot-now name command.
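Snapshots are numbered from 0 and the last one is labeled "final". A script that walks them in order should therefore not sort the labels as plain strings, because "10" would sort before "2". The following is a minimal sketch that assumes the labels are the decimal strings plus "final" described above. The function name `snapshot_order` is hypothetical, and any other label raises ValueError.

```python
def snapshot_order(labels: list[str]) -> list[str]:
    """Order snapshot labels: numbered ones by value, then 'final' last."""

    def key(label: str) -> tuple[int, int]:
        if label == "final":
            return (1, 0)          # always after every numbered snapshot
        return (0, int(label))     # numeric, not lexicographic, order

    return sorted(labels, key=key)


print(snapshot_order(["final", "10", "2", "0"]))  # ['0', '2', '10', 'final']
```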

Simulator

The simulator toplevel branch represents information about PTLsim itself:

version: PTLsim version information

run: runtime environment information

config: the configuration options last passed to PTLsim for this run

performance: PTLsim internal performance data

Decoder

The decoder toplevel branch represents the x86-to-uop decoder, basic block cache, code page cache and other common structures:

throughput: total decoded entities

bb_decode_type: predominant decoder type used for each basic block

page_crossings: alignment of instructions within a page

bbcache: basic block cache accesses

pagecache: physical code page cache

reclaim_rounds: number of times the memory manager attempted to reclaim unused basic blocks (possibly with several attempts until enough memory was available)

Out-of-Order Core

The out-of-order core is represented by the ooocore toplevel branch of the statistics data store tree:

cycles: total number of processor cycles simulated

fetch: fetch stage statistics

frontend: frontend pipeline (decode, allocate, rename) statistics

dispatch: dispatch unit statistics

issue: issue statistics

writeback: writeback stage statistics

commit: commit unit statistics

branchpred: branch predictor statistics
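The cycles counter is the natural denominator for rate metrics. For example, instructions per cycle (IPC) is committed instructions divided by cycles. This appendix does not name the commit unit's instruction counter, so commit/insns below is a hypothetical path. The following is a minimal sketch over an ooocore subtree modeled as a Python dictionary.

```python
def ipc(ooocore: dict, insns_path: tuple[str, ...] = ("commit", "insns")) -> float:
    """Instructions per cycle for one ooocore subtree.

    `insns_path` is a hypothetical location of the committed-instruction
    counter; adjust it to the real counter name in stats.h.
    """
    cycles = ooocore["cycles"]
    if cycles == 0:
        raise ValueError("no cycles simulated")
    node = ooocore
    for part in insns_path:
        node = node[part]
    return node / cycles


print(ipc({"cycles": 1500, "commit": {"insns": 600}}))  # 0.4
```

Applied to per-interval counters (the difference between two snapshots) rather than to the "final" snapshot alone, the same ratio gives IPC over time instead of a whole-run average.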

Cache Subsystem

The cache subsystem is listed under the ooocore/dcache branch.

load: load unit statistics

fetch: instruction fetch unit statistics (Section 17.1)

prefetches: prefetch engine statistics

missbuf: miss buffer performance (Sections 25.2 and 25.3)

lfrq: load fill request queue (LFRQ) performance (Sections 25.2 and 25.3)

store: store unit statistics

simulator: describes the performance of PTLsim itself. Useful for tuning the simulator.

External Events


Matt T Yourst 2007-09-26