Introduction

Out Of Order Core Features

PTLsim completely models a modern out of order x86-64 compatible processor, cache hierarchy and key devices with true cycle accurate simulation. The basic microarchitecture of this model combines design features from the Intel Pentium 4, AMD K8 and Intel Core 2, and also incorporates some ideas from IBM Power4/Power5 and Alpha EV8. The following is a summary of the characteristics of this processor model:

The simulator directly fetches pre-decoded micro-operations (Section 17.1) but can simulate cache accesses as if x86 instructions were being decoded on fetch.
Branch prediction is configurable; PTLsim currently includes several models, such as a hybrid g-share based predictor, bimodal predictors and saturating counters.
Register renaming takes into account x86 quirks such as flags renaming (Section 5.4)
Front end pipeline has a configurable number of cycles to simulate x86 decoding or other tasks; this is used for adjusting the branch mispredict penalty.
Unified physical and architectural register file holds both in-flight uop results and committed architectural register values. Two rename tables (speculative and committed register rename tables) are used to track which physical registers are currently mapped to architectural registers.
Unified physical register file for both integer and floating point values.
Operands are read from the physical register file immediately before issue. Unlike in some microprocessors, PTLsim does not do speculative scheduling: the schedule and register read loop is assumed to take one cycle.
Issue queues based on a collapsing design use broadcast based matching to wake up instructions.
Clustered microarchitecture is highly configurable, allowing multi-cycle latencies between clusters and multiple issue queues within the same logical cluster.
Functional units, mapping of functional units to clusters, issue ports and issue queues and uop latencies are all configurable.
Speculation recovery from branch mispredictions and load/store aliasing uses the forward walk method to recover the rename tables, then annuls all uops after and optionally including the mis-speculated uop.
Replay of loads and stores after store to load forwarding and store to store merging dependencies are discovered.
Stores may issue even before data to store is known; the store uop is replayed when all operands arrive.
Load and store queues use partial chunk address matching and store merging for high performance and easy circuit implementation.
Prediction of load/store aliasing to avoid mis-speculation recovery overhead.
Prediction and splitting of unaligned loads and stores to avoid mis-speculation overhead.
Commit unit supports stalling until all uops in an x86 instruction are complete, to make x86 instruction commitment atomic.
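
To make the branch prediction item above more concrete, the following is a minimal sketch of a g-share predictor built from 2-bit saturating counters. It is an illustration only: the class name GsharePredictor, its methods and the table sizing are assumptions made for this example and are not taken from PTLsim's branch predictor source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; not PTLsim's branch predictor code.
class GsharePredictor {
public:
  // table_bits: log2 of the number of 2-bit counters (assumed 1..20 here).
  explicit GsharePredictor(unsigned table_bits)
    : mask_((std::uint64_t{1} << table_bits) - 1),
      counters_(std::size_t{1} << table_bits, 1) {  // start weakly not-taken
    assert(table_bits >= 1 && table_bits <= 20);
  }

  // Predict taken when the counter is in one of its two upper states.
  bool predict(std::uint64_t pc) const { return counters_[index(pc)] >= 2; }

  // Train the counter with the real outcome, then shift the outcome
  // into the global history register.
  void update(std::uint64_t pc, bool taken) {
    std::uint8_t& c = counters_[index(pc)];
    if (taken) {
      if (c < 3) ++c;   // saturate at 3 (strongly taken)
    } else {
      if (c > 0) --c;   // saturate at 0 (strongly not-taken)
    }
    history_ = ((history_ << 1) | (taken ? 1u : 0u)) & mask_;
  }

private:
  // g-share: XOR the branch address with the global history.
  std::size_t index(std::uint64_t pc) const {
    return static_cast<std::size_t>((pc ^ history_) & mask_);
  }

  std::uint64_t mask_;
  std::vector<std::uint8_t> counters_;
  std::uint64_t history_ = 0;
};
```

A caller would invoke predict(pc) at fetch and update(pc, taken) once the branch resolves. A hybrid predictor of the kind mentioned in the list would add a second component (for example a bimodal table indexed by the address alone) and a chooser between the two.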

The PTLsim model is fully configurable in terms of the sizes of key structures, pipeline widths, latency and bandwidth and numerous other features.

Processor Contexts

PTLsim uses the concept of a VCPU (virtual CPU) to represent one user-visible microprocessor core (or a hardware thread if an SMT machine is being modeled). The Context structure (defined in ptlhwdef.h) maintains all per-VCPU state in PTLsim: this includes both user-visible architectural registers (in the Context.commitarf[] array) and all per-core control registers and internal state information. Context only contains general x86-visible context information; specific machine models must maintain microarchitectural state (like physical registers and so forth) in their own internal structures.

The contextof(N) macro is used to return the Context object for a specific VCPU, numbered 0 to contextcount-1. In userspace-only PTLsim, there is only one context, contextof(0). In full system PTLsim/X, there may be up to 32 (i.e. MAX_CONTEXTS) separate contexts (VCPUs).
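
The lookup described above can be pictured with the following sketch. It is not the definition from ptlhwdef.h: the Context layout is reduced to two fields, ARCH_REG_COUNT is an assumed size, and contextof is written as a bounds-checked inline function standing in for the real macro.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; the real definitions live in ptlhwdef.h.
constexpr int MAX_CONTEXTS = 32;     // limit quoted in the text
constexpr int ARCH_REG_COUNT = 16;   // assumed size, for illustration only

struct Context {
  int vcpuid = 0;
  // Committed (user-visible) architectural register values.
  std::uint64_t commitarf[ARCH_REG_COUNT] = {};
};

static Context contexts[MAX_CONTEXTS];
static int contextcount = 1;         // userspace-only PTLsim: one VCPU

// Bounds-checked lookup playing the role of the contextof(N) macro.
inline Context& contextof(int n) {
  assert(n >= 0 && n < contextcount);
  return contexts[n];
}
```

A machine model would read committed state through this accessor, for example contextof(0).commitarf[i], while keeping its physical registers and other microarchitectural state in its own structures.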

PTLsim Machine/Core/Thread Class Hierarchy

PTLsim easily supports user defined plug-in machine models. Two of these models, the out of order core ("ooo") and the sequential in-order core ("seq"), ship with PTLsim; others can be easily added by users. PTLsim implements several C++ classes used to build simulation models by dividing a virtual machine into CPU sockets, cores and threads.

The PTLsimMachine class is at the root of the hierarchy. Every simulation model must subclass PTLsimMachine and define its virtual methods. Adding a machine model to PTLsim is simple: define one instance of your machine class in a source file included in the Makefile. For instance, assuming XyzMachine subclasses PTLsimMachine and will be called "xyz":

    XyzMachine xyzmodel("xyz");

The constructor for XyzMachine will be called by PTLsim after all other subsystems are brought up. It should use the addmachine("name") static method to register the core model's name with PTLsim, so it can be specified using the "-core xyz" option.

The machine models included with PTLsim (namely, OutOfOrderMachine and SequentialMachine) have been placed in their own C++ namespace. When adding your own core, copy the example source file(s) to new names and adjust the namespace specifiers to a new name to avoid clashes. You should be able to link any number of machine models defined in this manner into PTLsim all at once.
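
The registration pattern described above can be sketched as follows. This is a self-contained illustration under stated assumptions, not PTLsim code: MachineBase stands in for PTLsimMachine, the two-argument addmachine signature and the getmachine lookup are hypothetical, and here the constructor runs during ordinary C++ static initialization rather than at the point where PTLsim itself brings up machine models.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the PTLsimMachine base class.
struct MachineBase {
  virtual ~MachineBase() = default;
  virtual bool init() = 0;

  // Assumed signature; the text only names addmachine("name").
  static void addmachine(const std::string& name, MachineBase* m) {
    registry()[name] = m;
  }

  // Hypothetical lookup, e.g. for resolving a core name given on the
  // command line. Returns nullptr for unknown names.
  static MachineBase* getmachine(const std::string& name) {
    auto it = registry().find(name);
    return it == registry().end() ? nullptr : it->second;
  }

private:
  // A function-local static is constructed on first use, so registration
  // from global constructors does not depend on initialization order.
  static std::map<std::string, MachineBase*>& registry() {
    static std::map<std::string, MachineBase*> r;
    return r;
  }
};

namespace XyzModel {            // separate namespace avoids name clashes
  struct XyzMachine : MachineBase {
    explicit XyzMachine(const char* name) { addmachine(name, this); }
    bool init() override { return true; }
  };

  XyzMachine xyzmodel("xyz");   // one global instance registers the model
}
```

Because each model registers itself from its own translation unit, linking in another source file is enough to make another name available, which matches the "link any number of machine models" property described above.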

The PTLsimMachine::init() method is called to initialize each machine model the first time it is used. This function is responsible for dividing the contextcount contexts up into sockets, cores and threads, depending entirely on the machine model's design and any configuration options specified by the config parameter.
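
As one possible illustration of what such an init() routine might compute, the sketch below assigns VCPUs to sockets, cores and threads with a fixed, regular topology. The Placement record, the partition_contexts function and its parameters are all hypothetical; a real machine model is free to divide the contexts in any way its design requires.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical topology record: where one VCPU lands in the machine.
struct Placement {
  int socket;
  int core;
  int thread;
};

// Assign VCPUs 0..contextcount-1 to hardware threads, filling all threads
// of a core before moving to the next core, and all cores of a socket
// before moving to the next socket.
inline std::vector<Placement> partition_contexts(int contextcount,
                                                 int cores_per_socket,
                                                 int threads_per_core) {
  assert(contextcount >= 0 && cores_per_socket > 0 && threads_per_core > 0);
  std::vector<Placement> out;
  out.reserve(static_cast<std::size_t>(contextcount));
  for (int v = 0; v < contextcount; ++v) {
    int core_global = v / threads_per_core;   // core index across sockets
    out.push_back({core_global / cores_per_socket,
                   core_global % cores_per_socket,
                   v % threads_per_core});
  }
  return out;
}
```

With 8 contexts, 2 cores per socket and 2 threads per core, this places VCPUs 0 to 3 on socket 0 and VCPUs 4 to 7 on socket 1; a single-threaded, single-core model would simply pass 1 for both parameters.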

PTLsimMachine::run() is called to actually run the simulation; more details will be given on this later.

PTLsimMachine::update_stats() is described in Section 8.

PTLsimMachine::dump_state() is called to aid debugging whenever an assertion fails, the simulator accesses a null pointer or invalid address, or from anywhere else it may be useful.


Matt T Yourst 2007-09-26