

Frontend and Key Structures

Resource Allocation

During the Allocate stage, PTLsim dequeues uops from the fetch queue, ensures all resources needed by those uops are free, and assigns resources to each uop as needed. These resources include Reorder Buffer (ROB) slots, physical registers and load store queue (LSQ) entries. In the event that the fetch queue is empty or any of the ROB, physical register file, load queue or store queue is full, the allocation stage stalls until some resources become available.
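The stall condition described above can be sketched as follows. This is a minimal illustration, not PTLsim's code: the `AllocState` structure and the `allocate_stalls` and `allocate_one` functions are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-cycle view of the resources checked by Allocate.
struct AllocState {
    std::size_t fetchq_count;  // uops waiting in the fetch queue
    std::size_t rob_free;      // free ROB slots
    std::size_t physreg_free;  // free physical registers
    std::size_t ldq_free;      // free load queue entries
    std::size_t stq_free;      // free store queue entries
};

// True if allocation must stall this cycle: there is nothing to
// allocate, or at least one of the tracked structures is full.
inline bool allocate_stalls(const AllocState& s) {
    return s.fetchq_count == 0 || s.rob_free == 0 ||
           s.physreg_free == 0 || s.ldq_free == 0 || s.stq_free == 0;
}

// Allocate one non-memory uop: it takes one fetch queue entry,
// one ROB slot and one physical register.
inline void allocate_one(AllocState& s) {
    assert(!allocate_stalls(s));
    --s.fetchq_count;
    --s.rob_free;
    --s.physreg_free;
}
```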

Reorder Buffer Entries

The Reorder Buffer (ROB) in the PTLsim out of order model works like a traditional ROB: it is a queue in which entries are allocated at the tail and committed from the head. The ReorderBufferEntry structure is the central tracking structure for each uop in the pipeline. This structure contains a variety of fields including:

The decoded uop (uop field). This is the fully decoded TransOp augmented with fetch-related information like the uop's UUID, RIP and branch predictor information as described in the Fetch stage (Section 17.1).
Current state of the ROB entry and uop (current_state_list; see below)
Pointers to the physical register (physreg), LSQ entry (lsq) and other resources allocated to the uop
Pointers to the three physical register operands to the uop, as well as a possible store dependency used in replay scheduling (described later)
Various cycle counters and related fields for simulating progress through the pipeline
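The queue discipline of the ROB (allocate at the tail, commit from the head) can be illustrated with a small circular buffer. This is only a sketch: the `RobEntry` and `ReorderBuffer` types below are hypothetical and keep a single field, whereas the real ReorderBufferEntry carries all of the fields listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified ROB entry: only a uop identifier is kept.
struct RobEntry {
    std::uint64_t uuid = 0;
};

// Fixed-capacity circular queue: allocate at the tail, commit at the head.
class ReorderBuffer {
public:
    explicit ReorderBuffer(std::size_t capacity) : slots_(capacity) {}

    bool full() const { return count_ == slots_.size(); }
    bool empty() const { return count_ == 0; }

    // Allocate the next entry in program order; the caller checks full().
    RobEntry& alloc(std::uint64_t uuid) {
        assert(!full());
        RobEntry& e = slots_[tail_];
        e.uuid = uuid;
        tail_ = (tail_ + 1) % slots_.size();
        ++count_;
        return e;
    }

    // Commit the oldest entry and return its uuid.
    std::uint64_t commit() {
        assert(!empty());
        std::uint64_t uuid = slots_[head_].uuid;
        head_ = (head_ + 1) % slots_.size();
        --count_;
        return uuid;
    }

private:
    std::vector<RobEntry> slots_;
    std::size_t head_ = 0, tail_ = 0, count_ = 0;
};
```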

ROB States

Each ROB entry and corresponding uop can be in one of a number of states describing its progress through the simulator state machine. ROBs are linked into linked lists according to their current state; these lists are named rob_statename_list. The current_state_list field specifies the list the ROB is currently on. ROBs can be moved between states using the ROB::changestate(statelist) method. The specific states will be described below as they are encountered.

NOTE: the terms "ROB entry" (singular) and "uop" are used interchangeably from now on unless otherwise stated, since there is a 1:1 mapping between the two.
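One way to picture the per-state lists is with entries that remember which list they are on and can be relinked in constant time. The sketch below uses std::list for brevity and is an assumption about the general technique, not PTLsim's list implementation; the `Rob` and `StateList` types here are hypothetical stand-ins.

```cpp
#include <cassert>
#include <list>

struct Rob;                        // forward declaration
using StateList = std::list<Rob*>;

// Hypothetical ROB entry that remembers which state list it is on.
struct Rob {
    int idx = 0;
    StateList* current_state_list = nullptr;
    StateList::iterator pos;       // position within current_state_list

    // Move this entry to the back of another state list.
    void changestate(StateList& target) {
        if (current_state_list) {
            // splice relinks the node; pos stays valid and now
            // refers to the element inside target
            target.splice(target.end(), *current_state_list, pos);
        } else {
            pos = target.insert(target.end(), this);
        }
        current_state_list = &target;
        assert(*pos == this);
    }
};
```

Keeping one list per state means each pipeline stage only walks the entries it can act on, rather than scanning the whole ROB every cycle.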

Physical Registers

Physical registers are represented in PTLsim by the PhysicalRegister structure. Physical registers store several components:

Index of the physical register (idx) and the physical register file id (rfid) to which it belongs
The actual 64-bit register data
x86 flags: Z, P, S, O, C. These are discussed below in Section 5.4.
Waiting flag (FLAG_WAIT) for results not yet ready
Invalid flag (FLAG_INVAL) for ready results which encountered an exception. The exception code is written to the data field in lieu of the real result
Current state of the physical register (state)
ROB currently owning this physical register, or architectural register mapping this physical register
Reference counter for the physical register. This is required for reasons described in Section 24.5.

Physical Register File

PTLsim uses a flexible physical register file model in which multiple physical register files with different sizes and properties can optionally be defined. Each physical register file in the OutOfOrderCore::physregfiles[] array can be made accessible from one or more clusters. For instance, uops which execute on floating point clusters can be forced to always allocate a register in the floating point register file, or each cluster can have a dedicated register file.

Various heuristics can also be used for selecting the register file into which a result is placed. The default heuristic simply finds the first acceptable physical register file with a free register. Acceptable physical register files are those register files in which the uop being allocated is allowed to write its result; this is configurable based on clustering as described below. Other allocation policies, such as alternation between available register files and dependency based register allocation, are all possible by modifying the rename() function where physical registers are allocated.

In each physical register file, physical register number 0 is defined as the null register: it always contains the value zero and is used as an operand anywhere the zero value (or no value at all) is required.
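The default first-fit heuristic and the reserved null register can be sketched as follows. All names here (`RegFile`, `PhysRegRef`, `select_and_alloc`, and the bitmask encoding of acceptable register files) are hypothetical and chosen for illustration; they are not taken from the PTLsim sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical register file: register 0 is the null register and is
// never handed out, so allocation starts from index 1.
struct RegFile {
    int rfid;
    std::vector<bool> in_use;  // in_use[i] is true if register i is allocated

    RegFile(int id, std::size_t size) : rfid(id), in_use(size, false) {
        assert(size >= 1);
        in_use[0] = true;      // reserve the null register
    }

    // Returns a free register index, or -1 if the file is full.
    int alloc() {
        for (std::size_t i = 1; i < in_use.size(); ++i) {
            if (!in_use[i]) {
                in_use[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }
};

struct PhysRegRef {
    int rfid;
    int idx;
};

// First-fit policy: scan the register files in order and take the first
// one that the uop may write (bit set in acceptable_mask) and that has a
// free register. Returns rfid == -1 if no acceptable file has space.
inline PhysRegRef select_and_alloc(std::vector<RegFile>& files,
                                   std::uint32_t acceptable_mask) {
    for (RegFile& rf : files) {
        if (!(acceptable_mask & (1u << rf.rfid))) continue;
        int idx = rf.alloc();
        if (idx >= 0) return PhysRegRef{rf.rfid, idx};
    }
    return PhysRegRef{-1, -1};
}
```

An alternating or dependency based policy would replace only the loop in `select_and_alloc`; the rest of the sketch would stay the same.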

Physical register files are configured in ooocore.h. The PhysicalRegisterFile[] array is defined to declare each register file by name, register file ID (RFID, from 0 to the number of register files) and size. The MAX_PHYS_REG_FILE_SIZE parameter must be greater than the size of the largest physical register file in the processor.

Physical Register States

Each physical register can be in one of several states at any given time. For each physical register file, PTLsim maintains linked lists (the PhysicalRegisterFile.states[statename] lists) to track which registers are in each state. The state field in each physical register specifies its state, and implies that the physical register is on the list physregfiles[physreg.rfid].states[physreg.state]. The valid states are:

free: the register is not allocated to any uop.
waiting: the register has been allocated to a uop but that uop is waiting to issue.
bypass: the uop associated with the register has issued and produced a value (or encountered an exception), but that value is only on the bypass network; it has not actually been written back yet. For simulation purposes only, uops write their results into the physical register as soon as they issue, even though technically the result is still only on the bypass network. This simplifies the simulator considerably without compromising accuracy.
written: the uop associated with the register has passed through the writeback stage and the value of the physical register is now up to date; all future consumers will read the uop's result from this physical register.
arch: the physical register is currently mapped to one of the architectural registers; it has no associated uop currently in the pipeline.
pendingfree: this is a special state described in Section 24.5.

One physical register is allocated to each uop and moved into the waiting state, regardless of which type of uop it is. For integer, floating point and load uops, the physical register holds the actual numerical value generated by the corresponding uop. Branch uops place the target RIP of the branch in a physical register. Store uops place the merged data to store in the register. Technically branches and stores do not need physical registers, but to keep the processor design simple, they are allocated registers anyway.
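The forward path through the waiting, bypass and written states can be sketched as below. The state and flag names follow the text, but the flag bit positions, the `PhysReg` structure and its methods are assumptions made for this example; transitions into the arch and pendingfree states are not modeled.

```cpp
#include <cassert>
#include <cstdint>

enum class PhysRegState { Free, Waiting, Bypass, Written, Arch, PendingFree };

// Hypothetical bit positions; only the flag names come from the text.
constexpr std::uint16_t FLAG_INVAL = 1u << 0;
constexpr std::uint16_t FLAG_WAIT  = 1u << 1;

struct PhysReg {
    std::uint64_t data = 0;
    std::uint16_t flags = 0;
    PhysRegState state = PhysRegState::Free;

    // Allocate: the register now belongs to a uop that has not issued yet.
    void allocate() {
        assert(state == PhysRegState::Free);
        flags = FLAG_WAIT;
        state = PhysRegState::Waiting;
    }

    // Issue: the result is stored right away, although in the modeled
    // hardware it is still only on the bypass network. On an exception,
    // value carries the exception code and FLAG_INVAL is set.
    void issue(std::uint64_t value, bool exception) {
        assert(state == PhysRegState::Waiting);
        data = value;
        if (exception) {
            flags = FLAG_INVAL;
        } else {
            flags = 0;
        }
        state = PhysRegState::Bypass;
    }

    // Writeback: the register file copy is now considered up to date.
    void writeback() {
        assert(state == PhysRegState::Bypass);
        state = PhysRegState::Written;
    }
};
```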

Load Store Queue Entries

Load store queue (LSQ) entries (the LoadStoreQueueEntry structure in PTLsim) are used to track additional information about loads and stores in the pipeline that cannot be represented by a physical register. Specifically, LSQ entries track:

Physical address of the corresponding load or store
Data field (64 bits) stores the loaded value (for loads) or the value to store (for stores)
Address valid bit flag indicates if the load or store knows its effective physical address yet. If set, the physical address field is valid.
Data valid bit flag indicates if the data field is valid. For loads, this is set when the data has arrived from the cache. For stores, this is set when the data to store becomes ready and is merged.
Invalid bit flag is set if an exception occurs in the corresponding load or store.

The LoadStoreQueueEntry structure is technically a superset of a structure known as an SFR (Store Forwarding Register), which completely represents any load or store and can be passed between PTLsim subsystems easily. One LSQ entry is allocated to each load or store during the Allocate stage.

In real processors, the load queue (LDQ) and store queue (STQ) are physically separate for circuit complexity reasons. However, in PTLsim a unified LSQ is used to make searching operations easier. One additional bit flag (store bit) specifies whether an LSQ entry is a load or store.
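As an illustration of why a unified queue simplifies searching, the sketch below scans backwards from a load for the nearest older store to the same address. The `LsqEntry` layout and the `find_older_store` function are hypothetical, and the policy shown (skipping stores whose address is not yet known) is a simplifying assumption rather than a description of PTLsim's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical field layout; the field meanings follow the list above.
struct LsqEntry {
    std::uint64_t physaddr = 0;
    std::uint64_t data = 0;
    bool addrvalid = false;
    bool datavalid = false;
    bool invalid = false;
    bool store = false;  // distinguishes stores from loads in the unified queue
};

// Search backwards from the load at position load_idx for the nearest
// older store whose address is known and matches. Returns its index,
// or -1 if there is none. Entries are assumed to be in program order.
inline int find_older_store(const std::vector<LsqEntry>& lsq,
                            std::size_t load_idx) {
    assert(load_idx < lsq.size());
    const LsqEntry& ld = lsq[load_idx];
    assert(!ld.store && ld.addrvalid);
    for (std::size_t i = load_idx; i-- > 0;) {
        const LsqEntry& e = lsq[i];
        if (e.store && e.addrvalid && e.physaddr == ld.physaddr) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Because loads and stores share one program-ordered array, a single backwards scan is enough; with separate load and store queues the same search would also need a way to compare ages across the two queues.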

Register Renaming

The basic register renaming process in the PTLsim x86 model is very similar to classical register renaming, with the exception of the flags complications described in Section 5.4. Two versions of the register rename table (RRT) are maintained: a speculative RRT which is updated as uops are renamed, and a commit RRT, which is only updated when uops successfully commit. Since the simulator implements a unified physical and architectural register file, the commit process does not actually involve any data movement between physical and architectural registers: only the commit RRT needs to be updated. The commit RRT is used only for exception and branch mispredict recovery, since it holds the last known good mapping of architectural to physical registers.

Each rename table contains 80 entries as shown in Table 18.1. This table maps architectural registers and pseudo-registers to the most up to date physical registers for the following:

16 x86-64 integer registers
16 128-bit SSE registers (represented as separate 64-bit high and low halves)
ZAPS, CF, OF flag sets described in Section 5.4. These rename table entries point to the physical register (with attached flags) of the most recent uop in program order to update any or all of the ZAPS, CF, OF flag sets, respectively.
Various integer and x87 status registers
Temporary pseudo-registers temp0-temp7 not visible to x86 code but required to hold temporaries (e.g. generated addresses or value to swap in xchg instructions).
Special fixed values, e.g. zero, imm (value is in immediate field), mem (destination of stores)

**Table 18.1:** Architectural registers and pseudo-registers used for renaming.

Once the uop's three architectural register sources are mapped to physical registers, these physical registers are placed in the operands[0,1,2] fields. The fourth operand field, operands[3], is used to hold a store buffer dependency for loads and stores; this will be discussed later. The speculative RRT entries for both the destination physical register and any modified flags are then overwritten. Finally, the ROB is moved into the frontend state.
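The interaction between the speculative and commit rename tables can be sketched as follows. This is a simplified model under stated assumptions: the `RenameTables`, `Uop` and `RenamedUop` types are hypothetical, flag renaming is omitted, and the fourth operand is simply set to the null register (index 0) instead of a real store dependency.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kRrtEntries = 80;    // entry count taken from the text
using Rrt = std::array<int, kRrtEntries>;  // architectural index -> physical index

// Hypothetical uop with three architectural sources and one destination.
struct Uop {
    std::size_t src[3];
    std::size_t dest;
};

struct RenamedUop {
    int operands[4];  // operands[3] would hold the store dependency
    int physreg;      // destination physical register
};

struct RenameTables {
    Rrt spec{};    // updated as uops are renamed
    Rrt commit{};  // updated only when uops commit

    // Sources are looked up before the destination mapping is overwritten,
    // so a uop that reads and writes the same register sees the old value.
    RenamedUop rename(const Uop& u, int new_physreg) {
        RenamedUop r{};
        for (int i = 0; i < 3; ++i) {
            assert(u.src[i] < kRrtEntries);
            r.operands[i] = spec[u.src[i]];
        }
        r.operands[3] = 0;  // null register placeholder
        r.physreg = new_physreg;
        assert(u.dest < kRrtEntries);
        spec[u.dest] = new_physreg;
        return r;
    }

    // Commit updates the mapping only; no register data is copied.
    void commit_uop(std::size_t archreg, int physreg) {
        assert(archreg < kRrtEntries);
        commit[archreg] = physreg;
    }

    // Recovery after a mispredict or exception: drop speculative mappings.
    void recover() { spec = commit; }
};
```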

External State

Since the rest of the simulator outside of the out of order core does not know about the RRTs and expects architectural registers to be in a standardized format, the per-core Context structure is used to house the architectural register file. These architectural registers, including REG_flags and REG_rip, are directly updated in program order by the out of order core as instructions commit.

Frontend Stages

To simulate various processor frontend pipeline depths, ROBs are placed in the frontend state for a user-selectable number of cycles. In the frontend() function, the cycles_left field in each ROB is decremented until it becomes zero. At this point, the uop is moved to the ready_to_dispatch state. This feature can be used to simulate various branch mispredict penalties by setting the FRONTEND_STAGES constant.
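The countdown performed for entries in the frontend state can be sketched as below. The `FrontendRob` type, the `frontend_cycle` function and the value chosen for FRONTEND_STAGES are assumptions for this example, as is the exact order of the decrement and the zero check.

```cpp
#include <cassert>
#include <vector>

constexpr int FRONTEND_STAGES = 4;  // hypothetical value for this sketch

enum class RobState { Frontend, ReadyToDispatch };

// Hypothetical ROB entry reduced to the two fields used here.
struct FrontendRob {
    int cycles_left = FRONTEND_STAGES;
    RobState state = RobState::Frontend;
};

// One simulated cycle: every entry still in the frontend state moves one
// stage closer to dispatch, and leaves the state when its counter hits zero.
inline void frontend_cycle(std::vector<FrontendRob>& robs) {
    for (FrontendRob& rob : robs) {
        if (rob.state != RobState::Frontend) continue;
        assert(rob.cycles_left > 0);
        if (--rob.cycles_left == 0) {
            rob.state = RobState::ReadyToDispatch;
        }
    }
}
```

With this model, each additional frontend stage adds one cycle between rename and dispatch, which is how a deeper pipeline lengthens the branch mispredict penalty.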


Matt T Yourst 2007-09-26