During the Allocate stage, PTLsim dequeues uops from the fetch queue, ensures all resources needed by those uops are free, and assigns resources to each uop as needed. These resources include Reorder Buffer (ROB) slots, physical registers and load/store queue (LSQ) entries. If the fetch queue is empty, or if any of the ROB, physical register file, load queue or store queue is full, the Allocate stage stalls until resources become available.
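The stall conditions above can be sketched as a simple resource check. This is a minimal illustration and not PTLsim's actual code: the AllocResources structure, the AllocStall enum and both function names are hypothetical, and the sketch assumes that every uop consumes one ROB slot and one physical register while only loads and stores consume LSQ entries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical resource counters (illustrative names, not PTLsim's).
struct AllocResources {
    std::size_t fetchq_count;  // uops waiting in the fetch queue
    std::size_t rob_free;      // free ROB slots
    std::size_t physreg_free;  // free physical registers
    std::size_t ldq_free;      // free load queue entries
    std::size_t stq_free;      // free store queue entries
};

enum class AllocStall {
    None, FetchQueueEmpty, RobFull, PhysRegFileFull, LoadQueueFull, StoreQueueFull
};

// Returns the first reason the stage must stall this cycle, or None.
// As in the text, any full resource stalls allocation, whether or not
// the next uop would actually need that resource.
inline AllocStall check_allocate(const AllocResources& r) {
    if (r.fetchq_count == 0) return AllocStall::FetchQueueEmpty;
    if (r.rob_free == 0)     return AllocStall::RobFull;
    if (r.physreg_free == 0) return AllocStall::PhysRegFileFull;
    if (r.ldq_free == 0)     return AllocStall::LoadQueueFull;
    if (r.stq_free == 0)     return AllocStall::StoreQueueFull;
    return AllocStall::None;
}

// Allocates resources for one uop; returns false if the stage stalls.
// Assumption: every uop takes a ROB slot and a physical register.
inline bool allocate_one(AllocResources& r, bool is_load, bool is_store) {
    assert(!(is_load && is_store));  // a uop is at most one of the two
    if (check_allocate(r) != AllocStall::None) return false;
    --r.fetchq_count;
    --r.rob_free;
    --r.physreg_free;
    if (is_load)  --r.ldq_free;
    if (is_store) --r.stq_free;
    return true;
}
```

Calling allocate_one() repeatedly within a cycle until it returns false would model an allocate width limited only by resources; a real model would also cap the number of uops allocated per cycle.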
The Reorder Buffer (ROB) in the PTLsim out of order model works exactly like a traditional ROB: it is a queue in which entries are allocated at the tail and committed from the head. The ReorderBufferEntry structure is the central tracking structure for each uop in the pipeline. This structure contains a variety of fields including:
Each ROB entry and corresponding uop can be in one of a number of states describing its progress through the simulator state machine. ROBs are linked into linked lists according to their current state; these lists are named rob_statename_list. The current_state_list field specifies the list the ROB is currently on. ROBs can be moved between states using the ROB::changestate(statelist) method. The specific states will be described below as they are encountered.
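The state list mechanism can be sketched with a small intrusive doubly linked list. This is an illustration under stated assumptions rather than the PTLsim implementation: StateListSketch and RobEntrySketch are hypothetical types, and the sketch inserts at the head of the destination list for brevity, which may not match the ordering PTLsim uses.

```cpp
#include <cassert>
#include <cstddef>

struct RobEntrySketch;

// One list per state (e.g. "frontend", "ready_to_dispatch").
struct StateListSketch {
    const char* name;
    RobEntrySketch* head;
    std::size_t count;
    explicit StateListSketch(const char* n) : name(n), head(nullptr), count(0) {}
};

struct RobEntrySketch {
    StateListSketch* current_state_list = nullptr;  // list this entry is on
    RobEntrySketch* prev = nullptr;
    RobEntrySketch* next = nullptr;

    // Remove this entry from whatever list it is currently on.
    void unlink() {
        StateListSketch* l = current_state_list;
        if (!l) return;
        if (prev) prev->next = next; else l->head = next;
        if (next) next->prev = prev;
        prev = next = nullptr;
        assert(l->count > 0);
        --l->count;
        current_state_list = nullptr;
    }

    // Move this entry to the destination state list in O(1).
    void changestate(StateListSketch& dest) {
        unlink();
        next = dest.head;
        if (dest.head) dest.head->prev = this;
        dest.head = this;
        ++dest.count;
        current_state_list = &dest;
    }
};
```

Because each entry records its own current_state_list, a state transition needs no search: the entry unlinks itself and joins the new list in constant time, and each pipeline stage only walks the list of entries it cares about.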
NOTE: the terms "ROB entry" (singular) and "uop" are used interchangeably from now on unless otherwise stated, since there is a 1:1 mapping between the two.
Physical registers are represented in PTLsim by the PhysicalRegister structure. Physical registers store several components:
PTLsim uses a flexible physical register file model in which multiple physical register files with different sizes and properties can optionally be defined. Each physical register file in the OutOfOrderCore::physregfiles[] array can be made accessible from one or more clusters. For instance, uops which execute on floating point clusters can be forced to always allocate a register in the floating point register file, or each cluster can have a dedicated register file.
Various heuristics can also be used for selecting the register file into which a result is placed. The default heuristic simply finds the first acceptable physical register file with a free register. Acceptable physical register files are those in which the uop being allocated is allowed to write its result; this is configurable based on clustering as described below. Other allocation policies, such as alternation between available register files and dependency-based register allocation, can be implemented by modifying the rename() function, where physical registers are allocated.
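The default first-fit heuristic can be sketched as follows. The RegFileSketch structure, the bitmask encoding of acceptable register files and the function name are assumptions made for illustration; they are not taken from the PTLsim source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of one physical register file.
struct RegFileSketch {
    const char* name;
    int rfid;       // register file ID
    int free_regs;  // number of registers currently free
};

// acceptable_mask: bit i is set if the uop may write its result into
// files[i] (assumed here to be derived from the cluster configuration).
// Returns the RFID of the first acceptable file with a free register,
// or -1 if none exists, in which case allocation would have to stall.
inline int select_regfile_first_fit(const std::vector<RegFileSketch>& files,
                                    std::uint32_t acceptable_mask) {
    assert(files.size() <= 32);  // mask is 32 bits wide in this sketch
    for (std::size_t i = 0; i < files.size(); ++i) {
        bool acceptable = ((acceptable_mask >> i) & 1u) != 0;
        if (acceptable && files[i].free_regs > 0) return files[i].rfid;
    }
    return -1;
}
```

An alternating or dependency-based policy would replace only the loop body, which is why the selection is isolated in a single function in this sketch.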
In each physical register file, physical register number 0 is defined as the null register: it always contains the value zero and is used as an operand anywhere the zero value (or no value at all) is required.
Physical register files are configured in ooocore.h. The PhysicalRegisterFile[] array is defined to declare each register file by name, register file ID (RFID, from 0 to the number of register files) and size. The MAX_PHYS_REG_FILE_SIZE parameter must be greater than the size of the largest physical register file in the processor.
Each physical register can be in one of several states at any given time. For each physical register file, PTLsim maintains linked lists (the PhysicalRegisterFile.states[statename] lists) to track which registers are in each state. The state field in each physical register specifies its state, and implies that the physical register is on the list physregfiles[physreg.rfid].states[physreg.state]. The valid states are:
Load Store Queue (LSQ) Entries (the LoadStoreQueueEntry structure in PTLsim) are used to track additional information about loads and stores in the pipeline that cannot be represented by a physical register. Specifically, LSQ entries track:
In real processors, the load queue (LDQ) and store queue (STQ) are physically separate for circuit complexity reasons. However, in PTLsim a unified LSQ is used to make searching operations easier. One additional bit flag (store bit) specifies whether an LSQ entry is a load or store.
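To illustrate why a unified queue simplifies searching, the sketch below scans a single program-ordered LSQ for the youngest older store to the same address as a given load. The LsqEntrySketch fields other than the store bit are hypothetical, and exact address matching is a simplification; the sketch does not reflect how PTLsim itself handles partial overlaps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical LSQ entry; only the store bit comes from the text.
struct LsqEntrySketch {
    std::uint64_t addr;  // assumed: address of the access
    bool store;          // store bit: true for stores, false for loads
};

// lsq is in program order, index 0 being the oldest entry.
// Returns the index of the youngest store older than the load at
// load_idx that has the same address, or -1 if there is none.
inline int find_older_store(const std::vector<LsqEntrySketch>& lsq,
                            std::size_t load_idx) {
    assert(load_idx < lsq.size());
    assert(!lsq[load_idx].store);  // the entry being checked must be a load
    for (std::size_t i = load_idx; i-- > 0;) {
        if (lsq[i].store && lsq[i].addr == lsq[load_idx].addr)
            return static_cast<int>(i);
    }
    return -1;
}
```

With separate load and store queues, the same search would need an age comparison between entries of the two queues; in the unified queue, position alone gives the relative age.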
The basic register renaming process in the PTLsim x86 model is very similar to classical register renaming, with the exception of the flags complications described in Section 5.4. Two versions of the register rename table (RRT) are maintained: a speculative RRT which is updated as uops are renamed, and a commit RRT, which is only updated when uops successfully commit. Since the simulator implements a unified physical and architectural register file, the commit process does not actually involve any data movement between physical and architectural registers: only the commit RRT needs to be updated. The commit RRT is used only for exception and branch mispredict recovery, since it holds the last known good mapping of architectural to physical registers.
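The relationship between the two tables can be sketched as below. The RenameTablesSketch type and its method names are hypothetical; the table size of 80 follows the entry count given in this section, and initializing every mapping to physical register 0 (the null register) is an assumption made to keep the sketch short.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t RRT_SIZE = 80;  // entries per rename table

struct RenameTablesSketch {
    // Both tables map an architectural register index to a physical
    // register index. All entries start at 0 (assumption, see lead-in).
    std::array<int, RRT_SIZE> spec{};    // updated as uops are renamed
    std::array<int, RRT_SIZE> commit{};  // updated only at commit

    // Rename: only the speculative table changes.
    void rename(std::size_t archreg, int physreg) {
        assert(archreg < RRT_SIZE);
        spec[archreg] = physreg;
    }

    // Commit: no data is copied between registers; only the commit
    // mapping is redirected to the physical register holding the value.
    void commit_uop(std::size_t archreg, int physreg) {
        assert(archreg < RRT_SIZE);
        commit[archreg] = physreg;
    }

    // Exception or branch mispredict recovery: discard speculative
    // mappings by restoring the last known good state.
    void recover() { spec = commit; }
};
```

In this sketch recovery is a whole-table copy, which is the simplest way to express "roll back to the last known good mapping".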
Each rename table contains 80 entries as shown in Table 18.1. This table maps architectural registers and pseudo-registers to the most up to date physical registers for the following:
Once the uop's three architectural register sources are mapped to physical registers, these physical registers are placed in the operands[0,1,2] fields. The fourth operand field, operands[3], is used to hold a store buffer dependency for loads and stores; this will be discussed later. The speculative RRT entries for both the destination physical register and any modified flags are then overwritten. Finally, the ROB is moved into the frontend state.
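The ordering of these steps matters: sources must be looked up before the destination mapping is overwritten, so that a uop which reads and writes the same architectural register sees the old value. A minimal sketch follows, in which UopSketch, RenamedUopSketch and rename_uop() are hypothetical names, flag renaming is omitted, and operands[3] is simply set to the null register as a placeholder.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t SPEC_RRT_SIZE = 80;  // entries per rename table
using SpecRrt = std::array<int, SPEC_RRT_SIZE>;

// Hypothetical uop: three architectural sources and one destination.
struct UopSketch { std::size_t ra, rb, rc, rd; };

struct RenamedUopSketch {
    std::array<int, 4> operands;  // physical register sources
    int dest_physreg;             // newly allocated destination
};

inline RenamedUopSketch rename_uop(SpecRrt& spec_rrt, const UopSketch& u,
                                   int new_physreg) {
    assert(u.ra < SPEC_RRT_SIZE && u.rb < SPEC_RRT_SIZE);
    assert(u.rc < SPEC_RRT_SIZE && u.rd < SPEC_RRT_SIZE);
    RenamedUopSketch r{};
    // 1. Map the three sources through the speculative RRT.
    r.operands[0] = spec_rrt[u.ra];
    r.operands[1] = spec_rrt[u.rb];
    r.operands[2] = spec_rrt[u.rc];
    // 2. operands[3] is reserved for a store buffer dependency; the
    //    null register (0) is used here as a placeholder (assumption).
    r.operands[3] = 0;
    // 3. Only now overwrite the destination mapping.
    r.dest_physreg = new_physreg;
    spec_rrt[u.rd] = new_physreg;
    return r;
}
```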
Since the rest of the simulator outside of the out of order core does not know about the RRTs and expects architectural registers to be in a standardized format, the per-core Context structure is used to house the architectural register file. These architectural registers, including REG_flags and REG_rip, are directly updated in program order by the out of order core as instructions commit.
To simulate various processor frontend pipeline depths, ROBs are placed in the frontend state for a user-selectable number of cycles. In the frontend() function, the cycles_left field in each ROB is decremented until it becomes zero. At this point, the uop is moved to the ready_to_dispatch state. This feature can be used to simulate various branch mispredict penalties by setting the FRONTEND_STAGES constant.
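The countdown can be sketched as follows. This is one plausible reading of the behavior described above, not the actual frontend() function: the FrontendRobSketch type, the state enum and the value chosen for FRONTEND_STAGES are illustrative, and the sketch moves an entry to ready_to_dispatch in the same cycle in which its counter reaches zero.

```cpp
#include <cassert>
#include <vector>

constexpr int FRONTEND_STAGES = 4;  // illustrative value only

enum class FrontendState { Frontend, ReadyToDispatch };

// Hypothetical ROB entry holding only what the frontend stage needs.
struct FrontendRobSketch {
    int cycles_left = FRONTEND_STAGES;  // set when entering frontend state
    FrontendState state = FrontendState::Frontend;
};

// Simulate one cycle of the frontend stage over all ROB entries.
inline void frontend_cycle(std::vector<FrontendRobSketch>& robs) {
    for (FrontendRobSketch& rob : robs) {
        if (rob.state != FrontendState::Frontend) continue;
        assert(rob.cycles_left >= 0);
        if (rob.cycles_left > 0) --rob.cycles_left;
        if (rob.cycles_left == 0) rob.state = FrontendState::ReadyToDispatch;
    }
}
```

Under this reading, a uop renamed in cycle N becomes ready to dispatch after FRONTEND_STAGES calls to frontend_cycle(), so raising the constant lengthens the refill delay that follows a branch mispredict.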