next up previous contents
Next: Appendices Up: Out of Order Processor Previous: Cache Hierarchy   Contents

Subsections


Branch Prediction

Introduction

PTLsim provides a variety of branch predictors in branchpred.cpp. The branch prediction subsystem is relatively independent of the core simulator and can be treated as a black box, so long as it implements the interfaces in branchpred.h.

The branch prediction subsystem always contains at least three distinct predictors for the three main classes of branches:

All these predictors are accessed by the core through the BranchPredictorInterface object. Based on the opcode and other uop information, the core determines the type flags of each branch uop:

Multiple flags may be present for each uop (for instance, BRANCH_HINT_RET and BRANCH_HINT_INDIRECT are both used for the jmp uop terminating an x86 ret instruction).

To make a prediction at fetch time, the core calls the BranchPredictorInterface::predict() method, passing it a PredictorUpdate structure. This structure is carried along with each uop until it retires, and contains all the information needed to eventually update the branch predictor at the end of the pipeline. The contents will vary depending on the predictor chosen, but in general this structure contains pointers into internal predictor counter tables and various flags. The predict()method fills in this structure.

As each uop commits, the BranchPredictorInterface::update() method is passed the uop's saved PredictorUpdate structure and the branch outcome (expected target RIP versus real target RIP) so the branch predictor can be updated. In PTLsim, predictor updates only occur at retirement to avoid corruption caused by speculative instructions.

Conditional Branch Predictor

The PTLsim conditional branch predictor is the most flexible predictor, since it can be easily replaced. The default predictor implemented in branchpred.cpp is a selection based predictor. In essence, two separate predictors are maintained. The history predictor hashes a shift register of previously predicted branches into a table slot; this slot returns whether or not the branch with that history is predicted as taken. PTLsim supports various combinations of the history and branch address to provide gshare based semantics. The bimodal predictor is simpler; it uses 2-bit saturating counters to predict if a given branch is likely to be taken. Finally, a selection predictor specifies which of the two predictors is more accurate and should be used for future predictions. This style of predictor, sometimes called a McFarling predictor, has been described extensively in the literature and variations are used by most modern processors.

Through the CombinedPredictor template class, the user can specify the sizes of all the tables (history, bimodal, selector), the history depth, the method in which the global history and branch address are combined and so on. Alternatively, the conditional branch predictor can be replaced with something entirely different if desired.

Branch Target Buffer

The Branch Target Buffer (BTB) is essentially a small cache that maps indirect branch RIP addresses (i.e., jmp uops) into predicted target RIP addresses. It is set associative, with a user configurable number of sets and ways. In PTLsim, the BTB does not take into account any indirect branch history information. The BTB is a nearly universal structure in branch prediction; see the literature for more information.

Return Address Stack

The Return Address Stack (RAS) predicts the target address of indirect jumps marked with the BRANCH_HINT_RET flag. Whenever the BRANCH_HINT_RET flag is passed to the predict() method, the top RAS stack entry is returned as the predicted target, overriding anything in the BTB.

Unlike the conditional branch predictor and BTB, the RAS updated speculatively in the frontend pipeline, before the outcome of calls and returns are known. This allows better performance when closely spaced calls and returns must be predicted as they are fetched, before either the call or corresponding return have actually executed. However, when called with the BRANCH_HINT_RET flag, the predict() method only returns the RIP at the top of the RAS, but does not push or pop the RAS. This must be done after the corresponding bru or jmp (for direct and or indirect calls, respectively) or jmp (for returns) uop is actually allocated in the ROB.

This approach is required since the RAS is speculatively updated: if uops must be annulled (because of branch mispredictions or mis-speculations), the annulment occurs by walking backwards in the ROB until the excepting uop is encountered. However, if the RAS were updated during the fetch stage, some uops may not be in the ROB yet and hence the rollback logic cannot undo speculative changes made to the RAS by these uops. This causes the RAS to get out of alignment and performance suffers.

To solve this problem, the RAS is only updated in the allocate stage immediately after fetch. In the out of order core's rename() function, the BranchPredictorInterface::updateras() method is called to either push or pop an entry from the RAS (calls push entries, returns pop entries). Unlike the conditional branch predictor and BTB, this is the only place the RAS is updated, rather than performing updates at commit time.

If uops must be annulled, the ReorderBufferEntry::annul() method calls the BranchPredictorInterface::annulras() method with the PredictorUpdate structure for each uop it encounters in reverse program order. This method effectively undoes whatever change was made to the RAS when the updateras() method was called with the same PredictorUpdate information during renaming and allocation. This is possible because updateras() saves checkpoint information (namely, the old RAS top of stack and the value at that stack slot) before updating the RAS; this allows the RAS state to be rolled backwards in time as uops are annulled in reverse program order. At the end of the annulment process when fetching is restarted at the correct RIP, the RAS state should be identical to the state that existed before the last uop to be annulled was originally fetched.


next up previous contents
Next: Appendices Up: Out of Order Processor Previous: Cache Hierarchy   Contents
Matt T Yourst 2007-09-26