[2011!03!05] branch prediction

Upload: aemilor

Post on 06-Apr-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/3/2019 [2011!03!05] Branch Prediction

    1/18

    Branch predictionGrigory Rechistov

    With help of Alexander Titov

    MDSP

    March, 2011

    MDSP project, Intel Lab,Moscow Institute of Physics and Technology

  • 8/3/2019 [2011!03!05] Branch Prediction

    2/18

    2

    MDSP project, Intel Lab,Moscow Institute of Physics and Technology

    Agenda Quick introduction

    Reasons to use

    Static (software) prediction

    Saturating counter

    Two-level adaptive predictor Local/global predictors

    Branch target predictor

    Branch predication

  • 8/3/2019 [2011!03!05] Branch Prediction

    3/18

    3

    What is a branch predictor

    A branch predictor is a digital circuit that tries to

    guess which way a jump instruction (e.g. an if-then-else structure) will go before this is knownfor sure.

    MDSP project, Intel Lab,

    Moscow Institute of Physics and Technology

  • 8/3/2019 [2011!03!05] Branch Prediction

    4/18

    4

    Branch instruction types

    MDSP project, Intel Lab,

    Moscow Institute of Physics and Technology

    Conditional UnconditionalDirect if - then- else

    for loops

    (bez, bnez, etc)

    procedure calls (jal)

    goto (j)

    Indirect return (jr)virtual function lookup

    function pointers (jalr)

  • 8/3/2019 [2011!03!05] Branch Prediction

    5/18

    5

    MDSP project, Intel Lab,Moscow Institute of Physics and Technology

    The 1st reason: the gap in pipelineConditional branch

    instruction

    instruction

    instruction

    instruction

    not taken

    taken

    ...

    FE DEC EXE MEM WB

    FE DEC EXE MEM WB

    FE DEC EXE MEM WB

    FE DEC EXE MEM WB

    FE DEC EXE MEM

    the gap iftaken

    Condition iscalculated only here

    The time that is wasted in case of a branch misprediction is equal to the number of stages in

    the pipeline from the fetch stage to the execute stage. Modern microprocessors tend to have

    quite long pipelines so that the misprediction delay is between 10 and 20 cycles (if not

    consider misses in caches).

  • 8/3/2019 [2011!03!05] Branch Prediction

    6/18

    6

    MDSP project, Intel Lab,Moscow Institute of Physics and Technology

    The 2nd reason: parallel execution A processor use special technique to identify instruction can be executed in parallel, but the

    linear part (instruction between two branches) is to small to select enough instruction (every

    5-8th

    instruction is a branch). The branch prediction can increase the linear part:

    1

    2 3

    4 5 6

    loop,

    2 iter

    critical path

    1 3 6 1 3 6 1 3

    exit

    exit

    Predicted linear part

    The present

    is here

    The linear part increases significantly, which means

    that a processor can find and execute more

    instructions in parallel.

    Branchpredictor

    Control flow graph Prediction

  • 8/3/2019 [2011!03!05] Branch Prediction

    7/187

    MDSP project, Intel Lab,Moscow Institute of Physics and Technology

    Static (software) prediction There are situations as the branch direction can be statically predicted

    with a good probability: Backwards branches

    Exception and error handling

    Some branches meet with specific heuristics

    The backwards branch is one that has a target address that is lower than

    its own address. Therefore SBP can help with prediction accuracy of

    loops, which are usually backward-pointing branches, and are taken more

    often than not taken.

    Some processors allow branch prediction hints to be inserted into thecode to tell whether the static prediction should be taken or not taken. In

    this case it is also possible to use the profile information (the history of

    execution of a program).

  • 8/3/2019 [2011!03!05] Branch Prediction

    8/188

    MDSP project, Intel Lab,Moscow Institute of Physics and Technology

    Saturating counter A saturating counter or bimodal predictor is a state machine with four states:

    When a branch is evaluated, the corresponding state machine is updated. Branchesevaluated as not taken (NT) decrement the state towards strongly not taken, andbranches evaluated as taken (T) increment the state towards strongly taken

    The advantage of the two-bit counter over a one-bit scheme is that a conditionaljump has to deviate twice from what it has done most in the past before theprediction changes. For example, a loop-closing conditional jump is mispredictedonce rather than twice.

  • 8/3/2019 [2011!03!05] Branch Prediction

    9/189

    MDSP project, Intel Lab,Moscow Institute of Physics and Technology

    1 0 0 1 0 0

    0 0 0 0 0 1 0 0 0

    Two level adaptive predictor Conditional jumps that are taken every second time or have some other regularly

    recurring pattern are not predicted well by the saturating counter, but an adaptivepredictor can easily solve this problem

    pastfuture

    present

    History

    buffer

    weakly

    NT

    strongly

    NT

    strongly

    T

    weakly

    T00

    weakly

    NT

    strongly

    NT

    strongly

    T

    weakly

    T01

    weakly

    NT

    strongly

    NT

    strongly

    T

    weakly

    T10

    weakly

    NT

    strongly

    NT

    strongly

    T

    weakly

    T11

    Prediction:

    TRUEResult: FALSE

    Pattern history table

    Branch

    sequence:

  • 8/3/2019 [2011!03!05] Branch Prediction

    10/1810

    MDSP project, Intel Lab,Moscow Institute of Physics and Technology

    Local predictorLocal prediction

    Local branch predictor has a separatehistory buffer for each conditional jumpinstruction.

    Pattern history table may be separate aswell or it may be shared between allconditional jumps.

    Accuracy of the local predictor is equalabout 97.1%.

  • 8/3/2019 [2011!03!05] Branch Prediction

    11/1811

    Global predictor Global prediction

    Global branch keeps a shared history of all conditional jumps: inthis case any correlation between different conditional jumps isutilized in the prediction making, but when there is nocorrelation the prediction rate can decrease.

    This predictor must have a long history buffer in order to work.The size of the pattern history table grows exponentially withthe size of the history buffer. Therefore, the big pattern historytable is shared between all conditional jumps.

    Accuracy of the global branch predictor is equal about 96.6%,

    which is just a little worse than large local predictors.

    Real branch predictors are designed as a combination of both types.MDSP project, Intel Lab,

    Moscow Institute of Physics and Technology

  • 8/3/2019 [2011!03!05] Branch Prediction

    12/1812

    MDSP project, Intel Lab,Moscow Institute of Physics and Technology

    Branch target prediction The address of a dynamic branch isnt known statically in

    the moment of compilation. It is calculated in the momentof execution of the program.

    A branch target predictor is the part of a processor that

    predicts the target address of a taken dynamicconditional branch or an dynamic unconditional branchinstruction before the target address of the branchinstruction is computed by the execution unit of the

    processor.

  • 8/3/2019 [2011!03!05] Branch Prediction

    13/1813

    And now to something completely

    different!

    MDSP project, Intel Lab,

    Moscow Institute of Physics and Technology

  • 8/3/2019 [2011!03!05] Branch Prediction

    14/1814

    Branch predication Convenient machine code with jumps:

    branch if condition to label1

    do_that

    branch to label2

    label1:

    do_this

    label2:...

    MDSP project, Intel Lab,

    Moscow Institute of Physics and Technology

  • 8/3/2019 [2011!03!05] Branch Prediction

    15/1815

    Branch predication (contd.) Predicated code:

    (condition) do_this

    (not condition) do_that

    With branch predication, all possible branch paths areexecuted, the correct path is kept and all others arethrown away.

    Pros: Can be faster if conditional blocks are short enough. Cons: encoding space; unbalanced paths

    MDSP project, Intel Lab,

    Moscow Institute of Physics and Technology

  • 8/3/2019 [2011!03!05] Branch Prediction

    16/1816

    Architectures with branch predication European designs of 1950s: Mailfterl (1955), the Zuse Z22

    (1955), the ZEBRA (1958), and the Electrologica X1 (1958) Intel Itanium almost every instruction is predicated. Non-

    predicated ones have always true predicate

    ARM ISA almost all instructions are predicated (13

    predicates). Even Intel IA-32 has some sort of predicated instructions:CMOVxx (conditional move)

    MDSP project, Intel Lab,

    Moscow Institute of Physics and Technology

  • 8/3/2019 [2011!03!05] Branch Prediction

    17/1817

    Bibliography

    1. http://www.cs.virginia.edu/~skadron/cs654/slid

    es/bpred.ppt2. Daniel A. Jimenez. Fast Path-Based Neural

    Branch Prediction.http://cava.cs.utsa.edu/pdfs/micro03_dist.pdf

    MDSP project, Intel Lab,

    Moscow Institute of Physics and Technology

    http://www.cs.virginia.edu/~skadron/cs654/slides/bpred.ppthttp://www.cs.virginia.edu/~skadron/cs654/slides/bpred.ppthttp://cava.cs.utsa.edu/pdfs/micro03_dist.pdfhttp://cava.cs.utsa.edu/pdfs/micro03_dist.pdfhttp://www.cs.virginia.edu/~skadron/cs654/slides/bpred.ppthttp://www.cs.virginia.edu/~skadron/cs654/slides/bpred.ppthttp://www.cs.virginia.edu/~skadron/cs654/slides/bpred.ppt
  • 8/3/2019 [2011!03!05] Branch Prediction

    18/1818

    Here be Intel logo

    MDSP project, Intel Lab,

    Moscow Institute of Physics and Technology