2.15 - Compiling C and Interpreting Java


  • 8/13/2019 2.15 - Compiling c and Interpreting Java


Advanced Material: Compiling C and Interpreting Java

This section gives a brief overview of how the C compiler works and how Java is executed. Because the compiler will significantly affect the performance of a computer, understanding compiler technology today is critical to understanding performance. Keep in mind that the subject of compiler construction is usually taught in a one- or two-semester course, so our introduction will necessarily only touch on the basics.

The second part of this section, starting on page 2.15-15, is for readers interested in seeing how an object-oriented language like Java executes on the MIPS architecture. It shows the Java bytecodes used for interpretation and the MIPS code for the Java version of some of the C segments in prior sections, including Bubble Sort. It covers both the Java virtual machine and just-in-time (JIT) compilers.

    Compiling C

This first part of the section introduces the internal anatomy of a compiler. To start, Figure 2.15.1 shows the structure of recent compilers, and we describe the optimizations in the order of the passes of that structure.

Front end (one per language)
    Dependencies: language dependent; machine independent
    Function: transform language to common intermediate form

High-level optimizations
    Dependencies: somewhat language dependent; largely machine independent
    Function: for example, loop transformations and procedure inlining (also called procedure integration)

Global optimizer
    Dependencies: small language dependencies; machine dependencies slight (e.g., register counts/types)
    Function: including global and local optimizations and register allocation

Code generator
    Dependencies: highly machine dependent; language independent
    Function: detailed instruction selection and machine-dependent optimizations; may include or be followed by assembler

The passes communicate through the intermediate representation.

FIGURE 2.15.1 The structure of a modern optimizing compiler consists of a number of passes or phases. Logically, each pass can be thought of as running to completion before the next occurs. In practice, some passes may handle one procedure at a time, essentially interleaving with another pass.



To illustrate the concepts in this part of this section, we will use the C version of a while loop from page 92:

    while (save[i] == k)

    i += 1;

    The Front End

The function of the front end is to read in a source program; check the syntax and semantics; and translate the source program to an intermediate form that interprets most of the language-specific operation of the program. As we will see, intermediate forms are usually simple, and some are in fact similar to the Java bytecodes (see Figure 2.15.8).

The front end is usually broken into four separate functions:

1. Scanning reads in individual characters and creates a string of tokens. Examples of tokens are reserved words, names, operators, and punctuation symbols. In the above example, the token sequence is while, (, save, [, i, ], ==, k, ), i, +=, 1. A word like while is recognized as a reserved word in C, but save, i, and k are recognized as names, and 1 is recognized as a number.

2. Parsing takes the token stream, ensures the syntax is correct, and produces an abstract syntax tree, which is a representation of the syntactic structure of the program. Figure 2.15.2 shows what the abstract syntax tree might look like for this program fragment.

3. Semantic analysis takes the abstract syntax tree and checks the program for semantic correctness. Semantic checks normally ensure that variables and types are properly declared and that the types of operators and objects match, a step called type checking. During this process, a symbol table representing all the named objects (classes, variables, and functions) is usually created and used to type-check the program.

4. Generation of the intermediate representation (IR) takes the symbol table and the abstract syntax tree and generates the intermediate representation that is the output of the front end. Intermediate representations usually use simple operations on a small set of primitive types, such as integers, characters, and reals. Java bytecodes represent one type of intermediate form. In modern compilers, the most common intermediate form looks much like the MIPS instruction set but with an infinite number of virtual registers; later, we describe how to map these virtual registers to a finite set of real registers. Figure 2.15.3 shows how our example might be represented in such an intermediate form. We capitalize the MIPS instructions in this section when they represent IR forms.

The intermediate form specifies the functionality of the program in a manner independent of the original source. After this front end has created the intermediate form, the remaining passes are largely language independent.
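To make the scanning step concrete, here is a toy tokenizer for the while-loop fragment, written in Python for brevity. It is only an illustrative sketch (the token classes and the RESERVED set are simplifications invented for this example), not how a production C lexer is organized:

```python
import re

# Toy scanner for a tiny C-like subset; the token classes follow the text:
# reserved words, names, operators/punctuation, and numbers.
TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>\+=|==|[()\[\];]))")
RESERVED = {"while", "if", "for"}

def scan(src):
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        pos = m.end()
        if m.group("num"):
            tokens.append(("number", m.group("num")))
        elif m.group("name"):
            word = m.group("name")
            kind = "reserved" if word in RESERVED else "name"
            tokens.append((kind, word))
        else:
            tokens.append(("op", m.group("op")))
    return tokens

toks = scan("while (save[i] == k) i += 1;")
# The token texts match the sequence given in the text:
# while ( save [ i ] == k ) i += 1 ;
assert [text for _, text in toks] == [
    "while", "(", "save", "[", "i", "]", "==", "k", ")", "i", "+=", "1", ";"]
assert toks[0] == ("reserved", "while")
```

A real C scanner would also handle comments, string literals, and the full operator set, but the division of labor (characters in, classified tokens out) is the same.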


[Figure 2.15.2 shows the tree for the while statement: the root has two children, the condition (an expression comparing the array access save[i] with the identifier k) and the statement body (an assignment whose left-hand side is the identifier i and whose right-hand side combines it with the number 1 via +=).]

FIGURE 2.15.2 An abstract syntax tree for the while example. The roots of the tree consist of the informational tokens such as numbers and names. Long chains of straight-line descendents are often omitted in constructing the tree.

High-Level Optimizations

High-level optimizations are transformations that are done at something close to the source level.

The most common high-level transformation is probably procedure inlining, which replaces a call to a function by the body of the function, substituting the caller's arguments for the procedure's parameters. Other high-level optimizations involve loop transformations that can reduce loop overhead, improve memory access, and exploit the hardware more effectively. For example, in loops that execute many iterations, such as those traditionally controlled by a for statement, the optimization of loop-unrolling is often useful. Loop-unrolling involves taking a loop, replicating the body multiple times, and executing the transformed loop fewer times. Loop-unrolling reduces the loop overhead and provides opportunities for many other optimizations. Other types of high-level transformations include

loop-unrolling: A technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together.
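The effect of loop-unrolling can be sketched directly. The two Python functions below (hypothetical names, written for this illustration) compute the same sum; the unrolled version pays the loop-control overhead once per four elements, with a cleanup loop for the leftovers:

```python
# Loop unrolling by hand: the original loop executes the body once per
# iteration; the unrolled version replicates the body four times, so the
# loop-control overhead (test and increment) is paid once per four elements.

def sum_plain(a):
    total, i = 0, 0
    while i < len(a):        # one test + one increment per element
        total += a[i]
        i += 1
    return total

def sum_unrolled_by_4(a):
    total, i = 0, 0
    n4 = len(a) - len(a) % 4
    while i < n4:            # one test + one increment per FOUR elements
        total += a[i]
        total += a[i + 1]
        total += a[i + 2]
        total += a[i + 3]
        i += 4
    while i < len(a):        # cleanup loop for the leftover elements
        total += a[i]
        i += 1
    return total

data = list(range(10))
assert sum_plain(data) == sum_unrolled_by_4(data) == 45
```

Beyond reducing overhead, the replicated body gives the scheduler independent instructions from different iterations to interleave, as the margin definition notes.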


# comments are written like this--source code often included
# while (save[i] == k)
loop:          LI   R1,save      # loads the starting address of save into R1
               LW   R2,i
               MULT R3,R2,4      # Multiply R2 by 4
               ADD  R4,R3,R1
               LW   R5,0(R4)     # load save[i]
               LW   R6,k
               BNE  R5,R6,endwhileloop
# i += 1
               LW   R6,i
               ADD  R7,R6,1      # increment
               SW   R7,i
               branch loop       # next iteration
endwhileloop:

FIGURE 2.15.3 The while loop example is shown using a typical intermediate representation. In practice, the names save, i, and k would be replaced by some sort of address, such as a reference to either the local stack pointer or a global pointer, and an offset, similar to the way save[i] is accessed. Note that the format of the MIPS instructions is different, because they are intermediate representations here: the operations are capitalized and the registers use the RXX notation.

    2.15 Advanced Material: Compiling C and Interpreting Java 2

sophisticated loop transformations such as interchanging nested loops and blocking loops to obtain better memory behavior; see Chapter 5 for examples.

Local and Global Optimizations

Within the pass dedicated to local and global optimization, three classes of optimizations are performed:

1. Local optimization works within a single basic block. A local optimization pass is often run as a precursor and successor to global optimization to clean up the code before and after global optimization.

2. Global optimization works across multiple basic blocks; we will see an example of this shortly.

3. Global register allocation allocates variables to registers for regions of the code. Register allocation is crucial to getting good performance in modern processors.

Several optimizations are performed both locally and globally, including common subexpression elimination, constant propagation, copy propagation, dead store elimination, and strength reduction. Let's look at some simple examples of these optimizations.


Common subexpression elimination finds multiple instances of the same expression and replaces the second one by a reference to the first. Consider, for example, a code segment to add 4 to an array element:

x[i] = x[i] + 4

The address calculation for x[i] occurs twice and is identical since neither the starting address of x nor the value of i changes. Thus, the calculation can be reused. Let's look at the intermediate code for this fragment, since it allows several other optimizations to be performed. The unoptimized intermediate code is first; below it is the optimized code, using common subexpression elimination to replace the second address calculation with the first. Note that register allocation has not yet occurred, so the compiler is using virtual register numbers like R100 here.

Unoptimized:

# x[i] + 4
li   R100,x
lw   R101,i
mult R102,R101,4
add  R103,R100,R102
lw   R104,0(R103)
add  R105,R104,4
# x[i] =
li   R106,x
lw   R107,i
mult R108,R107,4
add  R109,R106,R108
sw   R105,0(R109)

Optimized:

# x[i] + 4
li   R100,x
lw   R101,i
mult R102,R101,4
add  R103,R100,R102
lw   R104,0(R103)
# value of x[i] is in R104
add  R105,R104,4
# x[i] =
sw   R105,0(R103)
If the same optimization were possible across two basic blocks, it would then be an instance of global common subexpression elimination.

Let's consider some of the other optimizations:

Strength reduction replaces complex operations by simpler ones and can be applied to this code segment, replacing the MULT by a shift left.

Constant propagation and its sibling constant folding find constants in code and propagate them, collapsing constant values whenever possible.

Copy propagation propagates values that are simple copies, eliminating the need to reload values and possibly enabling other optimizations, such as common subexpression elimination.

Dead store elimination finds stores to values that are not used again and eliminates the store; its cousin is dead code elimination, which finds unused code (code that cannot affect the result of the program) and eliminates it. With the heavy use of macros, templates, and similar techniques designed to reuse code in high-level languages, dead code occurs surprisingly often.
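A minimal sketch of two of these transformations on a made-up three-address IR (the tuple encoding and the pass structure here are assumptions for illustration, loosely mimicking the MIPS-like IR of this section):

```python
# Toy peephole pass over (op, dest, src1, src2) tuples.
# - strength reduction: MULT by a power-of-two constant becomes SLL
# - constant folding:   ADD of two constants is computed at compile time
def optimize(ir):
    out = []
    for op, dest, a, b in ir:
        if op == "MULT" and isinstance(b, int) and b > 0 and b & (b - 1) == 0:
            # b is a power of two: multiply becomes a shift left
            out.append(("SLL", dest, a, b.bit_length() - 1))
        elif op == "ADD" and isinstance(a, int) and isinstance(b, int):
            # both operands known at compile time: fold to a load-immediate
            out.append(("LI", dest, a + b, None))
        else:
            out.append((op, dest, a, b))
    return out

ir = [("MULT", "R3", "R2", 4),     # R3 = R2 * 4
      ("ADD",  "R5", 10, 32)]      # R5 = 10 + 32
assert optimize(ir) == [("SLL", "R3", "R2", 2), ("LI", "R5", 42, None)]
```

The MULT-to-SLL rewrite is exactly the strength reduction applied to our while loop's address calculation (multiply by 4 becomes shift left by 2), as Figure 2.15.4 shows.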

Compilers must be conservative. The first task of a compiler is to produce correct code; its second task is usually to produce fast code, although other factors, such as


code size, may sometimes be important as well. Code that is fast but incorrect, for any possible combination of inputs, is simply wrong. Thus, when we say a compiler is conservative, we mean that it performs an optimization only if it knows with 100% certainty that, no matter what the inputs, the code will perform as the user wrote it. Since most compilers translate and optimize one function or procedure at a time, most compilers, especially at lower optimization levels, assume the worst about function calls and about their own parameters.

Programmers concerned about performance of critical loops, especially in real-time or embedded applications, often find themselves staring at the assembly language produced by a compiler and wondering why the compiler failed to perform some global optimization or to allocate a variable to a register throughout a loop. The answer often lies in the dictate that the compiler be conservative. The opportunity for improving the code may seem obvious to the programmer, but then the programmer often has knowledge that the compiler does not have, such as the absence of aliasing between two pointers or the absence of side effects by a function call. The compiler may indeed be able to perform the transformation with a little help, which could eliminate the worst-case behavior that it must assume. This insight also illustrates an important observation: programmers who use pointers to try to improve performance in accessing variables, especially pointers to values on the stack that also have names as variables or as elements of arrays, are likely to disable many compiler optimizations. The end result is that the lower-level pointer code may run no better, or perhaps even worse, than the higher-level code optimized by the compiler.

    Global Code Optimizations

Many global code optimizations have the same aims as those used in the local case, including common subexpression elimination, constant propagation, copy propagation, and dead store and dead code elimination.

There are two other important global optimizations: code motion and induction variable elimination. Both are loop optimizations; that is, they are aimed at code in loops. Code motion finds code that is loop invariant: a particular piece of code computes the same value on every iteration of the loop and, hence, may be computed once outside the loop. Induction variable elimination is a combination of transformations that reduce overhead on indexing arrays, essentially replacing array indexing with pointer accesses. Rather than examine induction variable elimination in depth, we point the reader to Section 2.14, which compares the use of array indexing and pointers; for most loops, the transformation from the more obvious array code to the pointer code can be performed by a modern optimizing compiler.

Understanding Program Performance


    Implementing Local Optimizations

Local optimizations are implemented on basic blocks by scanning the basic block in instruction execution order, looking for optimization opportunities. In the assignment statement example on page 2.15-6, the duplication of the entire address calculation is recognized by a series of sequential passes over the code. Here is how the process might proceed, including a description of the checks that are needed:

1. Determine that the two li operations return the same result by observing that the operand x is the same and that the value of its address has not been changed between the two li operations.

2. Replace all uses of R106 in the basic block by R100.

3. Observe that i cannot change between the two LWs that reference it, so replace all uses of R107 with R101.

4. Observe that the mult instructions now have the same input operands, so that R108 may be replaced by R102.

5. Observe that now the two add instructions have identical input operands (R100 and R102), so replace the R109 with R103.

6. Use dead code elimination to delete the second set of li, lw, mult, and add instructions since their results are unused.

Throughout this process, we need to know when two instances of an operand have the same value. This is easy to determine when they refer to virtual registers, since our intermediate representation uses such registers only once, but the problem can be trickier when the operands are variables in memory, even though we are only considering references within a basic block.

It is reasonably easy for the compiler to make the common subexpression elimination determination in a conservative fashion in this case; as we will see in the next subsection, this is more difficult when branches intervene.
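The six steps above amount to a simple form of value numbering. A minimal Python sketch, which relies on the stated property that each virtual register is defined only once, and which assumes (as steps 1 and 3 must verify) that the memory operands x and i do not change within the block:

```python
# Minimal local common subexpression elimination by value numbering.
# Each instruction is (op, dest, src1, src2); within one basic block, an
# instruction whose (op, src1, src2) has been seen before is deleted and
# later uses of its dest are renamed to the earlier dest. This is only
# safe because each virtual register has a single definition and the
# memory names used as operands are assumed unchanged in the block.
def local_cse(block):
    seen, rename, out = {}, {}, []
    for op, dest, a, b in block:
        a, b = rename.get(a, a), rename.get(b, b)   # apply earlier renames
        key = (op, a, b)
        if key in seen:
            rename[dest] = seen[key]                # reuse earlier result
        else:
            seen[key] = dest
            out.append((op, dest, a, b))
    return out

# The x[i] = x[i] + 4 example from the text, before optimization:
block = [
    ("li",   "R100", "x", None),
    ("lw",   "R101", "i", None),
    ("mult", "R102", "R101", 4),
    ("add",  "R103", "R100", "R102"),
    ("lw",   "R104", 0, "R103"),
    ("add",  "R105", "R104", 4),
    ("li",   "R106", "x", None),     # duplicate of the first li  (step 2)
    ("lw",   "R107", "i", None),     # duplicate of the second lw (step 3)
    ("mult", "R108", "R107", 4),     # becomes duplicate           (step 4)
    ("add",  "R109", "R106", "R108"),# becomes duplicate           (step 5)
    ("sw",   None,  "R105", "R109"), # store uses the renamed R103
]
optimized = local_cse(block)
# The four duplicated instructions (those writing R106-R109) are gone.
assert len(optimized) == 7
assert optimized[-1] == ("sw", None, "R105", "R103")
```

Note how the pass reproduces steps 2 through 6 automatically: once the second li is recognized, the renaming cascades through the lw, mult, and add, and the duplicates are simply never emitted.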

    Implementing Global Optimizations

To understand the challenge of implementing global optimizations, let's consider a few examples:

Consider the case of an opportunity for common subexpression elimination, say, of an IR statement like ADD Rx, R20, R50. To determine whether two such statements compute the same value, we must determine whether the values of R20 and R50 are identical in the two statements. In practice, this means that the values of R20 and R50 have not changed between the first statement and the second. For a single basic block, this is easy to decide; it is more difficult for a more complex program structure involving multiple basic blocks and branches.

Consider the second LW of i into R107 within the earlier example: how do we know whether its value is used again? If we consider only a single basic


block, and we know that all uses of R107 are within that block, it is easy to see. As optimization proceeds, however, common subexpression elimination and copy propagation may create other uses of a value. Determining that a value is unused and the code is dead is more difficult in the case of multiple basic blocks.

Finally, consider the load of k in our loop, which is a candidate for code motion. In this simple example, we might argue it is easy to see that k is not changed in the loop and is, hence, loop invariant. Imagine, however, a more complex loop with multiple nestings and if statements within the body. Determining that the load of k is loop invariant is harder in such a case.

The information we need to perform these global optimizations is similar: we need to know where each operand in an IR statement could have been changed or defined (use-definition information). The dual of this information is also needed: that is, finding all the uses of that changed operand (definition-use information). Data flow analysis obtains both types of information.

Global optimizations and data flow analysis operate on a control flow graph, where the nodes represent basic blocks and the arcs represent control flow between basic blocks. Figure 2.15.4 shows the control flow graph for our simple loop example, with one important transformation introduced. We describe the transformation in the caption, but see if you can discover it, and why it was done, on your own!
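A sketch of how this definition information identifies candidates for code motion: a statement is flagged loop invariant when none of its source operands is defined inside the loop. The encoding below is hypothetical (stores are modeled as defining the variable's name, and the register written by statement 8 is renamed R6a here so that each register keeps a single definition); a full algorithm would also iterate to catch statements that depend only on already-invariant ones:

```python
# Given the loop's IR, a statement is loop invariant (in this one-pass
# approximation) if none of its source operands is defined anywhere
# inside the loop.
def loop_invariant(statements):
    defined_in_loop = {dest for _, dest, *_ in statements if dest}
    invariant = []
    for op, dest, *srcs in statements:
        if all(s not in defined_in_loop for s in srcs):
            invariant.append((op, dest, *srcs))
    return invariant

# The loop of Figure 2.15.4; memory names (i, k, save) appear as operands,
# and the SW to i counts as a definition of i inside the loop.
loop = [
    ("LW",  "R6a", "i"),         # 8
    ("ADD", "R7",  "R6a", 1),    # 9
    ("SW",  "i",   "R7"),        # 10: i is DEFINED in the loop
    ("LI",  "R1",  "save"),      # 1: save never changes
    ("LW",  "R2",  "i"),         # 2: i changes in the loop
    ("SLL", "R3",  "R2", 2),     # 3
    ("ADD", "R4",  "R3", "R1"),  # 4
    ("LW",  "R5",  "R4"),        # 5
    ("LW",  "R6",  "k"),         # 6: k never changes
    ("BEQ", None,  "R5", "R6"),  # 7
]
moved = loop_invariant(loop)
assert moved == [("LI", "R1", "save"), ("LW", "R6", "k")]  # statements 1 and 6
```

The result matches the discussion that follows: only statements 1 and 6 (the LI of save and the LW of k) can be hoisted ahead of the loop.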

Loop body block:
 8. LW  R6,i
 9. ADD R7,R6,1
10. SW  R7,i

Loop test block:
 1. LI  R1,save
 2. LW  R2,i
 3. SLL R3,R2,2
 4. ADD R4,R3,R1
 5. LW  R5,0(R4)
 6. LW  R6,k
 7. BEQ R5,R6,startwhileloop

FIGURE 2.15.4 A control flow graph for the while loop example. Each node represents a basic block, which terminates with a branch or by sequential fall-through into another basic block that is also the target of a branch. The IR statements have been numbered for ease in referring to them. The important transformation performed was to move the while test and conditional branch to the end. This eliminates the unconditional branch that was formerly inside the loop and places it before the loop. This transformation is so important that many compilers do it during the generation of the IR. The MULT was also replaced with (strength-reduced to) an SLL.


Suppose we have computed the use-definition information for the control flow graph in Figure 2.15.4. How does this information allow us to perform code motion? Consider IR statements number 1 and 6: in both cases, the use-definition information tells us that there are no definitions (changes) of the operands of these statements within the loop. Thus, these IR statements can be moved outside the loop. Notice that if the LI of save and the LW of k are executed once, just prior to the loop entrance, the computational effect is the same, but the program now runs faster since these two statements are outside the loop. In contrast, consider IR statement 2, which loads the value of i. The definitions of i that affect this statement are both outside the loop, where i is initially defined, and inside the loop in statement 10 where it is stored. Hence, this statement is not loop invariant.

Figure 2.15.5 shows the code after performing both code motion and induction variable elimination, which simplifies the address calculation. The variable i can still be register allocated, eliminating the need to load and store it every time, and we will see how this is done in the next subsection.

Before we turn to register allocation, we need to mention a caveat that also illustrates the complexity and difficulty of optimizers. Remember that the compiler must be conservative. To be conservative, a compiler must consider the following question: Is there any way that the variable k could possibly ever change in this loop? Unfortunately, there is one way. Suppose that the variable k and the variable i actually refer to the same memory location, which could happen if they were accessed by pointers or reference parameters.

Entry block (before the loop):
LI  R1,save
LW  R6,k
LW  R2,i
SLL R3,R2,2
ADD R4,R3,R1

Loop body block:
LW  R2,i
ADD R7,R2,1
ADD R4,R4,4
SW  R7,i

Loop test block:
LW  R5,0(R4)
BEQ R5,R6,startwhileloop

FIGURE 2.15.5 The control flow graph showing the representation of the while loop example after code motion and induction variable elimination. The number of instructions in the inner loop has been reduced from 10 to 6.


I am sure that many readers are saying, "Well, that would certainly be a stupid piece of code!" Alas, this response is not open to the compiler, which must translate the code as it is written. Recall too that the aliasing information must also be conservative; thus, compilers often find themselves negating optimization opportunities because of a possible alias that exists in one place in the code or because of incomplete information about aliasing.
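The danger can be demonstrated by simulating memory explicitly. In the sketch below (a contrived Python model, not real compiler output), the loop behaves differently when i and k share an address, which is exactly why the load of k cannot be hoisted without an aliasing guarantee:

```python
# Why the compiler must be conservative about aliasing: if k and i happen
# to occupy the same memory location, the load of k is NOT loop invariant.
# Memory is simulated as a dict mapping addresses to values.
def run_loop(mem, addr_i, addr_k, save):
    trips = 0
    while save[mem[addr_i]] == mem[addr_k]:   # while (save[i] == k)
        mem[addr_i] += 1                      # i += 1 (a store to i's address)
        trips += 1
    return trips

save = [0, 1, 2, 99]

# No aliasing: i and k live at different addresses; k stays 0 throughout,
# so hoisting the load of k out of the loop would be safe.
assert run_loop({0: 0, 1: 0}, 0, 1, save) == 1

# Aliasing: i and k share address 0, so every store to i also changes k.
# A hoisted, one-time load of k would compute the no-alias answer above,
# which is wrong here -- hence the compiler must prove the alias impossible.
assert run_loop({0: 0}, 0, 0, save) == 3
```

This is the concrete cost of "assume the worst": absent proof that addr_i and addr_k differ, the compiler must reload k on every iteration.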

    Register Allocation

Register allocation is perhaps the most important optimization for modern load-store architectures. Eliminating a load or a store eliminates an instruction. Furthermore, register allocation enhances the value of other optimizations, such as common subexpression elimination. Fortunately, the trend toward larger register counts in modern architectures has made register allocation simpler and more effective. Register allocation is done on both a local basis and a global basis, that is, across multiple basic blocks but within a single function. Local register allocation is usually done late in compilation, as the final code is generated. Our focus here is on the more challenging and more opportunistic global register allocation.

Modern global register allocation uses a region-based approach, where a region (sometimes called a live range) represents a section of code during which a particular variable could be allocated to a particular register. How is a region selected? The process is iterative:

1. Choose a definition (change) of a variable in a given basic block; add that block to the region.

2. Find any uses of that definition, which is a data flow analysis problem; add any basic blocks that contain such uses, as well as any basic block that the value passes through to reach a use, to the region.

3. Find any other definitions that also can affect a use found in the previous step and add the basic blocks containing those definitions, as well as the blocks the definitions pass through to reach a use, to the region.

4. Repeat steps 2 and 3 using the definitions discovered in step 3 until convergence.

The set of basic blocks found by this technique has a special property: if the designated variable is allocated to a register in all these basic blocks, then there is no need for loading and storing the variable.

Modern global register allocators start by constructing the regions for every virtual register in a function. Once the regions are constructed, the key question is how to allocate a register to each region: the challenge is that certain regions overlap and may not use the same register. Regions that do not overlap (i.e., share no common basic blocks) can share the same register. One way to represent


the interference among regions is with an interference graph, where each node represents a region, and the arcs between nodes represent that the regions have some basic blocks in common.

Once an interference graph has been constructed, the problem of allocating registers is equivalent to a famous problem called graph coloring: find a color for each node in a graph such that no two adjacent nodes have the same color. If the number of colors equals the number of registers, then coloring an interference graph is equivalent to allocating a register for each region! This insight was the initial motivation for the allocation method now known as region-based allocation, but originally called the graph-coloring approach. Figure 2.15.6 shows the flow graph representation of the while loop example after register allocation.

What happens if the graph cannot be colored using the number of registers available? The allocator must spill registers until it can complete the coloring. By doing the coloring based on a priority function that takes into account the number of memory references saved and the cost of tying up the register, the allocator attempts to avoid spilling for the most important candidates.

Spilling is equivalent to splitting up a region (or live range); if the region is split, fewer other regions will interfere with the two separate nodes representing the original region. A process of splitting regions and successive coloring is used to allow the allocation process to complete, at which point all candidates will have been allocated a register. Of course, whenever a region is split, loads and stores

Entry block (before the loop):
LI   $t0,save
LW   $t1,k
LW   $t2,i
SLL  $t3,$t2,2
ADDU $t4,$t3,$t0

Loop body block:
ADD  $t2,$t2,1
ADD  $t4,$t4,4

Loop test block:
LW   $t3,0($t4)
BEQ  $t3,$t1,startwhileloop

FIGURE 2.15.6 The control flow graph showing the representation of the while loop example after code motion and induction variable elimination and register allocation, using the MIPS register names. The number of IR statements in the inner loop has now dropped to only four from six before register allocation and ten before any global optimizations. The value of i resides in $t2 at the end of the loop and may need to be stored eventually to maintain the program semantics. If i were unused after the loop, not only could the store be avoided, but also the increment inside the loop could be eliminated completely!


Hardware/Software Interface


must be introduced to get the value from memory or to store it there. The location chosen to split a region must balance the cost of the loads and stores that must be introduced against the advantage of freeing up a register and reducing the number of interferences.

Modern register allocators are incredibly effective in using the large register counts available in modern processors. In many programs, the effectiveness of register allocation is limited not by the availability of registers but by the possibilities of aliasing that cause the compiler to be conservative in its choice of candidates.
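A greedy sketch of coloring an interference graph, using a made-up set of regions A through D (real allocators visit nodes in priority order based on spill cost; alphabetical order here keeps the illustration simple):

```python
# Greedy interference-graph coloring: assign each region (node) the lowest
# color not used by an already-colored neighbor; if no color is free among
# num_regs colors, the region is spilled.
def color_regions(interference, num_regs):
    colors, spilled = {}, []
    for node in sorted(interference):
        taken = {colors[n] for n in interference[node] if n in colors}
        free = [c for c in range(num_regs) if c not in taken]
        if free:
            colors[node] = free[0]
        else:
            spilled.append(node)
    return colors, spilled

# Hypothetical regions: A overlaps every other region; B and C also
# overlap each other; D overlaps only A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
colors, spilled = color_regions(graph, 3)
assert spilled == []            # three registers suffice for this graph
colors, spilled = color_regions(graph, 2)
assert spilled == ["C"]         # with two registers, one region must spill
```

In a real allocator the spilled region would then be split, shrinking its interference set, and coloring would be retried, which is the splitting-and-recoloring process described above.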

    Code Generation

The final steps of the compiler are code generation and assembly. Most compilers do not use a stand-alone assembler that accepts assembly language source code; to save time, they instead perform most of the same functions: filling in symbolic values and generating the binary code as the final stage of code generation.

In modern processors, code generation is reasonably straightforward, since the simple architectures make the choice of instruction relatively obvious. For more complex architectures, such as the x86, code generation is more complex since multiple IR instructions may collapse into a single machine instruction. In modern compilers, this compilation process uses pattern matching with either a tree-based pattern matcher or a pattern matcher driven by a parser.

During code generation, the final stages of machine-dependent optimization are also performed. These include some constant folding optimizations, as well as localized instruction scheduling (see Chapter 4).

    Optimization Summary

Figure 2.15.7 gives examples of typical optimizations, and the last column indicates where the optimization is performed in the gcc compiler. It is sometimes difficult to separate some of the simpler optimizations (local and processor-dependent optimizations) from transformations done in the code generator, and some optimizations are done multiple times, especially local optimizations, which may be performed before and after global optimization as well as during code generation.

Today, essentially all programming for desktop and server applications is done in high-level languages, as is most programming for embedded applications. This development means that since most instructions executed are the output of a compiler, an instruction set architecture is essentially a compiler target. With Moore's Law comes the temptation of adding sophisticated operations in an instruction set. The challenge is that they may not exactly match what the compiler needs to produce or may be so general that they aren't fast. For example, consider special loop instructions found in some computers. Suppose that instead


of decrementing by one, the compiler wanted to increment by four, or instead of branching on not equal zero, the compiler wanted to branch if the index was less than or equal to the limit. The loop instruction may be a mismatch. When faced with such objections, the instruction set designer might then generalize the operation, adding another operand to specify the increment and perhaps an option on which branch condition to use. Then the danger is that a common case, say incrementing by one, will be slower than a sequence of simple operations.

Elaboration: Some more sophisticated compilers, and many research compilers, use an analysis technique called interprocedural analysis to obtain more information about functions and how they are called. Interprocedural analysis attempts to discover what properties remain true across a function call. For example, we might discover that a function call can never change any global variables, which might be useful in optimizing a loop that calls such a function. Such information is called may-information or flow-insensitive information and can be obtained reasonably efficiently, although analyzing

Optimization name                    Explanation                                                        gcc level

High level                           At or near the source level; processor independent
  Procedure integration              Replace procedure call by procedure body                           O3

Local                                Within straight-line code
  Common subexpression elimination   Replace two instances of the same computation by single copy       O1
  Constant propagation               Replace all instances of a variable that is assigned a constant
                                     with the constant                                                  O1
  Stack height reduction             Rearrange expression tree to minimize resources needed for
                                     expression evaluation                                              O1

Global                               Across a branch
  Global common subexpression        Same as local, but this version crosses branches                   O2
    elimination
  Copy propagation                   Replace all instances of a variable A that has been assigned X
                                     (i.e., A = X) with X                                               O2
  Code motion                        Remove code from a loop that computes same value each iteration
                                     of the loop                                                        O2
  Induction variable elimination     Simplify/eliminate array addressing calculations within loops      O2

Processor dependent                  Depends on processor knowledge
  Strength reduction                 Many examples; replace multiply by a constant with shifts          O1
  Pipeline scheduling                Reorder instructions to improve pipeline performance               O1
  Branch offset optimization         Choose the shortest branch displacement that reaches target        O1

FIGURE 2.15.7 Major types of optimizations and explanation of each class. The third column shows when these occur at different levels of optimization in gcc. The GNU organization calls the three optimization levels medium (O1), full (O2), and full with integration of small procedures (O3).


a call to a function F requires analyzing all the functions that F calls, which makes the process somewhat time consuming for large programs. A more costly property to discover is that a function must always change some variable; such information is called must-information or flow-sensitive information. Recall the dictate to be conservative: may-information can never be used as must-information. Just because a function may change a variable does not mean that it must change it. It is conservative, however, to use the negation of may-information, so the compiler can rely on the fact that a function will never change a variable in optimizations around the call site of that function.

One of the most important uses of interprocedural analysis is to obtain so-called alias information. An alias occurs when two names may designate the same variable. For example, it is quite helpful to know that two pointers passed to a function may never designate the same variable. Alias information is usually flow-insensitive and must be used conservatively.

    Interpreting Java

This second part of the section is for readers interested in seeing how an object-oriented language like Java executes on a MIPS architecture. It shows the Java bytecodes used for interpretation and the MIPS code for the Java version of some of the C segments in prior sections, including Bubble Sort.

Let's quickly review the Java lingo to make sure we are all on the same page. The big idea of object-oriented programming is for programmers to think in terms of abstract objects, and operations are associated with each type of object. New types can often be thought of as refinements to existing types, and so some operations for the existing types are used by the new type without change. The hope is that the programmer thinks at a higher level, and that code can be reused more readily if the programmer implements the common operations on many different types.

This different perspective led to a different set of terms. The type of an object is a class, which is the definition of a new data type together with the operations that are defined to work on that data type. A particular object is then an instance of a class, and creating an object from a class is called instantiation. The operations in a class are called methods, which are similar to C procedures. Rather than call a procedure as in C, you invoke a method in Java. The other members of a class are fields, which correspond to variables in C. Variables inside objects are called instance fields. Rather than access a structure with a pointer, Java uses an object reference to access an object. The syntax for method invocation is x.y, where x is an object reference and y is the method name.
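As a quick illustration of this vocabulary, here is a tiny hypothetical Java class; the names Account, balance, and deposit are invented for this sketch:

```java
// A class defines a new data type together with its operations.
class Account {
    int balance;                  // an instance field (a variable inside the object)

    void deposit(int amount) {    // a method, similar to a C procedure
        balance = balance + amount;
    }
}
```

Instantiation and invocation then follow the x.y syntax: new Account() creates an instance, and if a is the object reference, a.deposit(100) invokes the method deposit on that object.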

The parent-child relationship between older and newer classes is captured by the verb extends: a child class extends (or subclasses) a parent class. The child class typically will redefine some of the methods found in the parent to match the new data type. Some methods work fine, and the child class inherits those methods.
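A minimal sketch of this parent-child vocabulary, with class and method names invented for illustration:

```java
// Child extends Parent: it inherits twice unchanged and
// redefines (overrides) describe for the new type.
class Parent {
    String describe() { return "parent"; }

    int twice(int x) { return 2 * x; }     // works fine as is, so Child inherits it
}

class Child extends Parent {
    @Override
    String describe() { return "child"; }  // redefined to match the new type
}
```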

To reduce the number of errors associated with pointers and explicit memory deallocation, Java automatically frees unused storage, using a separate garbage

object-oriented language: A programming language that is oriented around objects rather than actions, or data versus logic.


collector that frees memory when it is full. Hence, new creates a new instance of a dynamic object on the heap, but there is no free in Java. Java also requires array bounds to be checked at runtime to catch another class of errors that can occur in C programs.

    Interpretation

As mentioned before, Java programs are distributed as Java bytecodes, and the Java Virtual Machine (JVM) executes Java bytecodes. The JVM understands a binary format called the class file format. A class file is a stream of bytes for a single class, containing a table of valid methods with their bytecodes, a pool of constants that acts in part as a symbol table, and other information such as the parent class of this class.

When the JVM is first started, it looks for the class method main. To start any Java class, the JVM dynamically loads, links, and initializes a class. The JVM loads a class by first finding the binary representation of the proper class (class file) and then creating a class from that binary representation. Linking combines the class into the runtime state of the JVM so that it can be executed. Finally, it executes the class initialization method that is included in every class.

Figure 2.15.8 shows Java bytecodes and their corresponding MIPS instructions, illustrating five major differences between the two:

1. To simplify compilation, Java uses a stack instead of registers for operands. Operands are pushed on the stack, operated on, and then popped off the stack.

2. The designers of the JVM were concerned about code size, so bytecodes vary in length between one and five bytes, versus the 4-byte, fixed-size MIPS instructions. To save space, the JVM even has redundant instructions of different lengths whose only difference is the size of the immediate. This decision illustrates a code size variation of our third design principle: make the common case small.

3. The JVM has safety features embedded in the architecture. For example, array data transfer instructions check to be sure that the first operand is a reference and that the second index operand is within bounds.

4. To allow garbage collectors to find all live pointers, the JVM uses different instructions to operate on addresses versus integers so that the JVM can know what operands contain addresses. MIPS generally lumps integers and addresses together.

5. Finally, unlike MIPS, there are Java-specific instructions that perform complex operations, like allocating an array on the heap or invoking a method.
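Difference 1 can be sketched with a toy stack evaluator; the class and the string encoding of the "bytecodes" below are invented for illustration, not real JVM code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyStack {
    // Evaluate a tiny bytecode-like stream: a number means push it;
    // "+" and "-" pop the top of stack (TOS), combine it with the
    // next of stack (NOS), and push the result, as iadd/isub do.
    static int eval(String[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String op : code) {
            switch (op) {
                case "+": { int tos = stack.pop(); stack.push(stack.pop() + tos); break; }
                case "-": { int tos = stack.pop(); stack.push(stack.pop() - tos); break; }
                default:  stack.push(Integer.parseInt(op));  // an immediate push
            }
        }
        return stack.pop();  // the result is left on top of the stack
    }
}
```

For example, the expression 7 + 3 - 2 becomes the postfix stream 7, 3, +, 2, -, with no register names anywhere.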


Category           | Operation                        | Java bytecode             | Size (bits) | MIPS instr. | Meaning
Arithmetic         | add                              | iadd                      | 8           | add         | NOS=TOS+NOS; pop
                   | subtract                         | isub                      | 8           | sub         | NOS=NOS-TOS; pop
                   | increment                        | iinc I8a I8b              | 24          | addi        | Frame[I8a]=Frame[I8a]+I8b
Data transfer      | load local integer/address       | iload I8/aload I8         | 16          | lw          | TOS=Frame[I8]
                   | load local integer/address       | iload_/aload_{0,1,2,3}    | 8           | lw          | TOS=Frame[{0,1,2,3}]
                   | store local integer/address      | istore I8/astore I8       | 16          | sw          | Frame[I8]=TOS; pop
                   | load integer/address from array  | iaload/aaload             | 8           | lw          | NOS=*NOS[TOS]; pop
                   | store integer/address into array | iastore/aastore           | 8           | sw          | *NNOS[NOS]=TOS; pop2
                   | load half from array             | saload                    | 8           | lh          | NOS=*NOS[TOS]; pop
                   | store half into array            | sastore                   | 8           | sh          | *NNOS[NOS]=TOS; pop2
                   | load byte from array             | baload                    | 8           | lb          | NOS=*NOS[TOS]; pop
                   | store byte into array            | bastore                   | 8           | sb          | *NNOS[NOS]=TOS; pop2
                   | load immediate                   | bipush I8, sipush I16     | 16, 24      | addi        | push; TOS=I8 or I16
                   | load immediate                   | iconst_{-1,0,1,2,3,4,5}   | 8           | addi        | push; TOS={-1,0,1,2,3,4,5}
Logical            | and                              | iand                      | 8           | and         | NOS=TOS&NOS; pop
                   | or                               | ior                       | 8           | or          | NOS=TOS|NOS; pop
                   | shift right                      | iushr                     | 8           | srl         | NOS=NOS>>TOS; pop
Conditional branch | branch on equal                  | if_icompeq I16            | 24          | beq         | if TOS == NOS, go to I16; pop2
                   | branch on not equal              | if_icompne I16            | 24          | bne         | if TOS != NOS, go to I16; pop2
                   | compare                          | if_icomp{lt,le,gt,ge} I16 | 24          | slt         | if TOS {<,<=,>,>=} NOS, go to I16; pop2

FIGURE 2.15.8 Java bytecodes and their corresponding MIPS instructions.


Compiling a while Loop in Java Using Bytecodes

Compile the while loop from page 92, this time using Java bytecodes:

while (save[i] == k)
    i += 1;

Assume that i, k, and save are the first three local variables. Show the addresses of the bytecodes. The MIPS version of the C loop in Figure 2.15.3 took six instructions and twenty-four bytes. How big is the bytecode version?

The first step is to put the array reference in save on the stack:

    0 aload_3 # Push local variable 3 (save[]) onto stack

This 1-byte instruction informs the JVM that an address in local variable 3 is being put on the stack. The 0 on the left of this instruction is the byte address of this first instruction; bytecodes for each method start at 0. The next step is to put the index on the stack:

    1 iload_1 # Push local variable 1 (i) onto stack

Like the prior instruction, this 1-byte instruction is a short version of a more general instruction that takes 2 bytes to load a local variable onto the stack. The next instruction is to get the value from the array element:

    2 iaload # Put array element (save[i]) onto stack

This 1-byte instruction checks the prior two operands, pops them off the stack, and then puts the value of the desired array element onto the new top of the stack. Next, we place k on the stack:

    3 iload_2 # Push local variable 2 (k) onto stack

We are now ready for the while test:

4 if_icompne, Exit # Compare and exit if not equal

This 3-byte instruction compares the top two elements of the stack, pops them off the stack, and branches if they are not equal. We are finally ready for the body of the loop:

    7 iinc, 1, 1 # Increment local variable 1 by 1 (i+=1)



This unusual 3-byte instruction increments a local variable by 1 without using the operand stack, an optimization that again saves space. Finally, we return to the top of the loop with a 3-byte jump:

10 goto 0 # Go to top of loop (byte address 0)

Thus, the bytecode version takes seven instructions and thirteen bytes, almost half the size of the MIPS C code. (As before, we can optimize this code to jump less.)
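To see the stack discipline in action, the sketch below interprets exactly this seven-instruction sequence in plain Java (the class is invented for illustration; byte addresses serve as the program counter):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class WhileLoopBytecodes {
    // Locals: 1 = i, 2 = k, 3 = save. Returns the final value of i.
    static int run(int[] save, int i, int k) {
        Deque<Object> stack = new ArrayDeque<>();
        int pc = 0;                                    // byte address of next bytecode
        while (true) {
            switch (pc) {
                case 0:  stack.push(save); pc = 1; break;   // aload_3
                case 1:  stack.push(i);    pc = 2; break;   // iload_1
                case 2: {                                   // iaload
                    int idx   = (Integer) stack.pop();
                    int[] arr = (int[]) stack.pop();
                    stack.push(arr[idx]); pc = 3; break;
                }
                case 3:  stack.push(k);    pc = 4; break;   // iload_2
                case 4: {                                   // if_icompne, Exit
                    int tos = (Integer) stack.pop();
                    int nos = (Integer) stack.pop();
                    pc = (nos != tos) ? 13 : 7; break;
                }
                case 7:  i += 1; pc = 10; break;            // iinc, 1, 1
                case 10: pc = 0; break;                     // goto 0
                case 13: return i;                          // Exit
                default: throw new IllegalStateException("bad pc " + pc);
            }
        }
    }
}
```

Running it on save = {5, 5, 5, 7} with i = 0 and k = 5 stops at i = 3, the first element not equal to k.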

    Compiling for Java

Since Java is derived from C and Java has the same built-in types as C, the assignment statement examples in Sections 2.2 to 2.6 of Chapter 2 are the same in Java as they are in C. The same is true for the if statement example in Section 2.7.

The Java version of the while loop is different, however. The designers of C leave it up to the programmers to be sure that their code does not exceed the array bounds. The designers of Java wanted to catch array bound bugs, and thus require the compiler to check for such violations. To check bounds, the compiler needs to know what they are. Java includes an extra word in every array that holds the upper bound. The lower bound is defined as 0.

Compiling a while Loop in Java

Modify the MIPS code for the while loop on page 94 to include the array bounds checks that are required by Java. Assume that the length of the array is located just before the first element of the array.

Let's assume that Java arrays reserve the first two words of arrays before the data starts. We'll see the use of the first word soon, but the second word has the array length. Before we enter the loop, let's load the length of the array into a temporary register:

    lw $t2,4($s6) # Temp reg $t2 = length of array save

Before we multiply i by 4, we must test to see if it's less than 0 or greater than the last element of the array. The first step is to check if i is less than 0:

    Loop: slt $t0,$s3,$zero # Temp reg $t0 = 1 if i < 0

Register $t0 is set to 1 if i is less than 0. Hence, a branch to see if register $t0 is not equal to zero will give us the effect of branching if i is less than 0. This pair of instructions, slt and bne, implements branch on less than.



Register $zero always contains 0, so this final test is accomplished using the bne instruction and comparing register $t0 to register $zero:

bne $t0,$zero,IndexOutOfBounds  # if i < 0, goto Error
slt $t0,$s3,$t2                 # Temp reg $t0 = 0 if i >= length
beq $t0,$zero,IndexOutOfBounds  # if i >= length, goto Error

Note that these two instructions implement branch on greater than or equal to. The next two lines of the MIPS while loop are unchanged from the C version:

sll $t1,$s3,2    # Temp reg $t1 = 4 * i
add $t1,$t1,$s6  # $t1 = address of save[i]

We need to account for the first 8 bytes that are reserved in Java. We do that by changing the address field of the load from 0 to 8:

    lw $t0,8($t1) # Temp reg $t0 = save[i]

The rest of the MIPS code from the C while loop is fine as is:

     bne  $t0,$s5,Exit  # go to Exit if save[i] != k
     addi $s3,$s3,1     # i = i + 1
     j    Loop          # go to Loop
Exit:

(See the exercises for an optimization of this sequence.)

    Invoking Methods in Java

The compiler picks the appropriate method depending on the type of the object. In a few cases, it is unambiguous, and the method can be invoked with no more overhead than a C procedure. In general, however, the compiler knows only that a given variable contains a pointer to an object that belongs to some subtype of a general class. Since it doesn't know at compile time which subclass the object is, and thus which method should be invoked, the compiler will generate code that first tests to be sure the pointer isn't null and then uses the code to load a pointer to a table with all the legal methods for that type. The first word of the object has the method table address, which is why Java arrays reserve two words. Let's say it's using the fifth method that was declared for that class. (The method order is the same for all subclasses.) The compiler then takes the fifth address from that table and invokes the method at that address.
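At the Java source level, the machinery above is what a simple virtual call compiles into. In the invented sketch below, the compiler knows only that s refers to some subtype of Shape, so the sides method actually run is found through the object's method table at runtime:

```java
class Shape {
    int sides() { return 0; }
}

class Triangle extends Shape {
    @Override int sides() { return 3; }
}

class Square extends Shape {
    @Override int sides() { return 4; }
}

class Dispatch {
    // The static type of s is Shape; which sides() runs depends on the
    // object s actually references, not on the declared type.
    static int count(Shape s) { return s.sides(); }
}
```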


public class sort {
    public static void sort (int[] v) {
        for (int i = 0; i < v.length; i += 1) {
            for (int j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
                swap(v, j);
            }
        }
    }

    protected static void swap(int[] v, int k) {
        int temp = v[k];
        v[k] = v[k + 1];
        v[k + 1] = temp;
    }
}

FIGURE 2.15.9 An initial Java procedure that performs a sort on the array v. Changes from Figures 2.24 and 2.26 are highlighted.

The cost of object orientation in general is that method invocation includes 1) a conditional branch to be sure that the pointer to the object is valid; 2) a load to get the address of the table of available methods; 3) another load to get the address of the proper method; 4) placing a return address into the return register; and finally 5) a jump register to invoke the method. The next subsection gives a concrete example of method invocation.

    A Sort Example in Java

Figure 2.15.9 shows the Java version of exchange sort. A simple difference is that there is no need to pass the length of the array as a separate parameter, since Java arrays include their length: v.length denotes the length of v.

A more significant difference is that Java methods are prepended with keywords not found in the C procedures. The sort method is declared public static while swap is declared protected static. Public means that sort can be invoked from any other method, while protected means swap can only be called by other methods within the same package and from methods within derived classes. A static method is another name for a class method: methods that perform classwide operations and do not apply to an individual object. Static methods are essentially the same as C procedures.

This straightforward translation from C into static methods means there is no ambiguity on method invocation, and so it can be just as efficient as C. It also is limited to sorting integers, which means a different sort has to be written for each data type.

To demonstrate the object orientation of Java, Figure 2.15.10 shows the new version with the changes highlighted. First, we declare v to be of the type Comparable and replace v[j] > v[j + 1] with an invocation of compareTo. By changing v to this new class, we can use this code to sort many data types.

public: A Java keyword that allows a method to be invoked by any other method.

protected: A Java keyword that restricts invocation of a method to other methods in that package.

package: Basically a directory that contains a group of related classes.

static method: A method that applies to the whole class rather than an individual object. Unrelated to static in C.


public class sort {
    public static void sort (Comparable[] v) {
        for (int i = 0; i < v.length; i += 1) {
            for (int j = i - 1; j >= 0 && v[j].compareTo(v[j + 1]) > 0; j -= 1) {
                swap(v, j);
            }
        }
    }

    protected static void swap(Comparable[] v, int k) {
        Comparable temp = v[k];
        v[k] = v[k + 1];
        v[k + 1] = temp;
    }
}

public class Comparable {
    public int compareTo (int x)
    { return value - x; }
    public int value;
}

FIGURE 2.15.10 A revised Java procedure that sorts on the array v that can take on more types. Changes from Figure 2.15.9 are highlighted.

The method compareTo compares two elements and returns a value greater than 0 if the object is larger than the parameter, 0 if they are equal, and a negative number if the object is smaller than the parameter. These two changes generalize the code so it can sort integers, characters, strings, and so on, if there are subclasses of Comparable with each of these types and if there is a version of compareTo for each type. For pedagogic purposes, we redefine the class Comparable and the method compareTo here to compare integers. The actual definition of Comparable in the Java library is considerably different.
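The sign convention can be checked directly. The class below mirrors the figure's pedagogic Comparable but is renamed SimpleComparable here to avoid colliding with java.lang.Comparable:

```java
class SimpleComparable {
    int value;

    SimpleComparable(int v) { value = v; }

    // Positive if the object is larger than the parameter,
    // zero if they are equal, negative if it is smaller.
    int compareTo(int x) { return value - x; }
}
```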

Starting from the MIPS code that we generated for C, we show what changes we made to create the MIPS code for Java.

For swap, the only significant differences are that we must check to be sure the object reference is not null and that each array reference is within bounds. The first test checks that the address in the first parameter is not zero:

swap: beq $a0,$zero,NullPointer # if $a0==0, goto Error

Next, we load the length of v into a register and check that index k is OK.

lw  $t2,4($a0)    # Temp reg $t2 = length of array v
slt $t0,$a1,$zero # Temp reg $t0 = 1 if k < 0


bne $t0,$zero,IndexOutOfBounds # if k < 0, goto Error
slt $t0,$a1,$t2                # Temp reg $t0 = 0 if k >= length
beq $t0,$zero,IndexOutOfBounds # if k >= length, goto Error

This check is followed by a check that k+1 is within bounds:

addi $t1,$a1,1                  # Temp reg $t1 = k+1
slt  $t0,$t1,$zero              # Temp reg $t0 = 1 if k+1 < 0
bne  $t0,$zero,IndexOutOfBounds # if k+1 < 0, goto Error
slt  $t0,$t1,$t2                # Temp reg $t0 = 0 if k+1 >= length
beq  $t0,$zero,IndexOutOfBounds # if k+1 >= length, goto Error

Figure 2.15.11 highlights the extra MIPS instructions in swap that a Java compiler might produce. We again must adjust the offset in the load and store to account for two words reserved for the method table and length.

Figure 2.15.12 shows the method body for those new instructions for sort. (We can take the saving, restoring, and return from Figure 2.27.) The first test is again to make sure the pointer to v is not null:

beq $a0,$zero,NullPointer # if $a0==0, goto Error

Next, we load the length of the array (we use register $s3 to keep it similar to the code for the C version of sort):

lw $s3,4($a0) # $s3 = length of array v

Bounds check:
swap: beq  $a0,$zero,NullPointer       # if $a0==0, goto Error
      lw   $t2,4($a0)                  # Temp reg $t2 = length of array v
      slt  $t0,$a1,$zero               # Temp reg $t0 = 1 if k < 0
      bne  $t0,$zero,IndexOutOfBounds  # if k < 0, goto Error
      slt  $t0,$a1,$t2                 # Temp reg $t0 = 0 if k >= length
      beq  $t0,$zero,IndexOutOfBounds  # if k >= length, goto Error
      addi $t1,$a1,1                   # Temp reg $t1 = k+1
      slt  $t0,$t1,$zero               # Temp reg $t0 = 1 if k+1 < 0
      bne  $t0,$zero,IndexOutOfBounds  # if k+1 < 0, goto Error
      slt  $t0,$t1,$t2                 # Temp reg $t0 = 0 if k+1 >= length
      beq  $t0,$zero,IndexOutOfBounds  # if k+1 >= length, goto Error

Method body:
      sll  $t1,$a1,2                   # reg $t1 = k * 4
      add  $t1,$a0,$t1                 # reg $t1 = v + (k * 4)
                                       #   reg $t1 has the address of v[k]
      lw   $t0,8($t1)                  # reg $t0 (temp) = v[k]
      lw   $t2,12($t1)                 # reg $t2 = v[k + 1]
                                       #   refers to next element of v
      sw   $t2,8($t1)                  # v[k] = reg $t2
      sw   $t0,12($t1)                 # v[k+1] = reg $t0 (temp)

Procedure return:
      jr   $ra                         # return to calling routine

FIGURE 2.15.11 MIPS assembly code of the procedure swap in Figure 2.24.


Now we must ensure that the index is within bounds. Since the first test of the inner loop is to test if j is negative, we can skip that initial bound test. That leaves the test for too big:

slt $t0,$s1,$s3                # Temp reg $t0 = 0 if j >= length
beq $t0,$zero,IndexOutOfBounds # if j >= length, goto Error

Method body:

Move parameters:
         move $s2,$a0                    # copy parameter $a0 into $s2 (save $a0)
Test ptr null:
         beq  $a0,$zero,NullPointer      # if $a0==0, goto Error
Get array length:
         lw   $s3,4($a0)                 # $s3 = length of array v
Outer loop:
         move $s0,$zero                  # i = 0
for1tst: slt  $t0,$s0,$s3                # reg $t0 = 0 if $s0 >= $s3 (i >= n)
         beq  $t0,$zero,exit1            # go to exit1 if $s0 >= $s3 (i >= n)
Inner loop start:
         addi $s1,$s0,-1                 # j = i - 1
for2tst: slti $t0,$s1,0                  # reg $t0 = 1 if $s1 < 0 (j < 0)
         bne  $t0,$zero,exit2            # go to exit2 if $s1 < 0 (j < 0)
Test if j too big:
         slt  $t0,$s1,$s3                # Temp reg $t0 = 0 if j >= length
         beq  $t0,$zero,IndexOutOfBounds # if j >= length, goto Error
Get v[j]:
         sll  $t1,$s1,2                  # reg $t1 = j * 4
         add  $t2,$s2,$t1                # reg $t2 = v + (j * 4)
         lw   $t3,8($t2)                 # reg $t3 = v[j]
Test if j+1 < 0 or if j+1 too big:
         addi $t1,$s1,1                  # Temp reg $t1 = j+1
         slt  $t0,$t1,$zero              # Temp reg $t0 = 1 if j+1 < 0
         bne  $t0,$zero,IndexOutOfBounds # if j+1 < 0, goto Error
         slt  $t0,$t1,$s3                # Temp reg $t0 = 0 if j+1 >= length
         beq  $t0,$zero,IndexOutOfBounds # if j+1 >= length, goto Error
Get v[j+1]:
         lw   $t4,12($t2)                # reg $t4 = v[j + 1]
Load method table:
         lw   $t5,0($a0)                 # $t5 = address of method table
Get method address:
         lw   $t5,8($t5)                 # $t5 = address of third method
Pass parameters:
         move $a0,$t3                    # 1st parameter of compareTo is v[j]
         move $a1,$t4                    # 2nd param. of compareTo is v[j+1]
Set return addr:
         la   $ra,L1                     # load return address
Call indirectly:
         jr   $t5                        # call code for compareTo
Test if should skip swap:
L1:      slt  $t0,$zero,$v0              # reg $t0 = 0 if 0 >= $v0
         beq  $t0,$zero,exit2            # go to exit2 if $t4 >= $t3
Pass parameters and call swap:
         move $a0,$s2                    # 1st parameter of swap is v
         move $a1,$s1                    # 2nd parameter of swap is j
         jal  swap                       # swap code shown in Figure 2.34
Inner loop end:
         addi $s1,$s1,-1                 # j -= 1
         j    for2tst                    # jump to test of inner loop
Outer loop:
exit2:   addi $s0,$s0,1                  # i += 1
         j    for1tst                    # jump to test of outer loop

FIGURE 2.15.12 MIPS assembly version of the method body of the Java version of sort. The new code is highlighted in this figure. We must still add the code to save and restore registers and the return from the MIPS code found in Figure 2.27. To keep the code similar to that figure, we load v.length into $s3 instead of into a temporary register. To reduce the number of lines of code, we make the simplifying assumption that compareTo is a leaf procedure and we do not need to push registers to be saved on the stack.


The code for testing j + 1 is quite similar to the code for checking k + 1 in swap, so we skip it here.

The key difference is the invocation of compareTo. We first load the address of the table of legal methods, which we assume is two words before the beginning of the array:

    lw $t5,0($a0) # $t5 = address of method table

Given the address of the method table for this object, we then get the desired method. Let's assume compareTo is the third method in the Comparable class. To pick the address of the third method, we load that address into a temporary register:

    lw $t5,8($t5) # $t5 = address of third method

We are now ready to call compareTo. The next step is to save the necessary registers on the stack. Fortunately, we don't need the temporary registers or argument registers after the method invocation, so there is nothing to save. Thus, we simply pass the parameters for compareTo:

move $a0, $t3 # 1st parameter of compareTo is v[j]
move $a1, $t4 # 2nd parameter of compareTo is v[j+1]

Since we are using a jump register to invoke compareTo, we need to pass the return address explicitly. We use the pseudoinstruction load address (la) and the label where we want to return, and then do the indirect jump:

la $ra,L1 # load return address
jr $t5    # jump to code for compareTo

The method returns, with $v0 determining which of the two elements is larger. If $v0 > 0, then v[j] > v[j+1], and we need to swap. Thus, to skip the swap, we need to test if $v0 <= 0, which is the same as 0 >= $v0. We also need to include the label for the return address:

L1: slt $t0, $zero, $v0   # reg $t0 = 0 if 0 >= $v0
    beq $t0, $zero, exit2 # go to exit2 if v[j+1] >= v[j]

The MIPS code for compareTo is left as an exercise.

Hardware/Software Interface

The main changes for the Java versions of sort and swap are testing for null object references and index out-of-bounds errors, and the extra method invocation to give a more general compare. This method invocation is more expensive than a C procedure call, since it requires a load, a conditional branch, a pair of chained loads, and an indirect jump. As we see in Chapter 4, dependent loads and indirect jumps can be relatively slow on modern processors. The increasing popularity


of Java suggests that many programmers today are willing to leverage the high performance of modern processors to pay for error checking and code reuse.

Elaboration: Although we test each reference to j and j + 1 to be sure that these indices are within bounds, an assembly language programmer might look at the code and reason as follows:

1. The inner for loop is only executed if j >= 0, and since j + 1 > j, there is no need to test j + 1 to see if it is less than 0.

2. Since i takes on the values 0, 1, 2, ..., (data.length - 1) and since j takes on the values i - 1, i - 2, ..., 2, 1, 0, there is no need to test if j >= data.length since the largest value j can be is data.length - 2.

3. Following the same reasoning, there is no need to test whether j + 1 >= data.length since the largest value of j + 1 is data.length - 1.

There are coding tricks in Chapter 2 and superscalar execution in Chapter 4 that lower the effective cost of such bounds checking, but only highly optimizing compilers can reason this way. Note that if the compiler inlined the swap method into sort, many checks would be unnecessary.

Elaboration: Look carefully at the code for swap in Figure 2.15.11. See anything wrong in the code, or at least in the explanation of how the code works? It implicitly assumes that each Comparable element in v is 4 bytes long. Surely, you need much more than 4 bytes for a complex subclass of Comparable, which could contain any number of fields. Surprisingly, this code does work, because an important property of Java semantics forces the use of the same, small representation for all variables, fields, and array elements that belong to Comparable or its subclasses.

Java types are divided into primitive types (the predefined types for numbers, characters, and Booleans) and reference types (the built-in classes like String, user-defined classes, and arrays). Values of reference types are pointers (also called references) to anonymous objects that are themselves allocated in the heap. For the programmer, this means that assigning one variable to another does not create a new object, but instead makes both variables refer to the same object. Because these objects are anonymous, and programs therefore have no way to refer to them directly, a program must use indirection through a variable to read or write any object's fields (variables). Thus, because the data structure allocated for the array v consists entirely of pointers, it is safe to assume they are all the same size, and the same swapping code works for all of Comparable's subtypes.

To write sorting and swapping functions for arrays of primitive types requires that we write new versions of the functions, one for each type. This replication is for two reasons. First, primitive type values do not include the references to dispatching tables that we used on Comparables to determine at runtime how to compare values. Second, primitive values come in different sizes: 1, 2, 4, or 8 bytes.

The pervasive use of pointers in Java is elegant in its consistency, with the penalty being a level of indirection and a requirement that objects be allocated on the heap. Furthermore, in any language where the lifetimes of the heap-allocated anonymous


objects are independent of the lifetimes of the named variables, fields, and array elements that reference them, programmers must deal with the problem of deciding when it is safe to deallocate heap-allocated storage. Java's designers chose to use garbage collection. Of course, use of garbage collection rather than explicit user memory management also improves program safety.

C++ provides an interesting contrast. Although programmers can write essentially the same pointer-manipulating solution in C++, there is another option. In C++, programmers can elect to forgo the level of indirection and directly manipulate an array of objects, rather than an array of pointers to those objects. To do so, C++ programmers would typically use the template capability, which allows a class or function to be parameterized by the type of data on which it acts. Templates, however, are compiled using the equivalent of macro expansion. That is, if we declared an instance of sort capable of sorting types X and Y, C++ would create two copies of the code for the class: one for sort<X> and one for sort<Y>, each specialized accordingly. This solution increases code size in exchange for making comparison faster (since the function calls would not be indirect, and might even be subject to inline expansion). Of course, the speed advantage would be canceled if swapping the objects required moving large amounts of data instead of just single pointers. As always, the best design depends on the details of the problem.
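The contrast can be made concrete with Java generics: the single method below (a sketch; the class name is invented) sorts any array of reference types whose elements implement compareTo, with one copy of the code, because the array holds same-size references and the comparison dispatches indirectly:

```java
class GenericSort {
    static <T extends Comparable<T>> void sort(T[] v) {
        for (int i = 0; i < v.length; i += 1) {
            for (int j = i - 1; j >= 0 && v[j].compareTo(v[j + 1]) > 0; j -= 1) {
                T temp = v[j];       // swapping moves references, not objects
                v[j] = v[j + 1];
                v[j + 1] = temp;
            }
        }
    }
}
```

A C++ template version would instead be stamped out once per element type at compile time, trading code size for direct calls.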