2.15 - Compiling C and Interpreting Java
TRANSCRIPT
8/13/2019 2.15 - Compiling c and Interpreting Java
Advanced Material: Compiling C and Interpreting Java

This section gives a brief overview of how the C compiler works and how Java is executed. Because the compiler will significantly affect the performance of a computer, understanding compiler technology today is critical to understanding performance. Keep in mind that the subject of compiler construction is usually taught in a one- or two-semester course, so our introduction will necessarily only touch on the basics.

The second part of this section, starting on page 2.15-15, is for readers interested in seeing how an object-oriented language like Java executes on the MIPS architecture. It shows the Java bytecodes used for interpretation and the MIPS code for the Java version of some of the C segments in prior sections, including Bubble Sort. It covers both the Java virtual machine and just-in-time (JIT) compilers.
Compiling C
This first part of the section introduces the internal anatomy of a compiler. To start, Figure 2.15.1 shows the structure of recent compilers, and we describe the optimizations in the order of the passes of that structure.
Front end per language
  Dependencies: language dependent; machine independent
  Function: transform the language to the common intermediate form

High-level optimizations
  Dependencies: somewhat language dependent; largely machine independent
  Function: for example, loop transformations and procedure inlining (also called procedure integration)

Global optimizer
  Dependencies: small language dependencies; machine dependencies slight (e.g., register counts/types)
  Function: including global and local optimizations and register allocation

Code generator
  Dependencies: highly machine dependent; language independent
  Function: detailed instruction selection and machine-dependent optimizations; may include or be followed by assembler

The passes communicate through the intermediate representation.

FIGURE 2.15.1 The structure of a modern optimizing compiler consists of a number of passes or phases. Logically, each pass can be thought of as running to completion before the next occurs. In practice, some passes may handle one procedure at a time, essentially interleaving with another pass.
To illustrate the concepts in this part of this section, we will use the C version of a while loop from page 92:
while (save[i] == k)
i += 1;
The Front End
The function of the front end is to read in a source program; check the syntax and semantics; and translate the source program to an intermediate form that interprets most of the language-specific operation of the program. As we will see, intermediate forms are usually simple, and some are in fact similar to the Java bytecodes (see Figure 2.15.8).

The front end is usually broken into four separate functions:

1. Scanning reads in individual characters and creates a string of tokens. Examples of tokens are reserved words, names, operators, and punctuation symbols. In the above example, the token sequence is while, (, save, [, i, ], ==, k, ), i, +=, 1. A word like while is recognized as a reserved word in C, but save, i, and k are recognized as names, and 1 is recognized as a number.

2. Parsing takes the token stream, ensures the syntax is correct, and produces an abstract syntax tree, which is a representation of the syntactic structure of the program. Figure 2.15.2 shows what the abstract syntax tree might look like for this program fragment.

3. Semantic analysis takes the abstract syntax tree and checks the program for semantic correctness. Semantic checks normally ensure that variables and types are properly declared and that the types of operators and objects match, a step called type checking. During this process, a symbol table representing all the named objects (classes, variables, and functions) is usually created and used to type-check the program.

4. Generation of the intermediate representation (IR) takes the symbol table and the abstract syntax tree and generates the intermediate representation that is the output of the front end. Intermediate representations usually use simple operations on a small set of primitive types, such as integers, characters, and reals. Java bytecodes represent one type of intermediate form. In modern compilers, the most common intermediate form looks much like the MIPS instruction set but with an infinite number of virtual registers; later, we describe how to map these virtual registers to a finite set of real registers. Figure 2.15.3 shows how our example might be represented in such an intermediate form. We capitalize the MIPS instructions in this section when they represent IR forms.

The intermediate form specifies the functionality of the program in a manner independent of the original source. After this front end has created the intermediate form, the remaining passes are largely language independent.
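The scanning step can be sketched in a few lines of C. This toy scanner is only an illustration (the token kinds and the function name are invented for this sketch); a real scanner would also classify reserved words such as while, merge multi-character operators such as +=, and track source positions:

```c
#include <ctype.h>

/* Token kinds for a toy scanner; a real front end would also have a
   kind for reserved words such as "while". */
enum kind { TOK_NAME, TOK_NUMBER, TOK_PUNCT };

/* Scan one token starting at *s, advance *s past it, and return its
   kind.  Multi-character operators such as "+=" come out of this
   sketch as two punctuation tokens. */
enum kind next_token(const char **s) {
    while (**s == ' ')
        (*s)++;                                /* skip blanks */
    if (isalpha((unsigned char)**s)) {         /* name: letters/digits */
        while (isalnum((unsigned char)**s))
            (*s)++;
        return TOK_NAME;
    }
    if (isdigit((unsigned char)**s)) {         /* number: digit run */
        while (isdigit((unsigned char)**s))
            (*s)++;
        return TOK_NUMBER;
    }
    (*s)++;                                    /* single punctuation char */
    return TOK_PUNCT;
}
```

Running it over the fragment save [ i ] == k yields the token kinds name, punctuation, name, punctuation, punctuation, punctuation, name, matching the token sequence described in step 1.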
[Figure 2.15.2 shows the abstract syntax tree: the root is a while statement with two children, the condition and the statement body. The condition expression is the comparison of an array access (the identifier save indexed by the identifier i) with the identifier k; the statement body is an assignment whose left-hand side is the identifier i and whose expression contains the number 1.]

FIGURE 2.15.2 An abstract syntax tree for the while example. The roots of the tree consist of the informational tokens such as numbers and names. Long chains of straight-line descendents are often omitted in constructing the tree.
High-Level Optimizations

High-level optimizations are transformations that are done at something close to the source level.

The most common high-level transformation is probably procedure inlining, which replaces a call to a function by the body of the function, substituting the caller's arguments for the procedure's parameters. Other high-level optimizations involve loop transformations that can reduce loop overhead, improve memory access, and exploit the hardware more effectively. For example, in loops that execute many iterations, such as those traditionally controlled by a for statement, the optimization of loop-unrolling is often useful. Loop-unrolling involves taking a loop, replicating the body multiple times, and executing the transformed loop fewer times. Loop-unrolling reduces the loop overhead and provides opportunities for many other optimizations. Other types of high-level transformations include

loop-unrolling: A technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together.
# comments are written like this--source code often included
# while (save[i] == k)
loop: LI   R1,save         # loads the starting address of save into R1
      LW   R2,i
      MULT R3,R2,4         # Multiply R2 by 4
      ADD  R4,R3,R1
      LW   R5,0(R4)        # load save[i]
      LW   R6,k
      BNE  R5,R6,endwhileloop
# i += 1
      LW   R6,i
      ADD  R7,R6,1         # increment
      SW   R7,i
      branch loop          # next iteration
endwhileloop:

FIGURE 2.15.3 The while loop example is shown using a typical intermediate representation. In practice, the names save, i, and k would be replaced by some sort of address, such as a reference to either the local stack pointer or a global pointer, and an offset, similar to the way save[i] is accessed. Note that the format of the MIPS instructions is different, because they are intermediate representations here: the operations are capitalized and the registers use RXX notation.
sophisticated loop transformations such as interchanging nested loops and blocking loops to obtain better memory behavior; see Chapter 5 for examples.
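To make loop-unrolling concrete, here is a hand-unrolled C sketch (the function names are invented for the illustration). The unrolled version performs the same additions but pays the loop-bound test and increment only once per four elements, and needs a cleanup loop when the trip count is not a multiple of four:

```c
/* Sum an array the straightforward way: one add plus one
   loop-overhead test and increment per element. */
int sum_simple(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same loop unrolled by a factor of 4: the body is replicated,
   so the bound test and increment run one-quarter as often, and the
   straight-line body gives the scheduler more to work with. */
int sum_unrolled(const int *a, int n) {
    int s = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)       /* cleanup for the remaining 0-3 elements */
        s += a[i];
    return s;
}
```

A compiler performs the same rewrite on the IR rather than on the source, and must generate the cleanup code automatically.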
Local and Global Optimizations

Within the pass dedicated to local and global optimization, three classes of optimizations are performed:

1. Local optimization works within a single basic block. A local optimization pass is often run as a precursor and successor to global optimization to clean up the code before and after global optimization.

2. Global optimization works across multiple basic blocks; we will see an example of this shortly.

3. Global register allocation allocates variables to registers for regions of the code. Register allocation is crucial to getting good performance in modern processors.

Several optimizations are performed both locally and globally, including common subexpression elimination, constant propagation, copy propagation, dead store elimination, and strength reduction. Let's look at some simple examples of these optimizations.
Common subexpression elimination finds multiple instances of the same expression and replaces the second one by a reference to the first. Consider, for example, a code segment to add 4 to an array element:

x[i] = x[i] + 4

The address calculation for x[i] occurs twice and is identical since neither the starting address of x nor the value of i changes. Thus, the calculation can be reused. Let's look at the intermediate code for this fragment, since it allows several other optimizations to be performed. The unoptimized intermediate code is shown first. After it is the optimized code, using common subexpression elimination to replace the second address calculation with the first. Note that the register allocation has not yet occurred, so the compiler is using virtual register numbers like R100 here.

# x[i] + 4
li   R100,x
lw   R101,i
mult R102,R101,4
add  R103,R100,R102
lw   R104,0(R103)
add  R105,R104,4
# x[i] =
li   R106,x
lw   R107,i
mult R108,R107,4
add  R109,R106,R108
sw   R105,0(R109)

# x[i] + 4
li   R100,x
lw   R101,i
mult R102,R101,4
add  R103,R100,R102
lw   R104,0(R103)
# value of x[i] is in R104
add  R105,R104,4
# x[i] =
sw   R105,0(R103)
If the same optimization were possible across two basic blocks, it would then be an instance of global common subexpression elimination.

Let's consider some of the other optimizations:

Strength reduction replaces complex operations by simpler ones and can be applied to this code segment, replacing the MULT by a shift left.

Constant propagation and its sibling constant folding find constants in code and propagate them, collapsing constant values whenever possible.

Copy propagation propagates values that are simple copies, eliminating the need to reload values and possibly enabling other optimizations, such as common subexpression elimination.

Dead store elimination finds stores to values that are not used again and eliminates the store; its cousin is dead code elimination, which finds unused code (code that cannot affect the result of the program) and eliminates it. With the heavy use of macros, templates, and similar techniques designed to reuse code in high-level languages, dead code occurs surprisingly often.
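The combined effect of these scalar optimizations can be seen in a small C sketch. Both functions below are hypothetical before/after illustrations (the names are invented): the second body is what a compiler could legally produce from the first by the transformations just listed.

```c
/* Before optimization, written naively. */
int before(int x) {
    int c = 4;            /* c is a known constant                   */
    int t = c * 2;        /* constant propagation + folding: t = 8   */
    int y = x;            /* copy propagation: uses of y become x    */
    int s = y * 4;        /* strength reduction: y * 4 -> x << 2     */
    int dead = s + 100;   /* dead store: 'dead' is never read again  */
    (void)dead;           /* cast only to silence a compiler warning */
    return y + t + s;
}

/* After constant propagation/folding, copy propagation, strength
   reduction, and dead store elimination, the function reduces to: */
int after(int x) {
    return x + 8 + (x << 2);
}
```

The two functions compute the same result for nonnegative x (shifting a negative value left is undefined in C, which is exactly the kind of corner a conservative compiler checks before applying strength reduction).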
Compilers must be conservative. The first task of a compiler is to produce correct code; its second task is usually to produce fast code, although other factors, such as
code size, may sometimes be important as well. Code that is fast but incorrect (for any possible combination of inputs) is simply wrong. Thus, when we say a compiler is conservative, we mean that it performs an optimization only if it knows with 100% certainty that, no matter what the inputs, the code will perform as the user wrote it. Since most compilers translate and optimize one function or procedure at a time, most compilers, especially at lower optimization levels, assume the worst about function calls and about their own parameters.
Programmers concerned about performance of critical loops, especially in real-time or embedded applications, often find themselves staring at the assembly language produced by a compiler and wondering why the compiler failed to perform some global optimization or to allocate a variable to a register throughout a loop. The answer often lies in the dictate that the compiler be conservative. The opportunity for improving the code may seem obvious to the programmer, but then the programmer often has knowledge that the compiler does not have, such as the absence of aliasing between two pointers or the absence of side effects by a function call. The compiler may indeed be able to perform the transformation with a little help, which could eliminate the worst-case behavior that it must assume. This insight also illustrates an important observation: programmers who use pointers to try to improve performance in accessing variables, especially pointers to values on the stack that also have names as variables or as elements of arrays, are likely to disable many compiler optimizations. The end result is that the lower-level pointer code may run no better, or perhaps even worse, than the higher-level code optimized by the compiler.
Global Code Optimizations
Many global code optimizations have the same aims as those used in the local case, including common subexpression elimination, constant propagation, copy propagation, and dead store and dead code elimination.

There are two other important global optimizations: code motion and induction variable elimination. Both are loop optimizations; that is, they are aimed at code in loops. Code motion finds code that is loop invariant: a particular piece of code computes the same value on every iteration of the loop and, hence, may be computed once outside the loop. Induction variable elimination is a combination of transformations that reduce overhead on indexing arrays, essentially replacing array indexing with pointer accesses. Rather than examine induction variable elimination in depth, we point the reader to Section 2.14, which compares the use of array indexing and pointers; for most loops, the transformation from the more obvious array code to the pointer code can be performed by a modern optimizing compiler.
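A C-level picture of code motion (the function names are invented; a compiler performs this rewrite on the IR, not the source): the product a * b uses no value changed by the loop, so it is loop invariant and can be computed once in a loop preheader.

```c
/* a * b never changes inside the loop, so it is loop invariant. */
long weighted_sum(const int *v, int n, int a, int b) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += (long)v[i] * (a * b);   /* multiply redone every iteration */
    return s;
}

/* After code motion, the invariant multiply runs once, before the loop. */
long weighted_sum_hoisted(const int *v, int n, int a, int b) {
    int ab = a * b;                  /* hoisted to the loop preheader */
    long s = 0;
    for (int i = 0; i < n; i++)
        s += (long)v[i] * ab;
    return s;
}
```

Note that the hoisted multiply executes even when n is 0 and the loop body never runs; here that is harmless, but in general a compiler must prove hoisted code is safe to execute unconditionally.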
Implementing Local Optimizations
Local optimizations are implemented on basic blocks by scanning the basic block in instruction execution order, looking for optimization opportunities. In the assignment statement example on page 2.15-6, the duplication of the entire address calculation is recognized by a series of sequential passes over the code. Here is how the process might proceed, including a description of the checks that are needed:

1. Determine that the two li operations return the same result by observing that the operand x is the same and that the value of its address has not been changed between the two li operations.

2. Replace all uses of R106 in the basic block by R100.

3. Observe that i cannot change between the two LWs that reference it, so replace all uses of R107 with R101.

4. Observe that the mult instructions now have the same input operands, so R108 may be replaced by R102.

5. Observe that now the two add instructions have identical input operands (R100 and R102), so replace R109 with R103.

6. Use dead code elimination to delete the second set of li, lw, mult, and add instructions, since their results are unused.

Throughout this process, we need to know when two instances of an operand have the same value. This is easy to determine when they refer to virtual registers, since our intermediate representation uses such registers only once, but the problem can be trickier when the operands are variables in memory, even though we are only considering references within a basic block.

It is reasonably easy for the compiler to make the common subexpression elimination determination in a conservative fashion in this case; as we will see in the next subsection, this is more difficult when branches intervene.
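The steps just described can be sketched as a small C routine over a toy three-address IR. Everything here (the struct encoding, the names, the quadratic search) is invented for the illustration; production compilers use value numbering with hash tables instead. The sketch relies on the same assumption the text makes: each virtual register is assigned only once.

```c
#define MAX_INSTRS 32

/* A toy three-address instruction: dst = a op b, where op is a single
   character and operands name virtual registers (numbered 0..255). */
struct instr { char op; int dst, a, b; };

/* Local common subexpression elimination over one basic block.  When
   an instruction repeats an earlier (op, a, b) computation, later
   uses of its destination are renamed to the earlier result and the
   duplicate is marked dead.  Returns the count of live instructions. */
int local_cse(struct instr *code, int n, int live[MAX_INSTRS]) {
    int alias[256];                  /* alias[r] = register to use for r */
    for (int r = 0; r < 256; r++)
        alias[r] = r;
    int kept = 0;
    for (int i = 0; i < n; i++) {
        code[i].a = alias[code[i].a];        /* apply earlier renamings */
        code[i].b = alias[code[i].b];
        live[i] = 1;
        for (int j = 0; j < i; j++)          /* search earlier computations */
            if (live[j] && code[j].op == code[i].op &&
                code[j].a == code[i].a && code[j].b == code[i].b) {
                alias[code[i].dst] = code[j].dst;  /* reuse old result */
                live[i] = 0;                       /* duplicate is dead */
                break;
            }
        kept += live[i];
    }
    return kept;
}
```

Note how the renaming in one step exposes the duplicate found in the next, mirroring steps 2 through 6 above.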
Implementing Global Optimizations
To understand the challenge of implementing global optimizations, let's consider a few examples:

Consider the case of an opportunity for common subexpression elimination, say, of an IR statement like ADD Rx, R20, R50. To determine whether two such statements compute the same value, we must determine whether the values of R20 and R50 are identical in the two statements. In practice, this means that the values of R20 and R50 have not changed between the first statement and the second. For a single basic block, this is easy to decide; it is more difficult for a more complex program structure involving multiple basic blocks and branches.

Consider the second LW of i into R107 within the earlier example: how do we know whether its value is used again? If we consider only a single basic
block, and we know that all uses of R107 are within that block, it is easy to see. As optimization proceeds, however, common subexpression elimination and copy propagation may create other uses of a value. Determining that a value is unused and the code is dead is more difficult in the case of multiple basic blocks.

Finally, consider the load of k in our loop, which is a candidate for code motion. In this simple example, we might argue it is easy to see that k is not changed in the loop and is, hence, loop invariant. Imagine, however, a more complex loop with multiple nestings and if statements within the body. Determining that the load of k is loop invariant is harder in such a case.

The information we need to perform these global optimizations is similar: we need to know where each operand in an IR statement could have been changed or defined (use-definition information). The dual of this information is also needed: that is, finding all the uses of that changed operand (definition-use information). Data flow analysis obtains both types of information.

Global optimizations and data flow analysis operate on a control flow graph, where the nodes represent basic blocks and the arcs represent control flow between basic blocks. Figure 2.15.4 shows the control flow graph for our simple loop example, with one important transformation introduced. We describe the transformation in the caption, but see if you can discover it, and why it was done, on your own!
[Figure 2.15.4 shows two basic blocks. The loop body, the target of the label startwhileloop:

 8. LW  R6,i
 9. ADD R7,R6,1
10. SW  R7,i

falls through into the loop test, which branches back to the body:

1. LI  R1,save
2. LW  R2,i
3. SLL R3,R2,2
4. ADD R4,R3,R1
5. LW  R5,0(R4)
6. LW  R6,k
7. BEQ R5,R6,startwhileloop]

FIGURE 2.15.4 A control flow graph for the while loop example. Each node represents a basic block, which terminates with a branch or by sequential fall-through into another basic block that is also the target of a branch. The IR statements have been numbered for ease in referring to them. The important transformation performed was to move the while test and conditional branch to the end. This eliminates the unconditional branch that was formerly inside the loop and places it before the loop. This transformation is so important that many compilers do it during the generation of the IR. The MULT was also replaced with (strength-reduced to) an SLL.
Suppose we have computed the use-definition information for the control flow graph in Figure 2.15.4. How does this information allow us to perform code motion? Consider IR statements number 1 and 6: in both cases, the use-definition information tells us that there are no definitions (changes) of the operands of these statements within the loop. Thus, these IR statements can be moved outside the loop. Notice that if the LI of save and the LW of k are executed once, just prior to the loop entrance, the computational effect is the same, but the program now runs faster since these two statements are outside the loop. In contrast, consider IR statement 2, which loads the value of i. The definitions of i that affect this statement are both outside the loop, where i is initially defined, and inside the loop in statement 10 where it is stored. Hence, this statement is not loop invariant.

Figure 2.15.5 shows the code after performing both code motion and induction variable elimination, which simplifies the address calculation. The variable i can still be register allocated, eliminating the need to load and store it every time, and we will see how this is done in the next subsection.

Before we turn to register allocation, we need to mention a caveat that also illustrates the complexity and difficulty of optimizers. Remember that the compiler must be conservative. To be conservative, a compiler must consider the following question: Is there any way that the variable k could possibly ever change in this loop? Unfortunately, there is one way. Suppose that the variable k and the variable i actually refer to the same memory location, which could happen if they were accessed by pointers or reference parameters.
[Figure 2.15.5 shows the transformed flow graph. Code executed once, before the loop:

LI  R1,save
LW  R6,k
LW  R2,i
SLL R3,R2,2
ADD R4,R3,R1

The loop body:

LW  R2,i
ADD R7,R2,1
ADD R4,R4,4
SW  R7,i

The loop test:

LW  R5,0(R4)
BEQ R5,R6,startwhileloop]

FIGURE 2.15.5 The control flow graph showing the representation of the while loop example after code motion and induction variable elimination. The number of instructions in the inner loop has been reduced from 10 to 6.
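In C source terms, induction variable elimination corresponds to the array-versus-pointer rewrite discussed in Section 2.14. Both functions below (names invented for the illustration) count how many elements equal k, echoing our while loop's test:

```c
/* Array-index form: each iteration implicitly computes save + 4*i,
   the scaled-index address calculation seen in the IR. */
int count_eq_index(const int *save, int n, int k) {
    int c = 0;
    for (int i = 0; i < n; i++)
        if (save[i] == k)
            c++;
    return c;
}

/* Induction-variable-eliminated form: the scaled index calculation is
   replaced by a pointer that advances one element per iteration, like
   the ADD R4,R4,4 in Figure 2.15.5. */
int count_eq_ptr(const int *save, int n, int k) {
    int c = 0;
    for (const int *p = save; p < save + n; p++)
        if (*p == k)
            c++;
    return c;
}
```

As the text notes, a modern optimizing compiler performs this transformation itself, so there is usually no need to write the pointer form by hand.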
I am sure that many readers are saying, "Well, that would certainly be a stupid piece of code!" Alas, this response is not open to the compiler, which must translate the code as it is written. Recall too that the aliasing information must also be conservative; thus, compilers often find themselves negating optimization opportunities because of a possible alias that exists in one place in the code or because of incomplete information about aliasing.
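The aliasing hazard can be shown in C directly. In the first function below the compiler must assume dst and len might overlap, so *len is reloaded from memory on every iteration; the C99 restrict qualifier in the second is the programmer supplying exactly the "little help" described above, promising the pointers do not alias so *len can live in a register. (The function names are invented for this sketch.)

```c
/* Without restrict, the store to dst[i] might change *len, so a
   conservative compiler reloads *len every iteration. */
void fill(int *dst, const int *len) {
    for (int i = 0; i < *len; i++)
        dst[i] = i;
}

/* 'restrict' promises dst and len never designate the same storage,
   so the compiler may load *len once and keep it in a register. */
void fill_restrict(int *restrict dst, const int *restrict len) {
    for (int i = 0; i < *len; i++)
        dst[i] = i;
}
```

Both versions compute the same result when the promise holds; the difference is only in what the compiler is allowed to assume.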
Register Allocation
Register allocation is perhaps the most important optimization for modern load-store architectures. Eliminating a load or a store eliminates an instruction. Furthermore, register allocation enhances the value of other optimizations, such as common subexpression elimination. Fortunately, the trend toward larger register counts in modern architectures has made register allocation simpler and more effective. Register allocation is done on both a local basis and a global basis, that is, across multiple basic blocks but within a single function. Local register allocation is usually done late in compilation, as the final code is generated. Our focus here is on the more challenging and more opportunistic global register allocation.

Modern global register allocation uses a region-based approach, where a region (sometimes called a live range) represents a section of code during which a particular variable could be allocated to a particular register. How is a region selected? The process is iterative:

1. Choose a definition (change) of a variable in a given basic block; add that block to the region.

2. Find any uses of that definition, which is a data flow analysis problem; add any basic blocks that contain such uses, as well as any basic block that the value passes through to reach a use, to the region.

3. Find any other definitions that also can affect a use found in the previous step and add the basic blocks containing those definitions, as well as the blocks the definitions pass through to reach a use, to the region.

4. Repeat steps 2 and 3 using the definitions discovered in step 3 until convergence.

The set of basic blocks found by this technique has a special property: if the designated variable is allocated to a register in all these basic blocks, then there is no need for loading and storing the variable.

Modern global register allocators start by constructing the regions for every virtual register in a function. Once the regions are constructed, the key question is how to allocate a register to each region: the challenge is that certain regions overlap and may not use the same register. Regions that do not overlap (i.e., share no common basic blocks) can share the same register. One way to represent
the interference among regions is with an interference graph, where each node represents a region, and the arcs between nodes represent that the regions have some basic blocks in common.

Once an interference graph has been constructed, the problem of allocating registers is equivalent to a famous problem called graph coloring: find a color for each node in a graph such that no two adjacent nodes have the same color. If the number of colors equals the number of registers, then coloring an interference graph is equivalent to allocating a register for each region! This insight was the initial motivation for the allocation method now known as region-based allocation, but originally called the graph-coloring approach. Figure 2.15.6 shows the flow graph representation of the while loop example after register allocation.

What happens if the graph cannot be colored using the number of registers available? The allocator must spill registers until it can complete the coloring. By doing the coloring based on a priority function that takes into account the number of memory references saved and the cost of tying up the register, the allocator attempts to avoid spilling for the most important candidates.

Spilling is equivalent to splitting up a region (or live range); if the region is split, fewer other regions will interfere with the two separate nodes representing the original region. A process of splitting regions and successive coloring is used to allow the allocation process to complete, at which point all candidates will have been allocated a register. Of course, whenever a region is split, loads and stores
[Figure 2.15.6 shows the flow graph after register allocation. Code executed once, before the loop:

LI   $t0,save
LW   $t1,k
LW   $t2,i
SLL  $t3,$t2,2
ADDU $t4,$t3,$t0

The loop body:

ADD $t2,$t2,1
ADD $t4,$t4,4

The loop test:

LW  $t3,0($t4)
BEQ $t3,$t1,startwhileloop]

FIGURE 2.15.6 The control flow graph showing the representation of the while loop example after code motion, induction variable elimination, and register allocation, using the MIPS register names. The number of IR statements in the inner loop has now dropped to only four from six before register allocation and ten before any global optimizations. The value of i resides in $t2 at the end of the loop and may need to be stored eventually to maintain the program semantics. If i were unused after the loop, not only could the store be avoided, but also the increment inside the loop could be eliminated completely!
must be introduced to get the value from memory or to store it there. The location chosen to split a region must balance the cost of the loads and stores that must be introduced against the advantage of freeing up a register and reducing the number of interferences.

Hardware/Software Interface

Modern register allocators are incredibly effective in using the large register counts available in modern processors. In many programs, the effectiveness of register allocation is limited not by the availability of registers but by the possibilities of aliasing that cause the compiler to be conservative in its choice of candidates.
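A minimal sketch of the coloring idea in C (the encoding and names are invented; real allocators add the priorities, spilling, and region splitting described above). Nodes are regions; adj[i][j] nonzero means the two regions share a basic block and so must receive different registers:

```c
#define MAX_NODES 8

/* First-fit greedy coloring: give each node the lowest color not
   already used by a colored neighbor.  Writes each node's color
   (register number) into color[] and returns the colors used. */
int color_graph(int n, int adj[MAX_NODES][MAX_NODES],
                int color[MAX_NODES]) {
    int used = 0;
    for (int i = 0; i < n; i++) {
        int c = 0;
        for (;;) {
            int clash = 0;
            for (int j = 0; j < i; j++)
                if (adj[i][j] && color[j] == c) {
                    clash = 1;           /* a neighbor already has c */
                    break;
                }
            if (!clash)
                break;                   /* c is free for node i */
            c++;
        }
        color[i] = c;
        if (c + 1 > used)
            used = c + 1;
    }
    return used;
}
```

Optimal graph coloring is computationally hard in general, which is why allocators use heuristics like this one, guided by spill priorities, rather than searching for a best coloring.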
Code Generation
The final steps of the compiler are code generation and assembly. Most compilers do not use a stand-alone assembler that accepts assembly language source code; to save time, they instead perform most of the same functions: filling in symbolic values and generating the binary code as the final stage of code generation.

In modern processors, code generation is reasonably straightforward, since the simple architectures make the choice of instruction relatively obvious. For more complex architectures, such as the x86, code generation is more complex since multiple IR instructions may collapse into a single machine instruction. In modern compilers, this compilation process uses pattern matching with either a tree-based pattern matcher or a pattern matcher driven by a parser.

During code generation, the final stages of machine-dependent optimization are also performed. These include some constant folding optimizations, as well as localized instruction scheduling (see Chapter 4).
Optimization Summary
Figure 2.15.7 gives examples of typical optimizations, and the last column indicates where the optimization is performed in the gcc compiler. It is sometimes difficult to separate some of the simpler optimizations (local and processor-dependent optimizations) from transformations done in the code generator, and some optimizations are done multiple times, especially local optimizations, which may be performed before and after global optimization as well as during code generation.

Today, essentially all programming for desktop and server applications is done in high-level languages, as is most programming for embedded applications. This development means that since most instructions executed are the output of a compiler, an instruction set architecture is essentially a compiler target. With Moore's Law comes the temptation of adding sophisticated operations in an instruction set. The challenge is that they may not exactly match what the compiler needs to produce or may be so general that they aren't fast. For example, consider special loop instructions found in some computers. Suppose that instead
of decrementing by one, the compiler wanted to increment by four, or instead of branching on not equal zero, the compiler wanted to branch if the index was less than or equal to the limit. The loop instruction may be a mismatch. When faced with such objections, the instruction set designer might then generalize the operation, adding another operand to specify the increment and perhaps an option on which branch condition to use. Then the danger is that a common case, say incrementing by one, will be slower than a sequence of simple operations.
Elaboration: Some more sophisticated compilers, and many research compilers, use an analysis technique called interprocedural analysis to obtain more information about functions and how they are called. Interprocedural analysis attempts to discover what properties remain true across a function call. For example, we might discover that a function call can never change any global variables, which might be useful in optimizing a loop that calls such a function. Such information is called may-information or flow-insensitive information and can be obtained reasonably efficiently, although analyzing
Optimization name: Explanation [gcc level]

High level: At or near the source level; processor independent
- Procedure integration: Replace procedure call by procedure body [O3]

Local: Within straight-line code
- Common subexpression elimination: Replace two instances of the same computation by single copy [O1]
- Constant propagation: Replace all instances of a variable that is assigned a constant with the constant [O1]
- Stack height reduction: Rearrange expression tree to minimize resources needed for expression evaluation [O1]

Global: Across a branch
- Global common subexpression elimination: Same as local, but this version crosses branches [O2]
- Copy propagation: Replace all instances of a variable A that has been assigned X (i.e., A = X) with X [O2]
- Code motion: Remove code from a loop that computes same value each iteration of the loop [O2]
- Induction variable elimination: Simplify/eliminate array addressing calculations within loops [O2]

Processor dependent: Depends on processor knowledge
- Strength reduction: Many examples; replace multiply by a constant with shifts [O1]
- Pipeline scheduling: Reorder instructions to improve pipeline performance [O1]
- Branch offset optimization: Choose the shortest branch displacement that reaches target [O1]

FIGURE 2.15.7 Major types of optimizations and explanation of each class. The third column shows when these occur at different levels of optimization in gcc. The GNU organization calls the three optimization levels medium (O1), full (O2), and full with integration of small procedures (O3).
a call to a function F requires analyzing all the functions that F calls, which makes the process somewhat time consuming for large programs. A more costly property to discover is that a function must always change some variable; such information is called must-information or flow-sensitive information. Recall the dictate to be conservative: may-information can never be used as must-information; just because a function may change a variable does not mean that it must change it. It is conservative, however, to use the negation of may-information, so the compiler can rely on the fact that a function will never change a variable in optimizations around the call site of that function.
One of the most important uses of interprocedural analysis is to obtain so-called alias information. An alias occurs when two names may designate the same variable. For example, it is quite helpful to know that two pointers passed to a function may never designate the same variable. Alias information is usually flow-insensitive and must be used conservatively.
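A small illustration of why may-alias information forces conservatism (the code here is ours, written in Java for consistency with the second half of this section): unless the compiler can prove the two parameters never designate the same array, it cannot keep the old value of a[0] in a register across the store through b.

```java
// Illustrative sketch (not from the text): if a and b may alias, the store
// through b may change a[0], so a[0] must be reloaded before the final sum.
class AliasDemo {
    static int sumThenClear(int[] a, int[] b) {
        int s = a[0] + a[1];
        b[0] = 0;          // if a and b alias, this also changes a[0]
        return s + a[0];   // so a[0] may not be reused from a register here
    }
}
```

Calling sumThenClear(x, x) and sumThenClear(x, y) gives different answers for the same element values, which is exactly the possibility the compiler must assume.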
Interpreting Java

This second part of the section is for readers interested in seeing how an object-oriented language like Java executes on a MIPS architecture. It shows the Java bytecodes used for interpretation and the MIPS code for the Java version of some of the C segments in prior sections, including Bubble Sort.
Let's quickly review the Java lingo to make sure we are all on the same page. The big idea of object-oriented programming is for programmers to think in terms of abstract objects, with operations associated with each type of object. New types can often be thought of as refinements of existing types, and so some operations for the existing types are used by the new type without change. The hope is that the programmer thinks at a higher level, and that code can be reused more readily if the programmer implements the common operations on many different types.

This different perspective led to a different set of terms. The type of an object is a class, which is the definition of a new data type together with the operations that are defined to work on that data type. A particular object is then an instance of a class, and creating an object from a class is called instantiation. The operations in a class are called methods, which are similar to C procedures. Rather than call a procedure as in C, you invoke a method in Java. The other members of a class are fields, which correspond to variables in C. Variables inside objects are called instance fields. Rather than access a structure with a pointer, Java uses an object reference to access an object. The syntax for method invocation is x.y, where x is an object reference and y is the method name.

The parent-child relationship between older and newer classes is captured by the verb extends: a child class extends (or subclasses) a parent class. The child class typically will redefine some of the methods found in the parent to match the new data type. Some methods work fine as is, and the child class inherits those methods.
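These terms can all be seen together in a few lines of Java; the classes here are our own illustration, not from the text.

```java
// Illustrative classes showing the vocabulary: class, instance field, method,
// extends, override, inherit.
class Shape {
    int side;                                  // an instance field
    Shape(int side) { this.side = side; }
    int area() { return side * side; }         // a method, similar to a C procedure
}

class Cube extends Shape {                     // the child class extends the parent
    Cube(int side) { super(side); }
    @Override
    int area() { return 6 * side * side; }     // redefined to match the new type
    int edge() { return side; }                // the field side is simply inherited
}
```

Here new Shape(3) is instantiation, the result is an instance of the class, and s.area() invokes a method through the object reference s.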
To reduce the number of errors associated with pointers and explicit memory deallocation, Java automatically frees unused storage, using a separate garbage

object-oriented language A programming language that is oriented around objects rather than actions, or data versus logic.
collector that frees memory when it is full. Hence, new creates a new instance of a dynamic object on the heap, but there is no free in Java. Java also requires array bounds to be checked at runtime to catch another class of errors that can occur in C programs.
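Both guarantees can be sketched in a few lines (illustrative code, not from the text): objects come from new with no matching free, and every array access is bounds-checked at runtime.

```java
// Illustrative sketch: heap allocation without free, and runtime bounds checks.
class SafetyDemo {
    static String access(int i) {
        int[] a = new int[4];   // allocated on the heap; reclaimed later by the
                                // garbage collector, never freed explicitly
        a[0] = 7;
        try {
            return "value " + a[i];         // every access is bounds-checked
        } catch (ArrayIndexOutOfBoundsException e) {
            return "out of bounds";         // caught at runtime, unlike C
        }
    }
}
```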
Interpretation

As mentioned before, Java programs are distributed as Java bytecodes, and the Java Virtual Machine (JVM) executes Java bytecodes. The JVM understands a binary format called the class file format. A class file is a stream of bytes for a single class, containing a table of valid methods with their bytecodes, a pool of constants that acts in part as a symbol table, and other information such as the parent class of this class.

When the JVM is first started, it looks for the class method main. To start any Java class, the JVM dynamically loads, links, and initializes a class. The JVM loads a class by first finding the binary representation of the proper class (class file) and then creating a class from that binary representation. Linking combines the class into the runtime state of the JVM so that it can be executed. Finally, it executes the class initialization method that is included in every class.
Figure 2.15.8 shows Java bytecodes and their corresponding MIPS instructions, illustrating five major differences between the two:

1. To simplify compilation, Java uses a stack instead of registers for operands. Operands are pushed on the stack, operated on, and then popped off the stack.

2. The designers of the JVM were concerned about code size, so bytecodes vary in length between one and five bytes, versus the 4-byte, fixed-size MIPS instructions. To save space, the JVM even has redundant instructions of different lengths whose only difference is the size of the immediate. This decision illustrates a code size variation of our third design principle: make the common case small.

3. The JVM has safety features embedded in the architecture. For example, array data transfer instructions check to be sure that the first operand is a reference and that the second index operand is within bounds.

4. To allow garbage collectors to find all live pointers, the JVM uses different instructions to operate on addresses versus integers so that the JVM can know what operands contain addresses. MIPS generally lumps integers and addresses together.

5. Finally, unlike MIPS, there are Java-specific instructions that perform complex operations, like allocating an array on the heap or invoking a method.
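Point 1 above can be made concrete with a toy operand-stack evaluator. This is an illustrative sketch in the spirit of the JVM, not a real JVM; the class and method names are ours.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy operand-stack machine: operands are pushed, operated on, and popped,
// with no registers in sight.
class MiniStackMachine {
    private final Deque<Integer> stack = new ArrayDeque<>();

    void push(int v) { stack.push(v); }   // like iconst/iload: push an operand

    void add() {                          // like iadd: NOS = TOS + NOS; pop
        int tos = stack.pop();
        int nos = stack.pop();
        stack.push(nos + tos);
    }

    int pop() { return stack.pop(); }     // like istore: pop the result off

    int tos() { return stack.peek(); }    // peek at the top of stack
}
```

Evaluating 2 + 3 becomes push 2, push 3, add, mirroring the bytecode sequence iconst_2, iconst_3, iadd.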
Category | Operation | Java bytecode | Size (bits) | MIPS instr. | Meaning
Arithmetic | add | iadd | 8 | add | NOS=TOS+NOS; pop
Arithmetic | subtract | isub | 8 | sub | NOS=TOS-NOS; pop
Arithmetic | increment | iinc I8a I8b | 24 | addi | Frame[I8a]=Frame[I8a]+I8b
Data transfer | load local integer/address | iload I8/aload I8 | 16 | lw | TOS=Frame[I8]
Data transfer | load local integer/address | iload_/aload_{0,1,2,3} | 8 | lw | TOS=Frame[{0,1,2,3}]
Data transfer | store local integer/address | istore I8/astore I8 | 16 | sw | Frame[I8]=TOS; pop
Data transfer | load integer/address from array | iaload/aaload | 8 | lw | NOS=*NOS[TOS]; pop
Data transfer | store integer/address into array | iastore/aastore | 8 | sw | *NNOS[NOS]=TOS; pop2
Data transfer | load half from array | saload | 8 | lh | NOS=*NOS[TOS]; pop
Data transfer | store half into array | sastore | 8 | sh | *NNOS[NOS]=TOS; pop2
Data transfer | load byte from array | baload | 8 | lb | NOS=*NOS[TOS]; pop
Data transfer | store byte into array | bastore | 8 | sb | *NNOS[NOS]=TOS; pop2
Data transfer | load immediate | bipush I8, sipush I16 | 16, 24 | addi | push; TOS=I8 or I16
Data transfer | load immediate | iconst_{-1,0,1,2,3,4,5} | 8 | addi | push; TOS={-1,0,1,2,3,4,5}
Logical | and | iand | 8 | and | NOS=TOS&NOS; pop
Logical | or | ior | 8 | or | NOS=TOS|NOS; pop
Logical | shift right | iushr | 8 | srl | NOS=NOS>>TOS; pop
Conditional branch | branch on equal | if_icompeq I16 | 24 | beq | if TOS == NOS, go to I16; pop2
Conditional branch | branch on not equal | if_icompne I16 | 24 | bne | if TOS != NOS, go to I16; pop2
Conditional branch | compare | if_icomp{lt,le,gt,ge} I16 | 24 | slt | if TOS {<,<=,>,>=} NOS, go to I16; pop2

FIGURE 2.15.8 Java bytecodes and their corresponding MIPS instructions.
Compiling a while Loop in Java Using Bytecodes

Compile the while loop from page 92, this time using Java bytecodes:

while (save[i] == k)
    i += 1;

Assume that i, k, and save are the first three local variables. Show the addresses of the bytecodes. The MIPS version of the C loop in Figure 2.15.3 took six instructions and twenty-four bytes. How big is the bytecode version?
The first step is to put the array reference in save on the stack:
0 aload_3 # Push local variable 3 (save[]) onto stack
This 1-byte instruction informs the JVM that an address in local variable 3 is being put on the stack. The 0 on the left of this instruction is the byte address of this first instruction; bytecodes for each method start at 0. The next step is to put the index on the stack:
1 iload_1 # Push local variable 1 (i) onto stack
Like the prior instruction, this 1-byte instruction is a short version of a more general instruction that takes 2 bytes to load a local variable onto the stack. The next instruction is to get the value from the array element:
2 iaload # Put array element (save[i]) onto stack
This 1-byte instruction checks the prior two operands, pops them off the stack, and then puts the value of the desired array element onto the new top of the stack. Next, we place k on the stack:
3 iload_2 # Push local variable 2 (k) onto stack
We are now ready for the while test:
4 if_icompne, Exit # Compare and exit if not equal

This 3-byte instruction compares the top two elements of the stack, pops them off the stack, and branches if they are not equal. We are finally ready for the body of the loop:
7 iinc, 1, 1 # Increment local variable 1 by 1 (i+=1)
This unusual 3-byte instruction increments a local variable by 1 without using the operand stack, an optimization that again saves space. Finally, we return to the top of the loop with a 3-byte jump:

10 goto 0 # Go to top of loop (byte address 0)

Thus, the bytecode version takes seven instructions and thirteen bytes, almost half the size of the MIPS C code. (As before, we can optimize this code to jump less.)
Compiling for Java

Since Java is derived from C and Java has the same built-in types as C, the assignment statement examples in Sections 2.2 to 2.6 of Chapter 2 are the same in Java as they are in C. The same is true for the if statement example in Section 2.7.

The Java version of the while loop is different, however. The designers of C leave it up to the programmers to be sure that their code does not exceed the array bounds. The designers of Java wanted to catch array bound bugs, and thus require the compiler to check for such violations. To check bounds, the compiler needs to know what they are. Java includes an extra word in every array that holds the upper bound. The lower bound is defined as 0.
Compiling a while Loop in Java

Modify the MIPS code for the while loop on page 94 to include the array bounds checks that are required by Java. Assume that the length of the array is located just before the first element of the array.

Let's assume that Java arrays reserve the first two words of arrays before the data starts. We'll see the use of the first word soon, but the second word has the array length. Before we enter the loop, let's load the length of the array into a temporary register:
lw $t2,4($s6) # Temp reg $t2 = length of array save
Before we multiply i by 4, we must test to see if it's less than 0 or greater than the last element of the array. The first step is to check if i is less than 0:
Loop: slt $t0,$s3,$zero # Temp reg $t0 = 1 if i < 0
Register $t0 is set to 1 if i is less than 0. Hence, a branch to see if register $t0 is not equal to zero will give us the effect of branching if i is less than 0. This pair of instructions, slt and bne, implements branch on less than.
Register $zero always contains 0, so this final test is accomplished using the bne instruction and comparing register $t0 to register $zero:

bne $t0,$zero,IndexOutOfBounds # if i < 0, goto Error
slt $t0,$s3,$t2 # Temp reg $t0 = 0 if i >= length
beq $t0,$zero,IndexOutOfBounds # if i >= length, goto Error
Note that these two instructions implement branch on greater than or equal to. The next two lines of the MIPS while loop are unchanged from the C version:

sll $t1,$s3,2 # Temp reg $t1 = 4 * i
add $t1,$t1,$s6 # $t1 = address of save[i]
We need to account for the first 8 bytes that are reserved in Java. We do that by changing the address field of the load from 0 to 8:
lw $t0,8($t1) # Temp reg $t0 = save[i]
The rest of the MIPS code from the C while loop is fine as is:

bne $t0,$s5,Exit # go to Exit if save[i] != k
addi $s3,$s3,1 # i = i + 1
j Loop # go to Loop
Exit:

(See the exercises for an optimization of this sequence.)
Invoking Methods in Java

The compiler picks the appropriate method depending on the type of the object. In a few cases, it is unambiguous, and the method can be invoked with no more overhead than a C procedure. In general, however, the compiler knows only that a given variable contains a pointer to an object that belongs to some subtype of a general class. Since it doesn't know at compile time which subclass the object is, and thus which method should be invoked, the compiler will generate code that first tests to be sure the pointer isn't null and then uses that code to load a pointer to a table with all the legal methods for that type. The first word of the object has the method table address, which is why Java arrays reserve two words. Let's say it's using the fifth method that was declared for that class. (The method order is the same for all subclasses.) The compiler then takes the fifth address from that table and invokes the method at that address.
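This dispatch mechanism is what makes the following Java sketch work; the classes are illustrative, not from the text. At compile time, call only knows it has some subtype of Animal, so the actual speak body is found at runtime through the object's method table.

```java
// Illustrative classes showing dynamic dispatch through the method table.
class Animal {
    String speak() { return "..."; }
}

class Dog extends Animal {
    @Override
    String speak() { return "woof"; }      // occupies the same method-table slot
}

class Dispatch {
    static String call(Animal a) {         // a may point to any subtype of Animal
        return a.speak();                  // null check, table lookup, indirect jump
    }
}
```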
public class sort {
    public static void sort (int[] v) {
        for (int i = 0; i < v.length; i += 1) {
            for (int j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
                swap(v, j);
            }
        }
    }
    protected static void swap(int[] v, int k) {
        int temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }
}

FIGURE 2.15.9 An initial Java procedure that performs a sort on the array v. Changes from Figures 2.24 and 2.26 are highlighted.
The cost of object orientation in general is that method invocation includes 1) a conditional branch to be sure that the pointer to the object is valid; 2) a load to get the address of the table of available methods; 3) another load to get the address of the proper method; 4) placing a return address into the return register; and finally 5) a jump register to invoke the method. The next subsection gives a concrete example of method invocation.
A Sort Example in Java

Figure 2.15.9 shows the Java version of exchange sort. A simple difference is that there is no need to pass the length of the array as a separate parameter, since Java arrays include their length: v.length denotes the length of v.

A more significant difference is that Java methods are prepended with keywords not found in the C procedures. The sort method is declared public static while swap is declared protected static. Public means that sort can be invoked from any other method, while protected means swap can only be called by other methods within the same package and from methods within derived classes. A static method is another name for a class method: methods that perform classwide operations and do not apply to an individual object. Static methods are essentially the same as C procedures.

This straightforward translation from C into static methods means there is no ambiguity on method invocation, and so it can be just as efficient as C. It also is limited to sorting integers, which means a different sort has to be written for each data type.

To demonstrate the object orientation of Java, Figure 2.15.10 shows the new version with the changes highlighted. First, we declare v to be of the type Comparable and replace v[j] > v[j + 1] with an invocation of compareTo. By changing v to this new class, we can use this code to sort many data types.
public A Java keyword that allows a method to be invoked by any other method.

protected A Java keyword that restricts invocation of a method to other methods in that package.

package Basically a directory that contains a group of related classes.

static method A method that applies to the whole class rather than an individual object. Unrelated to static in C.
public class sort {
    public static void sort (Comparable[] v) {
        for (int i = 0; i < v.length; i += 1) {
            for (int j = i - 1; j >= 0 && v[j].compareTo(v[j + 1]) > 0; j -= 1) {
                swap(v, j);
            }
        }
    }
    protected static void swap(Comparable[] v, int k) {
        Comparable temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }
}

public class Comparable {
    public int compareTo (int x)
    { return value - x; }
    public int value;
}

FIGURE 2.15.10 A revised Java procedure that sorts on the array v that can take on more types. Changes from Figure 2.15.9 are highlighted.
The method compareTo compares two elements and returns a value greater than 0 if the object is larger than the parameter, 0 if they are equal, and a negative number if the object is smaller than the parameter. These two changes generalize the code so it can sort integers, characters, strings, and so on, if there are subclasses of Comparable with each of these types and if there is a version of compareTo for each type. For pedagogic purposes, we redefine the class Comparable and the method compareTo here to compare integers. The actual definition of Comparable in the Java library is considerably different.
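The sign convention can be spelled out as a minimal sketch. The class name IntComparable is ours, chosen to avoid clashing with java.lang.Comparable; the body follows the text's simplified integer version.

```java
// The text's simplified integer comparison, with the sign convention annotated.
class IntComparable {
    public int value;
    IntComparable(int value) { this.value = value; }

    public int compareTo(int x) {
        return value - x;   // > 0: object larger; 0: equal; < 0: object smaller
    }
}
```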
Starting from the MIPS code that we generated for C, we show what changes we made to create the MIPS code for Java.

For swap, the only significant differences are that we must check to be sure the object reference is not null and that each array reference is within bounds. The first test checks that the address in the first parameter is not zero:

swap: beq $a0,$zero,NullPointer # if $a0==0, goto Error

Next, we load the length of v into a register and check that index k is OK.

lw $t2,4($a0) # Temp reg $t2 = length of array v
slt $t0,$a1,$zero # Temp reg $t0 = 1 if k < 0
bne $t0,$zero,IndexOutOfBounds # if k < 0, goto Error
slt $t0,$a1,$t2 # Temp reg $t0 = 0 if k >= length
beq $t0,$zero,IndexOutOfBounds # if k >= length, goto Error

This check is followed by a check that k+1 is within bounds.

addi $t1,$a1,1 # Temp reg $t1 = k+1
slt $t0,$t1,$zero # Temp reg $t0 = 1 if k+1 < 0
bne $t0,$zero,IndexOutOfBounds # if k+1 < 0, goto Error
slt $t0,$t1,$t2 # Temp reg $t0 = 0 if k+1 >= length
beq $t0,$zero,IndexOutOfBounds # if k+1 >= length, goto Error

Figure 2.15.11 highlights the extra MIPS instructions in swap that a Java compiler might produce. We again must adjust the offset in the load and store to account for the two words reserved for the method table and length.
Figure 2.15.12 shows the method body for those new instructions for sort. (We can take the saving, restoring, and return from Figure 2.27.) The first test is again to make sure the pointer to v is not null:

beq $a0,$zero,NullPointer # if $a0==0, goto Error

Next, we load the length of the array (we use register $s3 to keep it similar to the code for the C version of swap):

lw $s3,4($a0) # $s3 = length of array v
Bounds check

swap: beq $a0,$zero,NullPointer # if $a0==0, goto Error
lw $t2,4($a0) # Temp reg $t2 = length of array v
slt $t0,$a1,$zero # Temp reg $t0 = 1 if k < 0
bne $t0,$zero,IndexOutOfBounds # if k < 0, goto Error
slt $t0,$a1,$t2 # Temp reg $t0 = 0 if k >= length
beq $t0,$zero,IndexOutOfBounds # if k >= length, goto Error
addi $t1,$a1,1 # Temp reg $t1 = k+1
slt $t0,$t1,$zero # Temp reg $t0 = 1 if k+1 < 0
bne $t0,$zero,IndexOutOfBounds # if k+1 < 0, goto Error
slt $t0,$t1,$t2 # Temp reg $t0 = 0 if k+1 >= length
beq $t0,$zero,IndexOutOfBounds # if k+1 >= length, goto Error

Method body

sll $t1, $a1, 2 # reg $t1 = k * 4
add $t1, $a0, $t1 # reg $t1 = v + (k * 4)
# reg $t1 has the address of v[k]
lw $t0, 8($t1) # reg $t0 (temp) = v[k]
lw $t2, 12($t1) # reg $t2 = v[k + 1]
# refers to next element of v
sw $t2, 8($t1) # v[k] = reg $t2
sw $t0, 12($t1) # v[k+1] = reg $t0 (temp)

Procedure return

jr $ra # return to calling routine

FIGURE 2.15.11 MIPS assembly code of the procedure swap in Figure 2.24.
Now we must ensure that the index is within bounds. Since the first test of the inner loop is to test if j is negative, we can skip that initial bound test. That leaves the test for too big:

slt $t0,$s1,$s3 # Temp reg $t0 = 0 if j >= length
beq $t0,$zero,IndexOutOfBounds # if j >= length, goto Error
Method body

Move parameters
move $s2,$a0 # copy parameter $a0 into $s2 (save $a0)

Test ptr null
beq $a0,$zero,NullPointer # if $a0==0, goto Error

Get array length
lw $s3,4($a0) # $s3 = length of array v

Outer loop
move $s0,$zero # i = 0
for1tst: slt $t0, $s0, $s3 # reg $t0 = 0 if $s0 >= $s3 (i >= n)
beq $t0, $zero, exit1 # go to exit1 if $s0 >= $s3 (i >= n)

Inner loop start
addi $s1, $s0, -1 # j = i - 1
for2tst: slti $t0, $s1, 0 # reg $t0 = 1 if $s1 < 0 (j < 0)
bne $t0, $zero, exit2 # go to exit2 if $s1 < 0 (j < 0)

Test if j too big
slt $t0,$s1,$s3 # Temp reg $t0 = 0 if j >= length
beq $t0,$zero,IndexOutOfBounds # if j >= length, goto Error

Get v[j]
sll $t1, $s1, 2 # reg $t1 = j * 4
add $t2, $s2, $t1 # reg $t2 = v + (j * 4)
lw $t3, 8($t2) # reg $t3 = v[j]

Test if j+1 < 0 or if j+1 too big
addi $t1,$s1,1 # Temp reg $t1 = j+1
slt $t0,$t1,$zero # Temp reg $t0 = 1 if j+1 < 0
bne $t0,$zero,IndexOutOfBounds # if j+1 < 0, goto Error
slt $t0,$t1,$s3 # Temp reg $t0 = 0 if j+1 >= length
beq $t0,$zero,IndexOutOfBounds # if j+1 >= length, goto Error

Get v[j+1]
lw $t4, 12($t2) # reg $t4 = v[j + 1]

Load method table
lw $t5,0($a0) # $t5 = address of method table

Get method addr
lw $t5,8($t5) # $t5 = address of third method

Pass parameters
move $a0,$t3 # 1st parameter of compareTo is v[j]
move $a1,$t4 # 2nd param. of compareTo is v[j+1]

Set return addr
la $ra,L1 # load return address

Call indirectly
jr $t5 # call code for compareTo

Test if should skip swap
L1: slt $t0, $zero, $v0 # reg $t0 = 0 if 0 >= $v0
beq $t0, $zero, exit2 # go to exit2 if $t4 >= $t3

Pass parameters and call swap
move $a0,$s2 # 1st parameter of swap is v
move $a1,$s1 # 2nd parameter of swap is j
jal swap # swap code shown in Figure 2.34

Inner loop end
addi $s1, $s1, -1 # j -= 1
j for2tst # jump to test of inner loop

Outer loop
exit2: addi $s0, $s0, 1 # i += 1
j for1tst # jump to test of outer loop

FIGURE 2.15.12 MIPS assembly version of the method body of the Java version of sort. The new code is highlighted in this figure. We must still add the code to save and restore registers and the return from the MIPS code found in Figure 2.27. To keep the code similar to that figure, we load v.length into $s3 instead of into a temporary register. To reduce the number of lines of code, we make the simplifying assumption that compareTo is a leaf procedure and we do not need to push registers to be saved on the stack.
The code for testing j + 1 is quite similar to the code for checking k + 1 in swap, so we skip it here.

The key difference is the invocation of compareTo. We first load the address of the table of legal methods, which we assume is two words before the beginning of the array:

lw $t5,0($a0) # $t5 = address of method table

Given the address of the method table for this object, we then get the desired method. Let's assume compareTo is the third method in the Comparable class. To pick the address of the third method, we load that address into a temporary register:

lw $t5,8($t5) # $t5 = address of third method
We are now ready to call compareTo. The next step is to save the necessary registers on the stack. Fortunately, we don't need the temporary registers or argument registers after the method invocation, so there is nothing to save. Thus, we simply pass the parameters for compareTo:

move $a0, $t3 # 1st parameter of compareTo is v[j]
move $a1, $t4 # 2nd parameter of compareTo is v[j+1]

Since we are using a jump register to invoke compareTo, we need to pass the return address explicitly. We use the pseudoinstruction load address (la) and the label where we want to return, and then do the indirect jump:

la $ra,L1 # load return address
jr $t5 # to code for compareTo
The method returns, with $v0 determining which of the two elements is larger. If $v0 > 0, then v[j] > v[j+1], and we need to swap. Thus, to skip the swap, we need to test if $v0 <= 0, which is the same as 0 >= $v0. We also need to include the label for the return address:

L1: slt $t0, $zero, $v0 # reg $t0 = 0 if 0 >= $v0
beq $t0, $zero, exit2 # go to exit2 if v[j+1] >= v[j]

The MIPS code for compareTo is left as an exercise.
Hardware/Software Interface

The main changes for the Java versions of sort and swap are testing for null object references and index out-of-bounds errors, and the extra method invocation to give a more general compare. This method invocation is more expensive than a C procedure call, since it requires a load, a conditional branch, a pair of chained loads, and an indirect jump. As we see in Chapter 4, dependent loads and indirect jumps can be relatively slow on modern processors. The increasing popularity of Java suggests that many programmers today are willing to leverage the high performance of modern processors to pay for error checking and code reuse.
Elaboration: Although we test each reference to j and j + 1 to be sure that these indices are within bounds, an assembly language programmer might look at the code and reason as follows:

1. The inner for loop is only executed if j >= 0, and since j + 1 > j, there is no need to test j + 1 to see if it is less than 0.

2. Since i takes on the values 0, 1, 2, . . . , (data.length - 1) and since j takes on the values i - 1, i - 2, . . . , 2, 1, 0, there is no need to test if j >= data.length since the largest value j can be is data.length - 2.

3. Following the same reasoning, there is no need to test whether j + 1 >= data.length since the largest value of j + 1 is data.length - 1.

There are coding tricks in Chapter 2 and superscalar execution in Chapter 4 that lower the effective cost of such bounds checking, but only highly optimizing compilers can reason this way. Note that if the compiler inlined the swap method into sort, many checks would be unnecessary.
Elaboration: Look carefully at the code for swap in Figure 2.15.11. See anything wrong in the code, or at least in the explanation of how the code works? It implicitly assumes that each Comparable element in v is 4 bytes long. Surely, you need much more than 4 bytes for a complex subclass of Comparable, which could contain any number of fields. Surprisingly, this code does work, because an important property of Java's semantics forces the use of the same, small representation for all variables, fields, and array elements that belong to Comparable or its subclasses.
Java types are divided into primitive types (the predefined types for numbers, characters, and Booleans) and reference types (the built-in classes like String, user-defined classes, and arrays). Values of reference types are pointers (also called references) to anonymous objects that are themselves allocated in the heap. For the programmer, this means that assigning one variable to another does not create a new object, but instead makes both variables refer to the same object. Because these objects are anonymous, and programs therefore have no way to refer to them directly, a program must use indirection through a variable to read or write any object's fields (variables). Thus, because the data structure allocated for the array v consists entirely of pointers, it is safe to assume they are all the same size, and the same swapping code works for all of Comparable's subtypes.

To write sorting and swapping functions for arrays of primitive types requires that we write new versions of the functions, one for each type. This replication is for two reasons. First, primitive type values do not include the references to dispatching tables that we used on Comparables to determine at runtime how to compare values. Second, primitive values come in different sizes: 1, 2, 4, or 8 bytes.
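The reference-versus-primitive distinction above can be seen directly in a few lines of Java (an illustrative sketch, not from the text):

```java
// Reference assignment copies the pointer, so both names refer to the same
// heap object; primitive assignment copies the value itself.
class RefDemo {
    static int aliasWrite(int[] a) {
        int[] b = a;       // b and a now refer to the same array object
        b[0] = 42;         // a write through b ...
        return a[0];       // ... is visible through a
    }

    static int valueCopy(int x) {
        int y = x;         // primitives are copied by value
        y = 42;            // changing the copy ...
        return x;          // ... leaves the original untouched
    }
}
```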
The pervasive use of pointers in Java is elegant in its consistency, with the penalty being a level of indirection and a requirement that objects be allocated on the heap. Furthermore, in any language where the lifetimes of the heap-allocated anonymous
objects are independent of the lifetimes of the named variables, fields, and array elements that reference them, programmers must deal with the problem of deciding when it is safe to deallocate heap-allocated storage. Java's designers chose to use garbage collection. Of course, use of garbage collection rather than explicit user memory management also improves program safety.
C++ provides an interesting contrast. Although programmers can write essentially the same pointer-manipulating solution in C++, there is another option. In C++, programmers can elect to forgo the level of indirection and directly manipulate an array of objects, rather than an array of pointers to those objects. To do so, C++ programmers would typically use the template capability, which allows a class or function to be parameterized by the type of data on which it acts. Templates, however, are compiled using the equivalent of macro expansion. That is, if we declared an instance of sort capable of sorting types X and Y, C++ would create two copies of the code for the class: one for sort<X> and one for sort<Y>, each specialized accordingly. This solution increases code size in exchange for making comparison faster (since the function calls would not be indirect, and might even be subject to inline expansion). Of course, the speed advantage would be canceled if swapping the objects required moving large amounts of data instead of just single pointers. As always, the best design depends on the details of the problem.
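For contrast, Java generics reach a similar generality without macro expansion: the compiler erases the type parameter and emits a single copy of code that manipulates references. A sketch of the sort from Figure 2.15.10 rewritten with generics (our own variant, not from the text):

```java
// One erased copy of this code sorts any array whose element type implements
// java.lang.Comparable; swapping moves only references, never the objects.
class GenericSort {
    static <T extends Comparable<T>> void sort(T[] v) {
        for (int i = 0; i < v.length; i += 1) {
            for (int j = i - 1; j >= 0 && v[j].compareTo(v[j + 1]) > 0; j -= 1) {
                T temp = v[j];        // exchange the two references
                v[j] = v[j + 1];
                v[j + 1] = temp;
            }
        }
    }
}
```

The comparisons still go through an indirect compareTo call, so this keeps Java's one-copy, pointer-based trade-off rather than adopting the C++ template one.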