implementation of the python bytecode compiler
Post on 14-Jan-2016
Implementation of the Python Bytecode Compiler
Jeremy Hylton, Google
What to expect from this talk
• Intended for developers
• Explain key data structures and control flow
• Lots of code on slides
The New Bytecode Compiler
• Rewrote compiler from scratch for 2.5
  – Emphasizes modularity
  – Work was almost done for Python 2.4
  – Still uses original parser, pgen
• Traditional compiler abstractions
  – Abstract Syntax Tree (AST)
  – Basic blocks
• Goals
  – Ease maintenance, extensibility
  – Expose AST to Python programs
Compiler Architecture
[Diagram: pipeline stages and the data passed between them]
Stages: Tokenizer → Parser → AST Converter → Code Generator → Assembler → Peephole Optimizer
Data labels: Source Text, Tokens, Parse Tree, AST, __future__, Symbol Table, Blocks, bytecode
Compiler Organization
File                     Lines (approx.)
compile.c                 4,200
  infrastructure            700
  code generator          2,400
  assembler                 500
  peephole optimizer        600
asdl.c,.h                  <100
pyarena.c                   100
future.c                    100
ast.c                     3,000
symtable.c                1,400
Python-ast.c,.h           1,900 (generated)
Total                    10,800
Tokenize, Parse, AST
• Simple, hand-coded tokenizer
  – Synthesizes INDENT and DEDENT tokens
• pgen: parser generator
  – Input in Grammar/Grammar
  – Extended LL(1) grammar
• AST conversion
  – Collapses parse tree into abstract form
  – Future: extend pgen to generate AST directly
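To make the INDENT and DEDENT synthesis concrete, here is a minimal sketch using the standard-library tokenize module. That module is a separate implementation from the C tokenizer described on this slide, so treat it as an illustration of the token stream rather than of the compiler's own code. SOURCE and token_names are names made up for this example.

```python
import io
import tokenize

# Hypothetical sample input: one indented block followed by a dedent.
SOURCE = "if x > 5:\n    y = 1\nz = 2\n"


def token_names(source):
    """Return the token type names produced for a source string."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


names = token_names(SOURCE)
print(names)
```

The INDENT and DEDENT entries in the printed list do not correspond to any characters of their own in the source; the tokenizer derives them from changes in leading whitespace.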
Grammar vs. Abstract Syntax
compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | funcdef | …
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
test: and_test ('or' and_test)* | lambdef
and_test: not_test ('and' not_test)*
not_test: 'not' not_test | comparison
comparison: expr (comp_op expr)*
comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'

stmt = For(expr target, expr iter, stmt* body, stmt* orelse)
     | If(expr test, stmt* body, stmt* orelse)
     | …
expr = BinOp(expr left, operator op, expr right)
     | Compare(expr left, cmpop* ops, expr* comparators)
     | Call(expr func, expr* args, keyword* keywords,
            expr? starargs, expr? kwargs)
     | …
AST node types
• Modules (mod)
• Statements (stmt)
• Expressions (expr)
  – Expressions allowed on LHS have context slot
• Extras
  – Slots, comprehension, excepthandler, arguments
  – Operator types
• FunctionDef is complex
  – Children in two namespaces
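The context slot can be seen from Python with the ast module of current Python 3, which postdates this talk and may differ in detail from the 2.5 node types. A small sketch, assuming a one-line assignment as input:

```python
import ast

# In "x = y", the name on the left is stored to and the name on the right is loaded.
tree = ast.parse("x = y")
assign = tree.body[0]
target = assign.targets[0]
value = assign.value

print(type(target.ctx).__name__, type(value.ctx).__name__)
```

The same Name node type is used on both sides; only the ctx slot tells the code generator whether to emit a store or a load.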
Example Code
L = []
for x in range(10):
    if x > 5:
        L.append(x * 2)
    else:
        L.append(x + 2)
Concrete Syntax Example
(if_stmt, (1, 'if'), (test, (and_test, (not_test, (comparison, (expr, (xor_expr, (and_expr, (shift_expr, (arith_expr, (term, (factor, (power, (atom, (1, 'x')))))))))), (comp_op, (21, '>')), (expr, (xor_expr, (and_expr, (shift_expr, (arith_expr, (term, (factor, (power, (atom, (2, '5')))))))))))))), (11, ':'), …
Abstract Syntax Example
For(Name('x', Store), Call(Name('range', Load), [Num(10)]),
    [If(Compare(Name('x', Load), [Gt], [Num(5)]),
        [Call(Attribute(Name('L', Load), Name('append', Load)),
              [BinOp(Name('x', Load), Mult, Num(2))])],
        [Call(Attribute(Name('L', Load), Name('append', Load)),
              [BinOp(Name('x', Load), Add, Num(2))])])])
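For comparison, the tree can be produced with the ast module in current Python 3. Node names and fields have changed since 2.5 (for example, constants are no longer Num nodes), so the dump will not match the slide exactly; this is only a sketch for exploring the structure. SOURCE, for_node and if_node are names chosen for this example.

```python
import ast

SOURCE = """\
L = []
for x in range(10):
    if x > 5:
        L.append(x * 2)
    else:
        L.append(x + 2)
"""

tree = ast.parse(SOURCE)
for_node = tree.body[1]      # the for statement
if_node = for_node.body[0]   # the if statement inside the loop

print(ast.dump(for_node))
```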
Our Goal: Bytecode
  2           0 BUILD_LIST               0
              3 STORE_FAST               1 (L)

  3           6 SETUP_LOOP              71 (to 80)
              9 LOAD_GLOBAL              1 (range)
             12 LOAD_CONST               1 (10)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                57 (to 79)
             22 STORE_FAST               0 (x)

  4          25 LOAD_FAST                0 (x)
             28 LOAD_CONST               2 (5)
             31 COMPARE_OP               4 (>)
             34 JUMP_IF_FALSE           21 (to 58)
             37 POP_TOP

  5          38 LOAD_FAST                1 (L)
             41 LOAD_ATTR                3 (append)
             44 LOAD_FAST                0 (x)
             47 LOAD_CONST               3 (2)
             50 BINARY_MULTIPLY
             51 CALL_FUNCTION            1
             54 POP_TOP
             55 JUMP_ABSOLUTE           19
        >>   58 POP_TOP

  7          59 LOAD_FAST                1 (L)
             62 LOAD_ATTR                3 (append)
             65 LOAD_FAST                0 (x)
             68 LOAD_CONST               3 (2)
             71 BINARY_ADD
             72 CALL_FUNCTION            1
             75 POP_TOP
             76 JUMP_ABSOLUTE           19
        >>   79 POP_BLOCK
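A listing like this can be regenerated with the dis module. The sketch below wraps the example in a function (the listing's STORE_FAST instructions suggest it was compiled inside one); the function name example is an assumption. Opcode names, offsets and jump instructions differ between CPython versions, so expect the output of a current interpreter to differ from the 2.5 listing.

```python
import dis


def example():
    L = []
    for x in range(10):
        if x > 5:
            L.append(x * 2)
        else:
            L.append(x + 2)
    return L


# Collect opcode names for inspection, then print the full disassembly.
opnames = [ins.opname for ins in dis.get_instructions(example)]
dis.dis(example)
```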
Strategy for Compilation
• Module-wide analysis
  – Check future statements
  – Build symbol table
    • For variable, is it local, global, free?
    • Makes two passes over block structure
• Compile one function at a time
  – Generate basic blocks
  – Assemble bytecode
  – Optimize generated code (out of order)
  – Code object stored in parent’s constant pool
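The last point, that a function's code object lives in its parent's constant pool, is easy to check from Python. A minimal sketch, assuming a module that defines a single function f:

```python
import types

SOURCE = "def f(a):\n    return a + 1\n"

# Compile the module; the body of f is compiled separately and stored as a constant.
module_code = compile(SOURCE, "<example>", "exec")
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]

print(nested[0].co_name)
```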
Symbol Table
• Collect basic facts about symbols, block
  – Variables assigned, used; params, global stmts
  – Check for import *, unqualified exec, yield
  – Other tricky details
• Identify free, cell variables in second pass
  – Parent passes bound names down
  – Child passes free variables up
  – Implicit vs. explicit global vars
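The results of this analysis are visible through the standard-library symtable module, which reports what the compiler's symbol table pass computed. A sketch, using a nested function as input (the variable names top, outer and inner are chosen for this example):

```python
import symtable

SOURCE = """\
def make_adder(n):
    x = n
    def adder(y):
        return x + y
    return adder
"""

top = symtable.symtable(SOURCE, "<example>", "exec")
outer = top.get_children()[0]    # scope of make_adder
inner = outer.get_children()[0]  # scope of adder

print(outer.lookup("x").is_local(), inner.lookup("x").is_free())
```

Here x is bound in make_adder and free in adder, which is the parent/child exchange the second pass resolves.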
Name operations
• Five different load name opcodes
  – LOAD_FAST: array access for function locals
  – LOAD_GLOBAL: dict lookups for globals, builtins
  – LOAD_NAME: dict lookups for locals, globals
  – LOAD_DEREF: load free variable
  – LOAD_CLOSURE: loads cells to make closure
• Cells
  – Separate allocation for mutable variable
  – Stored in flat closure list
  – Separately garbage collected
Class namespaces
class Spam:
    id = id(1)

  1           0 LOAD_GLOBAL              0 (__name__)
              3 STORE_NAME               1 (__module__)

  2           6 LOAD_NAME                2 (id)
              9 LOAD_CONST               1 (1)
             12 CALL_FUNCTION            1
             15 STORE_NAME               2 (id)
             18 LOAD_LOCALS
             19 RETURN_VALUE
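A class body still uses name-based loads and stores in current CPython, which is why id = id(1) works: the load finds the builtin, and the store then creates a class attribute of the same name. The sketch below pulls the class body's code object out of the module's constants and lists its opcodes; exact opcode sequences vary by version.

```python
import dis
import types

SOURCE = "class Spam:\n    id = id(1)\n"

module_code = compile(SOURCE, "<example>", "exec")
# The class body is compiled to its own code object, stored as a module constant.
class_body = next(c for c in module_code.co_consts
                  if isinstance(c, types.CodeType))
opnames = [ins.opname for ins in dis.get_instructions(class_body)]

namespace = {}
exec(module_code, namespace)
print(opnames)
print(namespace["Spam"].id)
```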
Closures
def make_adder(n):
    x = n
    def adder(y):
        return x + y
    return adder
return make_adder
def make_adder(n):
  2           0 LOAD_FAST                0 (n)
              3 STORE_DEREF              0 (x)

  3           6 LOAD_CLOSURE             0 (x)
              9 LOAD_CONST               1 (<code>)
             12 MAKE_CLOSURE             0
             15 STORE_FAST               2 (adder)

  5          18 LOAD_FAST                2 (adder)
             21 RETURN_VALUE

def adder(y):
  4           0 LOAD_DEREF               0 (x)
              3 LOAD_FAST                0 (y)
              6 BINARY_ADD
              7 RETURN_VALUE
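The cell behind STORE_DEREF and LOAD_DEREF can be inspected at run time. A short sketch using code-object and function attributes; add5 is a name chosen for this example:

```python
def make_adder(n):
    x = n
    def adder(y):
        return x + y
    return adder


add5 = make_adder(5)

# x is a cell variable in the outer function and a free variable in the inner one.
print(make_adder.__code__.co_cellvars)
print(add5.__code__.co_freevars)
# The closure is a flat tuple of cells; each cell holds one shared variable.
print(add5.__closure__[0].cell_contents)
```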
Code generation input
• Discriminated unions
  – One for each AST type
  – Struct for each option
  – Constructor functions
• Literals
  – Stored as PyObject*
  – ast pass parses
• Identifiers
  – Also PyObject* – string
typedef struct _stmt *stmt_ty;
struct _stmt {
    enum { ..., For_kind=8, While_kind=9, If_kind=10, ... } kind;
    union {
        struct {
            expr_ty target;
            expr_ty iter;
            asdl_seq *body;
            asdl_seq *orelse;
        } For;
        struct {
            expr_ty test;
            asdl_seq *body;
            asdl_seq *orelse;
        } If;
    } v;    /* accessed as s->v.If.test, s->v.For.body, ... */
    int lineno;
};
Code generation output
• Basic blocks
  – Start with jump target
  – Ends if there is a jump
  – Function is graph of blocks
• Instructions
  – Opcode + argument
  – Jump targets are pointers
• Helper functions
  – Create new blocks
  – Add instr to current block
struct instr {
    unsigned char i_opcode;
    int i_oparg;
    struct basicblock_ *i_target;
    int i_lineno;
    // plus some one-bit flags
};

struct basicblock_ {
    int b_iused;
    int b_ialloc;
    struct instr *b_instr;
    struct basicblock_ *b_next;
    int b_startdepth;
    int b_offset;
    // several details elided
};
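To show how these structures fit together, here is a rough Python analogue. It is a sketch only: the class and method names (Instr, BasicBlock, Unit, new_block, use_next_block, addop) are invented for illustration and are not the compiler's API. The point it illustrates is that a jump holds a reference to its target block, not a byte offset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Instr:
    opcode: str
    oparg: Optional[int] = None
    target: Optional["BasicBlock"] = None  # jump target is a block reference
    lineno: int = 0


@dataclass(eq=False)
class BasicBlock:
    label: str                              # hypothetical, for printing only
    instrs: List[Instr] = field(default_factory=list)
    next: Optional["BasicBlock"] = None     # fall-through successor


class Unit:
    """Hypothetical stand-in for the per-function compiler state."""

    def __init__(self):
        self.entry = self.current = BasicBlock("entry")

    def new_block(self, label):
        return BasicBlock(label)

    def use_next_block(self, block):
        # Link the block after the current one and start emitting into it.
        self.current.next = block
        self.current = block

    def addop(self, opcode, oparg=None, target=None, lineno=0):
        self.current.instrs.append(Instr(opcode, oparg, target, lineno))


unit = Unit()
end = unit.new_block("end")
unit.addop("LOAD_FAST", 0)
unit.addop("JUMP_FORWARD", target=end)
unit.use_next_block(end)
unit.addop("RETURN_VALUE")
```

Because targets are references, byte offsets need not be known while code is being generated; they can be computed later, once the blocks have been laid out.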
Code generation
• One visitor function for each AST type
  – Switch on kind enum
  – Emit bytecodes
  – Return immediately on error
• Heavy use of C macros
  – ADDOP(), ADDOP_JREL(), …
  – VISIT(), VISIT_SEQ(), …
  – Hides control flow
Code generation example
static int compiler_if(struct compiler *c, stmt_ty s) {
    basicblock *end, *next;

    if (!(end = compiler_new_block(c)))
        return 0;
    if (!(next = compiler_new_block(c)))
        return 0;

    /* Evaluate the test; if false, jump to the else branch. */
    VISIT(c, expr, s->v.If.test);
    ADDOP_JREL(c, JUMP_IF_FALSE, next);
    ADDOP(c, POP_TOP);                    /* discard test result (true path) */
    VISIT_SEQ(c, stmt, s->v.If.body);
    ADDOP_JREL(c, JUMP_FORWARD, end);     /* skip over the else branch */

    compiler_use_next_block(c, next);
    ADDOP(c, POP_TOP);                    /* discard test result (false path) */
    if (s->v.If.orelse)
        VISIT_SEQ(c, stmt, s->v.If.orelse);

    compiler_use_next_block(c, end);
    return 1;
}
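The same emission order can be traced with a toy Python version. This is a sketch under stated assumptions: Block, Emitter, visit and compile_if are hypothetical names, visit emits a placeholder instead of real code, and the error handling hidden by the C macros is omitted.

```python
import ast


class Block:
    def __init__(self, name):
        self.name = name      # hypothetical label, for printing only
        self.instrs = []      # list of (opcode, argument) pairs


class Emitter:
    """Hypothetical stand-in for struct compiler."""

    def __init__(self):
        self.current = Block("entry")
        self.order = [self.current]

    def new_block(self, name):
        return Block(name)

    def use_next_block(self, block):
        self.order.append(block)
        self.current = block

    def addop(self, opcode, arg=None):
        self.current.instrs.append((opcode, arg))


def visit(c, node):
    # Placeholder for real code generation of a child node.
    c.addop("<" + type(node).__name__ + ">")


def compile_if(c, node):
    end = c.new_block("end")
    nxt = c.new_block("next")
    visit(c, node.test)
    c.addop("JUMP_IF_FALSE", nxt)
    c.addop("POP_TOP")
    for stmt in node.body:
        visit(c, stmt)
    c.addop("JUMP_FORWARD", end)
    c.use_next_block(nxt)
    c.addop("POP_TOP")
    for stmt in node.orelse:
        visit(c, stmt)
    c.use_next_block(end)


node = ast.parse("if x > 5:\n    a = 1\nelse:\n    a = 2\n").body[0]
c = Emitter()
compile_if(c, node)
layout = [(b.name, [op for op, _ in b.instrs]) for b in c.order]
print(layout)
```

The result is three blocks: the test and body in the entry block, the else branch in next, and an empty end block that later code will fill.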
Assembler
• Lots of fiddly details
  – Linearize code
  – Compute stack space needed
  – Compute line number table (lnotab)
  – Compute jump offsets
  – Call PyCode_New()
• Peephole optimizer
  – Integrated at wrong end of assembler
  – Constant folding, simplify jumps
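Several of the assembler's outputs are visible as attributes of the finished code object. The sketch below shows the computed stack size, the line number table (via dis.findlinestarts) and the effect of constant folding. In current CPython the folding may happen at a different stage than the peephole pass described here, so this illustrates the result, not where it is produced.

```python
import dis


def f():
    n = 1000 * 1000
    return n


code = f.__code__

# Line numbers recovered from the line table; None entries are skipped.
lines = {lineno for _, lineno in dis.findlinestarts(code) if lineno is not None}

print(code.co_stacksize)   # stack space the assembler computed
print(sorted(lines))       # source lines that start an instruction
print(code.co_consts)      # the folded product appears as a single constant
```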
AST transformation
• Expose AST to Python programmers
  – Simplify analysis of programs
  – Generate code from modified AST
• Example:
  – Implement with statement as AST transform
• Ongoing work
  – BOF this afternoon at 3:15, Preston Trail
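Generating code from a modified AST is possible with the ast module in current Python 3, which postdates this talk. The sketch below is a deliberately trivial transform, not the with-statement example mentioned above; AddToMult is a hypothetical class written for this illustration.

```python
import ast


class AddToMult(ast.NodeTransformer):
    """Hypothetical transform: rewrite every `a + b` as `a * b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)          # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


tree = ast.parse("result = 3 + 4")
tree = ast.fix_missing_locations(AddToMult().visit(tree))

# compile() accepts an AST directly, so the modified tree becomes bytecode.
namespace = {}
exec(compile(tree, "<transformed>", "exec"), namespace)
print(namespace["result"])
```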
Loose ends
• compiler package
  – Should revise to support new AST types
  – Tricky compatibility issue
• Revise pgen to generate AST directly
• Develop toolkit for AST transforms
• Extend analysis, e.g. PEP 267