40-414 compiler design introduction to...

1

40-414 Compiler Design

Introduction to Parsing

Lecture 4

Prof. Aiken 2

Languages and Automata

• Formal languages are very important in CS– Especially in programming languages

• Regular languages– The weakest formal languages widely used

– Many applications

• We will also study context-free languages

Prof. Aiken 3

Beyond Regular Languages

• Many languages are not regular

• Strings of balanced parentheses are not regular:

( ) | 0i i i

Prof. Aiken 4

A 1

1

00

B

What Can Regular Languages Express?

Prof. Aiken 5

What Can Regular Languages Express?

• Languages requiring counting modulo a fixed integer

• Intuition: A finite automaton that runs long enough must repeat states

• Finite automaton can’t remember # of times it has visited a particular state

Prof. Aiken 6

The Functionality of the Parser

• Input: sequence of tokens from lexer

• Output: parse tree of the program(But some parsers never produce a parse tree . . .)

Prof. Aiken 7

Example

• Cool

if x = y then 1 else 2 fi

• Parser input

IF ID = ID THEN INT ELSE INT FI

• Parser outputIF-THEN-ELSE

=

ID ID

INTINT

Prof. Aiken 8

Comparison with Lexical Analysis

Phase Input Output

Lexer String of characters

String of tokens

Parser String of tokens Parse tree

(may be implicit)

Prof. Aiken 9

The Role of the Parser

• Not all strings of tokens are programs . . .

• . . . parser must distinguish between valid and invalid strings of tokens

• We need– A language for describing valid strings of tokens

– A method for distinguishing valid from invalid strings of tokens

10

Chomsky Hierarchy

0 Unrestricted A

1 Context-Sensitive | LHS | | RHS |

2 Context-Free |LHS | = 1

3 Regular |RHS| = 1 or 2 , A a | aB, or

A a | Ba

Prof. Aiken [slightly modified] 11

Context-Free Grammars

• Programming language constructs have recursive structure

• An STMT isif EXPR then STMT else STMT

while EXPR do STMT end

…

• Context-free grammars are a natural notation for this recursive structure

Prof. Aiken 12

CFGs (Cont.)

• A CFG consists of– A set of terminals T

– A set of non-terminals N

– A start symbol S (a non-terminal)

– A set of productions

Prof. Aiken 13

Notational Conventions

• In these lecture notes– Non-terminals are written upper-case

– Terminals are written lower-case

– The start symbol is the left-hand side of the first production

Prof. Aiken 14

Example of CFGs

Simple arithmetic expressions:

E E E

| E + E

| E

| id

Prof. Aiken 15

The Language of a CFG

Read productions as replacement rules:

X Y1…Yn

Means X can be replaced by Y1…Yn

Prof. Aiken 16

Key Idea

1. Begin with a string consisting of the start symbol “S”

2. Replace any non-terminal X in the string by a the right-hand side of some production

X Y1…Yn

3. Repeat (2) until there are no non-terminals in the string

Prof. Aiken 17

The Language of a CFG (Cont.)

More formally, write

if there is a production

Prof. Aiken 18

The Language of a CFG (Cont.)

Write

if

in 0 or more steps


The Language of a CFG

Let G be a context-free grammar with start symbol S. Then the language L(G) of G is:

The sentential forms SF(G) of G is:

X1 … Xn S X1 … Xn and every Xi is a terminal or non-terminal

Therefore: L(G) SF(G)

Prof. Aiken 20

Terminals

• Terminals are so-called because there are no rules for replacing them

• Once generated, terminals are permanent

• Terminals ought to be tokens of the language

Prof. Aiken 21

Examples

L(G) is the language of CFG G

Strings of balanced parentheses

Two grammars:

( )S S

S

( )

|

S S

( ) | 0i i i

OR

Prof. Aiken 22

Arithmetic Example

Simple arithmetic expressions:

Some elements of the language:

E E+E | E E | (E) | id

id id + id

(id) id id

(id) id id (id)

The idea of a CFG is a big step. But:

• Membership in a language is “yes” or “no”; also need parse tree of the input

• Must handle errors gracefully

• Need an implementation of CFG’s (e.g., bison)

23

Notes

Prof. Aiken

Prof. Aiken 24

Derivations and Parse Trees

A derivation is a sequence of productions

S … … … … …

A derivation can be drawn as a tree– Start symbol is the tree’s root

– For a production X Y1…Yn add children Y1…Yn to node X

Prof. Aiken 25

Derivation Example

• Grammar

• String

E E+E | E E | (E) | id

id id + id

Prof. Aiken 26

Derivation Example (Cont.)

E

E+E

E E+E

id E + E

id id + E

id id + id

E

E

E E

E+

id*

idid

Prof. Aiken 27

Derivation in Detail (1)

E

E

Prof. Aiken 28


E

E+E

E

E E+

Prof. Aiken 29


E E

E

E+E

E +

E

E

E E

E+

*

Prof. Aiken 30


E

E+E

E E+E

id E + E

E

E

E E

E+

*

id

Prof. Aiken 31


E

E+E

E E+E

id E +

id id +

E

E

E

E

E E

E+

*

idid

Prof. Aiken 32


E

E+E

E E+E

id E + E

id id + E

id id + id

E

E

E E

E+

id*

idid

Prof. Aiken 33

Notes on Derivations

• A parse tree has– Terminals at the leaves

– Non-terminals at the interior nodes

• An in-order traversal of the leaves is the original input

• The parse tree shows the association of operations, the input string does not

Prof. Aiken 34

Left-most and Right-most Derivations

• The example was a left-most derivation– At each step, replaced the

left-most non-terminal

• There is an equivalent notion of a right-most derivation

E

E+E

E+id

E E + id

E id + id

id id + id

Prof. Aiken 35

Right-most Derivation in Detail (1)

E

E

Prof. Aiken 36


E

E+E

E

E E+

Prof. Aiken 37


id

E

E+E

E+

E

E E+

id

Prof. Aiken 38


E

E+E

E+id

E E + id

E

E

E E

E+

id*

Prof. Aiken 39


E

E+E

E+id

E E

E

+ id

id + id

E

E

E E

E+

id*

id

Prof. Aiken 40


E

E+E

E+id

E E + id

E id + id

id id + id

E

E

E E

E+

id*

idid

Prof. Aiken 41

Derivations and Parse Trees

• Note that right-most and left-most derivations have the same parse tree

• The difference is the order in which branches are added

Prof. Aiken 42

Summary of Derivations

• We are not just interested in whether s L(G)

– We need a parse tree for s

• A derivation defines a parse tree– But one parse tree may have many derivations

• Left-most and right-most derivations are important in parser implementation

Prof. Aiken 43

Ambiguity

• Grammar

• String id * id + id

E E + E | E * E | ( E ) | id

Prof. Aiken 44

Ambiguity (Cont.)

This string has two parse trees

E

E

E E

E*

id +

idid

E

E

E E

E+

id*

idid

Prof. Aiken 45

Ambiguity (Cont.)

• A grammar is ambiguous if it has more than one parse tree for some string– Equivalently, there is more than one right-most or

left-most derivation for some string

• Ambiguity is BAD– Leaves meaning of some programs ill-defined

• There are several ways to handle ambiguity

• Most direct method is to rewrite grammar unambiguously

• Enforces precedence of * over +Prof. Aiken [slightly modified] 46

Dealing with Ambiguity

E E + T | TT T * F | FF ( E ) | id

Prof. Aiken 47

Ambiguity in Arithmetic Expressions

• Recall the grammar

E E + E | E * E | ( E ) | id

• The string id * id + id has two parse trees:

E

E

E E

E*

id +

idid

E

E

E E

E+

id*

idid

Prof. Aiken 48

Ambiguity in Arithmetic Expressions

• Recall the grammar

E E + E | E * E | ( E ) | id

• The string id * id + id has two parse trees:

E

E

E E

E*

id +

idid

E

E

E E

E+

id*

idid

Prof. Aiken 49

Ambiguity: The Dangling Else

• Consider the grammarS if E then S

| if E then S else S

| OTHER

• This grammar is also ambiguous

50

The Dangling Else: Example

• The expressionif E1 then if E2 then S else S

has two parse trees

S

E1 S

E2 S S

S

E1 S

E2 S

S

• Typically we want the second form

thenif else

thenif

thenif

if then else

Prof. Aiken [slightly modified]

Prof. Aiken 51

The Dangling Else: A Fix

• else matches the closest unmatched then• We can describe this in the grammar

S MIF /* all then are matched */ | UIF /* some then is unmatched */

MIF if E then MIF else MIF

| OTHER

UIF if E then S| if E then MIF else UIF

• Describes the same set of strings

52

The Dangling Else: Example Revisited

UIF

E1MIF

E2 MIF MIF

UIF

E1 MIF

E2 S

UIFthenif else

thenif

thenif

if then else

S

• A valid parse tree (for a UIF)

• Not valid because the then expression is not a MIF

• The expression if E1 then if E2 then S else S

S S


Prof. Aiken 53

Ambiguity

• No general techniques for handling ambiguity

• Impossible to convert automatically an ambiguous grammar to an unambiguous one

• Used with care, ambiguity can simplify the grammar– Sometimes allows more natural definitions

– We need disambiguation mechanisms

54

Introduction to Top-Down Parsing

Recursive Descent

Prof. Aiken 55

Intro to Top-Down Parsing: The Idea

• The parse tree is constructed– From the top

– From left to right

• Terminals are seen in order of appearance in the token stream:

t2 t5 t6 t8 t9

1

t2 3

4

t5

7

t6

t9

t8

Prof. Aiken 56

Recursive Descent Parsing

• Consider the grammarE T |T + E

T int | int * T | ( E )

• Token stream is: ( int )

• Start with top-level non-terminal E– Try the rules for E in order

Prof. Aiken 57


E T |T + E

T int | int * T | ( E )

E

( int )

Prof. Aiken 58


E T |T + E

T int | int * T | ( E )

E

T

( int )

Prof. Aiken 59


E T |T + E

T int | int * T | ( E )

int

( int )

E

T

Mismatch: int is not ( !Backtrack …

Prof. Aiken 60


E T |T + E

T int | int * T | ( E )

E

T

( int )

Prof. Aiken 61


E T |T + E

T int | int * T | ( E )

E

T

( int )

int * TMismatch: int is not ( !Backtrack …

Prof. Aiken 62


E T |T + E

T int | int * T | ( E )

E

T

( int )

Prof. Aiken 63


E T |T + E

T int | int * T | ( E )

E

T

( int )

( E )Match! Advance input.

Prof. Aiken 64


E T |T + E

T int | int * T | ( E )

E

T

( int )

( E )

Prof. Aiken 65


E T |T + E

T int | int * T | ( E )

E

T

( int )

( E )

T

Prof. Aiken66


E T |T + E

T int | int * T | ( E )

E

T

( int )

( E )

T

int

Match! Advance input.

67


E T |T + E

T int | int * T | ( E )

E

T

( int )

( E )

T

int

Match! Advance input.

Prof. Aiken

68


E T |T + E

T int | int * T | ( E )

E

T

( int )

( E )

T

int

End of input, accept.

Prof. Aiken

69

Problems with Top Down Parsing

• Left Recursion in CFG may cause parser to loop forever!

• In there is a production of form AA, we say the grammar has left recursion

• Solution: Remove Left Recursion...

– without changing the Language defined by the Grammar.

E E + int | int

70

Problems with Top Down Parsing (Example)

E E + int | int

int + int

E +

E

int

71


E E + int | int

int + int

E +

E

int

72


E E + int | int

int + int

E +

E

int

E + int

Prof. Aiken 73

Elimination of Left Recursion

• Consider the left-recursive grammarS S |

• S generates all strings starting with a and followed by a number of

• Can rewrite using right-recursionS S’

S’ S’ |

Prof. Aiken 74

More Elimination of Left-Recursion

• In general

S S 1 | … | S n | 1 | … | m

• All strings derived from S start with one of 1,…,m and continue with several instances of1,…,n

• Rewrite asS 1 S’ | … | m S’

S’ 1 S’ | … | n S’ |

Prof. Aiken 75

General Left Recursion

• The grammar S A |

A S

is also left-recursive because

S + S

• This left-recursion can also be eliminated

• See Dragon Book for general algorithm– Section 4.3.3

76

Predictive Top-Down Parsing

Parser Never Backtracks

For Example, Consider:

Type Simple

| id

| array [ Simple ] of Type

Simple integer

| char

| num dotdot num

Suppose input is :

array [ num dotdot num ] of integer

Parsing would begin with

Type ???

Start symbol

77

Predictive Parsing Example

Type simple

| id


Simple integer

| char

| num dotdot num

Start symbol

of[ Type]Simplearray

Type

[ Type]Simple ofarray

Type

numnum dotdot

Input : array [ num dotdot num ] of integer

Lookahead symbol

Type

?


Lookahead symbol

78

Predictive Parsing Example

Type Simple

| id


Simple integer

| char

| num dotdot num

Start symbol


Lookahead symbol


Lookahead symbol


Type

numnum dotdot Simple


Type

numnum dotdot Simple

integer

79

Predictive Recursive Descent

• Parser is implemented by N + 1 subroutines, where N is the number grammar non-terminals

• There is one subroutine for attempting to Match tokens in the input stream

• There is also one subroutine for each non-terminal with two tasks:

1. Deciding on the next production to use

2. Applying the selected production

80

Predictive Recursive Descent (Cont.)

Procedure “Match” checks if the token matches the expected input

procedure Match ( expected_token ) ;

{

if lookahead = expected_token then

lookahead := get_next_token

else error

}

81


• The subroutine for each non-terminals has two tasks:

1. Selecting the appropriate production

2. Applying the chosen production

• Selection is done based on the result of a number of if-then-else statements

• Applying a production is implemented by calling the match procedure or other subroutines, based on the rhs of the production

82


Subroutine “Simple” for the given example:

procedure Simple ;{ if lookahead = integer then call Match ( integer );

else if lookahead = char then call Match ( char );else if lookahead = num

then { call Match ( num ); call Match ( dotdot );call Match ( num ) }

else error}

Type Simple

| id


Simple integer

| char

| num dotdot num

83


Subroutine “Type” for the given example:

Type Simple

| id


Simple integer

| char

| num dotdot num

procedure Type ;{if lookahead is in { integer, char, num } then call Simple;

else if lookahead = ‘’ then { call Match ( ‘’ ) ; call Match( id ) }else if lookahead = array

then { call Match( array ); call Match( ‘[‘ ); call Simple;call Match( ‘]’ ); call Match( of ); call Type }

else error}

84

How to write tests for selecting the appropriate production rule?

Basic Tools:

First: Let be a string of grammar symbols. First() is the set that includes every terminal that appears leftmost in or in any string originating from . NOTE: If , then is First( ).

Follow: Let A be a non-terminal. Follow(A) is the set of terminals athat can appear directly to the right of A in some sentential form. (S Aa, for some and ). NOTE: If S A, then $ is Follow(A).

85

Computing First Sets

Definition

First(X) = { t | X * t} { | X * }

Algorithm sketch:

1. First(t) = { t }

2. First(X)• if X

• if X A1 … An and First(Ai) for 1 i n

3. First() First(X) if X A1 … An

– and First(Ai) for 1 i n

Prof. Aiken

86

Computing First(X) for all Grammar Symbols

1. If X is a terminal, First(X) = {X}

2. If X is a production rule, add to First(X)

3. If X is a non-terminal, and X Y1Y2…Yk is a production rule

Place First(Y1) - in First(X)

if Y1 , Place First(Y2) - in First(X)

if Y2 , Place First(Y3) - in First(X)

…

if Yk-1 , Place First(Yk) in First(X)

NOTE: As soon as Yi , Stop.

Repeat above steps until no more elements are added to any First set.

*

87

First Sets. Example

• Recall the grammar E T X X + E |

T ( E ) | int Y Y * T |

• First sets

First( ( ) = { ( } First( T ) = {int, ( }

First( ) ) = { ) } First( E ) = {int, ( }

First( int) = { int } First( X ) = {+, }

First( + ) = { + } First( Y ) = {*, }

First( * ) = { * }

Prof. Aiken

88

Computing Follow Sets

• Definition:

Follow(X) = { t | S * X t }

• Intuition– If X A B then First(B) Follow(A) and

Follow(X) Follow(B)

• if B * then Follow(X) Follow(A)

– If S is the start symbol then $ Follow(S)

Prof. Aiken

89

Computing Follow Sets (Cont.)

Algorithm sketch:

1. $ Follow(S)

2. First() - {} Follow(X)– For each production A X

3. Follow(A) Follow(X)– For each production A X where First()

Prof. Aiken

90

Follow Sets. Example

• Recall the grammar E T X X + E |

T ( E ) | int Y Y * T |

• Follow sets

Follow( E ) = {), $}

Follow( T ) = {+, ) , $}

Follow( Y ) = {+, ) , $}

Follow( X ) = {), $}


91

Conditions for R.D. Parsing to be Predictive

R.D. parsing is Predictive when for all A

1. First( ) First( ) = ;besides, only one of or can derive

2. if derives , then Follow( A ) First( ) =


Predictive Parsing and Left Factoring

• Consider the grammarE T + E | T

T int | int * T | ( E )

• Hard to predict because– For T two productions start with int

– For E it is not clear how to predict

• We need to left-factor the grammar

Prof. Aiken 94

Error Handling

• Purpose of the compiler is– To detect non-valid programs

– To translate the valid ones

• Many kinds of possible errors (e.g. in C)

Error kind Example Detected by …Lexical … $ … Lexer

Syntax … x *% … Parser

Semantic … int x; y = x(3); … Type checker

Correctness your favorite program Tester/User

Prof. Aiken 95

Syntax Error Handling

• Error handler should– Report errors accurately and clearly

– Recover from an error quickly

– Not slow down compilation of valid code

• Good error handling is not easy to achieve


Approaches to Syntax Error Recovery

• From simple to complex– Panic mode

– Error productions

– Automatic local or global correction

Prof. Aiken 97

Error Recovery: Panic Mode

• Simplest, most popular method

• When an error is detected:– Discard tokens until one with a clear role is found

– Continue from there

• Such tokens are called synchronizing tokens– Typically the statement or expression terminators

Prof. Aiken 98

Syntax Error Recovery: Error Productions

• Idea: specify in the grammar known common mistakes

• Essentially promotes common errors to alternative syntax

• Example: – Write 5 x instead of 5 * x– Add the production E … | E E

• Disadvantage– Complicates the grammar

Prof. Aiken 99

Error Recovery: Local and Global Correction

• Idea: find a correct “nearby” program – Try token insertions and deletions

– Exhaustive search

• Disadvantages: – Hard to implement

– Slows down parsing of correct programs

– “Nearby” is not necessarily “the intended” program

Prof. Aiken 100

Syntax Error Recovery: Past and Present

• Past– Slow recompilation cycle (even once a day)– Find as many errors in one cycle as possible

• Present– Quick recompilation cycle– Users tend to correct one error/cycle– Complex error recovery is less compelling– Panic-mode seems enough

101

Implementing Panic-Mode Recovery

• The choice for the synchronizing set is important for improving the performance of the panic mode method.

• We define First(A) Follow(A) as the synchronizing set of non-terminal A.

102

Implementing Panic-Mode Recovery (Cont.)

Suppose the parser is in subroutine A, the current token is a,and a syntax error is detected:

1. if a Follow (A), Report the error by ‘illegal a found on line N’,where N is the line number of token a,then get the next token from the scanner, and then call A.

2. if a Follow (A),Report the error by: ‘missing b on line N’,where b is a simple token that can be derived from A,and N is the line number of token a,then resume parsing by exiting from A.

103

Implementing Panic-Mode Recovery (Cont.)

• Suppose the parser is in subroutine Match, the current token is a, and the expected token is b:

• Report the error by the message:

– ‘missing b on line N, where N is the line number of token a, and

– then resume parsing by exiting from Match.

104

Example of PM Error Recovery

procedure Type ;{ if lookahead is in { integer, char, num } then call Simple;

else if lookahead = ‘’ then { call Match ( ‘’ ) ; call Match( id ) }else if lookahead = array then {

call Match( array ); call Match( ‘[‘ ); call Simple; call Match( ‘]’ ); call Match( of ); call Type }

else if lookahead is in { $ } then print ( ‘missing integer on line N’ )else { print ( ‘illegal lookahead on line N’ );

lookahed := get_next_token;call Type }

}

Follow( Type ) = { $ }

105

Example of PM Error Recovery (Cont.)

procedure Simple ;{ if lookahead = integer then call Match ( integer );

else if lookahead = char then call Match ( char );else if lookahead = num

then { call Match ( num );call Match ( dotdot );call Match ( num )}

else if lookahead is in { $, ] }then print ( ‘missing integer on line N’ )else { print ( ‘illegal lookahead on line N’ );

lookahed := get_next_token;call Simple }

}

Follow( Simple ) = { $, ] }

106

Example of PM Error Recovery (Cont.)

procedure Match ( expected_token ) ;{

if lookahead = expected_token thenlookahead := get_next_token

else print ( ‘missing expected_token on line N’ )}

107

Question?

• Consider the grammar E T X X + E |

T ( E ) | int Y Y * T |

• Write Recursive Descent Procedures including panic mode error recovery for all non-terminals.

• Write a step-by-step parsing of input ‘int * int’

• Draw the parse tree of the input

Prof. Aiken 108

Question?

7

2

15

A B B

B C C

C 1 | 2

How many strings does the following grammar generate?

8

16

4

Prof. Aiken 109

Question?

16

15

31

A B B

B C C

C 1 | 2 |

How many strings does the following grammar generate?

12

64

63

32

11

Prof. Aiken 110

aXa

abYa

acXca

acca

Which of the following is a valid

derivation of the givengrammar?

S

S aXa

X | bY

Y | cXc |d

S

aXa

abYa

abcXca

abcbYca

abcbdca

S

aXa

abYa

abcXcda

abccda

S

aa

Question?

Prof. Aiken 111

Question?

a

b

a

S

X

Y

c X c

Which of the following is a valid

parse tree for the givengrammar?

S

a X a

b Y

c X c d

a

Y

a

S

X

b

c X c d

a

b Y

a

S

X

S aXa

X | bY

Y | cXc |d

Prof. Aiken 113

Question?

D A

S A

A B | C

B ( C )

C B + C | D

D 1 | 0

A D

D B

Consider the following grammar. Adding which one of the following rules will cause the grammar to be left-recursive? [Choose all that apply]

B C

C 1 C

Prof. Aiken 115

Question?

S Sa | Sb | S SS’

S’a | b

S Sa | SbS S | S’

S’a | b

Choose the unambiguous version

of the givenambiguous grammar: S SS| a | b |

Prof. Aiken 116

Question?

2

7

1

E E * E | E + E | ( E ) | int

Consider the following grammar. How many unique parse trees are there for the string 5 * 3 + (2 * 7) + 4?

8

5

4

117

Question?

Which of the following statements are true about the given grammar?

S a T U b | T c U c | b U b | a U aU S b | c c

The follow set of S is { $, b }

The first set of U is { a, b, c }

The first set of S is { , a, b }

The follow set of T is { a, b, c }

Choose all that are correct.

118

Question?

Consider the following grammar:

S A ( S ) B | A S | S B x | B S B | y

What are the first and follow sets of S

First: { x, } Follow: { $, y, x, (, ) }

First: { y, (, } Follow: { $, y, ( , )}

First: { x, y, (, } Follow: { $, y, x, (, ) }

First: { x, y, ( } Follow: { $, y, x, (, ) }

First: { x, ( } Follow: { $, y, x }

First: { x, y, (, } Follow: { y, x, (, ) }

Prof. Aiken 119

Question?

E

E’

E’ + E

id + E

id + E’

id +id

Choose the derivation that is a valid recursive descent

parse for the string id + id in the given grammar. Moves

that are followed by backtracking are given in red. E E’ | E’ + E

E’ - E’ | id | (E)

E

E’

id

E’+ E

id + E

id + E’

id +id

E

E’ + E

id + E

id + E’

id +id

E

E’

-E’

id

(E)

E’ + E

-E’ +E

id + E

id + E’

id +-E’

id +id

121

Question?

Prof. Aiken

Choose the alternative that correctlyleft

factors “if” statements in the given grammar

EXPR if true then {EXPR }

| if false then { EXPR}

| iftrue then { EXPR } else { EXPR }

| if false then { EXPR } else { EXPR }

| …

EXPR

|

EXPR’

BOOL

if BOOL then { EXPR } EXPR’

…

else { EXPR } |

true | false

EXPR if BOOL then { EXPR}

| if BOOL then { EXPR } else { EXPR}

| …

BOOL true | false

EXPR

EXPR’

|

BOOL

EXPR’ | EXPR’ else { EXPR}

if BOOL then { EXPR}

…

true | false

EXPR if BOOLEXPR’

| …

EXPR’ then { EXPR}

| then { EXPR } else { EXPR}

BOOL true | false

40-414 compiler design introduction to...

Documents