[yidlug] programming languages differences, the underlying implementation

59
Virtual Memory Layout Operatin g System Reserved .data Break (Heap) .text .bss S t a c k 0 FFFFFF FF

Upload: moshe-bt

Post on 22-Nov-2014

403 views

Category:

Technology


0 download

DESCRIPTION

Presented BY: Yoel Halberstam Sr Software Engineer, DBA and Web Admin Yoel is familiar with assembly language, C, C++, c#, java, JavaScript, PHP, VB, vb.net, batch file scripting, shell scripting, and bash Description: Many of us are frustrated when dealing with a language outside the scope of one's experience, this is especially true when it comes down to differences that are merely a result from the underlying implementation or ideology. YouTube Link: http://youtu.be/sQqXWLpz9hQ YIDLUG is a Yiddish speaking Linux (and other open source software) Users Group, check us out at http://www.meetup.com/yidlug

TRANSCRIPT

Page 1: [Yidlug] programming languages differences, the underlying implementation

Virtual Memory Layout

Operating System Reserved

.data

Break (Heap)

.text

.bss

Stack

0

FFFFFFFF

Page 2: [Yidlug] programming languages differences, the underlying implementation

How The Sections Work

.text.data

Instruction1Instruction2OpCode Operand1 Operand2ADD 01010101 11110000

a = 1 b = 1 a + b

00000001

00000001

Variable a @(01010101)

Variable b @ (11110000)

Code

Page 3: [Yidlug] programming languages differences, the underlying implementation

Indirection .text

.dataInstruction1Instruction2OpCode Operand1 Operand2ADD 11110011 11111111

a = 1 b = 1 c* = &a d* = &b c* + d*

00000001

00000001

Variable a @(01010101)

Variable b @ (11110000)Code

01010101Variable c @ (11110011)

11110000Variable d @ (11111111)

Indirect

Page 4: [Yidlug] programming languages differences, the underlying implementation

Example A “C” “Array”

myArray = array() i = 1 myArray[i] = 0

00000001 i @(01010101)

myArray @ (11110000)

Code

myArray[i] == myArray[1] == myArray + 1 @ (11110001)

00000000

Index value (indirection)

Page 5: [Yidlug] programming languages differences, the underlying implementation

Back To Memory

a = 0 p* =&a

11110001

Whole_Memory_Array

Pointer p @(01010101)

Code

a =Actual pointed variable = Whole_Memory_Array[p]@ (11110001) 00000000

Pointer value (indirection)

Page 6: [Yidlug] programming languages differences, the underlying implementation

Intro To BinarySwitch = Bit

8 Switches = 8 Bits = Byte

1 = On0 = Off

We can have only 0 or 1 Is there any way to get real numbers?

Page 7: [Yidlug] programming languages differences, the underlying implementation

Lets Take an Example From Regular (Decimal) Numbers

• 0• 1• 2• 3• 4• 5• 6• 7• 8• 9• ?????

We have only 9 numerals So how do we proceed?

Page 8: [Yidlug] programming languages differences, the underlying implementation

Lets Take an Example From Regular (Decimal) Numbers

• 0

• 1• 2• 3• 4• 5• 6• 7• 8• 9• ?????

The answer is combination

• 10

• 11• 12• …• 20• 21• ….• 30• ….• 99• ?????

• 100

• 101• ….• 200• …..

Page 9: [Yidlug] programming languages differences, the underlying implementation

Back To Binary

• 0

• 1• ???

The answer is combination

• 10

• 11• ?????

• 100• 101• 110• 111• ??????

• 1000• 1001• 1010• 1100• 1101• 1110• 1111• ??????

Page 10: [Yidlug] programming languages differences, the underlying implementation

Binary Compared To Decimal

• 0

• 1• 2• 3• 4• 5• 6• 7• 8• 9• ……

• 00000000• 00000001• 00000010• 00000011• 00000100• 00000101• 00000110• 00000111• 00001000• 00001001• ……..

• 0

• 1• 2• 3• 4• 5• 6• 7• 10• 11• ……

Decimal Binary Octal

Page 11: [Yidlug] programming languages differences, the underlying implementation

Negative Numbers

• Rule 1: Negative has a 1 in front Ex: 100000001

• Rule 2: The 2’s complement1. The 1’s Complement – Xor all bits

Ex: 000000001 - (Decimal “1”) Xor: 111111110

2. The 2’s Complement – Add 1Ex: 000000001 - (Decimal “1”)Xor: 111111110Add 1: 111111110 - (Decimal “-1”)

Page 12: [Yidlug] programming languages differences, the underlying implementation

Converting Between Smaller And Larger Types

byte a = 1 short b = a

Code 000000010000000000000001

Correct

unsigned byte a = 255 short b = a

111111110000000011111111

Correct

byte a = -1 short b = a

111111110000000011111111

Wrong

111111111111111111111111Correct

Positive Values

Negative Values

Page 13: [Yidlug] programming languages differences, the underlying implementation

Identifiers Turned Into Memory Addresses

1. The identifiers that are being turned into memory addresses:– Global Variables– Functions– Labels

2. The identifiers that are NOT being turned into memory addresses, (are only used to measure the size to reserve):– Custom types– Struct– Class names

3. The identifiers that are used as offsets:– Array index– Class (and struct) field (NOT function) member– Local variables

Page 14: [Yidlug] programming languages differences, the underlying implementation

Variables

• Size of the reserved memory is based on the type• There might be the following types

1. Built-in type – which is just a group of bytes, based on the compiler and/or platform

2. Custom type – which is just a series of built in types3. Array (in C and C++) – which is just a series of the same built-in type

(but no one keeps track of the length, so it is the programmers job)4. Pointer – system defined to hold memory addresses5. Reference – a pointer without the possibility to access the memory

address6. Handle – a value supplied by the operating system (like the index of an

array)7. Typedef and Define – available in C and C++ to rename existing types

Variables are a “higher language – human readable” name for a memory address

Page 15: [Yidlug] programming languages differences, the underlying implementation

Labels

• There is no size associated with it• Location– In assembly it might be everywhere

• In fact in assembly it is generally the only way to declare a variable, function, loop, or “else”

• In Dos batch it is the only way to declare a function– In C and C++ it might be only in a function – but is only

recommended to break out of a nested loop– In VB it is used for error handling “On error goto”– In Java it might only be before a loop

Labels are a “higher language – human readable” name for a memory address

Page 16: [Yidlug] programming languages differences, the underlying implementation

Label Sample In Assembly.data

.intvar1:

1var2:

10.text

.globalstart:

mov var1 %eaxcall myfuncjmp myfunc

myfunc:mov var2 %ebxadd %eax %ebxret

Page 17: [Yidlug] programming languages differences, the underlying implementation

Label Sample In Java (Or C)outerLabel:

while(1==1){

while(2==2){

//Java syntaxbreak outerLabel;

//C syntax (not the same as before, as it will cause the loop again)

goto outerLabel;}

}

Page 18: [Yidlug] programming languages differences, the underlying implementation

Boolean Type• False == 0 (all switches are of)• True == 1 (switch is on, and also matches

Boolean algebra)• All other numbers are also considered true (as

there are switches on)• There are languages that require conversion

between numbers and Boolean (and other are doing it behind the scenes))

However TWO languages are an exception

Page 19: [Yidlug] programming languages differences, the underlying implementation

Why Some Languages Require conversion•Consider the following C code, all of them are perfectly valid:

if(1==1) //trueif(1) //true as wellif(a) //Checks if “a” is non-zeroif(a==b) //Compares “a” to “b”if(a=b) //Sets “a” to the value of “b”, and then

//checks “a” if it is non-zero, Is this by //intention or typo?

•However in Java the last statement would not compile, as it is not a Boolean operation

•For C there is some avoidance by having constants to the left, ie. if(20==a) Instead of if(a==20)Because

if(20=a) Is a syntax errorWhile if(a=20) Is perfectly valid

Page 20: [Yidlug] programming languages differences, the underlying implementation

Implicit Conversion• However some languages employ implicit

conversion• JavaScript considers – If (1) : true– If (0) : false– If (“”) : false– If (“0”) : true

• Php Considers• If (“0”) : false• If (array()) : false

Page 21: [Yidlug] programming languages differences, the underlying implementation

Boolean In Visual Basic• True in VB = -1• To see why let us see it in binary– 1 = 00000001– 1’s complement = 1111111110– 2’s complement = 1111111111

• So all switches are on

But why different than all others?

To answer that we need to understand the difference between Logical and Bitwise operators, and why do

Logical operators short circuit?

Page 22: [Yidlug] programming languages differences, the underlying implementation

Logical vs bitwise

• Logical– Step 1 – Check the left side for true– Step 2 – If still no conclusion check the right side– Step 3 – Compare both sides and give the answer

• Bitwise– Step 1 – Translate both sides into binary– Step 2 – Compare both sides bit per bit– Step 3 – Provide the soltuion

Page 23: [Yidlug] programming languages differences, the underlying implementation

Example Bitwise vs Logical• Example 1

If 1==1 AND 2==2 Logical And Bitwise AndStep 1: 1==1 is true 1==1 is true=00000001Step 2: 2 ==2 is true 2==2 is true=00000001Step 3:

True 00000001And True 00000001 = True 00000001

Page 24: [Yidlug] programming languages differences, the underlying implementation

More Examples

Page 25: [Yidlug] programming languages differences, the underlying implementation

Bitwise vs Logical OperatorsOperator C Basic VB.Net

(Logical)

Logical AND && N/A AndAlso

Logical OR || N/A OrElse

Logical NOT ! N/A N/A

(Bitwise)

Bitwise AND & AND AND

Bitwise OR | OR OR

Bitwise XOR ^ XOR XOR

Bitwise NOT ~ NOT NOT

Page 26: [Yidlug] programming languages differences, the underlying implementation

Back To VB

1 = 00000001 = True NOT = 11111110 = True

-1 = 11111111 = True NOT = 00000000 = False

Page 27: [Yidlug] programming languages differences, the underlying implementation

Boolean In Bash Shell Scripting# if(true) then echo “works”; fi# works## if(false) then echo “works”; fi## if(test 1 –eq 1) then echo “works”; fi# works## if(test 1 –eq 2) then echo “works”; fi## echo test 1 –eq 1# # test 1 –eq 1# echo $?# 0# test 1 –eq 2# echo $?# 1## true# echo #?# 0# false# echo #?# 1

Page 28: [Yidlug] programming languages differences, the underlying implementation

Simple Function CallStack

CodeFunction func1(){

func2();}

Function func2(){

return;}

Base Pointer

Stack Pointer

Page 29: [Yidlug] programming languages differences, the underlying implementation

Simple Function Call – Assembly CodeStack

CodeFunction func1(){

func2();}

Function func2(){

//Some Codereturn;

}

Base Pointer

Stack Pointer

Assembly CodeFunc1: #labelJmp func2

Func2: #labelPush BasePointerBasePointer = StackPointer#some codePop BasePointer

9997

9995

Page 30: [Yidlug] programming languages differences, the underlying implementation

Simple Function Call – Step 1Stack

CodeFunction func1(){

func2();}

Function func2(){

//Some Codereturn;

}

Base Pointer

Stack Pointer

Assembly CodeFunc1: #labelJmp func2

Func2: #labelPush BasePointerBasePointer = StackPointer#some codePop BasePointer

9997

9994

9997

Page 31: [Yidlug] programming languages differences, the underlying implementation

Simple Function Call – Step 2Stack

CodeFunction func1(){

func2();}

Function func2(){

//Some Codereturn;

}

Base Pointer

Stack Pointer

Assembly CodeFunc1: #labelJmp func2

Func2: #labelPush BasePointerBasePointer = StackPointer#some codePop BasePointer

9997

9994

Page 32: [Yidlug] programming languages differences, the underlying implementation

Simple Function Call – Step 3Stack

CodeFunction func1(){

func2();}

Function func2(){

//Some Codereturn;

}

Base Pointer

Stack Pointer

Assembly CodeFunc1: #labelJmp func2

Func2: #labelPush BasePointerBasePointer = StackPointer#some codePop BasePointer

9995

9997

Page 33: [Yidlug] programming languages differences, the underlying implementation

Function CallStack

CodeFunction func1(){

func2(5, 10);}

Function func2(int x, int y){

int a;int[10] b;//Some codereturn;

}

Base Pointer

Stack Pointer

Page 34: [Yidlug] programming languages differences, the underlying implementation

Function Call – Assembly CodeStack

CodeFunction func1(){

func2(5, 10);}

Function func2(int x, int y){

int a;int[10] b;//Some codereturn;

}

Base Pointer

Stack Pointer

Assembly CodeFunc1: #labelPush 10Push 5Jmp func2

Func2: #labelPush BasePointerBasePointer = StackPointerSub 1Sub 10#some codeAdd 10Add 1Pop BasePointer

9997

9995

Page 35: [Yidlug] programming languages differences, the underlying implementation

Function Call – Step 1Stack

Base Pointer

Stack Pointer

9993

9997

CodeFunction func1(){

func2(5, 10);}

Function func2(int x, int y){

int a;int[10] b;//Some codereturn;

}

Assembly CodeFunc1: #labelPush 10Push 5Jmp func2

Func2: #labelPush BasePointerBasePointer = StackPointerSub 1Sub 10#some codeAdd 10Add 1Pop BasePointer

10

5

Page 36: [Yidlug] programming languages differences, the underlying implementation

Function Call – Step 2Stack

Base Pointer

Stack Pointer

9992

9997

CodeFunction func1(){

func2(5, 10);}

Function func2(int x, int y){

int a;int[10] b;//Some codereturn;

}

Assembly CodeFunc1: #labelPush 10Push 5Jmp func2

Func2: #labelPush BasePointerBasePointer = StackPointerSub 1Sub 10#some codeAdd 10Add 1Pop BasePointer

10

5 9997

Page 37: [Yidlug] programming languages differences, the underlying implementation

Function Call – Step 3Stack

Base Pointer

Stack Pointer

9992

CodeFunction func1(){

func2(5, 10);}

Function func2(int x, int y){

int a;int[10] b;//Some codereturn;

}

Assembly CodeFunc1: #labelPush 10Push 5Jmp func2

Func2: #labelPush BasePointerBasePointer = StackPointerSub 1Sub 10#some codeAdd 10Add 1Pop BasePointer

10

5 9997

Page 38: [Yidlug] programming languages differences, the underlying implementation

Function Call – Step 4Stack

Base Pointer

Stack Pointer

9992

CodeFunction func1(){

func2(5, 10);}

Function func2(int x, int y){

int a;int[10] b;//Some codereturn;

}

Assembly CodeFunc1: #labelPush 10Push 5Jmp func2

Func2: #labelPush BasePointerBasePointer = StackPointerSub 1Sub 10#some codeAdd 10Add 1Pop BasePointer

10

5 9997

9981

Page 39: [Yidlug] programming languages differences, the underlying implementation

Function Call – Step 5Stack

Base Pointer

Stack Pointer

9992

CodeFunction func1(){

func2(5, 10);}

Function func2(int x, int y){

int a;int[10] b;//Some codereturn;

}

Assembly CodeFunc1: #labelPush 10Push 5Jmp func2

Func2: #labelPush BasePointerBasePointer = StackPointerSub 1Sub 10#some codeAdd 10Add 1Pop BasePointer

10

5 9997

Page 40: [Yidlug] programming languages differences, the underlying implementation

Function Call – Step 6Stack

Base Pointer

Stack Pointer

9993

9997

CodeFunction func1(){

func2(5, 10);}

Function func2(int x, int y){

int a;int[10] b;//Some codereturn;

}

Assembly CodeFunc1: #labelPush 10Push 5Jmp func2Add 2

Func2: #labelPush BasePointerBasePointer = StackPointerSub 1Sub 10#some codeAdd 10Add 1Pop BasePointer

10

5

Page 41: [Yidlug] programming languages differences, the underlying implementation

Function Call – Step 6Stack

Base Pointer

Stack Pointer

9995

9997

CodeFunction func1(){

func2(5, 10);}

Function func2(int x, int y){

int a;int[10] b;//Some codereturn;

}

Assembly CodeFunc1: #labelPush 10Push 5Jmp func2Add 2 StackPointer

Func2: #labelPush BasePointerBasePointer = StackPointerSub 1 StackPointerSub 10 StackPointer#some codeAdd 10 StackPointerAdd 1 StackPointerPop BasePointer

Page 42: [Yidlug] programming languages differences, the underlying implementation

Function call - summary

1. Arguments are copied (passed by value) on the stack from right to left (In the C calling convention)

2. The return address is pushed3. The old base pointer is pushed4. We make place for local variables (but is not intialized and contans “Garbage”)

Call

1. Remove the space from the local variables (everything there is destroyed)2. The old base pointer is restored3. The return address is poped and jumped to.4. Parameters are destroyed

Return

1. All access to local variables or parameters are relative to the base pointer

2. i.e the compiler does not transalate “localVar” to “FF55”, but to “BasePointer - 1”

Access

Page 43: [Yidlug] programming languages differences, the underlying implementation

The Heap

Operating System Reserved

.data

Break (Heap)

.text

.bss

Stack

0

FFFFFFFF

C CodeInt* myPtr = NULL;myPtr = alloc(sizeof(int));If(myPtr == NULL) exit(1);//Some codefree(myPtr);

C++ CodeInt* myPtr = NULL;myPtr = new(sizeof(int));If(myPtr == NULL) exit(1);//Some codedelete(myPtr);

myPtr

Points here after malloc and new

?????

Page 44: [Yidlug] programming languages differences, the underlying implementation

Heap vs Stack Summary• All types of objects can be allocated both on the

stack and on the heap• To use the heap we must have a pointer and then

call “malloc” or “new”• Objects created on the stack are automatically

destroyed when the function exists (unless with the keyword static in C which actually creates in the data section)

• Objects created on the heap, must be explicitly freed with “free”, or “malloc”– IF the pointer is destroyed then we are stuck (or if we

just forgat)

Page 45: [Yidlug] programming languages differences, the underlying implementation

Copying

.data

a = 1 b = a c* = &a d* = c

00000001

00000001

Variable a @(01010101)

Variable b @ (11110000)Code

01010101Variable c @ (11110011)

01010101Variable d @ (11111111)

Indirect

Page 46: [Yidlug] programming languages differences, the underlying implementation

Copying And Then Changing

.data

a = 1 b = a b = 2 //Not affecting a c* = &a d* = c

00000001

00000010

Variable a @(01010101)

Variable b @ (11110000)Code

01010101Variable c @ (11110011)

01010101Variable d @ (11111111)

Indirect

Page 47: [Yidlug] programming languages differences, the underlying implementation

Copying Pointers And Then Changing Value

.data

a = 1 b = a b = 2 //Not affecting a c* = &a d* = c (*d) = 3

00000011

00000010

Variable a @(01010101)

Variable b @ (11110000)Code

01010101Variable c @ (11110011)

01010101Variable d @ (11111111)

Indirect

Page 48: [Yidlug] programming languages differences, the underlying implementation

Copying Pointers And Then Changing Pointer Value

.data

a = 1 b = a b = 2 //Not affecting a c* = &a d* = c (*d) = 3 d = &b(*d) = 4

00000011

00000100

Variable a @(01010101)

Variable b @ (11110000)Code

01010101Variable c @ (11110011)

01010101Variable d @ (11111111)

Indirect

Page 49: [Yidlug] programming languages differences, the underlying implementation

Copying Summary• Copying value variables, just copy the value,

any change after to one does NOT affect to the other

• Copying pointer variables, both point to the same value variable– Any change to the value variable via one of the

pointers IS reflected to the other– Any changes to the pointer value of one is NOT

reflected by the other pointer, and also not affect the original value variable

Page 50: [Yidlug] programming languages differences, the underlying implementation

Pointer vs Reference• Problems with pointers, stack and heap so far:– A pointer can be directly (by intention or

accidently) changed to access any memory– Failure to free the allocated heap results in a

memory leak– Forgetting to free the allocated heap till the

function returns, then there is no longer a way to free it

– Creating a large object on the stack is not worth, and of course not to pass by value

Page 51: [Yidlug] programming languages differences, the underlying implementation

References• Reference is the same as a pointer but without

the ability to change the memory address, only possible to change pointing from one object of the same type to another

Codeobject myVar = new object();object myVar2 = myVar;//myVar2 points to the same object as myVarmyVar = new object();//Now myVar points to a new object, while myVar2 points to the old

• A primitive variable is in general a value type and cannot be NULL

• A object is generally a reference and is NULL as long there is no call to “new()”

• Now the compiler can keep track of what is referenced by what

Page 52: [Yidlug] programming languages differences, the underlying implementation

Garbage Collection• Garbage Collection means to free heap memory

no longer in use• In COM and VB6 (based on COM) it is done by

keeping a count of the variables that still reference the object– For each new reference it is incremented, and for

each out of scope or null or just change in reference it is decremented

– When the reference count reaches 0 it is deleted– Problem if two objects just reference each other, they

will never be deleted• In Java and .Net the JVM and CLR keep track of

objects that are not referenced by real references

Page 53: [Yidlug] programming languages differences, the underlying implementation

Null Pointer And DB Null• A pointer or reference not pointing anywhere is

pointing to null• In C NULL is a constant with the value 0• 0 == Null (in C) and null == null• NULL + “SomeString” = “SomeString” – (but not in Linq To SQL!!! WATCH OUT)

• In Database it is different NULL is nothing • NULL = NULL is also NULL, use IS NULL• NULL + “someString” is NULL• WHERE NOT IN(NULL) is corrupting everything• SQL Server has an option “SET ANSI NULLS OFF”, and

MySQl has an operator <=> to compare Nulls

Page 54: [Yidlug] programming languages differences, the underlying implementation

NaN• Division of an integer by 0 is an error• Division of a floating point by 0 results in Nan

• Check for Nan by checking

• This does not work for a database NULL

IEEE

NaN == NaN = falseNaN != NaN = false

Java

NaN == NaN = falseNaN != NaN = true

If((myVar == myVar) == false)If(myVar.IsNaN())

NULL == NULL = NULLNULL != NULL = NULL

Page 55: [Yidlug] programming languages differences, the underlying implementation

Characters

Page 56: [Yidlug] programming languages differences, the underlying implementation

String

Page 57: [Yidlug] programming languages differences, the underlying implementation

String Quoting

Page 58: [Yidlug] programming languages differences, the underlying implementation

Pass By Value vs Pass By Reference

Page 59: [Yidlug] programming languages differences, the underlying implementation

Immutable Strings