50.530: Software Engineering
Sun Jun, SUTD
Week 1: Introduction
ABOUT THIS COURSE
Dr. Sun, Jun
• Software Engineering
• Formal Methods
• Program Analysis
• Cyber-Security
Undergrad, PhD from NUS
LKY Postdoc
Assistant Prof., ISTD
[email protected] 3, room 9
Facebook: sunjunhqq
WeChat: sunjunProf
Course Communication
• All class materials are on the course website
  – Lecture slides
  – Course project
• Q&A
  – https://piazza.com/class/hyz5ohayntd5bk
  – Email/WeChat/Facebook
• [email protected]
• WeChat: sunjunprof
• Facebook: sunjunhqq
Course Structure
• Cohort class
  – Every Monday 10-12
• Recitation
  – Every Monday 3-4
• Course Project: (60%)
• Final Exam: 10 – 12, Dec 19 (30%)
INTRODUCTION TO SOFTWARE ENGINEERING
Software Engineering
User Requirements
System Implementation
the magical programming machine
***The synthesis problem (i.e., synthesizing a program from a specification automatically) is undecidable
Software Engineering
User Requirements
System Implementation
The species we called programmers
A Programmer’s Life
Staged Approach
User Requirements
System Specification
System Design
System Implementation
Specification is equivalent to requirements?
Design satisfies the specification?
The design is correctly implemented?
***The verification problem (i.e., verifying whether a program satisfies a certain property) is undecidable too – but easier than the synthesis problem.
Are we getting the right requirements?
Requirements
• During the requirements workflow, the primary activities include
  – Listing candidate requirements
  – Understanding the system context through domain modelling and business modelling
  – Capturing functional as well as non-functional requirements
• Requirements should be captured in the language of the user.
  – Use cases help distil the essence of requirements as sets of action-response transactions between the user and the system.
Requirements
Analysis
• A key theme of the analysis workflow is to understand how and where requirements interact and what it means for the system.
• Analysis also involves
  – Detecting and removing ambiguities and inconsistencies amongst requirements
  – Developing an internal view of the system
  – Identifying the analysis classes and their collaborations
• Analysis classes are preliminary placeholders of functionality
Analysis
Related Research
• Proposing formal specification languages
  – The Z language, VDM, the B language, etc.
  – CSP, CCS, etc.
• Providing facilities for programmers to write specifications
  – Java Modeling Language (JML)
So far nothing has been working.
Design
• Deciding on the collaboration between components lies at the heart of software design.
  – A component fulfils its own responsibility through the code it contains.
  – A component exchanges information by calling methods on other components, or when other components call its own methods.
Design
• The design workflow involves
  – Considering specific technologies
  – Decomposing the system into implementation units
  – Engaging in high-level and low-level designs
Design
Implementation
• A large part of implementation is programming.
• Implementation also involves
  – Unit testing
  – Planning system integrations
  – Devising the deployment model
Implementation
Testing
• The primary activities of the test workflow include
  – Creating test cases
  – Running test procedures and analysing test results
• Due to its very nature, testing is never complete.
Test
Real-World Bugs
http://en.wikipedia.org/wiki/List_of_software_bugs
Staged Approach
User Requirements
System Specification
System Design
System Implementation
Missing
Missing
Ad Hoc
BUGGY
Research Questions
• How do we help users write the specification?
• How do we help users to formally document system designs?
People tried and people failed
Research Questions
• How do we help programmers debug?
• How do we verify a given program?
The course is about debugging and verification, and many smaller questions that are related.
A Big View
the space of all program behaviors
the behaviors we wanted
The synthesis problem: How do we find a program to cover (part of) A?
A Big View
[Figure: the space of all program behaviors, containing the behaviors we wanted and the behaviors we have; regions are labelled A, B and C]
The verification problem: Is C empty?
A Big View
[Figure: the space of all program behaviors, containing the behaviors we wanted and the behaviors we have; regions are labelled A, B and C]
The debugging problem: how do we find where the problem is and change the program so that C is empty?
COURSE PLANNING
Date     Topic                        Remarks
Sep 15   Introduction
Sep 22   Automatic Testing
Sep 29   Delta Debugging
Oct 13   Bug Localization
Oct 20   Specification Mining
Nov 3    Race Detection
Nov 10   Hoare Logic and Proving
Nov 17   Invariant Generation
Nov 24   Symbolic Execution
Dec 1    Software Model Checking
Dec 8    Assume Guarantee Reasoning
Dec 19   Final Exam
Course Outline
Debugging
Verification
• Monday 10 – 12: I will introduce one or two approaches proposed in the literature for that week's topic.
  – There will be in-class exercises
• Monday 3 – 4: We will discuss:
  – When the approaches work
  – When they do not work
  – How to make them better
Class Format
• Pick one of the topics covered in the following 10 classes;
• Conduct a survey on related work on that topic;
• Propose an improved approach;
• Write a research paper;
Project
• Title/Abstract
  – catchy, to the point, not too abstract or detailed
• Section 1: Introduction
  – Start with motivation
  – Explain your approach at a high level intuitively
• Section 2: A Running Example
  – Use an interesting example to illustrate your approach step-by-step
• Section 3: Detailed Approach
  – Explain how each step of the approach is done; highlight the technical challenges and remedies
• Section 4: Evaluation
  – Show evidence on how the proposed approach would work on real-world programs
  – (Optional) Implementation of your approach
• Section 5: Related Work
  – Survey related work and make a fair comparison with the proposed one
Research Paper
Real-world Examples
• For debugging,
  – http://sir.unl.edu/content/sir.php
• For verification,
  – http://sv-comp.sosy-lab.org/2015/
• For some other topics,
  – http://find-your-own.com
Project Due Dec 18
UNDERSTANDING PROGRAMMING
Programs
p(i) = o
program input output
Programs
Java Programs
Bytecode
JVM
Physical Machine
Motivational Example
The NSA has intercepted an RSA-encrypted secret message that gives the location of a terrorist act. We believe the act is going to happen one week from now, and we need your help in decrypting the message.
Task: Write a Java program to factor a number as the product of two prime numbers.
Task Breakdown
• Requirements/Specification
  – given a semi-prime, your program outputs its prime factors within a certain time
pre-condition: the input is a semi-prime
post-condition: the output is its prime factors
non-functional requirement: within a certain time
Correctness: pre-condition => post-condition
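The post-condition of the factoring task can be written down as an executable check. The following is a minimal sketch under stated assumptions: the class name FactorSpec and the representation of the output as a pair (p, q) are hypothetical, and primality is tested with BigInteger.isProbablePrime, which is probabilistic, so the check holds only up to a small error probability. The pre-condition (n is a semi-prime) is not checked, since no cheap way of checking it is assumed here.

```java
import java.math.BigInteger;

final class FactorSpec {
    // Post-condition (sketch): p and q are probable primes and p * q == n.
    static boolean postCondition(BigInteger n, BigInteger p, BigInteger q) {
        int certainty = 50; // assumed setting: error probability at most 2^-50
        return p.isProbablePrime(certainty)
                && q.isProbablePrime(certainty)
                && p.multiply(q).equals(n);
    }
}
```

A test oracle for the exercise could call postCondition on every output the program produces for a semi-prime input.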
Task Breakdown
• Design
  – Use the trial division method
  – Read: http://en.wikipedia.org/wiki/Trial_division
  – More: http://en.wikipedia.org/wiki/Integer_factorization
• Implementation
  – “Enough talk, let’s fight” (Kung Fu Panda)
Exercise 2
Write a Java program that, given a semi-prime, outputs its prime factors.
Hint: You need to use the BigInteger class.
FactorPrime.java
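One possible shape for this exercise is sketched below using trial division and BigInteger, as the design slide and the hint suggest. It is a sketch, not the course's reference solution: the class name TrialDivision and the method factor are hypothetical, and the behavior on inputs that violate the pre-condition (for example a prime or a number below 4) is an assumption of this example.

```java
import java.math.BigInteger;

final class TrialDivision {
    // Returns {p, q} with p <= q and p * q == n, assuming n is a semi-prime.
    static BigInteger[] factor(BigInteger n) {
        BigInteger two = BigInteger.valueOf(2);
        if (n.mod(two).signum() == 0) {
            return new BigInteger[] { two, n.divide(two) };
        }
        // Only odd candidates d with d * d <= n need to be tried.
        for (BigInteger d = BigInteger.valueOf(3);
                d.multiply(d).compareTo(n) <= 0;
                d = d.add(two)) {
            if (n.mod(d).signum() == 0) {
                return new BigInteger[] { d, n.divide(d) };
            }
        }
        // Assumed handling of a violated pre-condition: no divisor was found.
        throw new IllegalArgumentException("not a semi-prime: " + n);
    }

    public static void main(String[] args) {
        BigInteger[] f = factor(new BigInteger("4294967297"));
        System.out.println(f[0] + " * " + f[1]); // prints "641 * 6700417"
    }
}
```

The number of divisions grows with the square root of n, so this approach is only practical when the smaller prime factor is small.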
Task Breakdown
• Testing
  – 4294967297 (famous Fermat Number)
  – 1127451830576035879
  – 160731047637009729259688920385507056726966793490579598495689711866432421212774967029895340327
197901756096014299132623454583177072050452755510701340673282385647899694083881316194642417451570483466327782135730575564856185546487053034404560063433614723836456790266457438831626375556854133866958349817172727462462516466898479574402841071703909138062456567624565784254101568378407242273207660892036869708190688033351601539401621576507964841597205952722487750670904522932328731530640706457382162644738538813247139315456213401586618820517823576427094125197001270350087878270889717445401145792231674098948416888868250143592026973853973785120217077951766546939577520897245392186547279572494177680291506578508962707934879124914880885500726439625033021936728949277390185399024276547035995915648938170415663757378637207011391538009596833354107737156273037494727858302028663366296943925008647348769272035532265048049709827275179381252898675965528510619258376779171030556482884535728812916216625430187039533668677528079544176897647303445153643525354817413650848544778690688201005274443717680593899
• Verification: how to show it always works?
Understanding Sequential Programs
“A program consisted of a sequence of instructions (and a memory), where each instruction executed one after the other (to modify the memory, etc.). It ran from start to finish on a single processor.”
“The sequential paradigm has the following two characteristics: the textual order of statements specifies their order of execution; successive statements must be executed without any overlap (in time) with one another.”
int previousMax;

public int max (int[] list) {
    int max = list[0];
    for (int i = 1; i < list.length; i++) {
        if (max < list[i]) {
            max = list[i];
        }
    }
    previousMax = max;
    return max;
}
The Illusion
int previousMax;

0.  public int max (int[] list) {
1.      int max = list[0];
2.      for (int i = 1;
3.              i < list.length;
4.              i++) {
5.          if (max < list[i]) {
6.              max = list[i];
7.          }
8.      }
9.      previousMax = max;
10.     return max;
11. }
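Control Flow Graph
[Figure: control flow graph of max. Nodes 0 to 11 correspond to the numbered lines of the listing. Edge labels: list = …, max = list[0], i = 1, i < list.length, i >= list.length, max < list[i], max >= list[i], max = list[i], i++, previous=max, return max. Node 3 branches on i < list.length (to 5) or i >= list.length (to 9); node 5 branches on max < list[i] (to 6) or max >= list[i] (to 7); node 4 loops back to 3.]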
System Execution
[Figure: the control flow graph of max stepped through on input [2,4], with the memory contents shown beside it at each step]
Memory at successive steps:
previousMax 0, input [2,4]
previousMax 0, input [2,4], list [2,4]
previousMax 0, input [2,4], list [2,4], max 2
previousMax 0, input [2,4], list [2,4], max 2, i 1
previousMax 0, input [2,4], list [2,4], max 4, i 1
previousMax 0, input [2,4], list [2,4], max 4, i 2
The Trace
• With input = [2,4]
0 → 1 → 2 → 3 → 5 → 6 → 7 → 8 → 4 → 3 → 9 → 10 → 11 → …
i : a configuration of the program with control at line i
The Trace
• With input = [4,2]
0 → 1 → 2 → 3 → 5 → 7 → 8 → 4 → 3 → 9 → 10 → 11 → …
i : a configuration of the program with control at line i
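The two traces can be reproduced by instrumenting max so that it records the listing's line number each time control reaches that line. This is a minimal sketch: the class TracedMax and the method trace are hypothetical names, and the for loop is unrolled into a while loop so that lines 2, 3 and 4 of the listing can be recorded separately.

```java
import java.util.ArrayList;
import java.util.List;

final class TracedMax {
    static int previousMax;

    // Same logic as max, but returns the sequence of listing line numbers visited.
    static List<Integer> trace(int[] list) {
        List<Integer> t = new ArrayList<>();
        t.add(0);                       // 0: method entry
        t.add(1);
        int max = list[0];              // 1
        t.add(2);
        int i = 1;                      // 2: loop initialisation
        while (true) {
            t.add(3);
            if (!(i < list.length)) {   // 3: loop condition
                break;
            }
            t.add(5);
            if (max < list[i]) {        // 5
                t.add(6);
                max = list[i];          // 6
            }
            t.add(7);                   // 7: end of if
            t.add(8);                   // 8: end of loop body
            t.add(4);
            i++;                        // 4: loop increment
        }
        t.add(9);
        previousMax = max;              // 9
        t.add(10);                      // 10: return max
        t.add(11);                      // 11: method exit
        return t;
    }

    public static void main(String[] args) {
        System.out.println(trace(new int[] {2, 4}));
        System.out.println(trace(new int[] {4, 2}));
    }
}
```

The only difference between the two runs is whether line 6 is visited, which is decided at the branch on line 5.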
Sequential Programming is Easy
• It is deterministic: for one input, there is exactly one path through the control flow graph
[Figure: five executions, input1 to input5, each following a single path 0 → 1 → 2 → 3 → …]
Testing is to find the ‘right’ input
Concurrent Programs
p(i, sc) = o
program input output
scheduling
Concurrency: Benefit
• Better resource utilization
  – With k processors, ideally we can be k times faster, if the task can be broken into k independent pieces and if we ignore the cost of task decomposition and communication between the processors
[Figure: timeline of Read file A, Process A, Read file B, Process B on a single processor]
We can factorize the semi-prime faster with multiple computers or cores
Concurrency: Benefit
• Better resource utilization
  – With k processors, ideally we can be k times faster, if the task can be broken into k independent pieces and if we ignore the cost of task decomposition and communication between the processors
• Can we get better performance with 1 processor only?
[Figure: timeline of Read file A, Process A, Read file B, Process B split across Processor 1 and Processor 2]
[Figure: timeline of the same four tasks on a single processor]
Concurrency: Cost
• More complex design, implementation, testing, and verification
public class Holder {
    private int n;

    public Holder(int n) {
        this.n = n;
    }

    public void assertSanity() {
        if (n != n)
            throw new AssertionError("This statement is false.");
    }
}
• Overhead in task decomposition, communication, context switch
• Increased resource consumption
Will the exception occur?
Distributed Systems
• Each process has its own memory and processes communicate through messaging.
[Figure: several nodes, each with its own CPU and Memory, connected by a Network and exchanging messages]
Multi-core Processors
• Each thread has its own cache, and threads communicate through shared memory.
[Figure: several CPUs, each with its own Cache, all connected to one shared Memory]
Multi-core Computer: More Like This
Multi-Threaded Program
• Write a program in which N threads concurrently increment a static variable (initially 0) by 1. Set N to 2 and see what the value of the variable is after all threads are done.
FirstBlood.java
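A minimal sketch of such a program follows. It departs from the exercise in two labelled ways, both assumptions of this example: each thread increments the variable many times (the parameter rounds) so that lost updates become likely enough to observe, and an AtomicInteger is incremented alongside the plain static variable for comparison. The class name CounterRace is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CounterRace {
    static int count = 0;                                        // plain shared variable
    static final AtomicInteger safeCount = new AtomicInteger();  // atomic counter for comparison

    // Starts n threads; each increments both counters `rounds` times.
    static void run(int n, int rounds) throws InterruptedException {
        count = 0;
        safeCount.set(0);
        Thread[] threads = new Thread[n];
        for (int t = 0; t < n; t++) {
            threads[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    count++;                      // read, increment, write: not atomic
                    safeCount.incrementAndGet();  // atomic read-modify-write
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();                            // wait until every thread is done
        }
    }

    public static void main(String[] args) throws InterruptedException {
        run(2, 100_000);
        System.out.println("plain:  " + count);           // may be less than 200000
        System.out.println("atomic: " + safeCount.get()); // prints "atomic: 200000"
    }
}
```

The plain counter can differ from run to run, which is the point of the exercise; the atomic counter always reaches n * rounds.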
Scheduling
[Figure: a Scheduler choosing among the threads Thread1, Thread2, Thread3, Thread4]
The scheduler is ‘un-predictable’
Scheduling/Interleaving
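[Figure: thread1 and thread2 each move through local states 0 → 1 → 2 → 3. The combined states, one digit per thread, form a lattice:]
00
01 10
02 11 20
03 12 21 30
13 22 31
23 32
33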
There are exponentially many sequences.
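The number of sequences is the number of paths from 00 to 33 in the lattice, and it can be counted with a small recurrence. The sketch below assumes two threads with a and b atomic steps; the class name Interleavings is hypothetical. A path to state (i, j) arrives either from (i - 1, j) or from (i, j - 1), which gives the binomial coefficient C(a + b, a).

```java
final class Interleavings {
    // Number of ways to interleave two threads with a and b atomic steps.
    static long count(int a, int b) {
        long[][] paths = new long[a + 1][b + 1];
        for (int i = 0; i <= a; i++) {
            for (int j = 0; j <= b; j++) {
                if (i == 0 || j == 0) {
                    paths[i][j] = 1;   // only one thread has moved so far
                } else {
                    paths[i][j] = paths[i - 1][j] + paths[i][j - 1];
                }
            }
        }
        return paths[a][b];
    }

    public static void main(String[] args) {
        System.out.println(count(3, 3));   // prints 20
        System.out.println(count(10, 10)); // prints 184756
    }
}
```

For two threads of k steps each the count is C(2k, k), which grows roughly like 4^k, so the long result overflows for large k.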
Is This Real?
[Figure: Thread1 and Thread2 each take one step, count++, from local state 0 to 1. Combined states and the value of count:]
00        count = 0
01 10     count = 1
11        count = 2
This is assuming that count++ is one step. Or is it?
Reality is Messy
Java Programs
Bytecode
JVM
Physical Machine
What are the atomic steps?
What is the order of execution?
What and where are the variable values?
What Really Happened?
Thread1 and Thread2 each perform the same three steps:
  – 0 → 1: read the value of count and assign it to a register
  – 1 → 2: increment the register
  – 2 → 3: write the register value back to count
For the double type (and long), even a single read or write is not guaranteed to be atomic unless the variable is declared volatile!
What Really Happened?
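[Figure: Thread1 performs r1, i1, w1 and Thread2 performs r2, i2, w2, each moving through local states 0 → 1 → 2 → 3. Combined states:]
00
01 10
02 11 20
03 12 21 30
13 22 31
23 32
33
[Edge labels along the two outer paths: r1, i1, w1, r2, i2, w2 and r2, i2, w2, r1, i1, w1]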
What Really Happened?
[Figure: the same two threads and lattice of combined states, with a path highlighted in which the steps of Thread1 (r1, i1, w1) and Thread2 (r2, i2, w2) are interleaved]
count=1
Is this correct?
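The possible final values of count can be checked by exhaustively exploring every schedule of the six steps. This is a sketch under the slide's model, assuming each of read, increment and write is one atomic step and each thread has a private register; the class LostUpdate and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

final class LostUpdate {
    // Maps a final value of count to the number of schedules that produce it.
    static final Map<Integer, Integer> outcomes = new TreeMap<>();

    // pc1, pc2: number of steps (read, increment, write) each thread has completed.
    // reg1, reg2: the threads' private registers; count: the shared variable.
    static void explore(int pc1, int pc2, int reg1, int reg2, int count) {
        if (pc1 == 3 && pc2 == 3) {
            outcomes.merge(count, 1, Integer::sum);
            return;
        }
        if (pc1 < 3) { // schedule the next step of thread 1
            if (pc1 == 0) explore(1, pc2, count, reg2, count);          // r1
            else if (pc1 == 1) explore(2, pc2, reg1 + 1, reg2, count);  // i1
            else explore(3, pc2, reg1, reg2, reg1);                     // w1
        }
        if (pc2 < 3) { // schedule the next step of thread 2
            if (pc2 == 0) explore(pc1, 1, reg1, count, count);          // r2
            else if (pc2 == 1) explore(pc1, 2, reg1, reg2 + 1, count);  // i2
            else explore(pc1, 3, reg1, reg2, reg2);                     // w2
        }
    }

    static Map<Integer, Integer> run() {
        outcomes.clear();
        explore(0, 0, 0, 0, 0);
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints {1=18, 2=2}
    }
}
```

In this model only the two schedules in which one thread finishes before the other starts end with count = 2; the other 18 lose one of the updates.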
Concurrency is Hard
• Heisenbug
  – a computer programming jargon term for a software bug that seems to disappear or alter its behavior when one attempts to study it.
• How do we find bugs in a multi-threaded program or show that there is no bug?
Date     Topic                        Remarks
Sep 15   Introduction
Sep 22   Automatic Testing
Sep 29   Delta Debugging
Oct 13   Bug Localization
Oct 20   Specification Mining
Nov 3    Race Detection               Concurrency*
Nov 10   Hoare Logic and Proving
Nov 17   Invariant Generation
Nov 24   Symbolic Execution
Dec 1    Software Model Checking
Dec 8    Assume Guarantee Reasoning   Concurrency*
Dec 19   Final Exam
Project Due Dec 18
Course Outline
Exercise 3
• Write a multi-threaded program to factor a semi-prime. Argue that it is correct.
FactorThread.java
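One way to approach this exercise is to split the trial-division candidates among the threads. The sketch below is an illustration under simplifying assumptions, not a reference solution: it uses long instead of BigInteger, so it only covers inputs below 2^63, and the names ParallelFactor and smallestFactor are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

final class ParallelFactor {
    // Returns the smaller prime factor of the semi-prime n, using `workers` threads.
    static long smallestFactor(long n, int workers) throws InterruptedException {
        AtomicLong found = new AtomicLong(0);   // 0 means "no factor found yet"
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final long start = 2 + w;
            threads[w] = new Thread(() -> {
                // This worker tries start, start + workers, start + 2 * workers, ...
                // d <= n / d is d * d <= n written without overflow.
                for (long d = start; d <= n / d && found.get() == 0; d += workers) {
                    if (n % d == 0) {
                        found.compareAndSet(0, d);
                    }
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        if (found.get() == 0) {
            throw new IllegalArgumentException("not a semi-prime: " + n);
        }
        return found.get();
    }
}
```

A correctness argument can rest on two observations. The workers together try every candidate from 2 up to the square root of n. For a semi-prime n = p * q with p <= q, the only divisor in that range is p, so whichever thread writes to found writes the same value, and the result does not depend on the schedule.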
Reading Materials
• References:
  – “Checking a Large Routine” by Turing
  – “The Humble Programmer” by Dijkstra
  – “No Silver Bullet: Essence and Accidents of Software Engineering” by Brooks