aunifiedplatformforinteractivenetworkforensics matthias ... · 1258531.. cteurv1.. 192.168.1.103...
TRANSCRIPT
VASTA Unified Platform for Interactive Network Forensics
Matthias Vallentin1,2 Vern Paxson1,2 Robin Sommer2,3
1UC Berkeley
2International Computer Science Institute (ICSI)
3Lawrence Berkeley National Laboratory (LBNL)
March 17, 2016
USENIX NSDI
1 / 28
Omnipresent Data Breaches
2 / 28
Breach Timeline
Compromise Forensics
Detection
Time
3 / 28
Breach Timeline
Compromise
Detection
Time
3 / 28
Breach Timeline
Compromise
Detection
Time
?
3 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
Organization
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
?
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
?
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
?
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
?
4 / 28
Log Example — Bro Connection Log
#separator \x09#set_separator ,#empty_field (empty)#unset_field -#path conn#open 2016-01-06-15-28-58#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_..#types time string addr port addr port enum string interval count count string bool bool count string1258531.. Cz7SRx3.. 192.168.1.102 68 192.168.1.1 67 udp dhcp 0.163820 301 300 SF - - 0 Dd 1 329 1 328 (empty)1258531.. CTeURV1.. 192.168.1.103 137 192.168.1.255 137 udp dns 3.780125 350 0 S0 - - 0 D 7 546 0 0 (empty)1258531.. CUAVTq1.. 192.168.1.102 137 192.168.1.255 137 udp dns 3.748647 350 0 S0 - - 0 D 7 546 0 0 (empty)1258531.. CYoxAZ2.. 192.168.1.103 138 192.168.1.255 138 udp - 46.725380 560 0 S0 - - 0 D 3 644 0 0 (empty)1258531.. CvabDq2.. 192.168.1.102 138 192.168.1.255 138 udp - 2.248589 348 0 S0 - - 0 D 2 404 0 0 (empty)1258531.. CViJEOm.. 192.168.1.104 137 192.168.1.255 137 udp dns 3.748893 350 0 S0 - - 0 D 7 546 0 0 (empty)1258531.. CSC2Hd4.. 192.168.1.104 138 192.168.1.255 138 udp - 59.052898 549 0 S0 - - 0 D 3 633 0 0 (empty)1258531.. Cd3RNm1.. 192.168.1.103 68 192.168.1.1 67 udp dhcp 0.044779 303 300 SF - - 0 Dd 1 331 1 328 (empty)1258531.. CEwuIl2.. 192.168.1.102 138 192.168.1.255 138 udp - - - - S0 - - 0 D 1 229 0 0 (empty)1258532.. CXxLc94.. 192.168.1.104 68 192.168.1.1 67 udp dhcp 0.002103 311 300 SF - - 0 Dd 1 339 1 328 (empty)1258532.. CIFDQJV.. 192.168.1.102 1170 192.168.1.1 53 udp dns 0.068511 36 215 SF - - 0 Dd 1 64 1 243 (empty)1258532.. CXFISh5.. 192.168.1.104 1174 192.168.1.1 53 udp dns 0.170962 36 215 SF - - 0 Dd 1 64 1 243 (empty)1258532.. CQJw4C3.. 192.168.1.1 5353 224.0.0.251 5353 udp dns 0.100381 273 0 S0 - - 0 D 2 329 0 0 (empty)1258532.. ClfEd43.. fe80::219:e3ff:fee7:5d23 5353 ff02::fb 5353 udp dns 0.100371 273 0 S0 - - 0 D 2 369 0 01258532.. C67zf02.. 192.168.1.103 137 192.168.1.255 137 udp dns 3.873818 350 0 S0 - - 0 D 7 546 0 0 (empty)1258532.. CG1FKF1.. 192.168.1.102 137 192.168.1.255 137 udp dns 3.748891 350 0 S0 - - 0 D 7 546 0 0 (empty)1258532.. CNFkeF2.. 192.168.1.103 138 192.168.1.255 138 udp - 2.257840 348 0 S0 - - 0 D 2 404 0 0 (empty)1258532.. Cq4eis4.. 192.168.1.102 1173 192.168.1.1 53 udp dns 0.000267 33 497 SF - - 0 Dd 1 61 1 525 (empty)1258532.. CHpqv31.. 192.168.1.102 138 192.168.1.255 138 udp - 2.248843 348 0 S0 - - 0 D 2 404 0 0 (empty)1258532.. CFoJjT3.. 192.168.1.1 5353 224.0.0.251 5353 udp dns 0.099824 273 0 S0 - - 0 D 2 329 0 0 (empty)1258532.. Cc3Ayyz.. fe80::219:e3ff:fee7:5d23 5353 ff02::fb 5353 udp dns 0.099813 273 0 S0 - - 0 D 2 369 0 0
5 / 28
Existing Solutions
MapReduce (Hadoop)
3 Scalability
7 Batch-oriented: no iterative, exploratory analysis
In-Memory Cluster Computing (Spark)
3 E�cient & complex analysis
7 Thrashing when working set does not fit in aggregate memory
6 / 28
Existing Solutions
MapReduce (Hadoop)
3 Scalability
7 Batch-oriented: no iterative, exploratory analysis
In-Memory Cluster Computing (Spark)
3 E�cient & complex analysis
7 Thrashing when working set does not fit in aggregate memory
6 / 28
Contribution
VASTVisibility Across Space and Time
ArchitectureI
Performance: concurrent & modular design
IScaling: intra-machine & inter-machine
ITyping: strong & rich
ImplementationI
Composition: high-level bitmap indexing framework
IAdaptation: fine-grained component flow-control
IAsynchrony: finite state machines for query execution
7 / 28
Contribution
VASTVisibility Across Space and Time
ArchitectureI
Performance: concurrent & modular design
IScaling: intra-machine & inter-machine
ITyping: strong & rich
ImplementationI
Composition: high-level bitmap indexing framework
IAdaptation: fine-grained component flow-control
IAsynchrony: finite state machines for query execution
7 / 28
Contribution
VASTVisibility Across Space and Time
ArchitectureI
Performance: concurrent & modular design
IScaling: intra-machine & inter-machine
ITyping: strong & rich
ImplementationI
Composition: high-level bitmap indexing framework
IAdaptation: fine-grained component flow-control
IAsynchrony: finite state machines for query execution
7 / 28
Outline
1. Architecture
2. Implementation
3. Evaluation
VAST Architecture — Single Machine
8 / 28
VAST Architecture — Single Machine
importer
archive
index
exporter
node
source sink
10.0.0.1 10.0.0.254 53/udp10.0.0.2 10.0.0.254 80/tcp
8 / 28
VAST Architecture — Ingestion
10.0.0.1 53/udp10.0.0.2 80/tcp…
source
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
generateevent batch
9 / 28
VAST Architecture — Ingestion
10.0.0.1 53/udp10.0.0.2 80/tcp…
source
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
generateevent batch
importer
assign IDs
9 / 28
VAST Architecture — Ingestion
10.0.0.1 53/udp10.0.0.2 80/tcp…
source
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
generateevent batch
importer
assign IDs
archive
compressbatch
9 / 28
VAST Architecture — Ingestion
10.0.0.1 53/udp10.0.0.2 80/tcp…
source
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
generateevent batch
importer
assign IDs
archive
compressbatch
index
9 / 28
VAST Architecture — Ingestion
10.0.0.1 53/udp10.0.0.2 80/tcp…
source
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
generateevent batch
importer
assign IDs
archive
compressbatch
index
10.0.0.2 80/tcp
append datato bitmap index
10.0.0.1 53/udp
type
9 / 28
VAST Architecture — Index
partition
index
partition partition
meta index
10 / 28
VAST Architecture — Index
partition
index
partition partition
meta index
conn
10.0.0.2 53/udp 8.8.4.4 53/udp “dns”
indexer
10 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
archive
locate & shipevent batch for ID
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
archive
locate & shipevent batch for ID
candidatecheck
decompressbatch
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
archive
locate & shipevent batch for ID
candidatecheck
decompressbatch
sink
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
archive
locate & shipevent batch for ID
candidatecheck
decompressbatch
sink
10.0.0.1 53/udp10.0.0.2 80/tcp…
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
render results
11 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
Outline
1. Architecture
2. Implementation
3. Evaluation
Indexing Basics — Tree Indexes
13 / 28
Indexing Basics — Composition
( )� �
14 / 28
Indexing Basics — Composition
( )� �
14 / 28
Indexing Basics — Inverted Index
10 2 3 4 5 6 7 8 9
3
1
4
8
9
5
0
4
2
5
6
2
A B C D
15 / 28
Indexing Basics — Bitmap Index
0 1 1 0
1
10 2 3 4 5 6 7 8 9
0
0
1
1
0
1
2
3
4
5
0
1
0
0
0
0
0
1
0
1
0
0
1
0
0
A B C D
16 / 28
Indexing Basics — Bitmap Index
10 2 3 4 5 6 7 8 9
012345
A B C D
16 / 28
Indexing Basics — Bitmap Composition
17 / 28
Indexing Basics — Bitmap Composition
X 2 192.168.0.0/24 Y � 60s
17 / 28
Indexing Basics — Bitmap Composition
X 2 192.168.0.0/24 Y � 60s
�
17 / 28
Indexing Challenges
High-cardinality valuesI Represent millions of distinct values compactly
I Provide low-latency lookups
High-level operationsI Support type-specific operations
I Relational operators: {<, , =, 6=, �, >, 2, /2}
18 / 28
Query Language
Boolean ExpressionsI Conjunctions &&
I Disjunctions ||
I Negations !I Predicates
I LHS op RHSI (expr)
ExamplesI A && B || !(C && D)
I orig h in 10.0.0.1 && &time < now - 2h
I &type == "conn" || "foo" in :string
I duration > 60s && service == "tcp"
ExtractorsI &tag
I x.y.z
I :type
Relational OperatorsI <, <=, ==, >=, >
I in, ni, [+, +]
I !in, !ni, [-, -]
I ⇠, !⇠
ValuesI T, F
I +42, 1337, 3.14
I "foo"
I 10.0.0.0/8
I 80/tcp, 53/?
I {1, 2, 3}19 / 28
Data Model
TYPE
record
vector set
table
KEY VALUE
TYPETYPE
field 1
TYPE
field n
TYPE
…
container types
basic types
compound types
recursive types
bool
int
count
real
duration
time
string
pattern
address
subnet
port
none
20 / 28
Data Model
TYPE
record
vector set
table
KEY VALUE
TYPETYPE
field 1
TYPE
field n
TYPE
…
container types
basic types
compound types
recursive types
bool
int
count
real
duration
time
string
pattern
address
subnet
port
none
20 / 28
Bitmap Index for IP Addresses
192.168.0.42
21 / 28
Bitmap Index for IP Addresses
11000000.10101000.00000000.00101010
21 / 28
Bitmap Index for IP Addresses
11000000.10101000.00000000.00101010
21 / 28
Bitmap Index for IP Addresses
11000000.10101000.00000000.00101010
21 / 28
Bitmap Index for IP Addresses
X 2 192.168.0.0/27
21 / 28
Bitmap Index for IP Addresses
X 2 192.168.0.0/27
�
21 / 28
Outline
1. Architecture
2. Implementation
3. Evaluation
Data Set
Single-Machine
Data:
I 10M packets from a 24-hourtrace (5 fields/event)
I 3.4M derived Broconnection logs (20fields/event)
Machine:
I 2 ⇥ 8-core Intel Xeon CPUs
I 128GB RAM
I 4 ⇥ 3TB SAS 7.2K disks
I 64-bit FreeBSD
ClusterData:
I 1.24B Bro connection logs(152GB)
I Split into N slices for Nnodes
I N 2 [1, 24]
Nodes:
I 2 ⇥ 8-core Intel Xeon CPUs
I 12GB of RAM
I 2 ⇥ 500MB SATA disks
I 64-bit FreeBSD
22 / 28
Data Set
Single-Machine
Data:
I 10M packets from a 24-hourtrace (5 fields/event)
I 3.4M derived Broconnection logs (20fields/event)
Machine:
I 2 ⇥ 8-core Intel Xeon CPUs
I 128GB RAM
I 4 ⇥ 3TB SAS 7.2K disks
I 64-bit FreeBSD
ClusterData:
I 1.24B Bro connection logs(152GB)
I Split into N slices for Nnodes
I N 2 [1, 24]
Nodes:
I 2 ⇥ 8-core Intel Xeon CPUs
I 12GB of RAM
I 2 ⇥ 500MB SATA disks
I 64-bit FreeBSD
22 / 28
Queries
23 / 28
Performance – Index Latency
●
● ●
●
●
● ● ● ● ● ● ● ●●
● ●
0
2
4
6
8
10
12
14
16
4 8 12 16Cores
Late
ncy
(sec
onds
) Query● A
BCDEFGHI
24 / 28
Performance — Scaling
Import
●
●
●
●
●
●
●
0.5
1.0
1.5
2.0
5 10 15 20 25Nodes
1 / U
tiliz
atio
n
Export
●
●●
●●●
0.5
1.0
1.5
2.0
2.5
5 10 15 20 25Nodes
Late
ncy
(sec
onds
)
25 / 28
Details in the paper
26 / 28
Conclusion
Network Forensics ChallengesI Explorative high-dimensional search
I Disparate data access
I Massive data volumes
VAST: Visibility Across Space and TimeI Platform for network forensics
I Interactive & iterative search
IInter-machine and intra-machine scaling
I Open-source, permissive license (BSD)
27 / 28