Peeping Tom in the Neighborhood
Keystroke Eavesdropping on Multi-User Systems
USENIX 2009
Kehuan Zhang, Indiana University, Bloomington
XiaoFeng Wang, Indiana University, Bloomington
Agenda
Overview
Assumption
Implementation
Experiment
Conclusion
Overview
Some commands, such as ps and top, need information about running processes
The virtual file system procfs discloses this information at /proc/<pid>/stat
Our attack takes advantage of a process’s stack information to infer keystrokes
• Specifically ESP and EIP
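A minimal sketch of where ESP and EIP come from, assuming the field layout documented in proc(5), where kstkesp and kstkeip are fields 29 and 30 of /proc/<pid>/stat. The names StackSample, parse_stat and read_stack_sample are illustrative and do not come from the talk. Recent kernels may report these two fields as 0 for processes the reader is not permitted to inspect, so this reflects the behaviour the talk relies on rather than every current system.

```python
from typing import NamedTuple


class StackSample(NamedTuple):
    esp: int  # field 29 of /proc/<pid>/stat (kstkesp)
    eip: int  # field 30 of /proc/<pid>/stat (kstkeip)


def parse_stat(line: str) -> StackSample:
    """Extract ESP and EIP from the contents of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces, so split only the text after the last ')'.
    rest = line[line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field n sits at index n - 3.
    return StackSample(esp=int(rest[29 - 3]), eip=int(rest[30 - 3]))


def read_stack_sample(pid: int) -> StackSample:
    """Read the current ESP/EIP pair of a process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat(f.read())
```

Splitting after the last closing parenthesis keeps the parser correct even when the process name contains spaces or parentheses.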
Overview (cont.)
Fig. 1: The sketch of keystroke extraction and recognition
Assumption
Capability to execute programs
Multi-core system
Access to the victim’s information
The attacker can obtain samples of the victim’s typing as training data
Implementation
Pattern extraction
Trace logging
Get inter-timing
Keystroke analysis
Fig. 1: The sketch of keystroke extraction and recognition
Fig. 2: Steps of keystroke pattern extraction
Fig. 3: Steps of trace logging and getting inter-timing
Fig. 4: Steps of keystroke analysis
Pattern extraction
Deterministic programs
• The same input causes the same output, as in vim
• Use strace to get all system call sequences, then extract the differences
• False-positive check
Non-deterministic programs
• The same input can cause different outputs; almost all GUI programs are non-deterministic
• Apply an instruction-level analysis tool to the function gtk_main_do_event(event) to get its events
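For the deterministic case, the "extract the difference" step could look roughly like the sketch below. It assumes the two strace outputs have already been reduced to plain lists of system call names, one recorded while idle and one recorded while a key is pressed. The function names and the example call names are hypothetical, and difflib is used here only as a convenient diff; the talk does not say which differencing method the authors used.

```python
from difflib import SequenceMatcher


def extract_keystroke_pattern(idle_trace, keystroke_trace):
    """Return the calls that appear only in the keystroke trace."""
    matcher = SequenceMatcher(a=idle_trace, b=keystroke_trace, autojunk=False)
    pattern = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'insert' and 'replace' mark spans that exist only on the b side.
        if tag in ("insert", "replace"):
            pattern.extend(keystroke_trace[j1:j2])
    return pattern


def appears_in(pattern, trace):
    """False-positive check: does the pattern occur contiguously in trace?"""
    n = len(pattern)
    return n > 0 and any(
        trace[i:i + n] == pattern for i in range(len(trace) - n + 1)
    )


idle = ["select", "read", "select"]
typed = ["select", "read", "ioctl", "write", "select"]
pattern = extract_keystroke_pattern(idle, typed)
print(pattern)                    # ['ioctl', 'write']
print(appears_in(pattern, idle))  # False
```

A pattern that also shows up in the idle trace would fire without any keystroke, which is what the false-positive check is meant to catch.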
Trace logging
The attacker’s shadow program keeps monitoring /proc/<pid>/stat
• This is why we need a multi-core system
• However, the log will not be complete
Avoiding detection
• Decrease the sampling rate
• Hide CPU usage
Fig. 3: Steps of trace logging and getting inter-timing
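The monitoring loop can be sketched as a simple poller. This is an illustration under stated assumptions, not the authors' shadow program: log_trace, its parameters, and the choice to record only changed readings are all hypothetical. The reader, clock and sleep functions are passed in so the loop can be exercised without a real /proc entry. Only the "decrease the sampling rate" idea is shown; hiding CPU usage is not covered.

```python
import time
from typing import Callable, List, Tuple

Sample = Tuple[int, int]            # (ESP, EIP)
LogEntry = Tuple[float, int, int]   # (timestamp, ESP, EIP)


def log_trace(read_sample: Callable[[], Sample],
              duration_s: float,
              interval_s: float = 0.0,
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> List[LogEntry]:
    """Poll read_sample until duration_s elapses, logging only changes."""
    log: List[LogEntry] = []
    last = None
    start = clock()
    while clock() - start < duration_s:
        sample = read_sample()
        if sample != last:           # skip repeated readings
            log.append((clock(), sample[0], sample[1]))
            last = sample
        if interval_s > 0:
            sleep(interval_s)        # lower sampling rate, lower CPU usage
    return log
```

With interval_s left at 0 the loop spins continuously, which is why a second core matters; a larger interval uses less CPU but misses more of the victim's system calls, so the log becomes less complete.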
Get inter-timing
Use the Longest Common Subsequence (LCS) algorithm to compare the log with the pattern
• Ignore the effect of ASLR by normalizing the ESP pattern
Use a time duration to keep only consecutive keystroke patterns
Fig. 5: Pattern matching
Fig. 6: Using time duration
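One possible reading of these steps, sketched in Python. Several details are assumptions: ESP values are normalized as offsets from a known stack base (for example the startstack field of /proc/<pid>/stat), the "time duration" is interpreted as a maximum gap that groups log entries into bursts, and a burst counts as a keystroke when its LCS with the pattern covers a chosen fraction of the pattern. All function names and thresholds are hypothetical.

```python
def normalize(log, stack_base):
    """Turn (time, ESP) entries into (time, offset from the stack base)."""
    return [(t, stack_base - esp) for t, esp in log]


def lcs_length(a, b):
    """Classic dynamic-programming length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def split_bursts(log, max_gap_s):
    """Group entries whose timestamps are at most max_gap_s apart."""
    bursts = []
    for t, off in log:
        if bursts and t - bursts[-1][-1][0] <= max_gap_s:
            bursts[-1].append((t, off))
        else:
            bursts.append([(t, off)])
    return bursts


def keystroke_times(log, pattern, max_gap_s=0.005, min_ratio=0.75):
    """Start time of every burst that matches enough of the pattern."""
    times = []
    for burst in split_bursts(log, max_gap_s):
        offsets = [off for _, off in burst]
        if lcs_length(offsets, pattern) >= min_ratio * len(pattern):
            times.append(burst[0][0])
    return times


def inter_timings(times):
    """Intervals between consecutive detected keystrokes."""
    return [t2 - t1 for t1, t2 in zip(times, times[1:])]
```

Because the log is incomplete, an exact match against the pattern would miss keystrokes; the LCS tolerates dropped samples, and working with offsets rather than raw addresses keeps the pattern valid when ASLR moves the stack.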
Keystroke analysis
We now have the inter-timing sequences
We use a Hidden Markov Model (HMM) to guess what the victim typed and list 4500 candidates
• N-Viterbi algorithm: use conditional probability
• Average all probabilities
• M-N-Viterbi algorithm: use conditional probability
Fig. 4: Steps of keystroke analysis
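To make the candidate-listing idea concrete, here is a small n-best Viterbi sketch. It assumes that each ordered key pair has a Gaussian latency distribution learned from training data and that all keys and transitions are equally likely a priori; those modelling choices, the function names, and the toy numbers are illustrative only. The averaging step and the M-N-Viterbi variant from the slide are not covered.

```python
import heapq
import math


def log_gauss(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def n_viterbi(timings, keys, pair_stats, n):
    """Return the n most likely key sequences for the observed timings.

    pair_stats[(a, b)] = (mean, std) of the latency from key a to key b.
    Keys are single-character strings.
    """
    # beams[k] holds up to n (log_prob, sequence) hypotheses ending in key k.
    beams = {k: [(0.0, k)] for k in keys}
    for y in timings:
        new_beams = {}
        for b in keys:
            candidates = [
                (lp + log_gauss(y, *pair_stats[(a, b)]), seq + b)
                for a in keys
                for lp, seq in beams[a]
            ]
            new_beams[b] = heapq.nlargest(n, candidates)
        beams = new_beams
    final = [h for hyps in beams.values() for h in hyps]
    return [seq for _, seq in heapq.nlargest(n, final)]


# Toy model: two keys with well-separated pair latencies (in seconds).
stats = {("a", "a"): (0.10, 0.02), ("a", "b"): (0.20, 0.02),
         ("b", "a"): (0.30, 0.02), ("b", "b"): (0.40, 0.02)}
print(n_viterbi([0.20, 0.30], "ab", stats, n=2))  # ['aba', 'abb']
```

Keeping the n best hypotheses per ending key at every step is enough to recover the n best sequences overall, which is how a ranked list of candidates (4500 in the talk) can be produced instead of a single guess.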
Experiment
Environment
• Intel Core 2 Duo E6700, 3GB RAM
• Red Hat Enterprise Linux 4.0, Debian 4.0, and Ubuntu 8.04
Evaluation on three public servers
• A Linux workstation in a public machine room (Server 1)
• A web server of Indiana University that allows SSH connections from its users (Server 2)
• A server for students’ course projects (Server 3)
• 72-hour monitoring of these servers, with the number of users ranging from 1 to 24
Experiment (cont.)
Fig. 11: CPU usage of three real-world servers during 72 hours
Fig. 10: Percentage of keystrokes detected versus CPU usage
Experiment (cont.)
Speculating passwords
• Training: 15 training keys (13 letters and 2 digits), giving 225 key pairs in total. We collect 45 inter-timings for each of these pairs from a user
• Evaluation: select 3 passwords from the space of all possible 8-byte sequences formed from the 15 characters. Our HMM outputs 4500 candidates
Fig. 7: Percentage of the space searched before finding the right password
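The sizes quoted for the password experiment can be checked with a few lines of arithmetic. The variable names are illustrative, and the only inputs are the numbers stated on the slide: 15 keys, 45 timings per key pair, 8-character passwords, 4500 candidates.

```python
keys = 15                 # 13 letters + 2 digits
samples_per_pair = 45
password_length = 8
candidates = 4500

key_pairs = keys ** 2                      # ordered pairs, repeats included
total_timings = key_pairs * samples_per_pair
password_space = keys ** password_length

print(key_pairs)        # 225
print(total_timings)    # 10125
print(password_space)   # 2562890625
print(f"{candidates / password_space:.2e}")  # 1.76e-06
```

So the 4500 candidates listed by the HMM cover only about two millionths of the full space of 8-character passwords over these 15 keys.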
Experiment (cont.)
Guessing English words
• Training: use word frequencies from the British National Corpus to compute transition probabilities
• Evaluation: randomly draw words from 2103 known words of length 3 to 5, then type them
Fig. 8: Time distribution of letter pairs
Fig. 9: Success rate on English words
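A sketch of how letter-to-letter transition probabilities could be derived from word frequencies. The function name is hypothetical and the frequencies in the example are made up for illustration; they are not taken from the British National Corpus, and the talk does not describe the exact estimation procedure.

```python
from collections import defaultdict


def transition_probabilities(word_freq):
    """P(next letter | current letter), weighted by word frequency."""
    counts = defaultdict(lambda: defaultdict(float))
    for word, freq in word_freq.items():
        # Every adjacent letter pair contributes the word's frequency.
        for a, b in zip(word, word[1:]):
            counts[a][b] += freq
    return {
        a: {b: c / sum(nexts.values()) for b, c in nexts.items()}
        for a, nexts in counts.items()
    }


# Made-up frequencies, for illustration only.
probs = transition_probabilities({"the": 6, "then": 2, "to": 2})
print(probs["t"])  # {'h': 0.8, 'o': 0.2}
```

These probabilities would take the place of the uniform transitions that suit random passwords, so that likely English letter sequences are ranked ahead of unlikely ones.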
Conclusion
Information leak: one can obtain other users’ keystrokes without any special permission
Trade-off between convenience and security
Contribution: a keystroke detection and extraction method that works on almost all Linux distributions
Future work
More precise detection methods for non-deterministic programs
Ways to detect keystrokes when system calls are not immediately triggered by keystrokes
Better algorithms to identify English words
Use more information to infer other events, such as mouse movement
The End