Internet Censorship:
Great Firewall of China (GFC)
Jong C. [email protected]
Internet Censorship:
Great Firewall of China (GFC)
Jong C. [email protected]
Contents
• Internet Censorship• Censoring Mechanisms• How to circumvent• How to circumvent• Censorship Researches• Our Work @ UNM
Censoring Mechanisms
Censorship Researches
Information throttled
• Censorship observed• Slashdot• Open Net Initiative• Reporters Without Borders•• Human Right Watch• UN, etc.......• Censorship in burst• 3 out of 41 in ‘02 -> more than 25 in ‘09• can view at ONI for more details• political, social, security/confilict, Internet tools
Information throttled
Reporters Without Borders
> more than 25 in ‘09
can view at ONI for more details
political, social, security/confilict, Internet tools
Generic Filtering• Maximillian Dornseif, Germany, • from “Government mandated blocking of foreign Web content”
• Packet-level filtering• mostly done by firewall (rules), but need up-to-dateup-to-date
• OSI layer 3 filtering• inspects header and forwards/drops• coarse-grained due to high false positive (overblocking)
• OSI layer 4 filteringlayer 3
Generic FilteringMaximillian Dornseif, Germany,
from “Government mandated blocking of foreign Web content”
level filtering
mostly done by firewall (rules), but need
OSI layer 3 filtering
inspects header and forwards/drops
grained due to high false
positive (overblocking)
OSI layer 4 filtering: finer-grained than
Generic Filtering
(cont’d)• Application-level filtering• inspects payload and performs most detailed• often provides ways of informing users about the filteringthe filtering
• hard to be done in real• encryption/compression makes infeasible
Generic Filtering
level filtering
inspects payload and performs most detailed
often provides ways of informing users about
hard to be done in real-time due to overhead
encryption/compression makes infeasible
Generic Filtering
(cont’d)• Filtering mechanisms• proxy firewall (HW requirement, latency)• IDS (signature-based and/or heuristic)•• DNS poisoning• help from search engines• web-site delisting• keyword filtering• keyword forgery
Generic Filtering
Filtering mechanisms
proxy firewall (HW requirement, latency)
based and/or heuristic)
help from search engines
Circumvent Censorship• Discard TCP RST• Encrypt/Compress• Use proxy server (web cache)• Use publicly accessible DNS server (caveat: • Use publicly accessible DNS server (caveat: security)
• Tunneling such as ICMP, SSH, SSL, or VPN• Add comment in HTML• Use captchas• etc
Circumvent Censorship
Use proxy server (web cache)
Use publicly accessible DNS server (caveat: Use publicly accessible DNS server (caveat:
Tunneling such as ICMP, SSH, SSL, or VPN
Add comment in HTML
Great Firewall of China
• Why China?• Most complicated censorship mechanism• China filters to maintain power•• Cyber-sensors and cyber• Hierarchical supervisory bodies• World’s largest IPv6 backbone
Great Firewall of China
Most complicated censorship mechanism
China filters to maintain power
sensors and cyber-police
Hierarchical supervisory bodies
World’s largest IPv6 backbone
GFC (cont’d)• A large number of sites block• Not all about pornography sites• PA state, USA, blocks child pornography• Most advanced keyword filtering• Most advanced keyword filtering• falun gong, human right, Taiwan, Tibet, even foreign cities due to similar pronunciations
• TCP RSTs, specious SYN/ACK
GFC (cont’d)A large number of sites block
Not all about pornography sites
PA state, USA, blocks child pornography
Most advanced keyword filteringMost advanced keyword filtering
falun gong, human right, Taiwan, Tibet, even foreign cities due to similar pronunciations
TCP RSTs, specious SYN/ACK
Zittrain & Edelman
• First studied on blocked web sites in China• from “Internet filtering in China”, IEEE Internet Computing
• Studied filtering methods•• IP address blacklisting• DNS IP address blocking• DNS redirection• URL keyword filtering• HTML response keyword filtering
Zittrain & Edelman
First studied on blocked web sites in China
from “Internet filtering in China”, IEEE Internet Computing
Studied filtering methods
IP address blacklisting
DNS IP address blocking
URL keyword filtering
HTML response keyword filtering
Ignore the GFC
• Seminal work by Richard Clayton et al.• from “Ignoring the Great Firewall of China”, 6th workshop on Privacy Enhancing
Technologies
• Filtered at borders•• Triggers 3 consecutive TCP RSTs (seq, seq+1460, seq+4380)
• Stateless inspection
Ignore the GFC
Seminal work by Richard Clayton et al.
from “Ignoring the Great Firewall of China”, 6th workshop on Privacy Enhancing
Triggers 3 consecutive TCP RSTs (seq,
seq+1460, seq+4380)
ConceptDoppler
• More work done by Jed Crandall et al.• from “ConceptDoppler: a weather tracker for Internet censorship”, ACM CCS, 2007
• Showed GFC as panopticon• used tcptraceroute with increasing TTL• used tcptraceroute with increasing TTL• Also showed 28.3% requests survived to reach server
• Most filtering happens at 1st hop, but deep thru 13th hops beyond borders
• Provides blacklist words using LSA
ConceptDoppler
More work done by Jed Crandall et al.
from “ConceptDoppler: a weather tracker for Internet censorship”, ACM CCS, 2007
Showed GFC as panopticon
used tcptraceroute with increasing TTLused tcptraceroute with increasing TTL
Also showed 28.3% requests survived to reach
Most filtering happens at 1st hop, but deep thru
13th hops beyond borders
Provides blacklist words using LSA
TCP RST patterns
• Work done by N. Weaver et al.• from “Detecting forged TCP RST Packets”, TRUST, 2008
• Studied on forged TCP RST patterns and fingerprintsfingerprints
• Patterns:• D+R, R+D, R+R, S+R, SA+R• Fingerprints:• China has IPID 64, IPID
TCP RST patterns
Work done by N. Weaver et al.
from “Detecting forged TCP RST Packets”, TRUST, 2008
Studied on forged TCP RST patterns and
D+R, R+D, R+R, S+R, SA+R
China has IPID 64, IPID -26, SEQ 1460, etc
Electronic Big Brother
• Work done by J. Karlin et al.• from “/ation-state routing: censorship, wiretapping, and BGP”
• Countries with most information flow: USA, England, and GermanyEngland, and Germany
• China has little effect on interdomain routing• Power Law can be applied
Electronic Big Brother
Work done by J. Karlin et al.
state routing: censorship, wiretapping, and BGP”
Countries with most information flow: USA,
England, and GermanyEngland, and Germany
China has little effect on interdomain routing
Power Law can be applied
Our Work being going...• J. Crandall et al. showed, with a blacklist,GFCs blocks HTML GET requests substantially
• Are GFCs symmetric or asymmetric in HTML filtering?
•• If asymmetric, filtering would be less effective
• How much blocking on HTML response?• What blacklist keywords blocked & • How do GFCs block & how to escape?
Our Work being going...J. Crandall et al. showed, with a blacklist, that GFCs blocks HTML GET requests
Are GFCs symmetric or asymmetric in HTML
If asymmetric, filtering would be less
How much blocking on HTML response?
What blacklist keywords blocked & why?
How do GFCs block & how to escape?
Constraints
• Internet Measurement• No fine-grained protocol support• Privacy and legal issue•• and moreM• China claims world’s largest IPv6 backbone• Huge latency due to 4over6• Little collaboration from IPv6
Internet Measurement
grained protocol support
Privacy and legal issue
China claims world’s largest IPv6 backbone
Huge latency due to 4over6
Little collaboration from IPv6
Keyword Probe
GET server/keyword=
Doesn’t know which direction is blockedM
Keyword Probe
GET server/keyword=falun HTTP/1.1
Doesn’t know which direction is blockedM
Number Probe
• Use a proxy server, a.k.a. a web cache• Use number query instead of keyword• Chinese cannot be guaranteed• Neither can English since some • Neither can English since some abbreviations are blocked
• Bogus response page of ~4KB• Hash will be much better, but file used currently
• Latency generated due to table iteration
Number Probe
Use a proxy server, a.k.a. a web cache
Use number query instead of keyword
Chinese cannot be guaranteed
Neither can English since some Neither can English since some
abbreviations are blocked
Bogus response page of ~4KB
Hash will be much better, but file used
Latency generated due to table iteration
Number Probe
GET server/keyword=
1 means
Doesn’t matter which direction is blockedM
Number Probe
GET server/keyword=1 HTTP/1.1
means falun
Doesn’t matter which direction is blockedM
Setup
• 7 pairs of single server and single client• Scripting probes with wget command + Python code on server
• WGETRC, .wgetrc, or /etc/wgetrc• WGETRC, .wgetrc, or /etc/wgetrc• Scapy used
7 pairs of single server and single client
Scripting probes with wget command + Python
WGETRC, .wgetrc, or /etc/wgetrcWGETRC, .wgetrc, or /etc/wgetrc
Setup
• Python code with libpcap record traffics on both server-side & client
• tcptraceroute recording latency every 10 mins
•• 3 contiguous probes sent• Preliminary tests from late Feb to early Mar• Real experiments from spring recess thru now
Python code with libpcap record traffics on
side & client-side
tcptraceroute recording latency every 10
3 contiguous probes sent
Preliminary tests from late Feb to early Mar
Real experiments from spring recess thru now
QUERY SENT
FSM for Probe
200 OK RCV’D
START200 OK RCV’D
TCP RST RCV’D+ QUERY SENT(+port/2)+ QUERY SENT(-port/2)+ QUERY SENT(+256)
WAIT
FSM for Probe
+ QUERY SENT(+256)+ QUERY SENT(-256)
Exponential backup+ HELLO SENT(+1)
200 OK RCV’D
CLEAR
Proxy Hunt• Chinese broadcast a list of proxy servers• Financial issue• Falun movement• Geographically diverse web caches• Geographically diverse web caches• Use Visualroute server 2008 trial• Returns guess locations
• Proxy life time• Last up to two weeks or so• Some lasts a couple of days
Chinese broadcast a list of proxy servers
Geographically diverse web cachesGeographically diverse web caches
Use Visualroute server 2008 trial
guess locations from ISP info
Last up to two weeks or so
Some lasts a couple of days
What to test?
• 12/20 diverse locations randomly chosen• Most along with east coast (1 partial)• 1 inland location included, but partial• 1/2 hrs to 2 days each run• 1/2 hrs to 2 days each run• Different locations of a single keyword• Beginning/Middle/End• Different response sizes• 600B/4KB/40KB/175KB/350KB• Static vs dynamic pages
What to test?
12/20 diverse locations randomly chosen
Most along with east coast (1 partial)
1 inland location included, but partial
1/2 hrs to 2 days each run1/2 hrs to 2 days each run
Different locations of a single keyword
Beginning/Middle/End
Different response sizes
600B/4KB/40KB/175KB/350KB
Static vs dynamic pages
What to test?
• Keyword threshold (to be tested• How many number of the same/different kind triggers & how?
• Different HTTP protocols (• Different HTTP protocols (• Probably no effect since proxy uses HTTP 1.1
• No STDDEV data available yet (• TCP RSTs trend: how many & how (• Any other ideas?
What to test?
to be tested)
How many number of the same/different
kind triggers & how?
Different HTTP protocols (to be tested)Different HTTP protocols (to be tested)
Probably no effect since proxy uses HTTP
No STDDEV data available yet (being tested)
TCP RSTs trend: how many & how (to be tested)
Map
9
10
11
1
2
34
5
6
7
89
12
IPv 6 Backbone
12
IPv 6 Backbone
21
4
3
7 65
4
8
10
9
11
12
IPv 6 BackboneIPv 6 Backbone
TCP RST
• Depending on location & probe, the number of TCP RSTs received vary
• Sometimes just 1 RST• Sometimes more than 10 RSTs• Sometimes more than 10 RSTs• No automatism at this time (
Depending on location & probe, the number of
TCP RSTs received vary
Sometimes just 1 RST
Sometimes more than 10 RSTsSometimes more than 10 RSTs
No automatism at this time (future work)
Odd Responses
• Unlike HTML GET request, HTML response receives odd packets
• 404 Page Not Found (once)•• 502 Error (58 times)• Proxy error/Bad gateway• 503 Service Unavailable (236 times)• Connection Timed Out (discarded
Odd Responses
Unlike HTML GET request, HTML response
receives odd packets
404 Page Not Found (once)
502 Error (58 times)
Proxy error/Bad gateway
503 Service Unavailable (236 times)
Connection Timed Out (not counted yet):
Odd Responses
• Two kinds• Just 502/503 error received without any preceding/following RSTs
•• Some HELLOs get TCP RSTs (delay)• RST received first and then 502/503 error received: categorized same as TCP RST
• Who sends these? Let’s look at client
Odd Responses
Just 502/503 error received without any
preceding/following RSTs
Some HELLOs get TCP RSTs (delay)
RST received first and then 502/503 error
received: categorized same as TCP RST
Who sends these? Let’s look at client-side!
200 OK
UNM -> China: [SYN] seq=0, TTL=64
China -> UNM: [SYN+ACK] seq=0, ack=1, TTL=48
UNM -> China: [ACK] seq=1, ack=1, TTL=64
UNM -> China: GET server/search.php?keynum=0
HTTP/1.0, TTL=64 * 0 means hello
China -> UNM: [ACK] seq=1, ack=138, TTL=48
When a non-blocked keyword is requested
China -> UNM: [ACK] seq=1, ack=138, TTL=48
China -> UNM: [TCP Dup ACK] [ACK
China -> UNM: [TCP segment of a reassembled PDU
China -> UNM: [TCP segment of a reassembled PDU
China -> UNM: [TCP previous segment lost
seq=4262, ack=138, TTL=48
UNM -> China: [ACK] seq=138, ack=2897, TTL=64
UNM -> China: [TCP Dup ACK] [
China -> UNM: [TCP out-of-order
(text/html)
] seq=0, TTL=64
] seq=0, ack=1, TTL=48
] seq=1, ack=1, TTL=64
server/search.php?keynum=0
* 0 means hello
] seq=1, ack=138, TTL=48
blocked keyword is requested
] seq=1, ack=138, TTL=48
ACK] seq=1, ack=138, TTL=48
TCP segment of a reassembled PDU] TTL=48
TCP segment of a reassembled PDU] TTL=48
TCP previous segment lost] [FIN+ACK]
seq=4262, ack=138, TTL=48
] seq=138, ack=2897, TTL=64
] [ACK] seq=138, ack=2897
order] HTTP/1.0 200 OK
TCP RST
UNM -> China: [SYN] seq=0, TTL=64
China -> UNM: [SYN+ACK] seq=0, ack=1, TTL=48
UNM -> China: [ACK] seq=1, ack=1, TTL=64
UNM -> China: GET server/search.php?keynum=0
HTTP/1.0, TTL=64 * 0 means falun
When a TCP RST packet is received
HTTP/1.0, TTL=64 * 0 means falun
China -> UNM: [ACK] seq=1, ack=152, TTL=48
China -> UNM: [RST] seq=1, TTL=50
UNM -> China: [FIN+ACK] seq=1, ack=1, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64
] seq=0, TTL=64
] seq=0, ack=1, TTL=48
] seq=1, ack=1, TTL=64
server/search.php?keynum=0
* 0 means falun
When a TCP RST packet is received
* 0 means falun
] seq=1, ack=152, TTL=48
TTL=50
] seq=1, ack=1, TTL=64
] seq=134, ack=2954, TTL=64
] seq=134, ack=2954, TTL=64
] seq=134, ack=2954, TTL=64
502,503 Error
UNM -> China: [SYN] seq=0, TTL=64
China -> UNM: [SYN+ACK] seq=0, ack=1, TTL=48
UNM -> China: [ACK] seq=1, ack=1, TTL=64
UNM -> China: GET server/search.php?keynum=0
HTTP/1.0, TTL=64 * 0 means falun
When a 503 error message is received
HTTP/1.0, TTL=64 * 0 means falun
China -> UNM: [ACK] seq=1, ack=134, TTL=48
China -> UNM: [TCP previous segment lost
seq=1578, ack=134, TTL=48
UNM -> China: [TCP Dup ACK] [
China -> UNM: [TCP retransmission
a reassembled PDU] , TTL=48
China -> UNM: [TCP retransmission
Unavailable, TTL=48
502,503 Error
] seq=0, TTL=64
] seq=0, ack=1, TTL=48
] seq=1, ack=1, TTL=64
server/search.php?keynum=0
* 0 means falun
When a 503 error message is received
* 0 means falun
] seq=1, ack=134, TTL=48
TCP previous segment lost] [FIN+ACK]
seq=1578, ack=134, TTL=48
] [ACK] seq=134, ack=1, TTL=64
TCP retransmission] [TCP segment of
] , TTL=48
TCP retransmission] HTTP/1.0 503 Service
502,503 Error
• Based on client-side, GFCs seem to intervene?• Probably yes or no depending on preceding/following RSTs
• Let’s look at server-side!•
502,503 Error
side, GFCs seem to intervene?
Probably yes or no depending on preceding/following RSTs
side!
502,503 Error
client proxy
GET
502/503
GETGET
GETOK
OK
502,503 Error
proxy server
GET
GET
OK
OK
502,503 Error
• Probes right before the current probe and right after the current were successful
• i-th keyword (3rd probe): OK• (i+1)-th keyword (1• (i+1)-th keyword (1• (i+1)-th keyword (2
502,503 Error
Probes right before the current probe and right
after the current were successful
probe): OK
th keyword (1st probe): 502 Errorth keyword (1st probe): 502 Error
th keyword (2nd probe): OK
502,503 Error
• What caused this kind of 502/503 error?• Probably proxy misconfigurations• More analysis needed• More analysis needed• Some HELLOs get delayed RSTs• 503 same as 502?• No concrete evidence• Still in analysis
502,503 Error
What caused this kind of 502/503 error?
Probably proxy misconfigurations
More analysis neededMore analysis needed
Some HELLOs get delayed RSTs
No concrete evidence
Odd Packets
• Odd packets are generated by a web cache• Mistakenly thought as TCP RST packets received from GFC on the way from server
to a web cache
•• Then, self-censorship? Definitely no!• Probably due to proxy misconfigurations• Odd packets can also be considered TCP RST? Some are definitely false positive
• Why differs? Can’t explain at this time
Odd Packets
Odd packets are generated by a web cache
Mistakenly thought as TCP RST packets
received from GFC on the way from server
censorship? Definitely no!
Probably due to proxy misconfigurations
Odd packets can also be considered TCP RST?
false positive, though.
Can’t explain at this time
Self-censorship?
UNM -> China: [SYN] seq=0, TTL=64
China -> UNM: [SYN+ACK] seq=0, ack=1, TTL=46
UNM -> China: [ACK] seq=1, ack=1, TTL=64
UNM -> China: GET server/search.php?keynum=0
HTTP/1.0, TTL=64 * 0 means falun
When a self-censorship is observed at Beijing
HTTP/1.0, TTL=64 * 0 means falun
China -> UNM: [ACK] seq=1, ack=152, TTL=46
China -> UNM: [RST] seq=1, TTL=46
UNM -> China: [FIN+ACK] seq=1, ack=1, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64
Proxy ACK’ed server in which TCP RST is
not observed at all.
censorship?
] seq=0, TTL=64
] seq=0, ack=1, TTL=46
] seq=1, ack=1, TTL=64
server/search.php?keynum=0
* 0 means falun
censorship is observed at Beijing
* 0 means falun
] seq=1, ack=152, TTL=46
TTL=46
] seq=1, ack=1, TTL=64
] seq=134, ack=2954, TTL=64
] seq=134, ack=2954, TTL=64
] seq=134, ack=2954, TTL=64
Proxy ACK’ed server in which TCP RST is
Probing Results
• Odd probing results, but the following should be taken into an account:
• Route flutter• HTML request blocking trend varies • HTML request blocking trend varies depending on probing time (diurnal pattern)
• Might be the most efficient way because of distributed filtering on response?
• Can’t explain many phenomenon, but can make good guesses at least
Probing Results
Odd probing results, but the following should
be taken into an account:
HTML request blocking trend varies HTML request blocking trend varies
depending on probing time (diurnal pattern)
Might be the most efficient way because of
distributed filtering on response?
Can’t explain many phenomenon, but can
make good guesses at least
#Block on 4KB
#keywords of ~4KB
250
300
350
0
50
100
150
200
250
#1 #2 #3 #4 #5
#Block on 4KB
#keywords of ~4KB
#6 #7 #8 #9 #10 #11 #12
#Block on 4KB
Zoom-up of #keywords of ~4KB
25
30
35
0
5
10
15
20
25
#1 #2 #3 #4 #5 #6
#Block on 4KB
Zoom-up of #keywords of ~4KB
#6 #7 #8 #9 #10 #11 #12
#Block on 600B
#keywords of ~600B
25
30
35
0
5
10
15
20
25
#1 #2 #4 #5 #6
#Block on 600B
#keywords of ~600B
#7 #8 #9 #10 #11 #12
#Block on 40KB
#keywords of ~40KB
25
30
35
0
5
10
15
20
25
#1 #2 #4 #5 #6
#Block on 40KB
#keywords of ~40KB
#7 #8 #9 #10 #11 #12
#Block on 175KB
#keywords of ~175KB
25
30
35
0
5
10
15
20
25
#1 #2 #4 #5 #6
#Block on 175KB
#keywords of ~175KB
#7 #8 #9 #10 #11
#Block on 350KB
#keywords of ~350KB
50
60
0
10
20
30
40
#1 #2 #4 #5 #6
#Block on 350KB
#keywords of ~350KB
#6 #7 #8 #9 #10 #11
#keywords by size
40
50
60
#Block on different size
0
10
20
30
40
#1 #2 #3 #4 #5 #6
#keywords by size
s0s1s10
#Block on different size
#6 #7 #8 #9 #10 #11 #12
s10s50s100
#Block on different loc
Different Location
250
300
350
0
50
100
150
200
250
s0 s1 s10
#1 #2
#5 #6
#8 #9#11
#Block on different loc
Different Location
s10 s50 s100
#2 #4
#6 #7
#9 #10
#Block on different loc
Zoom-up of Different Location
50
60
#1 #2
#5 #6
#8 #9
0
10
20
30
40
s0 s1 s10
#8 #9#11
#Block on different loc
Zoom-up of Different Location
#4
#7
#10
s50 s100
#10
Nation-wide
0
10
20
30
40
50
60
s0
RST
30
40
50
60
RST
0
10
20
30
40
50
60
s0 s1 s10 s50 s100
RST
0
10
20
30
40
50
60
RST
0
100
200
300
400
s0 s1 s10 s50 s100
RST
0
10
20
30
40
50
60
s0 s1 s10 s50 s100
RST
0
10
20
30
40
50
60
s0 s1 s10 s50 s100
RST
0
10
20
30
40
50
60
s0 s1 s10 s50 s100
RST
0
10
20
30
40
50
60
s0 s1 s10 s50 s100
RST
0
10
20
30
40
50
60
s0 s1 s10 s50 s100
RST
0
10
20
30
40
50
60
s0 s1 s10 s50 s100
0
10
20
s0 s1 s10 s50 s100
0
s0 s1 s10 s50 s100
RST
s1 s10 s50 s100
STD DEV
STDDEV-AVG of
#keywords
300
350
0
50
100
150
200
250
#1 #2 #4 #5 #6
STDDEV-AVG of
#7 #8 #9 #10 #11
STD DEV
Zoom-up of STDDEV-AVG
of #keywords by location
40
50
0
10
20
30
#1 #2 #4 #5 #6
Zoom-up of STDDEV-AVG
of #keywords by location
#7 #8 #9 #10 #11
Latency
Latency
800
1000
0
200
400
600
h1 h4 h7 h10
h13
h16
h19
Latency
#1
#2
#3
#4
h19
h22
h25#4
#5
#6
#7
#8
#9
#10
#11
#12
Latency
Latency
500
600
200
300
400
h1 h4 h7 h10
h13
h16
h19
Latency
#1
#2
#3
#4
h19
h22
h25#4
#5
#6
#7
#8
#9
#10
#11
#12
Beijing on 4KB
30
#keywords by date
at Beijing (#9) with keyword at begin
4/6/2008
6
0
10
20
Beijing on 4KB
#keywords by date
at Beijing (#9) with keyword at begin
4/8/2008
17
False Positive
• Hard to predict, huh?• Need more time/tests• Assume all odd packets false positive•• Let’s visit all graphs again!• False positive rate of ~8.5%• 2236 times not rec’d• 2045 RSTs observed
False Positive
Hard to predict, huh?
Need more time/tests
Assume all odd packets false positive
Let’s visit all graphs again!
False positive rate of ~8.5%
2236 times not rec’d in total since Mar
2045 RSTs observed
STD DEV
#keywords of STDDEV at Beijing (#9)
with keyword at beginning on Apr 25-26
5
6
7
8
0
1
2
3
4
22-48 23-10 23-31 23-51 00-13 00-33 00-54 01-15 01-35 01-56
RST+Others
#keywords of STDDEV at Beijing (#9)
with keyword at beginning on Apr 25-26
4
5
6
7
0
1
2
3
4
22-48 23-10 23-31 23-51 00-13 00-33 00-54 01-15 01-35 01-56
RST only
STD DEV#keywords of STDDEV at Beijing (#9)
with keyword at beginning on Apr 26-28
8
10
0
2
4
6
s0 s1 s10
#keywords of STDDEV at Beijing (#9)
with keyword at beginning on Apr 26-28
s10 s50 s100
Beijing on 4KB
30
#keywords by date
at Beijing (#9) with keyword at begin
30
#keywords by date
at Beijing (#9) with keyword at begin
4/6/20084/8/2008
5
0
10
20
30
4/6/2008
6
0
10
20
30
Beijing on 4KB
#keywords by date
at Beijing (#9) with keyword at begin
#keywords by date
at Beijing (#9) with keyword at begin
4/8/2008
9
4/8/2008
17
STD DEV
Zoom-up of STDDEV-AVG
of #keywords by location
40
50
0
10
20
30
#1 #2 #4 #5 #6
Zoom-up of STDDEV-AVG
of #keywords by location
#7 #8 #9 #10 #11 #12
By Location#keywords
at Guangzhou (#1)
10
20
30
RST+Others
RST only
0
s0 s1 s10 s50 s100
#keywords
at Hangzhou (#3)
0
10
20
30
40
s0 s1 s10 s50 s100
RST+Others
RST only
#keywords
at Jiangxi (#2)
10
20
30
RST+Others
RST only
0
s0 s1 s10 s50 s100
#keywords
at Shanghai (#4)
0
10
20
30
s0 s1 s10 s50 s100
RST+Others
RST only
By Location#keywords
at Jiangsu (#5)
0
10
20
30
40
50
60
RST+Others
RST only
0
s0 s1 s10 s50 s100
#keywords
at Shanxi (#7)
0
10
20
30
s0 s1 s10 s50 s100
RST+Others
RST only
#keywords
at Shandong (#6)
0
10
20
30
RST+Others
RST only
0
s0 s1 s10 s50 s100
#keywords
at Tianjin (#8)
0
10
20
30
s0 s1 s10 s50 s100
RST+Others
RST only
By Location#keywords
at Beijing (#9)
10
20
30
RST+Others
RST only
#keywords
at Harbin (#11)
0
100
200
300
s0 s1 s10 s50 s100
RST+Others
RST only
0
s0 s1 s10 s50 s100
#keywords
at Liaoning (#10)
10
20
30
RST+Others
RST only
0
s0 s1 s10 s50 s100
#keywords
at Chengdu (#12)
0
10
20
30
s0 s1 s10 s50 s100
RST+Others
RST only
Dynamic Page (Beijing)
#keywords by dynamic
at Jiangxi (#2)
15
Begin
Middle
0
5
10
s1 s10
Middle
End
Dynamic Page (Beijing)
#keywords by dynamic
at Jiangxi (#2)
s50 s100
Static Page (Beijing)
#keywords by
static
15
0
5
10
s1 s10
Static Page (Beijing)
#keywords by
Begin
Middle
s50 s100
End
Dynamic vs Static:
beginning#keywords by
pages
15
0
5
10
s1 s10
Dynamic vs Static:
#keywords by
static
dynamic
s50 s100
dynamic
Dynamic vs Static: middle
#keywords by pages
at Jiangxi (#2)
15
0
5
10
s1 s10
Dynamic vs Static: middle
#keywords by pages
at Jiangxi (#2)
static
dynamic
s50 s100
dynamic
Dynamic vs Static: end
#keywords by pages
at Jiangxi (#2)
15
0
5
10
s1 s10
Dynamic vs Static: end
#keywords by pages
at Jiangxi (#2)
static
dynamic
s50 s100
Blacklist updated
#keywords by blacklist
50
60
old
0
10
20
30
40
s0 s1
new
Blacklist updated
#keywords by blacklist
s10 s50 s100
new
Filtering Pattern: probabilistic?
filtering pattern
(91 out of 2045)
40
50
0
10
20
30
40
xoo oxo oox xxo
Filtering Pattern:
filtering pattern
(91 out of 2045)
xxo xox oxx xxx
Blacklist
• ~95% blacklist keywords of HTML GET request also blocked in HTML response
• ~54% blocked when excluding Harbin•• Less substantial blocking on HTML response• Probably due to overhead and/or latency• Unlike request, response blacklist seems to be distributed over the nation
• Some GFCs seem to share
~95% blacklist keywords of HTML GET
request also blocked in HTML response
~54% blocked when excluding Harbin
Less substantial blocking on HTML response
Probably due to overhead and/or latency
Unlike request, response blacklist seems to be
distributed over the nation
Some GFCs seem to share
Blacklist (4KB)
11
5
1
10
66
131
more
118
29
358
4353
122
30
7
19
9
118
Blacklist (4KB)
3
6
204
236
8
359
2,3 = ∅∅∅∅
Blacklist (600B)
1, 4, 6, 8, 9, 10
5
19
122
359 18
29
30
358
Blacklist (600B)
11
1, 4, 6, 8, 9, 10
more
7
189
2, 12 = ∅∅∅∅
3: */A
Blacklist (40KB)
11
9
359
4
317
319
more
5,6,
10
18
29
358
359
1122
19
8
6
30
Blacklist (40KB)
320
330
2
200
209
228
12 = ∅∅∅∅
3: */A
Blacklist (175KB)
11
more 18
29
358
819
10
118
182
221
293
1,4,5,
7,9
Blacklist (175KB)
6
359
2 = ∅∅∅∅
3, 12: */A
Blacklist (325KB)
11
5
241
more 18
29
358
7122
6
19
359
1,4,8
Blacklist (325KB)
10
112, 127, 13, 136
177, 18
930
162177, 18
185, 191, 197, 245
48, 50
323, 346
2 = ∅∅∅∅
3, 12: */A
162
178
259
290
Blacklist (all)
104,107,109,135,161,163,204
205,210,213,230,232,236,238
249,256,263,268,285,357,370
40,55,56,73,74,79,84
11
245
10
more
118
9112,127,13,131,136,162,177
178,182,185,191,197,221
259,290,293,323,346,48,50,66
Blacklist (all)
200
2091
7189
5
2
3
18
19,29,122
358,359
30
358
209
228241
1,6
,8
317,319
320,330
353
4
12 = ∅∅∅∅
Blacklist Size
#keywords blocked
in all sizes
300
350
400
0
50
100
150
200
250
300
#1 #2 #3 #4 #5 #6
Blacklist Size
#keywords blocked
in all sizes
#6 #7 #8 #9 #10 #11 #12
Blacklist Size
Zoom-up of #keywords blocked
by location30
0
10
20
#1 #2 #3 #4 #5 #6
Blacklist Size
Zoom-up of #keywords blocked
by location
#6 #7 #8 #9 #10 #11 #12
Conclusion• GFC performs symmetric filtering against HTML
• Less substantial on HTML response probably due to overhead/latency
• Distributed blacklist words•• Share a subset of blacklist words• Route path, packet size, and keyword location affect censorship efficiency
• Stateful vs. stateless TCP RSTs• Inversion suggests applicationcensorship???
• Still can’t explain many phenomenon
GFC performs symmetric filtering against
Less substantial on HTML response probably due to overhead/latency
Distributed blacklist words
Share a subset of blacklist words
Route path, packet size, and keyword location affect censorship efficiency
Stateful vs. stateless TCP RSTs
Inversion suggests application-layer
Still can’t explain many phenomenon
Thank you
• Any questions or feedbacks• Any questions or feedbacksfeedbacks?feedbacks?
Lab 4
• We’re not using an actual IDS, but by • the server is too simple and is just buffering up to 4 KB
•• can beat this by sending a sequence of packets which split keywords
• need to flush first before sending the next• otherwise, TCP RSTs will be triggered
• Lab 4 1/2 will be about TCP fragmentation(?)
We’re not using an actual IDS, but by iptables.
the server is too simple and is just buffering
can beat this by sending a sequence of
packets which split keywords
need to flush first before sending the next
otherwise, TCP RSTs will be triggered
Lab 4 1/2 will be about TCP fragmentation(?)
Socket in Python
• import socket # client• s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
• s.connect((“10.0.0.3”, 8080))• s.connect((“10.0.0.3”, 8080))• s.send(“hello”)• d=s.recv(1024)• print repr(d)• s.close()
Socket in Python
import socket # client
s = socket.socket(socket.AF_INET,
socket.SOCK_STREAM)
s.connect((“10.0.0.3”, 8080))s.connect((“10.0.0.3”, 8080))
Scapy
• TCP fragmentation?• Not a proper jargon! But, TCP supports byte streams and out-of
• Let’s call it for the sake of simplicity.• Let’s call it for the sake of simplicity.• Why not IP fragmentation? RFC1858!• Tiny fragment attack• can use fragroute for smaller IP packets
• can forge packets with Scapy
Not a proper jargon! But, TCP supports byte
of-order delivery.
Let’s call it for the sake of simplicity.Let’s call it for the sake of simplicity.
Why not IP fragmentation? RFC1858!
Tiny fragment attack
can use fragroute for smaller IP packets
can forge packets with Scapy
Scapy (cont’d)• Use whatever you like, for example MuxTCP, C code with libpcap, etc. (Visit
code.)
• Scapy is troublesome in initiating TCP• need to change firewall rules by iptables• need to change firewall rules by iptables• iptables -A OUTPUT
RST -j DROP
• wouldn’t work for Lab 4 1/2 since the server is not under control
• initiate TCP handshake with your own socket
Scapy (cont’d)Use whatever you like, for example MuxTCP,
C code with libpcap, etc. (Visit Milw0rm for
Scapy is troublesome in initiating TCP
need to change firewall rules by iptablesneed to change firewall rules by iptables
A OUTPUT -p tcp --tcp-flags RST
wouldn’t work for Lab 4 1/2 since the server
is not under control
initiate TCP handshake with your own
Scapy (cont’d)
• To use tcpdump, you need to set PATH• use scapy.sh script mailed• use sniff.py to sniff the network•• Need to know some TCP/IP stuff• IP• IPID, IP addresses• TCP• flags, sequence #, ack #, ports
Scapy (cont’d)
To use tcpdump, you need to set PATH
use scapy.sh script mailed
use sniff.py to sniff the network
Need to know some TCP/IP stuff
IPID, IP addresses
flags, sequence #, ack #, ports
TCP Handshake• fd = open(“message.txt”, ‘r’)• lines = fd.readlines()• fd.close()
• for l in lines:• print l.strip()+’\n’
• import time• time.sleep(1)
TCP Handshakefd = open(“message.txt”, ‘r’)
lines = fd.readlines()