TOWARDS UNDERSTANDING DEVELOPING WORLD TRAFFIC
Sunghwan Ihm (Princeton), KyoungSoo Park (KAIST), Vivek S. Pai (Princeton)
IMPROVING NETWORK ACCESS IN THE DEVELOPING WORLD
• Internet access is a scarce commodity in the developing world: expensive / slow
• Our focus: improving performance of connected network access
• Non-focus: providing/extending connectivity (e.g., DTN, WiLDNet)
Sunghwan Ihm, Princeton University
POSSIBLE OPTIONS
• Web proxy caching
    • Whole objects
    • Single endpoint (local)
    • Designated cacheable traffic only
• WAN acceleration
    • Packet-level caching
    • Mostly for enterprise
    • Two (or more) endpoints, coordinated
• Effective in first world
DEVELOPING WORLD QUESTIONS
• How effective are these approaches?
    • Systems designed for first-world use
    • Most traffic studies small, first-world focused
• How similar is developing region traffic?
• Any new opportunities to exploit?
    • Differences in traffic
    • Differences in cost/tradeoffs
    • System design issues
UNDERSTANDING DEVELOPING WORLD TRAFFIC
Goal
Shape system design by better understanding the traffic optimization opportunities
Requirements
Large-scale, content-focused analysis
PRIOR TRAFFIC ANALYSIS WORK
• Large-scale traffic analysis
    • Internet Study 2007, 2008/2009 by ipoque
    • One million users
    • High-level characteristics via DPI
    • First-world focus
• Developing world traffic analysis
    • Du et al. WWW’06, Johnson et al. NSDR’10
    • Proxy-level analysis from kiosks, Internet cafes, and community centers
OUR APPROACH
• Combine best features
    • Large-scale and content-focused
    • First world and developing world
• Use traffic from CoDeeN content distribution network (CDN)
    • Global proxy (500+ PlanetLab nodes)
    • Running since 2003
    • 30+ million requests per day
WHAT TO ANALYZE?
1. Traffic profile
2. Caching opportunities
3. User behavior
DATA COLLECTION
[Diagram components: User Browser Cache, Local Proxy Cache, WAN, CoDeeN Cache, Origin Web Server]
• Assume local proxy caches
• Focus on cache misses only
• Capture full content
DATA SET
Duration: 1 week (March 25-31, 2010)
# Requests: 157 Million
Volume: 3 TeraBytes
# Clients (unique IPs): 348 K
# Countries/Regions: 190
/8 networks coverage: 61.3%
/16 networks coverage: 24.1%
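The coverage figures can be read as the share of IPv4 prefixes of a given length that contain at least one client address. The sketch below illustrates that reading only. The function name `prefix_coverage`, the sample addresses, and the choice to divide by all possible prefixes (rather than, say, only allocated ones) are assumptions and are not taken from the slides.

```python
import ipaddress

def prefix_coverage(client_ips, prefix_len):
    """Fraction of all IPv4 prefixes of length prefix_len that
    contain at least one of the given client addresses."""
    seen = set()
    for ip in client_ips:
        addr = int(ipaddress.IPv4Address(ip))
        # Keep only the top prefix_len bits to identify the prefix.
        seen.add(addr >> (32 - prefix_len))
    return len(seen) / (2 ** prefix_len)

# Made-up addresses, for illustration only.
clients = ["10.1.2.3", "10.200.0.1", "192.0.2.7", "198.51.100.9"]
cov8 = prefix_coverage(clients, 8)    # three distinct /8 prefixes
cov16 = prefix_coverage(clients, 16)  # four distinct /16 prefixes
```

With real logs, `clients` would be the set of unique client IPs seen during the measurement week.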
TOP COUNTRIES
[Figure: share of requests, bytes, and clients by country. Labeled countries: DE (Germany), US (United States), RU (Russian Federation), AE (United Arab Emirates), PL (Poland), CN (China), SA (Saudi Arabia). Remaining countries grouped as "Etc." (185 countries).]
OECD VS. DEVREG
• OECD: the first world
    • 27 high-income economies from OECD member countries
    • 25% of total traffic
• DevReg: the developing world
    • The remaining 163 countries and 3 OECD members: Mexico, Poland, and Turkey
    • 75% of total traffic
ANALYSIS #1: TRAFFIC PROFILE
Conjecture: DevReg users visit low-bandwidth Web pages (small objects and text-heavy)
We often hear a variant of “Offline Wikipedia content suffices for developing world users”
OBJECT SIZE
• Small: median 3KB vs. 5KB
• Large: similar demand/profile
[Figure: object size distribution, with an annotation at 16KB]
TEXT AND IMAGES
• DevReg has a higher fraction of images
• Exact opposite of bandwidth conjecture
VIDEO AND AUDIO
• DevReg: higher fraction of video & audio
    • Music videos and MP3 songs
APPLICATION (FLASH)
DevReg has a higher fraction of application traffic
Median near 7%
ANALYSIS #1 SUMMARY
Some evidence that DevReg-visited sites have smaller objects, but
DevReg users visit large pages as well, and
DevReg users seek a higher fraction of rich content than OECD users
ANALYSIS #2: CACHING OPPORTUNITY
• Conjecture: little gain from larger caches
    • Some analysis suggests 1GB sufficient
    • Typical cache size < 20GB
    • Object-based caching
CONTENT-BASED CHUNK CACHING
• Split content into chunks
    • Name chunks by content (SHA-1 hash)
    • Cache chunks instead of objects
• Fetch content, send only modified chunks
    • Two endpoints needed
    • Applies to “uncacheable” content
[Diagram: content split into chunks A, B, C, D, E]
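A minimal sketch of the idea, assuming fixed-size chunks and an in-memory cache on each side. Deployed systems often pick chunk boundaries from the content itself, so the fixed 4-byte size here is only for a readable demo. The names `encode`, `decode`, and `peer_known` are hypothetical and not taken from CoDeeN.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, so the demo is easy to follow

def split_chunks(data, size=CHUNK_SIZE):
    """Cut data into consecutive fixed-size chunks (assumption)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def encode(data, peer_known):
    """Sender endpoint: send a chunk name if the peer already holds
    the chunk, otherwise send the raw bytes and remember the name."""
    msgs = []
    for chunk in split_chunks(data):
        name = hashlib.sha1(chunk).hexdigest()
        if name in peer_known:
            msgs.append(("ref", name))
        else:
            msgs.append(("raw", chunk))
            peer_known.add(name)
    return msgs

def decode(msgs, cache):
    """Receiver endpoint: rebuild the content from raw chunks and
    references into the local chunk cache."""
    out = bytearray()
    for kind, payload in msgs:
        if kind == "raw":
            cache[hashlib.sha1(payload).hexdigest()] = payload
            out += payload
        else:
            out += cache[payload]
    return bytes(out)

peer_known, cache = set(), {}
v1 = b"AAAABBBBCCCCDDDDEEEE"  # first fetch: every chunk is new
v2 = b"AAAABBBBXXXXDDDDEEEE"  # second fetch: only one chunk changed
restored_v1 = decode(encode(v1, peer_known), cache)
m2 = encode(v2, peer_known)
restored_v2 = decode(m2, cache)
raw_bytes_v2 = sum(len(p) for kind, p in m2 if kind == "raw")
```

On the second fetch only the modified chunk crosses the link as raw bytes; the other four are sent as names. This works even when the object as a whole is marked uncacheable, because the chunks are matched by content rather than by URL.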
OVERALL REDUNDANCY
• 40% @ 64 KB: objects or parts of large object
• 60% @ 1 KB: parts of text pages
• 65% @ 128 bytes: paragraphs or sentences
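One way to measure redundancy at a given chunk size is to count the bytes that fall in chunks already seen earlier in the stream. The sketch below assumes fixed-size chunking and uses two made-up 16-byte pages; the function name `redundancy` and the sample data are illustrative and not the method used in the study.

```python
import hashlib

def redundancy(stream, chunk_size):
    """Fraction of bytes in the stream that belong to chunks whose
    content was already seen earlier (fixed-size chunks assumed)."""
    seen = set()
    total = redundant = 0
    for obj in stream:
        for i in range(0, len(obj), chunk_size):
            chunk = obj[i:i + chunk_size]
            name = hashlib.sha1(chunk).digest()
            total += len(chunk)
            if name in seen:
                redundant += len(chunk)
            else:
                seen.add(name)
    return redundant / total if total else 0.0

# Two made-up pages that share a header and footer but differ in the body.
page_a = b"HEAD" + b"body-one" + b"FOOT"
page_b = b"HEAD" + b"body-two" + b"FOOT"
stream = [page_a, page_b]

r_small = redundancy(stream, 4)   # small chunks find the shared parts
r_large = redundancy(stream, 16)  # whole-object chunks find nothing
```

The toy input shows the same trend as the slide: smaller chunks expose more redundancy because partial overlaps between objects become visible.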
CACHE BEHAVIOR SIMULATION
• Simulate one week’s traffic
    • Cache misses only
    • LRU cache replacement policy
• Determine size for near-ideal hit rate
    • Calculate byte hit ratio (BHR)
    • Vary storage size (from 10MB to max)
• Results for US, China, and Brazil
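The simulation described here can be sketched as an LRU cache replayed over a request trace, reporting the byte hit ratio for each storage size. This is a simplified illustration: the trace format `(key, size)` and the function name `simulate_lru` are assumptions, and the keys could stand for either objects or chunks.

```python
from collections import OrderedDict

def simulate_lru(trace, capacity_bytes):
    """Replay (key, size) requests through an LRU cache and return
    the byte hit ratio: bytes served from cache / bytes requested."""
    cache = OrderedDict()  # key -> size, least recently used first
    used = 0
    hit_bytes = total_bytes = 0
    for key, size in trace:
        total_bytes += size
        if key in cache:
            hit_bytes += size
            cache.move_to_end(key)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # item can never fit, so it is not cached
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict LRU
            used -= evicted_size
        cache[key] = size
        used += size
    return hit_bytes / total_bytes if total_bytes else 0.0

# Made-up trace: three 10-byte items requested in a repeating pattern.
trace = [("a", 10), ("b", 10), ("a", 10), ("c", 10), ("b", 10), ("a", 10)]
bhr_small = simulate_lru(trace, 20)  # working set does not fit
bhr_large = simulate_lru(trace, 30)  # working set fits
```

Sweeping `capacity_bytes` from a small value up to the total trace size gives the BHR curve from which a near-ideal cache size can be read off.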
ANALYSIS #2 SUMMARY
• Chunk caching useful
    • Reduces WAN (cache miss) traffic
    • Complements existing Web proxies
• Larger caches useful
    • Useful reduction in miss rate
    • Cheap compared to bandwidth costs
ANALYSIS #3: USER BEHAVIOR
• Conjecture: as first-world Web pages get larger, DevReg users suffer delays
• Mechanism: observe aborted transfers
    • Intentional termination
    • Automatic when browsing away
• Abort = users bored or downloads slow
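The slides do not say how aborted transfers were identified in the logs. One plausible approach, shown here purely as an assumption, is to flag a transfer as cancelled when fewer bytes were delivered than the response announced. The `Transfer` record and its fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Transfer:
    content_length: int  # size announced by the server (hypothetical field)
    bytes_sent: int      # bytes actually delivered (hypothetical field)

def is_cancelled(t):
    """Assumed rule: a transfer that stopped short was cancelled."""
    return t.bytes_sent < t.content_length

def cancel_stats(transfers):
    """Share of requests cancelled, and their share of delivered bytes."""
    cancelled = [t for t in transfers if is_cancelled(t)]
    delivered = sum(t.bytes_sent for t in transfers)
    return {
        "request_share": len(cancelled) / len(transfers),
        "byte_share": sum(t.bytes_sent for t in cancelled) / delivered,
    }

# Made-up log: one large download abandoned after 100 of 1000 bytes.
log = [Transfer(100, 100), Transfer(200, 200),
       Transfer(1000, 100), Transfer(100, 100)]
stats = cancel_stats(log)
```

A rule like this cannot tell an intentional stop from an automatic abort when the user browses away, which matches the slide's point that an abort only signals boredom or slowness.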
CANCELLED OBJECT SIZE C-CDF
• Cancelled objects larger than normal (red)
• Complete objects (green) much larger than actual download (blue)
• Most downloads less than 10MB
CANCELLED TRANSFER VOLUME
• 17% of transfers are terminated early
• Because of the early termination, they account for 25% of actual traffic
• If fully downloaded, would have been 80% of all bytes
    • Overall traffic increase of 375%
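The three percentages fit together under one reading, worked out below as a check rather than as the authors' own derivation. Let $T$ be the actual traffic volume and $X$ the volume the cancelled transfers would have produced if fully downloaded; both symbols are introduced here for illustration.

```latex
\begin{align*}
\text{cancelled (actual)} &= 0.25\,T, \qquad \text{completed} = 0.75\,T \\
\frac{X}{X + 0.75\,T} &= 0.80
  \;\Longrightarrow\; X = \frac{0.80 \times 0.75\,T}{0.20} = 3\,T \\
\text{total if fully downloaded} &= X + 0.75\,T = 3.75\,T
\end{align*}
```

On this reading the hypothetical total is 3.75 times the observed volume, which appears to be the source of the 375% figure.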
CANCELLED CONTENT TYPES
• Most cancelled responses were text
• Most bytes from video/audio/application
% CANCELLED REQUESTS CDF
• OECD users cancel more often than DevReg users
    • Median almost double
ANALYSIS #3 SUMMARY
• Many transactions aborted
    • Previewing video files
    • Content-based caching is effective
• OECD users less patient than DevReg
    • Cheap bandwidth = more sampling?
CONCLUSIONS
• First glimpse at CoDeeN traffic
    • Large-scale, content-focused analysis
    • OECD and developing world
• Many DevReg assumptions are false
    • In fact, strong desire for rich content, and
    • Patient despite slow connections
• Systems implications
    • Chunk caching worth more exploration
    • Larger caches very useful