IBM Research, Network Server Systems Software
© 2007 IBM Corporation
Instant Messaging Traffic Analysis
Zhen Xiao, Lei Guo, and John Tracey
The 27th International Conference on Distributed Computing Systems (ICDCS'07), Toronto, Canada, June 2007
Instant messaging
Quick response
User presence service
Multitasking
Private chat
Enterprise cooperation
Peak online users
– AIM: 53 M users
– MSN: 29 M users
– Jabber: 13.5 M users
– SameTime: 15 M users
– Skype: 7 M users
– QQ: 20 M users
Instant Messaging Traffic Analysis – Goals
Understanding instant messaging traffic characteristics
– Other workloads (Web, Database, etc.) well understood
– Few studies of instant messaging workloads
Instant messaging is a key application for SIP
– Workload characterization essential to realistic workload generation
– Workload generation essential to benchmarking
Challenge: sniffing in the middle of the network is hard
– IM formats and protocols are proprietary
– Developing an IM sniffer poses distinct challenges
Existing Work on Instant Messaging Analysis
Social behaviors of IM users [CSCW2000, CSCW2002]
– Based on surveys and interviews
– Small sample sizes, subjective descriptions
Specialized instant messengers [CSCS2002 Hubbub]
– Relatively large scale (437 users)
– Incompatible with popular IM systems
– Also focuses on social behaviors
Security of IM networks [WORM2005]
– Propagation of viruses and worms
Very little focus on the characteristics of instant messaging traffic
Outline
Introduction
Related work
Background on AIM and MSN
Instant Messaging Traffic Analysis
– Overview
– Message level analysis
– Online session analysis
– Social network analysis
Conclusion
AOL Instant Messaging (AIM)
Client interactions: authentication, redirection, user-to-user chat, multi-user chat, P2P communication
Components:
– Authentication server
– BOS servers
– Chat room server
– P2P voice/video chat, file transfer
– Other services: email server, buddy icon server, …
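Classic AIM clients reach these servers over OSCAR, in which every message is wrapped in a 6-byte FLAP header: a `*` start byte, a channel number, a sequence number, and a payload length, all big-endian. The Python sketch below shows how a parser could cut a reassembled TCP stream into FLAP frames. It is a minimal illustration based on publicly reverse-engineered protocol notes, not the study's parser; the names `iter_flap_frames`, `FlapFrame`, and `FLAP_CHANNELS` are hypothetical.

```python
import struct
from typing import Iterator, NamedTuple

FLAP_START = 0x2A                     # every FLAP frame starts with '*'
FLAP_HEADER = struct.Struct(">BBHH")  # start, channel, sequence number, payload length

# Channel meanings as commonly documented for OSCAR (assumption).
FLAP_CHANNELS = {
    1: "new connection",
    2: "SNAC data",
    3: "FLAP-level error",
    4: "close connection",
    5: "keep alive",
}


class FlapFrame(NamedTuple):
    channel: int
    sequence: int
    payload: bytes


def iter_flap_frames(stream: bytes) -> Iterator[FlapFrame]:
    """Yield complete FLAP frames from a reassembled TCP byte stream.

    Stops at a trailing partial frame (a live parser would keep those bytes
    until more data arrives) and raises ValueError if framing is lost.
    """
    offset = 0
    while offset + FLAP_HEADER.size <= len(stream):
        start, channel, sequence, length = FLAP_HEADER.unpack_from(stream, offset)
        if start != FLAP_START:
            raise ValueError(f"lost FLAP framing at offset {offset}")
        body_start = offset + FLAP_HEADER.size
        end = body_start + length
        if end > len(stream):
            break  # payload not fully captured yet
        yield FlapFrame(channel, sequence, stream[body_start:end])
        offset = end
```

Chat, presence, and buddy-icon traffic would all arrive as SNAC payloads on channel 2, so a full parser would continue by decoding the SNAC header inside each payload.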
MSN Messenger
Components:
– Dispatch server
– Notification servers
– Switchboard server
– MSN Passport server
– P2P voice/video chat, file transfer
– Other services: email server, …
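MSNP is a line-oriented text protocol. As commonly documented for the MSNP8 era, a client first contacts a dispatch server, which replies with an `XFR <trid> NS <host:port> ...` line redirecting it to a notification server; when the user opens a chat, the notification server replies with `XFR <trid> SB <host:port> ...` pointing to a switchboard server. The sketch below parses such redirect lines. The command layout is an assumption drawn from community protocol notes, and `parse_xfr` and `Redirect` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    server_type: str  # "NS" = notification server, "SB" = switchboard server
    host: str
    port: int


def parse_xfr(line: str) -> Optional[Redirect]:
    """Parse an MSNP 'XFR <trid> <NS|SB> <host:port> ...' line.

    Returns None for any other command or a malformed address.
    """
    tokens = line.strip().split()
    if len(tokens) < 4 or tokens[0] != "XFR" or tokens[2] not in ("NS", "SB"):
        return None
    host, sep, port = tokens[3].rpartition(":")
    if not sep or not port.isdigit():
        return None
    return Redirect(tokens[2], host, int(port))
```

For example, `parse_xfr("XFR 3 SB 192.0.2.30:1863 CKI 1234.5678\r\n")` yields a switchboard redirect to port 1863 on 192.0.2.30 (a documentation address), while a presence line such as `CHG 4 NLN 0` yields `None`.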
Instant Messaging Traffic Analysis – Approach
Trace collection
– Analyze traffic between large enterprise (~4K users) Intranet and the Internet
– Comprehensive analysis of AIM and MSN
– Cursory analysis for Yahoo, GTalk, SameTime, IRC chat
– Logs an anonymized version of the traffic
– About one month duration: 2006-10-14 to 2006-11-06
– More than 20K conversations
[Figure: the sniffer sits between the enterprise network and the Internet]
Instant Messaging Sniffer Architecture
Pipeline: network interface → OS kernel → pcap library → online packet reconstructor → AIM packet parser / MSN packet parser → online anonymization → dump to disk
– Data flow: Ethernet packets → IP packets → reconstructed IM packets
– Anonymization: MD5 hash with random seed
Protocols
– MSN: MSNP
– AIM: classic OSCAR is parsed; Triton (new, Aug 2006) is not (N/A), about 10% of AIM traffic
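The slide specifies MD5 hashing with a random seed for online anonymization. A minimal Python sketch of that idea follows, assuming the seed is simply prepended to the normalized identifier before hashing; the class name, the 16-byte seed length, and the case-folding rule are assumptions, not details from the study.

```python
import hashlib
import os
from typing import Optional


class Anonymizer:
    """Map user identifiers to MD5 digests salted with a per-trace random seed."""

    def __init__(self, seed: Optional[bytes] = None) -> None:
        # Keep the seed in memory only (never logged): without it, digests
        # cannot be recomputed from guessed screen names, yet one user still
        # maps to the same digest throughout the trace.
        self._seed = seed if seed is not None else os.urandom(16)

    def anonymize(self, user_id: str) -> str:
        # Case-folding is an assumption: AIM screen names and MSN
        # (e-mail style) accounts are treated as case-insensitive.
        normalized = user_id.strip().lower()
        return hashlib.md5(self._seed + normalized.encode("utf-8")).hexdigest()
```

A keyed construction such as HMAC-SHA256 (`hmac.new(seed, msg, hashlib.sha256)`) would be the more conservative modern choice; MD5 appears here only because the slide names it.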
Overview of IM traffic
[Figure: traffic volume (MB), inbound vs. outbound, for AIM, MSN, Yahoo, and Gtalk]
For most IM systems, the traffic volume a client receives from IM servers is much greater than the volume it sends.
[Figure: total number of server IPs collected for AIM, MSN, Yahoo, and Gtalk]
The number of IM servers is very large.
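Both charts can be derived from per-packet records by deciding, for each packet, which endpoint is inside the enterprise. The Python sketch below assumes a hypothetical record layout `(protocol, src_ip, dst_ip, payload_bytes)` and a single internal prefix; it is an illustration, not the study's tooling.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One captured IM packet: (protocol, src_ip, dst_ip, payload_bytes).
Packet = Tuple[str, str, str, int]


def volume_by_direction(
    packets: Iterable[Packet], internal_net: str
) -> Tuple[Dict[str, Dict[str, int]], Dict[str, int]]:
    """Sum bytes per protocol and direction, and count distinct server IPs.

    Assumes every packet has exactly one endpoint inside `internal_net`
    (the enterprise) and the other on the Internet (an IM server).
    """
    net = ipaddress.ip_network(internal_net)
    volume: Dict[str, Dict[str, int]] = defaultdict(lambda: {"inbound": 0, "outbound": 0})
    servers: Dict[str, set] = defaultdict(set)
    for proto, src, dst, size in packets:
        if ipaddress.ip_address(src) in net:  # client -> server
            volume[proto]["outbound"] += size
            servers[proto].add(dst)
        else:  # server -> client
            volume[proto]["inbound"] += size
            servers[proto].add(src)
    return dict(volume), {proto: len(ips) for proto, ips in servers.items()}
```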
IM traffic rate
[Figures: hourly traffic rate of AIM; hourly traffic rate of MSN]
Each spike is caused by a very small number of TCP connections (typically one or two) carrying voice/video chat or file transfers.
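Attributing a spike to one or two connections amounts to bucketing bytes by hour and by TCP connection, then checking how much of each hour the busiest connections carry. The Python sketch below assumes hypothetical per-packet records `(unix_timestamp, connection_id, bytes)`; the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One captured packet: (unix_timestamp_seconds, tcp_connection_id, bytes).
Record = Tuple[float, str, int]


def hourly_rates(records: Iterable[Record]) -> Dict[int, Dict[str, int]]:
    """Bytes per hour (hours since the epoch), broken down by TCP connection."""
    buckets: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for ts, conn, size in records:
        buckets[int(ts // 3600)][conn] += size
    return {hour: dict(per_conn) for hour, per_conn in buckets.items()}


def top_share(per_conn: Dict[str, int], k: int = 2) -> float:
    """Fraction of one hour's bytes carried by its k busiest connections."""
    total = sum(per_conn.values())
    if total == 0:
        return 0.0
    return sum(sorted(per_conn.values(), reverse=True)[:k]) / total
```

An hour whose `top_share(..., k=2)` is close to 1.0 matches the pattern described on the slide: the spike is carried by one or two connections.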
Breakdown of IM message types
Chat msgs: text msgs a user types
Hint msgs: generated by IM client software
Presence msgs: status of buddies
Icon/binary msgs: transfer user pictures; carry voice/video chat and file transfers when two users cannot communicate directly
Service control msgs: log in, log out, server redirection, application-level keep-alive, etc.
Other: all other msgs
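For MSN, a classifier for these categories can key on the MSNP command verb and, for `MSG` commands, on the payload's `Content-Type` header. The mapping in the Python sketch below is an assumption based on commonly documented MSNP commands (`NLN`/`FLN`/`ILN` for buddy status, `PNG`/`QNG` for keep-alive, `text/x-msmsgscontrol` for typing notifications, `application/x-msnmsgrp2p` for P2P data); it is not the study's rule set, and all names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical mapping of MSNP command verbs onto the slide's categories.
PRESENCE = {"NLN", "FLN", "ILN", "CHG"}
SERVICE_CONTROL = {"VER", "CVR", "USR", "XFR", "OUT", "PNG", "QNG", "CHL", "QRY"}
# Hypothetical mapping of MSG payload content types.
MSG_TYPES = {
    "text/plain": "chat",
    "text/x-msmsgscontrol": "hint",             # typing notifications
    "application/x-msnmsgrp2p": "icon/binary",  # user pictures, voice/video, files
}


def content_type(payload: str) -> Optional[str]:
    """Return the Content-Type of an MSG payload's MIME-style header, if any."""
    header, _, _ = payload.partition("\r\n\r\n")
    for line in header.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            return value.split(";")[0].strip().lower()
    return None


def classify(command: str, payload: str = "") -> str:
    """Assign one MSNP command line (plus MSG payload) to a category."""
    verb = command.split(" ", 1)[0]
    if verb in PRESENCE:
        return "presence"
    if verb in SERVICE_CONTROL:
        return "service control"
    if verb == "MSG":
        return MSG_TYPES.get(content_type(payload) or "", "other")
    return "other"


def breakdown(messages: Iterable[Tuple[str, str]]) -> Counter:
    """Count (command line, payload) pairs per category."""
    return Counter(classify(cmd, payload) for cmd, payload in messages)
```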
Message level analysis of IM traffic
Number of messages: chat < hint < presence
MSN has more binary messages, for user icons and voice/video chat
[Figures: message type breakdown for AIM and MSN]
During overload, instant messaging servers can prioritize traffic and drop lower-priority traffic to preserve the instantaneous nature of the communication
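One way a server could apply this idea is a bounded queue with one FIFO per priority level that, when full, evicts the least important queued message (or rejects the incoming one if nothing less important is queued). The Python sketch below is hypothetical: the slides do not define a priority order, so the `PRIORITY` table is an assumption.

```python
from collections import deque
from typing import Deque, List, Optional

# Hypothetical priorities (0 = most important); the slides do not specify an order.
PRIORITY = {"service control": 0, "chat": 1, "hint": 2, "presence": 3, "icon/binary": 4, "other": 4}


class OverloadQueue:
    """Bounded message queue that sheds the least important traffic first."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.levels: List[Deque[str]] = [deque() for _ in range(max(PRIORITY.values()) + 1)]
        self.size = 0
        self.dropped = 0

    def push(self, category: str, msg: str) -> None:
        level = PRIORITY[category]
        if self.size >= self.capacity:
            # The queue is full, so at least one level is non-empty.
            worst = max(i for i, q in enumerate(self.levels) if q)
            if worst <= level:
                self.dropped += 1  # nothing less important to evict: drop the new message
                return
            self.levels[worst].pop()  # evict the newest message of the least important level
            self.size -= 1
            self.dropped += 1
        self.levels[level].append(msg)
        self.size += 1

    def pop(self) -> Optional[str]:
        """Return the oldest message of the most important non-empty level."""
        for q in self.levels:
            if q:
                self.size -= 1
                return q.popleft()
        return None
```

Strict priority can starve presence traffic under sustained load; a weighted scheduler would be a gentler alternative.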
Size of chat messages
AIM: messages are in HTML format (text not extracted online)
MSN: the formatting is described in the message header, which is easy to remove
MSN: 90% of messages are smaller than 50 bytes
[Figures: CDF (semi-log scale) and CCDF (log-log scale) of chat message size; annotation: < 50 bytes]
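For MSN, measuring chat size means dropping the MIME-style header block that precedes the message text and then reading the empirical CDF at 50 bytes. A minimal Python sketch, assuming the header ends at the first blank line (`\r\n\r\n`) and that size is counted in bytes of the body; function names are hypothetical.

```python
from typing import Iterable


def chat_body(payload: bytes) -> bytes:
    """Strip the MIME-style header block that precedes an MSN chat message text."""
    _, sep, body = payload.partition(b"\r\n\r\n")
    return body if sep else payload


def fraction_below(sizes: Iterable[int], threshold: int) -> float:
    """Empirical P(size < threshold); 0.0 for an empty sample."""
    sample = list(sizes)
    if not sample:
        return 0.0
    return sum(1 for s in sample if s < threshold) / len(sample)
```

Applied to every captured chat message, `fraction_below([len(chat_body(p)) for p in payloads], 50)` corresponds to the "90% smaller than 50 bytes" reading of the CDF.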
Online activity of AIM users
[Figures: number of online users; number of simultaneous chat conversations; annotations: 120 users, 12 chat conversations]
Clear diurnal and weekly patterns; peak time about 2:00 PM
Number of chat conversations << number of online users
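Both curves count overlapping intervals: a user is online from login to logout, and a conversation is active from its first to its last message (that conversation boundary is an assumption). A sweep over sorted start and end events gives the count at every instant. The Python sketch below is a generic illustration with hypothetical function names.

```python
from typing import Iterable, List, Tuple


def concurrency(intervals: Iterable[Tuple[float, float]]) -> List[Tuple[float, int]]:
    """Sweep (start, end) intervals into a step function of simultaneous intervals.

    Each (time, count) pair holds from `time` until the next pair. Works for
    online sessions (login, logout) and chat conversations alike.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # chronological order
    steps: List[Tuple[float, int]] = []
    current = 0
    for t, delta in events:
        current += delta
        if steps and steps[-1][0] == t:
            steps[-1] = (t, current)  # several events at one instant: keep the final count
        else:
            steps.append((t, current))
    return steps


def peak(steps: List[Tuple[float, int]]) -> Tuple[float, int]:
    """Earliest (time, count) pair with the largest count."""
    return max(steps, key=lambda step: step[1])
```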
Online activity of MSN users
[Figures: number of online users; number of simultaneous chat conversations; annotations: 90 users, 14 chat conversations]
Clear diurnal and weekly patterns; peak time about 2:00 PM (lunch break)
Number of chat conversations << number of online users
Online duration of IM user sessions
[Figures: CDF and CCDF of user session online duration]
Two-mode distribution
10 hours is the divide between short and long online durations
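The CCDF on this slide is the fraction of sessions longer than each observed duration, and the 10-hour divide splits the sample into its two modes. A minimal Python sketch, assuming durations in seconds; the names are hypothetical.

```python
from typing import Iterable, List, Tuple

TEN_HOURS = 10 * 3600  # divide between short and long sessions (seconds), per the slide


def ccdf(values: Iterable[float]) -> List[Tuple[float, float]]:
    """Empirical CCDF: (x, P(X > x)) for each distinct observed value x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i + 1 < n and xs[i + 1] == x:
            continue  # emit each distinct value once, at its last occurrence
        points.append((x, (n - i - 1) / n))
    return points


def split_sessions(
    durations: Iterable[float], divide: float = TEN_HOURS
) -> Tuple[List[float], List[float]]:
    """Separate the two modes: sessions shorter than `divide` and the rest."""
    values = list(durations)
    return [d for d in values if d < divide], [d for d in values if d >= divide]
```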
Online activity of AIM users
Login events: peak time about 9:00 AM
Logout events: peak time about 5:00 PM
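Login and logout peaks come from a histogram of event timestamps by local hour of day. The Python sketch below assumes Unix timestamps and a fixed UTC offset for the site; the slides do not state the time zone, and a fixed offset ignores daylight-saving changes during the trace. `peak_hour` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable


def peak_hour(event_times: Iterable[float], utc_offset_hours: float = 0) -> int:
    """Local hour of day (0-23) with the most events (login or logout timestamps)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    counts = Counter(datetime.fromtimestamp(t, tz).hour for t in event_times)
    if not counts:
        raise ValueError("no events")
    return counts.most_common(1)[0][0]
```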
Online activity of MSN users
Login events: peak time about 9:00 AM
Logout events: peak time about 5:00 PM
IM social network: number of users contacted
[Figures: rank plot (log-log scale) and CCDF (Weibull scale) of the number of users contacted]
Disclaimer: the contact network of an IM system cannot be rebuilt from only a subset of its users
MSN: fits a Weibull distribution; AIM: a rougher fit
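On "Weibull scale" axes a Weibull distribution plots as a straight line: if P(X > x) = exp(-(x/lam)^k), then ln(-ln P(X > x)) = k ln x - k ln lam, so the slope is the shape k. (A power law would instead be straight on log-log CCDF axes.) The Python sketch below fits that line by ordinary least squares on the empirical CCDF. It is a simple graphical estimator chosen for illustration, not maximum likelihood and not necessarily the authors' method; `weibull_fit` is a hypothetical name.

```python
import math
from typing import Iterable, Tuple


def weibull_fit(values: Iterable[float]) -> Tuple[float, float]:
    """Estimate Weibull shape k and scale lam from the empirical CCDF.

    Regresses ln(-ln P(X > x)) on ln x; points with x <= 0 or a CCDF of 0
    (the largest value) are skipped because the transform is undefined there.
    """
    xs = sorted(values)
    n = len(xs)
    pts = []
    for i, x in enumerate(xs):
        if i + 1 < n and xs[i + 1] == x:
            continue  # one point per distinct value
        p = (n - i - 1) / n  # empirical P(X > x), always < 1
        if x > 0 and p > 0:
            pts.append((math.log(x), math.log(-math.log(p))))
    if len(pts) < 2:
        raise ValueError("need at least two usable CCDF points")
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    sxx = sum((a - mx) ** 2 for a, _ in pts)
    sxy = sum((a - mx) * (b - my) for a, b in pts)
    k = sxy / sxx
    lam = math.exp(mx - my / k)  # intercept my - k*mx equals -k*ln(lam)
    return k, lam
```

A good fit shows up as small residuals around the regression line; a visibly curved cloud (as the slide suggests for AIM) means the Weibull model is only approximate.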
Number of buddies an IM user chats with
A user contacts only a small portion of the buddies in its contact list
MSN users are more active?
– Not certain: AIM Triton users are not counted
MSN: a user chats with 5.5 buddies (about 25%) on average
AIM: a user chats with 1.9 buddies (about 7%) on average
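Both numbers per system are averages over users: how many buddies each user chatted with, and what fraction of the contact list that is. The Python sketch below assumes per-user sets of chat partners and buddy-list entries (hypothetical inputs) and averages per-user fractions; the slide does not say whether its percentages are per-user averages or ratios of totals, and the two can differ.

```python
from typing import Dict, Set, Tuple


def contact_stats(
    contacted: Dict[str, Set[str]], buddy_lists: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Return (mean buddies chatted with, mean fraction of the contact list).

    Only users with a non-empty buddy list are counted, and only contacts that
    appear on the user's own list count as buddies (both are assumptions).
    """
    users = [u for u, buddies in buddy_lists.items() if buddies]
    if not users:
        return 0.0, 0.0
    counts = [len(contacted.get(u, set()) & buddy_lists[u]) for u in users]
    fractions = [c / len(buddy_lists[u]) for c, u in zip(counts, users)]
    return sum(counts) / len(users), sum(fractions) / len(users)
```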
Concluding remarks
Workload characterization essential to benchmarking
– The Design, Implementation, and Validation of an Instant Messaging Workload Generator (submitted for publication)
Message level analysis
– Chat messages constitute only a small percentage of the total IM traffic
Social network
– Does not follow a power law distribution
Future work
– Measurements from other user populations (e.g., universities)
– Server-side workload (a global map of the IM user social network)