Workload Characterization of a Large Systems Conference Web Server (CNSR 2009)


  • 8/6/2019 Workload Characterization of a Large Systems Conference Web Server (CNSR 2009)

    1/16

    Workload Characterization of a Large Systems Conference Web Server

    Aniket Mahanti, Carey Williamson, Leanne Wu

    University of Calgary

    CNSR 2009


    Introduction

    Organizers of large systems conferences often confront numerous challenges.

    Apart from handling the high load of paper submissions and the logistics of hosting the actual conference, organizers also have the job of creating and maintaining a conference Web site.

    Hosting a conference Web site stresses the existing infrastructure of a host institution because of additional overall traffic, different workload characteristics, and specific periods of extreme load.

    Web server workload characterization can provide some valuable insights for future organizers of large systems conferences with a significant Web presence.


    Conference Web Server Characterization

    We present a workload characterization study of the WWW2007 conference Web site, using data collected from both the server side and the client side.

    Our datasets were collected over a 1-year period in the form of server logs (server-side) and Google Analytics reports (client-side).

    We provide a multi-layered workload analysis of the WWW2007 conference Web site data, including traffic profile, usage behaviour, traffic sources, and robot activity.


    Trace Data Overview

    Characteristic                Server Logs       Google Analytics
    Trace Duration                May 22, 2006 to May 22, 2007 (1 year)
    Unique Visitors               129,185           80,554
    One-time Visitors             99,608 (77%)      56% new visits
    Visits                        431,698           143,505
    Avg. Visits per Day           1,180             392
    Pageviews                     1.57 million      0.54 million
    Avg. Pageviews per Visit      3.18              3.77
    Total Traffic Volume          215 GB            -
    Avg. Traffic Volume per Day   600 MB            -
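The per-day averages in the table follow from the totals and the trace length. A minimal Python sketch of that arithmetic is given here, assuming a 365-day trace and 1 GB = 1000 MB; the names `TRACE_DAYS` and `per_day` are illustrative and do not come from the study.

```python
# Totals as reported in the Trace Data Overview table.
TRACE_DAYS = 365  # assumption: May 22, 2006 to May 22, 2007 counted as 365 days

server_visits = 431_698
analytics_visits = 143_505
server_traffic_gb = 215


def per_day(total, days=TRACE_DAYS):
    """Return the average per day of a total accumulated over the trace."""
    return total / days


print(round(per_day(server_visits)))             # server-side visits per day
print(round(per_day(analytics_visits)))          # client-side visits per day
print(round(per_day(server_traffic_gb * 1000)))  # MB per day, assuming 1 GB = 1000 MB
```

The results land close to the rounded figures in the table (1,180 visits, 392 visits, and 600 MB per day); small differences are expected because the slide reports rounded values.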


    Server-side versus Client-side Data Collection

    Hits
      Server Logs: Record hits to every file on the server, providing a more complete record of resource access patterns.
      Google Analytics: Tracks pages (e.g., PHP, HTML, ASP), but does not record hits to the individual resources (e.g., images, documents) that are linked or embedded in the pages.

    Visitors
      Server Logs: A unique visitor is identified by a unique IP address in the log.
      Google Analytics: Google Analytics and other page tagging services track visitors and their visits by placing cookies based on session IDs in the visitors' browsers.

    Visits
      Server Logs: Web proxies and dynamic IP addresses make it difficult to track unique visits to the site.
      Google Analytics: Visits from search engine spiders and robots are ignored by Google Analytics.

    Traffic Volume
      Server Logs: Records the size of each request.
      Google Analytics: Cannot measure traffic volume.

    Page Errors
      Server Logs: Provide details about server and client errors.
      Google Analytics: Does not record page errors.
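The server-side definitions above can be sketched in code. The following Python example counts unique visitors as distinct IP addresses and groups each address's requests into visits. The Common Log Format input, the 30-minute inactivity timeout, and all names are assumptions made for illustration; the slides do not state how visits were delimited.

```python
import re
from datetime import datetime, timedelta

# Assumption: logs are in Common Log Format; only the IP and timestamp are used.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
VISIT_TIMEOUT = timedelta(minutes=30)  # assumption, not taken from the study


def count_visitors_and_visits(lines):
    """Return (unique_visitors, visits) for log lines in chronological order."""
    last_seen = {}  # IP address -> timestamp of its most recent request
    visits = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip malformed lines
        ip, stamp = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        previous = last_seen.get(ip)
        # A new visit starts on the first request from an IP or after a long gap.
        if previous is None or when - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[ip] = when
    return len(last_seen), visits


# Synthetic log lines, not data from the study.
sample = [
    '10.0.0.1 - - [20/Nov/2006:09:00:00 -0700] "GET / HTTP/1.1" 200 5120',
    '10.0.0.1 - - [20/Nov/2006:09:05:00 -0700] "GET /cfp.php HTTP/1.1" 200 2048',
    '10.0.0.2 - - [20/Nov/2006:09:10:00 -0700] "GET / HTTP/1.1" 200 5120',
    '10.0.0.1 - - [20/Nov/2006:11:00:00 -0700] "GET / HTTP/1.1" 200 5120',
]
print(count_visitors_and_visits(sample))  # (2, 3)
```

Because a proxy can hide many users behind one address and a dynamic address can split one user across several, counts produced this way are only an approximation, which is the limitation the comparison notes for server logs.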


    Traffic Profile

    The traffic volume was low for the first few months, until the first conference newsletter was sent on August 28.

    The most daily visits during the 2006 calendar year were observed on November 20 (paper submission deadline).

    Between April 5 and 12 the Web site was overhauled, which included addition of the online proceedings.

    The site had its highest usage during the conference itself.

    [Figure: daily counts of Visits and Unique Visitors, Jun/06 to Jun/07 (y-axis: Count, 0 to 6000); Aug 28 and Nov 20 are marked.]


    Visitor Trends

    Figure (left) shows the weekly cycle typical of any workplace.
      On average, Monday is the busiest day of the week, while activity is almost constant over the rest of the work week.

    Figure (right) does not show a strong diurnal pattern because the Web site had a global reach, with a good proportion of visitors located outside North America.
      Normal work hours account for almost 40% of the total visits.
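The weekly and hourly breakdowns described above can be computed from visit start times. The Python sketch below is illustrative only: the 9:00 to 17:00 definition of work hours, the function names, and the sample timestamps are assumptions, since the slides do not define "normal work hours" or the time zone used.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_percentages(timestamps):
    """Return {day name: percentage of all timestamps falling on that day}."""
    counts = Counter(ts.weekday() for ts in timestamps)  # Monday is 0
    total = len(timestamps)
    if total == 0:
        return {day: 0.0 for day in DAYS}
    return {day: 100.0 * counts[i] / total for i, day in enumerate(DAYS)}


def work_hours_share(timestamps, start_hour=9, end_hour=17):
    """Return the percentage of timestamps with start_hour <= hour < end_hour."""
    if not timestamps:
        return 0.0
    inside = sum(1 for ts in timestamps if start_hour <= ts.hour < end_hour)
    return 100.0 * inside / len(timestamps)


# Synthetic visit start times, not data from the study.
visit_starts = [
    datetime(2006, 11, 20, 10, 15),  # Monday
    datetime(2006, 11, 20, 22, 40),  # Monday
    datetime(2006, 11, 21, 14, 5),   # Tuesday
    datetime(2006, 11, 25, 3, 30),   # Saturday
]
by_day = weekday_percentages(visit_starts)
print(by_day["Mon"], by_day["Tue"], by_day["Sat"])  # 50.0 25.0 25.0
print(work_hours_share(visit_starts))               # 50.0
```

For a site with a global audience, the hour is that of the server's clock, so a flat hourly profile is what one would expect when visitors are spread across many time zones.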

    [Figure (left): percentage of total Pageviews and Visits by day of week (Sun to Sat). Figure (right): percentage of total Pageviews and Visits by hour of day.]


    Visit Duration

    The average visit duration varied between 2 and 4 minutes, except for the last three months leading up to the conference, when it varied between 4 and 6 minutes.

    Most of the spikes can be attributed to organizing committee members testing the site.

    Other peaks in the graph coincide with important dates of the conference, such as the submission deadline and notification.

    [Figure: average visit duration in minutes (0 to 8), Jun/06 to Jun/07.]


    Page Errors

    No server-side errors were noted, indicating that the Web site functioned properly with few outages during the 1-year period.

    Most of the errors were due to two missing files, namely favicon.ico and robots.txt.
      The favicon.ico file was added on April 14, 2007, while the robots.txt file was never added.

    The error rate was relatively low until the first week of April 2007, when a spike occurred.
      As part of conference proceedings production, the entire site was crawled to check for broken links. The links were fixed later in the week.
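A tally of errors per requested path is one way to surface missing files such as favicon.ico and robots.txt. The sketch below assumes Common Log Format lines and splits errors into client (4xx) and server (5xx) classes; the regular expression, function name, and sample lines are illustrative and are not the authors' tooling.

```python
import re
from collections import Counter

# Assumption: Common Log Format; captures the request path and the status code.
REQUEST = re.compile(r'"\S+ (\S+)[^"]*" (\d{3})\b')


def error_counts(lines):
    """Return (client_errors, server_errors), each a Counter keyed by path."""
    client, server = Counter(), Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match is None:
            continue  # skip lines without a parsable request and status
        path, status = match.group(1), int(match.group(2))
        if 400 <= status < 500:
            client[path] += 1
        elif status >= 500:
            server[path] += 1
    return client, server


# Synthetic log lines, not data from the study.
sample = [
    '10.0.0.1 - - [02/Apr/2007:09:00:00 -0600] "GET /favicon.ico HTTP/1.1" 404 209',
    '10.0.0.3 - - [02/Apr/2007:09:00:05 -0600] "GET /robots.txt HTTP/1.1" 404 208',
    '10.0.0.1 - - [02/Apr/2007:09:00:07 -0600] "GET /index.php HTTP/1.1" 200 5120',
    '10.0.0.4 - - [02/Apr/2007:09:01:00 -0600] "GET /favicon.ico HTTP/1.1" 404 209',
]
client, server = error_counts(sample)
print(client.most_common())  # [('/favicon.ico', 2), ('/robots.txt', 1)]
print(sum(server.values()))  # 0
```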

    [Figure: page errors over time.]


    Traffic Volume

    The traffic volume almost quadrupled during April and May of 2007 compared to the preceding months.

    During the first week of April the entire Web site was crawled, and the online proceedings were added, containing PDF files of all accepted papers and posters.

    May 7, 2007 was the busiest day, when 4.7 GB of data was transferred by the server to its visitors.
      This increase in traffic volume was mostly due to visitors accessing PDF files of papers from the online proceedings and site updates.

    Approximately 55% of the total traffic volume was transferred during the last 60 days.
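The share of traffic carried by the final days of a trace is a simple ratio over per-day totals. The Python sketch below shows the calculation on synthetic numbers chosen only to illustrate a back-loaded workload; the function name and the data are assumptions, not the measured trace.

```python
def share_of_last_days(daily_mb, days=60):
    """Return the percentage of total volume carried by the final `days` entries."""
    total = sum(daily_mb)
    if total == 0 or days <= 0:
        return 0.0
    return 100.0 * sum(daily_mb[-days:]) / total


# Synthetic example, not the measured trace: 305 quiet days, then 60 busy days.
daily_mb = [300.0] * 305 + [2000.0] * 60
print(round(share_of_last_days(daily_mb), 1))
```

A share well above 60/365 (about 16%) indicates that load is concentrated near the end of the trace rather than spread evenly.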

    [Figure: daily traffic volume in MB (0 to 5000) over time, Jun/06 to Jun/07.]


    Geographic Location

    G7 countries account for the majority (60%) of visitors and traffic volume, with the U.S. alone accounting for approximately 40%.

    About 75% of the conference participants were from G7 countries, including 43% from the U.S. Participants from Canada accounted for 15% of the total.

    The U.K., China, and Canada account for a higher percentage of traffic than of unique visitors.

    [Figure: percentage of total Visitors and Traffic Volume by country.]


    Robot Visits

    Characteristic                   Maintenance   External
    Total Hits                       601,906       283,762
    Unique Visitors                  7             4,263
    Total Visits                     400           39,666
    Avg. Visit Duration (min:sec)    41:13         9:45
    Avg. Traffic Vol. per Visit      19.6 MB       189.8 KB
    Avg. Pageviews per Visit         404           4

    Major updates to the conference Web site were done using Maintenance robots.
      Maintenance robots accounted for over half of the total traffic volume transferred by the server, and their average visit duration was five times that of External robots.

    Among the External robots, search engine spiders accounted for over 75% of the visits.
      Almost 45% of the search engine spider visits were due to Yahoo!, 10% due to Microsoft, and 7% due to Google.

    Other robot visits came from educational institutions, Web filtering companies, and anti-virus companies.


    Contributions

    We believe that our experiences with WWW2007 and our characterization results will be useful for future conference organizers.

    With knowledge of the frequency, duration, and source of client visits, organizers can perform appropriate search engine optimizations in order to promote the site more efficiently.

    Site administrators could improve the user experience on the site by designing the site with a more complete knowledge of system and network properties of potential visitors.

    We showed that robot loads should not be underestimated, and that an understanding of robot visit patterns may allow organizers to prepare security procedures to safeguard their Web site.


    Summary

    Using server-side and client-side measurement, we analyzed usage behaviour, client errors, bandwidth, and robot activity of the site.

    The Web site traffic was non-stationary, with much of the Web site activity in the month just prior to the conference.

    Visitor activity showed no strong diurnal pattern, reflecting the international usage of the site.

    Almost half of all visits came via search engine queries.

    Robot visits were also prevalent on the site.


    THANK YOU.
