2008 au tim wikipedia


Post on 07-Apr-2018


  • 8/6/2019 2008 Au Tim Wikipedia

    1/25

    Wikimedia's Squid Network

    August 2008


    Wikimedia Foundation

    Non-profit organisation

    Registered 501(c)(3) charity

    Funded entirely by donations

    Head office in San Francisco

    Operates Wikipedia and ~10 other websites

    21 staff members, including 6 tech staff


    Me

    Tim Starling

    Working for Wikipedia (later Wikimedia) since 2002

    Paid for it since 2006

    Multiple roles: developer and system administrator


    Wikipedia

    Ranked 7-8 on Alexa with a daily reach of 9%

    250 languages (written from scratch in each language, not translated)

    Far bigger than any other encyclopedia


    Wikipedia

    It's a Wiki (from the Hawaiian word for "quick")

    As soon as an edit is performed, it is instantly visible to everyone

    Cache-Control: private, s-maxage=0, max-age=0, must-revalidate


    Operations

    Squid reports cache size of ~500 GB text, ~770 GB images

    Text backend: PHP/MySQL

    Images backend: lighttpd/NFS

    Terabytes of old revisions (we keep them all)

    50,000 req/s at peak


    Operations

    Load divided by geographic DNS

    Tampa

    DB

    Amsterdam

    Seoul


    Tampa, Florida

    Where Jimmy Wales used to live

    Backend

    24 image squids

    25 text squids

    We pay for bandwidth


    Amsterdam, The Netherlands

    14 image squids

    15 text squids

    Peering on AMS-IX

    Hosting and transit donated by Kennisnet


    Seoul, South Korea

    9 image squids

    8 text squids

    Hosting donated by Yahoo! Korea


    Squid network

    Each squid server has two instances of squid running

    root@sq1:~# ps -e --forest | grep squid
     4890 ?        00:00:00 squid
     4893 ?     1-11:03:14  \_ squid
     4919 ?        00:00:00 squid-frontend
     4921 ?     2-01:07:02  \_ squid-frontend

    LVS load balancer -> Squid frontend -> Squid cache


    Squid network

    LVS

        Simple load balancing

    Frontend squid

        Small memory-only cache

        Client ACLs

        Selects a cache squid using CARP (URL hashing)

    Cache squid

        Memory and disk cache

        Forwards to Tampa squid cluster, or to backend
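
    CARP sends each URL to one particular cache squid, so every frontend agrees on where a given object lives. The following is a minimal sketch of that idea in Python, not Squid's implementation: it uses highest-score (rendezvous) hashing with SHA-1 in place of the real CARP hash function, and the cache names and the function name `pick_cache` are hypothetical.

```python
import hashlib


def pick_cache(url, caches):
    """Return the cache whose combined hash with the URL scores highest."""
    def score(cache):
        # Hash the cache name together with the URL (SHA-1 is an assumption,
        # standing in for the CARP hash function).
        digest = hashlib.sha1(f"{cache}|{url}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return max(caches, key=score)


caches = ["cache-a", "cache-b", "cache-c"]  # hypothetical cache squids
url = "http://en.wikipedia.org/wiki/Squid"
chosen = pick_cache(url, caches)
print(chosen)
```

    With this scheme, removing a cache only moves the URLs that were assigned to it; every other URL keeps its cache, which is why hashing of this kind suits a cluster whose members come and go.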


    Caching strategy

    Backend gives a licence to cache

    Cache-control: s-maxage=2678400, must-revalidate, max-age=0

    Squid replaces the header to suppress external caches

    header_access Cache-Control allow tiertwo
    header_replace Cache-Control private, s-maxage=0, max-age=0, must-revalidate

    Backend sends an HTCP CLR message to all subscribed squids when an object changes
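
    The two Cache-Control values on this slide mean very different things to a shared cache. As a rough illustration (a simplified reading of the HTTP caching rules, not Squid's logic), the Python sketch below parses each value and reports how long a shared cache could keep the response. The function names are hypothetical.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.lower()] = arg if arg else None
    return directives


def shared_cache_lifetime(value):
    """Seconds a shared cache may keep the response (simplified)."""
    d = parse_cache_control(value)
    if "private" in d:
        return 0  # private responses are not for shared caches
    # s-maxage takes precedence over max-age for shared caches
    ttl = d.get("s-maxage") or d.get("max-age") or "0"
    return int(ttl)


BACKEND = "s-maxage=2678400, must-revalidate, max-age=0"     # sent to squid
CLIENT = "private, s-maxage=0, max-age=0, must-revalidate"   # sent onwards

print(shared_cache_lifetime(BACKEND) // 86400)  # 31 (days)
print(shared_cache_lifetime(CLIENT))            # 0
```

    So squid may hold a page for 31 days and relies on the HTCP CLR purge to drop it early, while browsers and external proxies are told to revalidate every time.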


    Caching strategy

    Logged-in page views are not cached in squid

    Vary: Accept-Encoding, Cookie

    Images and CSS are cached for everyone, no Vary header


    HTCP CLR

    [Diagram: Backend, Tampa squids, udpmcast.py relays in Tampa, Seoul and Amsterdam, Amsterdam squids, Seoul squids; links labelled Multicast and Unicast]
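
    The udpmcast.py relays in this diagram pass purge datagrams on between unicast and multicast links. The sketch below shows one relay step in Python under simple assumptions: it receives a single UDP datagram and resends it unchanged to a destination that may be a unicast address or a multicast group. It is not the actual udpmcast.py, and the function names are hypothetical.

```python
import socket


def relay_once(in_sock, out_sock, dest, bufsize=65535):
    """Receive one datagram on in_sock and forward it unchanged to dest."""
    data, _sender = in_sock.recvfrom(bufsize)
    out_sock.sendto(data, dest)
    return len(data)


def make_multicast_sender(ttl=1):
    """UDP socket whose multicast packets may cross up to ttl hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```

    A relay daemon would call `relay_once` in a loop, with `in_sock` bound to the port the purges arrive on and `dest` set to the local multicast group (or to the remote relay's unicast address).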


    Australian squid cluster?

    Our ideal Australian squid cluster would:

    Be a single cluster of a few dedicated servers

    Give us root access

    Serve the whole region (Australia, New Zealand, Indonesia)

    Expected traffic ~100 Mbps outbound peak


    Patches

    Make HTCP CLR work when you use Vary headers

    Make header_replace only act on response headers

    UDP logging

    X-Vary-Options

    All open source (in debs/squid)


    UDP logging

    Squid access logs are sent by UDP to an aggregation host in Seoul

    The log stream is processed by various scripts in real time

    Only a 1/1000 sample is logged to disk

    The whole system is openly documented and open source: https://wikitech.leuksman.com/view/Squid_logging
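
    A 1/1000 sample can be taken from a stream without buffering it. The Python sketch below keeps every 1000th line; the slides do not say whether the real scripts count lines or sample at random, so the counting approach and the name `sample_every` are assumptions.

```python
def sample_every(lines, n=1000):
    """Yield every n-th item from an iterable of log lines."""
    for i, line in enumerate(lines, start=1):
        if i % n == 0:
            yield line


# Stand-in for the incoming UDP log stream: 5000 fake request lines.
stream = (f"request {i}" for i in range(1, 5001))
kept = list(sample_every(stream))
print(len(kept))  # 5
print(kept[0])    # request 1000
```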


    X-Vary-Options

    The Vary header is a blunt instrument

    "Vary: Cookie" is not a good way to detect logins when you have JS-only cookies

    "Vary: Accept-Encoding" is not a good way to detect gzip support

    X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=enwikiToken;string-contains=enwikiLoggedOut;string-contains=enwiki_session;string-contains=centralauth_Token;string-contains=centralauth_Session;string-contains=centralauth_LoggedOut
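
    One way to read this header: instead of keying the cache on the full value of each request header, squid keys on a few yes/no facts about it. The Python sketch below illustrates that reading and is not the patch's code. The function names are hypothetical, the example uses a shortened form of the header above, and dropping `;q=` parameters in `list-contains` is an assumption.

```python
def parse_x_vary_options(value):
    """Return {header_name: [(option, argument), ...]}."""
    rules = {}
    for item in value.split(","):
        name, *opts = [p.strip() for p in item.split(";")]
        rules[name.lower()] = [o.partition("=")[::2] for o in opts]
    return rules


def vary_key(rules, request_headers):
    """Build the part of the cache key that depends on request headers."""
    key = []
    for name, opts in rules.items():
        raw = request_headers.get(name, "")
        if not opts:
            key.append((name, raw))  # plain Vary: the whole value matters
            continue
        for opt, arg in opts:
            if opt == "list-contains":
                # Assumption: compare list tokens, ignoring ;q= parameters.
                tokens = [t.split(";")[0].strip() for t in raw.split(",")]
                hit = arg in tokens
            elif opt == "string-contains":
                hit = arg in raw
            else:
                raise ValueError(f"unknown option: {opt}")
            key.append((name, opt, arg, hit))
    return tuple(key)


XVO = ("Accept-Encoding;list-contains=gzip,"
       "Cookie;string-contains=enwikiToken;string-contains=enwiki_session")
rules = parse_x_vary_options(XVO)

a = vary_key(rules, {"accept-encoding": "gzip, deflate", "cookie": "js_pref=1"})
b = vary_key(rules, {"accept-encoding": "deflate, gzip;q=0.8",
                     "cookie": "js_pref=2; other=x"})
c = vary_key(rules, {"accept-encoding": "gzip", "cookie": "enwiki_session=abc"})
print(a == b, a == c)  # True False
```

    Requests `a` and `b` differ in header text but share a cache entry, because both accept gzip and neither carries a login cookie. Request `c` carries a session cookie and gets a different key.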


    X-Vary-Options

    The X-Vary-Options patch gives fine-grained control over how squid constructs Vary records

    The syntax is extensible


    Configurator

    PHP script to generate configuration files for every squid instance

    'apaches' => array(
        'pmtpa' => array(
            'test.wikipedia.org' => 'srv35.pmtpa.wmnet',
            '=wap_domains' => 'yongle.wikimedia.org',
            'ls2.wikimedia.org' => 'srv77.pmtpa.wmnet',
            'whygive.wikimedia.org' => 'isidore.wikimedia.org',
            'blog.wikimedia.org' => 'isidore.wikimedia.org',
            '=thumb_php' => 'rendering.pmtpa.wmnet',
            'apaches.pmtpa.wmnet', # LVS
        ),
    ),


    Configurator

    # srv35.pmtpa.wmnet
    cache_peer 10.0.2.35 parent 80 3130 originserver no-query connect-timeout=5 login=PASS
    cache_peer_access 10.0.2.35 deny wap_domains
    cache_peer_access 10.0.2.35 deny ls2_wikimedia_org
    cache_peer_access 10.0.2.35 deny whygive_wikimedia_org
    cache_peer_access 10.0.2.35 deny blog_wikimedia_org
    cache_peer_access 10.0.2.35 deny thumb_php
    cache_peer_access 10.0.2.35 allow test_wikipedia_org
    cache_peer_access 10.0.2.35 deny all
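
    The cache_peer_access rules above can be derived mechanically from the routing array on the previous slide. The real configurator is a PHP script that is not public, so the Python sketch below is only a guess at a rule that reproduces this one output. The function names are hypothetical, and the default LVS entry is left out.

```python
def acl_name(key):
    """'=name' keys name an ACL directly; hostnames get dots replaced."""
    return key[1:] if key.startswith("=") else key.replace(".", "_")


def peer_access_lines(routes, origin, ip):
    """Deny every ACL routed elsewhere, allow this origin's own, deny the rest."""
    own = [acl_name(k) for k, v in routes.items() if v == origin]
    others = [acl_name(k) for k, v in routes.items() if v != origin]
    lines = [f"cache_peer_access {ip} deny {a}" for a in others]
    lines += [f"cache_peer_access {ip} allow {a}" for a in own]
    lines.append(f"cache_peer_access {ip} deny all")
    return lines


# Routing entries copied from the slide (default LVS entry omitted).
routes = {
    "test.wikipedia.org": "srv35.pmtpa.wmnet",
    "=wap_domains": "yongle.wikimedia.org",
    "ls2.wikimedia.org": "srv77.pmtpa.wmnet",
    "whygive.wikimedia.org": "isidore.wikimedia.org",
    "blog.wikimedia.org": "isidore.wikimedia.org",
    "=thumb_php": "rendering.pmtpa.wmnet",
}

for line in peer_access_lines(routes, "srv35.pmtpa.wmnet", "10.0.2.35"):
    print(line)
```

    Run as written, this prints the seven cache_peer_access lines shown for srv35.pmtpa.wmnet, in the same order.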


    Configurator

    Generates complex ACL rules

    Supports origin servers behind multiple cache clusters

    Can configure things like cache_mem, per-server and per-cluster

    Perl deployment script pushes the configuration files out by ssh and HUPs squid

    Not public (but might be some day)


    Questions?

    Image credits:

    Wikimedia logo, Wikipedia logo: Copyright Wikimedia Foundation, all rights reserved

    Wiki Wiki bus: Andrew Laing, cc-by-sa-2.0

    Squid: from http://www.squid-cache.org/Artwork/ cc-by-nc-sa

    Florida locator: Huebi, cc-by

    Netherlands locator: Quizmodo, PD

    South Korea locator: Vardion, GFDL