AARNet Out of Hours Support
QUESTnet Workshop
QUT, 5 November 2008
Don.Robertson@aarnet.edu.au
Mahmoud.AboElwafa@aarnet.edu.au
AARNet Copyright 2006

Core Operational Functions
• Monitoring
• Measurement
• Contact & Communication
• Logging / Record Keeping
• Fault Diagnosis
• Co-ordination
• Restoration

Things go wrong… IT happens!
• Power failures
• Fibre optic cables get dug up, broken (Dial-after-you-Dig)
• Equipment misbehaves, fails
• Operator errors – misconfigurations
• Denial of Service and Distributed Denial of Service attacks
• Maintenance without notice
• Fires in computer rooms
• Floods in computer rooms
• Floods in the middle of the desert
• Lightning strikes
• Rodents chew through cables
• Ships drag anchor, trains go off the rails (and onto fibre optic cables)
• The list goes on…

Maximise up-time via design of the Network
• Design philosophy is to have diversity and redundancy
• Ideally no single points of failure
• Dual PoPs in Major Capital Cities
• Diverse transmission paths
• Dual customer connections
• I know:
  – No diverse path north of Townsville
  – Single PoP in NT & Tasmania
  – No option for dual connections for many customers
• You can solve all problems with the application of cash

AARNet’s International Network

AARNet’s national network

© 2008, AARNet Pty Ltd Private and Confidential

Monitoring - Nagios
• Nagios – a ‘free’ software package
• Runs continuously on servers in Sydney and Perth
• Constantly polls / probes nearly 1,000 network elements (hosts & services)
• Raises alarm in the event of a fault
  – SMS messages sent to key operations staff
    • Format under review – include customer contact
  – E-mail to NOC mailing list
  – Web based on-line, real-time displays & reports
• Measures and records availability
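
The relationship between polling, alarms and recorded availability can be sketched roughly as follows. This is not Nagios code or its report format: `CheckResult`, `availability_percent` and `elements_in_alarm` are hypothetical names, and availability is assumed here to mean the share of polls that succeeded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """One poll of a host or service (hypothetical record, not a Nagios type)."""
    element: str  # name of the host or service that was probed
    ok: bool      # True if the probe succeeded


def availability_percent(results: list[CheckResult], element: str) -> float:
    """Return the percentage of successful polls for one element.

    Assumption: availability is simply successful polls / total polls.
    """
    polls = [r for r in results if r.element == element]
    if not polls:
        raise ValueError(f"no polls recorded for {element!r}")
    return 100.0 * sum(r.ok for r in polls) / len(polls)


def elements_in_alarm(results: list[CheckResult]) -> list[str]:
    """Return the elements whose most recent poll failed, sorted by name."""
    latest: dict[str, bool] = {}
    for r in results:  # results are assumed to be in time order
        latest[r.element] = r.ok
    return sorted(name for name, ok in latest.items() if not ok)
```

With three successful polls out of four for one element, `availability_percent` reports 75.0 for it. A real deployment would weight by time in each state rather than by poll count.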

Nagios

Measurement – SNMP, MRTG
• Interface/circuit information gathered every five minutes with Simple Network Management Protocol (SNMP)
• Visualised with Multi Router Traffic Grapher (MRTG) on all important interfaces on the network
• Link/circuit utilisation
  – Bits per second
  – Packets per second
  – Flows per second
• Blue/inbound, Green/outbound
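
As a rough illustration of how a bits-per-second figure falls out of five-minute polling: two readings of an interface octet counter are subtracted and divided by the interval. The sketch below is not MRTG's code. The function name is hypothetical, and it assumes 64-bit counters (such as IF-MIB `ifHCInOctets`) that wrap at most once between polls.

```python
POLL_INTERVAL_SECONDS = 300      # five-minute polling, as on the slide
COUNTER64_MODULUS = 2 ** 64      # assumption: 64-bit octet counters


def bits_per_second(previous_octets: int,
                    current_octets: int,
                    interval_seconds: float = POLL_INTERVAL_SECONDS,
                    modulus: int = COUNTER64_MODULUS) -> float:
    """Average rate between two octet-counter readings.

    The modulo handles a single counter wrap between the two polls;
    more than one wrap in an interval cannot be detected this way.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_octets = (current_octets - previous_octets) % modulus
    return delta_octets * 8 / interval_seconds
```

For example, 37,500,000 octets transferred in one 300-second interval averages 1,000,000 bits per second. Passing `modulus=2 ** 32` covers older 32-bit counters.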

Measurement – SmokePing
• Measures and records Round-trip Time (RTT) and Packet-Loss to selected targets
• Ideal for visualising latency and jitter (variance in latency)
• Y-axis represents latency
• X-axis represents time
• ‘Smoke’ represents degree of jitter
• Coloured bars represent packet-loss:
  – Green – no packet-loss
  – Blue – 1 in 20 lost (5%)
  – Purple – 4 in 20 lost (20%)
  – Red – 19 out of 20 packets lost!
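
The quantities on this slide can be computed from one round of probes along these lines. This is a sketch rather than SmokePing's implementation: `summarise_round` is a hypothetical helper, a lost probe is represented by `None`, and jitter is taken literally as the variance of the received round-trip times, following the slide's definition.

```python
import statistics
from typing import Optional


def summarise_round(rtts_ms: list[Optional[float]]) -> dict:
    """Summarise one round of probes to a single target.

    rtts_ms holds one entry per probe sent: the RTT in milliseconds,
    or None if no reply was received (assumed representation of loss).
    """
    if not rtts_ms:
        raise ValueError("at least one probe is required")
    received = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(received)
    return {
        "sent": len(rtts_ms),
        "lost": lost,
        "loss_percent": 100.0 * lost / len(rtts_ms),
        # Median latency of the replies that did arrive.
        "median_rtt_ms": statistics.median(received) if received else None,
        # Jitter as population variance of the received RTTs.
        "jitter_variance": statistics.pvariance(received) if received else None,
    }
```

A round of 20 probes with one reply missing gives a `loss_percent` of 5.0, matching the "1 in 20" entry in the legend above.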

Measurement - NetFlow
• Flow: a conversation in one direction between two computers on the network
• Flow record: information about the flow – timestamp, source, destination, number of packets and bytes, protocol used, application type, etc.
• Member’s Edge Router generates flow records and exports them to the Member’s Edge Server
• Raw flow records are analysed and processed to produce various reports
• Extremely useful when investigating security incidents
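
To make the idea of a flow record and a derived report concrete, here is a minimal sketch. It is not the NetFlow export format and not AARNet's reporting code: the `FlowRecord` fields simply mirror the items listed above, and `top_sources_by_bytes` is a hypothetical example of one report that could be produced from raw records.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FlowRecord:
    """Simplified flow record (illustrative fields only)."""
    timestamp: datetime
    source: str        # source address
    destination: str   # destination address
    packets: int
    octets: int        # number of bytes in the flow
    protocol: str      # e.g. "tcp" or "udp"


def top_sources_by_bytes(records: list[FlowRecord], n: int = 3) -> list[tuple[str, int]]:
    """Return the n source addresses that sent the most bytes."""
    totals: Counter[str] = Counter()
    for record in records:
        totals[record.source] += record.octets
    return totals.most_common(n)
```

A report like this is one way an unusually busy source address might stand out when a security incident is being investigated.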

Contact & Communication
• 24 x 7 Call Centre / Help Desk
  – (02) 9963-3538
  – New number: 1300 APL NOC (1300 275 662)
• Calls & SMS to the On-Call Officer
• Escalates to:
  – Secondary On-Call Officer
  – Tertiary On-Call Officer
• Response normally within 15 minutes

Contact & Communication
• National and State based e-mail distribution lists for AARNet operational notices
• Contain e-mail addresses of the form [email protected]
  – Member controls which individuals receive AARNet operational notices
  – Member can add or change recipients without any intervention by AARNet
  – Better quality distribution lists – more up-to-date

Contact & Communication
[Diagram: contact paths between the AARNet NOC and customers, suppliers and peers. Labelled channels: Nagios, Auto SMS, Auto E-Mail, Web Site, Mailing List, Direct E-Mail, Telephone, and Telephone Call Centre 24 x 7 with Auto SMS.]

Contact & Communication
• Under development – Common Contact Database
  – Single, centralised and definitive
  – Shared by Operations staff
  – All Customer Technical Contacts
    • Preference for mobile phone numbers
  – All Supplier contacts
    • Other NOCs
    • Co-location facilities
    • Circuit IDs, rack locations, etc.

Logging / Record Keeping / Trouble Tickets
• JIRA by Atlassian
• General purpose, web based ‘issue tracking’ software
• Multi-user – anyone with a web browser (and username and password)
• Queue oriented – multiple queues
• One queue dedicated to network trouble tickets (NOCTTS)
• Log of issues including
  – fault descriptions
  – current fault status
  – contact information
  – communications
  – sequence of events
  – comments
  – through to resolution and ‘closed issue’

In Summary…
• Increased emphasis on monitoring
• Continued focus on measurement
• Contact and communication vital to our success
• What does AARNet need from you?
  – Quality contact information
  – Pro-active communication if something goes wrong
  – Co-operation and collaboration to fix the problem

AARNet provides 24 x 7 On-call coverage
• The 24 x 7 Helpdesk function is outsourced to a call centre (Link:Q Communications)
  – Customers, peers and service providers call Link:Q (not AARNet directly)
  – A call centre operator answers the phone, takes details and pages the AARNet on-call officer
  – An escalation roster is sent to Link:Q weekly – primary, secondary and tertiary AARNet contacts for the week
  – Upon receiving a call and taking details, Link:Q then call and SMS the primary AARNet on-call officer
  – If the call/SMS is not acknowledged with Link:Q within 20 mins, Link:Q call and SMS the secondary
  – If the secondary call/SMS is not acknowledged with Link:Q within 20 mins, Link:Q call and SMS the tertiary
  – Hence the AARNet SLA of responding within the hour (although usually it is much sooner)
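
The arithmetic behind "within the hour" can be sketched as below: three officers, each given a 20-minute acknowledgement window, give a worst case of 60 minutes. The function names and the dictionary input are hypothetical, and the sketch assumes an acknowledgement at exactly 20 minutes still counts as in time.

```python
from typing import Optional

ESCALATION_ORDER = ("primary", "secondary", "tertiary")
ACK_WINDOW_MINUTES = 20  # per-officer acknowledgement window from the slide


def contact_schedule() -> list[tuple[str, int]]:
    """Minutes after the call is logged at which each officer is first contacted."""
    return [(role, i * ACK_WINDOW_MINUTES) for i, role in enumerate(ESCALATION_ORDER)]


def who_responds(ack_delay_minutes: dict[str, Optional[float]]) -> Optional[tuple[str, float]]:
    """Find the first officer to acknowledge within their window.

    ack_delay_minutes maps a role to the minutes that officer takes to
    acknowledge after being contacted, or None if they never do.
    Returns (role, minutes since the call was logged), or None if nobody
    acknowledges in time.
    """
    for role, contacted_at in contact_schedule():
        delay = ack_delay_minutes.get(role)
        if delay is not None and delay <= ACK_WINDOW_MINUTES:
            return role, contacted_at + delay
    return None
```

If the primary never answers and the secondary acknowledges after 5 minutes, the response lands 25 minutes after the original call. If only the tertiary answers, at the very end of their window, the response lands at 60 minutes.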

On-Call Officer
• The on-call officer is responsible for:
  – Taking and acknowledging calls to the AARNet Helpdesk
  – Responding immediately to the caller
  – Gathering as much information as possible in the first instance, and creating a ticket for each call
  – Deciding to take responsibility for the ticket personally or assigning the ticket to someone else
  – If taking responsibility for the ticket:
    • Analysing, troubleshooting, testing, solving, working the ticket to closure
    • Monitoring communications, events and developments related to the ticket, including e-mail to the AARNet NOC
    • Communicating/updating relevant information to all parties involved or affected for the duration of the ticket

On-Call Officer... Business Hours
• Additionally, during business hours, the on-call officer is responsible for:
  – Monitoring alarms, communications, events and developments via e-mail messages to the AARNet NOC
  – Creating and updating tickets to record and document faults and difficulties
  – Creating and updating tickets and calendar entries to reflect periods of scheduled maintenance
  – Informing all affected parties of outages or hazardous conditions due to faults or periods of scheduled maintenance

On-Call Allowance
• AARNet pays the primary on-call officer an allowance per day, usually for the 7-day week they are on call.
• In addition, the primary on-call officer is paid time-and-a-half if called out-of-hours;
  – Usually regarded as outside Mon-Fri 8:00am-6:00pm and public holidays,
  – For a minimum of 2 hours per day, again only if called out-of-hours.
• All Operational Staff are on the roster!
  – E.g. Sys Admins cover network faults
  – Improves the skills and familiarity across all Operational areas
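
One possible reading of the out-of-hours rule on this slide is sketched below. It is an interpretation, not AARNet policy: the function names are hypothetical, public holidays are assumed to count as out-of-hours all day, the caller must supply the holiday dates, and the 2-hour minimum is assumed to apply to the hours that are then paid at time-and-a-half.

```python
from datetime import date, datetime, time

BUSINESS_START = time(8, 0)   # Mon-Fri 8:00am, from the slide
BUSINESS_END = time(18, 0)    # Mon-Fri 6:00pm, from the slide
OVERTIME_MULTIPLIER = 1.5     # time-and-a-half
MINIMUM_HOURS = 2.0           # minimum per day, only if actually called


def is_out_of_hours(moment: datetime,
                    public_holidays: frozenset[date] = frozenset()) -> bool:
    """True if moment falls outside Mon-Fri 8:00am-6:00pm or on a public holiday."""
    if moment.date() in public_holidays:
        return True
    if moment.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return True
    return not (BUSINESS_START <= moment.time() < BUSINESS_END)


def paid_hour_equivalents(hours_called_out: float) -> float:
    """Ordinary-time hours paid for one day's out-of-hours call-outs."""
    if hours_called_out <= 0:
        return 0.0  # the minimum applies only if the officer was called
    return max(hours_called_out, MINIMUM_HOURS) * OVERTIME_MULTIPLIER
```

Under these assumptions a 30-minute call-out on a Saturday is paid as 3 ordinary hours (the 2-hour minimum at time-and-a-half), and a 4-hour call-out as 6.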

Thank you