measure() or die()
By Arik Lerner, Team Lead, Automation & Performance/Resilience
Bio
- 3.5 years at LivePerson
- 2 years on the Reporting Platform
- 1.5 years as Team Lead, Automation & Performance/Resilience
- Interests: private pilot on a Cessna 172
Meetup Agenda
➔ How we monitor with E2E testing
➔ E2E products & personas
➔ The Awakening of the End2End Data
➔ Architecture & life cycle
About LivePerson
LivePerson transforms the connection between brands and consumers.
Our Scale
3BN visits/month
200BN API calls/month
2 PB of data a year
1.5M concurrent visits
Our Engineering
~200 people in R&D
Constant innovation
Multiple Technologies
Fast release cycle
We Monitor LivePerson Services
With E2E tests that simulate real business scenarios, which:
➔ Indicate real business problems
➔ Show service availability through the consumer's eyes
➔ Alert and trigger immediate action
➔ Give insight into our business services
E2E Scenario Example
Agent login: the agent enters the system
Visitor init chat: the visitor enters the site
Agent chat
E2E Customers' Expectations
➔ Stability == TRUST
➔ Investigable
➔ Service coverage
➔ Scale
E2E Dashboard Statistics
Real-Time Dashboard
Kibana: HAR statistics & aggregation
E2E Personas
Production specialist
PMO
Management
Production Specialist User Story
This is Yossi.
When Yossi gets up in the morning, Yossi looks at the E2E RT dashboard.
Yossi recognizes a failure.
Yossi enters the E2E debug center tools.
Yossi is smart! Be like Yossi.
PMO User Story
This is Michal.
Before any software deployment, when the dashboard failure rate is below 3%, Michal has a GO for deployment.
Michal is smart! Be like Michal.
Management User Story
This is Eli.
When Eli gets up in the morning, Eli looks at the dashboard statistics.
Eli can see the health and availability of each data center.
Eli is smart! Be like Eli.
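Michal's GO rule from the PMO story is a threshold check on the failure rate. A minimal sketch, assuming the rate is failed runs divided by total runs over whatever window the dashboard shows; `failure_rate` and `deployment_go` are hypothetical names, and 3% is the threshold from the slide.

```python
def failure_rate(failed_runs: int, total_runs: int) -> float:
    """Fraction of E2E runs that failed in the observed window."""
    if total_runs <= 0:
        raise ValueError("need at least one run to compute a failure rate")
    return failed_runs / total_runs


def deployment_go(failed_runs: int, total_runs: int, threshold: float = 0.03) -> bool:
    """GO only when the failure rate is strictly below the threshold (3% by default)."""
    return failure_rate(failed_runs, total_runs) < threshold


print(deployment_go(25, 1000))  # True: 2.5% is below 3%
print(deployment_go(30, 1000))  # False: 3.0% is not below 3%
```

"Below 3%" is read as strictly below, so a window sitting exactly at 3% does not get a GO.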
What KPIs do I need to measure?
KPIs
➔ Total failure rate
◆ Filter for each data center
◆ Filter for each business flow
Widgets
➔ Trend to understand service stability
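The total failure rate KPI, its two filters, and the trend widget can be sketched over a flat list of run records. A minimal sketch, assuming each run is a dict with hypothetical `day`, `dc`, `flow` and `passed` fields; the sample runs are made up.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Made-up sample runs; "day", "dc", "flow" and "passed" are hypothetical field names.
runs = [
    {"day": 1, "dc": "UK", "flow": "ChatFlow", "passed": True},
    {"day": 1, "dc": "UK", "flow": "Login", "passed": False},
    {"day": 1, "dc": "US", "flow": "ChatFlow", "passed": True},
    {"day": 2, "dc": "UK", "flow": "ChatFlow", "passed": True},
    {"day": 2, "dc": "US", "flow": "Login", "passed": True},
    {"day": 2, "dc": "US", "flow": "ChatFlow", "passed": True},
]


def total_failure_rate(runs: List[dict], dc: Optional[str] = None,
                       flow: Optional[str] = None) -> Optional[float]:
    """Failure rate over all runs, optionally filtered by data center and/or business flow."""
    selected = [r for r in runs
                if (dc is None or r["dc"] == dc) and (flow is None or r["flow"] == flow)]
    if not selected:
        return None  # nothing matches the filter, so there is no rate to report
    return sum(not r["passed"] for r in selected) / len(selected)


def daily_trend(runs: List[dict]) -> Dict[int, Optional[float]]:
    """Failure rate per day, ordered by day, for a service-stability trend widget."""
    by_day: Dict[int, List[dict]] = defaultdict(list)
    for r in runs:
        by_day[r["day"]].append(r)
    return {day: total_failure_rate(day_runs) for day, day_runs in sorted(by_day.items())}


print(round(total_failure_rate(runs), 3))      # 0.167
print(total_failure_rate(runs, flow="Login"))  # 0.5
print(daily_trend(runs))
```

Returning `None` for an empty filter keeps "no data" distinct from "0% failures", which matters when a data center has simply stopped reporting.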
What KPIs do I need to measure?
KPIs
➔ Total chat failure rate
➔ Total missing engagements
➔ Total login failures
➔ Average login response time
Widgets
➔ Failure cause breakdown
➔ Client location root cause
➔ Test scenario failures
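Several of these KPIs and widgets reduce to filters, counts and averages over run records. A minimal sketch covering chat failure rate, login failures, average login response time, and the failure-cause and client-location breakdowns; the record fields (`flow`, `passed`, `cause`, `location`, `login_ms`) and the sample values are hypothetical, and missing engagements are left out because the slide does not say how they are detected.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List

# Made-up sample runs; field names are hypothetical. "cause" is None for passing runs.
runs = [
    {"flow": "ChatFlow", "passed": False, "cause": "Agent chat timeout", "location": "US"},
    {"flow": "ChatFlow", "passed": True, "cause": None, "location": "UK"},
    {"flow": "Login", "passed": True, "cause": None, "location": "US", "login_ms": 850},
    {"flow": "Login", "passed": False, "cause": "Login page error", "location": "UK", "login_ms": 4200},
    {"flow": "Login", "passed": True, "cause": None, "location": "UK", "login_ms": 950},
]


def kpis(runs: List[dict]) -> Dict[str, object]:
    """Chat failure rate, login failure count and average login response time."""
    chats = [r for r in runs if r["flow"] == "ChatFlow"]
    logins = [r for r in runs if r["flow"] == "Login"]
    return {
        "chat_failure_rate": sum(not r["passed"] for r in chats) / len(chats) if chats else None,
        "login_failures": sum(not r["passed"] for r in logins),
        # Averages over every login run, failed ones included (a design choice).
        "avg_login_ms": mean(r["login_ms"] for r in logins) if logins else None,
    }


def failure_breakdown(runs: List[dict], key: str) -> Counter:
    """Count failed runs by one field: "cause" or "location" for the two breakdown widgets."""
    return Counter(r[key] for r in runs if not r["passed"])


print(kpis(runs))
print(failure_breakdown(runs, "cause"))
print(failure_breakdown(runs, "location"))
```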
Dashboard Demo
The Awakening of the End2End Data
Start collecting the data!
➔ Build failures/successes
➔ Failure cause
➔ Business flows
➔ Test duration
➔ Client location
➔ Data center location
➔ Account
@Test → raw data output
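Each @Test run can emit one flat record holding the fields listed above. A minimal sketch, assuming one record per run serialized as JSON; the class name and field names are hypothetical, and the sample values reuse the ones shown later on the metadata slide.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RawDataOutput:
    """One E2E run carrying the fields listed above (field names are hypothetical)."""
    test_name: str        # business flow, e.g. "ChatFlow"
    passed: bool          # build success/failure
    failure_cause: str    # empty string when the run passed
    duration_ms: int      # test duration
    client_location: str  # where the test ran from
    data_center: str      # which data center it hit
    account: str          # test account used

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


run = RawDataOutput("ChatFlow", False, "Agent chat timeout", 48_000, "US", "UK", "qa12345")
print(run.to_json())
```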
HAR (HTTP Archive)
➔ Logging web browser traffic
The HTTP Archive format, or HAR, is a JSON-formatted archive file format for logging a web browser's interaction with a site. The common extension for these files is .har.
The specification for the HTTP Archive (HAR) format defines an archival format for HTTP transactions that a web browser can use to export detailed performance data about the web pages it loads. The specification was produced by the Web Performance Working Group [1] of the World Wide Web Consortium (W3C).
The W3C specification never progressed beyond draft form and is no longer being developed; HAR is nonetheless a de facto standard that browser developer tools can export.
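Because a HAR file is plain JSON, the per-request data is easy to pull out: requests live under `log.entries`, each with a `request`, a `response` and a total `time` in milliseconds. A minimal sketch using only the standard library; `summarize_har` is a hypothetical helper, and the sample HAR is made up and trimmed to the fields the function reads.

```python
import json
from typing import Any, Dict, List


def summarize_har(har_text: str) -> List[Dict[str, Any]]:
    """Return the URL, HTTP status and total time (ms) of every entry in a HAR document."""
    har = json.loads(har_text)
    return [
        {
            "url": entry["request"]["url"],
            "status": entry["response"]["status"],
            "time_ms": entry["time"],  # total elapsed time of the request, in milliseconds
        }
        for entry in har["log"]["entries"]
    ]


# A made-up, trimmed HAR: real entries also carry headers, cookies, timings, cache info and more.
sample = json.dumps({"log": {
    "version": "1.2",
    "creator": {"name": "example", "version": "0.1"},
    "entries": [
        {"time": 120.5, "request": {"method": "GET", "url": "https://www.example.com/"},
         "response": {"status": 200}},
        {"time": 35.0, "request": {"method": "GET", "url": "https://www.example.com/missing.js"},
         "response": {"status": 404}},
    ],
}})

for row in summarize_har(sample):
    print(row)
```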
HAR proxy diagram
Selenium WebDriver → proxy on port XXX → www.Liveperson.com
Each request passes through the proxy, which records it as HAR.
Based on the BrowserMob embedded proxy server
Code snippet: adding the proxy into Selenium
Pop quiz:
• N scenarios
• Running from M locations
• Running against X data centers
• Yields HAR data
Question: How do we investigate the data for an entire farm/location/scenario, etc.?
Answer: Aggregation.
Start with collecting the data!
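Aggregation here means grouping HAR-derived records by the metadata attached to each run and summarizing each group. A minimal sketch, assuming every HAR entry has already been tagged with its run's scenario, client location and data center; the field names and sample values are made up, and in the talk this grouping is done in Elasticsearch/Kibana rather than in Python.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Made-up records, one per HAR entry, already tagged with the run's metadata.
records = [
    {"scenario": "ChatFlow", "client_location": "US", "data_center": "UK", "time_ms": 100},
    {"scenario": "ChatFlow", "client_location": "US", "data_center": "UK", "time_ms": 300},
    {"scenario": "Login", "client_location": "UK", "data_center": "US", "time_ms": 50},
]


def aggregate(records: Iterable[dict],
              keys: Tuple[str, ...] = ("scenario", "client_location", "data_center"),
              ) -> Dict[tuple, dict]:
    """Group records by the given metadata keys; report count, mean and max time per group."""
    groups: Dict[tuple, List[float]] = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in keys)].append(r["time_ms"])
    return {
        group: {"count": len(times), "mean_ms": mean(times), "max_ms": max(times)}
        for group, times in groups.items()
    }


print(aggregate(records))                         # one group per scenario/location/DC
print(aggregate(records, keys=("data_center",)))  # the same data rolled up per data center
```

Changing `keys` is what lets one data set answer the farm, location and scenario questions without collecting anything new.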
@Test
Raw data output (metadata + HAR): { "metaData": { "Testname": "ChatFlow", "Account": "qa12345", "ClientLocation": "US", "DataCenter": "UK" } }
Pipeline diagram: Jenkins slaves run the @Test scenarios and write HAR files → the HAR processor reads the file output and gets the JSON → sends the data to Kafka (topic e2e) → Logstash + Elasticsearch → Kibana dashboard
Code snippet: sending a message into Kafka
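The HAR processor's job of joining the run metadata to its HAR and sending the result to the e2e topic might look like this sketch. `build_e2e_message` and `publish` are hypothetical, and `send` is a stand-in for a real Kafka producer call; no Kafka client is used here, so messages are captured in a list instead.

```python
import json
from typing import Any, Callable, Dict


def build_e2e_message(metadata: Dict[str, str], har: Dict[str, Any]) -> bytes:
    """Wrap one run's metadata and its HAR into a single UTF-8 JSON document."""
    return json.dumps({"metaData": metadata, "har": har}).encode("utf-8")


def publish(send: Callable[[str, bytes], Any], metadata: Dict[str, str],
            har: Dict[str, Any], topic: str = "e2e") -> None:
    """Hand the message to `send`, a stand-in for a real Kafka producer call."""
    send(topic, build_e2e_message(metadata, har))


# Capture messages in a list instead of talking to a broker.
sent = []
publish(lambda topic, value: sent.append((topic, value)),
        {"Testname": "ChatFlow", "Account": "qa12345", "ClientLocation": "US", "DataCenter": "UK"},
        {"log": {"version": "1.2", "entries": []}})
print(sent[0][0], json.loads(sent[0][1])["metaData"]["DataCenter"])  # e2e UK
```

With kafka-python, for example, `KafkaProducer.send` takes a topic followed by a value, so a producer's `send` method could be passed in directly. A whole HAR per message can exceed the broker's default message size limit (about 1 MB), which is one reason to split large HARs per entry.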
Our benefits
➔ Data retention: 30 days
➔ Ability to query and aggregate over the data for investigation
➔ Ability to build dashboards
➔ Access to the data through Elasticsearch APIs
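Because the documents land in Elasticsearch, the same aggregations the dashboards show can be requested directly over its REST `_search` API. A minimal sketch, assuming the documents carry the `metaData` fields from the raw data output plus the `@timestamp` that Logstash adds; the field paths, index pattern and URL are assumptions (on a default dynamic mapping, a terms aggregation may need the `.keyword` sub-field instead).

```python
import json
import urllib.request
from typing import Any, Dict


def runs_by_dc_query(hours: int = 24) -> Dict[str, Any]:
    """Search body: E2E documents from the last `hours`, bucketed by data center, then test name."""
    return {
        "size": 0,  # return aggregation buckets only, no individual documents
        "query": {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},
        "aggs": {
            "by_dc": {
                "terms": {"field": "metaData.DataCenter"},
                "aggs": {"by_test": {"terms": {"field": "metaData.Testname"}}},
            }
        },
    }


def search(base_url: str, index: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """POST a search body to Elasticsearch's _search endpoint and return the parsed response."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(json.dumps(runs_by_dc_query(6), indent=2))
# Against a live cluster (hypothetical URL and index pattern):
# result = search("http://localhost:9200", "e2e-*", runs_by_dc_query())
# buckets = result["aggregations"]["by_dc"]["buckets"]
```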
ELK & HAR Downsides
➔ Complicated queries in Kibana
➔ ELK setup & maintenance
➔ On a response timeout, the HAR shows an enormous time value (this needs to be handled in code)
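Handling the timeout case in code can be as simple as separating implausible timings from usable ones before averaging, so a single enormous value does not wreck a dashboard. A minimal sketch; the 60-second cut-off and the function name are hypothetical and should match whatever timeout the tests actually use.

```python
from statistics import mean
from typing import Iterable, List, Tuple

TIMEOUT_MS = 60_000  # hypothetical cut-off: longer samples are counted as timeouts


def split_timeouts(times_ms: Iterable[float],
                   timeout_ms: float = TIMEOUT_MS) -> Tuple[List[float], int]:
    """Separate usable timings from timeouts so one huge value cannot skew an average.

    Negative values are dropped, since HAR uses -1 for timings that do not apply.
    """
    usable: List[float] = []
    timeouts = 0
    for t in times_ms:
        if t < 0:
            continue
        if t > timeout_ms:
            timeouts += 1
        else:
            usable.append(t)
    return usable, timeouts


usable, timeouts = split_timeouts([120, 95, -1, 9_999_999_999, 110])
print(len(usable), timeouts)  # 3 1
print(mean(usable))           # average of the three usable samples only
```

Counting timeouts separately, rather than silently dropping them, keeps them visible as a failure signal of their own.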
What more E2E outputs do we have?
@Test → more output:
➔ BDD reports
➔ Video
➔ Logs
➔ Browser console logs
Code snippet
BDD: Behaviour-Driven Development
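BDD tools such as Cucumber map plain-language Given/When/Then steps to code and report each step's outcome, which makes the reports readable outside the test team. The toy runner below only illustrates that idea; it is not the tool used in the talk, and the step texts (loosely based on the agent/visitor chat scenario) and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Registry mapping step text to the function that implements it.
STEPS: Dict[str, Callable[[], None]] = {}


def step(text: str):
    """Decorator that registers a function as the implementation of one Given/When/Then step."""
    def register(fn: Callable[[], None]) -> Callable[[], None]:
        STEPS[text] = fn
        return fn
    return register


@step("Given an agent is logged in")
def agent_logged_in() -> None:
    pass  # would drive the agent's browser


@step("When a visitor starts a chat")
def visitor_starts_chat() -> None:
    pass  # would drive the visitor's browser


@step("Then the agent receives the chat")
def agent_receives_chat() -> None:
    raise AssertionError("chat was not routed to the agent")  # simulated failure


def run_scenario(lines: List[str]) -> List[Tuple[str, str]]:
    """Run steps in order and report each one; steps after the first failure are skipped."""
    report: List[Tuple[str, str]] = []
    failed = False
    for line in lines:
        if failed:
            report.append((line, "SKIPPED"))
            continue
        try:
            STEPS[line]()  # an unregistered step raises KeyError and is reported as FAILED
            report.append((line, "PASSED"))
        except Exception:
            failed = True
            report.append((line, "FAILED"))
    return report


scenario = [
    "Given an agent is logged in",
    "When a visitor starts a chat",
    "Then the agent receives the chat",
    "Then the visitor sees the agent's reply",
]
for text, status in run_scenario(scenario):
    print(f"{status:8} {text}")
```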
E2E Test Lifecycle
[Architecture diagram. Test execution: a Jenkins Master (with a Jenkins Master DR) and many Jenkins slaves running @Test scenarios against DC-1, DC-2, ..., DC-N. E2E data: MySQL DB, E2E Reports service, RT Dashboard. HAR data: Kafka + ELK, Kibana. Production metrics: Graphite, Zabbix, Grafana. Environments: DEV, QA, Staging, Production.]
E2E @ Scale
➔ 1.5M HTTP traffic records per day
➔ 200K runs per day
➔ 60 Jenkins slave machines
➔ 28 scenarios
➔ 6 client locations
➔ 6 regions
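A quick back-of-the-envelope pass over these figures shows what they imply per slave and per test combination. This assumes (the slide does not say so) that every scenario runs from every client location against every region, and that runs are spread evenly across slaves and over the day.

```python
# Figures from the slide.
runs_per_day = 200_000
http_records_per_day = 1_500_000
jenkins_slaves = 60
scenarios, client_locations, regions = 28, 6, 6

# Assumption: every scenario runs from every client location against every region,
# and runs are spread evenly over slaves and over the day.
combinations = scenarios * client_locations * regions
runs_per_combination_per_day = runs_per_day / combinations
runs_per_slave_per_minute = runs_per_day / jenkins_slaves / (24 * 60)
http_records_per_run = http_records_per_day / runs_per_day

print(combinations)                          # 1008
print(round(runs_per_combination_per_day))   # 198
print(round(runs_per_slave_per_minute, 1))   # 2.3
print(http_records_per_run)                  # 7.5
```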
What to take home?
➔ Monitor your data centers from the consumer's experience
➔ Collect data
➔ Give the data business meaning
THANK YOU!
We are hiring
YouTube.com/LivePersonDev
Twitter.com/LivePersonDev
Facebook.com/LivePersonDev
Slideshare.net/LivePersonDev