Web Analytics
Jim Jansen
Associate Professor, The Pennsylvania State University
Who is Jim Jansen?
• Associate professor at the College of Information Sciences and Technology, The Pennsylvania State University, USA
• Senior Fellow at the Pew Research Center (Pew Internet and American Life Project) - http://www.pewinternet.org
• Active research and teaching efforts - http://ist.psu.edu/faculty_pages/jjansen/
• Several funded and non-funded research projects
• Teaches several courses, including keyword advertising
• 2011 book, Understanding Sponsored Search (Cambridge) … theory of keyword advertising
• Editor of journal, Internet Research (Emerald)
• Book, Understanding User-Web Interactions via Web Analytics (Morgan & Claypool) - basics of web analytics
• Let’s talk web analytics!
• We’ll discuss:
– context
– theory
– application
• Begin by setting the stage … what are we facing?
Moving toward ‘everything’ being recorded and indexed
A lot will be global, but much will remain local
Search (along with data summarization, trend detection, information and knowledge extraction and discovery) is a foundational technology
Raises issues, including infrastructure requirements: how is it paid for, and who pays?
Changes the nature of privacy and anonymity
As publishers or providers, how do we make sense of how people are using this data? --- Web analytics
Explosion of Information - the Zettabytes are coming
There will be nearly 15 billion devices connected to the Internet, generating nearly a Zettabyte (one sextillion bytes) of global IP traffic by 2015, according to Cisco's fifth annual Visual Networking Index (VNI) Forecast
How much is a Zettabyte?
1. The volume of data is exploding (information growth)
2. The complexity of data is growing (information architecture)
3. The users have less time (attention economy)
4. The user expects improved features (technological sophistication)
Web analytics can help us …
• Deal with the volume of data (information growth)
• Understand the growing complexity of data (information architecture)
• Address users’ limited time (attention economy)
• Lead to the improved features (technological sophistication) that users expect
How does web analytics do this?
• A thousand years ago: science was mainly naturalistic, describing natural phenomena
• Last few hundred years: a theoretical branch using models and generalizations
• Last few decades: a computational branch simulating complex phenomena
• Today: data exploration (eScience) unifying theory, experiment, and simulation
– Data captured by sensors or instruments, or generated by a simulator
– Processed by humans and software
– Information / knowledge stored in computers
– Analysis of database / collection content using data management and statistics
– Network and Web Science
Data → Information → Knowledge
This is the realm of Web analytics!
What is web analytics?
• The Web Analytics Association (WAA) defines Web analytics as:
– the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage (http://www.webanalyticsassociation.org/)
• Shares common theoretical and methodological characteristics with all forms of log analysis (e.g., intranet logs, system logs, OPAC logs, search logs, etc.)
Let’s break that definition down …
• Collection - accumulate and store over a period of time
• Internet data - internet facts and statistics collected together for reference or analysis
• Measurement - ascertain the size, amount, or degree of something by using an instrument or device
• Analysis - examine methodically the structure of information for purposes of explanation and interpretation
• Reporting - giving a spoken or written account of something that one has investigated
• Understanding - perceive the significance, explanation, or cause of something
• Optimizing - make the best or most effective use of a resource
• Web usage - employ or deploy something as a means of accomplishing a purpose or achieving a result
Data → Information → Knowledge
• How is the data collected?
W3C Extended Log Format - a variety of fields for examining visitors to Web sites.
Another common format is the NCSA Separate Log, which is composed of three logs: the Common log (actions on the server), the Referral log (where visitors came from), and the Agent log (information about the client computer).
Instead of server-side logging, one can use other methods such as page tagging, image cookies, Flash cookies, etc., but the data is still stored in a log.
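To make the collection step concrete, here is a minimal sketch of reading entries in the W3C Extended Log Format. It assumes a simplified variant in which fields are separated by whitespace and contain no embedded spaces; the function name, the sample lines, and the chosen field list are illustrative, not taken from any particular server.

```python
def parse_w3c_extended(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines carry metadata; only #Fields is needed here.
            if line.lower().startswith("#fields:"):
                fields = line.split(":", 1)[1].split()
            continue
        values = line.split()
        # Skip entries that do not match the declared field list.
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


# Hypothetical sample log (addresses and URLs are made up).
sample_log = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(Referer)",
    "2011-05-02 17:42:15 192.0.2.10 GET /index.html 200 http://example.com/",
    "2011-05-02 17:42:20 192.0.2.10 GET /products.html 200 -",
]
entries = list(parse_w3c_extended(sample_log))
```

Each entry becomes a small record (time stamp, client address, requested page, referrer) that the later counts and ratios can be built from.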
• Okay, that’s collection.
• What about analysis and reporting?
A variety of tools help make sense of this log data
• With that context, let’s look at the foundational aspects …
Theoretical Foundations
• Web analytics is based on the behaviorism paradigm
• Behaviorism – an approach focused on the outward behavioral aspects of thought that emphasizes observed behaviors
• Behaviorism – Pavlov, Watson, and Skinner
Burrhus Frederic Skinner, John B. Watson, Ivan Petrovich Pavlov
Behaviorism Characteristics
• Inductive, data-driven, and characterized by empirical observation of measurable behavior
• Grounded on somebody doing something in a situation (all the environmental and situational features are embedded behaviors)
• Critics of behaviorism as a psychological theory have issues with its rejection of mental processes.
• I agree - people are more than “mediators between behavior and the environment” (Skinner, 1993, p. 428) (cf. social learning theory) … however, don’t throw out the baby with the bath water
What is a Behavior?
… an observable activity of a person, animal, team, organization, or system.
One can classify behaviors into three general categories. Behaviors are
• something that one can detect and record
• actions or specific goal-driven events with some purpose other than the specific action that is observable
• reactive responses to environmental stimuli
What is a Behavior?
• Behavior is the essential construct of behaviorism and of web analytics
• Logs record behaviors of users and systems (they record behavior but can’t tell affective, cognitive, or situational aspects … yet, but we’re working on it!)
• A behavior is the key variable (i.e., an entity representing a set of events where each event may have a different value)
• One can view the data collected in log files as trace data
• People conducting the activities of their daily lives many times create things, create marks, induce wear, or reduce some existing material
• Within the confines of research, these things, marks, and wear become data
• Classically, trace data are the physical remains of people’s interaction
Data Collection: Trace Data
Wear on a carpet
Trash heap
Surfing web
Trace Data
• In the past, trace data was often time consuming to gather and process, making such data costly.
• Logging software makes collecting trace data on the Internet easy and cheap
• Log data is controlled accretion data, where the researcher or some other entity alters the environment in order to create the accretion data
• With the use of client apps (such as desktop search bars), the collection of data is nearly unlimited from a technology perspective
What is cool about trace data for researchers?
Data Collection
Log data/trace data has significant advantages as a data collection approach for the study and investigation of behaviors, including:
• Scale: not a limiting factor as in lab user studies
• Power: large sample size for inference testing; in fact, so large that one must account for the size effect
• Scope: naturalistic; researchers can investigate a range of interactions in a multi-variable context
• Location: can collect in distributed environments
• Duration: can collect log data over an extended period
Methodological Foundations
Customer Behavior (video)
Use of logs to collect trace data is an unobtrusive method (a.k.a. non-reactive or low-constraint). Unobtrusive methods …
• allow data collection without directly interfering with the context, and
• do not require a direct response from participants
Chemistry (surface marking)
Methodological Foundations
Three justifications for unobtrusive methods:
• Uncertainty principle: researchers interjected into an environment become part of the system
• Observer effect: the difference that is made to an activity or a person’s behaviors by being observed
• Observer bias: observers overemphasize behavior they expect to find and fail to notice behavior they do not expect
Trace data helps in overcoming the uncertainty principle, observer effect, and observer bias in data collection. Note: this addresses observer bias in data collection but not in data analysis.
Example: ethnography studies (where the researcher “bird dogs” a study participant)
Example: no one searches for porn in a lab study of Web searching
Example: this is why medical trials are double blind rather than single blind
Methodological Foundations
Inherent characteristics in the method of log data collection; Web analytics has issues to address as a result:
• Abstraction – how does one relate low-level data to higher-level concepts?
• Selection – how does one separate the necessary from unnecessary data?
• Reduction – how does one reduce the complexity and size of the data set?
• Context – how does one interpret the significance of events?
• Evolution – how can one collect data without impacting application deployment or use?
• Okay, nice, but how do we apply it …
Web analytics process
• Every consulting firm has a web analytics process … (which is fine)
• However, the effective ones all boil down to four essential steps
Essential steps to any effective web analytics process
1. Collection of data: typically counts; basically, data collection. Examples: time stamp, referral URL, query term
2. Processing of data into information: typically ratios; data becomes metrics. Examples: time on page, bounce rate, unique visitors
3. Developing key performance indicators: counts and ratios infused with business strategy. Examples: conversion rate, average order value, task completion rate
4. Formulating online strategy: online goals, objectives, or standards for the organization. Examples: save money, make money, market share
(Diagram: adjacent steps are linked by "Drives" arrows in both directions.)
Three types (plus 1) of Web analytics metrics Implementation
• Count - the most basic unit of measure; a single number.
• Ratio - typically, a count divided by a count, although a ratio can use either a count or a ratio in the numerator or denominator.
• KPI (Key Performance Indicator) - can be either a count or a ratio, but is frequently a ratio. A KPI is infused with business strategy, and therefore the set of appropriate KPIs typically differs between site and process types.
• Dimension - data that can be used to define various types of segments and that represents a fundamental dimension of visitor behavior or site dynamics. Typically not associated with a number.
Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
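The four metric types above can be illustrated with a small sketch. The visit records, the field layout, and the 25% target are all hypothetical values chosen for illustration; they are not part of the WAA definitions.

```python
from collections import Counter

# Hypothetical visit records: (visitor_id, traffic_source, made_purchase)
visits = [
    ("v1", "search", True),
    ("v2", "search", False),
    ("v3", "email", False),
    ("v4", "search", True),
]

# Count: a single number.
visit_count = len(visits)
purchase_count = sum(1 for _, _, purchased in visits if purchased)

# Ratio: a count divided by a count.
conversion_rate = purchase_count / visit_count

# KPI: a count or ratio judged against a business target (the target is assumed).
TARGET_CONVERSION_RATE = 0.25
kpi_met = conversion_rate >= TARGET_CONVERSION_RATE

# Dimension: a non-numeric attribute used to segment the data.
visits_by_source = Counter(source for _, source, _ in visits)
```

The same conversion rate could be a KPI for a retail site and irrelevant for a support site, which is why the KPI step needs the business target and not just the arithmetic.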
Can be applied to three levels of granularity
• Aggregate - total site traffic for a defined period of time (typically used for market comparisons).
• Segmented - a subset of the site traffic for a defined period of time, filtered in some way to gain greater analytical insight (e.g., by developing personas and profiles in Google Analytics).
• Individual - activity of a single Web visitor for a defined period of time (excellent for persona development and outlier analysis).
Classifications of Metrics
• Building Block – foundational metrics
• Visit Characterization – metrics aimed at understanding visits, either single or aggregate
• Content Characterization – metrics aimed at understanding content or its use
• Conversion – metrics aimed at linking visits and content
Building Block
• Page: A page is an analyst-definable unit of content.
• Page Views: The number of times a page was viewed.
• Visits/Sessions: A visit is an interaction by an individual with a website, consisting of one or more requests for a page.
• Unique Visitors: The number of inferred individual people, within a designated reporting timeframe, with activity consisting of one or more visits to a site.
– New Visitor: The number of Unique Visitors with activity including a first-ever Visit to a site during a reporting period
– Repeat Visitor: The number of Unique Visitors with activity consisting of two or more Visits to a site during a reporting period.
– Return Visitor: The number of Unique Visitors with activity consisting of a Visit to a site during a reporting period and where the Unique Visitor also Visited the site prior to the reporting period
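The visitor definitions above can be sketched as set operations. This is only an illustration: the visit log, the visitor IDs, and the reporting period are hypothetical, and the "new" category assumes the log covers the site's full history, which real logs rarely do.

```python
from datetime import date

# Hypothetical visit log: (visitor_id, visit_date)
visit_log = [
    ("ann", date(2011, 4, 20)),  # before the reporting period
    ("ann", date(2011, 5, 3)),
    ("bob", date(2011, 5, 4)),
    ("bob", date(2011, 5, 9)),
    ("cat", date(2011, 5, 10)),
]
period_start, period_end = date(2011, 5, 1), date(2011, 5, 31)


def classify_visitors(log, start, end):
    """Group visitor IDs into unique, new, repeat, and return visitors."""
    visits_in_period = {}
    seen_before = set()
    for visitor, day in log:
        if day < start:
            seen_before.add(visitor)
        elif day <= end:
            visits_in_period[visitor] = visits_in_period.get(visitor, 0) + 1
    unique = set(visits_in_period)
    return {
        "unique": unique,
        # First-ever visit falls inside the period (assumes a complete log).
        "new": unique - seen_before,
        # Two or more visits inside the period.
        "repeat": {v for v, n in visits_in_period.items() if n >= 2},
        # Visited in the period and also before it.
        "return": unique & seen_before,
    }


result = classify_visitors(visit_log, period_start, period_end)
```

Note that a visitor can fall into more than one category, for example both new and repeat, so the categories should not be summed to get Unique Visitors.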
Visit Characteristics
• Entry Page: The first page of a visit.
• Landing Page: A page intended to identify the beginning of the user experience.
• Exit Page: The last page on a site accessed during a visit, signifying the end of a visit/session.
• Visit Duration: The length of time in a session.
• Referrer: The referrer is the page URL that originally generated the request for the current page view or object.
• Click-through: Number of times a link was clicked by a visitor.
• Click-through Rate: The number of click-throughs for a specific link divided by the number of times that link was viewed.
• Page Views per Visit: The number of page views in a reporting period divided by number of visits in the same reporting period.
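As a rough sketch of how these visit metrics fall out of log records, the example below groups page views by visit and derives entry page, exit page, duration, and page views per visit. The record layout (visit ID, timestamp in seconds, page) and all sample values are assumptions made for illustration.

```python
# Hypothetical page views: (visit_id, timestamp_in_seconds, page)
page_views = [
    ("s1", 100, "/home"),
    ("s1", 160, "/products"),
    ("s1", 400, "/checkout"),
    ("s2", 500, "/blog"),
]


def summarize_visits(views):
    """Return entry page, exit page, duration, and page-view count per visit."""
    visits = {}
    for visit_id, ts, page in sorted(views, key=lambda v: (v[0], v[1])):
        visits.setdefault(visit_id, []).append((ts, page))
    summary = {}
    for visit_id, hits in visits.items():
        summary[visit_id] = {
            "entry_page": hits[0][1],
            "exit_page": hits[-1][1],
            # Last request minus first; time spent on the final page is not logged.
            "duration": hits[-1][0] - hits[0][0],
            "page_views": len(hits),
        }
    return summary


def click_through_rate(clicks, views_of_link):
    """Click-throughs for a link divided by the number of times it was viewed."""
    return clicks / views_of_link if views_of_link else 0.0


visit_summary = summarize_visits(page_views)
page_views_per_visit = len(page_views) / len(visit_summary)
```

The single-page visit `s2` gets a duration of zero under this approach, a known limitation of measuring duration from request timestamps alone.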
Content Characterization
• Page Exit Ratio: Number of exits from a page divided by total number of page views of that page
• Single Page Visits: Visits that consist of one page regardless of the number of times the page was viewed.
• Single Page View Visits (Bounces): Visits that consist of one page-view.
• Bounce Rate: Single page view visits divided by entry pages.
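A minimal sketch of the two ratios defined above, computed per page. It assumes each visit is available as an ordered list of page paths, and it reads "entry pages" as the number of visits that entered on the page in question; the sample visits are hypothetical.

```python
# Hypothetical visits: each is the ordered list of pages viewed.
visits = [
    ["/home"],
    ["/home", "/products"],
    ["/blog", "/home"],
    ["/home", "/products", "/home"],
]


def bounce_rate(visits, page):
    """Single page view visits on `page` divided by visits entering on `page`."""
    entries = [v for v in visits if v[0] == page]
    bounces = [v for v in entries if len(v) == 1]
    return len(bounces) / len(entries) if entries else 0.0


def page_exit_ratio(visits, page):
    """Exits from `page` divided by total page views of `page`."""
    exits = sum(1 for v in visits if v[-1] == page)
    views = sum(v.count(page) for v in visits)
    return exits / views if views else 0.0


home_bounce_rate = bounce_rate(visits, "/home")
home_exit_ratio = page_exit_ratio(visits, "/home")
```

The denominators differ on purpose: bounce rate is relative to entries on the page, while exit ratio is relative to all views of the page, so the two numbers are not directly comparable.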
Conversion Metrics
• Event: Any logged or recorded action that has a specific date and time assigned to it by either the browser or server
• Conversion: A visitor completing a target action
Translating these metrics
• Translating these metrics into meaningful and accurate knowledge is not always easy.
• Real world example – the hotel problem (excellent illustration of the importance of proper period selection)
The hotel
Rooms          Day 1  Day 2  Day 3  Day 4  Day 5  Day 6  Day 7  Weekly count
Room 1         Sam    Sam    Sam    Sam    Sam    Sam    Sam    1
Room 2         Ted    Scott  Ara    Chi    Tom    Yen    Tim    7
Room 3         Jane   Jane   Jane   Jane   Jane   Jane   Jane   1
Daily uniques  3      3      3      3      3      3      3
• Use Daily Uniques: Total Daily Uniques = 21
• Use Weekly Uniques: Total Weekly Uniques = 9
Bottom line: the time qualifier matters!
• So, one can’t just add daily uniques to get weekly uniques
• Have to scrub the data
• This is just one example of the many issues that one can face when digging into the data in order to get meaningful web analytics data!
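The hotel problem can be reproduced in a few lines. The room assignment below is a reconstruction from the slide (one guest staying all week in each of two rooms, and a different guest each day in the third), so treat the exact layout as an assumption; the totals of 21 and 9 are the ones stated on the slide.

```python
# Guests per room for each of the 7 days (layout assumed from the slide).
occupancy = {
    "Room 1": ["Sam"] * 7,
    "Room 2": ["Ted", "Scott", "Ara", "Chi", "Tom", "Yen", "Tim"],
    "Room 3": ["Jane"] * 7,
}

days = range(7)

# Daily uniques: distinct guests seen on each day, then summed over the week.
daily_uniques = [len({guests[d] for guests in occupancy.values()}) for d in days]
sum_of_daily_uniques = sum(daily_uniques)

# Weekly uniques: distinct guests seen at any point in the week.
weekly_uniques = len({g for guests in occupancy.values() for g in guests})
```

Summing the daily figures counts Sam and Jane seven times each, which is why uniques must be de-duplicated over the whole reporting period rather than added up.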
50 minutes = Can’t Cover Everything
• … some starting points for further reading
Research Work (mine)
• Book: Jansen, B. J., Spink, A., and Taksa, I. (2009) Handbook of Research on Web Log Analysis, Hershey, PA: Idea Group Publishing
– First chapter on theory of log analysis is free!
• Lecture: Jansen, B. J. (2009) Understanding User – Web Interactions via Web Analytics. Morgan-Claypool Lecture Series. Gary Marchionini (Ed.). Morgan-Claypool: San Rafael, CA.
– manuscript about Web Analytics, soup to nuts
– companion website (free): http://faculty.ist.psu.edu/jjansen/webanalytics/understanding_web_analytics.html
Research Work (mine)
• Article: Jansen, B. J. 2006. Search log analysis: What is it; what's been done; how to do it. Library and Information Science Research, 28(3), 407-432.
• http://faculty.ist.psu.edu/jjansen/academic/pubs/jansen_search_log_analysis.pdf
Great ‘how to’ books for web analytics
• Web Analytics: An Hour a Day by Avinash Kaushik (Jun 5, 2007)
• Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity by Avinash Kaushik (Oct 2009)
• Advanced Web Metrics with Google Analytics, 2nd Edition by Brian Clifton (Mar 15, 2010)
• Web Analytics Demystified: A Marketer's Guide to Understanding How Your Web Site Affects Your Business by Eric Peterson (Mar 2004)
Thanks! (welcome questions / discussion!)
• Before we end …
Follow-on Discussion
• Happy to chat with anyone (get with me either today or contact me via email)
• Email [email protected]
• LinkedIn http://www.linkedin.com/in/jjansen
• Twitter jimjansen
Again, thanks!