ISM 6910 Week 8 - Testing & Optimization
Post on 15-May-2015
Testing & Optimization (ISM 6910 – Week 8)
Week 8 Topics
• Testing
• End Action
• Attribution
• Media Mix Modeling
Testing
Testing
A/B Tests vs. Multivariate Tests:
[Figure: Test grid for an Expedia banner – hero images (Expedia as Hero, It's Time, Offer Only, Numbers), calls to action (Go, Search Now, Book Now), and offers (Up to 50% off, Book together and Save, Start Saving). Filled cells were executed during the test; open cells were evaluated but not executed.]
A/B Test Example: Videos continue to score higher in NSAT and PSAT, but underperform in conversions.

NSAT Results – FYQ2 2011 (results with video were statistically significantly higher)

                        Unique     NSAT   PSAT   FPP Upgrade   Avg. Revenue
                        Visitors                 Conv. Rate    (All SKUs)
Compare w/o Video       186,330    108    134    0.35%         $1.13
Compare with Video      187,185    124    136    0.30%         $0.95
Lift                               16     2      (0.05%)       ($0.18)

*PSAT lift only has a statistical significance of 91%; all others are +99%.

FYQ2 results are in line with September's findings, which showed that adding a video to the compare page has had a positive impact on visitors' NSAT and PSAT scores.

A possible downside to adding more videos is that they may serve as a distraction, causing visitors to miss the Buy Now button and lowering conversion rates.

Numbers have been doctored to hide client-sensitive data.
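Whether a conversion-rate difference like the one above is statistically significant can be checked with a standard two-proportion z-test. A minimal sketch, deriving approximate purchase counts from the doctored visitor counts and rates in the table (the slide does not say which test was actually used):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Approximate counts from the doctored figures:
# 0.35% of 186,330 visitors vs. 0.30% of 187,185 visitors
z, p = two_proportion_z(round(0.0035 * 186_330), 186_330,
                        round(0.0030 * 187_185), 187_185)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these inputs the p-value lands below 0.01, consistent with the slide's note that the non-PSAT lifts are significant at the +99% level.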
Multivariate Tests
Full factorial – test every possible variation. For example, if you are testing three elements, one with four variations and two with three variations each, you are looking at 4 x 3 x 3 = 36 combinations.
Partial factorial – partial factorial tests can be set up in a way that allows you to infer results from only a subset of the combinations; the Taguchi method is probably the most commonly used approach.
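A full factorial design is just the Cartesian product of every element's variations. A quick sketch using the variations from the Expedia example:

```python
from itertools import product

# The three tested elements and their variations (from the Expedia example)
heroes = ["Expedia as Hero", "It's Time", "Offer Only", "Numbers"]
ctas = ["Go", "Search Now", "Book Now"]
offers = ["Up to 50% off", "Book together and Save", "Start Saving"]

# Every possible combination = one test cell
cells = list(product(heroes, ctas, offers))
print(len(cells))  # 4 x 3 x 3 = 36 combinations
```

A partial factorial design would run only a structured subset of these 36 cells and infer the rest.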
[Figure: The same Expedia test grid, annotated by tested element – Image (hero), Call to Action, and Offer/Message. Filled cells were executed during the test; open cells were evaluated but not executed.]
[Chart: Shopping Rate (per million cookies) by Call to Action – Book Now 165, Search Now 191, Go 180]
[Chart: Shopping Rate (per million cookies) by Image – Tanning 157, Beach Cruise 200, Bay Bridge 179]
Multivariate Example:
[Chart: Shopping Rate (per million cookies) by Hero – Expedia as Hero 176, It's Time 159, Offer Only 201, Numbers 179]
[Chart: Shopping Rate (per million cookies) by Offer – Book Together and Save 174, 50% off Hotels 183, Generic 180]
Pros and Cons

A/B Test
Pros:
• Set up is relatively easy
• Analysis is easier
• Don't need much of a stats background to interpret results
Cons:
• Easy to get sucked into testing too many things at once
• A and B need to be different enough to get results
• Time consuming to test one element at a time

Multivariate Test
Pros:
• Less political push back; everyone gets to test their idea
• Get all of the analysis done in one shot
Cons:
• Easy to mess up
• Tools are a black box, or do it yourself + your best PhD stats buddy
• Need a lot of volume or time
Testing Recommendations
Start with high-impact tests:
• Test home/landing pages
• Test conversions, i.e. sign-up forms, cart/purchase pages, etc.
• Test ad design
• Price tests (a hard one politically to pull off)
Other great things to test:
• Test landing page/deep linking
• Page heroes
Testing Best Practices
• Start with a hypothesis – don't just start testing random stuff like colors unless you have a good reason.
• Set goals – e.g. looking to improve conversion rate by x%.
• Decide what is significant – we're not testing drugs and no one's life is on the line, so 99.9% statistical significance is probably overkill; but what about, say, 60%?
More Testing Tips
• Get help – Setting up the test, searching through your old stats notes can be a challenge. Don’t
• Make it fun/interesting – It takes a lot to pull of a good test: UX, creative team, site dev, analysts, and maybe more. Plus someone’s budget. Everyone has an opinion and/or theory, you can use that to get momentum for a testing project.
At Getty we held a company wide contest to see who could pick the winner of a multivariate test. There were +300 possible combinations and everyone got to vote on which one they thought would be the winner.
End Action – Site Surveys
How It Works
• NSAT
• PSAT
• Value Prop
• Purch. Intent
Attitudinal End Action Process: EA captures both behavioral and attitudinal data and correlates shifts in attitude with end actions taken on site.
What is it good for?

Combining attitudinal & behavioral data
The End Action scorecard was originally designed to value experiences based on shifts in attitudes. For Q4 2010, we added Microsoft Store purchase behavior as well.

End Action conversion rates
Using End Action cookie data we can report a more accurate conversion rate. If we assume most site visits don't last longer than 30 minutes, we can conclude that less than half of Store buyers (43%) make a purchase during their first site visit. The remaining purchasers return later (sometimes days later) to complete their purchase. Using End Action cookie data, site visitors who read a product review, leave the Shop page, and return later to finally make a purchase will still be counted when reporting on site visitors who read a product review and then made a purchase.
Challenges
Measurement overload
EA measures correlations, not causation
Example: People who watch 7 Second demos have 10% higher Win.com NSAT than people who don't watch demos.
However, EA quantifies correlation, not causation:
• We cannot immediately say that watching videos makes people 10% more satisfied
• That claim requires additional information, such as specific testing, observation, and insight
Survey timing can create respondent biases
Site visitors are invited to take the EA survey as soon as they leave the Windows domain. So, as site visitors move further down the funnel, survey respondents start to look more like visitors who are abandoning their cart rather than purchasers. This can be seen in the illustration below.
In this example, Visitor A takes the survey and will be included in the NSAT results for the Visit Shop EA, but does not purchase, while Visitor B completes the purchase process but, by doing so, never receives a survey invite.
Key Insights
Compare Page Video: Videos continue to score higher in NSAT and PSAT, but underperform in conversions.

NSAT Results – FYQ2 2011 (results with video were statistically significantly higher)

                        Unique     NSAT   PSAT   FPP Upgrade   Avg. Revenue
                        Visitors                 Conv. Rate    (All SKUs)
Compare w/o Video       186,330    108    134    0.35%         $1.13
Compare with Video      187,185    124    136    0.30%         $0.95
Lift                               16     2      (0.05%)       ($0.18)

*PSAT lift only has a statistical significance of 91%; all others are +99%.

FYQ2 results are in line with September's findings, which showed that adding a video to the compare page has had a positive impact on visitors' NSAT and PSAT scores.

A possible downside to adding more videos is that they may serve as a distraction, causing visitors to miss the Buy Now button and lowering conversion rates.
Attitudes Influence Buying Behavior
As site visitors move deeper into the site and further down the purchase funnel, we start to see an increase in both site satisfaction (NSAT) and the Windows 7 Upgrade conversion rate. Based on the EA survey data, we know that we have some levers for improving site satisfaction: using video or interactive experiences, providing value-added downloads, etc. From this data, we can see that by first improving NSAT, we can push more people into a transactional mode on the site.
(1) NSAT % Δ from FYQ4 EAA Scorecard. (2) FPP Upgrade # Δ from FYQ4 Sales Trans Scorecard. Note: Purchase NSAT & Video Conv. Rate were not statistically significant by +/-5%.
*Conversion rate = purchasers who took the end action / count of unique cookies who took the end action.
Target Content = Higher Scores
Windows 7 visitors – Visitors who visited the Compare pages, Anytime Upgrade, and Features pages had higher NSAT scores, while less relevant pages like the Upgrade Advisor and the Get Win7 default page scored lower.
Vista visitors – Vista users who visited the Compare pages and Upgrade Advisor related pages had higher NSAT scores. The less relevant Anytime Upgrade pages scored lower.
XP visitors – Similar to the Vista users, the Compare pages and Upgrade Advisor related pages had higher NSAT scores, while the less relevant Anytime Upgrade pages scored lower.
Multi Touch Attribution
Ad Conversions
When a user clicks on an ad, they are redirected through Atlas to the destination page. Atlas records the click and redirects the user to the destination page.
[Diagram: Atlas Ad Server (img server, CDN) serving the ad; an Atlas 1x1 action tag fires on the landing page]
If the site has an action tag on the landing page, the visit can now be directly tied back to the ad. Atlas can then tie each ad impression and click back to the action tag (per cookie). This data is then used to optimize the ad campaign.
GA Video: http://youtu.be/Cz4yHOKE5j8
Advanced Attribution: Details
Problem: Ad-server rules are heavily biased in favor of click-based and "last-touch" exposures (i.e. branded search) and undervalue a person's history of exposure to display media.
Objective: Correct this bias by reallocating credit for conversions in proportion to the relative contribution of past exposures.
Approach: Model cookie-exposure history to estimate relative contribution. Use model estimates to "score" the individual placements, awarding each placement some, all, or no credit for a cookie's conversion.
Action: Media planners may optimize online media budget, either during or after a campaign, towards those publishers and engagements that drive the greatest ROI.
There are several approaches:

Method I: Even Distribution
• Score = 1/n (n is the total exposure frequency)
• [Diagram: each of the n touch points before conversion C receives credit 1/n]
• Simple approach, but flawed in that it's really a "welfare state" for media and does not address relative efficacy

Method II: Recency-weighted Attribution
• Score is assigned according to each touch point's time distance to conversion
• Special weight might be given to the first and last touch points
• [Diagram: touch points S.t1 through S.tn before conversion C, weighted by recency]
• More nuanced approach that differentiates by recency, but does not account for relative performance differences between formats

Method III: Probabilistic Attribution
• Weight is given according to the change in conversion probability from exposure to each ad
• Probability is calculated from predictive models of ad frequency and attributes
• [Diagram: touch points credited ΔP1 through ΔP8 before conversion C]
• More complex performance-based approach that uses the change in historical conversion probability per exposure to allocate credit
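The three credit-allocation rules can be sketched in a few lines each. This is an illustration, not the production models: the exponential decay in Method II and the probability deltas in Method III are stand-ins for whatever weighting and predictive model a real implementation would use.

```python
def even_credit(n_touches):
    """Method I: each of the n exposures gets 1/n of the conversion."""
    return [1 / n_touches] * n_touches

def recency_credit(ages_days, half_life=7.0):
    """Method II: weight decays with time distance to conversion
    (exponential decay chosen for illustration)."""
    weights = [0.5 ** (age / half_life) for age in ages_days]
    total = sum(weights)
    return [w / total for w in weights]

def probabilistic_credit(deltas):
    """Method III: credit proportional to each exposure's lift in
    modeled conversion probability; negative lifts get no credit."""
    lifts = [max(d, 0.0) for d in deltas]
    total = sum(lifts)
    return [l / total for l in lifts] if total else even_credit(len(deltas))

print(even_credit(4))                              # equal shares
print(recency_credit([14, 7, 1, 0]))               # most recent touch weighted highest
print(probabilistic_credit([0.01, -0.002, 0.03, 0.02]))
```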
Outcome Example
Using the conversion rates under the attribution model, certain placements and networks look better or worse; this directly affects how and where the media team purchases ad placements.
Incremental revenue from attribution
Incremental revenue increase is calculated by comparing attribution media optimization against last-touch media optimization. It varies with the degree of optimization shift from least to most efficient media:
• 5% optimization: +$15 million (+2.67%) incremental revenue increase
• 10% optimization: +$29 million (+5.09%) incremental revenue increase
• 15% optimization: +$45 million (+7.95%) incremental revenue increase
[Chart: Revenue with optimization, Standard Last Touch vs. Razorfish Advanced Attribution – Base $563,957,214 for both; lowest 5% shifted: $576,048,162 vs. $591,141,240 (+$15MM); lowest 10%: $579,374,307 vs. $608,291,365 (+$29MM); lowest 15%: $582,368,257 vs. $627,830,612 (+$45MM)]
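The $15/$29/$45 million figures can be reproduced directly from the two revenue series shown in the chart (values as read off the chart):

```python
# Revenue series from the chart: last-touch vs. advanced attribution,
# at 5%, 10%, and 15% optimization shifts
last_touch = [576_048_162, 579_374_307, 582_368_257]
advanced = [591_141_240, 608_291_365, 627_830_612]

increments = [adv - lt for lt, adv in zip(last_touch, advanced)]
for shift, incr in zip(("5%", "10%", "15%"), increments):
    print(f"{shift} optimization: +${incr / 1e6:.0f}MM")
```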
Case Study
48% lift in Paid Search Click-Through Rate due to Banner Ad Exposure
[Chart: Paid search CTR – Control 0.69% vs. Test 1.02%]
• Test group was exposed to client media when encountering campaign placements
• Control group was exposed to PSA media when encountering campaign placements
• Across clients and advertisers, banner exposure consistently drives incremental search clicks and conversions
• Clearly, some portion of credit for search conversion belongs to prior display (and other media) exposure
• Attribution quantifies the relative contribution of each touch point and allocates credit accordingly
The example is from an apparel retailer. We ran a "true lift test": we held out a random control group from all display media for a period and evaluated performance differences between control and exposed. These results are consistent with similar tests run for other clients.
Media Mix Models
[Diagram: TV, Radio, Display, Mobile, and Cinema all driving Conversions]
Media Mix Models
Problem: When multi-channel marketing efforts occur simultaneously, it can be hard to identify which of these channels is responsible for conversions. Answers are difficult to come by when direct measurement of individual-level exposure is not feasible (e.g. OOH, TV, etc.).
Objective: Create a model that accurately reflects how well each channel operates within the general business/marketing environment.
Approach: Use daily (or weekly) tracking data to specify the relationship between channel activity and conversion volume. Incorporate into the models channel-specific accumulation and decay effects as well as relevant macroeconomic indicators and historical events.
Action: Using the results to estimate each channel's point of diminishing returns, the optimal spend per channel is appraised for future campaigns.
Factors and media effects
The most important aim of the attribution analysis is to get to the relationship between media spend and the KPI that we are optimizing for. In order to get there, we need to understand how each media type impacts the KPIs and the other channels.
Ad stocking effects
Adding the ad stocking (adstock) effect of media to the model helps account for the diminishing effect of an ad over time. The chart below shows the approximate half-life of each media type modeled. Note that some media types have a longer half-life than others; the effect of a TV ad, for example, tends to last longer than that of a banner ad.
[Diagram: Optimizer inputs – Ad Stocking Effects, Effectiveness Curves, Media Cost Curves, Total Budget]
[Chart: Typical Half-Lives by Media Type]
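A geometric adstock transform is one common way to encode this carry-over effect; a minimal sketch, with half-life values that are illustrative rather than the deck's modeled figures:

```python
def adstock(spend, half_life):
    """Geometric adstock: each period retains a fraction of the prior
    period's accumulated effect, where decay = 0.5 ** (1 / half_life)."""
    decay = 0.5 ** (1.0 / half_life)
    carried, out = 0.0, []
    for x in spend:
        carried = x + decay * carried
        out.append(carried)
    return out

# Illustrative: a TV pulse persists longer than a banner pulse
weekly_spend = [100, 0, 0, 0, 0]
print(adstock(weekly_spend, half_life=4))  # slow decay (TV-like)
print(adstock(weekly_spend, half_life=1))  # halves every week (banner-like)
```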
Media effectiveness curves
The effectiveness of media diminishes as the volume of exposure is increased. Eventually an incremental change in media will have little to no effect on the reached audience: the saturation point. Each media type reaches its saturation point at a different level of exposure (GRPs).
[Chart: Diminishing Returns by Media Type]
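One simple functional form for such a saturating response is an exponential reach curve. A sketch under assumed parameters (the saturation level and slope below are made up for illustration, not modeled values):

```python
import math

def response(grps, saturation, slope):
    """Saturating effectiveness curve: approaches `saturation`
    as exposure (GRPs) grows; `slope` sets how fast returns diminish."""
    return saturation * (1 - math.exp(-slope * grps))

def marginal(grps, saturation, slope, step=1.0):
    """Incremental response from one more unit of exposure."""
    return (response(grps + step, saturation, slope)
            - response(grps, saturation, slope))

# Marginal return shrinks as exposure climbs toward saturation
for g in (0, 100, 400):
    print(g, round(marginal(g, saturation=1000, slope=0.01), 2))
```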
Media cost effects
Media reach curves:
• Inventory constraints for each media type.
• Planner judgment on maximum feasible investment levels.
Media cost curves:
• These reflect how media costs scale as spend scales.
• They need to capture realities such as increasing cost per reach point, seasonality, etc., in order to pragmatically reflect the media landscape.
Budget effects
Because the saturation point and the level of effectiveness change at a different rate for each media type, the optimal mix for each channel will change with the overall media budget. The example below shows how the optimal mix of spend shifts from one media type to another depending on the level of spend.
[Chart: Diminishing Returns by Media Spend – Budget A vs. Budget B]
Optimization
The optimizer takes into account all of the factors (ad stocking, diminishing returns, cost, and inventory constraints) and, through an iterative process, chooses the optimal media channel for each incremental dollar spent.
[Chart: Diminishing Returns by Media Spend – Final Optimized Results]
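The iterative process described above can be sketched as a greedy loop: each budget increment goes to whichever channel currently offers the highest marginal response. The channel names, saturation levels, and slopes below are illustrative placeholders, not modeled curves, and a real optimizer would also apply cost and inventory constraints:

```python
import math

# Illustrative saturating response curves per channel: (saturation, slope)
CURVES = {
    "tv": (1000, 0.004),
    "display": (400, 0.010),
    "radio": (250, 0.020),
}

def response(spend, saturation, slope):
    """Saturating channel response at a given spend level."""
    return saturation * (1 - math.exp(-slope * spend))

def optimize(total_budget, step=10.0):
    """Greedy allocation: give each `step` dollars to the channel with
    the highest marginal response at its current spend level."""
    spend = {ch: 0.0 for ch in CURVES}
    allocated = 0.0
    while allocated < total_budget:
        best = max(
            CURVES,
            key=lambda ch: response(spend[ch] + step, *CURVES[ch])
                           - response(spend[ch], *CURVES[ch]),
        )
        spend[best] += step
        allocated += step
    return spend

print(optimize(1000))
```

Because each channel's marginal return falls as its spend grows, the winning channel changes as the budget climbs, which is exactly why the optimal mix shifts with the total budget.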