ISM 6910 Week 8 - Testing & Optimization
Post on 15-May-2015
Testing & Optimization (ISM 6910 – Week 8)
Week 8 Topics
• Testing
• End Action
• Attribution
• Media Mix Modeling
Testing
Testing
A/B Tests vs. Multivariate Tests:
[Figure: Test grid for an Expedia banner – hero images (Expedia as Hero, It's Time, Offer Only, Numbers), calls to action (Go, Search Now, Book Now), and offers (Up to 50% off, Book together and Save, Start Saving). Filled cells were executed during the test; open cells were evaluated but not executed.]
A/B Test Example: Videos continue to score higher in NSAT and PSAT, but underperform in conversions.

NSAT Results – FYQ2 2011 (results with video were statistically significantly higher)

                        Unique     NSAT   PSAT   FPP Upgrade   Avg. Revenue
                        Visitors                 Conv. Rate    (All SKUs)
Compare w/o Video       186,330    108    134    0.35%         $1.13
Compare with Video      187,185    124    136    0.30%         $0.95
Lift                               16     2      (0.05%)       ($0.18)

*PSAT lift only has a statistical significance of 91%; all others are +99%.

FYQ2 results are in line with September's findings, which showed that adding a video to the compare page has had a positive impact on visitors' NSAT and PSAT scores.

A possible downside to adding more videos is that they may serve as a distraction, causing visitors to miss the Buy Now button and lowering conversion rates.

Numbers have been doctored to hide client-sensitive data.
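Whether a conversion-rate difference like the one above is statistically significant can be checked with a standard two-proportion z-test. A minimal sketch, deriving approximate purchase counts from the doctored visitor counts and rates in the table (the slide does not say which test was actually used):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Approximate counts from the doctored figures:
# 0.35% of 186,330 visitors vs. 0.30% of 187,185 visitors
z, p = two_proportion_z(round(0.0035 * 186_330), 186_330,
                        round(0.0030 * 187_185), 187_185)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these inputs the p-value lands below 0.01, consistent with the slide's note that the non-PSAT lifts are significant at the +99% level.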
Multivariate Tests
Full factorial – test every possible variation. For example, if you are testing three elements, one with four variations and two with three variations each, you are looking at 4 x 3 x 3 = 36 combinations.
Partial factorial – partial factorial tests can be set up in a way that allows you to infer results from only a subset of the combinations; the Taguchi method is probably the most commonly used approach.
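A full factorial design is just the Cartesian product of every element's variations. A quick sketch using the variations from the Expedia example:

```python
from itertools import product

# The three tested elements and their variations (from the Expedia example)
heroes = ["Expedia as Hero", "It's Time", "Offer Only", "Numbers"]
ctas = ["Go", "Search Now", "Book Now"]
offers = ["Up to 50% off", "Book together and Save", "Start Saving"]

# Every possible combination = one test cell
cells = list(product(heroes, ctas, offers))
print(len(cells))  # 4 x 3 x 3 = 36 combinations
```

A partial factorial design would run only a structured subset of these 36 cells and infer the rest.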
[Figure: The same Expedia test grid, annotated by tested element – Image (hero), Call to Action, and Offer/Message. Filled cells were executed during the test; open cells were evaluated but not executed.]
[Chart: Shopping Rate (per million cookies) by Call to Action – Book Now 165, Search Now 191, Go 180]
[Chart: Shopping Rate (per million cookies) by Image – Tanning 157, Beach Cruise 200, Bay Bridge 179]
Multivariate Example:
[Chart: Shopping Rate (per million cookies) by Hero – Expedia as Hero 176, It's Time 159, Offer Only 201, Numbers 179]
[Chart: Shopping Rate (per million cookies) by Offer – Book Together and Save 174, 50% off Hotels 183, Generic 180]
Pros and Cons

A/B Test
Pros:
• Set up is relatively easy
• Analysis is easier
• Don't need much of a stats background to interpret results
Cons:
• Easy to get sucked into testing too many things at once
• A and B need to be different enough to get results
• Time consuming to test one element at a time

Multivariate Test
Pros:
• Less political push back; everyone gets to test their idea
• Get all of the analysis done in one shot
Cons:
• Easy to mess up
• Tools are a black box, or do it yourself + your best PhD stats buddy
• Need a lot of volume or time
Testing Recommendations
Start with high-impact tests:
• Test home/landing pages
• Test conversions, i.e. sign-up forms, cart/purchase pages, etc.
• Test ad design
• Price tests (a hard one politically to pull off)
Other great things to test:
• Test landing page/deep linking
• Page heroes
Testing Best Practices
• Start with a hypothesis – don't just start testing random stuff like colors unless you have a good reason.
• Set goals – e.g. looking to improve conversion rate by x%.
• Decide what is significant – we're not testing drugs and no one's life is on the line, so 99.9% statistical significance is probably overkill; but what about, say, 60%?
More Testing Tips
• Get help – Setting up the test, searching through your old stats notes can be a challenge. Don’t
• Make it fun/interesting – It takes a lot to pull of a good test: UX, creative team, site dev, analysts, and maybe more. Plus someone’s budget. Everyone has an opinion and/or theory, you can use that to get momentum for a testing project.
At Getty we held a company wide contest to see who could pick the winner of a multivariate test. There were +300 possible combinations and everyone got to vote on which one they thought would be the winner.
End Action – Site Surveys
How It Works
• NSAT
• PSAT
• Value Prop
• Purch. Intent
Attitudinal End Action Process: EA captures both behavioral and attitudinal data and correlates shifts in attitude with end actions taken on site.
What is it good for?

Combining attitudinal & behavioral data
The End Action scorecard was originally designed to value experiences based on shifts in attitudes. For Q4 2010, we added Microsoft Store purchase behavior as well.

End Action conversion rates
Using End Action cookie data we can report a more accurate conversion rate. If we assume most site visits don't last longer than 30 minutes, we can conclude that less than half of Store buyers (43%) make a purchase during their first site visit. The remaining purchasers return later (sometimes days later) to complete their purchase. Using End Action cookie data, site visitors who read a product review, leave the Shop page, and return later to finally make a purchase will still be counted when reporting on site visitors who read a product review and then made a purchase.
Challenges
Measurement overload
EA measures correlations, not causation
Example: People who watch 7 Second demos have 10% higher Win.com NSAT than people who don't watch demos.
However, EA quantifies correlation, not causation:
• We cannot immediately say that watching videos makes people 10% more satisfied
• That claim requires additional information, such as specific testing, observation, and insight
Survey timing can create respondent biases
Site visitors are invited to take the EA survey as soon as they leave the Windows domain. So, as site visitors move further down the funnel, survey respondents start to look more like visitors who are abandoning their cart rather than purchasers. This can be seen in the illustration below.
In this example, Visitor A takes the survey and will be included in the NSAT results for the Visit Shop EA, but does not purchase, while Visitor B completes the purchase process but, by doing so, never receives a survey invite.
Key Insights
Compare Page Video: Videos continue to score higher in NSAT and PSAT, but underperform in conversions.

NSAT Results – FYQ2 2011 (results with video were statistically significantly higher)

                        Unique     NSAT   PSAT   FPP Upgrade   Avg. Revenue
                        Visitors                 Conv. Rate    (All SKUs)
Compare w/o Video       186,330    108    134    0.35%         $1.13
Compare with Video      187,185    124    136    0.30%         $0.95
Lift                               16     2      (0.05%)       ($0.18)

*PSAT lift only has a statistical significance of 91%; all others are +99%.

FYQ2 results are in line with September's findings, which showed that adding a video to the compare page has had a positive impact on visitors' NSAT and PSAT scores.

A possible downside to adding more videos is that they may serve as a distraction, causing visitors to miss the Buy Now button and lowering conversion rates.
Attitudes Influence Buying Behavior
As site visitors move deeper into the site and further down the purchase funnel, we start to see an increase in both site satisfaction (NSAT) and the Windows 7 Upgrade conversion rate. Based on the EA survey data, we know that we have some levers for improving site satisfaction: using video or interactive experiences, providing value-added downloads, etc. From this data, we can see that by first improving NSAT, we can push more people into a transactional mode on the site.
(1) NSAT % Δ from FYQ4 EAA Scorecard. (2) FPP Upgrade # Δ from FYQ4 Sales Trans Scorecard. Note: Purchase NSAT & Video Conv. Rate were not statistically significant by +/-5%.
*Conversion rate = purchasers who took the end action / count of unique cookies who took the end action.
Target Content = Higher Scores
Windows 7 visitors – Visitors who visited the Compare pages, Anytime Upgrade, and Features pages had higher NSAT scores, while less relevant pages like the Upgrade Advisor and the Get Win7 default page scored lower.
Vista visitors – Vista users who visited the Compare pages and Upgrade Advisor related pages had higher NSAT scores. The less relevant Anytime Upgrade pages scored lower.
XP visitors – Similar to the Vista users, the Compare pages and Upgrade Advisor related pages had higher NSAT scores, while the less relevant Anytime Upgrade pages scored lower.
Multi Touch Attribution
Ad Conversions
When a user clicks on an ad, they are redirected through Atlas to the destination page. Atlas records the click and redirects the user to the destination page.
[Diagram: Atlas Ad Server (img server, CDN) serving the ad; an Atlas 1x1 action tag fires on the landing page]
If the site has an action tag on the landing page, the visit can now be directly tied back to the ad. Atlas can then tie each ad impression and click back to the action tag (per cookie). This data is then used to optimize the ad campaign.
GA Video: http://youtu.be/Cz4yHOKE5j8
Advanced Attribution: Details
Problem: Ad-server rules are heavily biased in favor of click-based and "last-touch" exposures (i.e. branded search) and undervalue a person's history of exposure to display media.
Objective: Correct this bias by reallocating credit for conversions in proportion to the relative contribution of past exposures.
Approach: Model cookie-exposure history to estimate relative contribution. Use model estimates to "score" the individual placements, awarding each placement some, all, or no credit for a cookie's conversion.
Action: Media planners may optimize online media budget, either during or after a campaign, towards those publishers and engagements that drive the greatest ROI.
There are several approaches:

Method I: Even Distribution
• Score = 1/n (n is the total exposure frequency)
• [Diagram: each of the n touch points before conversion C receives credit 1/n]
• Simple approach, but flawed in that it's really a "welfare state" for media and does not address relative efficacy

Method II: Recency-weighted Attribution
• Score is assigned according to each touch point's time distance to conversion
• Special weight might be given to the first and last touch points
• [Diagram: touch points S.t1 through S.tn before conversion C, weighted by recency]
• More nuanced approach that differentiates by recency, but does not account for relative performance differences between formats

Method III: Probabilistic Attribution
• Weight is given according to the change in conversion probability from exposure to each ad
• Probability is calculated from predictive models of ad frequency and attributes
• [Diagram: touch points credited ΔP1 through ΔP8 before conversion C]
• More complex performance-based approach that uses the change in historical conversion probability per exposure to allocate credit
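The three credit-allocation rules can be sketched in a few lines each. This is an illustration, not the production models: the exponential decay in Method II and the probability deltas in Method III are stand-ins for whatever weighting and predictive model a real implementation would use.

```python
def even_credit(n_touches):
    """Method I: each of the n exposures gets 1/n of the conversion."""
    return [1 / n_touches] * n_touches

def recency_credit(ages_days, half_life=7.0):
    """Method II: weight decays with time distance to conversion
    (exponential decay chosen for illustration)."""
    weights = [0.5 ** (age / half_life) for age in ages_days]
    total = sum(weights)
    return [w / total for w in weights]

def probabilistic_credit(deltas):
    """Method III: credit proportional to each exposure's lift in
    modeled conversion probability; negative lifts get no credit."""
    lifts = [max(d, 0.0) for d in deltas]
    total = sum(lifts)
    return [l / total for l in lifts] if total else even_credit(len(deltas))

print(even_credit(4))                              # equal shares
print(recency_credit([14, 7, 1, 0]))               # most recent touch weighted highest
print(probabilistic_credit([0.01, -0.002, 0.03, 0.02]))
```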
Outcome Example
Using the conversion rates under the attribution model, certain placements and networks look better or worse; this directly affects how and where the media team purchases ad placements.
Incremental revenue from attribution
Incremental revenue increase is calculated by comparing attribution media optimization against last-touch media optimization. It varies with the degree of optimization shift from least to most efficient media:
• 5% optimization: +$15 million (+2.67%) incremental revenue increase
• 10% optimization: +$29 million (+5.09%) incremental revenue increase
• 15% optimization: +$45 million (+7.95%) incremental revenue increase
[Chart: Revenue with optimization, Standard Last Touch vs. Razorfish Advanced Attribution – Base $563,957,214 for both; lowest 5% shifted: $576,048,162 vs. $591,141,240 (+$15MM); lowest 10%: $579,374,307 vs. $608,291,365 (+$29MM); lowest 15%: $582,368,257 vs. $627,830,612 (+$45MM)]
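The $15/$29/$45 million figures can be reproduced directly from the two revenue series shown in the chart (values as read off the chart):

```python
# Revenue series from the chart: last-touch vs. advanced attribution,
# at 5%, 10%, and 15% optimization shifts
last_touch = [576_048_162, 579_374_307, 582_368_257]
advanced = [591_141_240, 608_291_365, 627_830_612]

increments = [adv - lt for lt, adv in zip(last_touch, advanced)]
for shift, incr in zip(("5%", "10%", "15%"), increments):
    print(f"{shift} optimization: +${incr / 1e6:.0f}MM")
```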
Case Study
48% lift in Paid Search Click-Through Rate due to Banner Ad Exposure
[Chart: Paid search CTR – Control 0.69% vs. Test 1.02%]
• Test group was exposed to client media when encountering campaign placements
• Control group was exposed to PSA media when encountering campaign placements
• Across clients and advertisers, banner exposure consistently drives incremental search clicks and conversions
• Clearly, some portion of credit for search conversion belongs to prior display (and other media) exposure
• Attribution quantifies the relative contribution of each touch point and allocates credit accordingly
The example is from an apparel retailer. We ran a "true lift test": we held out a random control group from all display media for a period and evaluated performance differences between control and exposed. These results are consistent with similar tests run for other clients.
Media Mix Models
[Diagram: TV, Radio, Display, Mobile, and Cinema all driving Conversions]
Media Mix Models
Problem: When multi-channel marketing efforts occur simultaneously, it can be hard to identify which of these channels is responsible for conversions. Answers are difficult to come by when direct measurement of individual-level exposure is not feasible (e.g. OOH, TV, etc.).
Objective: Create a model that accurately reflects how well each channel operates within the general business/marketing environment.
Approach: Use daily (or weekly) tracking data to specify the relationship between channel activity and conversion volume. Incorporate into the models channel-specific accumulation and decay effects as well as relevant macroeconomic indicators and historical events.
Action: Using the results to estimate each channel's point of diminishing returns, the optimal spend per channel is appraised for future campaigns.
Factors and media effects
The most important aim of the attribution analysis is to get to the relationship between media spend and the KPI that we are optimizing for. In order to get there, we need to understand how each media type impacts the KPIs and the other channels.
Ad stocking effects
Adding the ad stocking (adstock) effect of media to the model helps account for the diminishing effect of an ad over time. The chart below shows the approximate half-life of each media type modeled. Note that some media types have a longer half-life than others; the effect of a TV ad, for example, tends to last longer than that of a banner ad.
[Diagram: Optimizer inputs – Ad Stocking Effects, Effectiveness Curves, Media Cost Curves, Total Budget]
[Chart: Typical Half-Lives by Media Type]
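A geometric adstock transform is one common way to encode this carry-over effect; a minimal sketch, with half-life values that are illustrative rather than the deck's modeled figures:

```python
def adstock(spend, half_life):
    """Geometric adstock: each period retains a fraction of the prior
    period's accumulated effect, where decay = 0.5 ** (1 / half_life)."""
    decay = 0.5 ** (1.0 / half_life)
    carried, out = 0.0, []
    for x in spend:
        carried = x + decay * carried
        out.append(carried)
    return out

# Illustrative: a TV pulse persists longer than a banner pulse
weekly_spend = [100, 0, 0, 0, 0]
print(adstock(weekly_spend, half_life=4))  # slow decay (TV-like)
print(adstock(weekly_spend, half_life=1))  # halves every week (banner-like)
```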
Media effectiveness curves
The effectiveness of media diminishes as the volume of exposure is increased. Eventually an incremental change in media will have little to no effect on the reached audience: the saturation point. Each media type reaches its saturation point at a different level of exposure (GRPs).
[Chart: Diminishing Returns by Media Type]
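One simple functional form for such a saturating response is an exponential reach curve. A sketch under assumed parameters (the saturation level and slope below are made up for illustration, not modeled values):

```python
import math

def response(grps, saturation, slope):
    """Saturating effectiveness curve: approaches `saturation`
    as exposure (GRPs) grows; `slope` sets how fast returns diminish."""
    return saturation * (1 - math.exp(-slope * grps))

def marginal(grps, saturation, slope, step=1.0):
    """Incremental response from one more unit of exposure."""
    return (response(grps + step, saturation, slope)
            - response(grps, saturation, slope))

# Marginal return shrinks as exposure climbs toward saturation
for g in (0, 100, 400):
    print(g, round(marginal(g, saturation=1000, slope=0.01), 2))
```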
Media cost effects
Media reach curves:
• Inventory constraints for each media type.
• Planner judgment on maximum feasible investment levels.
Media cost curves:
• These reflect how media costs scale as spend scales.
• They need to capture realities such as increasing cost per reach point, seasonality, etc., in order to pragmatically reflect the media landscape.
Budget effects
Because the saturation point and the level of effectiveness change at a different rate for each media type, the optimal mix for each channel will change with the overall media budget. The example below shows how the optimal mix of spend shifts from one media type to another depending on the level of spend.
[Chart: Diminishing Returns by Media Spend – Budget A vs. Budget B]
Optimization
The optimizer takes into account all of the factors (ad stocking, diminishing returns, cost, and inventory constraints) and, through an iterative process, chooses the optimal media channel for each incremental dollar spent.
[Chart: Diminishing Returns by Media Spend – Final Optimized Results]
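The iterative process described above can be sketched as a greedy loop: each budget increment goes to whichever channel currently offers the highest marginal response. The channel names, saturation levels, and slopes below are illustrative placeholders, not modeled curves, and a real optimizer would also apply cost and inventory constraints:

```python
import math

# Illustrative saturating response curves per channel: (saturation, slope)
CURVES = {
    "tv": (1000, 0.004),
    "display": (400, 0.010),
    "radio": (250, 0.020),
}

def response(spend, saturation, slope):
    """Saturating channel response at a given spend level."""
    return saturation * (1 - math.exp(-slope * spend))

def optimize(total_budget, step=10.0):
    """Greedy allocation: give each `step` dollars to the channel with
    the highest marginal response at its current spend level."""
    spend = {ch: 0.0 for ch in CURVES}
    allocated = 0.0
    while allocated < total_budget:
        best = max(
            CURVES,
            key=lambda ch: response(spend[ch] + step, *CURVES[ch])
                           - response(spend[ch], *CURVES[ch]),
        )
        spend[best] += step
        allocated += step
    return spend

print(optimize(1000))
```

Because each channel's marginal return falls as its spend grows, the winning channel changes as the budget climbs, which is exactly why the optimal mix shifts with the total budget.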