Research Business Report - Sample Frame Experts




13 years of delivering MR business insights, from the publisher of Research Conference Report, Research Department Report and Pharma Market Research Report. www.rflonline.com

Research Business Report

Technological, Methodological, Economic & Business Changes Impacting MR

September - October 2008

The advent of automated “digital fingerprinting” technologies that accurately identify duplicate or fraudulent online survey sample respondents without breach of privacy has garnered masses of industry attention throughout 2008. Commercial providers are signing customers and an increasing number of MR firms are introducing internally-developed capabilities. Research agencies without access to internal or third-party digital fingerprinting technology are searching for this solution as they did for online research capability a decade ago. However, these lauded, validated and rapidly adopted technologies are characterized by misconceptions regarding their novelty, competitive differences and, most of all, impact on data quality.

Investigations into “professional” respondents–“incentive chasers” and/or people who spend an excessive amount of time responding to surveys–confirm the existence of these problematic individuals, but the extent of their influence is inconclusive. Individual companies have studied this issue and the ARF is currently repeating those efforts.

In 2005, comScore shocked the industry with an analysis showing that 0.25% of the population accounted for 30% of survey completions. Later, Burke Research reported that up to 14% of a sample (across a number of sample providers) is suspect, and ESOMAR estimated that 62% of panelists belong to multiple panels. It is no secret that sample aggregators are commonly used to stock panels, which can lead to the same people–some unknowingly–being signed up for the same panel multiple times. Tim Macer, in the June 2008 issue of Research Magazine, cited anecdotes of individual respondents with 100 or 200 different panel memberships.

A flurry of commercial digital fingerprinting launches and research partnerships from March to August trumpeted the control, at long last, of “professional” survey respondents. Accenting this trend, a Forrester Research report asked, “Is The Long Online Panel Quality Nightmare Over?” MR media headlines screamed “Chasing the Cheaters” and “Changing the Online Research Game.” Even RBR was called out for contributing to the hype in a July/August letter to the editor.

To set the record straight about the shiny new toy represented by digital fingerprinting technology, RBR approached companies prominently associated with commercial digital fingerprinting, most of whom completed a checklist/questionnaire about their capabilities. We also contacted these services’ customers and MR agencies with new proprietary digital fingerprinting capabilities, and consulted a number of established authorities in research-related technologies.

This RBR investigation documents digital fingerprinting as a key tool in addressing duplicate and fraudulent online respondents. However, it is also fair to say that digital fingerprinting is 1) far from “new,” 2) not costly or imposing for nearly any research agency to develop internally, 3) not terribly differentiated between commercial services and 4) more limited in its usefulness than originally represented.

“The only time that fingerprinting is really effective,” asserts Socratic Technologies CEO Bill MacElroy (who, we discovered, may know more about the subject than anyone else), “is if you use a lot of different sample sources to fill a very large and complex sample frame. Agencies and big panels should be screening for duplicates. Not everybody is tracking everything to the same level of detail, but the industry is getting down to the point where we can eliminate 99.9% of a 0.8% problem.”

Digital fingerprinting entered MR industry vernacular this past March when Peanut Labs–a VC tech start-up specializing in recruiting research sample from online social networks and communities–commercially launched its OptimusID™ product. RBR’s editors were treated to a private demonstration of the technology in January 2008. Peanut Labs touted its offering as a “new digital fingerprinting technology that improves data quality by accurately identifying and flagging suspect respondents as defined by you in real-time from any sample source across the industry.”

After the industry’s two-year-old preoccupation with professional respondents, Peanut Labs was warmly received. One month after the Optimus launch, Ali Moiz, Peanut Labs’ COO, was accorded the ARF’s 2008 Silver Great Minds Award for development of the most innovative research idea. Peanut Labs’ press release about the award noted of Optimus, “marketers and advertisers can breathe easy knowing they have uninhibited access to insights of key demographics without the frustrations of survey fraud that until now has dragged research quality down.”

It didn’t take long for the digital fingerprinting market to draw a crowd of commercial and proprietary entrants. Some had apparently existed in relative obscurity for years:
• Germany’s Mo’Web Research actually debuted a commercial digital fingerprinting technology in 2007, but hadn’t made much of a blip in the U.S. because it isn’t licensable.
• Authentic Response, an online panel and research services provider, touted its own digital fingerprinting capability, which had reportedly been bundled into its broader services for years. President Jeff Mattes remarked to RBR, “Why do you think there is a fingerprint in our logo?”
• In April, MarketTools licensed a digital fingerprinting capability from 41st Parameter, whose technology has safeguarded financial institutions and e-commerce from fraud for years.
• In July, Greenfield Online announced a partnership with research technology provider RelevantView®.

In rapid succession, C&R Research, Western Wats and other high-quality full-service agencies rolled out proprietary digital fingerprinting versions. And in late September, field and tab provider Mktg Inc. debuted its Crop Duster™ service. COO Steve Gittelman noted, “If you have created your own interviewing software for programming and hosting purposes, you should be able to create a tool like this.” Gittelman said Crop Duster was produced without external assistance in six months.

[Photo: Mktg Inc. COO Steve Gittelman]

Quite by chance, RBR discovered the unassuming, unofficial “father of research digital fingerprinting.” Recognized research technology authority Bill MacElroy, CEO of Socratic Technologies, confessed having followed the recent chatter about digital fingerprinting with a mixture of amusement and regret. “I get the biggest kick from companies with pronouncements about patenting this technology,” he told RBR. “We refer to the technology as a de-duping algorithm. The truth is that a lot of it is actually built right into the servers that run websites, and it doesn’t take very much magic to put the technology to work in a survey environment. It looks to me that de-duping algorithm providers are doing very, very common stuff, and certainly nothing substantive that’s proprietary.

“Almost anybody who runs their own system via a Web server can do the same thing,” he continued. “There’s no software, cookies or incompatibilities. The one problem is that most people using off-the-shelf survey software, ASPs or DIY solutions lack the control at the server level. They don’t have those software codes and likely lack the continuity that comes from an IT team. People who are running a proprietary system possess both. Packaged software can’t see Web server communications and can’t interpret all of the data strings, which require a third party.

“A client buying sample from six different sample sources can pass them first through a de-duping algorithm, which is also not new technology. We used de-duping algorithms at Socratic as early as 2003, and the capability existed well before that. This is five- or six-year-old technology, which in Internet time is eons,” MacElroy remarked. “We have never really talked to our clients about possessing it; we’ve just done it. It never occurred to us to market this capability as some kind of super secret sauce,” he commented, “because, to me, it would seem like trying to patent the number ‘7.’ Some executives apparently believe in hanging a trademark on anything that seems to be different.”

Explaining Digital Fingerprinting

Digital fingerprinting identifies and distinguishes each individual respondent by gathering a number of public data points from each participant’s computer as they agree to engage in a survey or join a panel. The technology creates a “machine fingerprint” based on small hardware-level variations on each machine (settings, configuration, etc.), and tags the respondent’s computer with a piece of code (not a cookie)–basically the equivalent of a non-invasive, automated computer kiss.

Providers of commercial and private digital fingerprinting that submitted responses to an RBR questionnaire include Peanut Labs, 41st Parameter, RelevantView, Mo’Web, C&R Research, Mktg Inc., MarketTools and Greenfield Online. They all use algorithm-based machine (PC) characteristics and deploy the technology before a respondent begins taking a survey. None can differentiate between respondents who might legitimately access a survey at the same computer within the same household, office, Internet café, etc., unless each user logs on/off.

There are a few minor variations among providers, mainly related to whether the capability is privately held or commercially available. If a private service sells its own panel, fingerprinting is generally applied to that panel, but if other sample sources are needed to fill quota, fingerprinting may not be applied to those sample sources. Most commercial providers offer online reporting of suspect respondents in real-time; some private providers do not. All applications are hosted; several commercial ones are also portable and may be operated from the client’s website. Most providers can adjust the fingerprint identification criteria to fit client specifications and can append information to the record related to undesirable behaviors like satisficing, straight-lining, etc.

According to several customers of these services and anonymous representatives of some fingerprinting providers, there is essentially no difference between technologies in the market. Some account for more configuration factors and variables in their algorithm than others. This can boost their accuracy, albeit negligibly. A more advanced capability might compare the state in which the computer resides with the state of the server it is contacting; that difference becomes an added variable in the fingerprint. Services RBR reviewed claimed to be accurate within a few tenths of one percent.
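To make that description concrete, here is a minimal sketch, in Python, of how such a machine fingerprint could be assembled from the public data points a browser volunteers. The field choices and hashing step are our own illustrative assumptions, not any provider’s actual implementation.

    import hashlib

    def machine_fingerprint(ip, headers):
        """Hash a handful of public data points into one stable identifier.
        The fields below are illustrative choices, not any vendor's list."""
        parts = [
            ip,                                  # network origin
            headers.get("User-Agent", ""),       # OS, browser build, product number
            headers.get("Accept-Language", ""),  # language settings
            headers.get("Accept", ""),           # content-type preferences
        ]
        return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

    # Two visits with identical settings produce the identical print.
    hdrs = {"User-Agent": "Mozilla/5.0 (Windows NT 5.1)",
            "Accept-Language": "en-US"}
    print(machine_fingerprint("203.0.113.7", hdrs))

Because only a hash need be stored, no raw personal data is retained, consistent with the no-breach-of-privacy claim above.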

MacElroy said the capability is so available that he has been sharing his knowledge and tips about the technique with the rest of the industry since at least 2004. He wrote “How to Catch a Cheat” for Quirk’s Marketing Research Review that year, outlining his use of real-time data-matching algorithms to detect and eliminate duplicates with 90%-plus accuracy. His 2005 PowerPoint presentation, titled “How to Catch Online Survey Cheaters and Detect Lazy Respondent Behavior,” demonstrated that Socratic was doing an IP address, browser configuration string and language setting check on potential participants for every survey.

Explaining the mechanics of the process, MacElroy said, “The Web survey sits on a server. When somebody clicks a link to a survey, they first touch the server before the Web application, because the server has to interpret the individual’s browser settings to serve up the application correctly. When the browser touches the Web server, it transfers a whole bunch of information.”

Elaborating on the de-duping algorithm detection process, MacElroy said, “The IP address isn’t necessarily all that unique. It is useful in cutting respondent duplication and even more useful in identifying whether that IP address is located in the correct geographic area. If you are expecting to survey someone in Chicago, the IP address may come back from China. You know this is a cheater to dispense with. In fact, the biggest IP match problem comes from China, where a cottage industry has sprouted among individuals who are making money by signing up to do surveys. It’s also the source of numerous ‘survey bots,’ computer programs written specifically to complete surveys automatically by generating junk responses that fit the form on screen. The sophistication of some of these systems–including IP masking servers–makes an IP check one in a series of steps required to prevent fraud.”
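The geographic screen MacElroy describes reduces to a simple comparison. In the sketch below, geolocate() is a hypothetical stand-in for whatever licensed IP-to-location database a supplier actually uses.

    # Minimal sketch of the geographic IP screen described above.
    def geolocate(ip):
        # Toy lookup table for illustration only; real services license
        # full IP-to-location databases with far more granular data.
        table = {"203.0.113.7": "US", "198.51.100.2": "CN"}
        return table.get(ip, "UNKNOWN")

    def geo_check(ip, expected_country):
        """Reject respondents whose IP resolves outside the survey's
        target geography; inconclusive lookups are flagged, not failed."""
        country = geolocate(ip)
        if country == "UNKNOWN":
            return "flag-for-review"
        return "pass" if country == expected_country else "reject"

    print(geo_check("198.51.100.2", "US"))  # -> "reject"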

He added, “A ‘browser string’ contains all sorts of complex data: the operating system, language settings, sometimes a product number and other things that are pretty unique taken one at a time–and extremely unique when viewed as a group. Together, these provide a good digital fingerprint,” MacElroy summed.

“At a deeper level, we detect machine settings, including each computer’s time–taken out to six decimal places–and date. That clock setting, along with internal readings and a browser string, placed in some sort of determinant processor software between the server and survey application, gives you a 99.9% chance of eliminating duplicates. It gathers de-duping data in real-time and instantaneously, with no respondent delay. The minute the processor detects identical respondent variables a second time, we stop the person’s second participation.”
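Functionally, the “determinant processor” he describes is a duplicate gate sitting between the Web server and the survey application. A minimal sketch follows, assuming an in-memory store; a production system would persist fingerprints per study.

    import hashlib

    def machine_fingerprint(ip, headers, clock_skew):
        """Same illustrative hash as in the sidebar sketch, with the
        machine clock reading added as one more variable."""
        raw = "|".join([ip, headers.get("User-Agent", ""),
                        headers.get("Accept-Language", ""), str(clock_skew)])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    seen_prints = set()  # assumption: real systems persist this per study

    def admit_respondent(ip, headers, clock_skew):
        """Allow the first appearance of a machine fingerprint for this
        study; block any repeat attempt before the survey is served."""
        print_id = machine_fingerprint(ip, headers, clock_skew)
        if print_id in seen_prints:
            return False  # identical respondent variables seen again: stop
        seen_prints.add(print_id)
        return True

    hdrs = {"User-Agent": "Mozilla/4.0", "Accept-Language": "en-US"}
    print(admit_respondent("203.0.113.7", hdrs, 0.000123))  # True (first pass)
    print(admit_respondent("203.0.113.7", hdrs, 0.000123))  # False (duplicate)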

[Photo: Socratic Technologies CEO Bill MacElroy]

MacElroy stressed the importance of identifying duplicates before any person takes a survey. “De-duping algorithms are effective because people click on a link to the survey and immediately pass through the external filter. Sample providers charge you per complete. When a respondent clicks on the final stage, he or she returns to the original source, is logged as a survey completer and the provider then bills me. If you don’t identify duplicate respondents before a survey, a problem is raised. If duplications are reported after the survey and a person belongs to three panels, which panel do you pay? All three completed your survey and were provided by a supplier in good faith.”

For a supplier with server access, MacElroy estimates moderate difficulty in implementing an internal de-duping algorithm capability. “It depends on the degree to which people are running their own completely integrated system,” he stressed. “Some programming and IT sophistication, preferably an IT department, needs to monitor the servers.”

Most de-duplication situations don’t perturb MacElroy. “Any inference that the world is going to end if we do not catch all of these duplicates is, I think, overstated. The probability of duplicates in a small sample of a couple of hundred is low,” he asserted. “The average is very, very small, far less than 1%. And it’s not necessarily acute in specialized panels, which are pretty clean. But when you need more than one panel for your sample, we know there is up to 30% duplication of people between the big national panels,” MacElroy explained.

“It’s just not a very big deal unless you are doing massive ongoing tracking studies,” he concluded. “You find duplication in some big IT studies, particularly in foreign countries. No one alone can fill the sample needed by the auto manufacturers, the big technology corporations, etc. for their massive trackers. Over time, they need more and more providers, compounding the probability of picking up a duplicate.”
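A rough back-of-envelope model (ours, not MacElroy’s) illustrates that compounding: if the big panels overlap pairwise by about 30% and the overlaps are treated as independent, the chance that a respondent drawn from one source also sits in at least one of the other k-1 sources grows quickly as sources are stacked.

    # Illustrative only: duplicate exposure as sample sources are stacked,
    # assuming a constant 30% pairwise panel overlap and independence
    # between panels (both simplifying assumptions of ours).
    overlap = 0.30
    for k in range(1, 7):  # number of sample sources used
        p_dup = 1 - (1 - overlap) ** (k - 1)
        print(f"{k} sources -> ~{p_dup:.0%} chance a respondent overlaps")
    # 1 source -> 0%, 3 sources -> ~51%, 6 sources -> ~83%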

Even if sample is from one panel, MacElroy stressed the non-necessity of deploying de-duping algorithms. “Some duplicates do exist in the same source,” he cautioned. “If you want to deal with duplication and cheaters, they are pretty easy to catch. It’s also pretty easy for a sample company to hide the fact they are selling duplicates. Frankly, it isn’t financially lucrative for a panel to wrestle down cheaters to the Nth level.

“There is no money in de-duping sample. Except for panels committed to pure, unique sample, there’s not a lot of incentive for the big houses to do cleaning outside of the obvious. Cheaters don’t have one email address, but that doesn’t mean they aren’t easily trackable. All sample providers have the ability to track them, but some aren’t very vigilant, or they only do it periodically. People who recruit for panels using mercenary incentives, like ‘get paid to take surveys,’ receive the most duplicate registration attempts.”

So what does it cost to fix a problem that, if MacElroy is correct, on average amounts to less than 1% of your sample? A minute sum, after the investment in building the capability. Full-service firms with proprietary de-duping capabilities generally charge nothing or a very tiny fee for the service, because its use has no incremental cost. “Once the technology is in place, it just runs itself,” MacElroy said. “When we discussed commercializing this, we were going to use other ‘look-up service rates’ at about $50.00 per 40,000. But even if you charged $19.99 to screen 20,000 emails, there would be sufficient margin to make it worthwhile.”

For commercial providers, finding the right price point and model for digital fingerprinting took some initial trial and error, and there was a period of wild price fluctuation. At one point, Peanut Labs was reportedly quoting as much as $2,500 per 5,000 records for Optimus. The price Peanut Labs submitted to RBR in July reflected a significant decrease–$0.30 to $0.60 per complete–and that figure has reportedly since dropped.

The competitive marketplace and technology similarities have created more client-friendly pricing. Commercial systems are typically available on a monthly or annual subscription or license basis, and also on a “transactional” basis (each transaction amounting to every potential respondent who passes through the server). Mktg Inc.’s Gittelman told RBR about a recent bid against two other major services. All three bids approximated $100,000 for 20 million to 30 million transactions. “Depending on the incidence, the pricing seems to be 2.5 cents per complete,” he summarized. “Ideally, we’re trying to land at $0.002 per transaction.” He said even at these low prices, it’s a worthwhile business.
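Converting the quoted bids into unit rates (our arithmetic, using the round figures above) shows where current pricing sits relative to that target:

    # Quick check of the per-unit rates quoted above (our arithmetic).
    bid = 100_000                      # dollars, typical of the three bids
    for transactions in (20_000_000, 30_000_000):
        print(f"${bid / transactions:.4f} per transaction on {transactions:,}")
    # -> $0.0050 and $0.0033 per transaction, both still above the
    #    $0.002 target Gittelman cites, implying further price declines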

“We believe in complete transparency. We don’t upcharge for this if you purchase programming or hosting from us. I wouldn’t go into this if we carried a huge sales force and this was my core product. I don’t see it as sustaining,” he noted.

“I think digital fingerprinting should be used by everyone,” Gittelman added. “But I don’t feel the full burden should fall on the panel companies, nor should this become a tool of the giants. Many, many smaller, middle-market research companies wouldn’t design or build it for themselves, yet could probably benefit from using it defensively.”


Digital Fingerprinting Adjustability

Of all the revelations from RBR’s digital fingerprinting review, none was more surprising than the technology’s adjustable sensitivity for key data quality characteristics (i.e., the criteria for flagging suspect respondents). Think of it like adjusting the sensitivity of an airport security system.

Globalpark USA President Dan Coates recently reviewed commercial digital fingerprinting technologies and found them comparable and accurate. “The services offer a clear, externally verifiable indication of sample quality,” he noted. “If it shows 1% of your selected sample is suspect, chances are you’re using the right panel provider. If 14% of your sample is suspect, you may want to consider another sample.”

But assessing sample based on percentages is debatable, because digital fingerprint scanning levels are adjustable. And the metrics are subjective; definitions are often dictated by the research agency. For instance, what does a study do with a multi-panel respondent who turns up during de-duping? C&R Research EVP/CTO Walt Dickie said, “In a stand-alone survey, it may be more appropriate to allow the ‘first in’ survey to remain.” Another source questioned the wisdom of accepting even a first response from someone who attempts to take the same survey twice.

Mktg Inc.’s Steve Gittelman suggested the industry consider establishing standards for duplication rates, plus other issues. “When people talk about duplicates or speeders as though there is a universal definition, they’re working from an inaccurate and dangerous assumption,” he asserted. “To believe the digital fingerprinting tool that registers the highest duplication rate is the superior tool is misleading. Someone who thinks that duplication is rampant might say that 1% removal indicates your tool doesn’t work. A research executive who doesn’t believe duplication to be a major issue could say 7% removal is overreaction.

“Mathematically, when you talk about duplication, it’s easy to adjust the base that is the denominator. Frankly, to increase the number of duplicates, just push the data correction model into a low-incidence, highly-targeted, geographically-limited zone. The tool is built to allow adjustment of the variables. A panel company stuck paying an incentive, or geared to get high utilization, wants that sliding scale to only knock out exact duplicates, whereas a client might have a more stringent suspect respondent identification consideration,” commented Gittelman.
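In practice, that sliding scale amounts to choosing how many fingerprint attributes must agree before two sessions are called duplicates. A minimal sketch follows; the attribute names and threshold semantics are our assumptions, not Mktg Inc.’s.

    # Sketch of an adjustable match threshold: how many fingerprint
    # attributes must agree before two sessions count as duplicates.
    ATTRS = ["ip", "user_agent", "language", "clock_skew", "os"]

    def match_score(a, b):
        """Fraction of fingerprint attributes that agree between sessions."""
        return sum(a.get(k) == b.get(k) for k in ATTRS) / len(ATTRS)

    def is_duplicate(a, b, threshold):
        # threshold = 1.0 knocks out only exact duplicates (panel-friendly);
        # a lower threshold flags near-matches (stricter, client-friendly).
        return match_score(a, b) >= threshold

    s1 = {"ip": "203.0.113.7", "user_agent": "UA1", "language": "en-US",
          "clock_skew": 0.000123, "os": "WinXP"}
    s2 = dict(s1, ip="203.0.113.99")   # same machine traits, new IP
    print(is_duplicate(s1, s2, 1.0))   # False: exact-match rule
    print(is_duplicate(s1, s2, 0.8))   # True: stricter sliding scale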

“At some point,” he suggested, “the industry is going to have to seriously discuss what exactly a speedster is. What is the proper algorithm? What formula are we supposed to use? The problem is that so far this great new tool has been driving the mindset of the market. Now, I think the market is waking up and thinking, ‘Wait a second. This is just a tool, and it is making decisions for us that we should be making for ourselves.’”

Reproduced from the October 2008 issue of Research Business Report by RFL Communications, Inc. (Skokie, IL), publisher of Research Conference Report, Research Department Report and Pharma Market Research Report, three other MR newsletters. For more information, send an e-mail request to [email protected], visit our website at http://www.rflonline.com or call RFL at (847) 673-6284.