Dropping the People’s Elbow - Domo · 2019-04-17

Post on 09-Jun-2020


TRANSCRIPT

Dropping the People’s Elbow
Empowering new Domo users with Python integration

M. Foley
Senior Manager, Marketing Operations

Korn Ferry International

WELCOME

Agenda
• Background - Why Marketing, Why Python?
• Democracy: not dead yet
– Native scripting lets eager departments punch above their weight
• Sample project: Boss normalizations for non-coders
– Hijack Domo ETL components as a mini user interface
– Step through simple normalizations (e.g., title case)
– Modify code for more advanced use cases (replace dictionaries)
– Set up Python tiles to archive and shape data for card building
– Let users take the wheel

• Advanced applications – The $45M fuzzy match-ocalypse

Overview

Korn Ferry in Numbers

Korn Ferry - Marketing Operations
• 60+ Marketo (MAP) users | 1.5K Salesforce (CRM) users

• 3 B2B business units | 2 B2C - diverse stakeholders and buyer personas

• Cross-platform marketing programs: ads, social, website, events, podcasts, reports, etc

• Marketing Ops:
– Manage the tech stack so people can focus on their day jobs
– Help connect individual platform data to broader business analytics (esp. sales and revenue data)

• Why Marketing, why Python?

No one does dirty like Marketing

No one does insights like Marketing
• What do customers think about our brand and products?

• Do people hate our value proposition or just the method of delivery?

• Where do they see us in relation to our competition?

• Where are users engaging? What is influencing their decisions?

• What is our customer lifetime value?

• How many people read that random blog post I insisted we blast out? (answer: everyone)

Many great data sources, all the same story…

• Rich insights – but lots of baggage

• Overly complex tech stack

• Dirty data

• No developers

• No development environment (SDK, etc)

• Eager users, but generally disconnected from broader company reports and metrics

Marketing Metrics and ROI - Sample Questions
• Is engagement up or down?
– Connections across platforms: social, website, events, webinars, ads, etc.
– What about buyer persona engagement?

• What is our database health?

• What is our progress in growing the mkt pipeline for new industries?

• What mkt efforts are most associated with sales wins? Which should be cut?
– What countries/regions show potential for cross-sells or new deals?
– Do any show signs of fading based on marketing data?

Onboarding New Data Sources / Departments
• All data science begins with cleaning
• Many tools in the stack lack basic normalization and data wrangling
– Out-of-the-box is never good enough

• Empower users to own and run their data cleaning and manipulation

• Help connect them to broader business reporting

• Let users surprise you (fuzzy what?!?)

• ….But don’t let them hijack all your time!

Empower Users

Sample Project:
• Normalize data for N columns against a list of normalized values – determined via user input

• Archive and shape data to enable streamlined Domo card building from same source dataset

Empower Users
• Power of Python/pandas
– Doesn’t care about the name or number of columns to normalize
– Doesn’t care what values the user designates “acceptable” (can be 5 or 500)
– Users can add/delete columns or values every day – no errors thrown!
• Users can maintain via Domo webform, ‘Select Columns’ tile, etc.
– Leverage the Domo ecosystem to create a mini user interface (zero coding)
– Avoids constant troubleshooting by admins/developers

• Scalable for massive data storage and reporting

• With Python, endless ways to improve and apply same core concepts

And use just the basics! – Python|Pandas
• Loops vs. Pandas/NumPy vectors

• Boolean indexing

• List of lists; lists of data frames; dictionaries

• Groupby/Grouper objects

• Multi-indexing

• List comprehensions

• Made for dirty data
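A minimal sketch of the first two items in the list above, contrasting the “Java-ish” loop with pandas Boolean indexing; the column name and values are illustrative stand-ins, not the deck’s real data:

```python
import pandas as pd

# Toy column, standing in for a webform-fed Domo dataset (hypothetical values)
df = pd.DataFrame({"Job Function": ["finance", "HR", "marketing", "finance"]})

# Loop version ("Java-ish", for reference): works, but slow on large frames
flags_loop = [value == "finance" for value in df["Job Function"]]

# Vectorized Boolean indexing: same result, no explicit Python loop
mask = df["Job Function"] == "finance"
finance_rows = df[mask]
```

The vectorized form is what the later normalization tiles lean on: the same mask pattern scales to any number of rows and any user-supplied value list.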

Disclaimers
• A quick word on SQL
– This is not a SQL vs. Pandas/Python presentation
– Python data structures and inputs lend themselves to user-facing apps
– Sample code can be replicated in SQL, but is often clunkier or more fragile for this task
– Emphasis on this task - not “SQL is fragile”

• We may move fast, but all code available in appendix!

Step 1: Create an interface for basic normalizations

Project 1: New Interactive Content Marketing Tool
• Janelle, Digital Campaign Manager, Marketing Ninja

• Runs the new landing page optimization tool the company needed yesterday

• Integrates directly with social and record data enrichment

• Exists outside the main website CMS and MAP tool

• Direct MAP integration requires development and timeline unclear

• Contract renewal forthcoming, with potential expansion

Start with the basics: Demographic persona (later tie to behaviour/engagement)

Core buyer demographics:
− Job level
− Job function
− Industry
− Country

Fire up the Domo connector!

How are we engaging with buyer personas?

What is our database health?

What is our progress in growing the mkt pipeline for new industries?

Target BI Report: Finance Contacts in Tech Industry, China

Job Function

Job Function – [womp, womp]

Long tail of doom

Industry

Industry by Job Function

What is our progress in growing the mkt pipeline for new industries?
Look no further

???

Simple Normalizations
• Ask the department to input a list of acceptable values (for each target column/dimension)

• Make all values title case

• Evaluate whether they comply with the acceptable list values
– If not, flag as “Uncategorized”
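The three steps above take only a few pandas lines; here is a minimal sketch, where the column name and acceptable values are hypothetical stand-ins for real webform input:

```python
import pandas as pd

# Hypothetical acceptable-values list, as a user might enter it in a Domo webform
acceptable = ["Finance", "Human Resources", "Marketing"]

df = pd.DataFrame({"Job Function": ["finance", "HUMAN RESOURCES", "VP of Stuff", None]})

col = "Job Function"
# Step 1: make all values title case (vectorized; NaN passes through untouched)
df[col] = df[col].str.title()

# Step 2: Boolean mask for anything non-null that is NOT on the acceptable list
mask = df[col].notna() & ~df[col].isin(acceptable)

# Step 3: flag non-compliant values as the catch-all
df.loc[mask, col] = "Uncategorized"
```

Because the mask is built from the user’s list, adding or deleting acceptable values in the webform changes behavior with no code edits.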

Domo Webform - Acceptable Values by Column/Field

Normalize columns

webform

Acceptable Values – List of Lists (input via webform)


Read in lists; loop through all user-designated columns
*Gotcha: try the list hack if the df.notna() mask causes trouble in the tile
v1. Pandas methods
v2. Java-ish loops (for reference)
#with Python, you can insert any string manipulation you want… just do it via Pandas/vectors
#Boolean mask
#catch-all

Read in lists; Loop through all user-designated columns

Output - Industry

Industry (non-‘Uncategorized’ catch-all)

Step 1.2: Dictionary Replacements

Normalize 2.0: Dictionary replacements
• Slay the long tail of doom
– Normalize values that contain a list of substrings (e.g., bank, financ*, mortgage)
• Generally for more advanced users (after mastering Norm 1)
• Start with simple, streamlined rules:
– e.g., all “contains” logic; ignore case; order of values does not matter

Normalize 2.0: Nested Dictionary
• Sample dictionary

– {Target Column : {Target Normalized Value : [list of values to search for] } }

– {Job Function : {Financial : [‘financ’, ‘banking’, ‘venture cap’…] } }

• Revisit webform

Normalize 2.0: Nested Dictionary
• Many ways to get there. Let’s go literal left-to-right:

Outer Dict Keys

Inner Dict Keys

Inner Values [List]

Normalize via Replace Dict

#<< beware user input error in webform

#<< same Boolean mask structure

{Target Column : {Target Normalized Value : [list of values to search for] } }
{ x : { k : [ v ] } }

**Different parameters available for df.replace vs. df.str.replace; can also pass the nested dictionary, but may perform slower (see Pandas Mixer in appendix)
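A minimal sketch of the left-to-right walk through the nested dictionary, using str.contains with a Boolean mask (one of the parameter mixes the slide alludes to); the dictionary contents here are illustrative, not the deck’s real values:

```python
import pandas as pd

# Sample nested dict per the slide's structure:
# {Target Column: {Target Normalized Value: [substrings to search for]}}
replace_dict = {
    "Industry": {
        "Financial": ["financ", "banking", "venture cap"],
        "Technology": ["software", "tech"],
    }
}

df = pd.DataFrame({"Industry": ["Investment Banking", "Fintech Software", "Retail"]})

# "Contains" logic, case-insensitive, per the simple rules above
for x, inner in replace_dict.items():          # x = outer dict key (target column)
    for k, values in inner.items():            # k = inner dict key (normalized value)
        for v in values:                       # v = substring to search for
            mask = df[x].str.contains(v, case=False, na=False)
            df.loc[mask, x] = k
```

Anything no rule touches (here, “Retail”) is left alone, ready for the “Uncategorized” catch-all from Norm 1.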

Output - Industry

Output - Industry

Update dictionary list. Rinse, repeat

Pandas Mixer
• Nested dictionary helps
– Maintain sanity when revisiting
– Allow flexibility to experiment with Python/Pandas options for readability and performance
– Mix and match parameters
• Many ways to win
– Can offer different ways to structure non-developer user inputs

Project 2: New Event Management Tool
• John, Events Manager, Marketing Ninja

• Owns “open” training and workshop events

• Attendees may or may not be customers (50% walk-ins)

• Not connected to CRM or MAP

• Events are expensive and drawing leadership’s attention

• Boss wants a breakdown of 20 profile segments to showcase reach beyond the traditional marketing base
– Little overlap with Janelle’s and almost double the permutations

No problem – same setup ready to go

New webform (new permutations)

Step 1.3: Helping the source database

Step 1.3 - What about the source database?
• Normalizations are good for Domo, but what about the source? Janelle wants to find the leaks
• Python tile:
– Flag and save erroneous data prior to normalization to track in the source
– Add helper columns and links to bad records for extra finesse
• All the work is already done. Just need to save the first evaluation of the full list of lists (code in appendix)

• Create second output that users can download – “Bad Values Dataframe”

• Reap rewards of charm offensive when onboarding new departments

#Boolean Mask

#store frame of bad rows for EACH column

#concat all frames into one

Modify main loop – build an array of bad-value data frames [FULL CODE IN APPENDIX]
#example illustrates passing a list of dataframes to pd.concat; if the number of ‘badValue’ rows is too large for memory, append in place via a loop
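A sketch of the modified loop under stated assumptions (toy data and a hypothetical acceptable-values dict; the deck’s full code lives in its appendix):

```python
import pandas as pd

# Hypothetical acceptable values per target column (in practice, read from the webform)
acceptable = {
    "Job Function": ["Finance", "Marketing"],
    "Industry": ["Technology", "Financial"],
}

df = pd.DataFrame({
    "Job Function": ["Finance", "VP of Stuff"],
    "Industry": ["Tech!!", "Financial"],
})

bad_frames = []  # store a frame of bad rows for EACH column
for col, values in acceptable.items():
    mask = df[col].notna() & ~df[col].isin(values)   # Boolean mask
    bad = df[mask].copy()                            # capture the entire row for reference
    bad["Bad Column"] = col                          # helper column: which check failed
    bad_frames.append(bad)

# concat all frames into one downloadable "Bad Values Dataframe"
bad_values = pd.concat(bad_frames, ignore_index=True)
```

Saving this frame as a second output dataset is what lets users trace the leaks back to the source system.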

Demo Output - Table

Can also visualize source data integrity

Other Ideas – Audit Table for Users
Can control for all of these… or not (depends on users and your time)

<< spelling error

<< “contains vc” too aggressive?

<< pattern too specific?

Place norm values in new column

Meanwhile, back in Events…

Not all users are created equal
• Let’s get John back to what he does best
• SOS: Drop the permutations:
– The 10 profile segments Marketing/Janelle uses seem just fine

• Easily clone setup from one department to another

• Bad users often relieved to be locked out (!)

Recap: Normalizations for non-coders
• Created an ETL section for users to set normalization values and replacements

• Flagged problem areas in source database

• Zero upkeep – all inputs are controlled outside of the Python tile

• Cloneable template that applies to all data types

• Helped users own their data cleaning

• Made some new friends

Step 2: Shaping the data for card building

Desired Output

Option 1: Column Counts (less data – SEE APPENDIX)

Option 2: Counts filtered by multiple columns (more data)

One count column
Desired output for card building (all on same dataset)

All columns of interest
Desired input for card building

Multi-index pandas series

Grouped Counts

Sample multi-index output

Option 2: Grouped Counts

Use pd.DataFrame.fillna (outside of the Domo Python tile)
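A rough sketch of the grouped-counts shaping, including a fillna step for dimensions added after historical data already exists; the data and column names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Job Function": ["Finance", "Finance", "Marketing"],
    "Country": ["China", "China", "US"],
})

# Grouped counts across all columns of interest -> multi-index pandas Series
counts = df.groupby(["Job Function", "Country"]).size()

# Flatten into one count column for card building (all cards on same dataset)
out = counts.reset_index(name="Count")

# pd.concat accepts and creates new columns on the fly: an older archive
# that predates the "Country" dimension simply gets NaN there...
historic = pd.DataFrame({"Job Function": ["Finance"], "Count": [5]})
combined = pd.concat([historic, out], ignore_index=True)

# ...which fillna backfills so cards never hit blanks
combined["Country"] = combined["Country"].fillna("Unknown")
```

This is why a user adding a dimension mid-year (the “Number of Employees in Dec” case later in the deck) doesn’t break anything: the new column appears, and older rows get a placeholder.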

Benefits
• Shakes off any user wrench
– Drop a segment from buyer personas - acceptable value list
– Drop an entire dimension (Job Title) or add a new one – acceptable value column
– Compare to SQL

• pd.concat will accept and create new columns on the fly

• Can support historical data across N dimensions of any size

• All cards powered by same dataset – keeps Domo tidy for users and admins

• Streamlined card building that all follows same format

Results of grouped columns – 346K rows

Identical to full archive – 2.1M (scales)

Python flexibility
Marketing-Ops – new value added to acceptable list

Python flexibility
Number of Employees – new dimension/column added in Dec

Advanced Applications: Internal data

Advanced Applications – Data generated internally
• MAPs and (esp.) CRMs require timely action and input across multiple stakeholders

• Lots of highly customizable objects, naming conventions (or lack of), workflows, user roles and responsibilities, etc

• No two are alike – configured specifically for your business

• Complete nightmare when they go south, but rains $$$ when it all comes together

• Everyone wants the ROI for reporting – but are often locked out

• Ripe for advanced data wrangling and fuzzy matching in warehouses

Fun with marketing attribution

*images from Steve Patrizi (blog); leadmd.com; salesforce.com

Fun with marketing attribution
• How are these codified in the CRM and MAP? Were they reshuffled in the latest rebrand?

• Are all users following a naming convention or tagging system? (hint: no)

• Are users entering key milestones as they are occurring? (event attendance; opportunity open; stage change; close-won)

• Are there ANY open-text fields for internal users?

• Are there ways for internal users to bypass the “proper” process?

• Time for Python fuzzy matching!
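The deck doesn’t name a fuzzy-matching library; as one illustrative approach, the standard library’s difflib can score near-matches between messy user-entered names and canonical CRM values. All names below are made up:

```python
import difflib

# Canonical campaign names as codified in the CRM (hypothetical)
canonical = ["Annual Leadership Summit 2019", "APAC Webinar Series", "Talent Trends Report"]

# Free-text values users actually typed into the MAP (hypothetical)
messy = ["annual leadership summit", "APAC webinars", "talent trend report"]

# Match case-insensitively, then map back to the original canonical casing
lookup = {c.lower(): c for c in canonical}
matches = {}
for name in messy:
    hit = difflib.get_close_matches(name.lower(), list(lookup), n=1, cutoff=0.6)
    matches[name] = lookup[hit[0]] if hit else None
```

Dedicated libraries offer faster or more tunable scoring, but the shape is the same: score every messy value against the canonical list and keep the best hit above a cutoff.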

Advanced Applications
• Code gets denser, but it’s all just variations on the sample Normalization project above
• The key difference is not the Domo admin – it’s savvy Domo users
– No technical coding, but deep logic, analysis, and patience
– Strong command of business offerings and operations

• Not for everyone. Scaling the sophistication of normalizations (and similar flows) helps self-select

• $45M and beyond: Start small and build traction with reports in broader business ecosystem

Takeaways - Design
• Build user inputs around the core Python tile
• Design from the desired Python data structure out
• Leverage Pandas for speed and flexibility
– Anything delicate WILL break!
• Keep rules simple (ignore case; all “contains” logic; etc.)
– Infinite options with Python, but too many weigh down users and code

Takeaways - Culture
• Help users own the data wrangling process

• Keep scope/expectations narrow. Goal is entry to broader BI system—world domination later

• Build for your savviest users first [<- not necessarily most technical]

• Let users share the love
• No two departments are the same… until they are…
• [RIP John Domo ETL license]
• Get creative with Python
• Your users are more creative than you (and often closer to the business)

QUESTIONS?

THANK YOU

Appendix

1.1: Read in lists of lists; Loop all user-designated columns

#<< sample param you can also encode/pass via webform. get creative!

#<< no shame in println statements #safeSpace

#with python, can insert any string manipulation you want…just do it via Pandas/vector

#Boolean mask

#catch-all

1.1: Call simple normalization function

1.2: Normalize via Replace Dict

#<< watch out for user input error and crazy outer-dict keys

{Target Column : {Target Normalized Value : [list of values to search for] } }
{ x : { k : [ v ] } }
- restrict df.replace to the target column
- series.str.replace only accepts a series (obvi)
- can pass a nested dict to str.replace without a loop, but slower

Pandas Mixer (as of v.0.23.4)

• Nested dictionary helps
– Maintain sanity when revisiting
– Allow flexibility to experiment with Python/Pandas options for readability and performance

#Boolean Mask

#store dataframe of bad rows for EACH column
#concat all frames into one

1.3: Modify main loop – build array of bad value data frames

#capture entire row with invalid column data (for reference)
#create new column to identify target column in result

1.3: Bad Values Frame: Boolean mask strikes again

Add other helper columns

#invalid because of case only?
*note gotcha on astype(str)

#link to record (read in base url)
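A sketch of the two helper columns under stated assumptions (hypothetical base URL, record IDs, and acceptable values; astype(str) is the slide’s gotcha, guarding against non-string values before .str methods):

```python
import pandas as pd

acceptable = ["Finance", "Marketing"]
bad_values = pd.DataFrame({
    "Job Function": ["FINANCE", "VP of Stuff"],
    "Record Id": ["003A1", "003B2"],
})

# Helper column 1: invalid because of case only?
lowered = [v.lower() for v in acceptable]
bad_values["Case Only"] = (
    bad_values["Job Function"].astype(str).str.lower().isin(lowered)
)

# Helper column 2: link to the record (base URL read in via webform; hypothetical here)
base_url = "https://example.crm.com/record/"
bad_values["Link"] = base_url + bad_values["Record Id"]
```

A case-only flag tells users the fix is a trivial re-type, while the link column saves them a manual CRM search per row.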

Help users interpret the output

One count column
Desired output for card building (all on same dataset)
All columns of interest in one

Card Building, Option 1: Stacked Counts (aka ‘simple pivot’)

Date stamp on append

Option 1: Stacked Counts (aka ‘simple pivot’)
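A sketch of the stacked per-column counts with a date stamp appended on each run; the data is illustrative, and in production the stamp would come from the run date rather than a fixed value:

```python
import pandas as pd

df = pd.DataFrame({
    "Job Function": ["Finance", "Finance", "Marketing"],
    "Industry": ["Technology", "Financial", "Technology"],
})

# Option 1: stacked per-column counts ("simple pivot"), one frame per dimension
frames = []
for col in df.columns:
    counts = df[col].value_counts().rename_axis("Value").reset_index(name="Count")
    counts["Dimension"] = col
    frames.append(counts)

stacked = pd.concat(frames, ignore_index=True)

# Date stamp on append, so each ETL run can be archived and trended
# (fixed here for reproducibility; use the current run date in practice)
stacked["Snapshot Date"] = pd.Timestamp("2019-04-17")
```

Every dimension shares the same Value/Count/Dimension/Snapshot Date shape, which is what keeps all cards on one dataset.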
