10:30 vision: pillai - intuit payment graph
DESCRIPTION
You are familiar with the "friend of a friend" relationship from Facebook, and you have seen the "recruiter-prospect" graph at LinkedIn. Here is a very different graph, auto-discovered from payment transactions: "payer-payee", i.e. who has paid whom, what amounts, and how frequently. The nodes are businesses in the US or individuals like you and me, and the links connecting them are real payment transactions. The strength of a connection is defined by the volume of transactions between the entities. Come to this talk to learn about the challenges we had to overcome, the algorithms we came up with, the insights derived from the analysis, and how we support real-time APIs over RESTful endpoints to integrate with web applications. We will also go over the other technologies we considered and why we picked Neo4j.

Gokuldas Pillai (Lead Engineer at Intuit)
I am a technologist leading the graph team at Intuit in the Business Intelligence organization. I have spent most of my career building systems, mostly business software that deals with high volumes of transactions. More recently, I have been experimenting with Big Data solutions like Apache Hive, HBase, Netezza and Neo4j. At Intuit, we are building the next big graph, the Intuit Payment Graph, which shows who is paying whom, at what frequency, and how that impacts the larger business community.

TRANSCRIPT
Commercial Graph at Intuit
Gokuldas Pillai
Engineer, Data Services, Intuit
@gokool
Improving the lives of 60M people
…creates a unique and compelling set of data
1 in 3 Tax Returns
Pay 1 in 12 Americans
$2.6T in Transactions
25 Million Questions Answered
From 1 to 50 Apps
7 Million Mobile Customers
45M Customers Using Connected Services
Is it time to hire?
Small Business Hiring Trends
My revenue increased 5%...is that good?
Revenue Comparisons
Am I spending more than my friends?
Spending Profiles
Auto $750
Rent $1,200
Groceries $400
Intuit Payment Graph
• Discover the latent network from multiple product data-stores
  – Uniquely identify entities and their connections
  – Connections scored by volume of trade
• Empower Business Unit (BU) teams to leverage the Intuit Payment Graph to build applications
  – Graph to be available for real-time access
The Graph Server provides rich profiles
Consumer Profile Facets
• Identity: Name, Address, Phone, Email, Mint Id, etc.
• Social: Facebook, Yelp, Twitter, etc.
• Demographics: Age, Gender, etc.
Business Profile Facets
• Identity: Name, Address, Phone, Email, QBO Id, etc.
• Social: Facebook, Yelp, Twitter, etc.
• Firmographics: Category, Revenue, Employees, etc.
And the buyer-seller relationships
May 2011: 3 purchases, $650.25
May 2011: 1 purchase, $25.95
Consumer
Business Business
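As a rough illustration of how raw payments could be rolled up into edges like the ones on this slide (a purchase count and a total per month), here is a minimal Python sketch. The node ids, the individual payment amounts, and the `aggregate_edges` helper are hypothetical; only the monthly totals mirror the slide, and this is not Intuit's actual pipeline.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical raw payments: (payer, payee, "YYYY-MM", amount).
# The individual amounts are invented so that the totals match the slide.
payments = [
    ("consumer-1", "business-A", "2011-05", Decimal("200.00")),
    ("consumer-1", "business-A", "2011-05", Decimal("300.25")),
    ("consumer-1", "business-A", "2011-05", Decimal("150.00")),
    ("consumer-1", "business-B", "2011-05", Decimal("25.95")),
]


def aggregate_edges(rows):
    """Roll raw payments up into one edge per (payer, payee, month)."""
    edges = defaultdict(lambda: {"purchases": 0, "total": Decimal("0")})
    for payer, payee, month, amount in rows:
        edge = edges[(payer, payee, month)]
        edge["purchases"] += 1
        edge["total"] += amount
    return dict(edges)


edges = aggregate_edges(payments)
for (payer, payee, month), stats in sorted(edges.items()):
    print(payer, "->", payee, month, stats["purchases"], stats["total"])
```

Either field (count or total) could serve as the "volume of trade" score; which one the talk used is not stated.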
Design
Fuzzy matching & de-duplicating entities
Company ABC
name: The Windsor Press, Inc.
address: PO Box 465 6 North Third Street
city: Hamburg
state: PA
zip: 19526
phone: (610) 562-2267

Company PQR
name: The Windsor Press
address: P.O. Box 465 6 North 3rd St.
city: Hamburg
state: PA
zip: 19526-0465
phone: (610) 562-2267

Both of the above vendor records map to external reference data:

Dun & Bradstreet
ID: 002114902
Name: The Windsor-Press Inc
Street: 6 N 3rd St
City: Hamburg
State: PA
Zip: 19526-1502
Phone: (610)-562-2267
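The records on this slide differ only in punctuation, legal suffixes, ZIP+4 extensions, and phone formatting. A minimal Python sketch of one way to match them follows: normalize each field, then compare. The suffix list, the 0.9 threshold, and every function name are assumptions made for illustration; the talk does not describe the matching algorithm Intuit actually used.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}


def normalize_name(name):
    """Lowercase, turn punctuation into spaces, drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def normalize_phone(phone):
    """Keep digits only."""
    return re.sub(r"\D", "", phone)


def normalize_zip(zip_code):
    """Compare on the 5-digit ZIP, ignoring any +4 extension."""
    return zip_code[:5]


def is_same_entity(a, b, threshold=0.9):
    """Fuzzy name match plus exact match on normalized zip and phone."""
    name_score = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    return (
        name_score >= threshold
        and normalize_zip(a["zip"]) == normalize_zip(b["zip"])
        and normalize_phone(a["phone"]) == normalize_phone(b["phone"])
    )


reference = {"name": "The Windsor-Press Inc", "zip": "19526-1502", "phone": "(610)-562-2267"}
company_abc = {"name": "The Windsor Press, Inc.", "zip": "19526", "phone": "(610) 562-2267"}
company_pqr = {"name": "The Windsor Press", "zip": "19526-0465", "phone": "(610) 562-2267"}

print(is_same_entity(company_abc, reference), is_same_entity(company_pqr, reference))
```

Address lines are left out of the comparison here because normalizing "North Third Street" against "N 3rd St" needs a street-abbreviation table, which is beyond this sketch.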
Commercial Graph Architecture
• Source data: business names, address, phone, industry code; invoices, bills, payments, vendors, customers
• Processing: Categorization, Matching/De-duping
• Graph: De-duped Nodes, Transactions
• Access: Real-time Applications (Request/Response), Offline analytics
Data Model
Company (Name: Acme Inc; Zip: 95134; …)
Company (Name: Veva LLC; Zip: 94040; …)
Product (Name: QuickBooks; …)
Product (Name: Payroll; …)
Relationship: CUSTOMER (Txn Count: 125; No. of years: 1)
Relationship: LICENSED (No. of years: 8)
Company (Name: Beta LLC; Location: 94043; …)
Relationship: CUSTOMER (Txn Count: 467; No. of years: 3)
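The data model above is a property graph: typed nodes and typed relationships, each carrying key-value properties. A small Python sketch of that shape is shown below. The slide text does not say which relationship connects which pair of nodes, so the wiring here is an assumption, as are the `Node`, `Relationship`, and `neighbors` names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str   # e.g. "Company" or "Product"
    props: dict  # free-form key-value properties


@dataclass
class Relationship:
    start: Node
    rel_type: str  # e.g. "CUSTOMER" or "LICENSED"
    end: Node
    props: dict = field(default_factory=dict)


acme = Node("Company", {"Name": "Acme Inc", "Zip": "95134"})
veva = Node("Company", {"Name": "Veva LLC", "Zip": "94040"})
beta = Node("Company", {"Name": "Beta LLC", "Location": "94043"})
quickbooks = Node("Product", {"Name": "QuickBooks"})

# Assumed wiring: the slide lists the relationships but not their endpoints.
relationships = [
    Relationship(acme, "CUSTOMER", veva, {"Txn Count": 125, "No. of years": 1}),
    Relationship(acme, "CUSTOMER", beta, {"Txn Count": 467, "No. of years": 3}),
    Relationship(acme, "LICENSED", quickbooks, {"No. of years": 8}),
]


def neighbors(node, rel_type):
    """Return the end nodes of relationships of one type leaving `node`."""
    return [r.end for r in relationships if r.start is node and r.rel_type == rel_type]


print([n.props["Name"] for n in neighbors(acme, "CUSTOMER")])
```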
Data-model Demo
Scale
• Size of the graph
  – 29 Mn Unique Nodes
  – 315 Mn Properties
  – 48 Mn Relationships
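A quick back-of-the-envelope calculation gives a feel for the density these counts imply. The slide does not say whether the property count covers relationships as well as nodes, so the per-element figure below assumes it does; treat all three ratios as rough estimates.

```python
# Counts taken from the slide.
nodes = 29_000_000
properties = 315_000_000
relationships = 48_000_000

# Each relationship touches two nodes, so average degree is 2 * R / N.
rels_per_node = relationships / nodes
avg_degree = 2 * relationships / nodes

# Assumption: properties are spread over both nodes and relationships.
props_per_element = properties / (nodes + relationships)

print(f"relationships per node: {rels_per_node:.2f}")
print(f"average degree:         {avg_degree:.2f}")
print(f"properties per element: {props_per_element:.2f}")
```

By this estimate the graph is sparse, with most entities connected to only a handful of others.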
Referrals & recommendations
Connecting consumers with small businesses
Small business micro-communities
Big Data for the Little Guy
Use case - Vendor Recommendation
START n=node(23539)
MATCH n-[:PAYS]-v-[:PAYS]-vov
WHERE has(vov.IC4_DESC) AND vov.IC4_DESC =~ 'Legal.*' AND NOT (ID(vov) = ID(v))
RETURN ID(vov), vov.ENTITY_TYPE, vov.CITY?, vov.IC4_DESC?
ORDER BY vov.loyalty;
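For readers less familiar with Cypher, the same "vendors of my vendors" traversal can be sketched in plain Python over an in-memory edge list. The sample ids, property values, and the `recommend` helper are hypothetical. Unlike the query, this sketch de-duplicates results and also excludes the start node itself, which is an assumption about the intent.

```python
import re

# Hypothetical PAYS edges, treated as undirected like the query's pattern.
pays = [("n", "v1"), ("n", "v2"), ("v1", "law1"), ("v1", "cafe"),
        ("v2", "law2"), ("v2", "unknown")]

# Hypothetical node properties; "unknown" has no IC4_DESC at all.
nodes = {
    "law1": {"IC4_DESC": "Legal Services", "CITY": "Hamburg", "loyalty": 0.4},
    "law2": {"IC4_DESC": "Legal Aid", "CITY": "Reading", "loyalty": 0.9},
    "cafe": {"IC4_DESC": "Restaurants", "CITY": "Hamburg", "loyalty": 0.7},
    "unknown": {},
}


def neighbors(node_id):
    """All nodes connected to node_id by a PAYS edge, in either direction."""
    found = set()
    for a, b in pays:
        if a == node_id:
            found.add(b)
        elif b == node_id:
            found.add(a)
    return found


def recommend(start, pattern=r"Legal.*"):
    """Two-hop neighbors whose industry description matches the pattern."""
    found = set()
    for v in neighbors(start):
        for vov in neighbors(v):
            if vov in (start, v):
                continue  # skip the start node and the intermediate vendor
            desc = nodes.get(vov, {}).get("IC4_DESC")
            if desc is not None and re.fullmatch(pattern, desc):
                found.add(vov)
    # Ascending order, matching ORDER BY's default direction.
    return sorted(found, key=lambda node_id: nodes[node_id].get("loyalty", 0))


print(recommend("n"))
```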
Why Neo4j
• Java – matched in-house skills
• Flexible/Supports quick exploration
• Easy admin functionality – set-up, adding data
• Built-in access points over HTTP (REST/JSON)
• SQL-like query language (Cypher is awesome!)
• Active mailing list
• Good documentation
• Vendor support
Neo4j for real-time graph applications
Cypher Query Language
START biz = node(100) MATCH biz-[:TRANSACTS]-x RETURN x
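A query like this one can be sent to the server's built-in REST access point as JSON over HTTP. The Python sketch below only builds the request and does not send it. It assumes the legacy `/db/data/cypher` endpoint that Neo4j servers of this era exposed and a server at `localhost:7474`; both should be checked against the documentation for the version in use, and `build_cypher_request` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_cypher_request(base_url, query, params):
    """Build (but do not send) a POST request carrying a Cypher query."""
    body = json.dumps({"query": query, "params": params}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/db/data/cypher",  # assumed legacy endpoint
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Passing the node id as a parameter keeps the query text constant.
request = build_cypher_request(
    "http://localhost:7474",  # assumed host and port
    "START biz = node({id}) MATCH biz-[:TRANSACTS]-x RETURN x",
    {"id": 100},
)
print(request.get_method(), request.full_url)
```

Sending it is a matter of passing `request` to `urllib.request.urlopen` and decoding the JSON response.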
Great for…
• Real time
• Cypher
• Built-in Algos
• Lucene search
Opportunity Areas…
• Horizontal scaling
• Access control
• Indexing
Experiment. Measure. Pivot. Persevere.
Privacy matters…a lot.
Build the right team.
Team
• 2 Engineers (100%)
• 2 Data Scientists (50%)
• 1 Product Manager
• We are hiring Data Engineers! – http://careers.intuit.com/professional
Thank you.