Creating a Single View: Data Design and Loading Strategies
DESCRIPTION
Learn how to design a single view application and load your data into the application.
TRANSCRIPT
Enterprise Architect, MongoDB
Buzz
Creating a Single View Part 2: Data Design & Loading Strategies
Who Is Talking To You?
• Yes, I use “Buzz” on my business cards
• Former Investment Bank Chief Architect at JPMorganChase and Bear Stearns before that
• Over 27 years of designing and building systems
  • Big and small
  • Super-specialized to broadly useful in any vertical
  • “Traditional” to completely disruptive
• Advocate of language leverage and strong factoring
• Inventor of perl DBI/DBD
• Still programming – using emacs, of course
What Is He Going To Talk About?
Historic Challenges
New Strategy for Success
Technical examples and tips
Creating A Single View
Part 1: Overview & Data Analysis
Part 2: Data Design & Loading Strategies
Part 3: Securing Your Deployment
Historic Challenges
It’s 2014: Why is this still hard to do?
• Business / Technical / Information Challenges
• Missteps in evolution of data transfer technology
[Diagram: A → X]
We wish this “just worked”
• A: Query objects from A with great performance
• B: Query objects from B with great performance
• X: Query objects from merged A and B with great performance
…but Beware The Blue Arrow!
[Diagram: A → X]
• Extracting many tables into many files
• Some tables require more than one file to capture representation
• Encoding/formatting clever tricks
• Reconciliation
• Different extracts for different consumers
• Different extracts for different versions of data to same consumer
Loss of fidelity exposed
class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}
widget1,,3,,good texture,retains value,,,20142304,102.3,201401
widget2,XS,6,,,,not fragile,,,20132304,73,87653
widget3,XT,,,4,,dense,shiny,mysterious,,,19990304,73,87653,,
widget4,,,3,4,,,,,,20040101,,999999,,
[Diagram labels: A, ORM]
What happened to XML?
class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}
<product>
  <name>widget1</name>
  <features>
    <feature>
      <text>good texture</text>
      <type>A</type>
    </feature>
  </features>
  <introDate>20140204</introDate>
  <versDates>
    <versDate>20100103</versDate>
    <versDate>20100601</versDate>
  </versDates>
  <unitBundles>1,3,9</unitBun…
XML: Created More Issues Than Solved
• No native handling of arrays
• Attribute vs. nested tag rules/conventions widely variable
• Generic parsing (DOM) yields a tree of Nodes of Strings – not very friendly
• SAX is fast but too low level
… and it eventually became this
<p name="widget1" ftxt1="good texture" ftyp1="A" idt="20140203" …
<p name="widget2" ftxt1="not fragile" ftyp1="A" idt="20110117" …
<p name="widget3" ftxt1="dense" idt="20140203" …
<p name="widget4" idt="20140203" versD="20130403,20130104,20100605" …
• Short, cryptic, conflated tag names
• Everything is a string attribute
• Mix of flattened arrays and delimited strings
• Irony: org.xml.sax.Attributes easier to deal with than rest of DOM
Schema Change Challenges: Multiplied & Concentrated!
• A: Alter table(s), extract more data; LOE = x1
• B: Alter table(s), extract more data; LOE = x2
• C: Alter table(s), extract more data; LOE = x3
• X: Alter table(s), split() more data, once for each of A, B, and C
where f() is nonlinear wrt n
SLAs & Security: Tough to Combine
• System A: User 1 entitled to see X; User 2 entitled to see Y
• System B: User 1 entitled to see Z; User 2 entitled to see V
• Entitlements managed per-system/per-application here (at A and B)….
• …are lost in the low-fidelity transfer of data….
• …and have to be reconstituted here (at the single view)…somehow…
Solving The Problem with mongoDB
What We Are Building Today
Overall Strategy For Success
• Let the source systems’ entities drive the data design, not the physical database
• Capture data in full fidelity
• Perform cross-ref and additional logic at the single point of view, not in transit
Don’t forget the power of the API
class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}
If you can, avoid files altogether!
Haskell
But if you are creating files: emit JSON
class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}
{ "name": "widget1",
  "features": [
    { "text": "good texture", "type": "A" }
  ],
  "introDate": "20140204",
  "versDates": ["20100103", "20100601"],
  "unitBundles": [1,3,7,9]
  // …
}
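A minimal sketch of emitting such a document from Python, using only the standard library. The `Product` and `Feature` dataclasses mirror the slide's class, and `to_json` is a hypothetical helper, not part of any MongoDB API. Dates are kept as plain strings here, as on the slide.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    text: str
    type: str


@dataclass
class Product:
    name: str
    features: List[Feature] = field(default_factory=list)
    intro_date: str = ""
    vers_dates: List[str] = field(default_factory=list)
    unit_bundles: List[int] = field(default_factory=list)


def to_json(p: Product) -> str:
    """Serialize one Product as a single-line JSON document."""
    doc = {
        "name": p.name,
        "features": [{"text": f.text, "type": f.type} for f in p.features],
        "introDate": p.intro_date,
        "versDates": p.vers_dates,
        "unitBundles": p.unit_bundles,  # arrays stay arrays, no delimited strings
    }
    return json.dumps(doc)


widget = Product(
    name="widget1",
    features=[Feature("good texture", "A")],
    intro_date="20140204",
    vers_dates=["20100103", "20100601"],
    unit_bundles=[1, 3, 7, 9],
)
line = to_json(widget)
```

Nested features and arrays survive the trip intact, which is the fidelity the CSV and flattened-XML extracts lose.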
Let The Feeding System Express itself
A:
{ "name": "widget1", "features": [ { "text": "good texture", "type": "A" } ] }
B:
{ "myColors": ["red","blue"], "myFloats": [ 3.14159, 2.71828 ], "nest": { "as": { "deep": true }}}
C:
{ "myBlob": { "$binary": "aGVsbG8K"}, "myDate": { "$date": "20130405" }}
What if you forgot something?
{ "name": "widget1",
  "features": [
    { "text": "good texture", "type": "A" }
  ],
  "introDate": "20140204",
  "versDates": ["20100103", "20100601"],
  "versMinorNum": [1,3,7,9]
  // …
}
{ "name": "widget1",
  "features": [
    { "text": "good texture", "type": "A" }
  ],
  "coverage": [ "NY", "NJ" ],
  "introDate": "20140204",
  "versDates": ["20100103", "20100601"],
  "versMinorNum": [1,3,7,9]
  // …
}
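A short sketch of why the added "coverage" field is cheap to absorb: a consumer that reads fields by name keeps working on both the old and the new document. The `summarize` function and its output keys are hypothetical, chosen only for illustration.

```python
import json

# Old-shape and new-shape documents, trimmed to the fields used below.
old_doc = json.loads('{"name": "widget1", "versMinorNum": [1, 3, 7, 9]}')
new_doc = json.loads(
    '{"name": "widget1", "coverage": ["NY", "NJ"], "versMinorNum": [1, 3, 7, 9]}'
)


def summarize(doc):
    """Read fields by name; treat a missing 'coverage' as an empty list."""
    return {
        "name": doc["name"],
        "coverage": doc.get("coverage", []),
        "minorVersions": len(doc.get("versMinorNum", [])),
    }


old_summary = summarize(old_doc)
new_summary = summarize(new_doc)
```

No positional parsing changes and no table alteration is needed on the consuming side; older files simply yield an empty coverage list.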
The Joy (and value) of mongoDB
• A: Alter table(s), extract more data; LOE = .25x1
• B: Alter table(s), extract more data; LOE = .25x2
• C: Alter table(s), extract more data; LOE = .25x3
Helpful Hints
Helpful Hint: Use the APIs
# curs is a DB-API cursor on the source database; coll is a pymongo collection
curs.execute("select A.did, A.fullname, B.number from contact A "
             "left outer join phones B on A.did = B.did order by A.did")

lastDID = None
contact = None
for q in curs.fetchall():
    if q[0] != lastDID:
        # New contact: flush the previous one, then start a fresh document
        if lastDID is not None:
            coll.insert_one(contact)
        contact = {"did": q[0], "name": q[1]}
        lastDID = q[0]
    if q[2] is not None:
        # Fold each joined phone row into the contact's "phones" array
        contact.setdefault("phones", []).append({"number": q[2]})

# Flush the final contact
if lastDID is not None:
    coll.insert_one(contact)

Resulting document:
{ "did" : "D159308", "phones" : [ {"number": "1-666-444-3333"}, {"number": "1-999-444-3333"}, {"number": "1-999-444-9999"} ], "name" : "Buzz" }
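The same join-and-fold idea can be tried without any servers. The sketch below is an illustration under assumptions: an in-memory SQLite database stands in for the source system, the sample rows are made up, `itertools.groupby` replaces the hand-rolled `lastDID` tracking, and documents are collected in a list where a real loader would call `coll.insert_one(doc)`.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table contact (did text primary key, fullname text);
    create table phones  (did text, number text);
    insert into contact values ('D1', 'Buzz'), ('D2', 'Steve');
    insert into phones  values ('D1', '1-666-444-3333'), ('D1', '1-999-444-3333');
""")

rows = conn.execute(
    "select A.did, A.fullname, B.number "
    "from contact A left outer join phones B on A.did = B.did "
    "order by A.did, B.number"
).fetchall()

docs = []
# Rows arrive sorted by did, so groupby yields exactly one group per contact.
for (did, name), group in groupby(rows, key=itemgetter(0, 1)):
    doc = {"did": did, "name": name}
    numbers = [number for _, _, number in group if number is not None]
    if numbers:
        doc["phones"] = [{"number": n} for n in numbers]
    docs.append(doc)  # with pymongo this would be coll.insert_one(doc)
```

A contact with no phones ends up with no "phones" field at all, rather than an empty or null placeholder.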
Helpful Hint: Declare Types
Use mongoDB conventions for dates and binary data:
{"dateA": {"$date": "2014-05-16T09:42:57.112-0000"}}
{"dateB": {"$date": 1400617865438}}
{"someBlob": {"$binary": "YmxhIGJsYSBibGE=", "$type": "00"}}
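A feeding system can produce these wrappers with the standard library alone. The sketch below assumes the millisecond form of `$date` and the generic binary subtype "00" shown above; `to_date` and `to_binary` are hypothetical helper names.

```python
import base64
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_date(dt: datetime) -> dict:
    """Wrap a timezone-aware datetime as {"$date": <milliseconds since epoch>}."""
    millis = (dt - EPOCH) // timedelta(milliseconds=1)  # exact integer math
    return {"$date": millis}


def to_binary(raw: bytes) -> dict:
    """Wrap raw bytes as base64 text with the generic subtype "00"."""
    return {"$binary": base64.b64encode(raw).decode("ascii"), "$type": "00"}


record = {
    "dateB": to_date(EPOCH + timedelta(milliseconds=1400617865438)),
    "someBlob": to_binary(b"bla bla bla"),
}
line = json.dumps(record)
```

Declaring the type in the file means the loader never has to guess whether a string such as "20140204" is a date, a number, or an identifier.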
Helpful Hint: Keep the file flexible
Use CR-delimited JSON:
{ "name": "buzz", "locale": "NY"}
{ "name": "steve", "locale": "UK"}
{ "name": "john", "locale": "NY"}
…instead of a giant array:
records = [
  { "name": "buzz", "locale": "NY"},
  { "name": "steve", "locale": "UK"},
  { "name": "john", "locale": "NY"},
]
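One practical payoff of the line-delimited form is that a reader can stream it one record at a time instead of parsing the whole file first. A minimal sketch, with `read_records` as a hypothetical helper and an in-memory `StringIO` standing in for the file:

```python
import io
import json


def read_records(stream):
    """Yield one parsed document per non-blank line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


sample = io.StringIO(
    '{"name": "buzz", "locale": "NY"}\n'
    '{"name": "steve", "locale": "UK"}\n'
    '{"name": "john", "locale": "NY"}\n'
)
ny_names = [r["name"] for r in read_records(sample) if r["locale"] == "NY"]
```

The same property makes the file easy to split, concatenate, and pipe through line-oriented tools.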
Helpful Hint: A quick sidebar on jq
$ cat myData
{ "name": "dave", "type": "mobile", "phones": [ { "type": "mobile", "number": "2123455634", "dnc": false }, { "type": "mobile", "number": "6173455634" }, { "type": "land", "number": "2023455634" } ] }
{ "name": "bob", "type": "WFH", "phones": [ { "type": "land", "number": "70812342342", "dnc": false }, { "type": "land", "number": "7083455634" } ] }
(another 99,998 rows)
Helpful Hint: jq is JSON awk/sed/grep
$ jq -c '.phones[] | select(.dnc == false and .type == "mobile" )' myData
{"dnc":false,"number":"2123455634","type":"mobile"}
{"dnc":false,"number":"70812342342","type":"mobile"}
…
$ jq [expression above] | wc -l
32433
$ gzip -c -d myData.gz | jq [expression above] | wc -l
32433
http://stedolan.github.io/jq/
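Where jq is not installed, the same filter is a few lines of Python. This is a rough equivalent for illustration only: `matching_phones` is a hypothetical name, and the two input lines are trimmed copies of the sample records shown earlier.

```python
import json

my_data = [
    '{"name": "dave", "phones": ['
    '{"type": "mobile", "number": "2123455634", "dnc": false}, '
    '{"type": "mobile", "number": "6173455634"}, '
    '{"type": "land", "number": "2023455634"}]}',
    '{"name": "bob", "phones": ['
    '{"type": "land", "number": "70812342342", "dnc": false}, '
    '{"type": "land", "number": "7083455634"}]}',
]


def matching_phones(lines):
    """Yield phones that are explicitly dnc == false and of type mobile."""
    for line in lines:
        doc = json.loads(line)
        for phone in doc.get("phones", []):
            # A missing "dnc" gives None, which is not False, as in jq.
            if phone.get("dnc") is False and phone.get("type") == "mobile":
                yield phone


hits = list(matching_phones(my_data))
```

Against these two records only dave's first phone qualifies, since the others are either land lines or carry no dnc flag.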
Helpful Hint: Don’t be afraid of metadata
Use a version number in each document:
{ "v": 1, "name": "buzz", "locale": "NY"}
{ "v": 1, "name": "steve", "locale": "UK"}
{ "v": 2, "name": "john", "region": "NY"}
…or get fancier and use a header record:
{ "vers": 1, "creator": "ID", "createDate": …}
{ "name": "buzz", "locale": "NY"}
{ "name": "steve", "locale": "UK"}
{ "name": "john", "locale": "NY"}
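A per-document version number lets the loader dispatch on shape. The sketch below assumes that version 2 simply renamed "locale" to "region", which the sample suggests but does not state; `normalize` is a hypothetical helper.

```python
import json


def normalize(doc):
    """Map any known version of a record onto the newest field names."""
    version = doc.get("v", 1)
    if version == 1:
        # Assumed mapping: v1 "locale" carries what v2 calls "region".
        return {"name": doc["name"], "region": doc["locale"]}
    if version == 2:
        return {"name": doc["name"], "region": doc["region"]}
    raise ValueError(f"unknown record version: {version}")


lines = [
    '{"v": 1, "name": "buzz", "locale": "NY"}',
    '{"v": 1, "name": "steve", "locale": "UK"}',
    '{"v": 2, "name": "john", "region": "NY"}',
]
people = [normalize(json.loads(text)) for text in lines]
```

Failing loudly on an unknown version is a deliberate choice here; silently loading a shape the code has never seen is harder to clean up later.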
Helpful Hints: Use batch ID
{ "vers": 1, "batchID": "B213W", "createDate": …}
{ "name": "buzz", "locale": "NY"}
{ "name": "steve", "locale": "UK"}
{ "name": "john", "locale": "NY"}
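One way a loader might use the header, sketched under assumptions: the first line is always the header, and its batchID is stamped onto every record so that a whole batch can later be found or backed out. `load_batch` is a hypothetical helper, and the elided createDate is left out of the sample header.

```python
import io
import json


def load_batch(stream):
    """Read a header line, then stamp its batchID onto every data record."""
    header = json.loads(stream.readline())
    records = []
    for line in stream:
        if line.strip():
            doc = json.loads(line)
            doc["batchID"] = header["batchID"]
            records.append(doc)
    return header, records


sample = io.StringIO(
    '{"vers": 1, "batchID": "B213W"}\n'
    '{"name": "buzz", "locale": "NY"}\n'
    '{"name": "steve", "locale": "UK"}\n'
    '{"name": "john", "locale": "NY"}\n'
)
header, records = load_batch(sample)
```

With the batch ID stored on each document, removing a bad load becomes a single delete on that field's value.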
Now that we have the data…
You’re well on your way to a single view consolidation…but first:
– Data Work
  • Cross-reference important keys
  • Potential scrubbing/cleansing
– Software Stack Work
You’ve Built a Great Data Asset; leverage it!
DON’T Build This!
Giant Glom Of GUI-biased code
http://yourcompany/yourapp
Build THIS!
http://yourcompany/yourapp
Data Access Layer
Object Construction Layer
Basic Functional Layer
Portal Functional Layer
GUI adapter Layer
Web Service Layer
Other Regular Performance Applications
Higher Performance Applications
Special Generic Applications
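The bottom three layers might be sketched as below. This is only an illustration of the separation: all class and method names are hypothetical, and a plain dict stands in for the MongoDB collection so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class DataAccessLayer:
    """Only this layer knows how documents are stored and fetched."""

    def __init__(self, docs: Dict[str, dict]):
        self._docs = docs  # stand-in for a database collection

    def find_contact(self, did: str) -> Optional[dict]:
        return self._docs.get(did)


@dataclass
class Contact:
    did: str
    name: str
    phones: List[str]


class ObjectConstructionLayer:
    """Turns raw documents into typed objects."""

    def __init__(self, dal: DataAccessLayer):
        self._dal = dal

    def contact(self, did: str) -> Optional[Contact]:
        doc = self._dal.find_contact(did)
        if doc is None:
            return None
        numbers = [p["number"] for p in doc.get("phones", [])]
        return Contact(did=doc["did"], name=doc["name"], phones=numbers)


class BasicFunctionalLayer:
    """Business functions shared by GUIs, web services, and batch jobs."""

    def __init__(self, objects: ObjectConstructionLayer):
        self._objects = objects

    def phone_count(self, did: str) -> int:
        contact = self._objects.contact(did)
        return len(contact.phones) if contact else 0


store = {"D1": {"did": "D1", "name": "Buzz", "phones": [{"number": "1-666-444-3333"}]}}
api = BasicFunctionalLayer(ObjectConstructionLayer(DataAccessLayer(store)))
count = api.phone_count("D1")
```

Because none of these layers knows about a GUI, a higher-performance application can call the lower layers directly while the portal goes through the upper ones.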
What Is Happening Next?
Access Control
Data Protection
Auditing
Creating A Single View
Part 1: Overview & Data Analysis
Part 2: Data Design & Loading Strategies
Part 3: Securing Your Deployment