I’ve Always Wanted To Data Model - Data Week 2013
DESCRIPTION
One of the tenets of Big Data is that it allows developers to work with "unstructured" data. But unless you're piping /dev/random, there's no such thing as *truly* unstructured data, only data whose structure you don't understand yet. In this lightning talk, we'll take a tour of the core fundamentals of deep data structure modeling, and see how the rigid tools and techniques of the past have failed us in the modern world of agile software and big data. We'll delve into what hope there is for understanding the semantics and structure of data that doesn't play by the rules of an RDBMS.
TRANSCRIPT
I’ve Always Wanted To Data Model
Ian Varley, Salesforce.com
Data Week, 2013-10-02
Lightning Talk (10 minutes)
Who am I?
Ian Varley
Austin, TX
Salesforce.com
Big Data Team
@thefutureian
What’s Data Modeling?
The act of taking the intelligible structure of the world around us, and
making it concrete enough for computers to act on it.
(More specifically, data modeling usually has to do with storing that data in a database.)
Traditionally, data modeling has meant Entity-Attribute-Relationship modeling techniques.
There are variants that are more “OO” (like UML), but they share most of the same core assumptions.
Many a project was sunk due to shitty data modeling.
It’s a difficult occupation.
You have to be part engineer, part psychologist, and part philosopher.
If you’re doing it, you’re not alone.
Lots of smart folks think about this stuff.
(David Hay, Steve Hoberman, Joe Celko, many more.)
But.
The expressive power of our conceptual modeling techniques hasn’t improved much since the 1970s.
We mostly look at the world in the same static way we did 40 years ago.
Partly, this is because our discipline is wedded to relational (SQL) DBs.
When the only tool you have is a hammer ...
A book that opened my eyes ...
(He said a lot of the stuff I’m about to say back in 1978!)
I don’t have a lot of answers.
But I want to raise some questions.
And hopefully, start a conversation.
Here are 5 observations about the tools of traditional data modeling.
#1: nobody actually knows what an “entity” really is.
“Entity” is another word for “category,” in linguistic terms.
And an important property of linguistic categories is that they are slippery.
See:
● Steven Pinker: The Stuff of Thought
● Douglas Hofstadter: Surfaces and Essences
● George Lakoff: Women, Fire, and Dangerous Things
part: an abstract definition of a connected set of physical materials that serve some purpose, and that people are willing to buy
part: one instance of a part type, which arrives on the QA line at a specific time and either does or doesn't meet quality standards
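To make the two senses concrete, here is a minimal Python sketch that gives each one its own type instead of a single overloaded “part” entity. The class names, fields, and sample values (`PartType`, `PartInstance`, `sku`, `serial`, `passed_qa`) are hypothetical illustrations, not something from the talk.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PartType:
    """Sense 1 (hypothetical): the abstract, sellable definition of a part."""
    sku: str
    description: str


@dataclass
class PartInstance:
    """Sense 2 (hypothetical): one physical item of a PartType on the QA line."""
    part_type: PartType
    serial: str
    arrived_at: datetime
    passed_qa: Optional[bool] = None  # None until the item has been inspected


bracket = PartType(sku="BRK-100", description="steel mounting bracket")
units = [
    PartInstance(bracket, "S1", datetime(2013, 10, 2, 9, 0), passed_qa=True),
    PartInstance(bracket, "S2", datetime(2013, 10, 2, 9, 5), passed_qa=False),
]

# "Which parts failed QA?" is a question about instances...
failed = [u.serial for u in units if u.passed_qa is False]
# ...while "how many parts do we sell?" is a question about types.
catalog = {u.part_type for u in units}
print(failed)  # ['S2']
```

Neither question is wrong; they just use the same word for two different categories.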
And if you think you can “solve” the problem, I’ve got some World Trade Center insurance policies to sell you.
That said, there are a couple of tools we could adopt that would help:
● First-class Sub-/Super-Typing
● First-class Scoping and Aliasing
(Not that there aren’t ways to do this in ERD models, but they’re non-obvious and not widely used.)
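One way to picture “first-class” subtyping and scoped aliasing is a small registry where supertypes and per-context aliases are part of the model itself, rather than naming conventions. This is a hypothetical Python sketch; `TypeRegistry`, the scope names, and the type names are illustrative assumptions, not an existing tool.

```python
from typing import Dict, Optional


class TypeRegistry:
    """Hypothetical registry: subtyping and scoped aliases as first-class data."""

    def __init__(self) -> None:
        self.supertype: Dict[str, Optional[str]] = {}
        # scope -> {word used in that scope -> canonical type name}
        self.aliases: Dict[str, Dict[str, str]] = {}

    def define(self, name: str, supertype: Optional[str] = None) -> None:
        if supertype is not None and supertype not in self.supertype:
            raise KeyError(f"unknown supertype {supertype!r}")
        self.supertype[name] = supertype

    def alias(self, scope: str, word: str, name: str) -> None:
        if name not in self.supertype:
            raise KeyError(f"unknown type {name!r}")
        self.aliases.setdefault(scope, {})[word] = name

    def resolve(self, scope: str, word: str) -> str:
        """What does `word` mean in this scope? Falls back to the word itself."""
        return self.aliases.get(scope, {}).get(word, word)

    def is_a(self, name: str, ancestor: str) -> bool:
        """Walk up the supertype chain looking for `ancestor`."""
        current: Optional[str] = name
        while current is not None:
            if current == ancestor:
                return True
            current = self.supertype[current]
        return False


reg = TypeRegistry()
reg.define("Part")
reg.define("PartType", supertype="Part")
reg.define("PartInstance", supertype="Part")
# The same word, "part", means different things to different teams.
reg.alias("sales", "part", "PartType")
reg.alias("qa", "part", "PartInstance")

print(reg.resolve("sales", "part"))  # PartType
print(reg.resolve("qa", "part"))     # PartInstance
```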
#2: entities, attributes, and relationships are really the same thing, maaaan ...
http://the-hippie-portfolio.tumblr.com/
Say I’ve got a “parent” in my model.
Is it:
● A “parent” entity?
● A “person” entity with an “isParent” attribute?
● Two “person” entities in a “parent” relationship?
It’s all of them; the distinction is arbitrary.
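Here are the three encodings side by side as a small Python sketch, using hypothetical people Alice and Bob. Each one answers “who is a parent?” equally well; nothing in the data itself forces one shape over the others.

```python
from dataclasses import dataclass, field
from typing import List


# Encoding A: "parent" is its own entity.
@dataclass
class Parent:
    name: str
    children: List[str] = field(default_factory=list)


# Encoding B: a "person" entity with an isParent attribute.
@dataclass
class PersonWithFlag:
    name: str
    is_parent: bool


# Encoding C: two "person" entities joined by a "parent" relationship.
@dataclass(frozen=True)
class ParentOf:
    parent: str
    child: str


a = [Parent("Alice", children=["Bob"])]
b = [PersonWithFlag("Alice", True), PersonWithFlag("Bob", False)]
c = [ParentOf("Alice", "Bob")]

# The same question asked of each encoding.
parents_a = {p.name for p in a}
parents_b = {p.name for p in b if p.is_parent}
parents_c = {r.parent for r in c}
print(parents_a, parents_b, parents_c)  # {'Alice'} {'Alice'} {'Alice'}
```

Note that encoding B cannot say *whose* parent Alice is, so the “arbitrary” choice also quietly decides which questions stay answerable later.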
The real structure is just a graph … but none of our modeling tools are that flexible, nor is it helpful to think that abstractly about most software.
Normally, we make the choice based on our experience and gut feeling, and pretend there’s a science to it.
But the whole way of thinking is a convenience based on “records”.
I have no idea what to do about this.
Tools that allow you to view any part of your model in any of those ways?
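As idle speculation in code: if the underlying structure is kept as a graph of (subject, predicate, object) triples, each of the three views can be computed on demand instead of being chosen up front. This is a hypothetical Python sketch; the triple layout and the `as_entity` / `as_attribute` / `as_relationship` functions are assumptions, not an existing tool.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# One underlying graph (illustrative data).
graph: List[Triple] = [
    ("alice", "type", "person"),
    ("bob", "type", "person"),
    ("alice", "parentOf", "bob"),
]


def as_entity(g: List[Triple], predicate: str) -> Set[str]:
    """Entity view: every node that is the source of a `predicate` edge."""
    return {s for s, p, _ in g if p == predicate}


def as_attribute(g: List[Triple], predicate: str, node_type: str) -> Dict[str, bool]:
    """Attribute view: an isParent-style flag on every node of `node_type`."""
    sources = as_entity(g, predicate)
    return {s: s in sources for s, p, o in g if p == "type" and o == node_type}


def as_relationship(g: List[Triple], predicate: str) -> List[Tuple[str, str]]:
    """Relationship view: the edges themselves, as (from, to) pairs."""
    return [(s, o) for s, p, o in g if p == predicate]


print(as_entity(graph, "parentOf"))               # {'alice'}
print(as_attribute(graph, "parentOf", "person"))  # {'alice': True, 'bob': False}
print(as_relationship(graph, "parentOf"))         # [('alice', 'bob')]
```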
This isn’t realistic with today’s tools, so this is just idle speculation.
#3: prescriptive models encourage black & white thinking in a gray world
You have to make decisions (about entities, attributes, relationships, types) up front. But sometimes that’s not right.
This is a strength of (some) NoSQL databases: you can do data first, and surface structure later.
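A minimal Python sketch of “data first, structure later”: records are kept as raw JSON strings with no declared schema, and the structure is surfaced afterwards by scanning them. The sample records and the `surface_structure` helper are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List, Set

# Stored as-is; nobody declared a schema up front (illustrative data).
raw: List[str] = [
    '{"name": "Alice", "dept": "Eng"}',
    '{"name": "Bob", "dept": "Sales", "hired": 2011}',
    '{"name": "Carol", "hired": "2012-03"}',
]


def surface_structure(lines: List[str]) -> Dict[str, Set[str]]:
    """After the fact: which fields appear, and with which JSON value types."""
    fields: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        for key, value in json.loads(line).items():
            fields[key].add(type(value).__name__)
    return dict(fields)


schema = surface_structure(raw)
for key, types in sorted(schema.items()):
    print(key, sorted(types))
# dept ['str']
# hired ['int', 'str']
# name ['str']
```

The inferred `hired` field holds both an integer and a string, exactly the kind of question a prescriptive model would have forced you to settle before loading any data.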
Sometimes the deep structure is actually ambiguous.
This can apply broadly.
(What if an employee isn’t really “in” a department, but has flexible membership based on where she spends her time?)
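One hedged way to model that: derive membership as a weight per department from where the time actually goes, instead of storing a single department foreign key. The timesheet rows, names, and the `membership` function below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (employee, department, hours) from a hypothetical time-tracking feed.
timesheet: List[Tuple[str, str, float]] = [
    ("dana", "Eng", 24.0),
    ("dana", "Support", 12.0),
    ("dana", "Eng", 4.0),
]


def membership(rows: List[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Fractional membership: each employee's share of hours per department."""
    hours: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for emp, dept, h in rows:
        hours[emp][dept] += h
    result: Dict[str, Dict[str, float]] = {}
    for emp, by_dept in hours.items():
        total = sum(by_dept.values())
        if total > 0:  # skip employees with no logged time
            result[emp] = {d: h / total for d, h in by_dept.items()}
    return result


print(membership(timesheet))  # {'dana': {'Eng': 0.7, 'Support': 0.3}}
```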
You can represent that in a traditional data model, sure.
But you’re not encouraged to.
#4: static models make the time dimension unwieldy
Entity models are generally silent on the ways data changes.
Many modern databases can keep older versions of objects.
But should they? For which entities? How many versions? And so on.
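Those questions are modeling decisions, and one way to make them explicit is a retention limit declared per entity type, similar in spirit to the per-column-family version limit in stores like HBase. The Python class below is a hypothetical in-memory sketch; it assumes versions for a key arrive in timestamp order.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Tuple


class VersionedStore:
    """Hypothetical store: keeps at most N versions per key, N chosen per entity type."""

    def __init__(self, max_versions: Dict[str, int], default: int = 1) -> None:
        self.max_versions = max_versions
        self.default = default
        self.data: Dict[Tuple[str, str], Deque[Tuple[int, Any]]] = {}

    def put(self, entity_type: str, key: str, ts: int, value: Any) -> None:
        limit = self.max_versions.get(entity_type, self.default)
        versions = self.data.setdefault((entity_type, key), deque(maxlen=limit))
        versions.append((ts, value))  # the oldest version is dropped once full

    def history(self, entity_type: str, key: str) -> List[Tuple[int, Any]]:
        return list(self.data.get((entity_type, key), []))


# Prices keep some history; sessions only ever need the latest state.
store = VersionedStore({"Price": 3, "Session": 1})
for ts, price in enumerate([10, 11, 12, 13]):
    store.put("Price", "sku-1", ts, price)
store.put("Session", "s-9", 0, "open")
store.put("Session", "s-9", 1, "closed")

print(store.history("Price", "sku-1"))  # [(1, 11), (2, 12), (3, 13)]
print(store.history("Session", "s-9"))  # [(1, 'closed')]
```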
Worse, what about when the model itself changes at runtime, and you also need to retain knowledge of what the old model was?
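A minimal sketch of keeping the old model around: the model is itself versioned data, each record is tagged with the model version it was written under, and reads translate older records forward. The field names, the v1-to-v2 change, and the naive name-splitting rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class ModelVersion:
    version: int
    fields: Tuple[str, ...]  # field names valid in this version of the model


# The model has history too: v2 split "name" into two fields.
MODELS: Dict[int, ModelVersion] = {
    1: ModelVersion(1, ("name",)),
    2: ModelVersion(2, ("first_name", "last_name")),
}


@dataclass
class Record:
    model_version: int  # which model this record was written under
    values: Dict[str, Any]


def validate(rec: Record) -> bool:
    """Check a record against the model version it was written under."""
    return set(rec.values) == set(MODELS[rec.model_version].fields)


def read_as_v2(rec: Record) -> Dict[str, Any]:
    """Interpret any stored record through the current (v2) model."""
    if rec.model_version == 2:
        return dict(rec.values)
    if rec.model_version == 1:
        # Naive, lossy translation rule (hypothetical).
        first, _, last = rec.values["name"].partition(" ")
        return {"first_name": first, "last_name": last}
    raise ValueError(f"unknown model version {rec.model_version}")


old = Record(1, {"name": "Ada Lovelace"})
new = Record(2, {"first_name": "Grace", "last_name": "Hopper"})
print(validate(old), validate(new))  # True True
print(read_as_v2(old))  # {'first_name': 'Ada', 'last_name': 'Lovelace'}
```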
As in #3, there are ways to model this in entity models, but it’s not easy, so most people just don’t think about it.
#5: boxes & lines aren’t how we actually think
Our spatial processing of diagrams doesn’t map well to our temporal, spatial, and causal comprehension of data structure.
What do people really do?
Skip making models when their models look too complicated.
F*** THAT NOISE.
Is there an alternative? Not yet.
What could move the needle?
● Prototype-based modeling
● Proper scoping
● Semantic zooming
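The talk doesn’t spell out what prototype-based modeling would look like. One possible reading, borrowed from prototype-based languages such as Self and JavaScript, is to describe a concrete example and derive others by cloning it and overriding slots, rather than declaring abstract entity types first. This Python sketch and its names (`Proto`, `clone`, the sample slots) are assumptions.

```python
from typing import Any, Dict, Optional


class Proto:
    """A model element defined by example: look up a slot locally, then in its prototype."""

    def __init__(self, name: str, proto: Optional["Proto"] = None, **slots: Any) -> None:
        self.name = name
        self.proto = proto
        self.slots: Dict[str, Any] = dict(slots)

    def get(self, slot: str) -> Any:
        node: Optional[Proto] = self
        while node is not None:
            if slot in node.slots:
                return node.slots[slot]
            node = node.proto  # delegate to the prototype
        raise KeyError(slot)

    def clone(self, name: str, **overrides: Any) -> "Proto":
        """Derive a new element that differs from this one only where stated."""
        return Proto(name, proto=self, **overrides)


# Start from a typical example instead of an abstract "Employee" type.
employee = Proto("typical employee", dept="Eng", badge=True)
contractor = employee.clone("typical contractor", badge=False)
dana = contractor.clone("dana", dept="Support")

print(dana.get("dept"), dana.get("badge"))  # Support False
```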
The map is not the territory.
In conclusion … if you dig this stuff, let’s talk!
@thefutureian