modeling tricks my relational database never taught me
DESCRIPTION
In this session we will explore several modeling scenarios from my own experience that can easily be achieved using RavenDB, but difficult (if not nearly impossible) to build using a classic relational database. The focus will be on helping those accustomed to SQL Server or other relational databases learn good document modeling skills by example, with a summary of document modeling guidelines at the end.TRANSCRIPT
Modeling Tricks My Relational Database Never Taught Me
David Boike
@DavidBoikemake-awesome.com
About Me
• Principal Consultant, ILM Professional Services• NServiceBus Champion• Author of Learning NServiceBus• Husband, father, geek,
amateur beer brewer• @DavidBoike• www.make-awesome.com
SQL Server
SQL Server
SQL Server
• SQL first appeared – 1974– Hard drive price: $185/MB, $189,000/GB
• Microsoft SQL Server 1.0 – 1989– Hard drive price: $7.48/MB, $7,659/GB
• Current Version: SQL 2012– November 2013: 0.004¢/MB, 4.096¢/GB
http://www.jcmit.com/diskprice.htm
Entity Framework
SQL Server
RavenDB
What is RavenDB?
• Fully transactional document database• Dead simple API
Store(object entity) Delete<T>(T entity)Load<T>(string id) Query<T>()
• Smart LINQ-powered client library• Safe by default• Powerful indexes• Replication/sharding• Fraction of SQL Server’s cost• Fraction of SQL Server’s requirements
RavenDB
Document Modeling
Bad Document Modeling
Take the red pill…
Inheritance
Oh boy, Com Sci 101 again!
Inheritance: FlatFlat
Id No
Type No
BaseProp1 No
BaseProp2 No
BaseProp3 No
DerivedAProp1 Yes
DerivedAProp2 Yes
DerivedAProp3 Yes
DerivedBProp1 Yes
DerivedBProp2 Yes
DerivedBProp3 Yes
DerivedCProp1 Yes
DerivedCProp2 Yes
DerivedCProp3 Yes
Column Name Nullable
Cliché Polymorphism Examples
• Animals (cat, dog, horse)• Cars (sedan, truck, minivan, SUV)– More broadly, machines (bike, motorcycle, car)
• Shapes (circle, rectangle, square)
Inheritance: Table per Class
AleBeerId
AleProp1
AleProp2
BeerBeerId
Type
Name
ABV
IBU
LagerBeerId
LagerProp1
LagerProp2
StoutBeerId
StoutProp1
StoutProp2
Inheritance: Table per Top-Level Class
Inheritance: Extended Attributes
AttributeNames:foo:S:0:11:intVal:S:11:2:
AttributeValues:Hello World42
Deserialized:foo=“Hello World”bar=“42”
Inheritance: XML Attributes
Raven Inheritancepublic class BeerListModel{ public string Id { get; set; } public List<BeerModel> Beers
{ get; set; }
public BeerListModel() { Beers = new List<BeerModel>(); }}
public abstract class BeerModel{ public string Name { get; set; } public decimal ABV { get; set; } public int IBU { get; set; }}
public class AleModel : BeerModel{ public string AleProp1 { get; set; } public string AleProp2 { get; set; }}
public class LagerModel : BeerModel{ public string LagerProp1 { get; set; } public string LagerProp2 { get; set; }}
public class StoutModel : BeerModel{ public string StoutProp1 { get; set; } public string StoutProp2 { get; set; }}
Raven Inheritance
Inheritance: Lessons
• Inheritance/Polymorphism in SQL Server sucks• In RavenDB, it’s no big deal• Key: With RavenDB we can model things just
like they exist in real life.
Hierarchy
SQL Hierarchy
SELECT *FROM CategoryWHERE SiteId = @SiteIdAND ParentId = @ParentId
SQL Hierarchy
SELECT *FROM CategoryWHERE SiteId = @SiteId
Raven Hierarchy #1public class CategoryTree{ // Sites/42/Categories public string Id { get; set; } public List<Category> RootCategories { get; set; }
public CategoryTree() { RootCategories = new List<Category>(); }}
Raven Hierarchy #1public class Category{ public string CategoryId { get; set; } public string Name { get; set; } // Other Stuff
public List<Category> ChildCategories { get; set; }
public Category() { ChildCategories = new List<Category>(); }}
Raven Hierarchy #1
Raven Hierarchy #2public class Category{ public string Id { get; set; } public string Name { get; set; } // Other Stuff}
public class CategoryTree{ public string Id { get; set; } public List<CategoryRef> RootCategories { get; set; }}
public class CategoryRef{ public string CategoryId { get; set; } public List<CategoryRef> ChildCategories { get; set; }}
// Don't forget default constructors!
Raven Hierarchy #2
Loading Hierarchy
• Method 1– One document, just load it
• Method 2– Multiple Documents– Normally to load within collections• .Include("RootCategories,CategoryId")• Does not work recursively• Works fine for one level
Loading Hierarchy
var tree = RavenSession .Load<CategoryTree>("Sites/42/CategoryTree");
// Recursively build list of ids neededstring[] ids = RecurseOnYourOwnTime(tree); // ;-)
var cats = RavenSession .Load<Category>(ids) .ToDictionary(doc => doc.Id);
Hierarchy: Lessons
• Model by Units of Change and Transactional Boundaries– Guided our choice of hierarchy implementation– Think about actors in your use cases
• Always initialize collections and child objects– Nobody likes a NullReferenceException
• We can have global “singleton” documents for configuration– Site/Configuration, Site/Categories, etc.– Great candidates for Aggressive Caching
Duplication
You want me to do what?
Wordpress ERD
Wordpress ERD
Comments
CommentMeta
Options
Posts
PostMetaTerm
Relationships
Users
UserMeta
TermTaxonomy
Links
Terms
Copy a Post
• 3 tables involved– wp_posts– wp_postmeta (17 types)– wp_term_relationships
• More if we want to copy comments– wp_comments– wp_commentmeta
Copy a Post
• Fairly simple example– New row in Posts
• Store new PostId
– New rows in PostMeta w/ PostId– New rows in TermRelationships w/ PostId
• What if we had to copy a table related to TermRelationships?• What if an object graph spans dozens of tables?• What if PostMeta contains hidden ids linking to other tables?• P.S. Revisions are stored as additional
records in Posts!
Raven Post Model Examplepublic class WordPressPost{ public string Id { get; set; } public string AuthorId { get; set; } public DateTimeOffset PostDate { get; set; } public string Contents { get; set; } public string Title { get; set; } public string Excerpt { get; set; } public PostStatus Status { get; set; } // etc. public List<string> Categories { get; set; } public List<string> Tags { get; set; } public List<string> Series { get; set; } public List<string> RelatedPostIds { get; set; } public List<PostLink> Links { get; set; } public List<PostMedia> Media { get; set; } // Comments?...}
Options for Comments
1. Add to WordPressPost :public List<PostComment> Comments { get; set; }
2. Document per comment3. Separate document for comments
• WordPressPosts/1• WordPressPosts/1/comments
4. Capped documents• WordPressPosts/1• WordPressPosts/1/comments/1• WordPressPosts/1/comments/2• WordPressPosts/1/comments/3
Capped Documents
Bounded Contexts Unbounded Contexts
Posts/1 Posts/1
Posts/1/CommentsPosts/1/
Comments/3Posts/1/
Comments/2Posts/1/
Comments/1
How to duplicate?
// Global.asax / App_Start configMapper.Initialize(cfg =>{ cfg.CreateMap<BlogPost, BlogPost>() .ForMember(p => p.Id, opts => opts.Ignore());});
Can also decorate model’s Id property with IgnoreMapAttribute but this leaks a dependency to AutoMapper in your models.
How to duplicate?
BlogPost post = RavenSession.Load<BlogPost>("BlogPosts/1");BlogPost dupe = Mapper.Map<BlogPost>(post);// dupe.Id == null
RavenSession.Store(dupe);// dupe.Id == "BlogPosts/2"
Why duplicate?
• User Duplicate function (duh)• Auditing/History– Posts/1/History/yyyyMMddHHmmss– Consider Versioning bundle for auditing EVERY document
modification• Preview
– Posts/1/Preview/ecbe7b49c3de4c8394880bdd81eeae97• Use Expiration Bundle
• Complex Edit/Publish/Cancel– Posts/1/Edit/davidboike
• Use Expiration Bundle
Duplication: Lessons
• Thinking about duplication can help to identify units of change, but can’t get us all the way
• Make use of semantic IDs• Duplicating data enables many advanced
scenarios• Think about possible document growth• Break down large documents within one unit
of change into slimmer documents
NServiceBus Saga Storage
Hey remember that book I mentioned?
NServiceBus/Sagas 101
• NServiceBus– Framework for creating distributed systems– Transactional message handlers w/ auto-retry– Publish/Subscribe
• NServiceBus Sagas– Correlated message handlers with shared state– Transactional state stored between messages– Similar in concept to ASP.NET Session State
Saga Storage Interfacepublic interface ISagaPersister{ T Get<T>(Guid sagaId); T Get<T>(string property, object value); void Save(IContainSagaData saga); void Update(IContainSagaData saga); void Complete(IContainSagaData saga);}
public interface IContainSagaData{ Guid Id { get; set; } string Originator { get; set; } string OriginalMessageId { get; set; }}
Saga Storage History
• NServiceBus 2.0– SQL Storage via Fluent NHibernate• It sucked
– Built my own based on XmlSerializer• That also sucked, just not as much
• NServiceBus 3.0– RavenDB Saga Persister became default– Villagers rejoiced!
Fluent NHibernatepublic class MySagaData : IContainSagaData{ public virtual int MessagesReceived { get; set; }
// Don't touch! NServiceBus uses these internally! public virtual Guid Id { get; set; } public virtual string Originator { get; set; } public virtual string OriginalMessageId { get; set; }}
MySagaDataId
Originator
OriginalMessageId
MessagesReceived
Fluent NHibernatepublic class MySagaData : IContainSagaData{ public virtual int MessagesReceived { get; set; }
// Don't touch! NServiceBus uses these internally! public virtual Guid Id { get; set; } public virtual string Originator { get; set; } public virtual string OriginalMessageId { get; set; }}
MySagaDataId
Originator
OriginalMessageId
MessagesReceived
Fluent NHibernatepublic class MySagaData : BaseSagaData{ public virtual int MessagesReceived { get; set; }}
public abstract class BaseSagaData : IContainSagaData{ public virtual Guid Id { get; set; } public virtual string Originator { get; set; } public virtual string OriginalMessageId { get; set; }}
MySagaDataId
MessagesReceived
BaseSagaDataId
Originator
OriginalMessageId
Fluent NHibernatepublic class MySagaData : IContainSagaData{ public virtual List<string> MessageTypesReceived { get; set; }
// Don't touch! public virtual Guid Id { get; set; } public virtual string Originator { get; set; } public virtual string OriginalMessageId { get; set; }}
Fluent NHibernatepublic class MySagaData : IContainSagaData{ public virtual List<string> MessageTypesReceived { get; set; }
// Don't touch! public virtual Guid Id { get; set; } public virtual string Originator { get; set; } public virtual string OriginalMessageId { get; set; }}
Fluent NHibernatepublic class MySagaData : IContainSagaData{ public virtual List<ReceivedMessage> MessageTypesReceived { get; set; }
// Don't touch! public virtual Guid Id { get; set; } public virtual string Originator { get; set; } public virtual string OriginalMessageId { get; set; }}
public class ReceivedMessage{ public virtual Guid Id { get; set; } public virtual string MessageTypeReceived { get; set; }}
MySagaDataId
Originator
OriginalMessageId
ReceivedMessageId
MySagaDataId
MessageTypeReceived
Raven Saga Persistencepublic class MySagaData : ContainSagaData{ public List<string> MessageTypesReceived { get; set; }}
-OR-
public class MySagaData : ContainSagaData{ public HashSet<Type> MessageTypesReceived { get; set; }}
Back to ISagaPersister
public interface ISagaPersister{ T Get<T>(Guid sagaId); T Get<T>(string property, object value); void Save(IContainSagaData saga); void Update(IContainSagaData saga); void Complete(IContainSagaData saga);}
Need to be able to load a saga by any identifiable property that matches on the incoming message and saga data.
More Real-Life SagaShipping Saga:• Waits for OrderAccepted + OrderBilled events• Publishes OrderShipped event
public class ShippingData : ContainSagaData{ [Unique] public long OrderId { get; set; } public List<string> MessageTypesReceived { get; set; }}
UniqueAttribute• NHibernate persister would create a unique index on the column• RavenDB indexes are asynchronous and NOT consistent
– Consistency only guaranteed on Store/Load by ID
Pointer Documents// PubSub.Shipping.ShippingData/OrderId/{GUID-1}{ "SagaId": "{GUID-2}", "UniqueValue": 12345, "SagaDocId": “ShippingData/{GUID-2}"}
// ShippingData/{GUID-2}{ "OrderId": 12345, "Originator": "PubSub.Sales@ILM-LION", "OriginalMessageId": "{GUID-3}"}
OpenID/OAuth
• Login with multiple 3rd-party identities also requires pointer document modeling
• OpenID identity URLs make for horrible document IDs– Use the SHA1 hash of the login provider and login
identifier in the document ID instead.• Easy way with MVC5
– RavenDB.AspNet.Identity NuGet package– https://github.com/ILMServices/RavenDB.AspNet.Identity– At least look at the source and make fun of the author
(That’s me)
RavenDB.AspNet.Identity{ // ApplicationUsers/DavidBoike (Some boring properties removed) "UserName": "DavidBoike", "Logins": [ { "LoginProvider": "Google", "ProviderKey": "https://www.google.com/accounts/o8/id?id=AItOawnt..." } ]}
{ // IdentityUserLogins/95c8f50aa725d62af9c7e397ca3a89889dd76514 "UserId": "ApplicationUsers/DavidBoike", "Provider": "Google", "ProviderKey": "https://www.google.com/accounts/o8/id?id=AItOawnt..."}
NServiceBus: Lessons
• Document databases are ideal for storing random objects with little friction
• Unique constraints must be treated specially to ensure consistency– Custom pointer documents– Unique Constraints bundle for more generic
functionality (though I prefer not to overuse it)• Use RavenDB.AspNet.Identity for OpenID– Better yet send me a pull request!
Final Thoughts
Document modeling is not the same as SQL.
If you try to make it so, you will fail.
So don’t be that guy.
Final Thoughts
Above all else, think about how and why data changes.– Units of Change– Transactional Boundaries– Everything else is window dressing
Final Thoughts
Like relational modeling, document modeling is a skill that is acquired through experience– Model things like you would in the real world– Don’t be afraid of trying and failing– Create a /FakeData/Create action in your site to
bootstrap your database to make this easy– Especially in the early stages of an application,
experiment often and regenerate when needed
Questions?
@DavidBoikemake-awesome.com