being rdbms free -- alternate approaches to data persistence
TRANSCRIPT
Being RDBMS FreeAlternative Approaches to Data PersistenceDAVID HOERSTER
About MeC# MVP (Since April 2011)
Sr. Solutions Architect at Confluence
One of the Conference Organizers for Pittsburgh TechFest
Past President of Pittsburgh .NET Users Group and organizer of recent Pittsburgh Code Camps and other Tech Events
Twitter - @DavidHoerster
Blog – http://blog.agileways.com
Email – [email protected]
Goals
To allow you to achieve a zen-like state by never having to decide between a left and right outer join
GoalsThat a non-relational solution may be considered an option
What are some areas of a traditional application that could be a non-relational solution
Introduce some non-relational tools
How those tools would be used in a .NET solution (CODE!)
Traditional ArchitectureData persistence is central to application
Generally monolithic
Jack of all trades; master of none
Traditional Architecture
ClientWeb
ServerApp
ServerData
Repository
App DataSessionCache (?)Full Text SearchAudit
Consider…An online employment application
Wizard interface, with 9-12 steps
Most data is 1:1 across steps, but some data is 1:many
How to best structure 1:1 data◦ 6-8 tables, linked by ID?
◦ Or one wide table with lots of nullable columns?
◦ What about joining?
How about 1:many data◦ Several tables with 1:* relationships, which also needs to be joined
Don’t forget searching!!!
Applicant
General
Disclosure
Attestation
Skills
EmplEducat’
n
Database ThawI'm confident to say that if you starting a new strategic enterprise application you should no longer be assuming that your persistence should be relational. The relational option might be the right one - but you should seriously look at other alternatives.
-- Martin Fowler (http://martinfowler.com/bliki/PolyglotPersistence.html)
Monolithic Data PersistenceProvides consistency, but…
Is it always best tool for all jobs?
Is it easy for prototyping / rapid development?
Consider◦ How data will be used
◦ What kinds of data you’ll have
Why Non-RelationalUse Case – Company Intranet / CMS
Overall object is a CMS-like app for a company’s intranet content
Usage is mostly read-only, with pages and attachments◦ Pages, attachments, searching, admin, etc.
Traditional database could be multiple tables with 1:1 relationships and some 1:many relationships
Lots of joins for a page
…or a single document
What if…We could break some pieces out◦ Flatten structures for querying
◦ Highly efficient search services
◦ Pub/sub hubs
◦ Remote caching with excellent performance
◦ Session management outside a DB for load balanced environments
How would app then be architected?
…but consider the costsLearning curve
Distributed systems
Compensating transactions
Consider this with◦ Data
◦ Searching
◦ Caching/Session
◦ Auditing
Data StorageTypically, RDBMS is the de facto standard◦ SQL Server
◦ MySQL
◦ PostgreSQL
◦ Oracle (Yikes!!)
But do you really need it?
Data StorageGet all the orders for user ‘David’ in last 30 days
SELECT c.FirstName, c.MiddleName, c.LastName, soh.SalesOrderID, soh.OrderDate,sod.UnitPrice, sod.OrderQty, sod.LineTotal,p.Name as 'ProductName', p.Color, p.ProductNumber,pm.Name as 'ProductModel',pc.Name as 'ProductCategory',pcParent.Name as 'ProductParentCategory'
FROM SalesLT.Customer c INNER JOIN SalesLT.SalesOrderHeader sohON c.CustomerID = soh.CustomerIDINNER JOIN SalesLT.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderIDINNER JOIN SalesLT.Product p ON sod.ProductID = p.ProductIDINNER JOIN SalesLT.ProductModel pm ON p.ProductModelID = pm.ProductModelIDINNER JOIN SalesLT.ProductCategory pc ON p.ProductCategoryID = pc.ProductCategoryIDINNER JOIN SalesLT.ProductCategory pcParent ON pc.ParentProductCategoryID = pcParent.ProductCategoryID
WHERE c.FirstName = 'David'AND soh.OrderDate > (GETDATE()-30)
Data Storage
Wouldn’t it be great if it were something like this?
SELECT FirstName, MiddleName, LastName, SalesOrderID, OrderDate,UnitPrice, OrderQty, LineTotal, ProductName, Color, ProductNumber,ProductModel, ProductCategory, ProductParentCategory
FROM CustomerSalesWHERE FirstName = 'David'
AND OrderDate > (GETDATE()-30)
Data StorageMaybe a document database can be of use
Number out there◦ MongoDB
◦ RavenDB
◦ Couchbase
Consolidated structures without relational ties to other collections
Object databases
Why Document DatabaseQuick prototyping
Application usage that lends itself to persisting objects
Consider usage of your data before using
Avoid “cool factor”
Consider performance◦ “NoSQL is so much faster...”
◦ Um, not always…
Looking at MongoDBServer can have databases
Databases contain collections (like a table)
Collections contain documents (like rows)
Documents can be structured, have hierarchies, indexes, primary key
Working with Mongo’s C# Clientpublic class MongoContext<T> : IContext<T> where T : class, new() {
private IDictionary<String, String> _config;private readonly MongoCollection<T> _coll;
public MongoContext(IDictionary<String, String> config) {_config = config;
var client = new MongoClient(config["mongo.serverUrl"]);var server = client.GetServer();var database = server.GetDatabase(config["mongo.database"]);
_coll = database.GetCollection<T>(config["mongo.collection"]);}
public IQueryable<T> Items {get { return _coll.FindAll().AsQueryable(); }
} }
Working with Mongo’s C# Client
Encapsulate my queries and commands
public class FindPageById : ICriteria<Page> {private readonly String _id;public FindPageById(String pageId){
_id = pageId;}
public IEnumerable<Page> Execute(IContext<Page> ctx){
return ctx.Items.Where(p => p.Id == _id);}
}
Working with Mongo’s C# ClientInvoke my query/command
public class TemplateController : MyBaseController {private readonly IContext<Page> _pageCtx;
public TemplateController(IContext<Page> ctx) : base() {_pageCtx = ctx;
}
[HttpGet]public IportalPageMetadata Section(String cat, String page) {
var id = String.Format("{0}/{1}", cat, page);
var thePage = new FindPageById(id).Execute(_pageCtx).FirstOrDefault();
...}
}
Working with Mongo’s C# Client
Writing to Mongo is just as simple...
[HttpPost]public Boolean Post(Page page){
var userId = await GetUserId();
new CreatePage(page, userId).Execute(_pages);
_searchPage.Insert(page);
return true;}
Evolving Architecture
ClientWeb
ServerApp
ServerData
Repository
SearchSome data (?)SessionCache (?)
Document Repository
Write
Query
SearchHow do you search?◦ LIKE ‘%blah%’ ?
◦ Dynamic SQL
◦ Full-Text
LIKE and Dynamic SQL can be quick to create◦ Tough to maintain
Full-Text gives power◦ Limited in search options
SearchNumber of search services out there like◦ Lucene
◦ Solr
Lucene is a search engine◦ Embed in apps
◦ .NET port (Lucene.NET)
Solr is search service◦ Built on Lucene
◦ Connect apps to it
Searching with SolrDisconnected from your application
Search content via HTTP REST calls
Can use SolrNet as a client◦ https://github.com/mausch/SolrNet
Document-based
Searching with Solrprivate readonly ISolrOperations<T> _solr;public SolrSearchProvider(ISolrOperations<T> solr) { _solr = solr; }
public IEnumerable<T> Query(String searchString) {var options = new QueryOptions() {
Fields = new List<String> {"title", "body", "lastModified" }.ToArray(),Highlight = new HighlightingParameters() {
BeforeTerm = "<strong><em>",AfterTerm = "</em></strong>",Fields = new List<String> { "title", "body" }.ToArray(),Fragsize = 100
}};var results = _solr.Query(new SolrQuery(searchString), options);return results;
}
Evolving Architecture
ClientWeb
ServerApp
ServerData
Repository
Some data (?)SessionCache (?)
Search Service
Query
Write
Document Repository
Write
Query
Session and Cache DataGenerally short-lived for users
Fairly static for cached data
Key/value stores can serve us well here◦ Redis
Redis has two good .NET client libraries◦ StackExchange.Redis
◦ ServiceStack.Redis
Using Redispublic class RedisSessionManager : ISessionManager {
private static ConnectionMultiplexer _redis = null;private readonly IDictionary<String, String> _config;
public RedisSessionManager(IDictionary<String, String> config) {if (_redis == null) {
_redis = ConnectionMultiplexer.Connect(config["session.serverUrl"].ToString());}_config = config;
}public async Task<Boolean> CreateSessionAsync(String portalId, String userId, String fullName) {
var time = DateTime.UtcNow.ToString();var timeout = _config.ContainsKey("session.timeout");
var vals = new HashEntry[] {new HashEntry("userid", userId), new HashEntry("login", time),new HashEntry("lastAction", time), new HashEntry("fullName", fullName)
};
await RedisDatabase.HashSetAsync(portalId, vals);return await RedisDatabase.KeyExpireAsync(portalId, TimeSpan.FromMinutes(timeout));
}}
Using Redis
public async Task<Boolean> ExtendSessionAsync(String portalId) {var timeout = _config.ContainsKey("session.timeout");await RedisDatabase.HashSetAsync(portalId, "lastAction",
DateTime.UtcNow.ToString());return await RedisDatabase.KeyExpireAsync(portalId,
TimeSpan.FromMinutes(timeout));}
public async Task<Boolean> ExpireSessionAsync(String portalId) {return await RedisDatabase.KeyDeleteAsync(portalId);
}
Using RedisAt login (to stick session id in a cookie):
await Session.CreateSessionAsync(userId, fullName);
Upon log out:
await Session.ExpireSessionAsync(sessionCookie.Value);
Evolving Architecture
ClientWeb
ServerApp
Server
Data Repository
Some data (?)Search Service
Query
Write
Document Repository
Write
Query
Session/Cache
Service
Why Data StoreWe’re left with a database with not much use◦ Transactional data in document store
◦ Search documents in Solr
◦ Session, caching, etc. in key/value or caching service like Redis
What it probably ends up acting as is…
Evolving Architecture
ClientWeb
ServerApp
Server
Event Store2-3 flat tablesEvent data
Search Service
Query
Write
Document Repository
Write
Query
Session/Cache
Service
Queue?
(D)Evolved Architecture
Client
Web Server
App Server
Event Store
Search Service
Query
Write
Doc Repo
Write
Query
Session/Cache
Service
Queue?
(D)Evolved ArchitecturePick and choose what components work best
Don’t use them just to use them
Proof-of-Concept / Prototype
Why look to be RDBMS freeSearching◦ More than just full-text needs
Data◦ Choose a system that you can model the business
◦ Not the other way around
Caching / Session Values / PubSub◦ Offload necessary?
◦ Ensure performance
Maintenance and support big factors to consider
Consider data usage/architecture before just jumping in
ToolsMongoDB◦ http://mongodb.org
◦ RoboMongo http://robomongo.org
◦ Perf Best Practices http://info.mongodb.com/rs/mongodb/images/MongoDB-Performance-Best-Practices.pdf
◦ Operations Best Practices http://info.mongodb.com/rs/mongodb/images/10gen-MongoDB_Operations_Best_Practices.pdf
Solr◦ http://lucene.apache.org/solr/
Redis◦ http://redis.io/
◦ Redis Manager http://redisdesktop.com/