GraphConnect Europe 2016 - Faster Lap Times with Neo4j - Srinivas Suravarapu
Posted on 10-Apr-2017
FASTER LAP TIMES WITH NEO4J
Srinivas Suravarapu, Chief Architect, Scribestar Ltd (@srinivas_s)

Scribestar
A content collaboration platform for the legal community. The solution is targeted at lawyers and how they draft legal content. From an information-systems point of view, it is a collaboration platform concerned with getting related users who change related content to work effectively. We are about 20 people, and the platform is built on .NET and now uses Neo4j as its primary store.

Relational Stores
Looking back at some content management systems, they tend to be based on a relational DBMS and serve content using BLOBs. As content grows, a monolithic store quickly starts affecting users' ability to perform functions that have nothing to do with the content itself. BLOBs are fine for a few pieces of content, but when all you have is content, you have to go back to the drawing board on where you store it.

NOSQL Stores
Some have managed to use document-oriented databases, which are able to serve large content to the web quickly. When you combine content in some form of markup (ML) format with the fragments of relational data inevitably present in every system, these solutions rely heavily on how you model your aggregates. Working with multiple aggregates sitting behind service boundaries tends to bring up consistency issues, and we end up offloading a lot of implementation complexity to the application tier.

Polyglot Persistence
Using multiple storage technologies to store information is inevitable. The type of data, and how it is consumed by parts of your application, should drive the choice of data store. We store our relationship information in a graph and store content in a file store. As a result, users' ability to collaborate effectively on the content is isolated from, and not affected by, users collaborating on the metadata.
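The graph-plus-file-store split can be sketched in a few lines. This is a minimal, hypothetical illustration, not Scribestar's code: an in-memory list of (from, relationship, to) triples stands in for Neo4j, a temporary directory stands in for the file store, and the class names, the SHA-256 content keys and the integrity check on read are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


class ContentStore:
    """Hypothetical file store: each content body is a file named by its SHA-256 hash."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, body: bytes) -> str:
        key = hashlib.sha256(body).hexdigest()
        path = self.root / key
        if not path.exists():  # identical bodies are written only once
            path.write_bytes(body)
        return key

    def get(self, key: str) -> bytes:
        body = (self.root / key).read_bytes()
        # Assumed integrity check: refuse to serve a body whose hash no longer matches.
        if hashlib.sha256(body).hexdigest() != key:
            raise ValueError(f"content {key} is corrupted")
        return body


class MetadataGraph:
    """Hypothetical stand-in for Neo4j: (from, relationship, to) triples only, no content."""

    def __init__(self) -> None:
        self.edges: list[tuple[str, str, str]] = []

    def relate(self, src: str, rel: str, dst: str) -> None:
        self.edges.append((src, rel, dst))

    def targets(self, src: str, rel: str) -> list[str]:
        return [dst for s, r, dst in self.edges if s == src and r == rel]


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        files = ContentStore(Path(tmp))
        graph = MetadataGraph()
        key = files.put(b"1. Definitions ...")
        graph.relate("user:alice", "EDITED", "doc:nda")
        graph.relate("doc:nda", "HAS_CONTENT", key)
        # Navigate the metadata in the graph, then load the body from the file store.
        (content_key,) = graph.targets("doc:nda", "HAS_CONTENT")
        return files.get(content_key)
```

Because the graph only holds a key to the body, edits to metadata (who edited what) never touch the large content files, and vice versa.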
Overview
Wish list: we are constantly compared to the capabilities of desktop publishing tools. It should be fast and secure. We cannot lose content. Corruption of content is not an option. Everyone should be able to change the same piece of content at the same time (you only think you need it).

Modeling the domain: Collaboration, Administration, Versioning.

Modeling the domain
Stay as close as possible to the domain, and let the graph be a reflection of the users' actions in the system over time. Bounded contexts still apply, and so do the rules for sharing information between two aggregates; think about how you can have multiple graphs that are smaller. Keeping the graph directed and acyclic keeps it simple, although this depends entirely on the context of your problem.

Code Tips
The principles of how to interact with a database haven't changed: the notions of parameters, indexes and constraints still exist. Don't read the same information repeatedly if it doesn't change. For writes, the principles of concurrency haven't changed. Watch out for queries that are reused: it's easier to write separate queries for separate concerns, and duplication is fine. The unexpected side effects of reusing a query for different concerns turn out to be a killer. Use the PROFILE and EXPLAIN options in Cypher to analyse your queries.

Tuning Neo4j
Switch query logging on to capture slow-running queries; the threshold depends on what you want. Output metrics to Graphite or to CSV files. Have a suite of tests that run regularly and test concurrency and load. Use the feedback to tweak any slow-running queries, and repeat the exercise until no queries are being written to the query log; that should ensure your queries are fast. Always get to the node you are interested in first: just as a SELECT on a SELECT is effective in SQL, a MATCH on a MATCH is effective in Cypher.
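Keeping a graph directed and acyclic can be enforced at write time: before adding an edge from parent to child, check whether the parent is already reachable from the child, because only then would the new edge close a loop. The sketch below is a hypothetical in-memory illustration (the VersionGraph name and the version-history use case are assumptions, not the talk's model) using an iterative depth-first search.

```python
from collections import defaultdict


class VersionGraph:
    """Hypothetical directed graph that refuses any edge which would create a cycle."""

    def __init__(self) -> None:
        self.children: dict[str, set[str]] = defaultdict(set)

    def _reachable(self, start: str, goal: str) -> bool:
        # Iterative depth-first search from start, looking for goal.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.children[node])
        return False

    def add_edge(self, parent: str, child: str) -> None:
        # parent -> child closes a cycle exactly when parent is already reachable from child
        # (this also rejects self-loops, since start == goal).
        if self._reachable(child, parent):
            raise ValueError(f"{parent} -> {child} would create a cycle")
        self.children[parent].add(child)
```

For example, after v1 -> v2 and v2 -> v3, adding v1 -> v3 is accepted, while v3 -> v1 raises ValueError.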
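"The principles of concurrency haven't changed" for writes; one common principle is optimistic concurrency, where a writer must state which version it read and the write is rejected if someone else got there first. The talk does not say which scheme Scribestar uses, so this is an assumed, in-memory sketch: FragmentStore, ConflictError and the integer version counter are hypothetical names. Against a database, the same check would typically be a condition on a version property inside the write itself.

```python
import threading
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    version: int = 0


class ConflictError(Exception):
    """Raised when a write is based on a version that is no longer current."""


class FragmentStore:
    """Hypothetical in-memory store applying optimistic concurrency per fragment."""

    def __init__(self) -> None:
        self._fragments: dict[str, Fragment] = {}
        self._lock = threading.Lock()

    def read(self, key: str) -> Fragment:
        with self._lock:
            current = self._fragments.setdefault(key, Fragment(""))
            return Fragment(current.text, current.version)  # hand out a copy

    def write(self, key: str, text: str, expected_version: int) -> int:
        with self._lock:
            current = self._fragments.setdefault(key, Fragment(""))
            if current.version != expected_version:
                raise ConflictError(
                    f"{key} is at v{current.version}, not v{expected_version}")
            current.text = text
            current.version += 1
            return current.version
```

If two users both read clause-1 at version 0, the first write succeeds and bumps the version to 1; the second, still claiming version 0, gets a ConflictError instead of silently overwriting the first user's change.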
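Neo4j's query log captures slow queries on the server; a regular concurrency and load suite can apply the same threshold idea on the client side. The sketch below is hypothetical: each "query" is just a Python callable (in practice it would wrap a database call), and the find_slow name, the threshold, the pool size and the run count are assumptions. Each query is run several times on a thread pool, and only those whose slowest run exceeds the threshold are reported, so the loop "tweak, rerun, until nothing is reported" mirrors the advice above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def find_slow(queries: dict[str, Callable[[], object]], threshold_s: float,
              concurrency: int = 4, runs: int = 8) -> dict[str, float]:
    """Run each named query `runs` times concurrently; return the worst latency
    (in seconds) of every query whose slowest run exceeded `threshold_s`."""

    def timed(fn: Callable[[], object]) -> float:
        start = time.perf_counter()
        fn()
        return time.perf_counter() - start

    slow: dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for name, fn in queries.items():
            # max() consumes all results before the loop moves to the next query.
            worst = max(pool.map(lambda _: timed(fn), range(runs)))
            if worst > threshold_s:
                slow[name] = worst
    return slow
```

An empty result means no query crossed the threshold under this load, which is the client-side analogue of an empty query log.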
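"Get to the node you are interested in first" means: locate the anchor node through an index, then expand only from it, rather than scanning relationships across the whole graph and filtering afterwards. In Cypher this is the MATCH-then-MATCH shape, for example MATCH (d:Document {id: $id}) WITH d MATCH (d)-[:EDITED_BY]->(u) RETURN u, where the labels and relationship type are hypothetical. The Python below is only an in-memory analogy under those assumptions: a dict keyed by node id plays the role of the index, and both methods return the same answer, but only the anchored one does work proportional to the anchor's own relationships.

```python
from collections import defaultdict


class Graph:
    """Hypothetical in-memory graph; by_id plays the role of an index on node id."""

    def __init__(self) -> None:
        self.by_id: dict[str, dict] = {}
        self.out: dict[str, list[tuple[str, str]]] = defaultdict(list)  # id -> [(rel, target)]

    def add_node(self, node_id: str, **props) -> None:
        self.by_id[node_id] = {"id": node_id, **props}

    def add_rel(self, src: str, rel: str, dst: str) -> None:
        self.out[src].append((rel, dst))

    def anchored(self, start_id: str, rel: str) -> list[dict]:
        # 1) Index lookup finds the anchor without scanning. 2) Expand only its relationships.
        return [self.by_id[dst] for r, dst in self.out.get(start_id, []) if r == rel]

    def unanchored(self, start_id: str, rel: str) -> list[dict]:
        # Visits every relationship in the graph, then filters: cost grows with graph size.
        return [self.by_id[dst]
                for src, rels in self.out.items()
                for r, dst in rels
                if src == start_id and r == rel]
```

With a document linked to two editors, anchored("doc:nda", "EDITED_BY") and unanchored("doc:nda", "EDITED_BY") return the same two user nodes; the difference only shows up in how much of the graph each one touches.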
The business benefits
We built our new solution in 8 months, compared to the 2+ years spent building the former one, and we did this with half the size of the original team. The system is at least X times quicker than it used to be, where X is a two-digit number :). The complexity of work has reduced. Being able to visualize the data in real time provides valuable analytics for the user. Cypher is absolutely powerful in its ability to get you where you need to be on the graph, and to some degree it reduces the need to understand the implementation immediately.

Indicators
The team hasn't come up with "I don't know how big this is" in a while. Even with the definition of cycle time widening, cycle times for the same level of complexity have dropped significantly.

Moving into the future
Visualizing the information using tools to get some insight into user behavior will help us evolve the product. Some of the principles we have used should help us scale out without contention (famous last words; that remains to be seen). Who knows, we may be able to store large files in a hybrid of Neo4j and something else, such as alternative stores like Riak or any self-hosted S3-style product. It would be great to have a lightweight Linkurious plugin on the Neo4j dashboard. Precedents and taxonomy are subject to research. The agility you obtain using a graph is great: changing the underlying model is nowhere near as painful or dreadful, and the value of visualization simply exceeds any cost involved in the transition.

See Some Code?