Detailed look at consistent NoSQL data storage with ModeShape
DESCRIPTION
ModeShape 3 is an elastic, strongly-consistent hierarchical database that supports queries, full-text search, versioning, events, locking and use of schema-rich or schema-less constraints. It's perfect for storing files and hierarchically structured data that will be accessed by navigation or queries. You can choose where (if at all) you want ModeShape to enforce your schema, but your structure and schema can always evolve as your needs change. Sequencers make it easy to extract structure from stored files, and federation can bring information from external systems into your database. It's fast, sits on top of an Infinispan data grid, and is open source. This presentation provides an introduction to how ModeShape 3 works, and was given at the Alpes JUG (in Grenoble, France) and the Geneva JUG (in Switzerland).
TRANSCRIPT
![Page 1: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/1.jpg)
Elastic consistent NoSQL data storage with ModeShape 3
April 2013
Randall Hauch, Principal Software Engineer at Red Hat
@rhauch @modeshape
![Page 2: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/2.jpg)
SQL databases
• BLOB or CLOB
• recursive JOINs and queries
• SQL types (CHAR, VARCHAR, etc.)
![Page 3: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/3.jpg)
NoSQL databases
http://www.flickr.com/photos/8431398@N04/2680944871
![Page 4: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/4.jpg)
NoSQL databases
Document
Key/Value
Column-oriented
Graph
Others, including hierarchical...
![Page 5: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/5.jpg)
ModeShape
An open source elastic in-memory hierarchical database with queries, transactions, events & more
![Page 6: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/6.jpg)
Hierarchical
• Organize the data into a tree structure
  – A lot of data has natural hierarchies
  – Conceptually similar to a file system
  – Nodes with properties
  – References enable graphs (not limited to parent/child)
• Navigate or query
  – Quickly navigate to related (or contained) data
  – Use queries to find data independently of location
![Page 7: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/7.jpg)
Nodes and names
Each node has a name.
• Node names
  – consist of a local part and a namespace (like XML names)
  – need not be unique within a parent node (but uniqueness is recommended)
• Namespaces
  – are URIs that are registered and can be assigned a prefix
  – prefixes are repository-wide, but can be permanently changed or overridden locally by clients
• Examples
  – Namespace prefix: “” (empty string); local part: “equipment”
  – Namespace prefix: “jcr”; local part: “system”
![Page 8: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/8.jpg)
Node paths
Each node is identified by a path.
• Absolute paths
  – the sequence of names from the root to the node in question
  – always start with a ‘/’ signifying the root node
  – may use a 1-based same-name-sibling positional index (which can change if the order of the children is changed)
These paths are equivalent:
  /facilities/San Fransisco/Eastford Plaza
  /facilities[1]/San Fransisco[1]/Eastford Plaza[1]
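The equivalence of the two paths can be checked mechanically. The following is a minimal sketch in plain Java with no JCR dependency; `PathIndexes` and its methods are hypothetical names, not part of JCR or ModeShape. It assumes that a segment ending in `]` already carries an index and that names contain no `/` characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a JCR or ModeShape class) that makes same-name-sibling indexes explicit. */
final class PathIndexes {

    /** Returns the absolute path with an explicit 1-based index on every segment. */
    static String withExplicitIndexes(String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("Not an absolute path: " + absolutePath);
        }
        List<String> segments = new ArrayList<>();
        for (String segment : absolutePath.split("/")) {
            if (segment.isEmpty()) {
                continue; // the empty piece before the leading '/'
            }
            // Assumption: a segment without "[n]" refers to the first sibling of that name.
            segments.add(segment.endsWith("]") ? segment : segment + "[1]");
        }
        return "/" + String.join("/", segments);
    }

    /** Two absolute paths are equivalent when their explicit-index forms are identical. */
    static boolean equivalent(String first, String second) {
        return withExplicitIndexes(first).equals(withExplicitIndexes(second));
    }

    public static void main(String[] args) {
        String plain = "/facilities/San Fransisco/Eastford Plaza";
        String indexed = "/facilities[1]/San Fransisco[1]/Eastford Plaza[1]";
        System.out.println(equivalent(plain, indexed)); // prints "true"
    }
}
```

A path such as `/facilities/San Fransisco[2]` would not be equivalent to either form, because it names the second sibling.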
![Page 9: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/9.jpg)
Node paths (cont’d)
Paths can be relative and can use “.” and “..”
• Relative paths
  – the sequence of names from one node to another
  – never start with a ‘/’
  – similar to file system relative paths
From the “passenger” node to the “Eastford Plaza” node:
  ../../facilities/San Fransisco/Eastford Plaza
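Resolving a relative path against a starting node works the same way as on a file system. The sketch below is plain Java with hypothetical names (`RelativePaths`, `resolve`). The slide does not say where the “passenger” node lives, so the usage example assumes it sits at `/equipment/passenger`, two levels below the root.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper (not a JCR or ModeShape class) that resolves relative paths. */
final class RelativePaths {

    /** Resolves a relative path against the absolute path of the starting node. */
    static String resolve(String baseAbsolutePath, String relativePath) {
        if (relativePath.startsWith("/")) {
            throw new IllegalArgumentException("Relative paths never start with '/': " + relativePath);
        }
        Deque<String> segments = new ArrayDeque<>();
        for (String segment : baseAbsolutePath.split("/")) {
            if (!segment.isEmpty()) {
                segments.addLast(segment);
            }
        }
        for (String segment : relativePath.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue;                       // "." means "this node"
            }
            if (segment.equals("..")) {         // ".." means "the parent node"
                if (segments.isEmpty()) {
                    throw new IllegalArgumentException("Path climbs above the root node");
                }
                segments.removeLast();
            } else {
                segments.addLast(segment);
            }
        }
        return "/" + String.join("/", segments);
    }

    public static void main(String[] args) {
        // Assumed location of the "passenger" node; only its depth matters here.
        String passenger = "/equipment/passenger";
        System.out.println(resolve(passenger, "../../facilities/San Fransisco/Eastford Plaza"));
    }
}
```

With that assumed starting point, the two `..` steps climb to the root node and the remaining segments descend to `/facilities/San Fransisco/Eastford Plaza`.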
![Page 10: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/10.jpg)
Node identifier
Each node has an opaque string identifier
• Used to look up that node directly
  – no navigation is required
  – will never change after a new node is saved*, even if moved (unlike paths)
  – behaves as a “unique key” within the workspace (shared nodes behave differently)
• A node in one workspace can “correspond” to a node in a different workspace
  – they have the same identifier
  – one was created by “cloning” the other (in separate workspaces)
  – state can be transferred with the “update” process
  – corresponding nodes share the same version history
  – this behavior is critical to understanding when to use separate workspaces
* ModeShape assigns the identifier at creation and never changes it
![Page 11: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/11.jpg)
Properties
The only place to store data on the nodes
• Nodes can have 0+ properties
  – each property must have a unique name in a node
• Properties have values
  – single-valued: exactly 1 non-null value
  – multi-valued: 0 or more possibly null values
• Values
  – are immutable
  – have an implicit type
  – are accessed by desired type with auto-conversion; e.g., value.getString() or value.getNode()

| Property type | Java type |
| --- | --- |
| STRING | java.lang.String |
| NAME | java.lang.String |
| PATH | java.lang.String |
| BOOLEAN | java.lang.Boolean |
| LONG | java.lang.Long |
| DOUBLE | java.lang.Double |
| DATE | java.util.Calendar |
| BINARY | javax.jcr.Binary |
| REFERENCE | javax.jcr.Node |
| WEAKREFERENCE | javax.jcr.Node |
| DECIMAL | java.math.BigDecimal |
| URI | java.lang.String |
![Page 12: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/12.jpg)
BINARY property values
• Any size binary content
  – read/written via streams
• Separate storage
  – content keyed by SHA-1
  – the property value stored with the node contains the SHA-1 and is resolved when the stream is read
  – streamed content always buffered
• Automatic text extraction
  – text is used for full-text searching
• Choices for binary storage
  – File, DBMS, MongoDB, data grid
  – Custom
(Diagram: Binary Storage)
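Keying content by its SHA-1 hash means that identical content is stored once and that a node only needs to hold the key. The following is a simplified in-memory sketch of that idea, not ModeShape's binary store: `Sha1BinaryStore` and its methods are hypothetical, and it works on byte arrays where the real store reads and writes streams.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, in-memory illustration of content-addressed binary storage. */
final class Sha1BinaryStore {
    private final Map<String, byte[]> contentByKey = new HashMap<>();

    /** Stores the content (once per distinct value) and returns its SHA-1 key. */
    String store(byte[] content) {
        String key = sha1Hex(content);
        contentByKey.putIfAbsent(key, content.clone());
        return key;
    }

    /** Resolves a key, as a node's property value would when its stream is read. */
    byte[] resolve(String key) {
        byte[] content = contentByKey.get(key);
        if (content == null) {
            throw new IllegalArgumentException("Unknown key: " + key);
        }
        return content.clone();
    }

    /** Number of distinct binary values held by the store. */
    int size() {
        return contentByKey.size();
    }

    /** Computes the SHA-1 digest of the content as 40 lowercase hex characters. */
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        Sha1BinaryStore store = new Sha1BinaryStore();
        String first = store.store("same file".getBytes(StandardCharsets.UTF_8));
        String second = store.store("same file".getBytes(StandardCharsets.UTF_8));
        // Both uploads yield the same key, so only one copy is kept.
        System.out.println(first.equals(second) + " " + store.size());
    }
}
```

Because the key is derived from the content, two nodes that hold the same file share one stored copy in this model.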
![Page 13: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/13.jpg)
Workspace
Named segments of a repository
• Comprised of
  – a single root node
  – the “/jcr:system” branch containing the system-wide information
  – other nodes that have child nodes and properties
![Page 14: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/14.jpg)
Putting the pieces together
• Repository contains
  – named workspaces
  – namespaces, node types, version storage, etc.
• Workspaces have
  – hierarchy of nodes
  – access to the shared system area
• Nodes have
  – name (can change)
  – identifier (doesn’t change)
  – path (can change)
  – properties (can change)
• Properties have values
  – single-valued: exactly 1 non-null value
  – multi-valued: 0 or more possibly null values
• Values
  – are immutable & can be reused
  – have an implicit type
  – are accessed by desired type with auto-conversion; e.g., value.getString()
![Page 15: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/15.jpg)
Session
An authenticated connection to a repository, used to access a single workspace
• Authenticated and authorized
  – only sees content authorized by credentials
  – only changes content authorized by credentials
  – use the built-in auth service or integrate with your own
• Stateful
  – changes are kept in the session’s transient state until the session is saved
  – changes can be dropped without saving (e.g., “refreshing the session”)
• Lightweight
  – intended to be created, used, then closed
  – pooling sessions is more trouble than it’s worth
• Self-contained
  – exposed objects are tied to the session; can’t be shared with others
![Page 16: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/16.jpg)
With or without schema
• Choose how much schema is enforced
  – define patterns for values and structure
  – use different patterns for different parts of the database
  – change the patterns over time
  – use the “best” levels of schema validation
  – evolve as necessary
(Diagram: a scale running from STRICT ENFORCEMENT to NO ENFORCEMENT)
![Page 17: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/17.jpg)
Queries
• Find the data independently of the hierarchy
• SQL-like language (including full-text search)

SELECT * FROM [car:Car]
WHERE [car:model] LIKE '%Toyota%' AND [car:year] >= 2006

SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
WHERE PATH() LIKE $path

SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
WHERE PATH() IN (
  SELECT [vdb:originalFile] FROM [vdb:virtualDatabase]
  WHERE [vdb:version] <= $maxVersion
  AND CONTAINS([vdb:description],'xml OR xml maybe'))

SELECT file.*,content.* FROM [nt:file] AS file
JOIN [nt:resource] AS content ON ISCHILDNODE(content,file)
WHERE file.[jcr:path] LIKE '/files/q*.2.vdb'
![Page 18: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/18.jpg)
Sequencing
• Automatically extract structured content
  – just write BINARY or STRING property values on nodes, then save
  – sequencers run asynchronously based upon path rules & MIME types
  – output stored in repository at configurable location
• Sequencers
  – DDL (variety)
  – text (fixed width, delimited)
  – Microsoft Office™
  – Java (source & class)
  – ZIP (and JAR/WAR/EAR)
  – XML, XSD, and WSDL
  – Teiid VDBs
  – audio (MP3)
  – images
  – CND
  – custom
(Diagram, sequencers: 1) upload, 2) notify, 3) derive and store, 4) navigate or query)
![Page 19: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/19.jpg)
Federation
• Access data in external systems
  – external data projected as nodes with properties and node types
  – supports read and optional write with same validation rules
  – transparent to applications
• Connector options
  – File system
  – Local git
  – CMIS repository
  – custom
  – (more are planned)
(Diagram: content projected from external source A and external source B)
![Page 20: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/20.jpg)
Other features
• Events
  – register listeners to be notified of changes in content
  – optional criteria limit which changes listeners are notified of
• Versioning
  – checkin/checkout nodes & subtrees
  – branch, merge, restore
• Locking
  – short-lived locks (longer than transaction scope)
• Namespace management
  – programmatically (un)register namespaces
• Node type management
  – programmatically/declaratively define or update node types
• Monitoring
  – statistics for a variety of metrics
![Page 21: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/21.jpg)
Public APIs
![Page 22: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/22.jpg)
Java API
• Standard Java API (JSR-283)
  – javax.jcr packages
  – programmatically access, find, update, query content
  – commonly needed features: events, versioning, etc.
  – 95% of API
• ModeShape extensions
  – additional node type management methods
  – additional event types
  – additional Binary value methods (hash)
  – additional JCR-QOM language objects
  – cancel queries
  – sequencer and text extraction SPIs
  – monitoring API
![Page 23: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/23.jpg)
Other APIs
• JDBC driver
  – connect to local or remote repository
  – execute queries
  – access database metadata
  – enables existing applications to access content
• RESTful API
  – POST, PUT, GET, DELETE methods
  – JSON representations of one or multiple nodes
  – Streams large binary values
  – Execute queries
• WebDAV API
  – Exposes content as files and directories
  – Mount repository using file system
![Page 24: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/24.jpg)
ModeShape
An open source elastic in-memory hierarchical database with queries, transactions, events & more
![Page 25: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/25.jpg)
Elastic
• Add more processes to increase storage capacity and/or throughput
  – Transparent to applications!
  – No master, no slaves
  – Data is rebalanced as needed
  – Optionally separate database engine from storage processes
• Fault tolerant
  – Processes can fail without loss of data
  – Cross-data center distribution (in near future)
![Page 26: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/26.jpg)
In-memory
• Memory is really fast (and cheap)
• Why not keep all data in application memory?
  – practical limits to memory on particular machines
  – memory isn’t shared between machines
  – data stored in memory isn’t durable
  – no queries, structure, or transactions
• ModeShape
  – distributes multiple copies of data across the combined memory of many machines
  – persists data to disk or DB (if really needed)
  – transparent to applications
![Page 27: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/27.jpg)
Strongly consistent
• ACID
  – Atomic, Consistent, Isolated, Durable
  – Already familiar to most developers
  – Easy to reason about code
  – Writes don’t block reads (MVCC)
  – Writes to one node don’t block writes to others
• JTA
  – Participate in user transactions
  – Works with Java EE
![Page 28: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/28.jpg)
Why not eventually-consistent?
• In eventually-consistent databases
  – changes made by one client will eventually (but not immediately) be propagated to all processes
  – other clients won’t see latest data right away, yet can still make other changes
  – there may be multiple versions of a particular piece of data
• Can be ideal for some scenarios
  – read-heavy and/or best-effort
• Applications that update data may need to
  – expect inconsistencies (and/or multiple versions)
  – specify conflict strategies
  – resolve conflicts (inconsistencies)
![Page 29: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/29.jpg)
Clustering topologies
![Page 30: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/30.jpg)
Single process
(Diagram: a single ModeShape process with a local Infinispan cache that exchanges data with a persistent store)
![Page 31: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/31.jpg)
Small cluster
(Diagram: three ModeShape processes, each with a replicated Infinispan cache; the processes exchange data and events with one another, and data is written to a persistent store)
![Page 32: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/32.jpg)
Moderate single- or multi-site cluster
(Diagram: several ModeShape processes, each with a distributed Infinispan cache, exchanging data and events with one another)
![Page 33: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/33.jpg)
Large single- or multi-site cluster
(Diagram: several ModeShape processes exchanging events with one another, each storing its data in a separate Infinispan data grid)
![Page 34: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/34.jpg)
Philosophy
Start creating content, and then design the structure
• Often useful to start writing the code to create the nodes
  – use unstructured nodes to start out
  – create a node hierarchy using the natural data, and not too narrow or wide
  – determine when to use properties vs children with properties
  – determine how nodes will be named
  – set the properties using the desired names
• Step back
  – look at structural patterns and naming conventions
  – identify patterns, constraints, cardinalities, and limitations in property values
  – identify sets of properties that form characteristics
  – identify children and properties that are always there
• Create node types and mixin node types
  – use node types for essential/critical aspects that will likely never change
  – use mixin types wherever possible
  – consider how the data will be queried (more on this later)
![Page 35: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/35.jpg)
Node type
Enforce structural restrictions on nodes and properties
• Define required and allowed properties
  – rules applied based upon: name, type, and multiplicity (wildcards allowed)
  – rules define: default value(s), attributes (e.g., autocreated, mandatory, protected, versioning semantics, applicable query operators, used to order query results, full-text searchability), value constraints
• Define required and allowed child nodes
  – rules applied based upon: child name and required type (wildcards allowed)
  – rules define: default type for child node of a given name pattern, and attributes (e.g., autocreated, mandatory, protected, versioning semantics, whether same-name-siblings are allowed)
• Node type characteristics
  – the supertypes (for inheritance)
  – whether it can be added/removed (“mixed in”) after a node is created
  – whether it is abstract
  – whether child nodes are ordered
  – whether it can be queried
  – which property or child node should be considered the “primary item” (if any)
![Page 36: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/36.jpg)
Node and node types
How are node types used?
• Every node has a “primary type”
  – specified when the node is created; defaults to “nt:unstructured”
  – name of type is stored in the single-valued mandatory “jcr:primaryType” property
  – implementations may support changing primary type (ModeShape does)
• Every node has 0+ “mixin types”
  – can be added to a node at any time
  – can be removed from a node (assuming the node is valid without it)
  – names of mixins are stored in the multi-valued “jcr:mixinTypes” property
• A node is valid only when
  – every property has an applicable property definition rule in the primary type (or its supertypes) or in any of the mixin types (or their supertypes)
  – every child node has an applicable child node definition rule in the primary type (or its supertypes) or in any of the mixin types (or their supertypes)
  – validation is performed when saving a session
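The validity rule for properties can be modelled in a few lines. The sketch below is a deliberately simplified stand-in, not ModeShape's validator: `NodeTypeCheck` and `NodeType` are hypothetical classes, definitions are matched by property name only (real rules also consider type and multiplicity), and child node definitions are left out. The `car:Car` type and its properties are assumed for illustration.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of the "a node is valid only when" rule for properties. */
final class NodeTypeCheck {

    /** A node type reduced to its name, supertypes, and the property names it defines ("*" matches any). */
    static final class NodeType {
        final String name;
        final List<NodeType> supertypes;
        final Set<String> propertyNames;

        NodeType(String name, List<NodeType> supertypes, Set<String> propertyNames) {
            this.name = name;
            this.supertypes = supertypes;
            this.propertyNames = propertyNames;
        }

        /** True if this type or any of its supertypes has a definition for the property. */
        boolean allows(String propertyName) {
            if (propertyNames.contains("*") || propertyNames.contains(propertyName)) {
                return true;
            }
            return supertypes.stream().anyMatch(type -> type.allows(propertyName));
        }
    }

    /** Every property must be covered by the primary type or by one of the mixin types. */
    static boolean isValid(Set<String> nodeProperties, NodeType primaryType, List<NodeType> mixinTypes) {
        for (String property : nodeProperties) {
            boolean covered = primaryType.allows(property)
                    || mixinTypes.stream().anyMatch(mixin -> mixin.allows(property));
            if (!covered) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        NodeType base = new NodeType("nt:base", List.of(), Set.of("jcr:primaryType", "jcr:mixinTypes"));
        NodeType car = new NodeType("car:Car", List.of(base), Set.of("car:model", "car:year"));
        NodeType created = new NodeType("mix:created", List.of(), Set.of("jcr:created", "jcr:createdBy"));

        Set<String> properties = Set.of("jcr:primaryType", "car:model", "jcr:created");
        System.out.println(isValid(properties, car, List.of()));        // no type defines jcr:created
        System.out.println(isValid(properties, car, List.of(created))); // the mixin covers it
    }
}
```

In this model, adding the mixin is what makes the extra property acceptable, which mirrors how mixins extend a node after it has been created.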
![Page 37: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/37.jpg)
Wrap-up
![Page 38: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/38.jpg)
Best practices (1 of 2)
• Build structure first, then node types
  – most important to get your node structure right
  – it will change over time anyway, so don’t define the node types too soon
• Use mixin node types and mixins
  – where possible define sets of properties as mixins
  – use in primary types and dynamically add to nodes
• Limit use of same-name-siblings
  – useful when required, but can be expensive and difficult to use (i.e., paths change)
• Prefer hierarchies
  – moderate numbers of child nodes, use multiple levels if necessary
• Store files and folders with ‘nt:file’ and ‘nt:folder’
  – use it wherever appropriate; not for all binary data, though!
• Verify features are enabled
  – improves portability and safety with configuration changes
• Import and export
  – avoid document view; use system view wherever possible
![Page 39: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/39.jpg)
Best practices (2 of 2)
• Prefer JCR-SQL2 and JCR-QOM over other query languages
  – by far the richest and most useful
  – do this even when it appears the queries are more complicated
• Only Repository is thread-safe; no other APIs are
  – don’t share sessions
  – don’t share anything between sessions
• Register all listeners in special long-lived sessions
  – do nothing else with these sessions, however (Session is not thread-safe)
  – get off the notification thread ASAP, using work queues where necessary
• Create new sessions rather than reusing a pool of sessions
  – Sessions are intended to be as lightweight as possible
  – Create a session, use it, log out (even web applications and services!)
• Avoid deprecated APIs
  – either perform poorly or are a bad idea; besides, they’ll be removed eventually
• Use Session.save() not Node.save()
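The advice to get off the notification thread quickly can be sketched with a standard executor acting as the work queue. This is an illustration under stated assumptions rather than ModeShape code: `QueuedListener` is a hypothetical class, and a plain path string stands in for the event data. In a real application the hand-off would happen inside `javax.jcr.observation.EventListener.onEvent(EventIterator)`, after copying the needed values out of each event.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical listener that hands each event to a single worker thread through a queue. */
final class QueuedListener {
    // A single-threaded executor queues submitted tasks and runs them in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> handler;

    QueuedListener(Consumer<String> handler) {
        this.handler = handler;
    }

    /** Called on the notification thread: enqueue the work and return immediately. */
    void onEvent(String changedPath) {
        worker.submit(() -> handler.accept(changedPath));
    }

    /** Stops accepting events and waits for queued work; returns false on timeout or interrupt. */
    boolean shutdown(long timeout, TimeUnit unit) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        QueuedListener listener = new QueuedListener(path -> System.out.println("processing " + path));
        listener.onEvent("/files/report.pdf"); // returns at once; the handler runs on the worker thread
        listener.shutdown(5, TimeUnit.SECONDS);
    }
}
```

Any slow processing then happens on the worker thread, and a handler that needs repository access would create its own session there rather than borrowing the listener's session.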
![Page 40: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/40.jpg)
Want more ModeShape?
• Project: http://modeshape.org
• Blog: http://modeshape.wordpress.com
• Twitter: @modeshape
• IRC: #modeshape (irc.freenode.org)
• Code: http://github.com/modeshape
![Page 41: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/41.jpg)
Questions?
![Page 42: Detailed look at consistent NoSQL data storage with ModeShape](https://reader034.vdocuments.us/reader034/viewer/2022052619/5563a822d8b42aae0d8b50bb/html5/thumbnails/42.jpg)