NoSQL Simplified: Schema vs. Schema-less
DESCRIPTION
A look at the many facets of schema-less approaches vs a rich schema approach, ranging from performance and query support to heterogeneity and code/data migration issues. Presented by Leon Guzenda, Founder, Objectivity
TRANSCRIPT
The Database for Big Data Solutions
© Objectivity, Inc. 2014
NoSQL Simplified: Schema vs Schema-less
Leon Guzenda & Nick Quinn
Meetup - February 20, 2014
Overview
• Objectivity Inc.
• Pros & Cons:
  • Schema
  • Schema-less
• What We Provide
• A Compromise
Objectivity, Inc.
• Headquartered in San Jose, CA
• Over two decades of NoSQL and Big Data experience
• Enables complex data virtualization and Big Data solutions for the enterprise
• Software products:
  • Objectivity/DB
  • InfiniteGraph
  • InfiniteGraph Social App
• Embedded in hundreds of enterprises, government organizations and products, with millions of deployments.
Objectivity/DB
• Fully distributed object database.
• Handles complex, highly inter-related data.
• Extremely fast navigational access.
• Scalable collections and B-Tree indices
• ACID transactions plus Multi-Reader, One Writer mode.
• Highly scalable - Single Logical View plus simple servers
• Parallel Query Engine and Relationship Analytics
• Fully interoperable C++, C#, Java, Python and SQL++ on Windows, Unix, Linux and Mac OS X.
ODBMS Deployments
• Monitoring & Response
• Telecom Infrastructure
• Big Science
• Complex Financial Systems
• Data Fusion
InfiniteGraph
• Fully distributed graph database
• High throughput and scalability
• Extremely fast navigational access
• ACID transactions for online operation
• Relaxed consistency during batch-mode parallel ingest
• Parallel queries
• Flexible indexing, including Lucene for text
• Java API and Gremlin support
Graph DBMS - Finding The Links
[Figure: OTHER DATABASE(S) compared with GRAPH DATABASE]
Objectivity’s Disruptive Big Data Architecture
Uses Data Virtualization to hide the nodes and focus on the connections
Schema: Pros & Cons
Who's Who?
• Schema:
  • Network [CODASYL] databases - DDL [1972]
  • Relational Databases - Data Dictionary
  • Object Databases - ODMG'93
  • Most Graph Databases
• Schema-less:
  • KSAM/ISAM/DSAM/ESAM
  • IMS (hierarchical)
  • Pick OS Database (hash-tables)
  • MUMPS (hierarchical array-storage)
  • MongoDB - a specialized JSON (and JSON-like) document store.
  • CouchDB - a JSON document store.
Schema: Pros...
• Global data definitions
• Optimal access
• Enables Query By Example
• Interoperability
• Schema change control
• Schema contents can be manipulated via standard APIs and tools
...Schema: Pros
• Global data definitions:
  • Data types and the relationships between them
  • Makes queries more efficient
  • Actions can be restricted by data type, field values, relationship types
• Optimal access:
  • Used to determine how to best store, manage and access particular data types
• Enables Query By Example by showing:
  • Types of information available
  • Relationships between them
• Interoperability:
  • DBMS can change the shape of data items to suit the language/environment
• Schema change control:
  • Can be used to enforce workflows that will keep applications and data in sync.
• Schema contents can be manipulated via standard APIs and tools:
  • Easier learning curve
  • Uniform security controls:
    • The schema can use the same security controls as the data
  • Query and visualization tools can be used for both data and schema
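The point about restricting actions by data type can be illustrated with a minimal Python sketch. It assumes a hypothetical `Person` type with two fields; the class, the field names and the `try_store` helper are illustrative only and are not taken from any Objectivity product. The idea is simply that a declared schema lets a malformed value be rejected when it is written, rather than discovered later by a query.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Person:
    """Hypothetical schema: every Person has a string name and an integer age."""
    name: str
    age: int

    def __post_init__(self):
        # Check each value against its declared type, as a schema-aware
        # store could do before accepting a write.
        for field_name, expected in get_type_hints(type(self)).items():
            value = getattr(self, field_name)
            if not isinstance(value, expected):
                raise TypeError(
                    f"{field_name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )


def try_store(name, age):
    """Return True if the record conforms to the schema, otherwise False."""
    try:
        Person(name, age)
        return True
    except TypeError:
        return False
```

Here `try_store("Ada", 36)` is accepted, while `try_store("Ada", "36")` is refused because the age arrives as a string.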
Schema: Cons
• The database designer and application developers have to create and maintain the schema.
• Applications have to be kept in sync with schema changes.
• Applications and programmers have to be aware of data types
  • Though this is one of the major claimed advantages of object-oriented programming.
• There is a perceived loss of flexibility
  • Though this is more a function of the user interface to the database than the underlying mechanisms.
Schema-less: Pros…
• Flexibility
• Can be more tolerant of variable Acidity and Consistency models
• Ease of use and maintenance
…Schema-less: Pros
• Flexibility - Users can, in theory:
  • Put any kind of data into the system
  • Create new kinds of relationships between things (in a few products)
  • Find data without worrying about the types of data involved.
• Can be more tolerant of variable Acidity and Consistency models
• Ease of use and maintenance:
  • No need to worry about data types
  • No need for a DBA
  • Applications will [probably] work when new data arrives
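The flexibility described above can be sketched in a few lines of Python. This is an illustration only: the records, the field names and the `find` helper are hypothetical, and plain dictionaries stand in for a schema-less store. Records of different shapes sit side by side, and a lookup succeeds without the caller knowing what kinds of record exist.

```python
# Schema-less sketch: records are plain dicts with no declared structure.
records = [
    {"kind": "person", "name": "Ada"},
    {"kind": "sensor", "id": 7, "reading": 21.5},
    {"name": "Ada", "note": "a record with no 'kind' field at all"},
]


def find(records, **criteria):
    """Return the records whose fields equal every given criterion.

    A record that lacks one of the requested fields simply does not match.
    """
    return [
        r for r in records
        if all(r.get(key) == value for key, value in criteria.items())
    ]


matches = find(records, name="Ada")
```

The same call works when a new record shape arrives, which is the "applications will [probably] work" case; nothing, however, stops two records from using the same field name for different things.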
Schema-less: Cons…
• Confusion
• Performance suffers
• Poor integrity
• Ambiguity
…Schema-less: Cons
• Apparent tolerance of variable CAP models is actually orthogonal to the schema vs schema-less debate [as is support for sharding].
• Performance suffers
• Integrity is practically non-existent
  • Maintaining referential integrity is hard
  • Queries may misinterpret values within an object
    • 54686973206973206120737472696e6720706c7573206120666c6f6174696e6720706f696e74206e756d62657258585858706c757320616e6f7468657220737472696e67
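The hex string on this slide can be decoded to see the misinterpretation problem directly. The Python sketch below rests on two assumptions that are not stated in the slides: that the four bytes following the word "number" are the embedded value, and that they would be a 32-bit big-endian float. Without a schema, nothing in the bytes themselves says whether those four bytes are text or a number.

```python
import struct

HEX = (
    "54686973206973206120737472696e6720706c7573206120666c6f6174696e67"
    "20706f696e74206e756d62657258585858706c757320616e6f74686572207374"
    "72696e67"
)

raw = bytes.fromhex(HEX)

# Reading 1: treat the whole object as ASCII text.
text = raw.decode("ascii")

# Reading 2 (assumption): the four bytes after "number" are a
# 32-bit big-endian float rather than four characters.
start = raw.index(b"number") + len(b"number")
chunk = raw[start:start + 4]
as_text = chunk.decode("ascii")
(as_float,) = struct.unpack(">f", chunk)
```

Both readings are valid for the same four bytes: `as_text` is the string "XXXX" and `as_float` is some floating point value. A schema is what tells a query which reading is intended.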
[Slide annotation on the hex string above: "Floating Point"]
• A ZIP code such as 01234 may be stored as an integer (which drops the leading zero, since JSON numbers cannot start with 0) or as a string ("01234"), causing query and display problems.
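The ZIP code problem is easy to reproduce with Python's standard `json` module. In this small sketch the documents and the `leading_zero_integer_is_valid_json` helper are made up for illustration; it shows that the two representations do not compare equal and that the integer form cannot keep its leading zero.

```python
import json

# The same ZIP code written by two different applications.
as_string = json.loads('{"zip": "01234"}')
as_integer = json.loads('{"zip": 1234}')  # the leading zero is already lost


def leading_zero_integer_is_valid_json():
    """Check whether a JSON parser accepts an integer written as 01234."""
    try:
        json.loads('{"zip": 01234}')
        return True
    except json.JSONDecodeError:
        return False


# An equality query for one form silently misses the other.
same = as_string["zip"] == as_integer["zip"]

# Display differs too: the integer needs explicit padding to look right.
padded = str(as_integer["zip"]).zfill(5)
```

With a schema that declares the field as a five-character string, neither the failed comparison nor the padding step would be needed.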
The NoSQL Players
[Figure: NoSQL products by category, as far as the grouping can be recovered]
• Operational: Intersystems, MarkLogic, McObject
• Document: AppEngine, Cloudant, CouchDB, MongoDB, RavenDB
• Object/Graph: Objectivity/DB, Progress, Versant, AllegroGraph, InfiniteGraph, Neo4j, Titan
• Key-Value: Berkeley DB, Cassandra, Redis, Riak, Voldemort, Couchbase
• Column Family: HBase, HyperTable, SimpleDB
* Fully or partially schema-less
A Compromise
Provide Flexibility With The Advantages Of Having A Schema
Objectivity/DB Schema Usage
• Has an internal schema in its system database (the Federated DB).
• User schemas are created and updated by:
  • Creating .ddl files and pre-processing them with the DDL processor.
  • Creating and compiling Java, C# or Python header files.
  • Declaring or dynamically creating/modifying Smalltalk classes (defunct).
  • Declaring and changing table definitions with Objectivity/SQL++.
• SQL++ table/column definitions are updated automatically when classes are declared or modified using other languages.
  • This allows SQL++ to access C#, C++, Java and Python objects and vice-versa.
• A Federated Database can contain multiple named Schemas:
  • Reduces re-compilation and re-building after a localized schema change.
  • May facilitate security mechanisms in the future.
Objectivity Active Schema
• API and tools for creating, modifying, reading and deleting class definitions, which include association (relationship) definitions.
  • If used with a dynamic language, such as Smalltalk, creating or modifying a class doesn't need to affect existing programs.
  • In general, only generic access (via the ooObj base class) can be used without creating the files needed to recompile programs and methods for accessing the new object types.
• Helps application developers build tools that need to access the schema, e.g.:
  • Graphical query tools
  • Highly flexible object modeling capabilities for end users.
• An end-user, such as a field technician or an analyst:
  • Can add local object classes, populate, maintain and query them, but...
  • Cannot interfere with the correct operation of the pre-built applications.
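The end-user scenario above can be sketched as a toy runtime schema registry in Python. This is not the Active Schema API: `SchemaRegistry` and all of its methods are hypothetical names invented for illustration. The sketch only shows the shape of the compromise, in which class definitions can be added and queried at run time while the definitions used by pre-built applications stay protected.

```python
class SchemaRegistry:
    """Toy runtime schema: class name -> {field name: Python type}."""

    def __init__(self, core):
        self._classes = {name: dict(fields) for name, fields in core.items()}
        # Classes that pre-built applications depend on.
        self._protected = frozenset(core)

    def add_class(self, name, fields):
        """Let an end user declare a new local class at run time."""
        if name in self._classes:
            raise ValueError(f"class {name!r} already exists")
        self._classes[name] = dict(fields)

    def drop_class(self, name):
        """Delete a user-defined class; core classes cannot be removed."""
        if name in self._protected:
            raise PermissionError(
                f"class {name!r} belongs to a pre-built application"
            )
        del self._classes[name]

    def validate(self, name, record):
        """Check that a record has exactly the declared fields and types."""
        fields = self._classes[name]
        return set(record) == set(fields) and all(
            isinstance(record[f], t) for f, t in fields.items()
        )


registry = SchemaRegistry({"Site": {"name": str}})
registry.add_class("FieldNote", {"site": str, "text": str})
```

A field technician could then store `FieldNote` records that are still type-checked, while an attempt to drop or redefine `Site` is refused.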
Use Cases
Use Case 1 - Intelligence Gathering Framework…
• An integrated application development framework that focuses on adaptability.
• Dynamic modeling of entities, services and workflows.
• Versioning and temporality features support system evolution.
[Figure 1 of 2: The screenshots show a location that is under surveillance and everything known about it in the database.]
…Use Case 1 - Intelligence Gathering Framework
• Eliminates the mapping layer between the user defined objects and the database.
• Performance and scalability.
• Active Schema facilitates object migration.
[Figure 2 of 2: diagram with the labels "Design and Information Feeds", "Users" and "Database"]
Use Case 2 - GDMO Framework
• Operations, Administration, and Maintenance interface for the CDMA system RF infrastructure
• Controls the Base Station Controller and Base Station Transceiver Subsystem
• GDMO* Schema and CMIP agent-manager messaging
• A SPARC-based BSC rack supports a peak load of 150,000 simultaneous callers
• Deployed in CDMA networks worldwide, including SprintPCS
* GDMO is the Guideline for the Definition of Managed Objects
Use Case 3 - Ontology Framework
• Uses standard objects to define a meta-schema
• It is used to define concept templates
• They can be inherited from, combined or extended to support a “class specification”
• The data is combined with Horn Logic to build complex ontologies.
[Figure: meta-schema diagram with the labels SCHEMA, CONCEPT, CLASS, COMPONENTS, STRUCT, ARRAY, FIELD, RELATIONSHIP and LOGIC]
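The slides do not say how Horn Logic is applied to the data, so the following is only a generic sketch of the technique: forward chaining over propositional Horn clauses, where each rule has a set of conditions and a single conclusion. The facts, rule names and the `forward_chain` function are hypothetical and chosen purely for illustration.

```python
def forward_chain(facts, rules):
    """Derive every fact reachable from Horn rules of the form (body, head).

    A rule fires when all atoms in its body are already known; the loop
    repeats until a full pass adds nothing new.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return known


# Hypothetical facts and rules.
facts = {"is_site", "under_surveillance"}
rules = [
    (("is_site", "under_surveillance"), "monitored_site"),
    (("monitored_site",), "needs_review"),
    (("is_vehicle",), "needs_plate_check"),
]

derived = forward_chain(facts, rules)
```

The first two rules chain together to conclude `needs_review`, while the third never fires because `is_vehicle` is not among the facts.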
Summary
• Don’t confuse CAP issues with Schema considerations
• Schemas make the DBMS more powerful
• Schema-less architectures are more flexible
• It’s possible to build flexible systems with Schema-based infrastructure
THANK YOU
• Please visit objectivity.com for:
  • Features
  • Use Cases
  • White Papers
  • Free downloads (60 day evaluation)
  • Sample Applications
  • Application Developer’s Wiki
• For further information:
  • Email: [email protected]