master meta data
Organize & manage master meta data centrally, built upon kong, cassandra, neo4j & elasticsearch.
Hello! I am Akhil Agrawal
Managing master & meta data is a very common problem with no good open source alternative as far as I know, so I am initiating this project: MasterMetaData
Started BIZense in 2008 & Digikrit in 2015
1. Problem
Let’s start with the problem we are addressing: why mastermetadata?
Less Frequently Changing
Master data and meta data serve different purposes but share one behavior: both change infrequently.
Infrequently changing data, whether it describes real world entities (master data) or other data (meta data), can be stored, accessed and managed in very similar ways.
Why MasterMetaData?
No Open Source Option
There are MDM solutions (mostly from ERP vendors like SAP and Oracle, and analytics companies like Informatica and SAS), but the intersection of master data and meta data has only recently begun to be explored.
There are no open source alternatives for smaller companies, nor anything that can be embedded in SaaS products.
2. Definitions
Let’s start with some definitions around data categories
Definition of Data Categories
Meta Data: meta information about other forms of data (can describe master, transaction or lower level meta data)
Master Data: real world entities like customer, partner etc. (only the stable attributes are considered part of master data)
Transaction Data: real world interactions which have a very short lifespan and whose occurrence is linked with time/space (attribute values are unstable and changing; the definition/description is stable, but each new data point is unique)
Master Meta Data: combination of master and meta data defined at application, enterprise or global level (although the volume and variety of master & meta data are very different, they have a lot of common access patterns)
3. Implementation
Let’s discuss the implementation – technologies & concepts involved
Background
◎ Faced difficulty with managing master and meta data in previous projects
◎ Implemented custom solution while building mobile ad platform
◎ Currently implementing same features required for the communication platform
◎ Have worked with elasticsearch + kibana, while kong + cassandra seem useful
Built With The Following Technologies
neo4j: highly scalable native graph database that leverages data relationships as first-class entities, handles evolving data challenges
elasticsearch: search and analyze data in real time, de facto standard for making data accessible through search and aggregations
cassandra: right choice when you need linear scalability and high availability without compromising performance & durability
kong: the open-source management layer for APIs and microservices, delivering security, high performance and reliability
lua: a powerful, fast, lightweight, embeddable scripting language, used for writing kong plugins for access to various master meta data
kibana: explore and visualize data in elasticsearch, open source project from the elasticsearch team, intuitive interface, visualization & dashboards
Open Source, Scalable, Searchable, Ready to Use
Project mastermetadata needs to be ready to use for at least a few use cases like location, device, movie, tour etc.
Challenges
Complex & hierarchical data sets
Real-time query performance
Dynamic structure
Evolving relationships
Why neo4j for mastermetadata?
Why neo4j?
Native graph store
Flexible schema
Performance and scalability
High availability
Referenced from http://neo4j.com/use-cases/master-data-management
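The challenges listed above (hierarchical data sets, evolving relationships) can be pictured with a tiny in-memory graph. This is only a plain-Python sketch and not neo4j code; the `TinyGraph` class, the `PART_OF` relationship name and the sample place names are all hypothetical and chosen purely for illustration.

```python
class TinyGraph:
    """Minimal in-memory directed graph with typed edges (illustration only)."""

    def __init__(self):
        # Maps (source node, relationship type) -> target node.
        self.edges = {}

    def relate(self, source, relationship, target):
        """Record a single typed edge from source to target."""
        self.edges[(source, relationship)] = target

    def ancestors(self, node, relationship):
        """Follow one relationship type upward until no edge is left."""
        path = []
        seen = {node}
        current = self.edges.get((node, relationship))
        while current is not None and current not in seen:
            path.append(current)
            seen.add(current)  # guard against accidental cycles
            current = self.edges.get((current, relationship))
        return path


graph = TinyGraph()
graph.relate("Pune", "PART_OF", "Maharashtra")
graph.relate("Maharashtra", "PART_OF", "India")

print(graph.ancestors("Pune", "PART_OF"))
```

Adding a new level to the hierarchy is just another `relate` call, which is the kind of flexibility a native graph store is meant to provide at scale.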
Why elasticsearch for mastermetadata?
Scale
◎ Real-Time Data
◎ Massively Distributed
◎ High Availability
◎ Multitenancy
◎ Per-Operation Persistence
Search
◎ Full-Text Search
◎ Document-Oriented
◎ Schema-Free
◎ Developer-Friendly, RESTful API
◎ Built on top of Apache Lucene™
Analytics
◎ Real-Time Advanced Analytics
◎ Very flexible Query DSL
◎ Flexible analytics & visualization platform - Kibana
◎ Real-time summary and charting of streaming data
Referenced from https://www.elastic.co/products/elasticsearch
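As a rough illustration of the Query DSL mentioned above, the sketch below builds a search request body as a plain Python dictionary, combining a full-text `match` clause with an exact `term` filter inside a `bool` query. The field names `name` and `country_code` and the helper `build_city_search` are assumptions made for this example; they do not come from the project.

```python
import json


def build_city_search(text, country_code, size=10):
    """Build a bool query: full-text match on name, exact filter on country."""
    return {
        "size": size,
        "query": {
            "bool": {
                # "must" clauses contribute to relevance scoring.
                "must": [{"match": {"name": text}}],
                # "filter" clauses only include or exclude documents.
                "filter": [{"term": {"country_code": country_code}}],
            }
        },
    }


body = build_city_search("san", "US")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a search request against whichever index holds the location master data.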
Why kong for mastermetadata?
Secure, Manage & Extend your APIs and Microservices
RESTful Interface
Plugin Oriented
Platform Agnostic
Referenced from https://getkong.org/
Diagram: Without Kong vs. With Kong
4. Interesting
What interesting things are happening around this?
Master & Metadata Management Intersection
Maximized Metadata Model
◎ data model describing the metadata needs to be “maximized” to cover as many use cases as possible
◎ meta data model needs to be inclusive of all metadata in the organization as well as cover the master data
◎ governance of metadata model requires the ability to describe maximum metadata in the system to provide ability to govern data describing other data
Minimalistic Master Data Model
◎ master data model describing master data needs to be “minimalist”
◎ master data model is neither inclusive of all data in the organization, nor specific to applications using it for a specific purpose
◎ central governance of master data requires that the data model backing it is minimalistic, to be able to govern without application specific details
◎ master data model is basically metadata describing the master data
Referenced from http://blogs.gartner.com/andrew_white/2011/04/26/more-on-metadata-and-master-data-management-intersection/
From Big Data To Smart Data
Zero Latency Organization
Types Of Latency
data
◎ latency linked to the data (capturing)
◎ latency linked to analytical processes (processing)
structural
◎ latency linked to decision making processes
◎ time needed to implement actions linked with decisions
action
◎ data latency added with structural latency
◎ time needed from capturing of data till the action takes place
Smart Data
value: data is considered smart based on the value it brings in decision making and action taking (rather than anything else like size, source, etc.)
master: data which represents real world entities and also remains stable over time is smart data as it helps with common data reference
meta: data which describes other data, whether master, transactional or lower level meta data, is also smart data as it helps in understanding
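The three latency types can be put together with a little arithmetic. The sketch below assumes each type is simply the sum of its two listed components, which is one reading of the slide rather than something it states outright; the `Latency` class, its field names and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Latency:
    """Latency components in seconds (all names are illustrative)."""
    capture: float         # latency linked to capturing the data
    processing: float      # latency linked to analytical processes
    decision: float        # latency linked to decision making processes
    implementation: float  # time needed to implement the decided actions

    @property
    def data_latency(self):
        return self.capture + self.processing

    @property
    def structural_latency(self):
        return self.decision + self.implementation

    @property
    def action_latency(self):
        # Action latency: data latency added with structural latency.
        return self.data_latency + self.structural_latency


sample = Latency(capture=2.0, processing=3.0, decision=4.0, implementation=1.0)
print(sample.data_latency, sample.structural_latency, sample.action_latency)
```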
5. Get Involved
Let’s discuss ways to get involved in this project
Areas where you can get involved
DEMO
Functional Tests, Integration Tests, Run Demo
CODE
Implement Ideas, Fix Bugs, Enhance Features
DOCUMENT
User Documentation, Developer Documentation
Current Focus
Devices
Storage: Device, Browser, OS
Access: User Agent
Locations
Storage: Country, State, City
Access: IP Address
Tours
Storage: People, Interest, Culture, Destination, City, Activity, Duration
Access: What, Where, For
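To make the storage/access split concrete, here is a minimal sketch of the Locations case: country, state and city are the stored master data, and an IP address is the access key. The lookup table, the `locate` helper and the place names are invented for this example, and the networks are the reserved documentation ranges rather than real allocations.

```python
import ipaddress

# Hypothetical master data: network range -> stored location attributes.
LOCATIONS = [
    (ipaddress.ip_network("192.0.2.0/24"),
     {"country": "Exampleland", "state": "North", "city": "Alpha"}),
    (ipaddress.ip_network("198.51.100.0/24"),
     {"country": "Exampleland", "state": "South", "city": "Beta"}),
]


def locate(ip):
    """Return the stored location for an IP address, or None if unknown."""
    address = ipaddress.ip_address(ip)
    for network, location in LOCATIONS:
        if address in network:
            return location
    return None


print(locate("192.0.2.17"))
print(locate("10.0.0.1"))
```

A linear scan is fine for a sketch; a real deployment would presumably keep the ranges in an indexed store so the lookup stays fast as the data grows.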
Storage & Access
Master Data Storage: Storage which is highly efficient for reads but at the same time efficient for writes. It must additionally support searching the stored data and offer a flexible, efficient query interface to enable faster access
Meta Data Storage: Storage which is highly flexible in defining relationships like inheritance, composition or other relationships. Graph modeled relationships are the most flexible to change as and when the model evolves
Diagram featured by poweredtemplate.com
Meta Data Access
CRUD, Fill in the blanks, Semantic Query, Search
Master Data Access
CRUD, Query (Structured / Unstructured) & Search
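The master data access modes (CRUD, structured query, search) can be sketched as a small in-memory store. This is not the project's API; the `MasterDataStore` class, its method names and the sample device records are assumptions used only to show how the three modes differ.

```python
class MasterDataStore:
    """In-memory stand-in for a master data store (illustration only)."""

    def __init__(self):
        self._records = {}

    def create(self, key, record):
        if key in self._records:
            raise KeyError(f"{key!r} already exists")
        self._records[key] = dict(record)

    def read(self, key):
        return self._records.get(key)

    def update(self, key, **changes):
        self._records[key].update(changes)

    def delete(self, key):
        self._records.pop(key, None)

    def query(self, **criteria):
        """Structured query: exact match on every given attribute."""
        return [key for key, record in self._records.items()
                if all(record.get(a) == v for a, v in criteria.items())]

    def search(self, text):
        """Unstructured search: case-insensitive substring over all values."""
        needle = text.lower()
        return [key for key, record in self._records.items()
                if any(needle in str(v).lower() for v in record.values())]


store = MasterDataStore()
store.create("device:1", {"device": "Phone", "os": "Android", "browser": "Chrome"})
store.create("device:2", {"device": "Tablet", "os": "iOS", "browser": "Safari"})
store.update("device:2", browser="Chrome")

print(store.query(browser="Chrome"))
print(store.search("andro"))
```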
References
https://getkong.org/
http://neo4j.com/
http://cassandra.apache.org/
https://www.elastic.co/
http://booksite.elsevier.com/9780123743695/10steps_DataCategories.pdf
http://blogs.gartner.com/andrew_white/2011/04/26/more-on-metadata-and-master-data-management-intersection/
http://neo4j.com/use-cases/master-data-management/
Thanks! Any questions?
You can find me at:@[email protected]
Special thanks to all the people who made and released these awesome resources for free:
Presentation template by SlidesCarnival
Presentation models by SlideModel & PoweredTemplate
To the companies behind kong, cassandra, neo4j & elasticsearch