Copyrights: Reach1to1 Technologies Pvt. Ltd.
Big Data Innovation Conference
Case Study: Polyglot Persistence in Pharmaceutical Industry
Ashutosh Bijoor, Director, Reach1to1 Technologies Pvt. Ltd.
Contents
● Customer Requirements
● Existing Architecture & Limitations
● Approach - Polyglot Persistence
● Challenges & Addressing Them
● Proposed Architecture
● Performance Results
● Similar Cases from Different Industries
Customer Requirements
[Diagram: I.P. Research Repository between Information Sources and User Applications]
● Information Sources: Web Content, Data Files, Documents, Databases
● User Applications: Intranet, Customer Portals, Analytical Dashboards, Admin Control
Customer Requirements
● Information Sources
– Integrate a wide range of IPR-related information sources
– Different document formats, sizes and update frequencies
– Both structured and unstructured information
– Single repository to handle a wide variety and large volume of data
● User Applications
– Unified API to access and manipulate all data sources
– High performance for search, analytics and batch operations
– Flexibility to add new data sources with minimal or no code change
– Extensible, high-performance data processing architecture
Existing Architecture
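[Diagram: Information Sources and User Applications connected through scripts to a Files Archive and an RDBMS]
● Information Sources: Web Content, Data Files, Documents, Databases
● Storage: Files Archive, RDBMS
● Interfaces: Loading Scripts, Parsing Scripts, File API, SQL
● User Applications: Intranet, Customer Portals, Dashboards, Admin Control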
Existing Architecture Limitations
● Information Sources
– Structured data in RDBMS – fixed schema
– Unstructured data in File Archive – no analytics
– Database unable to handle large volume of data
– Limits on volume and variety of data sources
● User Applications
– Search and analytics slowing down to the point of being unusable
– Inability to add new search & analytics features
– Batch ingestion of new data very cumbersome
– Stagnation of performance and capabilities
Existing Architecture Performance
Performance Benchmarks
● Batch: 4 secs / 100 docs
● Batch + Search: 15 secs
● Search: 5 secs
Estimated time to add new data source: 3 months
Approach
Single repository to handle wide variety and large volume of data
+
Extensible, high performance data processing architecture
Which database do we choose?
Which database do we choose?
Currently about 150 NoSQL Databases Listed!
Factors affecting database choice
● Data Models
– What type of data sources do we want to integrate?
– How do we want to manipulate / analyze the data?
– What is the volume, variety and velocity of data?
● Consistency, Availability, Partition Tolerance (CAP)
– Consistency: Every client sees the same single value of an object (Atomicity)
– Availability: All objects are always available (Low Latency)
– Partition Tolerance: System keeps working when data is split across multiple network partitions (Clustering)
– CAP Theorem: Only two of the three can be guaranteed together - which two should we choose?
Databases - Models and CAPability
● Data Models
– Relational
– Key-Value
– Column Oriented
– Document Oriented
– Graph
– Over 10 different models!
● CAP ability
– Consistency
– Availability
– Partition Tolerance
– Pick any two!
[Diagram: triangle with corners C, A and P; databases placed along the CA, AP and CP edges]
● CA: RDBMSs, Aster Data, Greenplum, Vertica
● AP: Cassandra, SimpleDB, CouchDB, Riak, Dynamo, Voldemort
● CP: BigTable, Hypertable, HBase, MongoDB, Terrastore, Scalaris, MemcacheDB, Redis, Neo4j
Source: Visual Guide to NoSQL Systems by Nathan Hurst
Polyglot Persistence
No single database fits all needs!
● Documents: MongoDB
– Document-oriented
– Flexible schema
– Replication & High Availability
– Auto-sharding
– Rich, document-based queries
– Fast In-Place Updates
– GridFS
– Aggregation Framework
● Search: Apache Solr
– Advanced text search
– Flexible schema
– Support for highlighting, pivoted faceting, spell check, clustering
– Support for replication & sharding
● Relationships: Neo4j
– High-performance graph database
– Nodes and edges can have indexed metadata
– Graphs of several billion nodes on a single machine
– Powerful traversal framework
● Analytics: RDBMS
– Legacy data and apps
– Structured data
– Support for legacy applications
Solution: Polyglot Persistence – use more than one database!
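To make the idea concrete, here is a minimal sketch of one repository that writes each document to four stores, giving each store only the part it is good at. It is an illustration, not the platform's code: the in-memory stores stand in for MongoDB, Apache Solr, Neo4j and the RDBMS, and the class names and document fields (`id`, `text`, `cites`, `year`, `type`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a real engine (MongoDB, Solr, Neo4j, RDBMS)."""
    name: str
    records: dict = field(default_factory=dict)

    def save(self, doc_id, payload):
        self.records[doc_id] = payload


class PolyglotRepository:
    """Writes each document to every store, passing each only what it serves."""

    def __init__(self):
        self.documents = InMemoryStore("documents")          # full document
        self.search = InMemoryStore("search")                # text to index
        self.relationships = InMemoryStore("relationships")  # links to other docs
        self.analytics = InMemoryStore("analytics")          # structured fields

    def save(self, doc):
        doc_id = doc["id"]
        self.documents.save(doc_id, doc)
        self.search.save(doc_id, doc.get("text", ""))
        self.relationships.save(doc_id, list(doc.get("cites", [])))
        self.analytics.save(doc_id, {"year": doc.get("year"), "type": doc.get("type")})


repo = PolyglotRepository()
repo.save({"id": "P1", "type": "patent", "year": 2012,
           "text": "Sustained release formulation", "cites": ["P0"]})
print(repo.relationships.records["P1"])
```

A caller sees a single `save` call, while each store receives a view suited to its role; keeping those views consistent is the synchronization problem raised on the next slide.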
Challenges
● Synchronization
– How to manage consistency between multiple engines?
– How to maintain low latency of CRUD operations?
● Scalability
– How to ensure high throughput of batch operations?
– How to handle large number of concurrent operations?
● Extensibility
– How to allow new engines to be added with minimal architecture change?
Challenges – Addressing them
● High Performance Synchronization Engine
– Logical Locking – flexible synchronization models
– Event-driven – distributed control logic
– Kanban Queues – balanced resource utilization
● Horizontally Scalable
– Distributed processing – automatic
– Asynchronous I/O – high concurrency
● Component-based extensions
– Application-specific Controller modules
– Re-usable Synchronization patterns
– Re-usable plugins for various databases
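The Kanban queue and asynchronous I/O points can be sketched as a bounded queue: the producer must wait whenever the limit on work in progress is reached, so workers are never flooded. This is a sketch under assumptions, written with Python's `asyncio` rather than the NodeJS platform the slides describe; `run_batch`, `wip_limit` and the document names are hypothetical.

```python
import asyncio


async def producer(queue, docs):
    for doc in docs:
        # put() waits while the queue is full, so work in progress stays bounded.
        await queue.put(doc)


async def worker(queue, handled):
    while True:
        doc = await queue.get()
        await asyncio.sleep(0)  # placeholder for asynchronous database I/O
        handled.append(doc)
        queue.task_done()


async def run_batch(docs, wip_limit=2, workers=3):
    queue = asyncio.Queue(maxsize=wip_limit)  # the Kanban limit
    handled = []
    tasks = [asyncio.create_task(worker(queue, handled)) for _ in range(workers)]
    await producer(queue, docs)
    await queue.join()  # wait until every queued doc has been processed
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return handled


result = asyncio.run(run_batch([f"doc-{i}" for i in range(10)]))
print(len(result))
```

Raising `wip_limit` trades memory for throughput; the right value depends on how fast the downstream engines accept writes.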
Polyglot Persistence Platform
● Reusable customizable platform
– Open source license
– Modular, extensible architecture
– Commercial plugins for various databases and indexing engines
● Proven performance
– Based on NodeJS
– High performance in high load conditions
– Developed and supported by strongly invested team
http://oodebe.org
Proposed Architecture
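[Diagram: Information Sources and User Applications connected through a Synchronization Engine to four databases]
● Information Sources: Web Content, Data Files, Documents, Databases
● Databases: MongoDB, Apache Solr, RDBMS, Neo4j
● Interfaces: Synchronization Engine, Custom-built Web Services, Loading APIs, DB-specific APIs
● User Applications: Intranet, Customer Portals, Dashboards, Admin Control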
Sample Operation
[Diagram: flow of a sample operation from the User Applications to the DB engines]
● Components: User Applications, Batch API, REST API, Controller, Source Processor, Doc Processors, DB Handlers
● Endpoints: Data Source, DB Engine 1, DB Engine 2, DB Engine 3
● Coordination: Kanban Queue, Messages / Events, Locks, Asynchronous I/O
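One way to read this slide is as a controller that takes a logical lock on a document, writes it to every engine through its handler using asynchronous I/O, and then emits an event. The sketch below illustrates that reading only; the `DBHandler` and `Controller` classes, the `upsert` method and the event tuple are hypothetical and are not taken from the platform's API.

```python
import asyncio
from collections import defaultdict


class DBHandler:
    """Hypothetical handler wrapping one engine; here it only stores in memory."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    async def write(self, doc_id, doc):
        await asyncio.sleep(0)  # placeholder for asynchronous I/O to the engine
        self.rows[doc_id] = doc


class Controller:
    def __init__(self, handlers):
        self.handlers = handlers
        self.locks = defaultdict(asyncio.Lock)  # one logical lock per document id
        self.events = []

    async def upsert(self, doc):
        doc_id = doc["id"]
        # Hold the lock until every engine has the new version of the document.
        async with self.locks[doc_id]:
            await asyncio.gather(*(h.write(doc_id, doc) for h in self.handlers))
        self.events.append(("synced", doc_id))


async def main():
    handlers = [DBHandler(n) for n in ("engine1", "engine2", "engine3")]
    controller = Controller(handlers)
    await controller.upsert({"id": "D1", "title": "Sample"})
    return controller, handlers


controller, handlers = asyncio.run(main())
print(controller.events)
```

Locking per document id rather than globally lets unrelated documents proceed in parallel while two updates to the same document are serialized.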
Deployment Architecture
Controllers Cluster
Database Cluster
Data Processing Cluster
New Architecture Performance
Performance Benchmarks (existing → new)
● Batch: 4 secs / 100 docs → 1.5 secs / 100 docs
● Batch + Search: 15 secs → <1 sec
● Search: 5 secs → <1 sec
Time to add new data source: 3 months → 1 day
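As a quick worked check of the batch figures quoted on the slides (4 secs versus 1.5 secs per 100 docs) and the search figures (5 secs versus under 1 sec), the arithmetic below converts them to throughput and speedup. The variable names are illustrative, and because the new search time is only given as an upper bound, the search speedup is a lower bound.

```python
docs = 100
old_batch_secs, new_batch_secs = 4.0, 1.5

old_throughput = docs / old_batch_secs          # docs per second before
new_throughput = docs / new_batch_secs          # docs per second after
batch_speedup = old_batch_secs / new_batch_secs

old_search_secs, new_search_upper_bound = 5.0, 1.0
min_search_speedup = old_search_secs / new_search_upper_bound  # "at least"

print(f"batch: {old_throughput:.1f} -> {new_throughput:.1f} docs/sec "
      f"({batch_speedup:.2f}x)")
print(f"search: at least {min_search_speedup:.0f}x faster")
```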
Similar Cases from Other Industries
● Airlines: Customer Loyalty
– Integration of flight schedules, ancillary services, bookings and payments into a single point interface for customers
● Insurance: Claims Analysis
– Integration of claims, feedback forms, customer info, call center logs into central repository for search and analytics
● Telecom: CRM Analytics
– Call center logs, IVR logs, email and social media feeds archived for analysis and preventive fault alerts
● BFSI: Investment Advisor
– Integration of social media feeds, analyst opinions, web content and trading data with search and sentiment analysis
● Publishing: Content Repository
– Aggregated and original content processed with text mining, automatic and assisted classification and annotation
● Media: Online TV
– Broadcast schedules, ratings, social media feeds and user recordings for a TV Anywhere platform
About Reach1to1
● Over 10 years' experience with NoSQL and Big Data
– Implemented solutions in various industries
● Wide skill sets spanning emerging technologies
– Big data, cloud and mobile applications
● Variety of engagement models
– Projects, Consulting, Extended Delivery Centers
● Strong investor backing
– Basil Partners, Singapore
● Low operating costs and high reach
– Sales team in US, delivery team in Mumbai and Bangalore
Ashutosh Bijoor
bijoor.me
Big Data Innovation Conference (c)
Thank you!