Data-Centric Infrastructure for Agile Development
DESCRIPTION
Most data centers are filled with rigid data servers that are tightly linked to specific applications, leading to data duplication, lengthy development cycles, and unnecessary costs. Learn how you can use an Enterprise NoSQL database platform to create a flexible, agile data fabric that lets you iterate on your application development, optimize your data, and reduce costs. When your enterprise infrastructure is data-centric instead of application-centric, anyone can pull crucial data without spending unnecessary time and money on plumbing, freeing resources for building better applications.

Learn how other companies have built, and benefited from, a data-centric infrastructure for agile development:

Ingest and manage all your data, documents, and semantic triples in a flexible, schema-agnostic platform without sacrificing the ACID transactions, granular security, database management tools, and other features you've come to expect in a mature database platform
Quickly build complex, interactive search applications
Deliver robust, real-time search and alerting within your applications
Use, and optimize, modern infrastructure including Hadoop and cloud to attain operational agility
Simplify implementation of data governance requirements around security, privacy, provenance, retention, continuity, and compliance while reducing risk, cost, and time

TRANSCRIPT
© COPYRIGHT 2014 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
The Data-Centered Data Center
Presented by: Jim Clark, Senior Director of Product Management
SLIDE: 2
THE WORLD IS VERY APPLICATION-CENTRIC
SLIDE: 3
1. Design the application
2. Determine needed data
3. Determine needed queries
4. Design the schema and indexing strategy
5. Build a database
6. Design the ETL strategy
7. Load the data
8. Code the application
SLIDE: 4
OLTP
Warehouse
Data Marts
Archives
“Unstructured”
Video
Audio
Signals, Logs, Streams
Social
Documents, Messages
Metadata
Search
Reference Data
SLIDE: 5
HOW DO YOU DETERMINE IN ADVANCE WHAT'S USEFUL?
Love the application... can you go back and include the data from 1990-1995?
SLIDE: 6
TOO MUCH DATA TO BE COPYING FOR EVERY NEW APPLICATION
Serious?! Third time this month I'm moving that data around!
SLIDE: 7
ETL CONSUMES ALL RESOURCES
With all of the new data we're trying to get into the database, there's no time to build new features!
SLIDE: 8
TOO MANY TECHNOLOGIES CREATE SCALING HEADACHES
To scale this system, we've got to buy new hardware. We can take the old hardware and move it to this other system. That one can't get any bigger. Period.
SLIDE: 9
TOO MUCH AND TOO MANY COPIES...YOU'VE LOST CONTROL
Who's reading it? Who's editing it? Where's the master copy? What's happened to it over time? Is it reliable?
How up-to-date is this data store? Are the security models consistent? Are there different backup models? Are the lifecycle, retention, and disposal policies the same?
SLIDE: 10
APPLICATION-CENTRIC DATA CENTER
SLIDE: 11
APPLICATION-CENTRIC DATA CENTER
SLIDE: 12
The data-centered data center
SLIDE: 13
How?
1. Hadoop
2. Low-cost Tiered Storage
3. Elasticity with no downtime
4. Manage the data lifecycle
5. On-premises, Cloud... both!
6. Create powerful data services
7. Complete database platform
8. Enterprise Readiness
SLIDE: 14
Enter Hadoop…
Hadoop
Staging
Persistence
Analytics
SLIDE: 16
Legacy RDBMS: indexes, transactions, security, enterprise operations
“NoSQL”: flexible data model, commodity scale-out, distributed and fault-tolerant, Hadoop sink/source
Why must we choose?
SLIDE: 17
Enterprise NoSQL
Flexible data model, comprehensive indexes
o Documents: Hierarchy, text, values, tags, with schema “when you need it”
o Scalars: Aggregates and range filters, including geospatial
o Triples: Linked facts and inferencing
o Permissions: Users, roles, compartments, and privileges
o Queries: Reverse indexes for alerting, matching
Ad hoc queries, lock-free reads
Real-time transformation
Strict consistency, security throughout
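The "reverse indexes for alerting, matching" item inverts the usual search: queries are stored, and each incoming document is checked against them. The Python sketch below illustrates only that idea. `StoredQuery` and `AlertRegistry` are hypothetical names rather than MarkLogic APIs, the sample fields are invented, and the linear scan stands in for a real reverse index.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


@dataclass
class StoredQuery:
    """A saved query: a name plus a predicate over a document (hypothetical type)."""
    name: str
    matches: Callable[[Document], bool]


class AlertRegistry:
    """Holds stored queries and reports which ones a new document satisfies."""

    def __init__(self) -> None:
        self._queries: List[StoredQuery] = []

    def register(self, query: StoredQuery) -> None:
        self._queries.append(query)

    def matching(self, doc: Document) -> List[str]:
        # A real reverse index would avoid testing every stored query;
        # the scan keeps this sketch short.
        return [q.name for q in self._queries if q.matches(doc)]


registry = AlertRegistry()
registry.register(StoredQuery("large-payment", lambda d: d.get("amount", 0) > 1_000_000))
registry.register(StoredQuery("usd-trades", lambda d: d.get("currency") == "USD"))

# An incoming document triggers every stored query it satisfies.
alerts = registry.matching({"trade_id": "T-1", "amount": 2_500_000, "currency": "EUR"})
```

Here `alerts` names the stored queries that should fire for the new document, which is the matching step an alerting feature needs.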
SLIDE: 18
Data-centered
Enterprise NoSQL
Hadoop
MarkLogic
SLIDE: 19
NoSQL: online applications, delivery, decision-making, real-time, granular updates, distributed indexes
Hadoop: offline analytics, staging, model-building, long-haul batch, write-once read-many, distributed file system
Complementary approaches
SLIDE: 21
With Tiered Storage You Can
Provide multiple Service Level Agreements (SLAs) in a single system
Decrease time and costs of ETL to bring offline content back online
Empower your operations team without imposing burdens on your developers
SLIDE: 22
Tiered Storage
Here’s how you enable tiered storage…
Define data tiers based on a range index
Have content balanced into forests by tier
Move an entire tier to different storage
Query one tier…
…or the other tier…
…or both at once!
All with no downtime, and 100% consistency!
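The steps above can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not MarkLogic's implementation or API: `Tier`, `TieredStore`, the `trade_date` field, and the storage labels are all hypothetical. Each tier owns a date range (standing in for a range index), documents are routed to a tier on insert, a tier can be relabeled to different storage, and a query can target one tier or all of them.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Tier:
    name: str
    start: date          # inclusive lower bound of the range
    end: date            # exclusive upper bound of the range
    storage: str         # label only, e.g. "SSD" or "HDFS"
    docs: List[dict] = field(default_factory=list)


class TieredStore:
    def __init__(self, tiers: Sequence[Tier]) -> None:
        self.tiers: Dict[str, Tier] = {t.name: t for t in tiers}

    def insert(self, doc: dict) -> str:
        # Route the document to the tier whose range covers its date.
        for tier in self.tiers.values():
            if tier.start <= doc["trade_date"] < tier.end:
                tier.docs.append(doc)
                return tier.name
        raise ValueError("no tier covers this date")

    def move_tier(self, name: str, storage: str) -> None:
        # Moving a whole tier changes where it lives, not what it contains.
        self.tiers[name].storage = storage

    def query(self, predicate: Callable[[dict], bool],
              tier_names: Optional[Sequence[str]] = None) -> List[dict]:
        # Query one tier, several, or (by default) all of them.
        names = list(self.tiers) if tier_names is None else tier_names
        return [d for n in names for d in self.tiers[n].docs if predicate(d)]


store = TieredStore([
    Tier("historical", date(2010, 1, 1), date(2014, 1, 1), "NAS"),
    Tier("active", date(2014, 1, 1), date(2015, 1, 1), "SSD"),
])
store.insert({"id": 1, "trade_date": date(2012, 6, 1)})
store.insert({"id": 2, "trade_date": date(2014, 3, 1)})
store.move_tier("historical", "HDFS")

active_only = store.query(lambda d: True, ["active"])
both_tiers = store.query(lambda d: True)
```

The point of the design is that the query interface does not change when a tier moves; only the `storage` label does.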
SLIDE: 23
Case Study: OPERATIONAL TRADE STORE
SLIDE: 24
Tier 1 Bank: Operational trade store
“What are the bank’s obligations?”
ETL
Trade execution
Post-trade processing
Reporting
Analytics
Trade stores
Reference data
SLIDE: 25
Legacy trade store challenges
Long development cycles for new instrument types
Complex combinations of ETL and data models
Limited visibility across the business
Governance risk, maintenance costs of siloed infrastructure
Varied SLAs and access patterns created inefficiencies
SLIDE: 26
Preserving Context with Documents
Trade Cashflows
Party Identifier
Net Payment
Payment Date
Party Reference
Payer Party
Trade ID
Payment Amount
Receiver Party
Application Model
Provider Model
Persistence Model
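To make "preserving context with documents" concrete, here is a small Python sketch of a trade held as one nested document. The field names follow the labels on the slide; the nesting, the identifiers, and the amounts are invented for illustration and do not come from the case study.

```python
import json

# One trade kept as a single document: parties and cashflows travel with it.
trade = {
    "tradeId": "TRADE-0001",  # illustrative value
    "parties": [
        {"partyReference": "party1", "partyIdentifier": "BANK-A"},
        {"partyReference": "party2", "partyIdentifier": "BANK-B"},
    ],
    "cashflows": [
        {"paymentDate": "2014-06-30", "payerParty": "party1",
         "receiverParty": "party2", "paymentAmount": 1500.0},
        {"paymentDate": "2014-06-30", "payerParty": "party2",
         "receiverParty": "party1", "paymentAmount": 1200.0},
    ],
}


def net_payment(doc: dict, party_ref: str) -> float:
    """Amount received minus amount paid by one party across all cashflows."""
    received = sum(c["paymentAmount"] for c in doc["cashflows"]
                   if c["receiverParty"] == party_ref)
    paid = sum(c["paymentAmount"] for c in doc["cashflows"]
               if c["payerParty"] == party_ref)
    return received - paid


# The same structure serializes directly for storage or exchange.
serialized = json.dumps(trade, indent=2)
party1_net = net_payment(trade, "party1")
```

Because the cashflows sit inside the trade, a derived figure such as the net payment can be computed from one document without joining separate tables.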
SLIDE: 27
Information lifecycle
Time: Active, Historical, Archive
Active: SSD, DAS, SAN, Hadoop
Historical: DAS, SAN, NAS, Hadoop, S3
Archive: NAS, Hadoop, S3
SLIDE: 28
Active: Local 10K SAS, RAID10
Replication for HA
Merge overhead for updates
20 hosts, 320 shards
4 TB of SSD cache
96 TB
SLIDE: 29
Compliance: Shared NAS
63 hosts
Effective 8 TB/host
Chart (TB): Active 96, Compliance 504
SLIDE: 30
Analytic: Hadoop
120 hosts
Effective 12 TB/host
10 MarkLogic hosts
Chart (TB): Active 96, Compliance 504, Analytic 1,044
SLIDE: 31
Online migration
Chart (TB): Active, Compliance, Analytic
SLIDE: 32
                             Operational   Compliance   Analytic
Total Size (TB)                       96          504      1,044
Total Cost ($000)                    592        2,066      2,080
Effective Unit Cost ($/GB)           $25           $4      $1.50
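As a worked check, the Compliance figures can be reproduced with simple arithmetic. The sketch below assumes 1 TB = 1,000 GB and reads the Compliance tier as 63 hosts at an effective 8 TB each with a total cost of $2,066K, as the slides state. The function name is hypothetical. The other two tiers depend on how usable capacity is counted (for example RAID and replication overhead on the Active tier), which the slides do not spell out, so they are left out here.

```python
def unit_cost_per_gb(total_cost_thousands_usd: float, size_tb: float) -> float:
    """Dollars per GB, assuming 1 TB = 1,000 GB."""
    total_usd = total_cost_thousands_usd * 1_000
    total_gb = size_tb * 1_000
    return total_usd / total_gb


# Compliance tier, using the host count and per-host capacity from the slides.
compliance_tb = 63 * 8
compliance_cost = unit_cost_per_gb(2_066, compliance_tb)
```

Under those assumptions `compliance_tb` comes to 504 and `compliance_cost` to roughly $4.10 per GB, in line with the $4 shown for Compliance.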
SLIDE: 33
Align infrastructure with objectives
Data volumes are increasing, but IT budgets are not
Storage is the dominant factor in the overall cost
Value of data and pattern of access varies widely and changes over time
Last month’s news
Current quarter’s open transactions
Latest message traffic
SLIDE: 36
With Elasticity You Can
Know when to scale
How much to scale
Programmatically expand and contract
On premises or in the cloud
SLIDE: 37
Elasticity
Scale up and down with:
Tools to understand in detail how your cluster is performing, and to find bottlenecks
Fine-grained tuning parameters for optimization of indexes, cache sizes, etc.
Cloud orchestration APIs to expand and contract clusters programmatically on-prem or in the cloud
Continuous, online rebalancing of content across nodes in a cluster to keep performance optimal for your cluster size
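"Know when to scale" and "how much to scale" come down to a capacity calculation. The Python sketch below shows one simple way to frame it. The function names and the `target_utilization` headroom parameter are assumptions made for illustration; they are not part of any MarkLogic tool or orchestration API.

```python
import math


def hosts_needed(data_tb: float, effective_tb_per_host: float,
                 target_utilization: float = 0.8) -> int:
    """Hosts required to hold data_tb while staying at or below the target fill level."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_per_host = effective_tb_per_host * target_utilization
    return math.ceil(data_tb / usable_per_host)


def scaling_delta(current_hosts: int, data_tb: float,
                  effective_tb_per_host: float,
                  target_utilization: float = 0.8) -> int:
    """Positive result: add that many hosts. Negative: the cluster can contract."""
    needed = hosts_needed(data_tb, effective_tb_per_host, target_utilization)
    return needed - current_hosts


# With no headroom, 504 TB at 8 TB per host needs exactly 63 hosts.
full = hosts_needed(504, 8, target_utilization=1.0)
# Keeping hosts at most 80% full calls for more.
with_headroom = hosts_needed(504, 8)
to_add = scaling_delta(63, 504, 8)
```

A programmatic expand or contract step would then act on the sign and size of `to_add`, with online rebalancing spreading content across the new host count.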
SLIDE: 39
The data-centered data center
Index once
Single security model
Flexible data model
Transactions
Elastic operations
…when you need them
Simplified governance
SLIDE: 40
THE DATA-CENTERED DATA CENTER
SECURE: Minimize duplication, costly ETL, reduce risk
REAL-TIME: Enterprise-class database for real-time search, delivery & analytics
RUN APPLICATIONS: Run mission-critical applications directly on HDFS
SLIDE: 41
POWERFUL: Deliver more value, build more powerful applications
Full Text Search
Scalable
Analytic Functions
Alerting & Event Processing
Geospatial Query
In-database MapReduce
Visualization Widgets
Semantics: RDF & SPARQL
Flexible Indexes
JSON Storage
REST & Java APIs
Triple Index
AGILE: Prepare for and respond quickly to change
BI Integration
HDFS & Amazon S3 Storage
Elastic
Programmatic Controls & Metering
Application Builder
Information Studio
SQL Support
Hadoop Connector
Tiered Storage
Cloud Ready
Schema-Agnostic
mlcp Content Pump
TRUSTED: Enterprise-ready and secure for mission-critical apps
ACID Transactions
XA Distributed Transactions
Database Rollback
Backup/Restore
Automated Failover
Journal Archiving
Replication
Point-in-time Recovery
Monitoring & Management
Role-based Security & LDAP Support
Common Criteria Security Certification
Configuration Management
SLIDE: 42
Take-Aways
New and more data is both an opportunity and a threat
Last generation of data management is not sufficient
More copies, representations, transformations increase risk and slow innovation
Index once and reuse across workloads, lifecycle
NoSQL: indexing and updates for interactive apps
Hadoop: staging, persistence, and analytics
SLIDE: 43
SEARCH
DATABASE
APPLICATION SERVICES