Big Data & Oracle Technologies
![Page 1: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/1.jpg)
BIG DATA & ORACLE
TECHNOLOGIES
KIEV
OCT 2013
PRACTIC CONSULTING
Alliance of Professional IT & Management Consultants
HTTP://PRACTIC-CONSULTING.COM
![Page 2: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/2.jpg)
Agenda
WHAT
• ABOUT BIG DATA
WHEN
• INDUSTRY EXAMPLES OF BIG DATA
HOW
• ORACLE NOSQL
• ORACLE R
• ORACLE ENDECA
![Page 3: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/3.jpg)
WHAT IS BIG DATA?
PART I
![Page 4: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/4.jpg)
What Is Big Data?
Big Data is data that has grown too large to be processed using conventional methods
Big Data is the new generation of data warehousing and business analysis systems
![Page 5: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/5.jpg)
A Wider Variety of Data
Internet Data
• Clickstream
• Social media
• Social media stream
• Web site logs
Research Data
• Experiments
• Observations
• Surveys
• Marketplace data
Healthcare Data
• Treatment data
• Telehealth
• National Electronic Health Records
• Procedures
Image Data
• Image
• Video
• Satellite image
• Surveillance
Device Data
• RF Devices
• Sensors
• EDI
• Telemetry
![Page 6: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/6.jpg)
Why Is Big Data Important?
Big Data: just another buzzword, or a powerful business & science enabler?
SQL Analytics
• Count
• Mean
• OLAP
Descriptive Analytics
• Univariate distribution
• Central tendency
• Dispersion
Data Mining
• Association rules
• Clustering
• Feature extraction
Predictive Analytics
• Classification
• Regression
• Forecasting
• Spatial
• Machine Learning
• Text Analytics
Simulation
• Monte Carlo
• Agent-based modeling
• Discrete event modeling
Optimization
• Linear Optimization
• Non-Linear Optimization
Business Intelligence Advanced Analytics
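Two entries from the list above, descriptive analytics (central tendency and dispersion) and Monte Carlo simulation, can be sketched in a few lines. This is only an illustration using the Python standard library; the function names and the sample values are made up and do not come from the slides.

```python
import random
import statistics


def describe(values):
    """Descriptive analytics: central tendency and dispersion of one variable."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
    }


def estimate_pi(samples, seed=0):
    """Monte Carlo simulation: share of random points that fall inside a quarter circle."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


if __name__ == "__main__":
    order_values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data
    print(describe(order_values))
    print(estimate_pi(100_000))
```

For the sample list, the mean is 5, the median is 4.5 and the population standard deviation is 2.0. The Monte Carlo estimate only approaches pi as the number of samples grows.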
![Page 7: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/7.jpg)
INDUSTRY EXAMPLES
OF BIG DATA
PART II
![Page 8: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/8.jpg)
Marketing & Sales + Big Data
TO DELIVER AN ANSWER
100 milliseconds
COUNT OF ADS
100,000 per SECOND
http://www.dataxu.com/
ADVERTISING PLATFORM
Clickstream, Behavior
![Page 9: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/9.jpg)
Retail + Big Data
CAPTURE
1,000 tweets per SECOND
INCREASE OF DATA
+10 TB per DAY
http://www.walmart.com/
WAL-MART ONLINE MARKETING
Social Media
![Page 10: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/10.jpg)
Health Care + Big Data
INCREASE OF DATA EACH MONTH
+10 TB
PATIENTS INVOLVED
10,000
https://cghub.ucsc.edu/index.html/
CANCER GENOMICS HUB
DNA and RNA data
![Page 11: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/11.jpg)
Science + Big Data
SEVEN TELESCOPES CAPTURE
2 MB per SECOND
IN THE NEXT 10-15 YEARS ALL
TELESCOPES WILL RECEIVE
30 TB per SECOND
http://www.skatelescope.org/
THE CATALOG OF THE UNIVERSE
Data from Telescopes
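The rates quoted in Part II are easier to compare after a little arithmetic. The sketch below is a rough illustration. It assumes decimal units (1 TB = 10^12 bytes), a steady arrival rate, and that every ad request takes the full 100 milliseconds. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10 ** 12  # decimal terabyte; an assumption, the slides do not say which unit is meant
MB = 10 ** 6


def in_flight_requests(arrivals_per_second, latency_ms):
    """Little's law: average number of requests being processed at the same time."""
    return arrivals_per_second * latency_ms / 1000


def sustained_rate_mb_per_s(tb_per_day):
    """Average write rate needed to absorb a given daily data growth."""
    return tb_per_day * TB / SECONDS_PER_DAY / MB


def daily_volume_tb(tb_per_second):
    """Data volume accumulated in one day at a constant per-second rate."""
    return tb_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    print(in_flight_requests(100_000, 100))       # advertising platform
    print(round(sustained_rate_mb_per_s(10), 1))  # retail, +10 TB per day
    print(daily_volume_tb(30))                    # telescopes, 30 TB per second
```

Under these assumptions the advertising platform holds about 10,000 requests in flight, +10 TB per day corresponds to roughly 115.7 MB per second sustained, and 30 TB per second adds up to 2,592,000 TB (about 2.6 exabytes) per day.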
![Page 12: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/12.jpg)
ORACLE TECHNOLOGIES
PART III
![Page 13: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/13.jpg)
Oracle NoSQL
Big Data Storage Choices

| Hadoop Distributed File System (HDFS) | Oracle NoSQL Database |
|---|---|
| File system | Database |
| Parallel scanning | Indexed storage |
| No inherent structure | Simple data structure |
| High volume writes | High volume random reads and writes |
| Batch oriented | Real-time |
![Page 14: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/14.jpg)
Oracle NoSQL
• RDBMS
– High value, high density, complex data
– Complex data relationships
– Schema-centric
– Designed to scale up & out
– Lots of general purpose features/functionality
High overhead ($ per operation)
• NoSQL architectures
– Low value, low density, simple data
– Very simple relationships
– Schema-free, unstructured or semi-structured data
– Distributed storage and processing
– Stripped down, special purpose data store
Lower overhead ($ per operation)
![Page 15: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/15.jpg)
Oracle NoSQL
A Distributed, Scalable Key-Value Database
Simple Data Model
Small, distributed footprint
Highly scalable, available
Transparent load balancing
Integrates with Oracle Stack
[Diagram: two Applications, each with a NoSQL Database Driver, connected to Storage Nodes in Datacenter A and Datacenter B]
![Page 16: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/16.jpg)
Oracle NoSQL
Key-value pairs
• Simple data model – key-value pair (major+minor-key paradigm)
• Simple operations – read/insert/update/delete, read-modify-write (RMW) support
• Scope of transaction – records within a major key, single API call
• Unordered scan of all data (non-transactional)
Major key: userid
Sub key: address, subscriptions
Value: email id, phone #, expiration date
Keys are strings; values are byte arrays
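The major+minor key paradigm can be pictured as a two-level map: all records that share a major key sit together, and a multi-record write is scoped to one major key. The toy in-memory store below is a hypothetical illustration only. It is not the Oracle NoSQL Database API, and the class name, method names and sample values are made up.

```python
from collections import defaultdict


class MajorMinorStore:
    """Toy in-memory key-value store addressed by (major key, minor key).

    Hypothetical illustration; this is not the Oracle NoSQL Database API.
    """

    def __init__(self):
        # major key -> {minor key -> value as bytes}
        self._data = defaultdict(dict)

    def put(self, major, minor, value):
        self._data[major][minor] = value

    def get(self, major, minor):
        return self._data.get(major, {}).get(minor)

    def delete(self, major, minor):
        """Return True if a record was removed."""
        return self._data.get(major, {}).pop(minor, None) is not None

    def multi_get(self, major):
        """All records under one major key, the unit a transaction can span."""
        return dict(self._data.get(major, {}))

    def write_all(self, major, changes):
        """Apply several minor-key writes under one major key, all or nothing."""
        if not all(isinstance(value, bytes) for value in changes.values()):
            raise TypeError("values must be bytes")
        self._data[major].update(changes)


if __name__ == "__main__":
    store = MajorMinorStore()
    # Major key is the user id; minor keys follow the slide (address, subscriptions).
    store.put("user42", "address", b'{"email": "a@example.com", "phone": "555-0100"}')
    store.write_all("user42", {"subscriptions": b'{"expires": "2014-10-01"}'})
    print(sorted(store.multi_get("user42")))
```

Validating every value before touching the map is what makes `write_all` all-or-nothing in this toy version. A real distributed store needs far more machinery to give the same guarantee.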
![Page 17: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/17.jpg)
Oracle NoSQL
Online Display Advertising
![Page 18: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/18.jpg)
Oracle NoSQL
Getting Started with Oracle NoSQL DB
1. Download from OTN:
www.oracle.com/technetwork/products/nosqldb/downloads/index.html
2. Review Quick Start & Getting Started Guide
3. Review Programmatic API Guide
4. Start writing Java code
![Page 19: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/19.jpg)
What is R?
• R is an open source language and environment for statistical computing and graphics
http://www.R-project.org/
• Started in 1994 as an alternative to SAS, SPSS & other proprietary statistical environments
• The R environment
– R is an integrated suite of software facilities for data manipulation, calculation and graphical display
• Around 2 million R users worldwide
– Widely taught in universities
– Many corporate analysts know and use R
• Thousands of open source R packages to enhance productivity, such as:
– Bioinformatics
– Spatial Statistics
– Financial Market Analysis
– Linear and Non-Linear Modeling
![Page 20: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/20.jpg)
Why do statisticians/data analysts use R?
The R environment is:
• Powerful
• Extensible
• Graphical
• Extensive statistics
• Out-of-the-box (OOTB) functionality with many ‘knobs’ but smart defaults
• Easy to install and use
• Free
![Page 21: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/21.jpg)
Limitations of R
• R is a client and server bundled together as one executable
– Single-user tool, like Excel
– Single-threaded
– Cannot leverage multi-CPU capacity without special packages and coding
• R requires data to be loaded into memory first
– Loading data may not be a limitation given the RAM available on laptops/desktops
– R’s call-by-value semantics mean that when a function modifies data passed to it, R makes a complete copy of that data (copy-on-modify)
– As a result you can quickly run into memory limits
![Page 22: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/22.jpg)
Oracle R Connector for Hadoop
• Provides transparent access to a Hadoop cluster: MapReduce and HDFS-resident data
• R users are not required to learn a new language or interface to work with Hadoop
• R users can execute jobs on a Hadoop cluster without requiring knowledge of Hadoop internals, the Hadoop CLI, or IT infrastructure
• Ability to leverage open source contributed R packages to work on HDFS-resident data
![Page 23: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/23.jpg)
Oracle R Enterprise
• Provides a familiar R environment to operate on database-resident data
• Overloads base R functions for scalable execution in Oracle Database
– Automatically generates SQL from R and submits the query to the database
– Leverages table parallelism where applicable
• Enables embedded execution of R scripts at the Oracle Database server
– Provides a database-controlled, data-parallel execution framework
– Enables leveraging CRAN open source R packages
• Enables integration of structured results and graphics with OBIEE dashboards and BI Publisher documents
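The idea of overloading familiar functions so that they generate SQL instead of computing locally can be sketched outside R as well. The example below is a loose analogy, not how Oracle R Enterprise is implemented. Python and the standard-library `sqlite3` module stand in for R and Oracle Database, and `TableProxy`, `Column` and `Condition` are hypothetical names.

```python
import sqlite3


class Condition:
    """A WHERE fragment plus its bound parameters."""

    def __init__(self, sql, params):
        self.sql = sql
        self.params = params


class Column:
    """Proxy for a database column; comparisons build SQL instead of evaluating."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, other):
        return Condition(f"{self.name} > ?", [other])

    def __lt__(self, other):
        return Condition(f"{self.name} < ?", [other])


class TableProxy:
    """Looks like a local data set, but every operation runs inside the database."""

    def __init__(self, conn, table, where=None):
        self.conn = conn
        self.table = table  # identifiers are trusted in this sketch (no quoting)
        self.where = where

    def __getitem__(self, item):
        if isinstance(item, str):
            return Column(item)
        if isinstance(item, Condition):
            return TableProxy(self.conn, self.table, item)  # filtered view, no rows fetched
        raise TypeError(f"unsupported selector: {item!r}")

    def _query(self, select):
        sql = f"SELECT {select} FROM {self.table}"
        params = []
        if self.where is not None:
            sql += f" WHERE {self.where.sql}"
            params = self.where.params
        return sql, params

    def __len__(self):
        sql, params = self._query("COUNT(*)")
        return self.conn.execute(sql, params).fetchone()[0]

    def mean(self, column):
        sql, params = self._query(f"AVG({column})")
        return self.conn.execute(sql, params).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (carrier TEXT, delay REAL)")
    conn.executemany("INSERT INTO flights VALUES (?, ?)",
                     [("AA", 10.0), ("AA", 30.0), ("UA", 5.0)])  # made-up rows
    flights = TableProxy(conn, "flights")
    late = flights[flights["delay"] > 8]   # builds "WHERE delay > ?", fetches nothing
    print(len(late), late.mean("delay"))   # only the aggregates leave the database
```

The design point this illustrates is that only small aggregate results travel to the client, so the data never has to fit in the analyst's memory.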
![Page 24: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/24.jpg)
Oracle R Links
• Blog: https://blogs.oracle.com/R/
• Forum: https://forums.oracle.com/forums/forum.jspa?forumID=1397
• Oracle R Distribution: http://www.oracle.com/technetwork/indexes/downloads/r-distribution-1532464.html
• ROracle: http://cran.r-project.org/web/packages/ROracle
• Oracle R Enterprise: http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise
• Oracle R Connector for Hadoop: http://www.oracle.com/us/products/database/big-data-connectors/overview
![Page 25: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/25.jpg)
Other Oracle Big Data Products
• Oracle Endeca Information Discovery: http://www.oracle.com/us/solutions/business-analytics/business-intelligence/endeca/overview/index.html
• Oracle Data Integrator Application Adapter for Hadoop: http://www.oracle.com/us/products/middleware/data-integration/hadoop/overview/index.html
• Oracle Loader for Hadoop: http://www.oracle.com/technetwork/bdc/hadoop-loader/learnmore/index.html
![Page 26: Big Data & Oracle Technologies](https://reader034.vdocuments.us/reader034/viewer/2022042715/55941b851a28abf72b8b46cd/html5/thumbnails/26.jpg)
The End
The best way to predict the future is to create it!
- Peter F. Drucker