big data analytics - cleveland state...
Post on 20-May-2020
Big Data Analytics
Sunnie Chung
Electrical Engineering and Computer Science
Big Data
How Much Data? In Petabytes!
• Google processes 40 PB a day (2016)
• eBay has 11 PB of user data + 50 TB/day (2015)
• Facebook has 36 PB of user data + 80-90 TB/day (2013)
• CERN’s LHC: 15 PB a year (~2015)
• LSST: 6-10 PB a year (~2015)
How many female WWF fans under the age of 30 visited the Toyota community over the last 4 days and saw a Class A ad?
How are these people similar to those that visited Nissan?
Unstructured Text Stream in PB a day
What Does Your Big Data Stream Look Like?
1. Data Cleaning/Extraction/Transformation
2. Data Staging/Processing
3. Data Mining Strategies: Data Modeling/ Validation
4. Data Visualization
Massively Parallel Processing Systems
• Hadoop Based Multi Node Cluster: NoSQL Stack
• Cloud Based Hadoop Cluster (20 – 2000 Nodes)
Software: Automatic Parallel Execution in MapReduce
Analytic Parallel Data Warehouse Systems
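The MapReduce execution model named above can be illustrated with a minimal, single-process sketch. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are assumptions chosen for illustration, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    On a cluster the framework does this between the map and reduce steps.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce sequentially over the records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample records standing in for a text stream.
    sample = ["big data is big", "data streams"]
    print(word_count(sample))
```

Because each record is mapped independently and each key is reduced independently, both steps can be distributed across nodes without changing the result.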
Information Retrieval
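$$\cos(\vec{q},\vec{d}) = \frac{\vec{q}\cdot\vec{d}}{|\vec{q}|\,|\vec{d}|} = \frac{\vec{q}}{|\vec{q}|}\cdot\frac{\vec{d}}{|\vec{d}|} = \frac{\sum_{i=1}^{|V|} q_i d_i}{\sqrt{\sum_{i=1}^{|V|} q_i^2}\,\sqrt{\sum_{i=1}^{|V|} d_i^2}}$$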
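The cosine similarity used in information retrieval, between a query vector q and a document vector d over a vocabulary of size |V|, can be computed as in the sketch below. The function name `cosine_similarity` and the choice to return 0.0 for a zero-length vector are assumptions made for this example.

```python
import math


def cosine_similarity(q, d):
    """Cosine of the angle between term-weight vectors q and d.

    Both vectors must have one entry per vocabulary term (length |V|).
    """
    if len(q) != len(d):
        raise ValueError("q and d must have the same length")
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    if norm_q == 0 or norm_d == 0:
        # Assumption: treat an all-zero vector as having no similarity.
        return 0.0
    return dot / (norm_q * norm_d)


if __name__ == "__main__":
    # Hypothetical term-frequency vectors over a 3-term vocabulary.
    query = [1, 1, 0]
    document = [2, 1, 1]
    print(cosine_similarity(query, document))
```

Dividing by both vector lengths makes the score independent of document length, so a long and a short document with the same term proportions score the same against a query.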
Machine Learning: Neural Network, SVM, Classification
Database Research Based Methods: Multi Level Association Rule Mining
Statistics Based Methods: Clustering
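As a rough illustration of multi-level association rule mining, the sketch below computes the two standard rule measures, support and confidence, first on raw items and then after items are generalized to a higher level of a concept hierarchy. The taxonomy, the basket data, and the function names are all hypothetical; a full miner (for example, one based on Apriori) would also enumerate candidate itemsets, which is omitted here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / sup_a


def generalize(transactions, taxonomy):
    """Replace each item with its parent category (one level up)."""
    return [{taxonomy.get(item, item) for item in t} for t in transactions]


# Hypothetical two-level concept hierarchy: item -> category.
TAXONOMY = {
    "skim milk": "milk",
    "2% milk": "milk",
    "wheat bread": "bread",
    "white bread": "bread",
}

if __name__ == "__main__":
    # Hypothetical market baskets.
    baskets = [
        {"skim milk", "wheat bread"},
        {"2% milk", "white bread"},
        {"skim milk"},
        {"wheat bread"},
    ]
    print(support(baskets, {"skim milk", "wheat bread"}))
    upper = generalize(baskets, TAXONOMY)
    print(support(upper, {"milk", "bread"}))
    print(confidence(upper, {"milk"}, {"bread"}))
```

Rules that are too rare to pass a support threshold at the item level can become frequent at the category level, which is the motivation for mining at more than one level.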
[Bar chart, axis from 0 to 8000, by region: Pacific…, Paris, London, Eastern…, Amsterdam, Athens, Central…, Jakarta, Greenland, Bangkok, Brasilia, Hawaii, Atlantic…, Arizona, Ljubljana, Beijing, Belgrade, New Delhi, Berlin]
Topics Most Talked About on Nov 22, 2015
Regions Most Tweeted on Nov 22, 2015
Data Extraction/Transformation
What Your Tweet Data Looks Like on Nov 22, 2015
Top Job Titles Recently Listed
Locations of Jobs Listed 1 Day Ago
Profile Headlines with Highest Connections
Tweets Data Stream on Nov 5, 2016
Tweets Topics on Nov 5, 2016
Leads to the Company Stock Fall
Unusual Negative Tweets on the Company
Unusual Cluster on the Company Name
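One simple way to flag an unusual burst of negative tweets about a company is a z-score test against the preceding days. The slides do not say which method was used, so this is only a sketch under that assumption; the function name `flag_spikes`, the `min_history` and `threshold` parameters, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev


def flag_spikes(daily_counts, min_history=5, threshold=3.0):
    """Return indices of days whose count is unusually high.

    A day is flagged when its count lies more than `threshold` sample
    standard deviations above the mean of all preceding days.
    """
    flagged = []
    for i in range(min_history, len(daily_counts)):
        history = daily_counts[:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any increase at all counts as unusual.
            if daily_counts[i] > mu:
                flagged.append(i)
            continue
        if (daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical daily counts of negative tweets mentioning a company.
    negative_tweets = [10, 12, 11, 9, 10, 11, 60, 10]
    print(flag_spikes(negative_tweets))
```

Using only the preceding days as the baseline keeps the test usable on a live stream, at the cost of a spike inflating the baseline for the days that follow it.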
Tweets Data Stream on Nov 13, 2016
Tweets Per Topic on Nov 13, 2016
Database Security on Cloud
Encrypting Database on Cloud for Retrieving the Sensitive Data Without Decrypting
Achieving Cyber Security with Big Data Analytics
Fraud Detection in Credit Card
Intrusion Detection in Systems with sensitive data
Machine Fault Detection
Annual Big Data Workshop at CSU
Big Data Analytics Curriculum at EECS
Big Data Analytics Research Group
Math, Statistics and Databases
Big Data Specific Processing Techniques
Cloud Computing
Massively Parallel Big Data Processing Systems
Data Source Modeling
Data Mining Strategies
Data Driven solutions
President’s Advisory Committee for Center of Excellence
Data Analytics
Cyber Security
Cloud Computing