are we reaching a data science singularity? how cognitive computing is emerging from machine...

Post on 16-Apr-2017

Category:

Technology

1 Natalino Busa - @natbusa

Natalino Busa

Head of Data Science, Teradata

2 Natalino Busa - @natbusa

3 Natalino Busa - @natbusa

4 Natalino Busa - @natbusa

5 Natalino Busa - @natbusa

6 Natalino Busa - @natbusa

What about (data) science?

- technologies and tools are driving innovation in data analytics -

7 Natalino Busa - @natbusa

Man - Machine as cognitive systems

8 Natalino Busa - @natbusa

Learning: The Scientific Method

Ørsted's "First Introduction to General Physics" (1811) https://en.m.wikipedia.org/wiki/History_of_scientific_method

observation, hypothesis, deduction, synthesis, experiment

Hans Christian Ørsted

Icons made by Gregor Cresnar from www.flaticon.com is licensed by CC 3.0 BY

9 Natalino Busa - @natbusa

Innovation in Data Analytics

Cloud, Community, AI & ML

10 Natalino Busa - @natbusa

Cloud

11 Natalino Busa - @natbusa

“we live in an age of open source datacenters, so we can stack all these things together and we have open source from the ground to ceiling.”

Sam Ramji, CEO of Cloud Foundry

https://www.youtube.com/watch?v=7oCSFcUW-Qk

12 Natalino Busa - @natbusa

Analytics in the cloud

Bare Metal: Physical Machines

IAAS: Virtual Resources

CAAS: Containers,

dPAAS: Datastores, Data Engines

iPAAS: Tools Integration, Flows & Processes

DAAAS: Data Analytics as a Service

13 Natalino Busa - @natbusa

DAAAS: AI and ML APIs

Cloud Computing for Deep Neural Networks > Models, Compute (Train, Score), and Data

AI and ML models for:

● Speech (audio)
● Language (text)
● Vision (images/video)
● Data (classification, regression, clustering, anomaly detection)

14 Natalino Busa - @natbusa

Ephemeral Computing Clusters on a Cloud

Timeline (data): create, load, compute, store, destroy
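The lifecycle on this slide (create, load, compute, store, destroy) can be sketched as a Python context manager. This is a minimal in-memory illustration, not a real cloud API: `EphemeralCluster`, `ephemeral_cluster` and `run_job` are hypothetical names, and the "cluster" is just an object in the local process. The point it shows is that the result is stored outside the cluster and the cluster is destroyed even if the computation fails.

```python
from contextlib import contextmanager


class EphemeralCluster:
    """Hypothetical in-memory stand-in for a short-lived analytics cluster."""

    def __init__(self, name):
        self.name = name
        self.data = []
        self.alive = True

    def load(self, records):
        # Copy the input data onto the (pretend) cluster.
        self.data = list(records)

    def compute(self, fn):
        # Run an analysis function over the loaded data.
        return fn(self.data)


@contextmanager
def ephemeral_cluster(name, log):
    cluster = EphemeralCluster(name)  # create
    log.append("create")
    try:
        yield cluster
    finally:
        # destroy: always release the cluster, even after an error
        cluster.data = []
        cluster.alive = False
        log.append("destroy")


def run_job(records, store):
    """Run one create/load/compute/store/destroy cycle and return the step log."""
    log = []
    with ephemeral_cluster("adhoc-exploration", log) as cluster:
        cluster.load(records)
        log.append("load")
        result = cluster.compute(sum)
        log.append("compute")
        store["result"] = result  # store: persist outside the cluster
        log.append("store")
    return log


store = {}
steps = run_job([1, 2, 3, 4], store)
print(steps, store)
```

In a real deployment the create and destroy steps would call a provisioning service; the structure of the `with` block would stay the same.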

15 Natalino Busa - @natbusa

dPaaS: Analytical clusters

Ephemeral vs Permanent

Short-Lived vs Long-Lived

Data Exploration vs Production / Operations

Isolated, Personal vs Co-Ordinated

Simple Access Management vs Complex Access Management

16 Natalino Busa - @natbusa

GPUs and Distributed Computing

GPU support is coming in Kubernetes, Mesos, Spark

https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus

http://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark

[Diagram: scale out vs scale up; CPU; R, Python; Spark; TensorFrames]

17 Natalino Busa - @natbusa

Community

18 Natalino Busa - @natbusa

Community

Develop - Use - Share

19 Natalino Busa - @natbusa

Sharing is caring … speed

github.com + Jupyter notebooks, share ideas, code, and data

arxiv.org, share innovation and scientific results

20 Natalino Busa - @natbusa

Artificial Intelligence Machine Learning

21 Natalino Busa - @natbusa

Google: open-sources NLP parser scoring 95% in grammar accuracy

https://github.com/tensorflow/models/tree/master/syntaxnet

22 Natalino Busa - @natbusa

Deep Learning in Language Parsing

https://github.com/tensorflow/models/blob/master/syntaxnet/ff_nn_schematic.png

23 Natalino Busa - @natbusa

Semantic Search: TDA + NNs

Word2Vec, Par2Vec, Doc2Vec

https://arxiv.org/pdf/1405.4053v2.pdf

https://arxiv.org/pdf/1301.3781v3.pdf
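The idea behind embedding-based semantic search can be sketched in a few lines: represent the query and each document as vectors, then rank documents by cosine similarity. The word vectors below are made-up three-dimensional toy values, not trained Word2Vec output, and averaging word vectors is a simple baseline that stands in for a trained paragraph or document embedding. All function names are illustrative.

```python
import math

# Made-up toy vectors; real Word2Vec vectors are learned from a corpus.
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.85, 0.15, 0.05],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.05, 0.2, 0.85],
    "price": [0.1, 0.1, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embed_text(text, word_vectors):
    """Average the vectors of the known words in the text."""
    known = [word_vectors[t] for t in text.lower().split() if t in word_vectors]
    if not known:
        raise ValueError("no known words in text")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def search(query, documents, word_vectors):
    """Return (similarity, document) pairs, most similar first."""
    q = embed_text(query, word_vectors)
    scored = [(cosine(q, embed_text(doc, word_vectors)), doc) for doc in documents]
    return sorted(scored, reverse=True)


results = search("cat", ["dog pet", "stock market price"], WORD_VECTORS)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

The query "cat" shares no word with "dog pet", yet that document ranks first because its vectors point in a similar direction, which is what separates semantic search from keyword matching.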

24 Natalino Busa - @natbusa

Lip reading

LipNet achieves 93.4% accuracy on the GRID corpus.

https://arxiv.org/pdf/1611.01599v1.pdf

26 Natalino Busa - @natbusa

Ask me Anything

http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial

Dynamic Memory Networks for Natural Language Processing

https://arxiv.org/pdf/1603.01417v1.pdf

http://www.socher.org/

Local context

Wider context

NLP, Attention Masks

Semantic Embeddings from Text, Images

27 Natalino Busa - @natbusa

Network Traffic Patterns Classification

28 Natalino Busa - @natbusa

Network Intrusion Detection

http://billsdata.net/?p=105

It contains 130 million flow records involving 12,027 distinct computers over 36 days (not the full 58 days claimed for the entire data release).

Each record consists of: time (to nearest second), duration, source and destination computer ids, source and destination ports, protocol, number of packets and number of bytes
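The record layout described above maps naturally onto a small typed structure. The sketch below assumes a comma-separated line with the nine fields in the order the slide lists them; the actual file layout, the duration unit, and the sample lines are assumptions made for illustration, and `FlowRecord`, `parse_flow` and `bytes_per_source` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    time: int          # start time, to the nearest second
    duration: int      # unit not stated on the slide
    src_computer: str
    dst_computer: str
    src_port: str
    dst_port: str
    protocol: str
    n_packets: int
    n_bytes: int


def parse_flow(line):
    """Parse one comma-separated line (assumed layout) into a FlowRecord."""
    fields = line.strip().split(",")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    time, duration, src, dst, sport, dport, proto, packets, nbytes = fields
    return FlowRecord(int(time), int(duration), src, dst, sport, dport,
                      proto, int(packets), int(nbytes))


def bytes_per_source(records):
    """Total bytes sent per source computer: a simple per-host feature."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec.src_computer] += rec.n_bytes
    return dict(totals)


# Made-up sample lines, not taken from the data set.
SAMPLE_LINES = [
    "1,9,C1,C2,N100,N200,6,3,144",
    "2,0,C1,C3,N101,80,6,1,60",
    "2,5,C4,C2,N102,443,17,2,250",
]

records = [parse_flow(line) for line in SAMPLE_LINES]
totals = bytes_per_source(records)
print(totals)
```

Per-host aggregates like this are one possible input to the dimensionality reduction techniques named on the slide.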

Techniques: TDA, Dimensionality Reduction

https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction

29 Natalino Busa - @natbusa

Approaching (Almost) Any Machine Learning Problem

- Abhishek Thakur, Kaggle Grandmaster -

raw data: tables, files

Data munging

Useful data

Feature Engineering

Tabular Data ready for ML: data, labels

http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/

30 Natalino Busa - @natbusa

AutoML challenge

- based on scikit-learn
- 15 classifiers
- 14 feature preprocessing methods
- 4 data preprocessing methods
- 110 hyperparameters
- Supervised classification challenge: 100 different datasets
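The core loop of automated model selection, searching jointly over preprocessing choices and model hyperparameters, can be illustrated with a toy random search. This is not the scikit-learn based system from the slide: the data set is made up, the search space has only two preprocessing options and two values of k for a hand-written nearest-neighbour classifier, and every name below is hypothetical.

```python
import math
import random
from collections import Counter

# Made-up 2-D data: the second feature has a much larger scale than the first.
TRAIN = [((1.0, 200.0), 0), ((1.2, 800.0), 0), ((0.8, 500.0), 0),
         ((3.0, 300.0), 1), ((3.2, 700.0), 1), ((2.8, 400.0), 1)]
VALID = [((1.1, 600.0), 0), ((2.9, 350.0), 1),
         ((0.9, 300.0), 0), ((3.1, 750.0), 1)]


def fit_standardizer(rows):
    """Return a function that rescales each feature to zero mean, unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return lambda row: tuple((x - m) / s for x, m, s in zip(row, means, stds))


def knn_predict(train, point, k):
    """Majority label among the k training points closest to `point`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def evaluate(config):
    """Validation accuracy of one (preprocessing, k) configuration."""
    if config["prep"] == "standardize":
        transform = fit_standardizer([x for x, _ in TRAIN])
    else:
        transform = lambda row: row
    train = [(transform(x), y) for x, y in TRAIN]
    hits = sum(knn_predict(train, transform(x), config["k"]) == y
               for x, y in VALID)
    return hits / len(VALID)


def random_search(n_trials, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        config = {"prep": rng.choice(["none", "standardize"]),
                  "k": rng.choice([1, 3])}
        history.append((config, evaluate(config)))
    best_config, best_score = max(history, key=lambda item: item[1])
    return best_config, best_score, history


best_config, best_score, history = random_search(n_trials=8)
print(best_config, best_score)
```

On this toy data the preprocessing choice matters more than k, which is the reason such systems search over preprocessing and model settings together rather than tuning the classifier alone.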

31 Natalino Busa - @natbusa

Artificial + Human Intelligence

32 Natalino Busa - @natbusa

Human cognitive biases:

Too much information

Not enough meaning

What should we remember?

Need to act fast

https://en.wikipedia.org/wiki/List_of_cognitive_biases

33 Natalino Busa - @natbusa

Man vs Machine cognitive limits

Model generation

Explanation

Unsupervised

Planning

Too much information

Not enough meaning

Need to act quickly

Memory limits

34 Natalino Busa - @natbusa

Theorems often tell us complex truths about the simple things, but only rarely tell us simple truths about the complex ones

Marvin Minsky, K-Lines: A Theory of Memory (1980)

35 Natalino Busa - @natbusa

Data Science: wear the AI/ML Lenses

We are entering a new era of intelligent machines

Boost our understanding of data

Focus on higher level analyses

36 Natalino Busa - @natbusa

Intelligent Data Systems: Long live the “database”

Wikipedia: A database is an organized collection of data.

DATA

New-SQL

ML

AI

SQL

Python - Scala - R

NLP

UX

Speech

COG

37 Natalino Busa - @natbusa

The Database is never going to be the same.

38 Natalino Busa - @natbusa

Thank you.

@natbusa

39 Natalino Busa - @natbusa

credits

40 Natalino Busa - @natbusa

bonus slides
