are we reaching a data science singularity? how cognitive computing is emerging from machine...

Natalino Busa - @natbusa

Natalino Busa, Head of Data Science, Teradata

https://twitter.com/natbusa


What about (data) science?

- technologies and tools are driving innovation in data analytics -



Man - Machine as cognitive systems



Learning: The Scientific Method

Ørsted's "First Introduction to General Physics" (1811) https://en.m.wikipedia.org/wiki/History_of_scientific_method

observation hypothesis deduction synthesis

Hans Christian Ørsted

experiment

Icons made by Gregor Cresnar from www.flaticon.com, licensed under CC 3.0 BY



https://en.m.wikipedia.org/wiki/Hans_Christian_%C3%98rsted


http://www.flaticon.com/authors/gregor-cresnar

http://www.flaticon.com

http://creativecommons.org/licenses/by/3.0/


Innovation in Data Analytics

Cloud Community AI & ML



Cloud



“we live in an age of open source datacenters, so we can stack all these things together and we have open source from the ground to ceiling.”

Sam Ramji, CEO of Cloud Foundry

https://www.youtube.com/watch?v=7oCSFcUW-Qk





Analytics in the cloud

Bare Metal: Physical Machines

IAAS: Virtual Resources

CAAS: Containers

dPAAS: Datastores, Data Engines

iPAAS: Tools Integration, Flows & Processes

DAAAS: Data Analytics as a Service



DAAAS: AI and ML APIs

Cloud Computing for Deep Neural Networks > Models, Compute (Train, Score), and Data

AI and ML models for:

● Speech (audio)
● Language (text)
● Vision (images/video)
● Data (classification, regression, clustering, anomaly detection)



Ephemeral Computing Clusters on a Cloud

Timeline: create > load data > compute > store > destroy
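The lifecycle on this slide (create, load, compute, store, destroy) can be sketched as a context manager that always tears the cluster down. This is a minimal illustration, not a real cloud API: `EphemeralCluster`, `ephemeral_cluster`, and `run_job` are hypothetical names, and the "cluster" only records lifecycle events in memory.

```python
from contextlib import contextmanager


class EphemeralCluster:
    """Hypothetical stand-in for a cloud cluster; it only records lifecycle events."""

    def __init__(self, name):
        self.name = name
        self.events = []
        self.data = None
        self.alive = False

    def create(self):
        self.alive = True
        self.events.append("create")

    def load(self, records):
        self.data = list(records)
        self.events.append("load")

    def compute(self, fn):
        self.events.append("compute")
        return fn(self.data)

    def destroy(self):
        self.data = None
        self.alive = False
        self.events.append("destroy")


@contextmanager
def ephemeral_cluster(name):
    cluster = EphemeralCluster(name)
    cluster.create()
    try:
        yield cluster
    finally:
        # Always tear the cluster down, even if the job fails.
        cluster.destroy()


def run_job(records, durable_store):
    with ephemeral_cluster("exploration") as cluster:
        cluster.load(records)
        result = cluster.compute(sum)
        # Persist the result outside the cluster before it is destroyed.
        durable_store["total"] = result
        cluster.events.append("store")
    return cluster.events


store = {}
events = run_job([1, 2, 3], store)
```

The point of the design is that results must land in durable storage before the `with` block exits, because nothing inside the cluster survives `destroy`.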



dPaaS: Analytical clusters

Ephemeral: Short-Lived; Data Exploration; Isolated, Personal; Simple Access Management

vs

Permanent: Long-Lived; Production / Operations; Co-Ordinated; Complex Access Management



GPUs and Distributed Computing

GPU support is coming in Kubernetes, Mesos, Spark

https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus

http://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark

[Diagram: scale out vs scale up; labels: CPU, R, Python, Spark, TensorFrames]




Community




Develop - Use - Share



Sharing is caring … speed

github.com + Jupyter notebooks: share ideas, code, and data

arxiv.org: share innovation and scientific results



Artificial Intelligence & Machine Learning



Google: open-sources NLP parser scoring 95% in grammar accuracy

https://github.com/tensorflow/models/tree/master/syntaxnet





Deep Learning in Language Parsing

https://github.com/tensorflow/models/blob/master/syntaxnet/ff_nn_schematic.png





Semantic Search: TDA + NNs

Word2Vec, Par2Vec, Doc2Vec

https://arxiv.org/pdf/1405.4053v2.pdf

https://arxiv.org/pdf/1301.3781v3.pdf




http://billsdata.net/?p=108
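Embedding models such as Word2Vec and Doc2Vec map words or documents to vectors, so semantic search reduces to finding the nearest vectors to a query. The sketch below shows only that lookup step, using cosine similarity. The vectors are made up for illustration; real embeddings are learned from a corpus, and `most_similar` is a hypothetical helper, not a library function.

```python
import math

# Made-up 3-dimensional vectors for illustration only; real Word2Vec or
# Doc2Vec embeddings are learned from text and have many more dimensions.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
    "banana": [0.2, 0.1, 0.8],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, embeddings, top_n=1):
    """Return the top_n entries closest to `query`, excluding the query itself."""
    query_vec = embeddings[query]
    scored = [
        (cosine_similarity(query_vec, vec), word)
        for word, vec in embeddings.items()
        if word != query
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]


nearest_to_king = most_similar("king", EMBEDDINGS)
nearest_to_apple = most_similar("apple", EMBEDDINGS)
```

With these toy vectors the royalty terms and the fruit terms each rank their own group first, which is the behaviour a semantic search relies on.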


Lip reading

LipNet achieves 93.4% accuracy on the GRID corpus.






Ask me Anything

Dynamic Memory Networks for Natural Language Processing

https://arxiv.org/pdf/1603.01417v1.pdf

https://youtu.be/oGk1v1jQITw

Caiming Xiong, Stephen Merity, Richard Socher






https://arxiv.org/find/cs/1/au:+Xiong_C/0/1/0/all/0/1



https://arxiv.org/find/cs/1/au:+Merity_S/0/1/0/all/0/1



https://arxiv.org/find/cs/1/au:+Socher_R/0/1/0/all/0/1



Ask me Anything

http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial

Dynamic Memory Networks for Natural Language Processing

https://arxiv.org/pdf/1603.01417v1.pdf

http://www.socher.org/

Local context

Wider context

NLP, Attention Masks

Semantic Embeddings from Text, Images








Network Traffic Patterns Classification



Network Intrusion Detection


It contains 130 million flow records involving 12,027 distinct computers over 36 days (not the full 58 days claimed for the entire data release).

Each record consists of: time (to nearest second), duration, source and destination computer ids, source and destination ports, protocol, number of packets and number of bytes
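The record layout described above maps naturally onto a small typed structure. The sketch below assumes a comma-separated line with the fields in the order listed on the slide. That field order, the delimiter, the field names, and the sample line are all assumptions for illustration, not taken from the data release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    time: int            # timestamp, to the nearest second
    duration: int
    src_computer: str
    src_port: str
    dst_computer: str
    dst_port: str
    protocol: str
    packet_count: int
    byte_count: int


def parse_flow(line):
    """Parse one comma-separated line (assumed field order) into a FlowRecord."""
    (time, duration, src, src_port, dst, dst_port,
     protocol, packets, nbytes) = line.strip().split(",")
    return FlowRecord(
        time=int(time),
        duration=int(duration),
        src_computer=src,
        src_port=src_port,
        dst_computer=dst,
        dst_port=dst_port,
        protocol=protocol,
        packet_count=int(packets),
        byte_count=int(nbytes),
    )


# Hypothetical sample line, not taken from the actual data set.
record = parse_flow("1,9,C3090,N10471,C3420,N46,6,3,144")
```

Computer ids and ports are kept as strings because the slide does not say they are numeric. Only the counts and times are converted to integers.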

Techniques: TDA, Dimensionality Reduction

https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction






Approaching (Almost) Any Machine Learning Problem

- Abhishek Thakur, Kaggle Grandmaster -

Raw data (tables, files) > Data munging > Useful data > Feature Engineering > Tabular data ready for ML (data, labels)

http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/
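As a rough illustration of turning raw records into tabular data with labels, the sketch below converts a list of dictionaries into a numeric feature matrix `X` and a label vector `y`, one-hot encoding the categorical columns. The function names, column names, and sample rows are hypothetical and are not taken from the linked post.

```python
def one_hot(value, categories):
    """Encode `value` as a 0/1 vector over the ordered `categories`."""
    return [1 if value == c else 0 for c in categories]


def to_tabular(raw_rows, label_key, numeric_keys, categorical_keys):
    """Build (X, y) from raw dict rows: numeric columns first, then one-hot columns."""
    # Fix a sorted vocabulary per categorical column so column order is stable.
    vocab = {k: sorted({row[k] for row in raw_rows}) for k in categorical_keys}
    X, y = [], []
    for row in raw_rows:
        features = [float(row[k]) for k in numeric_keys]
        for k in categorical_keys:
            features.extend(one_hot(row[k], vocab[k]))
        X.append(features)
        y.append(row[label_key])
    return X, y


# Made-up raw rows: numbers arrive as strings, categories as free text.
raw = [
    {"age": "34", "city": "Utrecht", "churned": 0},
    {"age": "51", "city": "Amsterdam", "churned": 1},
]
X, y = to_tabular(raw, "churned", ["age"], ["city"])
```

Here the columns of `X` are `age`, `city=Amsterdam`, and `city=Utrecht`, in that order, because the vocabulary is sorted.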


https://www.kaggle.com/abhishek





AutoML challenge

- based on scikit-learn
- 15 classifiers
- 14 feature preprocessing methods
- 4 data preprocessing methods
- 110 hyperparameters
- Supervised classification challenge: 100 different datasets
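To give a feel for what searching over preprocessing methods, classifiers, and hyperparameters involves, here is a deliberately tiny sketch. It uses plain random search over a hypothetical three-choice configuration space with hand-written classifiers. The slide does not say which search strategy the AutoML entry used, so random search is an assumption made here for simplicity, and the data set is made up.

```python
import random


def rescale(rows, reference):
    """Scale each column to [0, 1] using min/max taken from `reference`."""
    cols = list(zip(*reference))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_X, train_y, x, k):
    nearest = sorted(zip(train_X, train_y), key=lambda p: sq_dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def centroid_predict(train_X, train_y, x):
    centroids = {}
    for label in set(train_y):
        members = [r for r, l in zip(train_X, train_y) if l == label]
        centroids[label] = [sum(c) / len(members) for c in zip(*members)]
    return min(centroids, key=lambda l: sq_dist(centroids[l], x))


def sample_config(rng):
    config = {
        "preprocess": rng.choice(["none", "rescale"]),
        "classifier": rng.choice(["knn", "centroid"]),
    }
    if config["classifier"] == "knn":
        config["k"] = rng.choice([1, 3, 5])  # hyperparameter that only applies to k-NN
    return config


def evaluate(config, train_X, train_y, test_X, test_y):
    """Hold-out accuracy of one configuration."""
    if config["preprocess"] == "rescale":
        # Scale the test set with statistics from the unscaled training set.
        test_X = rescale(test_X, train_X)
        train_X = rescale(train_X, train_X)
    correct = 0
    for x, label in zip(test_X, test_y):
        if config["classifier"] == "knn":
            pred = knn_predict(train_X, train_y, x, config["k"])
        else:
            pred = centroid_predict(train_X, train_y, x)
        correct += pred == label
    return correct / len(test_y)


def random_search(train, test, n_trials=20, seed=0):
    rng = random.Random(seed)
    best_score, best_config = -1.0, None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config, *train, *test)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config


# Made-up, well-separated two-class data.
train = ([[1.0, 200.0], [1.2, 220.0], [0.8, 180.0],
          [3.0, 800.0], [3.2, 820.0], [2.8, 780.0]], [0, 0, 0, 1, 1, 1])
test = ([[1.1, 210.0], [2.9, 790.0]], [0, 1])
best_score, best_config = random_search(train, test)
```

The `k` entry exists only when the k-NN classifier is chosen, which mirrors how hyperparameters in a real configuration space depend on the component they belong to.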





Artificial + Human Intelligence



Human cognitive biases:

Too much information

Not enough meaning

What should we remember?

Need to act fast

https://en.wikipedia.org/wiki/List_of_cognitive_biases





Man vs Machine cognitive limits

Model generation

Explanation

Unsupervised

Planning

Too much information

Not enough meaning

Need to act quickly

Memory limits



Theorems often tell us complex truths about the simple things, but only rarely tell us simple truths about the complex ones

Marvin Minsky, K-Lines: A Theory of Memory (1980)



Data Science: wear the AI/ML Lenses

We are entering a new era of intelligent machines

Boost our understanding of data

Focus on higher level analyses



Intelligent Data Systems: Long live the “database”

Wikipedia: A database is an organized collection of data.

DATA

New-SQL

ML

AI

SQL

Python - Scala - R

NLP

UX

Speech

COG


https://en.wikipedia.org/wiki/Data_(computing)


The Database is never going to be the same.



Thank you. @natbusa



credits



bonus slides

