Introduction to Google Cloud Platform Technologies

DESCRIPTION

This is a presentation given by Google Developer Advocate Chris Schalk at SpringOne 2GX on October 21, 2010. It introduces Google Storage for Developers, the Prediction API, and BigQuery.

TRANSCRIPT

Page 1: Introduction to Google Cloud platform technologies

Chicago, October 19 - 22, 2010

How to analyze your data and take advantage of machine learning in your application

Chris Schalk - Google

Page 2: Introduction to Google Cloud platform technologies

SpringOne 2GX 2010. All rights reserved. Do not distribute without permission.

Google’s new Cloud Technologies

Topics covered
•  Google Storage for Developers
•  Prediction API (machine learning)
•  BigQuery

Page 3: Introduction to Google Cloud platform technologies

Google Storage for Developers
Store your data in Google's cloud

Page 4: Introduction to Google Cloud platform technologies

What Is Google Storage?

•  Store your data in Google's cloud

o  any format, any amount, any time

•  You control access to your data

o  private, shared, or public

•  Access via Google APIs or 3rd party tools/libraries

Page 5: Introduction to Google Cloud platform technologies

Sample Use Cases

Static content hosting: e.g. static HTML, images, music, video

Backup and recovery: e.g. personal data, business records

Sharing: e.g. share data with your customers

Data storage for applications: e.g. used as a storage backend for Android, App Engine, and cloud-based apps

Storage for computation: e.g. BigQuery, Prediction API

Page 6: Introduction to Google Cloud platform technologies

Google Storage Benefits

High Performance and Scalability: Backed by Google infrastructure

Strong Security and Privacy: Control access to your data

Easy to Use: Get started fast with Google & 3rd party tools

Page 7: Introduction to Google Cloud platform technologies

Google Storage Technical Details

•  RESTful API
o  Verbs: GET, PUT, POST, HEAD, DELETE
o  Resources: identified by URI
o  Compatible with S3

•  Buckets
o  Flat containers

•  Objects
o  Any type
o  Size: up to 100 GB per object

•  Access Control for Google Accounts
o  For individuals and groups

•  Two Ways to Authenticate Requests
o  Sign request using access keys
o  Web browser login

Page 8: Introduction to Google Cloud platform technologies

Performance and Scalability

•  Objects of any type, up to 100 GB per object
•  Unlimited number of objects, 1000s of buckets

•  All data replicated to multiple US data centers
•  Leveraging Google's worldwide network for data delivery

•  Only you can use bucket names with your domain names
•  Read-your-writes data consistency
•  Range GET
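To make the "Range GET" bullet concrete, the sketch below builds a GET request that asks for only part of an object by setting the standard HTTP `Range` header. The object URL is a hypothetical placeholder (the slides do not give the endpoint), `range_get` is an illustrative helper name, and the request is constructed but never sent.

```python
from urllib.request import Request

# Hypothetical object URL: the real endpoint is not given on the slide.
OBJECT_URL = "https://storage.example.com/mybucket/myobject"


def range_get(url, first_byte, last_byte):
    """Build (but do not send) a GET request for an inclusive byte range."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("byte range must satisfy 0 <= first_byte <= last_byte")
    # The Range header uses the form "bytes=<first>-<last>", both ends inclusive.
    headers = {"Range": f"bytes={first_byte}-{last_byte}"}
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    request = range_get(OBJECT_URL, 0, 1023)  # first 1 KiB of the object
    print(request.get_method(), request.full_url, request.get_header("Range"))
```

A server that honors the header would normally answer with status 206 (Partial Content) and only the requested bytes, which is useful for resuming downloads of large objects.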

Page 9: Introduction to Google Cloud platform technologies

Security and Privacy Features

•  Key-based authentication
•  Authenticated downloads from a web browser

•  Sharing with individuals
•  Group sharing via Google Groups

•  Access control for buckets and objects
•  Set Read/Write/List permissions

Page 10: Introduction to Google Cloud platform technologies

Demo

Page 11: Introduction to Google Cloud platform technologies

Demo

•  Tools:
o  GS Manager
o  gsutil

•  Upload / Download

Page 12: Introduction to Google Cloud platform technologies

Google Storage usage within Google

Haiti Relief Imagery

USPTO data

Partner Reporting

Google BigQuery

Google Prediction API

Page 13: Introduction to Google Cloud platform technologies

Some Early Google Storage Adopters

Page 14: Introduction to Google Cloud platform technologies

Google Storage - Pricing

o  Storage: $0.17/GB/Month

o  Network
   Upload: $0.10/GB
   Download: $0.15/GB Americas / EMEA, $0.30/GB APAC

o  Requests
   PUT, POST, LIST: $0.01 / 1,000 requests
   GET, HEAD: $0.01 / 10,000 requests
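As a rough worked example of the price list above, the sketch below estimates a monthly bill. Only the rates come from the slide; the function name, its parameters, and the sample workload are illustrative assumptions, and the free preview allowance is ignored.

```python
# Rates copied from the pricing slide (US dollars).
STORAGE_PER_GB_MONTH = 0.17
UPLOAD_PER_GB = 0.10
DOWNLOAD_PER_GB = {"americas_emea": 0.15, "apac": 0.30}
WRITE_PER_1000 = 0.01    # PUT, POST, LIST
READ_PER_10000 = 0.01    # GET, HEAD


def estimate_monthly_cost(stored_gb, uploaded_gb, downloaded_gb,
                          write_requests, read_requests,
                          region="americas_emea"):
    """Return an estimated monthly bill in dollars (hypothetical helper)."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    network = (uploaded_gb * UPLOAD_PER_GB
               + downloaded_gb * DOWNLOAD_PER_GB[region])
    requests = ((write_requests / 1000) * WRITE_PER_1000
                + (read_requests / 10000) * READ_PER_10000)
    return round(storage + network + requests, 2)


if __name__ == "__main__":
    # Illustrative workload: 50 GB stored, 10 GB uploaded, 20 GB downloaded,
    # 100,000 write requests and 1,000,000 read requests in one month.
    print(estimate_monthly_cost(50, 10, 20, 100_000, 1_000_000))
```

For that workload the parts are $8.50 storage, $4.00 network, and $2.00 requests, so storage dominates unless download traffic is several times the stored volume.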

Page 15: Introduction to Google Cloud platform technologies

Google Storage - Availability

•  Limited preview in US currently
o  100GB free storage and network from Google per account
o  Sign up for waitlist at http://code.google.com/apis/storage/

•  Note: Non US preview available on case-by-case basis

Page 16: Introduction to Google Cloud platform technologies

Google Storage Summary

•  Store any kind of data using Google's cloud infrastructure
•  Easy to Use APIs

•  Many available tools and libraries
o  gsutil, GS Manager
o  3rd party: Boto, CloudBerry, CyberDuck, JetS3t, and more

Page 17: Introduction to Google Cloud platform technologies

Google Prediction API
Google's prediction engine in the cloud

Page 18: Introduction to Google Cloud platform technologies

Introducing the Google Prediction API

•  Google's sophisticated machine learning technology
•  Available as an on-demand RESTful HTTP web service

Page 19: Introduction to Google Cloud platform technologies

How does it work?

"english" The quick brown fox jumped over the lazy dog.

"english" To err is human, but to really foul things up you need a computer.

"spanish" No hay mal que por bien no venga.

"spanish" La tercera es la vencida.

? To be or not to be, that is the question.

? La fe mueve montañas.

The Prediction API finds relevant features in the sample data during training.

The Prediction API later searches for those features during prediction.
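The slides do not describe the algorithms behind the Prediction API. Purely to illustrate the train-then-predict idea, the toy sketch below treats individual words as the "features": training counts words per label, and prediction picks the label whose words overlap most with the new text. This is an assumed stand-in, not Google's method, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Labeled training sentences taken from the slide.
EXAMPLES = [
    ("english", "The quick brown fox jumped over the lazy dog."),
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
    ("spanish", "La tercera es la vencida."),
]


def tokenize(text):
    """Lower-case a sentence and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def train(examples):
    """Count how often each word appears under each label."""
    model = defaultdict(Counter)
    for label, text in examples:
        model[label].update(tokenize(text))
    return model


def predict(model, text):
    """Return the label whose training words overlap most with the text."""
    words = tokenize(text)
    scores = {label: sum(counts[word] for word in words)
              for label, counts in model.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train(EXAMPLES)
    print(predict(model, "To be or not to be, that is the question."))
    print(predict(model, "La fe mueve montañas."))
```

With only four training sentences the toy model works here because the two unlabeled sentences happen to share words ("to", "is", "the", "la") with the right language; a real service needs far more data and better features.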

Page 20: Introduction to Google Cloud platform technologies

A virtually endless number of applications...

Customer Sentiment

Transaction Risk

Species Identification

Message Routing

Legal Docket Classification

Suspicious Activity

Work Roster Assignment

Recommend Products

Political Bias

Uplift Marketing

Email Filtering

Diagnostics

Inappropriate Content

Career Counselling

Churn Prediction

... and many more ...

Page 21: Introduction to Google Cloud platform technologies

A Prediction API Example

Automatically categorize and respond to emails by language

•  Customer: ACME Corp, a multinational organization
•  Goal: Respond to customer emails in their language
•  Data: Many emails, tagged with their languages
•  Outcome: Predict language and respond accordingly

Page 22: Introduction to Google Cloud platform technologies

Using the Prediction API

A simple three-step process...

1. Upload: Upload your training data to Google Storage

2. Train: Build a model from your data

3. Predict: Make new predictions

Page 23: Introduction to Google Cloud platform technologies

Step 1: Upload
Upload your training data to Google Storage

•  Training data: outputs and input features
•  Data format: comma-separated values (CSV)

"english","To err is human, but to really ..."
"spanish","No hay mal que por bien no venga."
...

Upload to Google Storage:

gsutil cp ${data} gs://yourbucket/${data}
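A minimal sketch of producing training data in the layout shown above, with the output label in the first column and the input text in the second. The helper name `to_training_csv` is hypothetical; quoting every field matches the slide's sample rows and keeps commas inside sentences from splitting a column.

```python
import csv
import io


def to_training_csv(examples):
    """Serialize (label, text) pairs as CSV with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for label, text in examples:
        writer.writerow([label, text])
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        ("english", "To err is human, but to really foul things up you need a computer."),
        ("spanish", "No hay mal que por bien no venga."),
    ]
    print(to_training_csv(rows), end="")
```

The resulting text can be written to a local file and copied to a bucket with the `gsutil cp` command from the slide.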

Page 24: Introduction to Google Cloud platform technologies

Step 2: Train
Create a new model by training on data

To train a model:

POST prediction/v1.1/training?data=mybucket%2Fmydata

Training runs asynchronously. To see if it has finished:

GET prediction/v1.1/training/mybucket%2Fmydata

{"data":{ "data":"mybucket/mydata", "modelinfo":"estimated accuracy: 0.xx"}}
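In the paths above, the `bucket/object` name is percent-encoded so that its slash becomes `%2F`. The sketch below shows one way to build both request lines; the helper names are hypothetical and nothing is sent over the network.

```python
from urllib.parse import quote

API_PREFIX = "prediction/v1.1"  # path prefix as shown on the slide


def model_id(bucket, obj):
    """Percent-encode 'bucket/object', including the slash."""
    return quote(f"{bucket}/{obj}", safe="")


def training_request(bucket, obj):
    """Return (method, path) for starting a training run."""
    return "POST", f"{API_PREFIX}/training?data={model_id(bucket, obj)}"


def status_request(bucket, obj):
    """Return (method, path) for checking whether training has finished."""
    return "GET", f"{API_PREFIX}/training/{model_id(bucket, obj)}"


if __name__ == "__main__":
    print(*training_request("mybucket", "mydata"))
    print(*status_request("mybucket", "mydata"))
```

Passing `safe=""` matters: by default `quote` leaves `/` unescaped, which would turn the model name into two path segments.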

Page 25: Introduction to Google Cloud platform technologies

Step 3: Predict
Apply the trained model to make predictions on new data

POST prediction/v1.1/query/mybucket%2Fmydata/predict

{ "data": { "input": { "text": [ "J'aime X! C'est le meilleur" ] } } }

Page 26: Introduction to Google Cloud platform technologies

Step 3: Predict
Apply the trained model to make predictions on new data

{ "data": {
    "kind": "prediction#output",
    "outputLabel": "French",
    "outputMulti": [
      {"label": "French", "score": x.xx},
      {"label": "English", "score": x.xx},
      {"label": "Spanish", "score": x.xx} ] } }
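A small sketch of handling the request and response bodies shown on these slides: one helper wraps a text input in the request shape, another picks the highest-scoring label from `outputMulti`. The helper names are hypothetical, and the scores in `SAMPLE_RESPONSE` are invented placeholders because the slide shows only `x.xx`.

```python
import json


def build_query(text):
    """Wrap one text input in the request body shape shown on the slide."""
    return json.dumps({"data": {"input": {"text": [text]}}})


def best_label(response_body):
    """Return (label, score) for the highest-scoring entry in outputMulti."""
    data = json.loads(response_body)["data"]
    top = max(data["outputMulti"], key=lambda entry: entry["score"])
    return top["label"], top["score"]


# Response in the slide's shape; the numeric scores are made up for illustration.
SAMPLE_RESPONSE = json.dumps({
    "data": {
        "kind": "prediction#output",
        "outputLabel": "French",
        "outputMulti": [
            {"label": "French", "score": 0.9},
            {"label": "English", "score": 0.07},
            {"label": "Spanish", "score": 0.03},
        ],
    }
})

if __name__ == "__main__":
    print(build_query("J'aime X! C'est le meilleur"))
    print(best_label(SAMPLE_RESPONSE))
```

Reading `outputMulti` rather than only `outputLabel` lets an application apply its own confidence threshold, for example routing an email to a person when no language scores clearly ahead.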

Page 27: Introduction to Google Cloud platform technologies

Step 3: Predict
Apply the trained model to make predictions on new data

import http.client
import json

header = {"Content-Type": "application/json"}

# Put the new data in JSON format in the params variable.
params = json.dumps({"data": {"input": {"text": ["J'aime X! C'est le meilleur"]}}})

conn = http.client.HTTPConnection("www.googleapis.com")
conn.request("POST", "/prediction/v1.1/query/mybucket%2Fmydata/predict", params, header)

# Print the response body returned by the service.
print(conn.getresponse().read())

An example using Python

Page 28: Introduction to Google Cloud platform technologies

Prediction API Capabilities

Data
•  Input Features: numeric or unstructured text
•  Output: up to hundreds of discrete categories

Training
•  Many machine learning techniques
•  Automatically selected
•  Performed asynchronously

Access from many platforms:
•  Web app from Google App Engine
•  Apps Script (e.g. from Google Spreadsheet)
•  Desktop app

Page 29: Introduction to Google Cloud platform technologies

Prediction API v1.1 - new features

•  Updated Syntax

•  Multi-category prediction
o  Tag entry with multiple labels

•  Continuous Output
o  Finer grained prediction rankings based on multiple labels

•  Mixed Inputs
o  Both numeric and text inputs are now supported

Can combine continuous output with mixed inputs

Page 30: Introduction to Google Cloud platform technologies

Demo

Using the Prediction API to Predict a Cuisine

Page 31: Introduction to Google Cloud platform technologies

Google BigQuery
Interactive analysis of large datasets in Google's cloud

Page 32: Introduction to Google Cloud platform technologies

Introducing Google BigQuery

•  Google's large data ad hoc analysis technology
o  Analyze massive amounts of data in seconds

•  Simple SQL-like query language

•  Flexible access
o  REST APIs, JSON-RPC, Google Apps Script

Page 33: Introduction to Google Cloud platform technologies

Why BigQuery?

Working with large data is a challenge

Page 34: Introduction to Google Cloud platform technologies

Many Use Cases ...

Spam

Trends Detection

Web Dashboards

Network Optimization

Interactive Tools

Page 35: Introduction to Google Cloud platform technologies

Key Capabilities of BigQuery

•  Scalable: Billions of rows
•  Fast: Response in seconds
•  Simple: Queries in SQL

•  Web Service
o  REST
o  JSON-RPC
o  Google Apps Script

Page 36: Introduction to Google Cloud platform technologies

Using BigQuery

Another simple three-step process...

1. Upload: Upload your raw data to Google Storage

2. Import: Import raw data into BigQuery table

3. Query: Perform SQL queries on table

Page 37: Introduction to Google Cloud platform technologies

Writing Queries

Compact subset of SQL
o  SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...;

Common functions
o  Math, String, Time, ...

Statistical approximations
o  TOP
o  COUNT DISTINCT

Page 38: Introduction to Google Cloud platform technologies

BigQuery via REST

GET /bigquery/v1/tables/{table name}

GET /bigquery/v1/query?q={query}

Sample JSON Reply:

{ "results": {
    "fields": [ {"id":"COUNT(*)","type":"uint64"}, ... ],
    "rows": [
      {"f":[{"v":"2949"}, ...]},
      {"f":[{"v":"5387"}, ...]},
      ... ] } }

Also supports JSON-RPC
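A sketch of the two halves of a REST query, under stated assumptions: `query_path` URL-encodes a SQL string into the `q` parameter of the path shown above, and `row_values` flattens the `f`/`v` cells of a reply into plain lists. Both helper names, the table name `mytable`, and the trimmed `SAMPLE_REPLY` are illustrative; no request is sent.

```python
import json
from urllib.parse import urlencode


def query_path(sql):
    """Return the request path for a query, with the SQL URL-encoded."""
    return "/bigquery/v1/query?" + urlencode({"q": sql})


def row_values(reply_body):
    """Extract each row's cell values from the 'f'/'v' reply structure."""
    reply = json.loads(reply_body)
    return [[cell["v"] for cell in row["f"]]
            for row in reply["results"]["rows"]]


# Trimmed-down reply modeled on the slide's sample (elided parts left out).
SAMPLE_REPLY = json.dumps({
    "results": {
        "fields": [{"id": "COUNT(*)", "type": "uint64"}],
        "rows": [{"f": [{"v": "2949"}]}, {"f": [{"v": "5387"}]}],
    }
})

if __name__ == "__main__":
    print(query_path("SELECT COUNT(*) FROM mytable"))  # hypothetical table name
    print(row_values(SAMPLE_REPLY))
```

Values arrive as strings in the sample even for a `uint64` field, so a caller would convert them using the type listed under `fields`.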

Page 39: Introduction to Google Cloud platform technologies

Security and Privacy

Standard Google Authentication
•  Client Login
•  OAuth
•  AuthSub

HTTPS support
•  protects your credentials
•  protects your data

Relies on Google Storage to manage access

Page 40: Introduction to Google Cloud platform technologies

Large Data Analysis Example

Wikimedia Revision history data from: http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z

Wikimedia Revision History

Page 41: Introduction to Google Cloud platform technologies

Using BigQuery Shell

Python DB API 2.0 + B. Clapper's sqlcmd
http://www.clapper.org/software/python/sqlcmd/

Page 42: Introduction to Google Cloud platform technologies

BigQuery from a Spreadsheet

Page 43: Introduction to Google Cloud platform technologies

BigQuery from a Spreadsheet

Page 44: Introduction to Google Cloud platform technologies

Recap

•  Google Storage
o  High speed data storage on Google Cloud

•  Prediction API
o  Google's machine learning technology able to predict outcomes based on sample data

•  BigQuery
o  Interactive analysis of very large data sets
o  Simple SQL query language access

Page 45: Introduction to Google Cloud platform technologies

Further info available at:

•  Google Storage for Developers
o  http://code.google.com/apis/storage

•  Prediction API
o  http://code.google.com/apis/predict

•  BigQuery
o  http://code.google.com/apis/bigquery

Page 46: Introduction to Google Cloud platform technologies

Demo

Page 47: Introduction to Google Cloud platform technologies

Q&A

Page 48: Introduction to Google Cloud platform technologies

Thank You!

Chris Schalk
Google Developer Advocate

http://twitter.com/cschalk