Analyzing petabytes of smart meter data using Cloud Bigtable, Cloud Dataflow, and BigQuery

Edwin Poot & Erik van Wijk, Energyworx

Max Luebbe, Google

Posted on 20-Jan-2017


ENERGY TRANSITION IN PROGRESS

● rise of renewable energy sources

● regulation & market demands

● competition & increased costs

● intelligent devices in the home or along the utilities infrastructure (“Internet of Things”)

● two-way flow of information instead of one-way

● increase in consumption

Top 5 industry challenges

1. increasing density brings increasing data quality problems

2. strict regulations for safeguarding user privacy

3. redistribution of economic power and energy demand

4. rising competition between distributed and central

5. innovation outpaces regulation

www.energyworx.com

Map labels (figure per region):

● China: 435 M

● USA: 132 M

● Japan: 58.7 M

● France: 35 M

● UK: 53 M

● NL: 8 M

● Italy: 32 M

● Germany: 50 M

● Ontario: 4.7 M

● British Columbia: 1.2 M

● Quebec: 3.8 M

conventional utility systems cannot cope with this diversity of data and its endless stream of all types, shapes and sizes

smart meters

smart grid equipment

sensors

home automation

multichannel customer interactions

consumers’ usage behavior

weather

social

spatial

creating a single, centralized view of data, accessible to many and serving many use cases, is the key to success

“We enable the energy evolution by uncovering and monetizing the hidden value of your data!”

ingest, process, analyze & learn


● Enabling data-driven business models for the Energy & Utility industry since 2012

● Offices in The Netherlands and in the United States

● Delivering a revolutionary data management & intelligence cloud service disrupting the global Energy & Utilities market

● Pushing out established vendors using pure play SaaS

● Creating actionable information, sparking new business concepts and models

● Crunching data without being limited by scale, speed and obsolete pricing models

Energyworx and the energy value chain

Value chain: generation, transmission, trading, distribution, supply

Solutions shown along the chain: Meter Data Management, Renewable Energy Management, Social Energy, Consumer Engagement, imbalances, settlements, Energy insights for wholesale connections

MARKETS & SOLUTIONS

ENERGY INTELLIGENCE

ENERGY PROSUMERS & RETAILERS

ENERGY SYSTEM OPERATORS

Demand Response (price), Energy Insights, Demand Response (load), Grid Insights, Renewables, Engagement, Gamification, Benchmarking, Balancing, Congestion, Optimization, Anomalies

ENERGY DATA MANAGEMENT

Meter Data Management, Energy Data Hub

OUR ADVANTAGES

● Always supporting the latest IoT products and/or equipment

● Protocol-agnostic data ingestion and limitless computation capacity

● Cloud machine learning to support new business concepts and models

● Pay-as-you-grow SaaS model, so no large upfront investments

Our platform


PLATFORM EVOLUTION HIGHLIGHTS

Timeline: 2012, 2013, 2014, 2015, 2016

Phases: Data ingestion & management, Insights & analysis, Intelligence & IoT control

- batched data
- temporal aggregations
- VEE (validation, estimation, and editing)
- utility connectivity
- API

- multi-tenancy
- permissions
- custom querying
- grouping
- tag properties

- datalabs (EDA)
- Machine learning
- CloudML
- (A)DR

- streaming data
- pseudonymisation
- tagging
- analytics
- dynamic profiling
- PayPerUse model

- IoT devices
- many new adapters
- performance
- web console
- Sheets addon

DELIVER A DATA MANAGEMENT & ANALYTICS SERVICE FOR ENERGY & UTILITY COMPANIES

PUBLIC & PRIVATE CLOUD

Big Data Challenges at Google


Google's mission to "organize the world’s information" presents new challenges.


Big Data technologies invented at Google

Timeline axis: 2002, 2004, 2006, 2008, 2010, 2012, 2013

Technologies shown: GFS, MapReduce, Bigtable, Colossus, Dremel, Flume, Millwheel

How do we … ?

… build a 100TB+ filesystem?

Need: Google was building enormous data sets, and needed an abstracted way to store and access at scale.

Solution: GFS (replaced by higher-scale Colossus in 2010)

Google Cloud Storage

… build a petabyte database?

Need: Massive data index files took weeks to rebuild. We needed random read/write access.

Solution: Bigtable (internal service launched 2006)

Google Cloud Bigtable

… query a trillion rows in seconds?

Need: Ad hoc queries over massive quantities of data, in just seconds.

Solution: Dremel

Google BigQuery

… build data-processing at Google scale?

Need: Process petabytes of static and streaming data, quickly.

Solution: MapReduce, Flume, and Millwheel

Google Cloud Dataflow

Imagine what one can build...

... when scale is a solved problem.

Google Cloud Platform is the same infrastructure

Cloud Storage, BigQuery, Cloud Dataflow, Cloud Bigtable

Cloud Bigtable is the same service Google uses

Cloud Bigtable

Bigtable Service


What is Cloud Bigtable?

NoSQL database for large datasets / large throughput

Supports sequential scans

Auto-adjusts to access patterns
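Support for sequential scans matters for meter data because, when rows are kept sorted by key, all readings of one meter over a time range sit next to each other. The following is a minimal in-memory sketch of that idea using only the Python standard library. The `SortedTable` class, the `row_key` helper, and the `meter_id#timestamp` key layout are hypothetical illustrations; they are not the Cloud Bigtable client API and not necessarily the schema Energyworx uses.

```python
import bisect


class SortedTable:
    """Toy stand-in for a key-sorted table: rows are kept ordered by row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start_key, end_key):
        """Yield (key, value) for start_key <= key < end_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def row_key(meter_id, timestamp):
    # Hypothetical layout: meter id first so one meter's readings sit together,
    # then a zero-padded epoch timestamp so string order matches time order.
    return f"{meter_id}#{timestamp:010d}"


table = SortedTable()
table.put(row_key("meter-b", 1000), 0.7)
table.put(row_key("meter-a", 3000), 1.4)
table.put(row_key("meter-a", 1000), 1.2)
table.put(row_key("meter-a", 2000), 1.3)

# One contiguous range scan returns meter-a's readings for 1000 <= t < 3000.
readings = list(table.scan(row_key("meter-a", 1000), row_key("meter-a", 3000)))
```

The design choice illustrated here is that the key, not a secondary index, determines which queries are cheap: a key that starts with the meter id favors per-meter range reads.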

How does Cloud Bigtable work?

Diagram: six clients (Clients layer) talk to three Bigtable nodes (Processing layer), which sit on top of the Colossus filesystem (Storage layer).

Cloud Bigtable learns access patterns...

Diagram: six clients, three nodes (Processing), and the filesystem (Storage) holding data A, B, C, D, E.

… and rebalances data accordingly

Diagram: the same layout; data labels A, B, C, D, E, with B and C shown a second time.
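One rough way to picture "learns access patterns and rebalances" is a greedy assignment of data ranges to nodes based on observed load. The sketch below is only an illustration of that general idea: the `rebalance` function, the load figures, and the greedy heuristic are assumptions made for the example, and the slides do not describe the algorithm Bigtable actually uses.

```python
def rebalance(tablet_load, node_count):
    """Greedily assign tablets to nodes, busiest tablet first, so that
    per-node load ends up as even as this simple heuristic allows."""
    nodes = [{"tablets": [], "load": 0} for _ in range(node_count)]
    ordered = sorted(tablet_load.items(), key=lambda kv: (-kv[1], kv[0]))
    for tablet, load in ordered:
        target = min(nodes, key=lambda n: n["load"])  # least-loaded node so far
        target["tablets"].append(tablet)
        target["load"] += load
    return nodes


# Hypothetical observed requests per second for data ranges A..E; B and C are hot.
observed = {"A": 100, "B": 900, "C": 800, "D": 100, "E": 100}
layout = rebalance(observed, node_count=3)
loads = [node["load"] for node in layout]
```

With these made-up numbers the two hot ranges each end up alone on a node, while the three quiet ranges share the third node.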

Throughput can be controlled by node count

Chart 1: QPS (axis up to 80,000) versus Bigtable nodes (axis up to 8)

Chart 2: QPS (axis up to 400,000) versus Bigtable nodes (axis up to 40)

Chart 3: QPS (axis up to 4,000,000) versus Bigtable nodes (axis up to 400)
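The top axis values of the three throughput charts (80,000 QPS at 8 nodes, 400,000 at 40, 4,000,000 at 400) suggest roughly linear scaling at about 10,000 QPS per node. That per-node figure is read off the chart axes here, not stated in the slides, and real throughput depends on the workload. Under that assumption, a back-of-the-envelope sizing helper could look like this (function names are hypothetical):

```python
import math

# Assumption: ~10,000 QPS per node, inferred from the chart axes above.
QPS_PER_NODE = 10_000


def estimated_qps(nodes, qps_per_node=QPS_PER_NODE):
    """Estimated throughput for a cluster, assuming linear scaling."""
    return nodes * qps_per_node


def nodes_needed(target_qps, qps_per_node=QPS_PER_NODE):
    """Smallest node count (at least 1) that covers the target throughput."""
    return max(1, math.ceil(target_qps / qps_per_node))


small = estimated_qps(8)
sizing = nodes_needed(250_000)
```

Treat the result as a starting point for load testing, not as a capacity guarantee.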

Years of engineering to...

Teach Bigtable to configure itself

Isolate performance from “noisy neighbors”

React automatically to new patterns, splitting and balancing

Cloud Bigtable


Google has had an internal cloud for over a decade

The same engineering that has made our internal services better makes our Cloud better:

● Simpler control planes

● Multi-tenancy

● Adapts to large, new patterns

Why we chose Google

Why did we choose Google?

● Fastest with consistent performance

● Competitive and transparent pricing

● Autoscale to millions of users (and back)

● Unlimited flexible storage and caching

● Big Data & Machine Learning capabilities

● Development SDK & tools

● 24/7 access to expert support resources

5 things we’ve learned along the way

1. SKILLS, KNOWLEDGE & TRAINING REQUIRED: understand all PaaS possibilities and components to prevent reinventing what already exists and to speed up implementation & migration

2. IMPLEMENTATION TIME: shorter release cycles require smaller feature sets per release; adapt your software development & release management method

3. CODE ABSTRACTION USING APIs: to be cloud agnostic you need code abstraction layers per PaaS service you use

4. PAAS SANDBOX: design and modify your software architecture to fit the PaaS sandbox

5. IMPACT ON BUSINESS MODEL: adapt your business model to the PaaS cost model
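The lesson about code abstraction layers can be sketched as an interface that application code depends on, with one implementation per backing service. The example below is a minimal illustration under assumptions: `TimeseriesStore`, `InMemoryStore`, and `daily_total` are hypothetical names, and the in-memory class merely stands in for an adapter that would wrap a real cloud service.

```python
from abc import ABC, abstractmethod


class TimeseriesStore(ABC):
    """Hypothetical interface the application codes against."""

    @abstractmethod
    def write(self, series_id, timestamp, value):
        ...

    @abstractmethod
    def read(self, series_id, start, end):
        ...


class InMemoryStore(TimeseriesStore):
    """Stand-in backend; a cloud-specific adapter would implement the same methods."""

    def __init__(self):
        self._data = {}

    def write(self, series_id, timestamp, value):
        self._data.setdefault(series_id, {})[timestamp] = value

    def read(self, series_id, start, end):
        points = self._data.get(series_id, {})
        return sorted((t, v) for t, v in points.items() if start <= t < end)


def daily_total(store, series_id, day_start, day_end):
    """Application logic depends only on the interface, not on a vendor SDK."""
    return sum(value for _, value in store.read(series_id, day_start, day_end))


store = InMemoryStore()
store.write("m1", 10, 1.5)
store.write("m1", 20, 2.5)
store.write("m1", 99, 9.0)
total = daily_total(store, "m1", 0, 50)
```

Swapping providers then means writing another `TimeseriesStore` subclass, while `daily_total` and similar logic stay unchanged.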

Our service architecture

Stages: INGEST, PROCESS, ANALYZE, STORE

Services shown: App Engine, Cloud PubSub, App Engine, Cloud Storage, Datastore, Bigtable, BigQuery, Cloud SQL, Dataflow, Dataproc, CloudML, Datalab, BigQuery

● INGEST: API, Events, Devices

● PROCESS: Validate, Aggregate, Calculate

● STORE: Timeseries, Metadata, Tags

● ANALYZE: Insights, Predict, Decide
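The processing steps named in the architecture (validate, aggregate, calculate) can be illustrated with a small standard-library sketch. Everything concrete here is an assumption for the sake of the example: the reading format, the validation rule (non-negative numeric values), the hourly bucket size, and the price are hypothetical and are not taken from the Energyworx platform.

```python
from collections import defaultdict


def validate(readings):
    """Keep readings with a non-negative numeric value; collect the rest."""
    good, rejected = [], []
    for r in readings:
        ok = isinstance(r["kwh"], (int, float)) and r["kwh"] >= 0
        (good if ok else rejected).append(r)
    return good, rejected


def aggregate_hourly(readings):
    """Sum interval readings into (meter, hour) buckets."""
    totals = defaultdict(float)
    for r in readings:
        hour = r["ts"] - r["ts"] % 3600  # floor epoch seconds to the hour
        totals[(r["meter"], hour)] += r["kwh"]
    return dict(totals)


def calculate_cost(hourly, price_per_kwh):
    """Derive a value per bucket from the aggregated consumption."""
    return {key: kwh * price_per_kwh for key, kwh in hourly.items()}


raw = [
    {"meter": "m1", "ts": 0, "kwh": 0.25},
    {"meter": "m1", "ts": 900, "kwh": 0.5},
    {"meter": "m1", "ts": 1800, "kwh": -1.0},   # rejected: negative
    {"meter": "m1", "ts": 3600, "kwh": 0.75},
    {"meter": "m1", "ts": 4500, "kwh": None},   # rejected: missing value
]
good, rejected = validate(raw)
hourly = aggregate_hourly(good)
cost = calculate_cost(hourly, price_per_kwh=0.5)
```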

Data Ingestion Process

Components shown: IoT Equipment, Cloud PubSub, Dataflow, Bigtable, BigQuery
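The ingestion flow can be mimicked locally to show its shape: devices publish messages to a queue, a processing step drains it, and each record is written to both a keyed store and an analytics table. This is a simulation using only the standard library. The `deque`, `dict`, and `list` below are stand-ins for Cloud Pub/Sub, Bigtable, and BigQuery respectively, and the message format and function names are hypothetical; no Google Cloud client library is used.

```python
import json
from collections import deque

queue = deque()       # stand-in for a Pub/Sub topic
keyed_store = {}      # stand-in for Bigtable: row key -> value, for point lookups
analytics_rows = []   # stand-in for BigQuery: flat rows, for ad hoc queries


def publish(device_id, ts, kwh):
    """Device side: serialize a reading and put it on the queue."""
    queue.append(json.dumps({"device": device_id, "ts": ts, "kwh": kwh}))


def run_pipeline():
    """Drain the queue, parse each message, and write it to both sinks."""
    processed = 0
    while queue:
        msg = json.loads(queue.popleft())
        keyed_store[f'{msg["device"]}#{msg["ts"]:010d}'] = msg["kwh"]
        analytics_rows.append((msg["device"], msg["ts"], msg["kwh"]))
        processed += 1
    return processed


publish("meter-a", 1000, 1.2)
publish("meter-b", 1000, 0.7)
count = run_pipeline()
```

Writing each record to two sinks reflects the split in the slide: one store suited to keyed time series access, one suited to ad hoc analysis.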

Use cases

“Creating actionable insights - sparking new business concepts and models. Crunching data without being limited by scale, speed and obsolete pricing models.”

Uncovering hidden value from data

Exploratory Data Analysis with Energyworx

Uncover hidden value from your data!

• Classification
• Clustering
• Regression
• Anomaly detection
• Prediction/forecasting
• Motif discovery
• Association rules

Features:
- part of Energyworx SaaS
- autoscaling with demand
- notebook development environment
- private & public models
- Energyworx shared models

Demo: Clustering time series data from Smart Meters
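The transcript does not include the demo itself. As a rough idea of what clustering smart meter time series can look like, the sketch below runs plain k-means over tiny daily load profiles using only the standard library. The algorithm choice, the four-value profiles (night, morning, afternoon, evening, in kWh), and the fixed initial centroids are all assumptions made for illustration, not the method shown in the talk.

```python
def distance_sq(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles):
    """Element-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def kmeans(profiles, initial_centroids, iterations=10):
    """Plain k-means: assign each profile to its nearest centroid, then
    move each centroid to the mean of its members, and repeat."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: distance_sq(p, centroids[k]))
            for p in profiles
        ]
        for k in range(len(centroids)):
            members = [p for p, label in zip(profiles, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean_profile(members)
    return labels, centroids


# Hypothetical daily profiles: [night, morning, afternoon, evening] in kWh.
profiles = [
    [0.2, 0.5, 0.4, 1.5],  # evening peak
    [0.3, 0.6, 0.5, 1.7],  # evening peak
    [0.2, 1.4, 1.6, 0.4],  # daytime peak
    [0.1, 1.2, 1.8, 0.3],  # daytime peak
]
labels, centroids = kmeans(profiles, initial_centroids=[profiles[0], profiles[2]])
```

Fixed initial centroids keep the example deterministic; a real analysis would typically use many more intervals per day and a more careful initialization.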


Q & A


Thank you!