Energy Issues in Data Analytics

Domenico Talia, Carmela Comito
Università della Calabria & CNR-ICAR, Italy
talia@deis.unical.it


Motivations for Taking Care of Data

Data is everywhere (big, complex, real-time, unstructured).

Putting data at the center of research work on energy issues may bring some benefits. (Today the focus is on algorithms.)

Cost metrics for data management techniques (communication, storage, access, query, analysis) will help professionals and users save energy in data-intensive apps.

Energy-scalable data management is important for sustainable data science.


Data Availability or Data Deluge?

• Every life process today is data intensive.

• The information stored in digital data archives is enormous and its size is still growing very rapidly.


Data Availability or Data Deluge?

• Some decades ago the main problem was the shortage of information; now the challenge is

  • the very large volume of information to deal with, and

  • the associated complexity of processing it and extracting significant and useful parts or summaries.


Complex Big Problems

• Bigger and more complex problems must be solved by using large-scale distributed computing systems.

• Data sources are larger and larger and ubiquitous (Web, sensor networks, mobile devices, telescopes, …).


… and Big Data

• Even where accessible, much data in many fields cannot be read by humans,

so

• The huge amount of data available today requires smart data analysis techniques to help people deal with it,

and

• Scalable algorithms, techniques, and systems are needed (time and energy scalability).


Data: From Storing to Analysis

• Storing data is not the only main problem.

• A key issue is analysing, mining, and processing data to make it useful.

Source: The Economist


Towards Models for Energy-aware Data Management

The main focus today is on energy-aware algorithms, tasks, and applications.

The other side of the coin is data and the costs of operating on it.

Abstract energy-cost models for exchanging, accessing, and transforming data are primary elements of energy-aware data management at large scale.

They are useful for sustainable data science.
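One way to read "abstract energy-cost model" is as a function from the amount of data touched by each kind of operation to an energy estimate. The sketch below is only a hypothetical illustration and not the authors' model: the class name `LinearEnergyModel`, the linear form, and every coefficient are placeholder assumptions that would have to be replaced by values measured on a real device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearEnergyModel:
    """Hypothetical model: energy grows linearly with the bytes handled per operation type."""

    joules_per_byte_exchanged: float
    joules_per_byte_accessed: float
    joules_per_byte_transformed: float

    def estimate(self, exchanged: int, accessed: int, transformed: int) -> float:
        """Return the estimated energy in joules for the given byte counts."""
        return (
            self.joules_per_byte_exchanged * exchanged
            + self.joules_per_byte_accessed * accessed
            + self.joules_per_byte_transformed * transformed
        )


# Placeholder coefficients chosen only to make the example run; they are not measurements.
model = LinearEnergyModel(
    joules_per_byte_exchanged=2e-6,
    joules_per_byte_accessed=1e-7,
    joules_per_byte_transformed=5e-7,
)
estimate = model.estimate(exchanged=1_000_000, accessed=4_000_000, transformed=4_000_000)
print(f"estimated energy: {estimate:.2f} J")
```

Separating the three operation types keeps the data-side costs visible on their own, which is the point the slide makes about looking beyond algorithms.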


An Example: Energy-aware Mining of Data

We evaluated the energy cost of analyzing data by using some well-known data mining techniques on mobile devices.

Our main interest was in how the energy consumed by the same technique changes as the dimensions of the data change.

Tests with different

• Data set dimensions,

• Attribute number,

• Class number.


Data Mining Techniques

Energy characterization of data mining techniques running on mobile devices

• k-means (data clustering)

• J48 (data classification)

• Apriori (association rules)

Common performance parameters

• Number of instances (data set size)

• Number of attributes

Algorithm-specific performance parameters

• k-means: number of clusters

• J48: decision tree size

• Apriori: number of rules, minimum support, and minimum confidence
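For k-means, the parameters listed above map directly onto the amount of work done: each iteration of the standard Lloyd procedure compares every instance with every centroid across all attributes. The sketch below is a pure-Python toy, not the implementation used in the experiments. It counts coordinate-level distance operations as a rough proxy for work, and the function name `kmeans_work`, the fixed iteration count, and the random data are assumptions made for illustration. An operation count is not an energy measurement.

```python
import random


def kmeans_work(points, k, iterations, seed=0):
    """Run Lloyd's k-means for a fixed number of iterations.

    Returns the final centroids and the number of coordinate-level
    distance operations performed during the assignment steps.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dims = len(points[0])
    operations = 0
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = []
            for c in centroids:
                distances.append(sum((a - b) ** 2 for a, b in zip(p, c)))
                operations += dims  # one squared difference per attribute
            clusters[distances.index(min(distances))].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, operations


data_rng = random.Random(1)
data = [[data_rng.random() for _ in range(4)] for _ in range(200)]
_, ops = kmeans_work(data, k=3, iterations=5)
print(ops)  # instances * clusters * attributes * iterations
```

Doubling the number of instances, attributes, or clusters doubles this count, which is why these three parameters are natural axes for an energy characterization.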


k-means (1)

Increasing the number of instances, with different numbers of produced clusters


k-means (2)

Increasing the number of attributes, with different numbers of produced clusters


Apriori (1)

Increasing the number of instances, with different numbers of attributes


Apriori (2)

Increasing the data set size, with different numbers of rules


Apriori (3)

Increasing the data set size, with different minimum confidence values


J48

Increasing the number of instances, with different numbers of attributes


Results on different devices

Results obtained with different smartphones:

• Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM

• HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM


Results on different devices

Results obtained with different smartphones:

• Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM

• HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM

• Samsung Galaxy ACE: 800 MHz Qualcomm processor and 512 MB RAM


Concluding Remarks

Data-intensive applications call for energy cost models based on data characteristics.

This should be done for sensors, smartphones, HPC servers, and clouds, and in general for large-scale computing systems.

Sustainable data center services and applications may benefit from these models.

Preliminary experiments provide useful data.
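As a hedged illustration of what a cost model "based on data characteristics" could look like, the sketch below fits a power law, energy = a · size^b, to the five K-means measurements reported in the results table at the end of these slides. The power-law form, the function name `fit_power_law`, and the use of a log-log least-squares fit are assumptions made for this example. The slides do not propose a specific model, and five points are far too few for a reliable fit.

```python
import math

# K-means measurements copied from the results table:
# (data set size in MB, energy consumption in J)
measurements = [
    (0.1, 89.91),
    (0.2, 107.892),
    (0.4, 251.748),
    (0.8, 251.748),
    (1.6, 575.424),
]


def fit_power_law(points):
    """Least-squares fit of energy = a * size**b on log-log axes; returns (a, b)."""
    xs = [math.log(size) for size, _ in points]
    ys = [math.log(energy) for _, energy in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    a = math.exp(y_mean - b * x_mean)
    return a, b


scale, exponent = fit_power_law(measurements)
print(f"energy ~ {scale:.1f} * size^{exponent:.2f}  (J, size in MB)")
```

An exponent close to 1 would indicate roughly linear growth of energy with data set size, while a value below 1 would indicate sublinear growth over the measured range.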


Data Sets

Census (http://archive.ics.uci.edu/ml/datasets/Census+Income)

• Used with K-means

• Data set size: 14 MB

• Number of instances: 244348

• Number of attributes: 11

Census_disc (http://archive.ics.uci.edu/ml/datasets/Census+Income)

• Used with Apriori

• Data set size: 19 MB

• Number of instances: 333011

• Number of attributes: 11

Covertype (http://archive.ics.uci.edu/ml/datasets/Covertype)

• Used with J48

• Data set size: 14.5 MB

• Number of instances: 114556

• Number of attributes: 55


Size     RAM Memory (MByte)  Virtual Memory (MByte)  CPU (%)  Battery Charge Depletion (mAh)  Energy Consumption (J)  Time (sec)

Method: Association Rules, Rule Induction. Algorithm: Apriori. Data set: CENSUS_DISC.arff

0.1 MB   15.86               95.19                   96.92    0                               0                       6
0.2 MB   16.97               105.36                  98.03    0                               0                       12
0.4 MB   18.06               104.95                  98.24    0                               0                       26
0.8 MB   19.87               102.75                  98.13    2.7                             35.964                  73
1.6 MB   23.32               103.99                  96.87    13.5                            179.82                  300
3.2 MB   26.92               100.01                  95.44    23.3                            310.356                 3960
6.4 MB   ---                 ---                     ---      ---                             ---                     ---

Method: Classification Trees. Algorithm: J48. Data set: COVERTYPE.arff

0.1 MB   19.47               104.94                  96.23    13.4                            178.488                 300
0.2 MB   20.15               104.92                  98.21    29.8                            396.936                 540
0.4 MB   23.87               105.6                   97.43    59.4                            791.208                 2040
0.8 MB   27.68               103.87                  97.36    194.64                          2592.6048               8160
1.6 MB   ---                 ---                     ---      ---                             ---                     ---
3.2 MB   ---                 ---                     ---      ---                             ---                     ---
6.4 MB   ---                 ---                     ---      ---                             ---                     ---

Method: Clustering, Instance-based/Lazy Learning. Algorithm: K-Means. Data set: CENSUS.arff

0.1 MB   16.73               96.56                   98.03    6.75                            89.91                   55
0.2 MB   17.95               102.05                  97.65    8.1                             107.892                 150
0.4 MB   19.72               102.16                  97.02    18.9                            251.748                 300
0.8 MB   23.08               101.86                  97.97    18.9                            251.748                 600
1.6 MB   26.4                95.96                   97.82    43.2                            575.424                 1320
3.2 MB   ---                 ---                     ---      ---                             ---                     ---
6.4 MB   ---                 ---                     ---      ---                             ---                     ---
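The energy column in the table appears to be derived from the battery depletion column: in every row with a nonzero reading, the energy in joules equals the depletion in mAh multiplied by 13.32, which is 3.6 × 3.7. That is consistent with converting charge to energy at a nominal battery voltage of 3.7 V. The voltage is inferred from the numbers and is not stated in the slides, so treat it as an assumption. The sketch below checks the conversion against a few rows of the table.

```python
NOMINAL_VOLTAGE = 3.7  # volts; inferred from the table values, not stated in the slides


def mah_to_joules(charge_mah, voltage=NOMINAL_VOLTAGE):
    """Convert battery charge in mAh to energy in joules at a constant voltage."""
    # 1 mAh = 3.6 coulombs, and energy (J) = charge (C) * voltage (V)
    return charge_mah * 3.6 * voltage


# (battery charge depletion in mAh, energy consumption in J) pairs copied from the table
rows = [
    (2.7, 35.964),       # Apriori, 0.8 MB
    (13.5, 179.82),      # Apriori, 1.6 MB
    (23.3, 310.356),     # Apriori, 3.2 MB
    (13.4, 178.488),     # J48, 0.1 MB
    (194.64, 2592.6048), # J48, 0.8 MB
    (43.2, 575.424),     # K-Means, 1.6 MB
]

max_error = max(abs(mah_to_joules(mah) - joules) for mah, joules in rows)
print(f"largest difference from the table: {max_error:.6f} J")
```

Dividing an energy value by the corresponding time gives the average power draw of a run, for example 575.424 J / 1320 s ≈ 0.44 W for K-means on the 1.6 MB data set.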