Data Analytics in the domain of Smart Cities and e-Government

Jose Aguilar

CEMISID, Dpto. de Computación (Department of Computing), Facultad de Ingeniería (Faculty of Engineering)



Outline

• Smart cities and ICT
• e-Government
• Introduction to Data Analytics
• Neighbor concepts:
– Business intelligence
– Big data
– Mining problems
• Case study


Goal

This tutorial analyzes how data analytics is transforming cities and government. We review applications of data analytics that support smart cities, and we discuss decision-making processes based on data analytics, through which citizens, policy makers and businesses can work together to manage the life of the city. We also discuss the transformation of the public service provision model and the use of data analytics to enable new forms of governance.


Over the last twenty years, the most innovative industries, such as aerospace and automotive, began to introduce varying degrees of automation into their activities and products.

The same trend is now reaching the everyday environments of human beings, with a high level of spatiotemporal integration of technologies into their community settings (home, school, etc.).


We live in a society in constant contact with "hard technologies": we talk to others by phone, use dishwashers at home, watch TV, work in offices equipped with computers, drive vehicles, etc.

A long list of everyday objects has been incorporated into our lives almost without our realizing it…


Motivation

Smart cities can benefit from information and communications technology (ICT); however, they need sophisticated mechanisms and appropriate software technologies to collect, store, analyze and visualize data from the city environment and its citizens.

• Urban environments are among the main data generators worldwide.
• In the context of smart cities, there is an abundance of data that can be mined by applying data analytics techniques.


Motivation

• There are different data sources:
– traditional information held by public institutions (data about traffic, health care, etc.);
– information generated by citizens (using their smartphones, etc.);
– sensor systems deployed across the city (camera networks, etc.);
– the Internet of Things, etc.
• Data is stored and analyzed to define the services that the world needs.

Data is the new gold: it has become one of the most precious treasures because it can generate information and knowledge.


Why?

Urban mobility issues:
• Environment and CO2 emissions
• Urban congestion: in 2008, half of the world's population was living in cities
• 2020 climate and energy package
• Accidents and safety
• Freight distribution
• Financial issues
• Quality of life


Essential systems on which a city is based:
• Water system
• Infrastructure systems (health, education, etc.)
• Productive system of entrepreneurship
• Transportation system
• Population
• Energy system
• Communication system


Transformation is possible through the harmonious integration of «infrastructure», «citizen» and «administration».


Forrester Research: a smart city is a city that uses information and communications technologies to make the critical infrastructure components and services of a city (administration, education, healthcare, public safety, real estate, transportation, and utilities) more aware, interactive, and efficient.


Definitions of Smart City

SMART CITY GOALS
• Increase the quality of life of its citizens
- Valuing usage above ownership
- Focusing on non-monetary values
- Having wider opportunities for work and study
- Overcoming restrictions of time and place
- Being both a consumer and a producer
• Achieve sustainable development
- Managing the lifecycles of cities
- Improving economic performance over the entire lifecycle
- Enhancing city competitiveness

ITU-T Focus Group on Smart Sustainable Cities: "A smart sustainable city is an innovative city that uses information and communication technologies (ICTs) and other means to improve quality of life, efficiency of urban operation and services, and competitiveness, while ensuring that it meets the needs of present and future generations with respect to economic, social and environmental aspects."

Boyd Cohen: "Smart cities use information and communication technologies (ICT) to be more intelligent and efficient in the use of resources, resulting in cost and energy savings, improved service delivery and quality of life, and reduced environmental footprint, all supporting innovation and the low-carbon economy."


RELATION BETWEEN OTHER CONCEPTS AND THE SMART CITY MODEL


There are many global initiatives aimed at enhancing the capacity of cities to respond to the demands of the future: smart cities, cities of knowledge, ...

One example is the IEEE initiative on "Smart Cities".


"Smart Cities" include:

• Smart Energy (renewable generation and storage, energy efficiency in buildings, intelligent lighting, smart grids, remote irrigation control, etc.)
• Smart Waste Management (waste recycling, residual management, recovery of organic waste)
• Smart Living
• Smart Building & Home
• Smart Transportation/Mobility
• Smart Education (e-Education)
• Smart Governance (e-governance)
• Smart Medical Facility (e-Medical)
• Smart Communications
• Smart Economy (innovation centre, job-search resource centres, e-commerce)
• Smart Environment (environmental information and alerts, containers with sensors, monitoring of distribution networks)
• Smart People


Smart city domains: Smart Energy, Smart Public Services, Smart Public Safety, Smart Home / Office / Building, Smart Education, Smart Healthcare, Smart Transportation (source: http://es.slideshare.net/giselledellamea/smart-city-ciudades-sostenibles-e-inteligentes).


Specific Initiatives

The solution set, working on a common infrastructure, turns into initiatives that vary by industry:

Smart Health
• Smart Care Management
• Connected Health
• Smart Medicine Supply
• Mobile Health
• Remote Healthcare Management

Smart Public Services
• Smart Citizen Services
• Smart Tax Administration
• Smart Customs, Immigration, Border Management
• Smart Crime Prevention
• Smart Emergency Response
• Smart Financial Management

Smart Building / Smart Transportation
• Energy Optimization
• Asset Management
• Facility Management
• Video Surveillance
• Recycling and Power Generation
• Automatic Fault Detection and Diagnosis
• Supervisory Control
• Audio / Video Distribution Management

Smart Education
• Smart Classroom
• Performance Management
• Asset Management

Smart Governance
• Participation
• Transparency: Open Data, e-municipality
• Public and social services

Smart People
• Digital education
• Creativity


• Energy: generation, networking, efficiency
• Environment
• Buildings and infrastructure: efficiency in buildings, urban planning
• Mobility: logistics, mobility and intermodality, vehicles and alternative fuels
• Government and social services: e-gov, tourist destinations
• Health: management, accessibility
• ICT
• Materials and sensors
• Security


Impact of Transformation

Within 20 years, a smart city of 5 million inhabitants can drive:
• +$ revenue
• +% growth
• +% energy efficiency
• +K new jobs

• Improved city management
• Continuous economic growth
• Enhanced quality of life
• Sustainable urbanization


Different looks of SMART CITY

• Smart Buildings
• Smart Infrastructure
• Smart Water Management
• Cyber-Security and Resilience
• EM System
• Climate Change Adaptation
• Integrated Management
• Open Data


Different looks of SMART CITY

Remote communication services for Education & Healthcare


Energy System

FEMS example:
• Visualization system interconnected with various kinds of production information, such as real-time monitoring and control

Source: Toshiba Group


Energy System

Source: Hitachi Group

Shared use of neighborhood facilities


Ubiquitous City: services

Source: [Lee, Han, Leem and Yigitcanlar]


The SMART CITY standard: Dissecting ISO 37120

1. The long road to zero waste cities
Core indicators: solid waste collection and its recycling.

2. Economic indicators in the new smart city standard
Core indicators: the city's unemployment rate and population living in poverty.

3. Why education may be the most important smart city indicator of all?
Core indicators: students completing primary and secondary education; student/teacher ratio.

4. Does your city's air quality measure up to the new smart city standard?
Core indicators: particulate matter (PM2.5, PM10) concentration and greenhouse gas emissions measured in tonnes per capita.

5. How debt, spending and tax collections add up in the new smart city standard?
Core indicators: debt service ratio.

6. Fire and emergency response indicators: how safe is your city?
Core indicators: number of firefighters, fire-related deaths and natural-disaster-related deaths.


The SMART CITY standard: Dissecting ISO 37120

7. How voting, women and corruption figure in the smart city standard
Core indicators: voter participation in the last municipal election; women as a percentage of total elected to city-level office.

8. How healthy is your city?
Core indicators: average life expectancy, number of in-patient hospital beds, number of physicians, mortality rate.

9. How fun is YOUR city?
Core indicators: none.

10. How safe is your city?
Core indicators: number of police officers and homicides.

11. The homeless challenge cities face
Core indicators: city population living in slums.

12. What the new smart city standard says about energy?
Core indicators: residential electrical use per capita (kWh/year), city population with authorized electrical service, energy consumption of public buildings per year (kWh/m²) and energy derived from renewable sources.
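Many of the core indicators above are ratios of raw city counts to the population or to another base. As a minimal sketch (not taken from ISO 37120 itself), the Python below normalizes hypothetical raw counts into such rates. It assumes a per-100,000-inhabitants scale for counts such as firefighters, police officers and homicides, which is how ISO 37120 commonly expresses them; the exact indicator definitions should be checked against the standard. The class `CityCounts`, its fields and the example figures are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CityCounts:
    """Raw yearly counts for one city (hypothetical field names)."""
    population: int
    firefighters: int
    police_officers: int
    homicides: int
    students: int
    teachers: int
    ghg_tonnes: float  # total greenhouse gas emissions, in tonnes


def per_100k(count: float, population: int) -> float:
    """Normalize a raw count to a rate per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def core_indicators(c: CityCounts) -> Dict[str, float]:
    """Compute a few population-normalized indicators from raw counts."""
    return {
        "firefighters_per_100k": per_100k(c.firefighters, c.population),
        "police_officers_per_100k": per_100k(c.police_officers, c.population),
        "homicides_per_100k": per_100k(c.homicides, c.population),
        "student_teacher_ratio": c.students / c.teachers,
        "ghg_tonnes_per_capita": c.ghg_tonnes / c.population,
    }


if __name__ == "__main__":
    # Illustrative figures only, not data from any real city.
    city = CityCounts(population=500_000, firefighters=400, police_officers=1_500,
                      homicides=25, students=60_000, teachers=3_000,
                      ghg_tonnes=2_000_000.0)
    for name, value in core_indicators(city).items():
        print(f"{name}: {value:.2f}")
```

Normalizing by population is what makes the indicators comparable between cities of different sizes, which is the point of a common standard.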


European SmartCities Project


Benchmark for Smart Cities

Transportation time: maximum travel time of 30 minutes in small and medium-sized cities and 45 minutes in metropolitan areas; high-frequency mass transport within 800 meters (10-15 minutes' walking distance).

Footpath: continuous, unobstructed footpath at least 2 meters wide on both sides of all streets.

Bicycle tracks: dedicated, physically segregated bicycle tracks on all streets with a carriageway wider than 10 meters.

Additional infrastructure: 95% of residences should have retail outlets, parks, primary schools and recreational areas within 400 meters' walking distance.

Water management:
• 100% of households should be connected to the waste water network
• 100% of households are covered by a daily door-step solid waste collection system
• No water-logging incidents in a year

Electricity supply:
• 100% metering of electricity supply
• 100% cost recovery
• 100% of the city has Wi-Fi connectivity and 100 Mbps internet speed

Medical facility: 30-minute emergency response time for patients.

Geospatial Information System (GIS) services: integration of disaster rescue information and map navigation.
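Once a city's measurements are available, a benchmark like this can be checked mechanically. The sketch below encodes a subset of the thresholds listed above (travel time, access to mass transport, footpath width, access to amenities, emergency response time). The `CityMeasurements` fields are hypothetical, and so is the choice to measure transport access as the largest distance from any residence to the nearest high-frequency stop.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CityMeasurements:
    """Measured values for one city (hypothetical field names)."""
    metropolitan: bool
    max_travel_time_min: float
    # Assumed metric: largest distance from any residence to the nearest
    # high-frequency mass transport stop.
    max_distance_to_mass_transport_m: float
    min_footpath_width_m: float
    share_residences_near_amenities: float  # fraction (0..1) within 400 m
    emergency_response_time_min: float


def check_benchmark(m: CityMeasurements) -> Dict[str, bool]:
    """Compare measurements with a subset of the benchmark thresholds."""
    travel_limit = 45 if m.metropolitan else 30  # minutes
    return {
        "travel_time": m.max_travel_time_min <= travel_limit,
        "mass_transport_within_800m": m.max_distance_to_mass_transport_m <= 800,
        "footpath_at_least_2m": m.min_footpath_width_m >= 2.0,
        "amenities_within_400m_for_95pct": m.share_residences_near_amenities >= 0.95,
        "emergency_response_within_30min": m.emergency_response_time_min <= 30,
    }


if __name__ == "__main__":
    # Illustrative measurements only.
    city = CityMeasurements(metropolitan=True, max_travel_time_min=50,
                            max_distance_to_mass_transport_m=600,
                            min_footpath_width_m=1.5,
                            share_residences_near_amenities=0.97,
                            emergency_response_time_min=25)
    for criterion, ok in check_benchmark(city).items():
        print(f"{criterion}: {'pass' if ok else 'fail'}")
```

The check returns one pass/fail result per criterion instead of a single score, so a planner can see which specific targets a city misses.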


The 5 SMARTEST CITIES in EU

No. 1: COPENHAGEN
• Led the Siemens Green City Index for Europe.
• One of the lowest carbon footprints per capita in the world (less than two tons per capita).
• Aims to achieve carbon neutrality by 2025.
• All new buildings to be carbon neutral (green buildings).
• Approximately 40% of all commutes are made by bike.
• The city recently collaborated with MIT to develop a smart bike equipped with sensors that provides real-time information not only to the rider but also to administrators, for open data aggregation on air contamination and traffic congestion.

No. 2: AMSTERDAM
• 67% of all trips are made by cycling or walking.
• The world's first bike-sharing project took place in Amsterdam decades ago.
• At present, 40 smart city projects, ranging from smart parking to the development of home energy storage for integration with a smart grid.

No. 3: VIENNA
• The "Citizen Solar Power Plant" is being developed, with the goal of obtaining 50% of the city's energy from renewable sources by 2030.
• Testing a range of electric mobility solutions, including expanding the charging network from 103 to 440 stations by 2015.
• Residents share vehicles with their neighbors.

No. 4: BARCELONA
• Bike-sharing project with more than 6,000 bikes.
• Uses various sensors, from noise and air contamination to traffic congestion and even waste management.
• Life expectancy in Barcelona is among the highest of any city (approx. 83 years).

No. 5: PARIS
• The city has more than 20,000 bikes for sharing.
• 5% reduction in vehicle congestion in the city.
• The city partnered with Bolloré to create one of the world's first and most expansive EV car-sharing programs.
• Autolib' will soon have 3,000 EVs in its car-sharing fleet.
• Paris' ecosystem was rated 11th best in the world.

Source: a report prepared by Boyd Cohen.


Combination of Technology

The smart city concept is founded on a set of solutions that combine today's standalone technologies: Big Data, Business Intelligence, Data Analytics, M2M, Reporting, Internet of Things, Cloud Computing, UCC, Broadband Mobile, Building Automation, Next Gen Device, Wireless Sensor Networks, IT Security, E-cards and E-government.


A Smart City, as a "system of systems", generates vast amounts of data on energy, environment, transport and socio-economic issues, among others.

• Managing and analyzing such data is a new challenge whose solution can help answer questions in governance, planning, etc., and support decision making.
• Currently, there is a lack of data analytics frameworks for urban decision makers.
• In particular, the field of smart cities based on data analytics is broad, complex and rapidly evolving.


• The complexity of smart city data analytics is due to:
– the requirements of cross-thematic applications (energy, transport, etc.);
– multiple sources and types of data (unstructured, semi-structured or structured);
– data integrity.

• Some questions about a smart city that data analytics can help answer are:
– How do people behave in certain places?
– How can we predict the areas of dense traffic in the future?
– How are certain sites reached and left during a special event?
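For the second question, one very simple approach is to forecast each road segment's next-period traffic from its recent history and flag the segments expected to exceed a density threshold. The sketch below uses a naive moving average. The reading format, the `window` size and the 1,000 vehicles/hour threshold are hypothetical, and a real deployment would more likely use a proper time-series or machine-learning model.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


def forecast_dense_segments(
    readings: Iterable[Tuple[str, float]],
    window: int = 3,
    threshold: float = 1000.0,
) -> Tuple[Dict[str, float], List[str]]:
    """Forecast next-period traffic per road segment and flag dense ones.

    `readings` are (segment_id, vehicles_per_hour) pairs in time order.
    The forecast for a segment is the mean of its last `window` readings
    (a naive moving-average model, used here only for illustration).
    """
    # One bounded queue per segment keeps only the most recent readings.
    recent: Dict[str, deque] = defaultdict(lambda: deque(maxlen=window))
    for segment, count in readings:
        recent[segment].append(count)
    forecasts = {seg: sum(vals) / len(vals) for seg, vals in recent.items()}
    dense = sorted(seg for seg, value in forecasts.items() if value >= threshold)
    return forecasts, dense


if __name__ == "__main__":
    # Hypothetical hourly counts for two road segments, oldest first.
    readings = [("A", 900), ("B", 400), ("A", 1100), ("B", 500),
                ("A", 1300), ("B", 450), ("A", 1200)]
    forecasts, dense = forecast_dense_segments(readings)
    print(forecasts)  # {'A': 1200.0, 'B': 450.0}
    print(dense)      # ['A']
```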


• Data analytics plays a crucial role in the smart city, since it acts as the platform for discovering information and knowledge, understanding how the city is functioning, etc.

• A smart city predicts and integrates (among other things) specific incidents or events, with the aim of improving the quality of life or informing citizens.

• For that, it must:
– be instrumented to allow the collection of data about city life;
– have mechanisms for aggregating data from different sources;
– have mechanisms for representing the data;
– have detailed, real-time knowledge about the city available;
– have automated city functions that are delivered reliably and effectively.
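The aggregation and representation requirements can be illustrated with a small sketch. Records from two heterogeneous sources (a JSON-like traffic sensor feed and citizen reports in CSV form) are converted into one common `Observation` record and then aggregated per zone. Both input schemas and all field names are hypothetical, chosen only to show the pattern of normalizing first and aggregating second.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Observation:
    """Common representation for data coming from heterogeneous city sources."""
    source: str
    zone: str
    timestamp: datetime
    kind: str
    value: float


def from_sensor_feed(records: Iterable[dict]) -> Iterator[Observation]:
    """Convert JSON-like traffic sensor records (hypothetical schema)."""
    for r in records:
        yield Observation("sensor", r["zone"], datetime.fromisoformat(r["ts"]),
                          "vehicle_count", float(r["vehicles"]))


def from_citizen_reports(csv_text: str) -> Iterator[Observation]:
    """Convert citizen reports given as CSV text (hypothetical columns)."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each report counts as one occurrence of its category.
        yield Observation("citizen", row["zone"],
                          datetime.fromisoformat(row["timestamp"]),
                          "report:" + row["category"], 1.0)


def aggregate_by_zone(observations: Iterable[Observation]) -> Dict[Tuple[str, str], float]:
    """Sum observation values per (zone, kind) pair."""
    totals: Counter = Counter()
    for obs in observations:
        totals[(obs.zone, obs.kind)] += obs.value
    return dict(totals)


if __name__ == "__main__":
    sensor_feed = [
        {"zone": "centre", "ts": "2024-01-15T08:00:00", "vehicles": 120},
        {"zone": "centre", "ts": "2024-01-15T09:00:00", "vehicles": 80},
        {"zone": "north", "ts": "2024-01-15T08:00:00", "vehicles": 40},
    ]
    citizen_csv = (
        "zone,timestamp,category\n"
        "centre,2024-01-15T08:15:00,pothole\n"
        "centre,2024-01-15T08:40:00,pothole\n"
    )
    observations = list(from_sensor_feed(sensor_feed)) + list(from_citizen_reports(citizen_csv))
    print(aggregate_by_zone(observations))
```

Keeping one converter per source means a new data source only needs a new converter; the aggregation step stays unchanged.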


Integrated management for smart sustainable cities

Organization of the resource management standards in the smart city environment:

• Integrated smart city management standards
– Resources integration standards: Observation Process Metadata Standard, Observation Metadata Standard, Model Metadata Standard, Node Metadata Standard, Event Metadata Standard
– Fusion processing technical specifications: Technical Specification for Resource and Toponym Matching, Technical Specification for Resource and Map Fusion
– Management service technical specifications: Data Service Interface Specification, Model Service Interface Specification, Event Service Interface Specification


From the IT point of view, a city or urban area is a concentration of entities (individuals and organizations: families, businesses, schools, institutions ...) that produce, consume, process and store information.


Classic transmitters (channels):
• Oral transmission
• Advertisements: on posts, in the street, in stores
• Loudspeakers, bells, sirens
• Mail
• Telephone, fax
• Newspapers, books, magazines
• Radio
• TV

Recent transmitters (channels):
• Digital television
• Internet: email, chat, Twitter, ...
• Internet phone (Skype)
• Google, search engines
• Wikipedia
• Cell phone
• SMS messages


Produce, consume, process, store

Much information is generated by humans:
• The information is heterogeneous: text, pictures, videos, ...
• It is a good complement to the information collected by sensors and smart devices.
• Many applications combine social networks, smart devices and the cloud.
• Recommendation systems: books, games, ...
• Assistance systems for traffic and travel.


Produce, consume, process, store

My data, my results

Lots of data!

Lots of capacity!


My data, my results

Other data, results for someone

Lots of data! This data could come from, or be stored on, the Web.


My data, my results

Other data, results for someone

Lots of capacity! Part of my computing can be done outside, in the cloud.


Produce, consume, process, store: data and results

Classic storage:
• Schools, libraries
• Newspapers, filing of documents
• Storage of magazines, books, etc.

Recent storage:
• Hard drives, DVDs, CDs, ...
• Digital song files, movies, photos
• Wikipedia, YouTube
• Web repositories


E-Governance is the application of ICT to deliver government services, exchange information and communication transactions, and integrate various standalone systems and services, as well as back-office processes and interactions, within the entire government framework.

"E-Government is an ongoing process of transformation of Government towards the provision of government services (information, transactions) through electronic means, including access to government information and the completion of government transactions on an ‘anywhere, anytime’ basis." (PricewaterhouseCoopers)


• e-Gov uses ICTs to improve the efficiency and effectiveness of its services.

• ICT is used to deliver more targeted information, better tailored to citizens, and to increase citizen participation in both service delivery and policy making, among other things.

The development of an efficient and effective e-government is a prerequisite for the development of Smart Cities.


e-Government is the transformation of government to provide efficient, convenient and transparent services to citizens and businesses through ICTs.

In particular, e-Government (e-Gov) has transformed interactions between governments, citizens and other members of society.

e-Gov (also known as Internet government, online government, etc.) consists of the digital interactions between a government and citizens, government and businesses, government and employees, among others.


e-Government is not about computers and websites, but about citizens and businesses!

e-Government is not about translating processes, but about transforming them!


Brief history of e-Government: from automation to transformation

Costs/benefits of public sector IT and benefit realisation:
• Computerisation: databases and back-office automation
• eGov 1.0: online service delivery
• eGov 2.0: transformational government


Transformational Government

Figure: a citizen-centric business model spanning business, customers, channels and technology, leading to lower cost, happier customers, higher policy impact and empowered citizens.


Figure: governance; the way government works; sharing of information; service delivery.


Government service delivery

Figure: government delivering services to citizens and businesses.


e-Government Domains

• Government to Citizen (G2C): ICT to enable citizen convenience and participation, e.g. SARS e-Filing, DoL Unemployment Fund submission (U-Filing), DHA Identity Tracking, information portals (Gateway, web sites)

• Government to Government (G2G): ICT to improve the internal efficiency and administration of government, e.g. financial accounting (BAS), logistics, HR (PERSAL), crime administration (CAS), Population Register (NPR), Social Pension (SOCPEN), health administration (HIS, PaaB, Pharm), education (NSC, ANA), transport (eNatis)

• Government to Business (G2B): ICT to serve private business, industry and trade, e.g. CIPC Companies Register, electronic payments, SARS Customs


Figure: costs/benefits of public sector IT over time, from automation to transformation:

• Computerisation: databases and back-office automation

• eGov 1.0: online service delivery

• eGov 2.0: transformational government

• Benefit realisation

Government systems move from fragmented to interoperable, integrated, citizen-focused and citizen-enabled, across the mainframe, PC, Internet and cloud eras.

“Governments are shifting from a government-centric paradigm to a citizen-centric paradigm” (Rethinking e-Government Services: User-Centric Approaches, OECD, 2009)


E-Government → Transformational Government

• Government-centric → Citizen-centric

• Supply push → Demand pull

• Government as sole provider of citizen services → Government also as convener of multiple competitive sources of citizen services

• Unconnected vertical business silos → New virtual business layer, built around citizen needs, operating horizontally across government

• “Identity” owned and managed by government → “Identity” owned and managed by the citizen

• Public data locked away within government → Public data freely available for reuse by all

• Citizen as recipient or consumer of services → Citizen as owner and co-creator of services

• Online services → Multi-channel service integration

• IT as capital investment → IT as a service

• Producer-led → Brand-led

• Bolting technology onto the existing business model of government → Focusing first on the business changes needed to unlock benefits for citizens, and only then on the technology


E-Government Evolution

Figure: value delivered to citizens rises with the complexity of implementation and technology across four stages:

1. Web presence: agency web sites provide citizens with information on rules and procedures.

2. Limited interactions: intranets link departments, allowing e-mail contact, access to online databases and downloadable forms.

3. Transactions: electronic delivery of services is automated. Applications include the issue of certificates and the renewal of licenses.

4. Transformation: joined-up government. All stages of transactions, including payments, are electronic. Applications include government portals and new models of service delivery through public-private partnerships.


Is e-Gov always based on the Internet?

No! The following forms are also e-Government:

• Telephone, Fax, Mobile

• CCTV, Tracking Systems, RFID, Biometrics

• Smartcards

• Non-online e-Voting

• TV & Radio-based delivery of public services


What do leading nations aim for in e-Gov?

• Interactive Public Services

• Public Procurement

• Public Internet Access Points

• Broadband Connectivity

• Interoperability

• Culture & Tourism

• Secure G2G Communications


E-Government IT Infrastructure: Government Gateway

• Multiple access channels: Internet site, e-mail, telephone, Internet-enabled device, kiosk, interactive TV

• Portals: government website, local government portals, private sector portals

• Portal infrastructure organised around life events, with common web services: registration and enrolment, authentication, secure e-mail, rules engine, circumstances and personalisation, payments, notifications, appointments

• Inter-operable departmental systems, following the E-Government Interoperability Framework


E-Government IT Infrastructure

• Government Backbone Network (GNET)

• Government Office Automation (GOA)

• Departmental network

• Central Internet Gateway (CIG)

• Certification Authority (CA)

• Government Communication Network (GCN)

• Mail Service

• Central Cyber Government Office (CCGO)


European Interoperability Framework v 2.0


Examples of e-Services – G2C

• Birth Certificate

• Health Care

• School Admission

• Scholarships

• e-Learning

• Examination Results

• Employment Services

• Vehicle Registration

• Driver’s License

• Passport/Visa

• Agriculture

• Land Record

• Property Registration

• Marriage Certificates

• Taxes

• Utility Services

• Municipality Services

• Pensions

• Insurance

• Death Certificate


Elements of E-Government

• A single government portal that cuts across ministries and agencies and links to all other public websites.

• Local content production in key ministries, and processes for regular updating.

• Computerized and web-enabled key processes.

• Legal and technical bases for transactions through the portal.

• Capacity for civil servants to facilitate such transactions.


Wide variety of access channels:

• Public information kiosks

• Public computer facilities

• Community cyber points


Applications

• Submission of tax returns

• Renewal of driving and vehicle licenses

• Registration as a voter

• Payment of Government fees

• Tourist information


Applications

Electronic procurement

• Electronic Tendering System

• Electronic Marketplace

• Electronic Product Catalogue


Principle #1: e-Government is about Transformation

• Department-centric approach → Customer-centric approach

• Process orientation → Service orientation

• Output-based assessment → Outcome-based assessment

• Departmental view → Integrated view


Principle #2: e-Government requires a Holistic Approach

7 areas of management:

• Process reform management

• Resource management

• Procurement management

• Technology management

• Knowledge management

• Change management

• Program management


Principle #2: e-Government requires a Holistic Approach

Figure: e-Government at the centre of people, process, technology and resources.


Principle #2: e-Government requires a Holistic Approach

E-GOVERNMENT:

• E-government policy and strategy

• Government-wide process reengineering and change management

• Strategic applications such as a unified citizens database

• Prioritized multi-year ICT investment program

SOCIETAL APPLICATIONS FUND:

• Low-cost technology solutions

• Scalable social and business models

• Local content industry promotion, multimedia

INFORMATION INFRASTRUCTURE & ACCESS:

• Telecom & Internet policies & regulation

• Rural access subsidy scheme

• Telecenters

HUMAN RESOURCES:

• Specialized ICT education and training

• Use of ICT in education

LEADERSHIP, POLICY & INSTITUTIONS:

• Overall vision, e-laws

• ICT Agency

• CIOs in different ministries

• ICT industry promotion


Principle #3: e-Government requires us to overcome a Number of Challenges

1 PROCESS

• Lack of Process Models

• Status Quo-ism

• Poor Legal Frameworks

• Complex Procurement

2 PEOPLE

• Lack of Political Will

• Official Apathy

• Shortage of Champions

• Lack of Skills in Government

3 TECHNOLOGY

• Lack of Architectures

• Lack of Standards

• Poor Communication Infrastructure

• Hardware-centric Approach

4 RESOURCES

• Budget Constraints

• Lack of Private Sector Interest

• Lack of Project Management Skills


Principle #4: e-Government necessitates Change Management

• Senders & receivers of communications must be in sync

• Assess the levels of resistance & comfort

• Authority for change must be sufficient & continuous

• Value systems in the organization should support change management

• Change should be of the right magnitude

• The ‘right’ answer is not enough

• Change is a process, not an event.


Principle #4: e-Government necessitates Change Management

1. Awareness of Change

2. Desire to Change

3. Knowledge of Skills

4. Ability to apply Knowledge

5. Reinforcement to Sustain Change


Principle #5: e-Government necessitates Capacity Building

Leadership & Vision

• Policy Formulation

• Committing Resources

• Taking Hard Decisions

Program Development

• Preparing Roadmaps

• Prioritization

• Frameworks, Guidelines

Program Management

• Monitoring Progress

• Inter-agency Collaboration

• Funds Management

• Capacity Management

Project Development

• Conceptualization

• Architecture

• Definition (RFP, SLA…)

Project Management

• Bid Process Management

• Project Monitoring

• Quality Assurance


Benefits to Government

• Law & policy-making

– e-Government can be a catalyst for legal reform

– Wider & faster dissemination of laws

– Faster & better formulation of policies

• Better regulation

– Registration & licensing: speedier

– Taxation: better revenues

– Environmental regulations: better compliance

– Transportation & police: more transparency

• More efficient services to citizens & businesses

– Better image

– Cost-cutting

– Better targeting of benefits

– Control of corruption

– Improved accountability of politicians and civil servants


Benefits to Citizens

• Cost and time-savings

• Certainty in getting services

• Higher penetration due to automation

• Increased participation of citizens in government decisions and actions

• Better quality of life

• Ease of access to information

• Added convenience – multiple delivery channels

• Possibility of self-service


Benefits to Business

• Increased velocity of business

– e.g. TradeNet in Singapore

• Ease of doing business with government

– e-Procurement

• Better investment climate

• Transparency

PROCESS RE-ENGINEERING: technology is only a tool, not a panacea.


Again, data analytics underpins the progress of e-government.

• Examples of possible applications linking data analytics and e-government:

– E-participation: the ability of all citizens to communicate with one another and with the agencies or groups that represent them.

– Online Direct Democracy: giving citizens decision-making power on social issues through online collective decision making.

– Public Talent in Use: smart cities should use a crowdsourcing approach to support problem solving. The crowdsourcing model develops the collective intelligence of online communities.

– New Notions of Public Services: in a smart city, citizens are partners. The key idea is that the pursuit of public ends is everybody’s responsibility.


– The prosumer era: citizens act as both producers and consumers of public services.

– Constellation of active agencies and groups, where governance and coordination can be constituted dynamically from the bottom up. In a smart city, the decentralization of governance is one of the main aspects.

– Government Cloud Data: governments should move to cloud computing to enable transparency and collaboration. Cloud computing makes it possible to cover the whole city with e-government solutions.
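Online direct democracy and crowdsourcing both depend on aggregating many citizens’ inputs into one collective decision. The sketch below is a minimal illustration, assuming a simple plurality rule (one vote per citizen, most votes wins); the citizen IDs, proposals and votes are hypothetical, and a real e-participation platform would also need identity verification and might use other voting rules.

```python
from collections import Counter


def tally_votes(votes):
    """Rank proposals by plurality: one vote per citizen, most votes first.

    `votes` maps a (hypothetical) citizen ID to the proposal that citizen chose;
    because it is a dict, each citizen can contribute at most one vote.
    """
    counts = Counter(votes.values())
    # Sort by descending vote count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical votes on three neighbourhood proposals.
votes = {
    "citizen-001": "bike lanes",
    "citizen-002": "new park",
    "citizen-003": "bike lanes",
    "citizen-004": "street lighting",
    "citizen-005": "bike lanes",
    "citizen-006": "new park",
}

ranking = tally_votes(votes)
winner, winner_votes = ranking[0]
print(ranking)
print(f"Winner: {winner} with {winner_votes / len(votes):.0%} of the votes")
```

With these votes the ranking is bike lanes (3), new park (2), street lighting (1), and the winner takes 50% of the votes. Keying the input by citizen ID is a deliberate design choice: a duplicate ballot from the same citizen overwrites the earlier one instead of being counted twice.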



Blindfolded Men Describe an Elephant

“It’s a snake.” “It’s a spear.” “It’s a bridge.” “It’s a tree trunk.” “It’s a blanket.” “It’s a python.”


Blindfolded Men Describe Business Analytics

“It’s data warehousing.” “It’s big data (BD).” “It’s statistics.” “It’s mathematical models.” “It’s business intelligence (BI).” “It’s executive dashboards.” “It’s computer science.”


Data analytics is the science that examines raw data for the purpose of seeking knowledge, drawing conclusions and generating information, among other things.

It is used in many areas:

• Industry, to make better business decisions

• Science, to verify existing models or theories

...

Data is the new oil of the economy.



Data analysis is not simply data mining.

Data mining navigates through large datasets using sophisticated software to identify patterns.

Data analysis focuses on inference, drawing a conclusion based on what is known, to discover hidden relationships and establish



Data analytics is the science of collecting, storing, extracting, cleansing, transforming, aggregating and analyzing data, with the purpose of discovering information and knowledge.

• Data analytics is being used in different fields: finance, education, industry, etc.

• Analytics uses descriptive, identification and predictive models to produce knowledge from data, which is then used to guide decision making.

• The high degree of datification embedded in a Smart City demands new tools and mechanisms for data manipulation and representation that facilitate the extraction of meaningful knowledge.
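The stages named in the definition of data analytics can be chained as a small pipeline. This is a minimal sketch, assuming a hypothetical CSV export of city air-quality sensor readings: the districts, sensor IDs, PM2.5 values and the 25.0 threshold are all invented for illustration. Collecting and storing are simulated by an in-memory string, and transforming is limited to converting text to numbers.

```python
import csv
import io
import statistics
from collections import defaultdict

# Collect/store: a hypothetical raw CSV export from city air-quality sensors.
RAW = """district,sensor,pm25
North,s1,12.5
North,s2,
South,s3,30.1
South,s4,28.4
North,s5,14.0
South,s6,not_available
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleanse/transform: keep rows whose pm25 value parses as a number."""
    clean = []
    for row in rows:
        try:
            clean.append({"district": row["district"], "pm25": float(row["pm25"])})
        except ValueError:
            continue  # empty string or text such as "not_available"
    return clean


def aggregate(rows):
    """Aggregate: group readings by district and compute each district's mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["district"]].append(row["pm25"])
    return {district: statistics.mean(values) for district, values in groups.items()}


def analyze(means, threshold=25.0):
    """Analyze: list districts whose mean exceeds a (hypothetical) threshold."""
    return sorted(district for district, mean in means.items() if mean > threshold)


means = aggregate(cleanse(extract(RAW)))
print(means)
print("Above threshold:", analyze(means))
```

Keeping each stage as a separate function mirrors the definition and makes it easy to swap one stage (for example, a stricter cleansing rule) without touching the others.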



• An approach to de-synthesizing data, informational and/or factual elements to answer research questions

• A method of putting together facts and figures to solve a research problem

• A systematic process of using data to address research questions

• Breaking down research issues by using controlled data and factual information


Utilizing Data to Increase Shareholder Value

Data =

Big and Small

Internal and External

Structured and Non-structured

Traditional and “New”

“Free” and Purchased


The Ubiquity of Data Opportunities

• With vast amounts of data now available, companies in almost every industry are focused on exploiting data for competitive advantage.

• In the past, firms could employ teams of statisticians, modelers, and analysts to explore datasets manually, but the volume and variety of data have far outstripped the capacity of manual analysis.

• At the same time, computers have become far more powerful, networking has become ubiquitous, and algorithms have been developed that can connect datasets to enable broader and deeper analyses than previously possible.

• The convergence of these phenomena has given rise to the increasingly widespread business application of data science principles and data mining techniques.


The Ubiquity of Data Opportunities

• Data mining is used for general customer relationship management to analyze customer behavior in order to manage attrition and maximize expected customer value.

• The finance industry uses data mining for credit scoring and trading, and in operations via fraud detection and workforce management.

• Major retailers from Walmart to Amazon apply data mining throughout their businesses, from marketing to supply-chain management.

• Many firms have differentiated themselves strategically with data science, sometimes to the point of evolving into data mining companies.

The primary goals of data analytics (DA) are to help you view business problems from a data perspective and to understand the principles of extracting useful knowledge from data.


Data can’t “talk”

An analysis contains some aspects of scientific reasoning/argument:

* Define

* Interpret

* Evaluate

* Illustrate

* Discuss

* Explain

* Clarify

* Compare

* Contrast


Goals of an analysis:

* To explain cause-and-effect phenomena

* To relate research to real-world events

* To predict/forecast real-world phenomena based on research

* To find answers to a particular problem

* To draw conclusions about real-world events based on the problem

* To learn a lesson from the problem


An analysis must have four elements:

* Data/information (what?)

* Scientific reasoning/argument (what? who? where? how? what happens?)

* Finding (what results?)

* Lesson/conclusion (so what? so how? therefore…)


Basic guide to data analysis:

* “Analyse”, do NOT “narrate”

* Go back to the research flowchart

* Break it down into research objectives and research questions

* Identify the phenomena to be investigated

* Visualise the “expected” answers

* Validate the answers with data

* Don’t claim anything that is not supported by the data


Shoppers Number

Gender    Age     Number of shoppers
Male      Old     6
Male      Young   4
Female    Old     10
Female    Young   15

More female shoppers than male shoppers

More young female shoppers than young male shoppers

Young male shoppers are not interested in shopping at the shopping complex
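The counts in the shoppers table can be checked directly. This minimal sketch assumes the table reads male old = 6, male young = 4, female old = 10, female young = 15. It verifies the first two statements (25 female against 10 male shoppers; 15 young female against 4 young male). The third statement is an interpretation about motivation that the counts alone cannot confirm, which is the kind of claim the guide warns against making without supporting data.

```python
# Shopper counts from the table, keyed by (gender, age group).
shoppers = {
    ("Male", "Old"): 6,
    ("Male", "Young"): 4,
    ("Female", "Old"): 10,
    ("Female", "Young"): 15,
}


def total_by_gender(counts, gender):
    """Sum the shopper counts over all age groups for one gender."""
    return sum(n for (g, _age), n in counts.items() if g == gender)


male_total = total_by_gender(shoppers, "Male")      # 6 + 4
female_total = total_by_gender(shoppers, "Female")  # 10 + 15
total = male_total + female_total

# Statement 1: more female shoppers than male shoppers.
more_female = female_total > male_total
# Statement 2: more young female shoppers than young male shoppers.
more_young_female = shoppers[("Female", "Young")] > shoppers[("Male", "Young")]

print(f"male={male_total}, female={female_total}, total={total}")
print("Statement 1 supported:", more_female)
print("Statement 2 supported:", more_young_female)
# Share of each group in the total: the young-male group is the smallest,
# but the counts alone say nothing about why.
for (gender, age), n in shoppers.items():
    print(f"{gender:<6} {age:<5} {n / total:.0%}")
```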


• When analyzing, be:

* Objective

* Accurate

* Truthful

• Separate facts from opinions

• Avoid “wrong” reasoning/arguments, e.g. mistakes in interpretation.


Managing using Analytics

• The success of analytics can only be measured by how well it helps the organization achieve its strategic objectives

• So a manager’s role is to:

– Identify business goals

– Collect the data necessary to measure performance towards the goals

– Analyze the data

– Draw conclusions based on the information generated
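The four steps of the manager’s role in managing with analytics can be sketched as a small performance check. This is a hypothetical illustration: the goal (an average permit processing time of at most 10 days), the monthly figures and the naive linear extrapolation are all assumptions, not data from the case study.

```python
import statistics

# 1. Identify a business goal (hypothetical): average permit processing time
#    of at most 10 days.
TARGET_DAYS = 10.0

# 2. Collect the data needed to measure performance towards the goal
#    (hypothetical monthly averages, in days).
monthly_avg_days = [18.0, 16.5, 15.0, 13.0, 12.5, 11.0]

# 3. Analyze the data: current level and average month-over-month change.
current = monthly_avg_days[-1]
changes = [later - earlier for earlier, later in zip(monthly_avg_days, monthly_avg_days[1:])]
avg_change = statistics.mean(changes)

# 4. Draw a conclusion based on the information generated.
gap = current - TARGET_DAYS
if gap <= 0:
    conclusion = "goal met"
elif avg_change < 0:
    # Naive linear extrapolation of the recent trend.
    months_needed = gap / -avg_change
    conclusion = f"improving; about {months_needed:.1f} more months at the current pace"
else:
    conclusion = "not improving; corrective action needed"

print(f"current={current} days, average change={avg_change:.2f} days/month: {conclusion}")
```

Here the average change is -1.4 days per month and the gap to the target is 1 day, so the conclusion is that the goal is close but not yet met.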


Data science

• Data science involves principles, processes and techniques for understanding phenomena via the (automated) analysis of data.

• Data science sits within the context of various other closely related, data-related processes in the organization.

• We can distinguish data science from other aspects of data processing that are gaining increasing attention in business.


Data-driven decision-making

• Data-driven decision-making (DDD) refers to the practice of basing decisions on the analysis of data, rather than purely on intuition.

– For example, a marketer could select advertisements based purely on long experience in the field and an eye for what will work. Or,

– the marketer could base the selection on the analysis of data regarding how consumers react to different ads.

A combination of these approaches is also possible.

• DDD is not an all-or-nothing practice; different firms engage in DDD to greater or lesser degrees.
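The data-driven option in the marketer example can be sketched by comparing click-through rates (clicks divided by impressions). The ad names and counts below are hypothetical. Picking the highest raw rate ignores sampling noise; a confirmatory check, such as a significance test, would be needed before treating a small difference as real.

```python
# Hypothetical campaign data: impressions and clicks for three ads.
ads = {
    "ad_A": {"impressions": 5000, "clicks": 150},
    "ad_B": {"impressions": 4800, "clicks": 192},
    "ad_C": {"impressions": 5200, "clicks": 130},
}


def click_through_rate(stats):
    """Fraction of impressions that led to a click."""
    return stats["clicks"] / stats["impressions"]


ctr = {name: click_through_rate(stats) for name, stats in ads.items()}
data_driven_choice = max(ctr, key=ctr.get)

# An intuition-based choice (e.g. the marketer's favourite), for comparison.
intuition_choice = "ad_A"

for name, rate in ctr.items():
    print(f"{name}: CTR = {rate:.2%}")
print("Data-driven choice:", data_driven_choice, "| intuition choice:", intuition_choice)
```

With these numbers the rates are 3.00%, 4.00% and 2.50%, so the data points to ad_B even though intuition favoured ad_A; a combined approach might use intuition to shortlist ads and data to choose among them.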


Analyzing the Data

~ Exploratory Methods ~

Exploratory methods often involve calculating averages and percentages and displaying the information on a graph. Although exploratory methods may provide many pieces of information, they may not answer specific questions or make definite statements about a problem.

~ Confirmatory Methods ~

Confirmatory methods are used to draw conclusions from the results of a survey and the statistical information by answering specific questions. For example, using a confirmatory method, a statistician can say “The price of oil leaving Saudi Arabia has been increasing, and will continue to increase.”

Neither of these methods should be overlooked. Both should be used extensively to analyze the results of a statistical activity, in order to reach specific conclusions with credibility and accuracy.

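The contrast between the two approaches can be illustrated with a small Python sketch. The exploratory step only summarizes each group with an average. The confirmatory step asks a specific question ("is the difference in means larger than chance would produce?") using a two-sided permutation test. The permutation test is a technique swapped in here as one common confirmatory method, not the one used on the slide, and the satisfaction scores are hypothetical.

```python
import random

# Hypothetical satisfaction scores from two service points (illustrative data only).
group_a = [72, 75, 78, 80, 74, 77, 79, 76]
group_b = [68, 70, 73, 69, 71, 72, 70, 74]


def avg(values):
    return sum(values) / len(values)


# Exploratory step: describe each group with an average.
print("mean A =", avg(group_a), " mean B =", avg(group_b))


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Confirmatory step: two-sided permutation test for a difference in means.

    Returns the observed difference and an estimated p-value: the share of random
    relabellings whose |difference| is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = avg(a) - avg(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = avg(pooled[:n_a]) - avg(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms keep the estimate from being exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)


diff, p = permutation_test(group_a, group_b)
print(f"difference = {diff:.2f}, p ≈ {p:.4f}")
```

A small p-value supports a definite statement ("group A scores higher"), which the exploratory averages alone cannot justify.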

Quantitative and qualitative methods produce different types of data:

– Quantitative methods produce numerical values.
– Qualitative methods produce narratives.

For both quantitative and qualitative data, however, the same analytical strategies are used for data interpretation.


Analytics and Types of Questions

Descriptive
– Questions: What happened? What’s happening? What actions are needed? What exactly is the problem?
– Enablers: ad hoc reports, dashboards, data warehousing, alerts
– Outcomes: well-defined business problems and opportunities

Predictive
– Questions: Why is this happening? What will happen next? Why will it happen?
– Enablers: data mining, text mining, web/media mining, forecasting
– Outcomes: accurate projections of future states and conditions

Prescriptive
– Questions: What should I do? Why should I do it? What’s the best that can happen? What if we try this?
– Enablers: optimization, simulation, decision modeling, randomized testing
– Outcomes: best possible business decisions and transactions


Basic analytical strategies:

• Describing
• Factoring
• Clustering
• Comparing
• Classification
• Finding commonalities
• Finding covariation
• Ruling out rival explanations


Categories of data analysis

• Narrative (e.g., law, arts)
• Descriptive (e.g., social sciences)
• Statistical/mathematical (pure/applied sciences)
• Audio-optical (e.g., telecommunications)
• Others

Most research analyses, arguably, adopt the first three. The second and third are, arguably, the most popular in the pure, applied, and social sciences.


Dangers in Analytics

• Privacy
• Security
• Drawing decisions on incomplete data
• Drawing decisions on inaccurate data
• Using only data that supports our gut decisions
• Drawing the wrong conclusion from the data
– Stock prices example


Analytic Tools

• Data mining
• Statistical analysis
• Predictive analysis
• Correlation
• Regression
• Forecasting
• Process modeling
• Optimization
• Simulation

Statistics falls into two main categories:
• Descriptive statistics
• Inferential statistics

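Two of the tools listed above, correlation and regression, can be shown with a short Python sketch: the Pearson correlation coefficient and an ordinary least-squares line, then a naive one-step forecast from that line. The five (x, y) observations are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
from math import sqrt


def _means(xs, ys):
    return sum(xs) / len(xs), sum(ys) / len(ys)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = _means(xs, ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = _means(xs, ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Hypothetical observations, e.g. x = period number, y = measured value.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(f"r = {pearson_r(xs, ys):.3f}")
print(f"y = {slope:.2f}x + {intercept:.2f}")
print(f"forecast for x = 6: {slope * 6 + intercept:.2f}")
```

Here r ≈ 0.775 and the fitted line is y = 0.6x + 2.2. A strong correlation alone does not establish cause, which connects to the "drawing the wrong conclusion" danger listed earlier.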

Basic Descriptive Statistics

• Use summary measures to describe the central tendency of a distribution (mean, mode, median).
• For dispersion (variability), use the standard deviation, variance, and range to tell how spread out the data are around the mean.
• Count (frequencies)
• Percentage
• Mean (sum of all values ÷ number of values)
• Mode (most frequent value)
• Median (middle value)
• Range
• Standard deviation
• Variance
• Ranking

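Most of the measures above are available in Python's standard `statistics` module, as the following minimal sketch shows. The data values are hypothetical. The sketch uses the population forms `pvariance`/`pstdev`. The sample (n − 1) versions are `statistics.variance` and `statistics.stdev`, and which one to report is an assumption the analyst has to make explicit.

```python
import statistics as st


def describe(data):
    """Basic descriptive statistics for a list of numbers (population variance/std dev)."""
    return {
        "count": len(data),
        "mean": st.mean(data),
        "median": st.median(data),
        "mode": st.mode(data),
        "range": max(data) - min(data),
        "variance": st.pvariance(data),
        "std_dev": st.pstdev(data),
        "ranking": sorted(data, reverse=True),  # values ordered from highest to lowest
    }


# Hypothetical sample values (illustration only).
values = [2, 4, 4, 4, 5, 5, 7, 9]
for name, result in describe(values).items():
    print(f"{name:>9}: {result}")
```

For these values the mean is 5, the median 4.5, the mode 4, the range 7 and the population standard deviation 2.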

Basic Descriptive Statistics

Frequency Distributions

To what extent did you increase your skills in putting together a household budget?

                A lot   Some   A little   Not at all
Women (N=30)    14      9      5          2


Basic Descriptive Statistics

Percentage Distributions

To what extent did you increase your skills in putting together a household budget?

                A lot   Some   A little   Not at all
Women (N=30)    46%     30%    17%        7%

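The percentage distribution is derived directly from the frequency distribution: each count is divided by the total (N = 30) and multiplied by 100. A minimal Python sketch using the survey counts from the frequency-distribution slide:

```python
# Counts from the frequency-distribution slide (women, N = 30).
counts = {"A lot": 14, "Some": 9, "A little": 5, "Not at all": 2}


def percentage_distribution(freq):
    """Convert category counts into percentages of the total."""
    total = sum(freq.values())
    return {category: 100 * n / total for category, n in freq.items()}


for category, pct in percentage_distribution(counts).items():
    print(f"{category:<11}{pct:5.1f}%")
```

To one decimal place this gives 46.7%, 30.0%, 16.7% and 6.7%. The slide's whole-number figures (46%, 30%, 17%, 7%) appear to be rounded so that they add up to 100%. Plain rounding would give 47% for "A lot".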

Basic Descriptive Statistics

40 50 55 94 100 100 100   Mean = 77

40 92 93 94 95 96 98   Mean = 87

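The two score lists can be checked in Python. The first list sums to 539, so its mean is 539 / 7 = 77. The second sums to 608, giving about 86.9, or 87 when rounded. One reading of the comparison is that both lists share the same median, 94, while their means differ by about ten points because the low values in the first list pull its mean down.

```python
from statistics import mean, median

# The two score lists shown on the slide.
set_1 = [40, 50, 55, 94, 100, 100, 100]
set_2 = [40, 92, 93, 94, 95, 96, 98]

for name, data in (("set 1", set_1), ("set 2", set_2)):
    print(f"{name}: mean = {mean(data):.1f}, median = {median(data)}")
```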

Basic Descriptive Statistics

Two different bar graphs are made from the same survey of favorite foods:


Basic Descriptive Statistics

The same information can be accurately presented in a non-misleading way:

[Figure: pie chart "Favorite Foods": Pizza 33%, Hot Dogs 34%, Hamburgers 33%]


Basic Descriptive Statistics

A quick glance at this graph would lead us to conclude that smoking is the leading cause of death among Americans. However, a closer analysis shows that the graph is highly misleading.

Consider a smoker who has died of heart disease: what was his cause of death?


Basic Descriptive Statistics

[Figure: chart "Percentage of Smokers in Each Cause of Death"; y-axis: Percent of Smokers (0% to 120%); categories: AIDS, Alcohol, Motor Vehicle, Fires, Homicide, Illicit Drugs, Suicide, Cancer, Heart Disease]


Basic Descriptive Statistics

[Figure: chart "Price Per Barrel of Light Crude Oil Leaving Saudi Arabia on Jan. 1"; x-axis: Years (1973 to 1979); y-axis: Price Per Barrel ($0.00 to $16.00)]


Basic Descriptive Statistics

[Figure: line graph "Price Per Barrel of Light Crude Oil Leaving Saudi Arabia on Jan. 1"; x-axis: Years (1973 to 1979); y-axis: Prices Per Barrel ($0.00 to $16.00)]

- Another adequate way of fixing the graph: a line graph shows the gradual increase in oil prices effectively.


Basic Descriptive Statistics

Graphing comparisons

[Figure: chart "Satisfaction with Services"; x-axis: Clinic Name (A to E); y-axis: Satisfaction Score (0 to 40)]


Basic Descriptive Statistics

[Figure: chart "Satisfaction with Services"; x-axis: Clinic (A to E); y-axis: Satisfaction Score (0 to 16); series: Staff, Advice, Facility]


A key aspect of data analytics is the ability to measure and mine urban data.

• A great deal of spatio-temporal city data is available in various forms, covering many of our activities (human mobility data, etc.).
• A variety of data mining and visualization techniques exist to interpret such data.
• In particular, smart cities need reality mining, which concerns pervasive sensing of social systems using ubiquitous technology (for example, smartphones).


Using reality mining and network analysis, we are able to:

– Produce models and methods using urban data at different spatio-temporal scales.
– Develop services that ensure equity, fairness and a better quality of city life.
– Explore the notion of the city as a laboratory for innovation.
– Enhance mobility for city populations.
– Produce new forms of urban governance and organization.

Real-time analytics is very important in a smart city, in order to build a catalogue of the behaviors in the city.


What does it take to succeed using this technique?

• Collecting the right data (historical perspective)
• Developing a data warehouse (all data in one place)
• Having staff to analyze the data
• Managers who understand the business and embrace managing by the numbers


Big data, data mining tools, business intelligence platforms, open data, the Internet of Things (IoT) and ubiquitous sensor networks, among others, are essential components of a data analytics infrastructure.


90% of the world’s data has been generated since 2010. Big Data combines data from humans and computers: every day, we create 2.5 quintillion bytes of data.

• Big data refers to a collection of data so large, complex and rapidly changing that it becomes difficult to process with traditional data processing systems.
• Big data includes the capture, curation, storage, search, sharing, transfer, analysis and visualization of the data.


Who’s Generating Big Data

• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Mobile devices (tracking all objects all the time)
• Sensor technology and networks (measuring all kinds of data)

• Progress and innovation are no longer hindered by the ability to collect data,
• but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.


Where Is This “Big Data” Coming From?

• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 2+ billion people on the Web by end 2011
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009… 200M by 2014


The Model of Generating/Consuming Data Has Changed

Old model: a few companies generate data; all others consume it.

New model: all of us generate data, and all of us consume data.


Types of Data

• Relational data (tables/transactions/legacy data)
• Text data (Web)
• Semi-structured data (XML)
• Graph data: social networks, Semantic Web (RDF), …
• Streaming data: you can only scan the data once

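The streaming constraint ("you can only scan the data once") rules out algorithms that revisit earlier values. The sketch below computes the count, mean and variance in a single pass using Welford's online method. This is one standard technique, chosen here as an illustration rather than taken from the slides. A Python generator stands in for a hypothetical sensor stream because it, too, can be consumed only once.

```python
from typing import Iterable, Tuple


def one_pass_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Count, mean and population variance in one scan (Welford's method)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean, as Welford's update requires
    variance = m2 / n if n else 0.0
    return n, mean, variance


# A generator can only be iterated once, like a live data stream.
readings = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
print(one_pass_stats(readings))
```

Memory use stays constant no matter how long the stream runs, which is the property that matters for streaming data.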

What to do with these data?

• Aggregation and Statistics

– Data warehouse and OLAP

• Indexing, Searching, and Querying

– Keyword based search

– Pattern matching (XML/RDF)

• Knowledge discovery

– Data Mining

– Statistical Modeling

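The "aggregation and statistics" item can be illustrated with a roll-up in the OLAP sense: summing a measure over one dimension of a fact table. The sketch does this in plain Python rather than in a data warehouse, and the fact rows (district, month, number of service requests) are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (district, month, service requests).
facts = [
    ("North", "Jan", 120), ("North", "Feb", 95),
    ("South", "Jan", 80), ("South", "Feb", 110),
    ("Centre", "Jan", 200), ("Centre", "Feb", 190),
]


def roll_up(rows, dimension_index):
    """Sum the measure (last column) grouped by one dimension, like an OLAP roll-up."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension_index]] += row[-1]
    return dict(totals)


print(roll_up(facts, 0))  # totals per district
print(roll_up(facts, 1))  # totals per month
```

A warehouse performs the same grouping with SQL `GROUP BY` or precomputed cubes, so that such totals can be queried quickly at scale.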

The Myth About Big Data

• Big Data is new
• Big Data is only about massive data volume
• Big Data means Hadoop
• Big Data needs a data warehouse
• Big Data means unstructured data
• Big Data is for social media & sentiment analysis


• No single standard definition…

“Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it…


Importance of Big Data

- Government
  - In 2012, the Obama administration announced the Big Data Research and Development Initiative: 84 different big data programs spread across six departments.
- Private Sector
  - Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
  - Facebook handles 40 billion photos from its user base.
  - The Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.
- Science
  - The Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days.
  - The Large Hadron Collider produced 13 petabytes of data in 2010.
  - Medical computation, such as decoding the human genome.
  - Social science revolution.
  - A new way of doing science (microscope example).


Usage Example in Big Data

Data analysis predictions for the 2012 US election:

• Drew Linzer, June 2012: 332 electoral votes for Obama, 206 for Romney.
• Nate Silver’s FiveThirtyEight blog: predicted that Obama had an 86% chance of winning, and predicted all 50 states correctly.
• Sam Wang, Princeton Election Consortium: put the probability of Obama's re-election at more than 98%.
• Meanwhile, the media continued to report the race as very tight.


What’s driving Big Data

- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time


The 5 Key Big Data Use Cases

• Big Data Exploration: find, visualize and understand all big data to improve decision making
• Enhanced 360° View of the Customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
• Security/Intelligence Extension: lower risk, detect fraud and monitor cyber security in real time
• Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency
• Operations Analysis: analyze a variety of machine data for improved business results


Value of Big Data Analytics

• Big data is more real-time in nature than traditional DW applications
• Traditional DW architectures (e.g., Exadata, Teradata) are not well suited for big data apps
• Shared-nothing, massively parallel processing, scale-out architectures are well suited for big data apps


• Volume: 12+ terabytes of tweets created daily.
• Variety: 100’s of different types of data.
• Veracity: only 1 in 3 decision makers trust their information.
• Velocity: 5+ million trade events per second.


1-Scale (Volume)

• Data volume: 44x increase from 2009 to 2020, from 0.8 zettabytes to 35 ZB
• Data volume is increasing exponentially

[Figure: exponential increase in collected/generated data]

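The "44x" figure is 35 / 0.8 = 43.75, rounded. The following sketch also computes the constant yearly growth factor that would produce that increase over the 11 years from 2009 to 2020, that is (35 / 0.8)^(1/11). Treating the growth as a constant compound rate is an assumption, since the slide only says "exponentially".

```python
# Figures quoted on the slide: about 0.8 ZB in 2009 and 35 ZB for 2020.
start_zb, end_zb = 0.8, 35.0
years = 2020 - 2009

growth_factor = end_zb / start_zb              # overall multiplication factor
annual_factor = growth_factor ** (1 / years)   # constant yearly factor that would produce it

print(f"overall growth: {growth_factor:.2f}x")
print(f"implied compound growth: {annual_factor - 1:.1%} per year")
```

Under that assumption the data volume grows by roughly 41% per year, that is, it doubles about every two years.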

2-Complexity (Variety)

• Various formats, types, and structures
• Text, numerical, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc.
• Static data vs. streaming data
• A single application can be generating/collecting many types of data

To extract knowledge, all these types of data need to be linked together.


3-Speed (Velocity)

• Data is being generated fast and needs to be processed fast
• Online data analytics
• Late decisions mean missed opportunities
• Examples
– E-promotions: based on your current location, your purchase history and what you like, send promotions right now for the store next to you
– Healthcare monitoring: sensors monitor your activities and body, and any abnormal measurement requires an immediate reaction


Challenges in Handling Big Data

• The bottleneck is in technology
– New architectures, algorithms and techniques are needed
• Also in technical skills
– Experts in using the new technology and dealing with big data


Other Challenges in Big Data

Big Data Integration is Multidisciplinary
• Less than 10% of the Big Data world is genuinely relational
• Meaningful data integration in the real, messy, schema-less and complex Big Data world of databases and the semantic web, using multidisciplinary and multi-technology methods

The Billion Triple Challenge
• The Web of data contains 31 billion RDF triples, of which 446 million are RDF links; by domain: 13 billion government data, 6 billion geographic data, 4.6 billion publication and media data, 3 billion life science data (BTC 2011, Sindice 2011)

The Linked Open Data Ripper
• Mapping, ranking, visualization, key matching, snappiness
• Demonstrate the value of semantics: let data integration drive DBMS technology
• Large volumes of heterogeneous data, like linked data and RDF


Big Data Technology


Main Big Data Technologies

Hadoop
• Low-cost, reliable scale-out architecture
• Distributed computing
• Proven success in Fortune 500 companies
• Exploding interest

NoSQL Databases
• Huge horizontal scaling and high availability
• Highly optimized for retrieval and appending
• Types: document stores, key-value stores, graph databases

Analytic RDBMS
• Optimized for bulk-load and fast aggregate query workloads
• Types: column-oriented, MPP, in-memory


Hadoop Core Components

• Hadoop Distributed File System (HDFS)
– Massive redundant storage across a commodity cluster
• MapReduce
– Map: distribute a computational problem across a cluster
– Reduce: collect the answers to all the sub-problems and combine them
• Many distros available

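The Map and Reduce steps can be sketched as a single-process Python simulation using the classic word-count job. This is written for illustration and is not Hadoop's API. The names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical, and the shuffle step that Hadoop performs across the cluster is reduced here to a dictionary that groups values by key.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map: emit (word, 1) for every word in one input record."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle (group by key) and reduce sequentially in one process."""
    groups = defaultdict(list)
    for key, value in records:                       # map phase
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)        # shuffle: group values by key
    return dict(reducer(k, vs) for k, vs in groups.items())  # reduce phase


docs = [("d1", "big data for smart cities"), ("d2", "smart cities need data")]
print(run_mapreduce(docs, map_fn, reduce_fn))
```

On a real cluster the map calls run in parallel on different input splits, and each key's group is sent to one reduce task. The user-written `map_fn` and `reduce_fn` stay the same.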

Major Hadoop Utilities

• Apache Hive: SQL-like language and metadata repository
• Apache Pig: high-level language for expressing data analysis programs
• Apache HBase: the Hadoop database; random, real-time read/write access
• Sqoop: integrating Hadoop with RDBMS
• Oozie: server-based workflow engine for Hadoop activities
• Hue: browser-based desktop interface for interacting with Hadoop
• Flume: distributed service for collecting and aggregating log and event data
• Apache Whirr: library for running Hadoop in the cloud
• Apache ZooKeeper: highly reliable distributed coordination service


Hadoop & Databases


HadoopDB: A Hybrid Approach [Abouzeid et al., VLDB 2009]

• An architectural hybrid of MapReduce and DBMS technologies
• Uses the fault tolerance and scale of a MapReduce framework like Hadoop
• Leverages the advanced data processing techniques of an RDBMS
• Exposes a declarative interface to the user
• Goal: leverage the best of both worlds


Architecture of HadoopDB

EDBT 2011 Tutorial



Implementation of Big Data

MapReduce
Raw input: <key, value>
MAP → <K1, V1>, <K2, V2>, <K3, V3>
REDUCE


MapReduce Advantages

• Automatic parallelization:
– Depending on the size of the raw input data, instantiate multiple MAP tasks
– Similarly, depending on the number of intermediate <key, value> partitions, instantiate multiple REDUCE tasks
• Run-time:
– Data partitioning
– Task scheduling
– Handling machine failures
– Managing inter-machine communication
• Completely transparent to the programmer/analyst/user

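The two parallelization rules can be sketched in Python: one map task per input split, and a hash function that assigns every intermediate key to one of R reduce tasks, so all values for a key meet in the same reducer. This is similar in spirit to Hadoop's default hash-based partitioning, but the function names and the split size are hypothetical parameters for illustration. CRC32 is used instead of Python's `hash()` because string hashes change between interpreter runs.

```python
import math
import zlib


def num_map_tasks(input_bytes: int, split_bytes: int) -> int:
    """One map task per input split (the split size is an assumed parameter)."""
    return math.ceil(input_bytes / split_bytes)


def partition(key: str, num_reducers: int) -> int:
    """Pick the reduce task for an intermediate key; the same key always maps to the same task."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Distribute intermediate (key, value) pairs into per-reducer buckets."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


pairs = [("data", 1), ("city", 1), ("data", 1), ("smart", 1)]
print("map tasks for 1000 MB in 128 MB splits:", num_map_tasks(1000, 128))
for i, bucket in enumerate(route(pairs, num_reducers=2)):
    print(f"reducer {i}: {bucket}")
```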

MapReduce Model

• Google MapReduce (2004)
– Jeffrey Dean et al. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.
• Apache Hadoop (2005)
– http://hadoop.apache.org/
– http://developer.yahoo.com/hadoop/tutorial/
• Apache Hadoop 2.0 (2012)
– Vinod Kumar Vavilapalli et al. Apache Hadoop YARN: Yet Another Resource Negotiator. SoCC 2013.
– Separation between resource management and the computation model.

Page 152

Google MapReduce

[Figure: MapReduce execution overview. The user program (1) forks a master and several workers. The master (2) assigns map tasks and reduce tasks to workers. Map workers (3) read the input files (Split 0, Split 1, Split 2) and (4) write intermediate files to their local disks. Reduce workers (5) remotely read the intermediate files and (6) write the output files (Output File 0, Output File 1).]

• Mapper: split, read, emit intermediate key/value pairs
• Reducer: repartition, emit final output

Input files → Map phase → Intermediate files (on local disks) → Reduce phase → Output files
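The numbered steps can be simulated in one process to show how intermediate data is routed from map workers to reduce workers. This sketch assumes word count as the job, R = 2 reduce tasks, and a `hash(key) mod R` partitioner (the default described in the OSDI 2004 MapReduce paper), using `zlib.crc32` as the hash; in-memory lists stand in for the local intermediate files and the output files, and all function names are hypothetical.

```python
import zlib
from collections import defaultdict

R = 2  # number of reduce tasks, hence output files (Output File 0 and 1 in the figure)


def partition(key: str, r: int = R) -> int:
    # Deterministic hash partitioner: every map task sends a given key to the same reducer.
    # crc32 is used because Python's built-in hash() of str is randomized per process.
    return zlib.crc32(key.encode()) % r


def run_map_task(split: list[str]) -> list[list[tuple[str, int]]]:
    # (3) read a split, then (4) "write locally": one intermediate bucket per reducer
    buckets: list[list[tuple[str, int]]] = [[] for _ in range(R)]
    for line in split:
        for word in line.split():
            buckets[partition(word)].append((word, 1))
    return buckets


def run_reduce_task(r: int, map_outputs: list[list[list[tuple[str, int]]]]) -> dict[str, int]:
    # (5) "remote read": fetch bucket r from every map task and group values by key
    grouped: dict[str, list[int]] = defaultdict(list)
    for buckets in map_outputs:
        for key, value in buckets[r]:
            grouped[key].append(value)
    # (6) write: this reducer's output file, modeled here as a dict
    return {key: sum(values) for key, values in sorted(grouped.items())}


splits = [["bus stop", "bus lane"], ["stop light"], ["bus light"]]  # Split 0, 1, 2
map_outputs = [run_map_task(s) for s in splits]
output_files = [run_reduce_task(r, map_outputs) for r in range(R)]
print(output_files)
```

Because the partitioner is deterministic, all occurrences of a key end up in exactly one output file, which is what lets reducers work independently.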

Page 153

Two-way Joins and MapReduce

• MapReduce works with a single data source (table): <key K, value V>
• How can the MR framework be used to compute R(A, B) ⋈ S(B, C)?
• Simple extension (proposed independently by multiple researchers):
  – a tuple <a, b> from R is mapped as <b, [R, a]>
  – a tuple <b, c> from S is mapped as <b, [S, c]>
• During the reduce phase:
  – join the key-value pairs that have the same key but come from different relations
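The tagging scheme above (often called a reduce-side join) can be sketched in plain Python. The relations and their citizen/zone/facility values are hypothetical, the function names are illustrative, and the grouping by key that a real job gets from the framework's shuffle is done here with a dictionary.

```python
from collections import defaultdict


def join_map(relation: str, row: tuple) -> tuple:
    # R(a, b) -> <b, [R, a]>;  S(b, c) -> <b, [S, c]>  (the key is the join attribute B)
    if relation == "R":
        a, b = row
        return b, ("R", a)
    b, c = row
    return b, ("S", c)


def join_reduce(b, tagged_values: list) -> list[tuple]:
    # Pair every R-value with every S-value that shares the key b
    r_side = [v for tag, v in tagged_values if tag == "R"]
    s_side = [v for tag, v in tagged_values if tag == "S"]
    return [(a, b, c) for a in r_side for c in s_side]


def reduce_side_join(R: list[tuple], S: list[tuple]) -> list[tuple]:
    groups = defaultdict(list)  # shuffle: group tagged values by key B
    for rel, rows in (("R", R), ("S", S)):
        for row in rows:
            key, value = join_map(rel, row)
            groups[key].append(value)
    result = []
    for b, values in groups.items():
        result.extend(join_reduce(b, values))
    return sorted(result)


R = [("alice", "zoneA"), ("bob", "zoneB"), ("carol", "zoneA")]    # R(A = citizen, B = zone)
S = [("zoneA", "park"), ("zoneA", "school"), ("zoneC", "clinic")]  # S(B = zone, C = facility)
print(reduce_side_join(R, S))
```

Keys present in only one relation (zoneB, zoneC) produce no output, which gives inner-join semantics.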

Page 154

Beyond Two-way Joins?

• How can this be generalized to R(A, B) ⋈ S(B, C) ⋈ T(C, D) in MapReduce?
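One straightforward answer, not necessarily the one developed in the following slides, is to cascade two rounds of the two-way scheme: first join R and S on B, then join the result with T on C. The sketch below assumes that approach, uses hypothetical data, and names its helpers `mr_join` and `three_way_join` for illustration only. Approaches that join all three relations in a single round, by replicating tuples to several reducers, also exist and trade extra communication for fewer rounds.

```python
from collections import defaultdict


def mr_join(left: list[tuple], right: list[tuple], lkey: int, rkey: int) -> list[tuple]:
    """One MapReduce round: reduce-side join on left[lkey] = right[rkey]."""
    groups = defaultdict(lambda: ([], []))
    for row in left:                       # map: tag left tuples, key = join value
        groups[row[lkey]][0].append(row)
    for row in right:                      # map: tag right tuples, key = join value
        groups[row[rkey]][1].append(row)
    out = []
    for lrows, rrows in groups.values():   # reduce: cross product per key
        for lrow in lrows:
            for rrow in rrows:
                # keep the join column once by dropping it from the right tuple
                out.append(lrow + rrow[:rkey] + rrow[rkey + 1:])
    return out


def three_way_join(R, S, T):
    # Round 1: R(A, B) ⋈ S(B, C) -> tuples (A, B, C)
    rs = mr_join(R, S, lkey=1, rkey=0)
    # Round 2: (A, B, C) ⋈ T(C, D) -> tuples (A, B, C, D)
    return sorted(mr_join(rs, T, lkey=2, rkey=0))


R = [(1, "x"), (2, "y")]
S = [("x", 10), ("y", 20), ("y", 30)]
T = [(10, "p"), (30, "q")]
print(three_way_join(R, S, T))
```

The intermediate result of round 1 is written out and read again by round 2, so a cascade costs an extra pass over data that may be larger than any input relation.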

Page 155


Beyond Two-way Joins

Page 156

Beyond Two-way Joins?

Page 157

Challenges (1)

• Big Data Models and Algorithms
  – Foundational Models
  – Algorithms and Programming Techniques
  – Analytics and Metrics
  – Representation Formats for Multimedia Big Data
• Big Data Architectures
  – Big Data as a Service
  – Cloud Computing Techniques for Big Data
  – Big Data Open Platforms
  – Big Data in Mobile and Pervasive Computing
• Big Data Search and Mining
  – Algorithms and Systems for Big Data Search
  – Distributed and Peer-to-peer Search
  – Machine Learning based on Big Data
  – Visualization Analytics for Big Data

Page 158

Challenges (2)

• Big Data Management
  – Big Data Persistence and Preservation
  – Big Data Quality and Provenance Control
  – Management Issues of Social Network Big Data
• Big Data Protection, Integrity and Privacy
  – Models and Languages for Big Data Protection
  – Privacy Preserving Big Data Analytics
  – Big Data Encryption
• Security Applications of Big Data
  – Anomaly Detection in Very Large Scale Systems
  – Collaborative Threat Detection using Big Data Analytics

Page 159

• In general, data analytics and big data imply:
  – A distributed data architecture, where data can be stored and analyzed in real time.
  – A high-performance computing capability embedded in the architecture to filter and analyze data.
  – A set of services for operational and policy decision making, based on data analytics.
• Big data and data analytics make it possible:
  – To better understand the city: What? Where? Who? How?
  – To anticipate:
    • In the short term: traffic congestion, risks due to weather events…
    • In the long term: infrastructure needs, school needs…
  – To bring real, clear and understandable indicators (scoreboards) to the attention of the mayor.

Page 160

Pentaho Big Data Analytics

Complete Big Data Analytics & Visual Data Management

[Figure: data ingestion, manipulation and integration over relational, Hadoop, NoSQL and analytic databases, supporting enterprise & ad hoc reporting, data discovery, visualization and predictive analytics.]

Page 161

Business Intelligence (BI)

• Software that searches vast amounts of data to derive information for improved decision making

A management decision-support framework that empowers business users to understand data, resulting in actionable insights that improve the business.

Page 162

Business Intelligence enables the business to make intelligent, fact-based decisions

• Aggregate data: database, data mart, data warehouse, ETL tools, integration tools
• Present data: reporting tools, dashboards, static reports, mobile reporting, OLAP cubes
• Enrich data: add context to create information; descriptive statistics, benchmarks, variance to plan or last year (LY)
• Inform a decision: decisions are fact-based and data-driven

Page 163

The Evolution of Business Intelligence

[Figure: evolution of BI in scale across the 1990s, 2000s and 2010s.]

Image source: https://www.google.de/search?q=evolution+of+business+intelligence&newwindow=1&tbm=isch&tbo=u&source=univ&sa=X&ei=gEGoU5KXBuTb4QSGsoH4BQ&ved=0CDsQsAQ&biw=1366&bih=64

Page 164

Business Intelligence Process

Page 165

(source Heinz, 2014)

Page 166

[Figure: decision levels in an organization. Strategic (high direction), tactical (managers) and operational (operating staff); Business Intelligence supports strategy, while ERP supports day-to-day operations.]

Page 167

Why do companies need BI?

[Figure: sophistication of intelligence plotted against competitive advantage.]

Operational BI:
– Standard reports: What happened?
– Ad hoc reports: How many, how often, where?
– Query/drill down: Where exactly is the problem?
– Alerts: What actions are needed?

Tactical / Strategic BI:
– Statistical analysis: Why is this happening?
– Forecasting/extrapolation: What if these trends continue?
– Predictive modeling: What will happen next?
– Optimization: What's the best that can happen?

Page 168

source Heinz, 2014

Page 169

[Figure: Business Intelligence components supporting decision making: data warehouse and data marts; warehouse and document mart; report; data analysis and data mining; business modeling; knowledge management; "actionable" information; all within PROJECT MANAGEMENT.]

Page 170

Originally a term coined by the Gartner Group in 1993, Business Intelligence (BI) is a broad range of software and solutions aimed at collecting, consolidating and analyzing information, and at providing access to it so that users across the business can make better decisions.

The technology includes software for database query and analysis, multidimensional databases or OLAP tools, data warehousing and data mining, and web-enabled reporting capabilities.

BI is applied across disciplines, but especially in Customer Relationship Management (CRM), Supply Chain Management (SCM) and Enterprise Resource Planning (ERP).

It provides better, faster and more accessible reports.

Page 171

Core Capabilities of BI

Page 172

Benefits of Business Intelligence

• Improve management processes
  – planning, controlling, measuring and/or changing, resulting in increased revenues and reduced costs
• Improve operational processes
  – fraud detection, order processing, purchasing…, resulting in increased revenues and reduced costs
• Predict the future

Page 173

Stages in Business Intelligence

Page 174

Implementing a Successful Business Intelligence Strategy

1. JUSTIFY: determine the need for Business Intelligence
2. SELECT: determine the approach and acquire software
3. IMPLEMENT: execute based upon the selection
4. ADOPTION: focus on user adoption to ensure success!
5. EXPANSION: expand to new areas within your organization

Page 175

Justify BI: Business Intelligence Benefit Opportunity

» Customer/product profitability
» More competitive pricing
» Improved customer loyalty
» Integration of sales, delivery, billing and AR
» Real-time views across business processes
» Real-time alerts to operational problems
» Trend analysis on inventory & AR
» Real-time information for direct customer interaction
» Executive dashboards
» Consistent use of KPIs
» Real-time access to data
» Fewer silos between apps
» Reduced data entry
» Reduced report development costs
» Reduced error processing
» More efficient administrative processes

Page 176

BI Selection

Other considerations:
– Real time vs. nightly refresh
– Other data sources
– Speed of implementation
– Source system upgrades (JDE or PS)
– Impact on production system
– Cross-module reporting

[Figure: value over time for ERP reporting, direct connect with adapters, enterprise BI with no data warehouse, and enterprise BI with a pre-built data warehouse.]

Page 177

BI Implementation

YOU HAVE TO HAVE A PLAN

Requirements → Expectations → Execution → Rewards

Page 178

YOU HAVE TO HAVE A PLAN

BI Implementation

Requirements → Expectations → Execution → Rewards

A successful BI implementation involves:
• Gathering requirements
• Training
• Planning
• Project management

Page 179

YOU HAVE TO HAVE A PLAN

BI Implementation

Requirements → Expectations → Execution → Rewards

Expectations: don't set the expectation that EVERY issue will be solved.

Page 180

YOU HAVE TO HAVE A PLAN

BI Implementation

Requirements → Expectations → Execution → Rewards

Execution: define, identify, deliver
• Define the path
• Identify the team
• Keep deliverables set to 4–6 weeks

Page 181

YOU HAVE TO HAVE A PLAN

BI Implementation

Requirements → Expectations → Execution → Rewards

Rewards: tie to results. The upside potential in cost savings FAR outweighs the acquisition cost.

Page 182

BI Adoption

• The executive sponsor must help drive BI
• BI should be accessible to ALL levels of the organization
• Training
• Establish a BI Center of Excellence
• Ensure the BI solution is rolled out to the entire end-user community

Page 183

Data Warehousing

A data warehouse is a complete archive of an organization's data, beyond its transactional and operational information, stored in a database designed to facilitate efficient analysis and dissemination of data.

Page 184

Why a Data Warehouse?

• Missing data: decision support requires historical data, which operational DBs do not usually have
• Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
• Data quality: different sources typically use inconsistent data representations, codes and formats, which must be reconciled, etc.

Page 185

Data Warehouse

• Large Database
• Subject-Oriented
• Integrated
• Time-Variant
• Nonvolatile
• User-Friendly Interface

Page 186

Data Warehouse System

[Figure: operational DB, other DBs and external DBs feed the data warehouse, which supports reporting, data mining, knowledge management (KM) and expert systems.]

Page 187

[Figure: data warehouse architecture. Sources → integration → warehouse (with metadata) → query & analysis → clients.]

Page 188

Data Mart

The large amount of data in a data warehouse is sometimes subdivided into smaller logical units (data marts).

Page 189

Data Mart

• Mini data warehouses
• Hold subsets of data from the data warehouse
• Data focuses on a specific aspect of the company

Page 190

ETL (Extract, Transform, Load)

Page 191

ETL (Extract, Transform, Load)

Page 192

Page 193

Capture/Extract: obtains a subset of the data from the sources to load into the DW

(Source: Heinz, 2013)

Page 194

Cleanse: uses pattern recognition and AI technologies to improve data quality

(Source: Heinz, 2013)

Page 195

Transform: converts data from the relational databases into the DW format

(Source: Heinz, 2013)

Page 196

Load/Index: loads the data into the DW and creates indexes

(Source: Heinz, 2013)
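The four steps can be chained in a small sketch, with Python's built-in `sqlite3` module as a stand-in warehouse. The source rows, the cleansing rules (simple normalization rather than the pattern recognition and AI techniques mentioned above), the `sales_fact` table and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical operational source: inconsistent city codes and one malformed amount
source_rows = [
    {"id": 1, "city": "SFO ", "amount": "12", "status": "closed"},
    {"id": 2, "city": "la", "amount": "50", "status": "closed"},
    {"id": 3, "city": "NYC", "amount": "n/a", "status": "closed"},
    {"id": 4, "city": "sfo", "amount": "8", "status": "open"},
]


def extract(rows):
    # Capture/extract: take only the subset needed by the DW (closed transactions)
    return [r for r in rows if r["status"] == "closed"]


def cleanse(rows):
    # Cleanse: normalize codes and drop rows whose amount cannot be parsed
    clean = []
    for r in rows:
        try:
            amount = int(r["amount"])
        except ValueError:
            continue
        clean.append({"id": r["id"], "city": r["city"].strip().lower(), "amount": amount})
    return clean


def transform(rows):
    # Transform: reshape into the DW fact-table format (tuples in column order)
    return [(r["id"], r["city"], r["amount"]) for r in rows]


def load(conn, facts):
    # Load/index: insert into the warehouse table and build an index for queries
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact (id INTEGER, city TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", facts)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_city ON sales_fact(city)")
    conn.commit()


dw = sqlite3.connect(":memory:")
load(dw, transform(cleanse(extract(source_rows))))
print(dw.execute("SELECT city, SUM(amount) FROM sales_fact GROUP BY city ORDER BY city").fetchall())
```

In a production pipeline each step would usually be a separate, restartable job, so that a failed load does not force a fresh extract from the operational systems.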

Page 197

The five styles of BI

1. Enterprise reporting

2. Cube analysis

3. Ad hoc querying and analysis

4. Statistical analysis and data mining

5. Report delivery and alerting

Page 198

Dimensional models

Dimensional modeling is a logical design technique commonly used for data warehouses; it seeks to present data in a standard architecture that gives end users high-performance access.

The model is based on star schemas, made of fact tables and dimension tables (e.g., cubes).

Multidimensionality: the ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions).

Page 199

Star Schemas

• A star schema is a common organization for data in a warehouse. It consists of:
  1. Fact table: a very large accumulation of facts such as sales.
     – Often "insert-only."
  2. Dimension tables: smaller, generally static information about the entities involved in the facts.

Page 200

Page 201

Example star schema: a sale fact table with customer, product and store dimension tables.

customer
custId  name   address    city
53      joe    10 main    sfo
81      fred   12 main    sfo
111     sally  80 willow  la

product
prodId  name  price
p1      bolt  10
p2      nut   5

store
storeId  city
c1       nyc
c2       sfo
c3       la

sale
orderId  date    custId  prodId  storeId  qty  amt
o100     1/7/97  53      p1      c1       1    12
o102     2/7/97  53      p2      c1       2    11
o105     3/8/97  111     p1      c3       5    50
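The example star schema can be loaded into SQLite to show a typical star join, where the fact table is joined to the dimension tables it references. The sketch assumes the third order ID is `o105` and stores dates as text; the query itself is an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT);
CREATE TABLE product  (prodId TEXT PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE store    (storeId TEXT PRIMARY KEY, city TEXT);
-- fact table: one row per sale, with foreign keys into each dimension
CREATE TABLE sale (
    orderId TEXT PRIMARY KEY, date TEXT,
    custId  INTEGER REFERENCES customer(custId),
    prodId  TEXT    REFERENCES product(prodId),
    storeId TEXT    REFERENCES store(storeId),
    qty INTEGER, amt INTEGER);
INSERT INTO customer VALUES (53, 'joe', '10 main', 'sfo'), (81, 'fred', '12 main', 'sfo'),
                            (111, 'sally', '80 willow', 'la');
INSERT INTO product VALUES ('p1', 'bolt', 10), ('p2', 'nut', 5);
INSERT INTO store   VALUES ('c1', 'nyc'), ('c2', 'sfo'), ('c3', 'la');
INSERT INTO sale VALUES ('o100', '1/7/97', 53, 'p1', 'c1', 1, 12),
                        ('o102', '2/7/97', 53, 'p2', 'c1', 2, 11),
                        ('o105', '3/8/97', 111, 'p1', 'c3', 5, 50);
""")

# Star join: the fact table joined to the dimensions it references, then aggregated
query = """
SELECT p.name, s.city, SUM(f.amt)
FROM sale f
JOIN product p ON f.prodId = p.prodId
JOIN store   s ON f.storeId = s.storeId
GROUP BY p.name, s.city
ORDER BY p.name, s.city
"""
print(conn.execute(query).fetchall())
```

Every query follows the same shape: the large fact table in the middle, small dimension tables joined around it, and grouping on dimension attributes.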

Page 202

Fact table view:

sale
prodId  storeId  amt
p1      c1       12
p2      c1       11
p1      c3       50
p2      c2       8

Multi-dimensional cube (dimensions = 2):

      c1   c2   c3
p1    12        50
p2    11   8

Cube: a subset of highly interrelated data that is organized to allow users to combine any attributes in a cube (e.g., stores, products, customers, suppliers) with any metrics in the cube (e.g., sales, profit, units, age) to create various two-dimensional views, or slices, that can be displayed on a computer screen.
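A minimal sketch of how the fact table view becomes the two-dimensional cube: each (product, store) coordinate holds the summed measure, and cells with no facts stay empty. The dictionary representation and the `build_cube`/`show` names are illustrative assumptions, not how an OLAP server stores cubes.

```python
from collections import defaultdict

# Fact table rows: (prodId, storeId, amt), as in the fact table view
facts = [("p1", "c1", 12), ("p2", "c1", 11), ("p1", "c3", 50), ("p2", "c2", 8)]


def build_cube(rows):
    # Each (product, store) coordinate holds the summed measure; empty cells are absent
    cube = defaultdict(int)
    for prod, store, amt in rows:
        cube[(prod, store)] += amt
    return dict(cube)


def show(cube, prods, stores):
    # Render one 2-D view: products as rows, stores as columns, blanks for empty cells
    print("    " + "".join(f"{s:>5}" for s in stores))
    for p in prods:
        cells = "".join(f"{cube[(p, s)]:>5}" if (p, s) in cube else " " * 5 for s in stores)
        print(f"{p:<4}{cells}")


cube = build_cube(facts)
show(cube, ["p1", "p2"], ["c1", "c2", "c3"])
```

Storing only non-empty cells keeps the representation compact when, as here, most product/store combinations have no sales.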

Page 203

3-D Cube

Fact table view:

sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

Multi-dimensional cube (dimensions = 3):

day 1
      c1   c2   c3
p1    12        50
p2    11   8

day 2
      c1   c2   c3
p1    44   4
p2

Page 204

MEASURE

The word "measure" means exactly what it says: a number that we want to analyze, what we want to measure in our analysis.

In this example, 410 is the number of packages delivered.

Page 205

Page 206

DIMENSION

The business attribute that "describes" the measure.

In this example, the 410 measure has important context: it represents the intersection of:
- Route
- Source
- Time

Page 207

MEASURE CONTEXT

Specifically, the 410 packages are related to:
- Route: Non-Ground / Air
- Source: Eastern Hemisphere / Australia
- Time: 2nd Half / 4th Quarter / November 27, 1999

Page 208

HIERARCHY

Literally, the levels from the highest-level "grain" to the most detailed grain. Think: Year/Qtr/Month/Week/Day.

In this example, the Source dimension can be drilled down into increasing levels of detail. Each time we do this, the cube recalculates all measures at the intersections.
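Drilling down or rolling up a hierarchy amounts to re-aggregating the same facts at a finer or coarser grain. The sketch below assumes a Year/Quarter/Month/Day time hierarchy (Week is omitted) and uses hypothetical delivery counts, not the 410-package example; `LEVELS` and `aggregate` are illustrative names.

```python
from collections import defaultdict
from datetime import date

# Hypothetical delivery facts: (delivery date, packages delivered)
deliveries = [(date(1999, 11, 27), 40), (date(1999, 11, 28), 10),
              (date(1999, 12, 3), 20), (date(1999, 2, 14), 5)]

# Each level of the Year/Qtr/Month/Day hierarchy is computed from the date
LEVELS = {
    "year": lambda d: (d.year,),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "month": lambda d: (d.year, d.month),
    "day": lambda d: (d.year, d.month, d.day),
}


def aggregate(facts, level):
    # Re-aggregate the measure at the chosen grain (roll-up or drill-down)
    totals = defaultdict(int)
    for d, packages in facts:
        totals[LEVELS[level](d)] += packages
    return dict(sorted(totals.items()))


for level in ("year", "quarter", "month"):
    print(level, aggregate(deliveries, level))
```

Moving from "year" to "month" is a drill-down; the totals at every level always add up to the same grand total.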

Page 209

• OLTP: Online Transaction Processing (DBMSs)

• OLAP: Online Analytical Processing (Data Warehousing)

• RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)

Page 210

OLAP

• On-Line Analytical Processing
  – Drill-down
  – Consolidation
  – Slicing and dicing

Page 211

[Figure: OLAP operations: roll up, drill down, pivot (rotate), slice and dice.]
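The operations in the figure can be expressed on the 3-D cube built from the sale fact table (product × store × day). This is a dictionary-based sketch under the assumption that roll-up means aggregating away whole dimensions; drill-down is the reverse step back to the finer cube, and real OLAP engines also roll up along hierarchies. The function names are illustrative.

```python
from collections import defaultdict

# 3-D cube from the sale fact table: (prodId, storeId, day) -> amt
cube = {("p1", "c1", 1): 12, ("p2", "c1", 1): 11, ("p1", "c3", 1): 50,
        ("p2", "c2", 1): 8, ("p1", "c1", 2): 44, ("p1", "c2", 2): 4}

DIMS = ("prodId", "storeId", "day")


def roll_up(cube, keep):
    # Roll up: sum the measure over every dimension not listed in `keep`
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for coord, amt in cube.items():
        out[tuple(coord[i] for i in idx)] += amt
    return dict(out)


def slice_(cube, dim, value):
    # Slice: fix one dimension to a single value, leaving a lower-dimensional cube
    i = DIMS.index(dim)
    return {c[:i] + c[i + 1:]: v for c, v in cube.items() if c[i] == value}


def dice(cube, **allowed):
    # Dice: keep the sub-cube whose coordinates fall in the allowed value sets
    return {c: v for c, v in cube.items()
            if all(c[DIMS.index(d)] in vals for d, vals in allowed.items())}


def pivot(cube2d):
    # Pivot (rotate) a 2-D view: swap its rows and columns
    return {(b, a): v for (a, b), v in cube2d.items()}


day1 = slice_(cube, "day", 1)           # the day-1 table of the 3-D cube
by_product = roll_up(cube, ["prodId"])  # {('p1',): 110, ('p2',): 19}
print(day1, by_product, dice(cube, storeId={"c1", "c2"}, day={2}), pivot(day1), sep="\n")
```

Slice and dice only select cells, pivot only relabels them, and roll-up is the one operation here that changes the measure values.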

Page 212

Limitations of dimensionality

– The multidimensional database can take up significantly more storage space than a summarized relational database

– Multidimensional products cost significantly more than standard relational products

– Database loading consumes significant system resources and time, depending on data volume and the number of dimensions

– Interfaces and maintenance are more complex in multidimensional databases than in relational databases

Page 213

OLAP versus OLTP

– OLTP concentrates on processing repetitive transactions in large quantities and conducting simple manipulations

– OLAP involves examining complex relationships among many data items

– OLAP may analyze relationships and look for patterns, trends, and exceptions

– OLAP is a direct decision support method

Page 214

Aggregates

• Operators: sum, count, max, min, median, avg
• "HAVING" clause
• Using the dimension hierarchy:
  – average by region (within store)
  – maximum by month (within date)
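A small SQLite sketch of a `HAVING` clause and of an aggregate computed one level up a dimension hierarchy. It reuses the sale rows from the following slides and assumes a hypothetical `region` level above store (c1 in the east, c2 and c3 in the west). SQLite, used here, has no built-in median aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER);
-- hypothetical store dimension with a region level above store
CREATE TABLE store (storeId TEXT PRIMARY KEY, city TEXT, region TEXT);
INSERT INTO sale VALUES ('p1', 'c1', 1, 12), ('p2', 'c1', 1, 11), ('p1', 'c3', 1, 50),
                        ('p2', 'c2', 1, 8),  ('p1', 'c1', 2, 44), ('p1', 'c2', 2, 4);
INSERT INTO store VALUES ('c1', 'nyc', 'east'), ('c2', 'sfo', 'west'), ('c3', 'la', 'west');
""")

# HAVING filters groups after aggregation (WHERE filters rows before it)
big_products = conn.execute("""
    SELECT prodId, SUM(amt) FROM sale
    GROUP BY prodId HAVING SUM(amt) > 50""").fetchall()

# Climbing the store hierarchy: average sale amount per region
avg_by_region = conn.execute("""
    SELECT st.region, ROUND(AVG(s.amt), 2) FROM sale s
    JOIN store st ON s.storeId = st.storeId
    GROUP BY st.region ORDER BY st.region""").fetchall()

print(big_products, avg_by_region)
```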

Page 215

Aggregates

Table sale:

prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

• Add up amounts for day 1

• In SQL: SELECT sum(amt) FROM SALE WHERE date = 1

• Result: 81


Aggregates

• Using the same sale table, add up amounts by day

• In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

• Result:

date  sum
1     81
2     48


• Using the same sale table, add up amounts by day and product

• In SQL: SELECT prodId, date, sum(amt) FROM SALE GROUP BY date, prodId

• Result:

prodId  date  sum
p1      1     62
p2      1     19
p1      2     48

• Going from the per-day totals to these per-day, per-product totals is a drill-down; going back from them to the per-day totals is a roll-up.
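The three SALE queries can be checked end to end with a small sketch. It assumes Python's standard sqlite3 module and an in-memory database; the column types are assumptions. The third query selects prodId so that each total can be attributed to a product, and the last step shows that rolling the per-day, per-product totals back up by date reproduces the per-day totals.

```python
import sqlite3

# The sale table from the slides: (prodId, storeId, date, amt).
ROWS = [
    ("p1", "c1", 1, 12),
    ("p2", "c1", 1, 11),
    ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),
    ("p1", "c1", 2, 44),
    ("p1", "c2", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", ROWS)

# A single aggregate: the total for day 1.
day1_total = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]

# Coarser level: one total per day.
per_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

# Finer level (drill-down): one total per day and product.
per_day_product = conn.execute(
    "SELECT prodId, date, sum(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

# Roll-up: summing the finer totals by date reproduces the per-day totals.
rolled_up = {}
for _prod, day, total in per_day_product:
    rolled_up[day] = rolled_up.get(day, 0) + total
assert sorted(rolled_up.items()) == per_day

print(day1_total)       # 81
print(per_day)          # [(1, 81), (2, 48)]
print(per_day_product)  # [('p1', 1, 62), ('p2', 1, 19), ('p1', 2, 48)]
```

The ORDER BY clauses are only there to make the output order deterministic; the slides' queries do not need them.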


Business intelligence and analytics (BI&A)


• Complete solutions

Pentaho, JasperReports, SpagoBI, BIRT

• ETL tools

Clover, Enhydra Octopus

• OLAP developments

Mondrian, JPivot

• Dashboards

JetSpeed, JBoss Portal


What is Data Mining?

• Discovery of useful, possibly unexpected, patterns in data

• Non-trivial extraction of implicit, previously unknown and potentially useful information from data

• Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns


Data mining is a collection of techniques for the efficient, automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases.

The patterns must be actionable so that they may be used in an enterprise’s decision making.


Valid: The patterns hold in general.

Novel: We did not know the pattern beforehand.

Useful: We can devise actions from the patterns.

Understandable: We can interpret and comprehend the patterns.

… discover valid, novel, potentially useful, and ultimately understandable patterns in data.


Examples

• Amazon.com uses association analysis: recommendations to customers are based on their past purchases and on what other customers are purchasing.

• The US retailer “Just for Feet” has about 200 stores, each carrying up to 6000 shoe styles, each style in several sizes. Data mining is used to decide which shoes to stock in which store.

• More examples in case studies to be discussed later.


Data Mining vs. KDD

• Knowledge Discovery in Databases (KDD): the process of finding useful information and patterns in data.

• Data Mining: Use of algorithms to extract the information and patterns derived by the KDD process.


Knowledge Discovery Process

– Data mining: the core of the knowledge discovery process.

Figure (process stages): Databases → Data Cleaning, Data Integration → Preprocessed Data → Selection, Data Transformations → Task-relevant Data → Data Mining → Knowledge Interpretation


DATA MINING

• Similarity Measures
• Hierarchical Clustering
• IR Systems
• Imprecise Queries
• Textual Data
• Web Search Engines
• Bayes Theorem
• Regression Analysis
• EM Algorithm
• K-Means Clustering
• Time Series Analysis
• Neural Networks
• Decision Tree Algorithms
• Algorithm Design Techniques
• Algorithm Analysis
• Data Structures
• Relational Data Model
• SQL
• Association Rule Algorithms
• Data Warehousing
• Scalability Techniques


Data Mining Process

Successful data mining involves carefully determining the aims and selecting appropriate data. The following steps should normally be followed:

1. Requirements analysis
2. Data selection and collection
3. Cleaning and preparing data
4. Data mining exploration and validation
5. Implementing, evaluating and monitoring
6. Results visualisation


Requirements Analysis

The enterprise decision makers need to formulate the goals that the data mining process is expected to achieve. The business problem must be clearly defined: data mining cannot be used effectively without a good idea of what kind of outcomes the enterprise is looking for.

If the objectives have been clearly defined, it is easier to evaluate the results of the project.


Preprocessing

• A data mining process would normally involve preprocessing

• Data mining applications often use data warehousing

• One approach is to pre-mine the data, warehouse it, then carry out data mining

• The process is usually iterative and can take years of effort for a large project

• Preprocessing is very important, although it is often considered too mundane to be taken seriously

• Preprocessing may also be needed after the data warehouse phase

• Data reduction may be needed to transform very high-dimensional data into lower-dimensional data


Data Selection and Collection

Find the best source databases for the data that is required.

If the enterprise has implemented a data warehouse, most of the data may already be available there. Otherwise, the source OLTP systems need to be identified, and the required information extracted and stored in some temporary system.

In some cases, only a sample of the data available may be required.


Cleaning and Preparing Data

This may not be an onerous task if a data warehouse containing the required data exists, since most of this work will already have been done when the data was loaded into the warehouse.

Otherwise this task can be very resource intensive; perhaps more than 50% of the effort in a data mining project is spent on this step. Essentially, a data store that integrates data from a number of databases may need to be created. When integrating data, one often encounters problems such as identifying data, dealing with missing data, data conflicts and ambiguity. An ETL (extraction, transformation and loading) tool may be used to overcome these problems.


Exploration and Validation

Assuming that the user has access to one or more data mining tools, a data mining model may be constructed based on the enterprise’s needs. It may be possible to take a sample of the data and apply a number of relevant techniques. For each technique, the results should be evaluated and their significance interpreted.

This is likely to be an iterative process, which should lead to the selection of one or more techniques that are suitable for further exploration, testing and validation.


Implementing, Evaluating and Monitoring

Once a model has been selected and validated, it can be implemented for use by the decision makers. This may involve software development for generating reports, or for visualising and explaining the results to managers. If more than one technique is available for the given data mining task, it is necessary to evaluate the results and choose the best one. This may involve checking the accuracy and effectiveness of each technique. The performance of the implemented techniques must be monitored regularly: every enterprise evolves over time, and so must its data mining system. Monitoring may from time to time lead to refinement of the tools and techniques that have been implemented.


CRISP-DM (Cross-Industry Standard Process for Data Mining) Model


Data Mining Tasks

• Classification [Predictive]

• Clustering [Descriptive]

• Association Rule Discovery [Descriptive]

• Sequential Pattern Discovery [Descriptive]

• Regression [Predictive]

• Deviation Detection [Predictive]

• Collaborative Filtering [Predictive]


• Association analysis

• Classification and prediction

• Cluster analysis

• Web data mining

• Search Engines

• Data warehouse and OLAP

• Others, for example sequential patterns and time-series analysis


Association Analysis

• Association analysis involves discovery of relationships or correlations among a set of items.

• Example: discovering that personal loans are repaid with 80% confidence when the borrower owns their home.

• The classic example is a store that discovered that people buying nappies also tend to buy beer.

• Association rules are often written as X → Y, meaning that whenever X appears, Y also tends to appear. X and Y may be collections of attributes.


Association Rules

• Identify dependencies in the data:

– X makes Y likely

• Indicate significance of each dependency

• Bayesian methods

Uses:

• Targeted marketing

Technologies:

• AIS, SETM, Hugin, TETRAD II

Example: “Find groups of items commonly purchased together”

– People who purchase fish are extraordinarily likely to purchase wine

– People who purchase turkey are extraordinarily likely to purchase cranberries

Date/Time   Register  Fish  Turkey  Cranberries  Wine  …
12/6 13:15  2         N     Y       Y            Y     …
12/6 13:16  3         Y     N       N            Y     …


Confidence and Support

We prune the set of all possible association rules using two interestingness measures:

• Confidence of a rule: X => Y has confidence c if P(Y|X) = c

• Support of a rule: X => Y has support s if P(XY) = s, i.e., a fraction s of all transactions contain both X and Y

We can also define:

• Support of a co-occurrence XY: XY has support s if P(XY) = s


• Example rule: {Pen} => {Milk}, support 75%, confidence 75%

• Another example: {Ink} => {Pen}, support 75%, confidence 100%

TID  CID  Date    Item   Qty
111  201  5/1/99  Pen    2
111  201  5/1/99  Ink    1
111  201  5/1/99  Milk   3
111  201  5/1/99  Juice  6
112  105  6/3/99  Pen    1
112  105  6/3/99  Ink    1
112  105  6/3/99  Milk   1
113  106  6/5/99  Pen    1
113  106  6/5/99  Milk   1
114  201  7/1/99  Pen    2
114  201  7/1/99  Ink    2
114  201  7/1/99  Juice  4
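The two example rules can be checked by grouping the item rows into one basket per transaction (TID) and counting. This sketch ignores the CID, Date and Qty columns, which the support and confidence definitions do not use; the function names are illustrative, and confidence assumes the left-hand side occurs in at least one transaction.

```python
from collections import defaultdict

# Item rows of the transaction table: (TID, CID, Date, Item, Qty).
ROWS = [
    (111, 201, "5/1/99", "Pen", 2), (111, 201, "5/1/99", "Ink", 1),
    (111, 201, "5/1/99", "Milk", 3), (111, 201, "5/1/99", "Juice", 6),
    (112, 105, "6/3/99", "Pen", 1), (112, 105, "6/3/99", "Ink", 1),
    (112, 105, "6/3/99", "Milk", 1),
    (113, 106, "6/5/99", "Pen", 1), (113, 106, "6/5/99", "Milk", 1),
    (114, 201, "7/1/99", "Pen", 2), (114, 201, "7/1/99", "Ink", 2),
    (114, 201, "7/1/99", "Juice", 4),
]

def baskets(rows):
    """Group item rows into one set of items per transaction ID."""
    by_tid = defaultdict(set)
    for tid, _cid, _date, item, _qty in rows:
        by_tid[tid].add(item)
    return list(by_tid.values())

def support(itemset, transactions):
    """Fraction of transactions containing every item of itemset, i.e. P(itemset)."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """P(rhs | lhs) = support(lhs together with rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

tx = baskets(ROWS)
print(support({"Pen", "Milk"}, tx), confidence({"Pen"}, {"Milk"}, tx))  # 0.75 0.75
print(support({"Ink", "Pen"}, tx), confidence({"Ink"}, {"Pen"}, tx))    # 0.75 1.0
```

Ink appears in three of the four transactions and Pen in all four, so {Ink} => {Pen} holds in every transaction that contains Ink (confidence 100%) while covering only 75% of all transactions (support 75%).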


Sequential Associations

• Find event sequences that are unusually likely

• Requires a “training” event list and known “interesting” events

• Must be robust in the face of additional “noise” events

Uses:

• Failure analysis and prediction

Technologies:

• Dynamic programming (dynamic time warping)

• “Custom” algorithms

Example: “Find common sequences of warnings/faults within 10-minute periods”

– Warn 2 on Switch C preceded by Fault 21 on Switch B

– Fault 17 on any switch preceded by Warn 2 on any switch

Time   Switch  Event
21:10  B       Fault 21
21:11  A       Warn 2
21:13  C       Warn 2
21:20  A       Fault 17
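A brute-force way to find the two example patterns in the event log is to take each event that matches the first pattern and scan the later events inside the 10-minute window. This is a simplified sketch, not dynamic time warping: it only finds occurrences in a single log and does not decide whether a sequence is “unusually likely”, which would need counts over many periods compared against a baseline. The (switch, event) pattern format, with None meaning “any switch”, is an assumption.

```python
from datetime import datetime, timedelta

# Event log from the table: (time, switch, event).
LOG = [
    ("21:10", "B", "Fault 21"),
    ("21:11", "A", "Warn 2"),
    ("21:13", "C", "Warn 2"),
    ("21:20", "A", "Fault 17"),
]

def matches(pattern, switch, event):
    """pattern is (switch, event); a switch of None matches any switch."""
    want_switch, want_event = pattern
    return event == want_event and want_switch in (None, switch)

def preceded_within(log, first, then, window=timedelta(minutes=10)):
    """Return (earlier, later) pairs where an event matching `first` is followed,
    at most `window` later, by an event matching `then`."""
    events = sorted((datetime.strptime(t, "%H:%M"), sw, ev) for t, sw, ev in log)
    pairs = []
    for i, (t1, sw1, ev1) in enumerate(events):
        if not matches(first, sw1, ev1):
            continue
        for t2, sw2, ev2 in events[i + 1:]:
            if t2 - t1 > window:
                break  # events are time-ordered, so nothing later can qualify
            if matches(then, sw2, ev2):
                pairs.append(((sw1, ev1), (sw2, ev2)))
    return pairs

# [(('B', 'Fault 21'), ('C', 'Warn 2'))]
print(preceded_within(LOG, ("B", "Fault 21"), ("C", "Warn 2")))
# [(('A', 'Warn 2'), ('A', 'Fault 17')), (('C', 'Warn 2'), ('A', 'Fault 17'))]
print(preceded_within(LOG, (None, "Warn 2"), (None, "Fault 17")))
```

Both Warn 2 events fall within the window before Fault 17 (9 and 7 minutes earlier), so the “any switch” pattern is found twice.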


Classification and Prediction

A set of training objects, each with a number of attribute values, is given to the classifier. The classifier formulates rules for each class in the training set so that the rules may be used to classify new objects. Some techniques do not require training data.

Classification may be used for predicting the class label of data objects. A number of techniques exist, including decision trees and neural networks.


• Given a collection of records (training set)

– Each record contains a set of attributes; one of the attributes is the class.

• Find a model for the class attribute as a function of the values of the other attributes.

• Goal: previously unseen records should be assigned a class as accurately as possible.

– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
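The training/test methodology can be sketched as a holdout split plus an accuracy function. The 70/30 split, the fixed random seed and the majority-class model (a placeholder standing in for a real classifier such as a decision tree) are assumptions; the records are the ten Age/Car-type examples used in this section.

```python
import random
from collections import Counter

# Ten (features, class) records from the Age / Car-type training database.
DATA = [
    ((20, "M"), "Yes"), ((30, "M"), "Yes"), ((25, "T"), "No"),
    ((30, "S"), "Yes"), ((40, "S"), "Yes"), ((20, "T"), "No"),
    ((30, "M"), "Yes"), ((25, "M"), "Yes"), ((40, "M"), "Yes"),
    ((20, "S"), "No"),
]

def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, test_set):
    """Fraction of (features, label) pairs for which predict(features) == label."""
    correct = sum(predict(x) == y for x, y in test_set)
    return correct / len(test_set)

def train_majority(training_set):
    """Placeholder model: always predict the most frequent training label."""
    majority = Counter(y for _, y in training_set).most_common(1)[0][0]
    return lambda _features: majority

train, test = holdout_split(DATA, test_fraction=0.3, seed=0)
model = train_majority(train)
print(len(train), len(test), accuracy(model, test))
```

The key design point is that the model never sees the test records while it is being built, so the accuracy it achieves on them estimates how it will do on previously unseen records.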


Example application: telemarketing


Classification

• Decision trees are one approach to classification.

• Other approaches include:

– Linear Discriminant Analysis

– k-nearest neighbor methods

– Logistic regression

– Neural networks

– Support Vector Machines


• Training database:

– Two predictor attributes: Age and Car-type (Sport, Minivan and Truck)

– Age is ordered; Car-type is a categorical attribute

– The class label indicates whether the person bought the product

– The dependent attribute is categorical

Age  Car  Class
20   M    Yes
30   M    Yes
25   T    No
30   S    Yes
40   S    Yes
20   T    No
30   M    Yes
25   M    Yes
40   M    Yes
20   S    No


Types of Variables

• Numerical: Domain is ordered and can be represented on the real line (e.g., age, income)

• Nominal or categorical: Domain is a finite set without any natural ordering (e.g., occupation, marital status, race)

• Ordinal: Domain is ordered, but the absolute differences between values are unknown (e.g., preference scale, severity of an injury)


Decision Trees

• A decision tree T encodes d (a classifier or regression function) in the form of a tree.

• A node t in T without children is called a leaf node. Otherwise t is called an internal node.

Figure: an example tree and the partition it induces over Age (0 to 60) and Car Type. The root tests Age: Age >= 30 gives YES; Age < 30 leads to a Car Type test, where Minivan gives YES and Sports or Truck gives NO.
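Reading the example tree as Age >= 30 → YES, otherwise Car Type Minivan → YES and Sports or Truck → NO, it can be hand-coded and checked against the ten training records from the Age/Car-type table. Expanding M, S and T to Minivan, Sports and Truck is an assumption based on the listed car types.

```python
# Training records from the Age / Car-type table: (age, car_type, bought).
TRAINING = [
    (20, "Minivan", "Yes"), (30, "Minivan", "Yes"), (25, "Truck", "No"),
    (30, "Sports", "Yes"), (40, "Sports", "Yes"), (20, "Truck", "No"),
    (30, "Minivan", "Yes"), (25, "Minivan", "Yes"), (40, "Minivan", "Yes"),
    (20, "Sports", "No"),
]

def classify(age, car_type):
    """Hand-coded example tree: internal nodes test Age, then Car Type;
    each return statement is a leaf holding a class label."""
    if age >= 30:
        return "Yes"          # leaf: Age >= 30
    if car_type == "Minivan":
        return "Yes"          # leaf: Age < 30, Minivan
    return "No"               # leaf: Age < 30, Sports or Truck

correct = sum(classify(age, car) == label for age, car, label in TRAINING)
print(f"{correct}/{len(TRAINING)} training records classified correctly")  # 10/10
```

A tree-learning algorithm would choose the tests (Age first, then Car Type) and the split point (30) from the data; here they are simply copied from the figure.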


Cluster Analysis

Cluster analysis is similar to classification in that the aim is to build groups (clusters), but each cluster should be similar within itself and dissimilar to the others. Unlike classification, clustering does not rely on class-labeled data objects.

It is based on the principle of maximizing the intracluster similarity and minimizing the intercluster similarity.


• Output: (k) groups of records called clusters, such that the records within a group are more similar to each other than to records in other groups

– Representative points for each cluster

– Labeling of each record with its cluster number

– Other descriptions of each cluster

• This is unsupervised learning: no record labels are given to learn from

• Usage:

– Exploratory data mining

– Preprocessing step (e.g., outlier detection)


Figure: records plotted along three dimensions (age, income, education).


• Example input database: two numerical variables

• How many groups are here?

Age  Salary
20   40
25   50
24   45
23   50
40   80
45   85
42   87
35   82
70   30

Figure: Customer Demographics, a scatter plot of the customers with Age (0 to 80) on the horizontal axis and Salary in $10K (0 to 100) on the vertical axis.


• Requirements: need to define “similarity” between records

• Important: use the “right” similarity (distance) function

– Scale or normalize all attributes, e.g., so that attributes measured in seconds, hours or days become comparable

– Assign different weights to reflect the importance of each attribute

– Choose an appropriate measure (e.g., L1 (Manhattan) or L2 (Euclidean) distance)
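The three recommendations can be illustrated with a short sketch: min-max scaling of each attribute to [0, 1], plus weighted L1 and L2 distances. Min-max scaling is one common normalization among several (the slide does not prescribe one), and the weights below are hypothetical; the data are the Age/Salary records from the customer example.

```python
import math

# Age / Salary records from the customer example.
CUSTOMERS = [(20, 40), (25, 50), (24, 45), (23, 50), (40, 80),
             (45, 85), (42, 87), (35, 82), (70, 30)]

def min_max_normalize(rows):
    """Rescale every attribute (column) to [0, 1] so units alone do not dominate."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def weighted_l1(a, b, weights):
    """Weighted Manhattan (L1) distance."""
    return sum(w * abs(x - y) for x, y, w in zip(a, b, weights))

def weighted_l2(a, b, weights):
    """Weighted Euclidean (L2) distance."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

norm = min_max_normalize(CUSTOMERS)
weights = (1.0, 2.0)  # hypothetical: treat Salary as twice as important as Age
print(weighted_l1(norm[0], norm[4], weights), weighted_l2(norm[0], norm[4], weights))
```

Without the normalization step, whichever attribute happens to have the larger numeric range would dominate both distances.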


Similarity Measures

• Determine similarity between two objects.

• Similarity characteristics:

• Alternatively, a distance measures how unlike or dissimilar objects are.


Distance Measures

• Measure dissimilarity between objects


Approaches

• Centroid-based: assume we have k clusters, guess at the centers, and assign points to the nearest center (e.g., K-means); over time, the centroids shift

• Hierarchical: assume there is one cluster per point, and repeatedly merge nearby clusters using some distance threshold

Scalability: do this with the fewest number of passes over the data, ideally sequentially
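The hierarchical approach can be sketched as below. The slide does not fix how the distance between two clusters is measured; this sketch assumes single-link (distance between the closest members) with Euclidean distance via math.dist (Python 3.8+), and an illustrative threshold of 10 on the raw Age/Salary values. It is a brute-force version for small data, not a scalable one.

```python
import math

# Age / Salary records from the customer example.
CUSTOMERS = [(20, 40), (25, 50), (24, 45), (23, 50), (40, 80),
             (45, 85), (42, 87), (35, 82), (70, 30)]

def agglomerative(points, threshold):
    """Start with one cluster per point; repeatedly merge the two closest clusters
    (single-link: distance between their closest members) until no two clusters
    are within threshold of each other."""
    clusters = [[p] for p in points]

    def link(a, b):
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > 1:
        d, i, j = min(
            (link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break  # no remaining pair is close enough to merge
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

clusters = agglomerative(CUSTOMERS, threshold=10)
print(sorted(len(c) for c in clusters))  # [1, 4, 4]
```

With this threshold the sketch answers the “How many groups are here?” question with three: the four younger, lower-salary customers, the four customers around age 40, and the single (70, 30) record.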


K-means Clustering Algorithm

• Choose k initial means

• Assign each point to the cluster with the closest mean

• Compute new mean for each cluster

• Iterate until the k means stabilize
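The four K-means steps map directly onto the loop below. This is a minimal sketch assuming Euclidean distance and hand-picked initial means (the records (20, 40), (40, 80) and (70, 30) from the customer example); the slide does not say how the initial means are chosen, and random or k-means++ initialization is common in practice. The attributes are not normalized here.

```python
import math

# Age / Salary records from the customer example.
CUSTOMERS = [(20, 40), (25, 50), (24, 45), (23, 50), (40, 80),
             (45, 85), (42, 87), (35, 82), (70, 30)]

def k_means(points, initial_means, max_iter=100):
    """Lloyd-style k-means with Euclidean distance; returns (means, clusters)."""
    means = [tuple(m) for m in initial_means]
    for _ in range(max_iter):
        # Assign each point to the cluster with the closest mean.
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda i: math.dist(p, means[i]))
            clusters[nearest].append(p)
        # Compute the new mean of each cluster (an empty cluster keeps its old mean).
        new_means = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, means)
        ]
        # Stop once the k means no longer change.
        if new_means == means:
            break
        means = new_means
    return means, clusters

means, clusters = k_means(CUSTOMERS, initial_means=[(20, 40), (40, 80), (70, 30)])
print(means)                        # [(23.0, 46.25), (40.5, 83.5), (70.0, 30.0)]
print([len(c) for c in clusters])   # [4, 4, 1]
```

With these initial means the assignments stop changing after the second pass. A different choice of k or of starting means can give a different result, which is why K-means is often run several times.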


Other Types of Mining

• Text mining: application of data mining to textual documents

– cluster Web pages to find related pages

– cluster pages a user has visited to organize their visit history

– classify Web pages automatically into a Web directory

• Graph Mining:

– Deals with graph data


Mining Text Data: An Introduction

Figure: Data Mining / Knowledge Discovery over four kinds of data (Structured Data, Multimedia, Free Text, Hypertext), with the same loan information shown in three of them:

• Structured data: HomeLoan (Loanee: Frank Rizzo; Lender: MWF; Agency: Lake View; Amount: $200,000; Term: 15 years)

• Free text: “Frank Rizzo bought his home from Lake View Real Estate in 1992. He paid $200,000 under a 15-year loan from MW Financial.”

• Hypertext: <a href>Frank Rizzo</a> Bought <a href>this home</a> from <a href>Lake View Real Estate</a> In <b>1992</b>. <p>...Loans($200K,[map],...)


Text mining is about knowledge discovery from large collections of unstructured text.

• It’s not the same as data mining, which is more about discovering patterns in structured data stored in databases.

• Similar techniques are sometimes used; however, text mining has many additional constraints caused by the unstructured nature of the text and the use of natural language.

• Information extraction (IE) is a major component of text mining.

• IE is about extracting facts and structured information from unstructured text.


Reasons for Text Mining

Figure: bar chart (y-axis: Percentage, 0 to 90) comparing Collections of Text with Structured Data.


Who is in the text analysis arena?

• Data Analysis

• Computational Linguistics

• Search & DB

• Knowledge Rep. & Reasoning / Tagging


Figure: Documents → Feature Extraction → Token Sets

Document: “Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or …”

Token set: nation – 5, civil – 1, war – 2, men – 2, died – 4, people – 5, Liberty – 1, God – 1

• Loses all order-specific information!

• Severely limits context!
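Feature extraction of this kind (a bag-of-words representation) can be sketched with a regular expression and collections.Counter. The tokenization rule (lower-case alphabetic runs) and the optional vocabulary filter are assumptions; lower-casing merges “Liberty” and “liberty”. Run on the excerpt alone, the counts are smaller than those on the slide, which were presumably taken over the full speech.

```python
import re
from collections import Counter

def token_counts(text, vocabulary=None):
    """Bag-of-words features: lower-case the text, split it into alphabetic
    tokens and count them. Word order is discarded."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    if vocabulary is not None:
        counts = Counter({w: counts[w] for w in vocabulary if counts[w]})
    return counts

excerpt = (
    "Four score and seven years ago our fathers brought forth on this "
    "continent, a new nation, conceived in Liberty, and dedicated to the "
    "proposition that all men are created equal. Now we are engaged in a "
    "great civil war, testing whether that nation, or"
)
print(token_counts(excerpt, ["nation", "civil", "war", "men", "liberty"]))

# Order is lost: both sentences produce the same token set.
print(token_counts("dog bites man") == token_counts("man bites dog"))  # True
```

The last line makes the slide's warning concrete: two sentences with opposite meanings map to identical feature vectors.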


Two Mining Phases

• Knowledge Discovery: Extraction of codified information (features)

• Information Distillation: Analysis of the feature distribution


Text mining stages

• Document selection and filtering (IR techniques)

• Document pre-processing (NLP techniques)

• Document processing (NLP / ML / statistical techniques)


Basic Measures for Text Retrieval

• Precision: the percentage of retrieved documents that are in fact relevant to the query (i.e., “correct” responses)

• Recall: the percentage of documents that are relevant to the query and were, in fact, retrieved

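$$\text{precision} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Retrieved}\}|}$$

$$\text{recall} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Relevant}\}|}$$

(Figure: Venn diagram of All Documents, showing the Relevant set, the Retrieved set, and their overlap, Relevant & Retrieved.)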
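The precision and recall definitions translate directly into set operations. In this minimal Python sketch, document ids are plain strings, and returning 0.0 when a denominator set is empty is an assumed convention (the slide does not define that case).

```python
def precision_recall(relevant, retrieved):
    """Compute precision and recall from collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # |{Relevant} ∩ {Retrieved}|
    # Assumed convention: an empty denominator yields 0.0 instead of an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d2", "d3", "d5"}
p, r = precision_recall(relevant, retrieved)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Two of the three retrieved documents are relevant (precision 2/3), and two of the four relevant documents were found (recall 1/2).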


Semantic mining

(Figure label: Mining)


Semantic mining

• Semantic Data Mining
• Semantic Web Mining
• Ontology Mining
• Text Mining


Mining

Given:
• tabular transaction data, relational databases, text documents, Web pages, ...
• one or more domain ontologies

Find:
• a classification model, or a set of semantic patterns
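One way a domain ontology can feed pattern mining is to extend each transaction with the ancestors of its items, in the spirit of generalized association-rule mining, so that patterns invisible at the raw-item level appear at the concept level. The Python sketch below assumes a hypothetical `IS_A` hierarchy of urban mobility concepts and hypothetical trip data; pairs made of an item and its own ancestor are skipped because they hold trivially.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy domain ontology: concept -> parent concept (is-a links).
IS_A = {
    "bus": "public_transport",
    "metro": "public_transport",
    "tram": "public_transport",
    "bike_share": "active_mobility",
    "walking": "active_mobility",
    "public_transport": "mobility",
    "active_mobility": "mobility",
}


def ancestors(concept):
    """All concepts above `concept` in the is-a hierarchy."""
    result = []
    while concept in IS_A:
        concept = IS_A[concept]
        result.append(concept)
    return result


def extend(transaction):
    """Add every item's ontology ancestors to the transaction."""
    items = set(transaction)
    for item in transaction:
        items.update(ancestors(item))
    return items


def frequent_pairs(transactions, min_support):
    """Count item pairs over extended transactions, skipping item/ancestor pairs."""
    counts = Counter()
    for t in transactions:
        for a, b in combinations(sorted(extend(t)), 2):
            if a in ancestors(b) or b in ancestors(a):
                continue  # trivially true pair, e.g. (bus, public_transport)
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical travel modes used by four citizens in one week.
trips = [["bus", "walking"], ["metro", "bike_share"], ["tram", "walking"], ["bus"]]

patterns = frequent_pairs(trips, min_support=3)
print(patterns)  # {('active_mobility', 'public_transport'): 3}
```

No pair of raw travel modes occurs three times, but at the concept level three of the four citizens combine public transport with active mobility.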


Semantic Web Mining

Semantic Web Mining is the integration of two areas, the Semantic Web and Web Mining, combined in two directions:
• Improve Web Mining using the Semantic Web
• Improve the Semantic Web using Web Mining

The Semantic Web is expressed in formats such as OWL, RDF, and XML; these are the resources that are mined to extract knowledge from the Semantic Web.


Web Mining

• Discovers local and global structure
• Structured data
• Goals
  – Improve site design
  – Generate dynamic recommendations
  – Improve marketing
• Main areas
  – Web Content Mining
  – Web Structure Mining
  – Web Usage Mining


Content Mining

• A type of text mining
• Uses tags
• Detects co-occurrences
• Event detection
• Reconstruction of page content
• Relations in a domain
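Co-occurrence detection, one of the content-mining tasks listed, can be sketched by counting how many pages mention each pair of terms together. This Python example assumes page-level co-occurrence (not a sliding word window) and a hand-picked vocabulary; the pages and terms are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrences(pages, vocabulary):
    """Count how many pages mention each pair of vocabulary terms together."""
    counts = Counter()
    for page in pages:
        words = set(re.findall(r"[a-z]+", page.lower()))
        present = sorted(words & vocabulary)  # sorted so pairs have a canonical order
        counts.update(combinations(present, 2))
    return counts


pages = [
    "Flooding closed the metro station",
    "Metro delays after flooding downtown",
    "New bike lanes near the metro",
]
vocabulary = {"flooding", "metro", "bike", "delays"}

pairs = cooccurrences(pages, vocabulary)
print(pairs.most_common(1))  # [(('flooding', 'metro'), 2)]
```

A frequently co-occurring pair such as (flooding, metro) is a candidate input for the event detection and domain-relation tasks on the same list.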


Web Usage Mining

• Requests by visitors

• Additional Structure

• Unintended Relationships
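Web usage mining starts from the requests visitors make. A minimal Python sketch, assuming a hypothetical log of `(visitor, timestamp_in_seconds, page)` records and a 30-minute inactivity timeout (a common heuristic, not taken from the slide), groups requests into sessions and counts page-to-page transitions.

```python
from collections import Counter, defaultdict

SESSION_GAP = 30 * 60  # assumed inactivity timeout in seconds


def sessionize(log):
    """Split (visitor, timestamp, page) records into per-visitor sessions."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in sorted(log, key=lambda rec: (rec[0], rec[1])):
        by_visitor[visitor].append((ts, page))
    sessions = []
    for hits in by_visitor.values():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, page) in zip(hits, hits[1:]):
            if ts - prev_ts > SESSION_GAP:  # long pause: start a new session
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions


def transitions(sessions):
    """Count page-to-page moves within sessions."""
    counts = Counter()
    for s in sessions:
        counts.update(zip(s, s[1:]))
    return counts


log = [
    ("u1", 0, "/home"), ("u1", 60, "/permits"), ("u1", 120, "/permits/apply"),
    ("u2", 10, "/home"), ("u2", 50, "/permits"), ("u2", 5000, "/taxes"),
]
sessions = sessionize(log)
print(sessions)
print(transitions(sessions).most_common(1))  # [(('/home', '/permits'), 2)]
```

Frequent transitions hint at structure the site design did not make explicit, which relates to the "additional structure" and "unintended relationships" bullets.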


Web Structure Mining

• Web pages as a whole
  – Use hyperlinks
  – Identify relevance
• Single pages
  – Five types of Web pages:
    • Head pages
    • Navigation pages
    • Content pages
    • Look-up pages
    • Personal pages
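The slide says hyperlinks are used to identify relevance without naming an algorithm. PageRank is one well-known way to do this, swapped in here for illustration as a plain power-iteration sketch in Python. The damping factor 0.85 is the customary choice, and the four-page site is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


site = {
    "home": ["news", "services"],
    "news": ["home"],
    "services": ["home", "news"],
    "contact": ["home"],
}
ranks = pagerank(site)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```

Pages that many others link to, such as `home`, end up with the highest scores; `contact` receives no links and keeps only the baseline (1 - d) / n.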


Ontology Mining

• Ontology Learning

  – Learns the structure of ontologies
• Instance Learning
  – Populates the ontologies


Ontology Mining

Extraction Rules: extracts rules from an ontology and/or a group of documents, either to update an existing ontology or to create a new one.

Ontology Integration: seeks a shared vocabulary among several ontologies.

Ontology Linking: exploits the relationships between entities of different ontologies in order to extract information, view maps, make changes, build relationships or rules, etc.

Ontology Fusion: information from several ontologies is merged in order to standardize knowledge.

Ontology Alignment: identifies similar concepts between ontologies.
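Ontology alignment can be approximated, at its simplest, by comparing concept labels. This Python sketch assumes purely lexical matching with the standard library's `difflib.SequenceMatcher` and an arbitrary 0.8 similarity threshold; real aligners also use structure and semantics. Both concept lists are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase and drop spaces, underscores and hyphens, e.g. 'Bus_Stop' -> 'busstop'."""
    return re.sub(r"[\s_\-]", "", label.lower())


def align(concepts_a, concepts_b, threshold=0.8):
    """Pair each concept in A with its most similar label in B, if similar enough."""
    matches = []
    for a in concepts_a:
        best, best_score = None, 0.0
        for b in concepts_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score > best_score:
                best, best_score = b, score
        if best_score >= threshold:
            matches.append((a, best, round(best_score, 2)))
    return matches


city_a = ["PublicTransport", "Bus_Stop", "Citizen", "ParkingLot", "Neighbourhood"]
city_b = ["public transport", "BusStop", "Resident", "Parking Lot", "Neighborhood"]

matches = align(city_a, city_b)
for a, b, score in matches:
    print(f"{a} <-> {b} ({score})")
```

Spelling variants like "Neighbourhood"/"Neighborhood" are caught, but synonyms such as "Citizen"/"Resident" are missed, which is why lexical matching is usually only a first step.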


Mining to Learn Ontologies


Filling the Ontologies


Use Ontology to Mine


Open data

• In general, the availability of open data is considered crucial to improving the functioning of cities.

• The convergence of smart cities and open data initiatives is unfolding rapidly across a number of cities.

• Open data is a way to master information and turn challenges into opportunities. It can:
  – Allow for better decisions
  – Stimulate innovation
  – Foster greater collaboration
  – Promote predictive analytics
  – Help cities become more effective, efficient, and equitable


Figure source: http://www.meltinfo.com/ppt/ibm-big-data


Smart Data Analytics for Smart Cities

Clustering large masses of urban data into a compact format allows analysts to present information from the entire dataset (without omissions or deletions) while reorganising the data and making it manageable.
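k-means is one common clustering method (the slide does not name one). The plain-Python sketch below, with hypothetical urban sensor readings as `(traffic index, PM2.5)` pairs, shows why clustering reorganises without deleting anything: every record is assigned to exactly one cluster, and each cluster is summarised by its centroid.

```python
import random
from math import dist


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    for _ in range(iterations):
        # Assignment step: every point joins exactly one cluster, so nothing is dropped.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical (traffic index, PM2.5) readings from six street sensors.
readings = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(readings, k=2)
for c, members in zip(centroids, clusters):
    print(tuple(round(x, 2) for x in c), len(members))
```

Two centroids, one for quiet and one for congested, polluted locations, now stand in for six raw readings, and each reading can still be traced to its cluster.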


How can DA potentially contribute to smart cities?

• DA can help reduce emissions and bring down pollution.
• Parking problems can be better managed.
• The environment can become cooler and greener, with less energy consumed.


5 ways DA can build better governments

1. Raw data needs to become useful knowledge
2. Governments must shift from a culture of secrecy to openness
3. Websites should be user-friendly
4. Distribute stuff that computers can use (a.k.a. machine-readable data)
5. Governments will need to open up in order to be seen as legitimate


Five Ways the Government Wants to Use DA

• Do Not Pay Portal
• Continuous Evaluation of Insider Threats
• Helping Students Learn
• Doing Away with Fee-For-Service in Healthcare
• Tracking Illegal Activity on the Deep Web


eGovernance Big Data Analytics Platform




Next conferences


WITFOR 2016, World Information Technology Forum, September 12-14, 2016

Metropolis: an Emerging Serious Game in a Smart City

Ciudades Ubicuas y Ciudades Emergentes: Las nuevas Ciudades Inteligentes (Ubiquitous Cities and Emerging Cities: The New Smart Cities), June 27, 2016


GRACIAS · MERCI BEAUCOUP · Thanks · Wakupeman · Obrigado · Danke

www.ing.ula.ve/~aguilar

“If you seek different results, then do not always do the same thing.”

A. Einstein
