DBMS Data Mining

Upload: dhanyaprasad8

Post on 03-Apr-2018


TRANSCRIPT

  • 7/28/2019 DBMS Data Mining


    Data Mining - Beer and Nappies

    On Thursday nights, people who buy diapers also tend to buy beer.

    Introduction

    Data is growing at a phenomenal rate.

    Users expect more sophisticated information.

    How?

    UNCOVER HIDDEN INFORMATION

    DATA MINING

    Data Mining

    Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.

    Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.

    Data Mining

    Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets.

    These tools can include statistical models, mathematical algorithms, and machine learning methods (algorithms that improve their performance automatically through experience, such as neural networks or decision trees).

    Consequently, data mining consists of more than collecting and managing data; it also includes analysis and prediction.

    Data Mining Algorithm

    Objective: fit data to a model

        Descriptive

        Predictive

    Preference: technique to choose the best model

    Search: technique to search the data ("query")

    Data Mining: Descriptive

    Identify and describe groups of customers with common buying behavior.
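One way to picture the descriptive task is a tiny one-dimensional k-means that groups customers by spend. This is only a sketch under stated assumptions: the slide does not name an algorithm, so the choice of k-means, the function name `kmeans_1d`, and the spend figures are all illustrative.

```python
def kmeans_1d(values, k=2, max_iterations=20):
    """Group numbers into k >= 2 clusters; returns (centers, clusters)."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread the initial centers
    # evenly across the sorted data, from the smallest to the largest value.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical monthly spend per customer.
monthly_spend = [10, 12, 11, 95, 100, 105]
centers, clusters = kmeans_1d(monthly_spend, k=2)
print(centers, clusters)
```

The two groups that come out (low spenders and high spenders) are a description of the data, not a prediction about any one customer.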

    Data Mining: Predictive

    Given a customer's characteristics, a model predicts how much the customer will spend on the next catalog order.

    Predict the likelihood (probability) that a customer will respond to an offer.
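A predictive model of the kind described above can be sketched with a least-squares line fitted to past behavior. The slide does not say which model is meant, so simple linear regression, the single characteristic (number of past orders), and the figures are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_spend(model, past_orders):
    """Predicted spend on the next order for a customer with this history."""
    slope, intercept = model
    return slope * past_orders + intercept


# Hypothetical training data: number of past orders vs. spend on the next one.
past_orders = [1, 2, 3, 4]
next_spend = [30.0, 50.0, 70.0, 90.0]
model = fit_line(past_orders, next_spend)
print(predict_spend(model, 5))
```

Unlike the descriptive case, the output here is a value for a customer the model has not seen.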

    Data Mining Models and Tasks

    Data Mining

    Association (e.g., purchasing a pen and purchasing paper)

    Sequence or path analysis (e.g., the birth of a child and purchasing diapers)

    Classification (e.g., duct tape purchases and plastic sheeting purchases)

    Clustering: finding and visually documenting groups of previously unknown facts (e.g., geographic location and brand preferences)

    Forecasting: discovering patterns from which one can make reasonable predictions regarding future activities

    Data Mining

    Data mining is knowledge discovery using a sophisticated blend of techniques from traditional statistics, artificial intelligence, and computer graphics.

    Data mining is the process of semi-automatically analyzing large databases to find interesting and useful patterns.

    Data mining overlaps with machine learning, statistics, artificial intelligence, and databases.

    Goals of Data Mining

    Explanatory: to explain some observed event or condition (e.g., why sales of the Maruti Swift have increased in Chennai).

    Confirmatory: to confirm a hypothesis (e.g., whether two-income families are more likely to buy family medical coverage than single-income families).

    Exploratory: to analyze data for new or unexpected relationships (e.g., what spending patterns are likely to accompany credit card fraud).

    Issues in Data Mining

    Data quality: the accuracy and completeness of the data being analyzed.

    Interoperability: of the data mining software and databases being used by different agencies.

    Mission creep: the use of data for purposes other than those for which the data were originally collected.

    Privacy.

    Advanced Forms of Data Mining

    Web mining

    Spatial mining

    Temporal mining

    Web Mining

    Crawlers

        Robot (spider)

        Focused crawler

    PageRank

        Backlinks

    Personalization
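The link between PageRank and backlinks can be illustrated with a short power-iteration sketch. The slide only names the terms, so the damping factor of 0.85, the fixed iteration count, and the four-page link graph are assumptions chosen for illustration, not details taken from the deck.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the pages it links to; returns page -> rank."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on, split evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web graph: "c" has three backlinks, "d" has none.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph the page with the most backlinks ends up ranked first, and the page nobody links to ends up last.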

    Spatial Mining

    Goal: data mining on spatial data.

    Spatial selection may involve specialized selection comparison operations:

    Near

    North, South, East, West

    Contained in

    Overlap/intersect
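The comparison operations listed above can be sketched as small predicates over points and axis-aligned rectangles. The `Rect` type, the function names, and the convention that a larger y coordinate means further north are assumptions for illustration; real spatial databases offer richer geometry types.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical helper type)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def near(p, q, radius):
    """True if points p and q are at most `radius` apart."""
    return math.dist(p, q) <= radius


def north_of(p, q):
    """True if p lies north of q, assuming larger y means further north."""
    return p[1] > q[1]


def contained_in(p, rect):
    """True if point p lies inside rect (borders included)."""
    return rect.min_x <= p[0] <= rect.max_x and rect.min_y <= p[1] <= rect.max_y


def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


store = (3.0, 4.0)
print(near((0.0, 0.0), store, radius=5.0))
print(contained_in(store, Rect(0, 0, 10, 10)))
```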

    Temporal Mining

    Goal: data mining for temporal data.

    Time series

    Pattern detection

    Sequences

    Temporal association rules

    HR database

    Temporal Database

    Snapshot: traditional database

    Temporal: multiple time points

    Types of Database (Temporal)

    Snapshot: no temporal support

    Transaction time: supports the time when a transaction inserted data (timestamp or range)

    Valid time: supports the time range when data values are valid

    Bitemporal: supports both transaction and valid time
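The difference between valid time and transaction time is easiest to see with a correction. The sketch below is a minimal bitemporal lookup; the `SalaryFact` record, the `salary_as_of` function, and the HR-style data are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SalaryFact:
    employee: str
    salary: int
    valid_from: date   # valid time: first day the value holds
    valid_to: date     # valid time: first day it no longer holds
    recorded_on: date  # transaction time: day the row was inserted


FOREVER = date(9999, 12, 31)

FACTS = [
    SalaryFact("ann", 50000, date(2018, 1, 1), FOREVER, date(2018, 1, 5)),
    # A later correction: the salary was really 52000 from the start.
    SalaryFact("ann", 52000, date(2018, 1, 1), FOREVER, date(2018, 3, 1)),
]


def salary_as_of(facts, employee, valid_day, known_on):
    """Salary valid on `valid_day`, according to rows recorded by `known_on`."""
    candidates = [
        f for f in facts
        if f.employee == employee
        and f.valid_from <= valid_day < f.valid_to
        and f.recorded_on <= known_on
    ]
    if not candidates:
        return None
    # Among the rows known at that time, the most recently recorded wins.
    return max(candidates, key=lambda f: f.recorded_on).salary


# Same valid day, two different states of knowledge.
print(salary_as_of(FACTS, "ann", date(2018, 2, 1), known_on=date(2018, 2, 1)))
print(salary_as_of(FACTS, "ann", date(2018, 2, 1), known_on=date(2018, 4, 1)))
```

A snapshot database would keep only the latest row; a valid-time database would drop `recorded_on`; a transaction-time database would drop the valid range.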

    Database Searching vs. Data Mining

              Database searching             Data mining
    Query     Well defined; SQL              Poorly defined; no precise query language
    Data      Operational data               Not operational data
    Output    Precise; subset of database    Fuzzy; not a subset of database

    Query Examples

    Database

    Find all credit applicants with the last name Smith.

    Identify customers who have purchased more than $10,000 in the last month.

    Find all customers who have purchased milk.

    Data Mining

    Find all credit applicants who are poor credit risks. (classification)

    Identify customers with similar buying habits. (clustering)

    Find all items which are frequently purchased with milk. (association rules)
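The two milk queries above can be contrasted in a few lines. The first function is a precise lookup that returns a subset of the stored rows; the second derives something that is not stored anywhere. The basket data, the function names, and the 0.6 confidence threshold are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data: customer -> items in that customer's basket.
BASKETS = {
    "ann": {"milk", "bread", "eggs"},
    "bob": {"milk", "bread"},
    "cy": {"beer", "diapers"},
    "dee": {"milk", "bread", "beer"},
}


def customers_who_bought(item):
    """Database-style query: precise, returns a subset of the stored rows."""
    return sorted(name for name, basket in BASKETS.items() if item in basket)


def frequent_companions(item, min_confidence=0.6):
    """Mining-style query: items that often appear in baskets containing `item`.

    Returns other_item -> confidence, the share of `item` baskets
    that also contain other_item.
    """
    with_item = [b for b in BASKETS.values() if item in b]
    if not with_item:
        return {}
    counts = Counter(other for b in with_item for other in b if other != item)
    return {
        other: n / len(with_item)
        for other, n in counts.items()
        if n / len(with_item) >= min_confidence
    }


print(customers_who_bought("milk"))
print(frequent_companions("milk"))
```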

    Data Mining vs. KDD

    Knowledge Discovery in Databases (KDD): the process of finding useful information and patterns in data.

    Data Mining: the use of algorithms to extract the information and patterns derived by the KDD process.

    KDD Process

    Selection: obtain data from various sources.

    Preprocessing: cleanse data.

    Transformation: convert to a common format; transform to a new format.

    Data mining: obtain desired results.

    Interpretation/evaluation: present results to the user in a meaningful manner.
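The five KDD steps can be read as a pipeline in which each stage feeds the next. The sketch below mirrors the step names from the slide; the two data sources, their formats, and the trivial "mining" step (summing quantities per item) are stand-ins chosen for illustration.

```python
from collections import Counter

# Hypothetical raw inputs from two sources with different formats.
STORE_SALES = [{"item": "Milk", "qty": "2"}, {"item": "bread", "qty": None}]
WEB_SALES = [("MILK", 1), ("eggs", 2)]


def select():
    """Selection: obtain data from various sources."""
    return STORE_SALES, WEB_SALES


def preprocess(store, web):
    """Preprocessing: cleanse data by dropping rows with a missing quantity."""
    store = [row for row in store if row["qty"] is not None]
    web = [row for row in web if row[1] is not None]
    return store, web


def transform(store, web):
    """Transformation: convert both sources to (item, quantity) pairs."""
    rows = [(row["item"].lower(), int(row["qty"])) for row in store]
    rows += [(item.lower(), int(qty)) for item, qty in web]
    return rows


def mine(rows):
    """Data mining: total quantity per item (a stand-in for a real algorithm)."""
    totals = Counter()
    for item, qty in rows:
        totals[item] += qty
    return totals


def interpret(totals):
    """Interpretation/evaluation: present the result in a readable form."""
    item, qty = totals.most_common(1)[0]
    return f"Best seller: {item} ({qty} units)"


def run_kdd():
    return interpret(mine(transform(*preprocess(*select()))))


print(run_kdd())
```

Data mining is one stage in the middle; the stages around it decide whether its input is clean and its output is usable.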

    Data Warehousing

    A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data.

    Subject-oriented: contains information regarding objects of interest for decision support, such as sales by region, by product, etc.

    Integrated: data are typically extracted from multiple, heterogeneous data sources (e.g., sales, inventory, and billing databases).

    Time-variant: contains historical data, with a longer horizon than the operational system.

    Nonvolatile: data is not (or rarely) directly updated.

    Why Build a Data Warehouse

    Access to data from multiple sources, giving a comprehensive data collection.

    Separate transactional and analysis systems: improves query response time (without slowing down transaction processing).

    Easy formulation of complex queries.

    Access to historical data (not in operational systems).

    Improved data quality (fewer errors and missing values).

    Data Warehouse Back-End Tools

    Data extraction: extract data from multiple, heterogeneous, and external sources.

    Data cleaning (scrubbing): detect errors in the data and rectify them when possible.

    Data converting: convert data from legacy or host format to warehouse format.

    Transforming: sort, summarize, compute views, check integrity, and build indices.

    Refresh: propagate the updates from the data sources to the warehouse.

    Database vs. Data Warehouse

    Database                                    Data Warehouse
    Application oriented (OLTP)                 Subject oriented (OLAP)
    Used to run the business                    Used to analyze the business
    Clerical user                               Manager/analyst
    Detailed data                               Summarized and refined
    Current, up to date                         Historical data
    Operational data                            Integrated data
    Repetitive access by small transactions     Ad-hoc access using large queries
    Fast response time (seconds)                Slow response time (minutes)
    Read/update access                          Mostly read access (batch update)
    Relational schema                           Star/snowflake schema

    On-Line Analytical Processing (OLAP)

    Front end to the data warehouse, allowing easy data manipulation.

    Allows conducting inquiries over the data at various levels of abstraction.

    Fast and easy because some aggregations are computed in advance.

    No need to formulate the entire query.

    OLAP uses data in a multidimensional format (e.g., data cubes) to facilitate querying and improve response time.
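Computing aggregations in advance can be sketched as a small data cube in which every combination of "keep this dimension" or "roll it up" is summed once and later answered by lookup. The dimension names, the sales figures, and the `"*"` marker for a rolled-up dimension are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # marker for "rolled up over every value of this dimension"

# Hypothetical facts: (region, product, quarter, sales).
FACTS = [
    ("north", "pen", "Q1", 100),
    ("north", "paper", "Q1", 40),
    ("south", "pen", "Q1", 70),
    ("south", "pen", "Q2", 30),
]


def build_cube(facts):
    """Precompute the sales total for every level of abstraction."""
    cube = defaultdict(int)
    for *dims, measure in facts:
        # Each dimension is either kept as-is or rolled up to ALL.
        for mask in product((False, True), repeat=len(dims)):
            key = tuple(ALL if rolled else value
                        for value, rolled in zip(dims, mask))
            cube[key] += measure
    return dict(cube)


cube = build_cube(FACTS)

# Queries at different levels of abstraction are now plain lookups.
print(cube[(ALL, ALL, ALL)])        # grand total
print(cube[(ALL, "pen", ALL)])      # all pen sales
print(cube[("north", ALL, "Q1")])   # north region in Q1
```

The cube holds more cells than there are facts, which is the price paid for answering such queries without scanning the data again.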

    Data Mining vs. Data Warehouse

    Data mining: application of methods (algorithms) to discover patterns in data.

    Includes some OLAP operations.

    OLAP: a deductive process, testing the existence of hypothetical patterns in data. Good for exploring the data and testing hypotheses.

    Data mining mostly refers to modeling the underlying data.

    Uncovering patterns in data; potentially surprising patterns may arise.

    Data mining methods may use data from a data warehouse (when available).

    Data Mining + Data Warehouse

    Data warehousing provides the enterprise with a memory.

    Data mining provides the enterprise with intelligence.


    MDDBMS

    The multidimensional data model emerged over the past 10-15 years.

    An MDDBMS is the Rubik's Cube of database management systems.

    Focuses on analyzing the data, not recording transactions.

    Data is categorized either as facts with numerical measures or as dimensions that characterize the facts.

    MDDBMS

    Takes data from many sources, such as RDBMSs, legacy systems, etc.

    Data is physically stored on disk in a data structure that is highly optimized for multidimensional processing and fast retrieval.

    Storage is between 2 and 10 times more efficient than in an RDBMS due to better indexing, compression, and representation of sparse data.

    Benefits

    Queries are simply a request to see pre-existing data organized in a specific fashion.

    The data is already highly organized, so the requested data is simply pulled out and reorganized.

    Stores information in the same way that it is viewed (less data management and maintenance).

    The Drawbacks

    Not the best solution for every problem.

    Works only on information with interrelations.

    Database explosion with large amounts of sparse data (calculating all relationships can increase the database size dramatically).
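The database explosion can be made concrete with a little arithmetic: if every dimension also gets an "all values" roll-up, the number of possible cells is the product of each dimension's cardinality plus one. The dimension sizes and the fact count below are hypothetical, and the result is an upper bound on the cells a fully precomputed cube could hold, not a measurement of any real system.

```python
import math


def full_cube_cells(cardinalities):
    """Upper bound on cells when every dimension also has an 'all' value."""
    return math.prod(c + 1 for c in cardinalities)


def density(stored_facts, cardinalities):
    """Fraction of base-level cells that actually hold a fact."""
    return stored_facts / math.prod(cardinalities)


# Hypothetical warehouse: 1000 products, 100 stores, 365 days.
dims = (1000, 100, 365)
facts = 1_000_000

print(full_cube_cells(dims))   # possible cells, roll-ups included
print(density(facts, dims))    # how sparse the base data is
```

With these made-up numbers, fewer than 3% of the base cells hold a fact, yet the fully precomputed cube has room for tens of millions of cells.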

    Example

    MDDBMSs are an important tool in KM.


    Thank You