BIT 33603: Data Mining Lecture 1: Introduction to Data Mining

BIT 33603: Data Mining, Lecture 1: INTRODUCTION TO DATA MINING. Professor Dr. Rozaida Ghazali, Office: Room 8, Level 5, FSKTM, rozaida@uthm.edu.my, 07-45383648


Page 1: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

BIT 33603: Data Mining

Lecture 1: INTRODUCTION TO DATA MINING

Professor Dr. Rozaida Ghazali

Office: Room 8, Level 5, FSKTM
rozaida@uthm.edu.my
07-45383648

Page 2: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

About Me

From Langkawi, the beautiful island

Bachelor's Degree in Computer Science, USM

Master's Degree in Computer Science, UTM

PhD in Neural Networks, Liverpool John Moores University, UK

Page 3: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

October 18, 2021

Slides Outline

• Motivation: Why take this class?

• Data science vs data analytics vs big data

• Why data mining?

• What is data mining?

• Data Mining: On what kind of data?

• Major issues in data mining

• Applications of data mining

MEETING TO DISCUSS STATUS AND ACTIONS BASED ON THE ENGAGEMENT PROGRAMME EVALUATION REPORT

Universiti Tun Hussein Onn Malaysia, 28 April 2020 (Tuesday)

9:30 a.m.

Page 4: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Why Take this Class?

October 18, 2021 Data Mining: Concepts and Techniques 4


https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

Page 5: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Careers

Page 6: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Anatomy & Roles of Data Scientist

Page 7: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Page 8: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Page 9: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Page 10: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Data science vs data analytics vs big data

What is Data Science:
• Mining large amounts of structured & unstructured data to identify patterns
• Includes a combination of programming, statistical skills, machine learning, and algorithms.

What is Big Data:
• Refers to humongous volumes of data
• Includes capturing data, data storage & data querying

What is Data Analytics:
• Process and perform statistical analysis of data
• Discover how data can be used to draw conclusions & solve problems


Page 11: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

What do they do?

Data Scientist


Data Analyst

Big Data Professional

Page 12: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Page 13: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Why Data Mining?

• The Explosive Growth of Data: from terabytes to petabytes
  • Data collection and data availability
    • Automated data collection tools, database systems, Web, computerized society
  • Major sources of abundant data
    • Business: Web, e-commerce, bank transactions, stocks, etc.
    • Science: Remote sensing, bioinformatics, scientific simulation, …
    • Society and everyone: news, digital cameras, YouTube
• We are drowning in data, but starving for knowledge!
• Data mining: automated analysis of massive data sets


Page 14: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Page 15: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

BIG DATA


Page 16: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Page 17: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Page 18: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Why does big data keep getting bigger? It never sleeps.

Page 19: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Source: HP

Page 20: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING
Page 21: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING
Page 22: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING
Page 23: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Roughly 4.66 billion people around the world use the internet at the start of 2021 – that's close to 60 percent of the world's total population. This number is still growing too, with our latest data showing that 319 million new users came online over the past twelve months.

Page 24: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING
Page 25: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING
Page 26: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING
Page 27: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Big Data Growth

Page 28: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Analytics as Solution

Huge Volumes of Data
• Unprecedented data production both online and offline
• Online: social media, emails, e-commerce
• Offline: census, banking, GPS, etc.

Scalable Approach
• Traditional analytical techniques become infeasible

Tech Advances
• Cheap storage and computing power.

Page 29: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Page 30: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING
Page 31: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Artificial Intelligence (AI) means adding intelligence to machines artificially so that they become intelligent and behave in ways similar to humans. AI is usually defined as the science of making computers do things that require intelligence when done by humans.

Machine learning is a field of computer science that uses intelligent or statistical techniques to give computer systems the ability to "learn" (e.g., progressively improve performance on a specific task) with data, without being explicitly programmed.

Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

Business analytics refers to the skills, technologies, and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning.

Page 32: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING
Page 33: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Examples of Data

Page 34: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

What is (Not) Data Mining?

Watch out: Is everything “data mining”?

Not data mining: simple search and query processing, for example looking up a phone number in a phone directory.

What is Data Mining?

Data mining is a process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and artificial intelligence.

Data mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes. Using a broad range of techniques, you can use this information to increase revenues, cut costs, improve customer relationships, reduce risks and more.

Data mining, also known as knowledge discovery in data (KDD), is the process of uncovering patterns and other valuable information from large data sets.

Page 35: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Data Mining: Confluence of Multiple Disciplines

[Diagram: Data Mining at the centre, surrounded by Database Technology, Statistics, Machine Learning, Pattern Recognition, Algorithm, Visualization, and Other Disciplines]


Page 36: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Knowledge Discovery (KDD) Process

Page 37: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

KDD Process: The Steps

1) Learning the application domain
   • relevant prior knowledge and goals of application
2) Creating a target data set: data selection
3) Data cleaning and preprocessing (may take 60% of effort!)
4) Data reduction and transformation
   • Find useful features, dimensionality/variable reduction, invariant representation
5) Choosing functions of data mining
   • classification, regression, association, clustering
6) Choosing the mining algorithm(s)
7) Data mining: search for patterns of interest
8) Pattern evaluation and knowledge presentation
   • visualization, transformation, removing redundant patterns, etc.
9) Use of discovered knowledge
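The KDD steps can be sketched end to end on a toy data set. This is only an illustrative sketch: the records, the attribute names, and the choice of a simple per-group average as the "mining function" are all assumptions made for the example, not part of the lecture.

```python
from statistics import mean

# Hypothetical raw records (assumed for illustration): (customer, age, monthly_spend)
raw = [
    ("a", 25, 120.0),
    ("b", 41, None),      # missing value
    ("c", 37, 310.0),
    ("d", 52, 295.0),
    ("e", 29, 90.0),
    ("e", 29, 90.0),      # duplicate record
]

# Step 2: data selection - keep only the attributes relevant to the goal.
selected = [(age, spend) for _, age, spend in raw]

# Step 3: data cleaning - drop records with missing values and duplicates.
cleaned = sorted({rec for rec in selected if rec[1] is not None})

# Step 4: data transformation - reduce age to a coarse feature.
transformed = [("under_35" if age < 35 else "35_plus", spend) for age, spend in cleaned]

# Steps 5-7: choose a mining function (here, simple characterization by group)
# and search for a pattern: average spend per age group.
groups = {}
for group, spend in transformed:
    groups.setdefault(group, []).append(spend)
pattern = {group: mean(spends) for group, spends in groups.items()}

# Step 8: pattern evaluation and presentation.
for group, avg in sorted(pattern.items()):
    print(f"{group}: average monthly spend {avg:.2f}")
```

In a real project each step would be far larger (step 3 in particular), but the order of operations is the same.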

Page 38: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Multi-Dimensional View of Data Mining

Kinds of data to be mined

• Data warehouse, transactional, stream, object-oriented/relational database, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW

Kinds of knowledge to be mined

• Characterization, discrimination, association, classification, clustering, trend/deviation, patterns, outlier analysis, etc.

Kinds of techniques to be utilized

• Neural networks, fuzzy logic, swarm optimization, case-based reasoning, decision tree, support vector machine, etc.

Kinds of applications to be adapted

• Retail, telecommunication, banking, fraud analysis

• Plant loading, stock market analysis

• Web mining, text mining, pattern recognition, etc.

Page 39: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Page 40: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Data Mining: On What Kinds of Data?

• Database-oriented data sets and applications
  • Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
  • Data streams and sensor data
  • Time-series data, temporal data, sequence data (incl. bio-sequences)
  • Structured data, unstructured data, graphs, social networks and multi-linked data
  • Spatial data and spatiotemporal data
  • Multimedia database
  • Text databases
  • The World-Wide Web
  • Many more


Page 41: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Major Issues in Data Mining

Page 42: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING


Page 43: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Applications of Data Mining

Page 44: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Water Quality Classification Using an Artificial Neural Network

Prediction Of Earthquake’s Magnitude Using Neural Network

Figure 11. Interface after a prediction (provided data)

Figure 12 is the interface for new data. If the user has new data, they can choose the data file (only .txt format is supported), then set the learning rate, momentum and input node, and run the application.

Figure 12. Interface before a prediction (new data)

Figure 13 shows the result for the chosen data after the prediction.

Figure 13. Interface after a prediction (new data)

A. Performance Metrics

1) Mean Squared Error (MSE)

The Mean Squared Error is the average of the squared differences between the predicted values and the original values. To judge whether the application predicts well, look at the MSE value: if the MSE decreases toward zero, the prediction is good.
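The MSE is the average of the squared differences between original and predicted values. A minimal Python sketch follows; the function name and the sample values are assumptions for illustration and are not taken from the report.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Illustrative values only (assumed, not taken from the report).
actual = [5.0, 6.0, 4.5, 7.0]
predicted = [5.0, 5.5, 4.5, 8.0]
mse = mean_squared_error(actual, predicted)
print(mse)
```

A perfect prediction gives an MSE of exactly zero, which is why a curve that falls toward zero over the epochs indicates that training is going well.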

Figure 14. MSE Graph

Table 2. Last 11 MSE values.

Epoch   MSE
90      0.007652407
91      0.007652220
92      0.007652032
93      0.007651845
94      0.007651658
95      0.007651471
96      0.007651284
97      0.007651097
98      0.007650911
99      0.007650724
100     0.007650538

Figure 14 and Table 2 clearly show the decrease in MSE.

B. Result

1) Training Graph

Figure 15. Training Graph Using Simple Backpropagation

Figure 15 shows the result of training. The graph contains two plotted lines: the blue line is the original signal and the red line is the predicted signal. We see that these two lines are

Time series prediction

Page 45: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Breast Cancer Diagnosis and Classification using an Artificial Neural Network

Flower Image Recognition Using Deep Neural Network

Oil Palm Fruit Grading using Fuzzy Logic Approach

The input node, in turn, is a user input that can be set according to the complexity of the problem or the given data, as well as suitability and trial and error; if the number of input nodes is too high or too low, it gives an inefficient value at the output [10].

The momentum constant is used to increase the rate of convergence in order to reach a better solution. The value of the momentum constant lies between 0 and 1. If the momentum constant is too small, the weight changes move too slowly, so the network is slow to converge. However, a value that is too large makes the movement too fast, so some solutions may be overlooked [11].
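The effect of the momentum constant can be illustrated with the standard gradient-descent-with-momentum weight update, delta_w(t) = -learning_rate * dE/dw + momentum * delta_w(t-1). The sketch below is an assumption-laden illustration: the function name, the single-weight setting, and the toy error function E(w) = (w - 3)^2 are hypothetical and are not the report's MATLAB implementation.

```python
def train_with_momentum(grad, w0, learning_rate, momentum, epochs):
    """Gradient descent with momentum on a single weight.

    delta_w(t) = -learning_rate * grad(w) + momentum * delta_w(t-1)
    """
    w, delta_w = w0, 0.0
    for _ in range(epochs):
        delta_w = -learning_rate * grad(w) + momentum * delta_w
        w += delta_w
    return w

# Toy error surface E(w) = (w - 3)^2, so dE/dw = 2 * (w - 3); minimum at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)

w_plain = train_with_momentum(grad, w0=0.0, learning_rate=0.01, momentum=0.0, epochs=50)
w_momentum = train_with_momentum(grad, w0=0.0, learning_rate=0.01, momentum=0.9, epochs=50)
print(abs(w_plain - 3.0), abs(w_momentum - 3.0))
```

With these toy settings the run with momentum 0.9 ends closer to the minimum after 50 epochs than the run without momentum, which matches the description of momentum speeding up convergence.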

B. Graphs

Figure 5: Output classification graph and graph of MSE versus epochs.

The first graph produced by the application relates to the classification output for breast cancer patients, who are classified into two groups, malignant and benign. It shows how the patients are distributed between the two classes.

The second graph relates to Mean Square Error (MSE) versus epoch. MSE is the network's performance function, which measures network performance according to the mean of squared errors. The graph shows that if the curve approaches zero, the application matches the values to be classified or predicted [12].

C. Result Output

Figure 6: Output of MSE (mean of squared errors) results.

Figure 7: Output of CPU time calculation and application accuracy.

The output in Figure 6 is the MSE result, displayed based on the results produced while the application is running. This result helps to identify whether the MSE of the developed application is close to zero or not.

The output in Figure 7 is the CPU time calculation and the application accuracy. This also helps in identifying whether the classification performed is correct or wrong, based on the Accuracy Percentage. The CPU time taken by a process in the application can be read from this CPU time calculation.

III. CONCLUSION

With the application developed, breast cancer patients can be classified faster and more accurately. In addition, this classification of breast cancer patients can also help the medical field.

The breast cancer patient data were obtained from Dr. William H. Wolberg of the University of Wisconsin Hospitals, Madison, Wisconsin, USA, via the Machine Learning Repository website.

The data are intended for classifying and predicting breast cancer patients into two classes, malignant and benign.

In this data understanding phase, data selection and understanding of the data to be obtained must be carried out in order to develop the Artificial Neural Network Application for Breast Cancer Patient Classification.

The data obtained relate to breast cancer patients from the University of Wisconsin Hospitals, Madison, Wisconsin, USA, taken from the Machine Learning Repository website.

In the modelling phase, the system interface was developed using MATLAB R2010a to complete the Artificial Neural Network Application for Breast Cancer Patient Classification. The selection and structuring of the interface were given great emphasis in order to attract users to the system.

In this evaluation and deployment phase, the Artificial Neural Network Application for Breast Cancer Patient Classification is evaluated to identify whether the objectives of the system have been achieved. The main purpose of this phase is to determine whether the system has any weaknesses that can be corrected before the next phase is carried out. The deployment phase usually takes place after a system has been fully developed. This phase is carried out to identify whether the system developed achieves its objectives or not. It consists mainly of testing performed by users on the application developed.

Figure 3: CRISP-DM methodology. [8]

II. IMPLEMENTATION & TESTING

A. Results

From the application developed, the expected result is the classification of breast cancer patients into two different groups, malignant and benign.

The application has an interface that requires the user to enter several inputs, and it produces graphs and results in the area provided.

Figure 4: Interface of the application developed.

Figure 3 shows the interface designed to meet the needs of the user and of the application itself, where the user must enter the learning rate, input nodes, and momentum to run the application.

The learning rate is entered by the user as a value between 0 and 1. This is because a learning rate that is too large prevents the neural network from learning thoroughly, and if the value is too small the neural network is slow to converge [9].

Pattern Recognition & Data Classification

Page 46: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Applications of Data Mining (cont.)

• Data analysis and decision support
  • Market analysis and management
    • Target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation
  • Risk analysis and management
    • Forecasting, customer retention, improved underwriting, quality control, competitive analysis
  • Fraud detection and detection of unusual patterns (outliers)
• Other Applications
  • Text mining (news group, email, documents) and Web mining
  • Stream data mining
  • Bioinformatics and bio-data analysis


Page 47: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Ex. 1: Market Analysis and Management

• Where does the data come from? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
• Target marketing
  • Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
  • Determine customer purchasing patterns over time
• Cross-market analysis: find associations/co-relations between product sales, and predict based on such associations
• Customer profiling: what types of customers buy what products (clustering or classification)
• Customer requirement analysis
  • Identify the best products for different groups of customers
  • Predict what factors will attract new customers
• Provision of summary information
  • Multidimensional summary reports
  • Statistical summary information (data central tendency and variation)
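Cross-market analysis rests on finding associations between products that are sold together. A minimal sketch of the two usual measures, support and confidence, is shown below. The transactions, item names, and helper functions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (assumed for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

# Count how often each pair of items is bought together.
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)
print(pair_counts.most_common(3))
print(support({"bread", "milk"}), confidence({"bread"}, {"milk"}))
```

A rule such as "bread implies milk" is usually considered interesting only when both its support and its confidence exceed thresholds chosen by the analyst.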


Page 48: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Ex. 2: Corporate Analysis & Risk Management

• Finance planning and asset evaluation
  • cash flow analysis and prediction
  • contingent claim analysis to evaluate assets
  • cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
• Resource planning
  • summarize and compare the resources and spending
• Competition
  • monitor competitors and market directions
  • group customers into classes and a class-based pricing procedure
  • set pricing strategy in a highly competitive market


Page 49: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Ex. 3: Fraud Detection & Mining Unusual Patterns

• Approaches: Clustering & model construction for frauds, outlier analysis
• Applications: Health care, retail, credit card service, telecomm.
  • Auto insurance: ring of collisions
  • Money laundering: suspicious monetary transactions
  • Medical insurance
    • Professional patients, ring of doctors, and ring of references
    • Unnecessary or correlated screening tests
  • Telecommunications: phone-call fraud
    • Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
  • Retail industry
    • Analysts estimate that 38% of retail shrink is due to dishonest employees
  • Anti-terrorism
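Outlier analysis looks for values that deviate from an expected norm. One simple way to sketch it is with z-scores: flag any value that lies more than a chosen number of standard deviations from the mean. The call durations and the threshold of 2.0 below are assumptions for illustration, and real fraud detection would use richer models.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds threshold."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical call durations in minutes (assumed for illustration).
durations = [3, 4, 5, 4, 3, 5, 4, 6, 4, 95]
print(flag_outliers(durations))
```

Note that a single extreme value inflates the standard deviation itself, so on small samples the threshold has to be chosen with care.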


Page 50: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Applications in Airlines Companies

Page 51: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Applications in Hypermarket Companies.

Page 52: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Applications in Businesses.

1- Retailing and sales distributions - Predicting sales, determining correct inventory levels and distribution schedules among outlets

2- Manufacturing & production – predicting machinery failures, finding key factors that control optimization of manufacturing capacity

3- Marketing – classifying customer demographics that can be used to predict which customers will respond to a mailing or buy a particular product.

Page 53: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Applications in Banks & Insurance Companies

1- Brokerage and security trading – predicting when bond prices will change, forecasting the range of stock fluctuations for particular issues and the overall market, determining when to buy or sell stocks.

2- Insurance – forecasting claim amounts and medical coverage cost, classifying the most important elements that affect medical coverage, predicting which customers will buy a new policy, etc.


Page 54: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Applications in Medical/Health Care

• Benefits include improved health care quality, reduced operating costs, and better insight into medical data

Diagnosis: Recognize and classify patterns in multivariate patient attributes

Therapy: Select from available treatment methods, based on effectiveness, suitability to patient, etc.

Prognosis: Predict future outcomes based on previous experience and present conditions
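Diagnosis as classification of multivariate patient attributes can be sketched with a k-nearest-neighbour classifier. This is one possible technique among many, chosen here only because it is short. The feature vectors, labels, and function name are made up for illustration and carry no medical meaning.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label query by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, made-up feature vectors (two normalized attributes per patient).
train = [
    ((0.1, 0.2), "benign"),
    ((0.2, 0.1), "benign"),
    ((0.3, 0.3), "benign"),
    ((0.8, 0.9), "malignant"),
    ((0.9, 0.7), "malignant"),
    ((0.7, 0.8), "malignant"),
]
print(knn_classify(train, (0.25, 0.2)))
print(knn_classify(train, (0.85, 0.8)))
```

The same interface (labelled training examples in, predicted class out) applies to the neural network classifiers mentioned elsewhere in the lecture.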


Page 55: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Need for Data Mining in Medicine

• Nature of medical data: noisy, incomplete, uncertain, nonlinearities, fuzziness

• Too much data now collected due to computerization (text, graphs, images, etc.)

• Too many disease markers (attributes) now available for decision making

• Increased demand for health services: greater awareness, increased life expectancy, …

• Overworked physicians and facilities

• Stressful work conditions in ICUs, etc.


Page 56: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Applications in Transportation

• Travel behavior modelling
• Drivers' behavior at signalized urban intersections
• Driver decision-making model
• Traffic flow intersection control
• Estimation of speed-flow relationship
• Traffic management
• Trip generation model
• Urban public transport equilibrium
• Incident detection
• Prediction of parking characteristics
• Travel time prediction

Page 57: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Some more Examples …


Page 58: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING

Some more Examples …

Page 59: BIT 33603: Data Mining Lecture 1: INTRODUCTION TO DATA MINING