data mining – introduction

38
Data mining – databases Tim Deprez

Upload: fiddy-prasetiya

Post on 21-May-2015

132 views

Category:

Education


0 download

TRANSCRIPT

Page 1: Data mining – introduction

Data mining – databases

Tim Deprez

Page 2: Data mining – introduction

Outline

• Today: – introduction to databases– Introduction on how to work with MSAccess

• Next coming days: practical excercises with MSAccess

Page 3: Data mining – introduction

Data mining

• Exploration of data• Prerequisite: data should be available in a

minable format - database• Database = electronic document storing data

– Non-relational: 1 bulk system with non-related items (eg. Msexcel files, text-documents, non-related-tables)

– Relational: all items (tables) are linked to each other (see further)

Page 4: Data mining – introduction

Why using a database

• Relational database:– All your data is stored in 1 file

• Easy to retrieve data• Easy to backup

– Data and metadata stored together• Data ...• Metadata: data about the data (documentation)

– Many data-files contain undocumented values:– Species A has an abundance of 17 ( meaning of value 17?)

Page 5: Data mining – introduction

Why using a database

• All data in a good relational designed database is only stored once:– Example: species list typing errors

• Nudora thorakista• Nudora thorrakista• Nudora thorakhista • Nudora thorakisa

– 1 species species richness calculation: 4– Solution: 1 table with each species 1 record and

use it as a reference

Page 6: Data mining – introduction

Why using a database

• Data is much more rigid ...– More difficult to make errors– E.g. Sorting in excell

Page 7: Data mining – introduction

Relational database - biologySpecies

person

Places

Sample

Country

Density

Equipment

Page 8: Data mining – introduction

Speciesperson

Places

Sample

Country

Density

Equipment

Which person was present on samplings in sweden?

Page 9: Data mining – introduction

Speciesperson

Places

Sample

Country

Density

Equipment

Which species sampled with a core occur in densities higher than 40

Page 10: Data mining – introduction

Variable

Var_value

Taxonomy

Photo

Literature...

...

...

...

Page 11: Data mining – introduction

Table designs ...

• A table consists of a series of Columns ...• Each record as such:

– Different fields– Design of table must be done before data is entered– Each field: name, data type– Each field can also by formatted layout

Record

ColumnField

Page 12: Data mining – introduction

Table designs ...

• Field types:– Numeric – integer/double– Text – Date/Time– Memo– Autonumber ID– Yes/No

Page 13: Data mining – introduction

Task on field types:

• 12• 15 jan 1988• hallo• 12,456• 12:56• Azdazdazd azdda zda azdd dad zd dadazdzd azdazddazdd azdazd azdazd dzdzdzzd ada zzd azdaz

dda azd da az d z azdzadazd a zd a azd azd z dd da a z a z zd d ddaa zd

• 09:89

Page 14: Data mining – introduction

Special field in a table: key

• A key = a unique identifier for a record– Example: pasport number:

• Number in a database which is unique and relates to all data about you

– Each record in a table gets also a key– This key is used to link tables to each other– Example:

• Nudora sp1 – id: 123776• Nudora sp2 – id: 34688

– Advantage: species name changes: linked taxa remain linked

Page 15: Data mining – introduction

Linking tables through id’s

• Storing numbers is most effecient way to store data:

• Nudora sp1 is found in the north sea with a density of 32

• Species 123776 is found in station 2 (North sea) with a density of 32

• Record in table density becomes:123776 | 2 | 32

Page 16: Data mining – introduction

Setting up relations between tables

• Relations: links between tables• Connecting tables through certain fields in a

rigid way to each other• Advantage: database becomes a strong unity• Types of relations:

– 1 to many– Many to many ( = 2 times 1 to many)

Page 17: Data mining – introduction

Examples of relations

• Table places: field country (numeric)• Table countries – list of countries, each country has unique id• Relation is made between:

– Field country in places– Field id in country

• One to many relation: 1 record in table country linked to multiple records in places

• No deleting of countries possible

Places

Country

Page 18: Data mining – introduction

Examples of relations

• Many to many• Id of sample• Id of species

• Table density: unique combination of sample, species ...

Species

Sample

Density

Page 19: Data mining – introduction

Queries

• All data in database:– Next step: get it out again– Selections on 1 table: by using filters– Selections on multiple tables: using queries– Queries can be saved and reused– Queries can be the basis for new queries

Page 20: Data mining – introduction

Sorting on tables

• Sorting

Page 21: Data mining – introduction

Filtering on tables

Page 22: Data mining – introduction

Making a simple selection Query

• Create ... Query in design view• Switching between views:

Page 23: Data mining – introduction

Making a simple selection Query

• Select the tables and/or queries needed

Page 24: Data mining – introduction

Making a simple selection Query

• Select the fields needed for output/selection/sorting

Page 25: Data mining – introduction

Making a simple selection Query

• Select the fields needed for output/selection/sorting

Page 26: Data mining – introduction

Making a simple selection Query

• Select the fields needed for output/selection/sorting

Page 27: Data mining – introduction

Making a simple selection Query

• Select the fields needed for output/selection/sorting

Page 28: Data mining – introduction

Making a simple selection Query

• Set the criteria

Page 29: Data mining – introduction

Making a simple selection Query

• Select the values to out put and add sorting options

Page 30: Data mining – introduction

Output the results

• Go to datasheet view

Page 31: Data mining – introduction

Making a simple selection Query

• Special options ...

Page 32: Data mining – introduction

Exporting data

• From msaccess it is possible to export to different formats!

• Tables, queries, ...• Exports can be used to do further data mining:

– Through MSExcell making graphs– To do statistical analysis

Page 33: Data mining – introduction

Exporting data

Page 34: Data mining – introduction
Page 35: Data mining – introduction

Step by step demonstration

• Open a database• Different items in database• Open tables, sorting, filtering• Table design• Relationships• Queries

Page 36: Data mining – introduction

Query operators

= equals> Larger than< Smaller than>= larger than or equalsBetween ... And ... Is nullLike ...Not like ...

Page 37: Data mining – introduction

Query operators

Page 38: Data mining – introduction

Query operators

and both trueor at least 1 true< Smaller than>= larger than or equalsBetween ... And ... Is nullLike ...Not like ... >"q*" and <"u*" VOORNAAM René, Robbie, Stefan, Stijn, Tim, Tristam

="r*" or "s*" VOORNAAM Robbie, Stefan, Stijn