data mining – introduction

Post on 21-May-2015

132 Views

Category:

Education

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Data mining – databases

Tim Deprez

Outline

• Today: – introduction to databases– Introduction on how to work with MSAccess

• Next coming days: practical excercises with MSAccess

Data mining

• Exploration of data• Prerequisite: data should be available in a

minable format - database• Database = electronic document storing data

– Non-relational: 1 bulk system with non-related items (eg. Msexcel files, text-documents, non-related-tables)

– Relational: all items (tables) are linked to each other (see further)

Why using a database

• Relational database:– All your data is stored in 1 file

• Easy to retrieve data• Easy to backup

– Data and metadata stored together• Data ...• Metadata: data about the data (documentation)

– Many data-files contain undocumented values:– Species A has an abundance of 17 ( meaning of value 17?)

Why using a database

• All data in a good relational designed database is only stored once:– Example: species list typing errors

• Nudora thorakista• Nudora thorrakista• Nudora thorakhista • Nudora thorakisa

– 1 species species richness calculation: 4– Solution: 1 table with each species 1 record and

use it as a reference

Why using a database

• Data is much more rigid ...– More difficult to make errors– E.g. Sorting in excell

Relational database - biologySpecies

person

Places

Sample

Country

Density

Equipment

Speciesperson

Places

Sample

Country

Density

Equipment

Which person was present on samplings in sweden?

Speciesperson

Places

Sample

Country

Density

Equipment

Which species sampled with a core occur in densities higher than 40

Variable

Var_value

Taxonomy

Photo

Literature...

...

...

...

Table designs ...

• A table consists of a series of Columns ...• Each record as such:

– Different fields– Design of table must be done before data is entered– Each field: name, data type– Each field can also by formatted layout

Record

ColumnField

Table designs ...

• Field types:– Numeric – integer/double– Text – Date/Time– Memo– Autonumber ID– Yes/No

Task on field types:

• 12• 15 jan 1988• hallo• 12,456• 12:56• Azdazdazd azdda zda azdd dad zd dadazdzd azdazddazdd azdazd azdazd dzdzdzzd ada zzd azdaz

dda azd da az d z azdzadazd a zd a azd azd z dd da a z a z zd d ddaa zd

• 09:89

Special field in a table: key

• A key = a unique identifier for a record– Example: pasport number:

• Number in a database which is unique and relates to all data about you

– Each record in a table gets also a key– This key is used to link tables to each other– Example:

• Nudora sp1 – id: 123776• Nudora sp2 – id: 34688

– Advantage: species name changes: linked taxa remain linked

Linking tables through id’s

• Storing numbers is most effecient way to store data:

• Nudora sp1 is found in the north sea with a density of 32

• Species 123776 is found in station 2 (North sea) with a density of 32

• Record in table density becomes:123776 | 2 | 32

Setting up relations between tables

• Relations: links between tables• Connecting tables through certain fields in a

rigid way to each other• Advantage: database becomes a strong unity• Types of relations:

– 1 to many– Many to many ( = 2 times 1 to many)

Examples of relations

• Table places: field country (numeric)• Table countries – list of countries, each country has unique id• Relation is made between:

– Field country in places– Field id in country

• One to many relation: 1 record in table country linked to multiple records in places

• No deleting of countries possible

Places

Country

Examples of relations

• Many to many• Id of sample• Id of species

• Table density: unique combination of sample, species ...

Species

Sample

Density

Queries

• All data in database:– Next step: get it out again– Selections on 1 table: by using filters– Selections on multiple tables: using queries– Queries can be saved and reused– Queries can be the basis for new queries

Sorting on tables

• Sorting

Filtering on tables

Making a simple selection Query

• Create ... Query in design view• Switching between views:

Making a simple selection Query

• Select the tables and/or queries needed

Making a simple selection Query

• Select the fields needed for output/selection/sorting

Making a simple selection Query

• Select the fields needed for output/selection/sorting

Making a simple selection Query

• Select the fields needed for output/selection/sorting

Making a simple selection Query

• Select the fields needed for output/selection/sorting

Making a simple selection Query

• Set the criteria

Making a simple selection Query

• Select the values to out put and add sorting options

Output the results

• Go to datasheet view

Making a simple selection Query

• Special options ...

Exporting data

• From msaccess it is possible to export to different formats!

• Tables, queries, ...• Exports can be used to do further data mining:

– Through MSExcell making graphs– To do statistical analysis

Exporting data

Step by step demonstration

• Open a database• Different items in database• Open tables, sorting, filtering• Table design• Relationships• Queries

Query operators

= equals> Larger than< Smaller than>= larger than or equalsBetween ... And ... Is nullLike ...Not like ...

Query operators

Query operators

and both trueor at least 1 true< Smaller than>= larger than or equalsBetween ... And ... Is nullLike ...Not like ... >"q*" and <"u*" VOORNAAM René, Robbie, Stefan, Stijn, Tim, Tristam

="r*" or "s*" VOORNAAM Robbie, Stefan, Stijn

top related