met280: computing for bioinformatics introduction to databases what is a database? not a...

12
MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System) architectures DBMS components DBMS user interfaces SQL (Structured Query Language) Client-server and 3-tier systems

Upload: oliver-reed

Post on 31-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

MET280: Computing for Bioinformatics

Introduction to databases

• What is a database? Not a spreadsheet.

• Data types and uses

• DBMS (DataBase Management System) architectures

• DBMS components

• DBMS user interfaces

• SQL (Structured Query Language)

• Client-server and 3-tier systems

Page 2: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

What databases are not

• unstructured piles of data (including heaps of web pages)

• spreadsheets such as Excel tables

• text files with neatly tabulated data

• data collected for one kind of analysis only

Why are these things not databases?

Page 3: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

Spreadsheets – databases (1)

• A spreadsheet is typically viewed as an entire table of cells which may contain

– numbers (data)

– text (labels)

– formulae (calculations producing results)

• A database may be structured in various ways, usually so that a small subset of the data is presented as the result of a search

Page 4: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

Spreadsheets – databases (2)

Spreadsheets

• Can be used immediately with little preparation (or thought)

• Data is visible

• Data entry is simple

Databases

• Require planning

• Data is hidden

• May require a program to help you enter or retrieve data

Page 5: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

Spreadsheets – databases (3)

Spreadsheets

• Little checking is carried out

• Tables and graphs can be produced

• Single user

Databases

• Extensive integrity checks can be arranged

• Reports can be programmed

• Searches can be made

• Can be multi-user

• Can be put on the Web with a suitable user interface program

Page 6: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

What a database is

• Data is stored separately from any application programs which might use it

• Multiple uses of the data are envisaged

• Designed for retrieval in various anticipated and unanticipated forms

Page 7: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

What are they used for?Biology:• species names, data about species• details of publications

Biodiversity:• data about biological specimens• place names, data about areas, places, sampling sites,

habitats etc. (sometimes in Geographical Information Systems (GIS)

Bioinformatics:• results of experiments• molecular sequences, protein structures• gene frequencies, gene expression data, etc.

Page 8: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

DBMS types (database internal structure)

What are the main types of database design? (The internal mechanics, not the information stored or the appearance of the database as seen by the user.)

• “Free text” - records not divided into fields• “Flat-file” - records have fields (one table with columns

like a spreadsheet), common and easy to understand, often inefficient

• Hierarchical, Network - now obsolete• Relational - several tables, usually the choice of the

professional (solid, boring)• Object-oriented - for the adventurous (cutting edge)• “NoSQL” (Not only SQL) non-tabular XML, RDF etc.

Page 9: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

Database system components

A database management system (DBMS) has the following essential components:

• Data tables (the data itself)• Core DBMS software including the database storage “engine”

(stores data to and retrieves data from the tables)• User interface (for humans to enter, view and edit data)

Some commercial general-purpose DBMSs, such as Microsoft Access, make the storage engine and the interface appear as one (although Access can act as the user interface to other storage engines such as MySQL)

Page 10: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

Accessing the data in the database

• A user can use a built-in user interface to search, edit, etc. (e.g. in Microsoft Access)

• A user can use a separate or even third-party general-purpose client program, especially in the case of client-server systems such as MySQL, Oracle, etc.

• Such clients often use the SQL language (pronounced either “ess-cue-ell” or “sequel”) as a (fairly) standard way to formulate search requests, data editing instructions, etc.

• Special-purpose client programs may also be written (in Perl, Java, PHP, etc.) to perform such access, using SQL “embedded” in the program

Page 11: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

User interface components

A DBMS is usually also associated with: • Database “drivers” to permit programs to communicate with

the storage engine to store, retrieve and alter data• Application programs, which use drivers to connect to the

database, send SQL commands to it and do useful things, sometimes called “business logic”; may be general-purpose or specialised)

• Report writer (a specialised application program)• Utilities (ditto, for back-up, integrity checking, import &

export modules, etc.)

Page 12: MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)

Smallest ever guide to SQL

• Database table definition: column names, data types, indexes, etc.

• Data records may be inserted, altered or deleted

• Data retrieval is based on the idea of selecting columns and rows to obtain a subset of a larger stored table, e.g.

– SELECT name, salary FROM Employee WHERE school = “COMSC” AND name LIKE ‘Whit%’;

• Data may be retrieved from two or more tables using “joins” on linking data fields (keys)

• For examples, see the notes for Lecture 7