databases 101 - california institute of technologygeorge/aybi199/graham_db1.pdf · adatabase...

36
Databases 101 Matthew J. Graham CACR Methods of Computational Science Caltech, 3 May 2011 matew graham

Upload: others

Post on 15-May-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

Databases101

MatthewJ.GrahamCACR

MethodsofComputationalScienceCaltech, 3May2011

matthew graham

Page 2: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

fliptrap

matthew graham

Page 3: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

fliptrap

matthew graham

Page 4: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

fliptrap

matthew graham

Page 5: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

fliptrap

matthew graham

Page 6: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

whatisadatabase?

A structuredcollectionofdataresidingonacomputersystemthatcanbeeasilyaccessed, managedandupdated

Dataisorganisedaccordingtoadatabasemodel

A DatabaseManagementSystem(DBMS) isasoftwarepackagedesignedtostoreandmanagedatabases

matthew graham

Page 7: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

whyuseadbms?

dataindependence

efficientandconcurrentaccess

dataintegrity, securityandsafety

uniformdataadministration

reducedapplicationdevelopmenttime

dataanalysistools

matthew graham

Page 8: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

scaleofdatabases

"DBs own the sweet spot of1GB to 100TB" (Gray&Hey,2006)

SQLite

MySQL,PostgreSQL

SQLServer, Oracle

∗Hive, HadoopDB

matthew graham

Page 9: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

datamodels

A collectionofconceptsdescribinghowstructureddataisrepresentedandaccessed

Withinadatamodel, the schema isasetofdescriptionsofaparticularcollectionofdata

Theschemaisstoredina datadictionary andcanberepresentedinSQL,XML,RDF,etc.

Insemanticsadatamodelisequivalenttoanontology-"aformal, explicitspecificationofasharedconceptualisation"

matthew graham

Page 10: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

fliptrap

matthew graham

Page 11: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

flat(file)model

Datafilesthatcontainrecordswithnostructuralrelationships

Additionalinformationisrequiredtointerpretthesefilessuchasthefileformatproperties

Hollerith1889patent"ArtofCompilingStatistics"describeshoweveryUS residentcanberepresentedbyastringof80charactersandnumbers

Examples: delimited-separateddata, HTML table

matthew graham

Page 12: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

relationalmodel

Dataisorganizedasrelations, attributesanddomains

A relationisatablewithcolumns(attributes)androws(tuples)

Thedomainisthesetofvaluesthattheattributesareallowedtotake

Withintherelation, eachrowisunique, thecolumnorderisimmaterialandeachrowcontainsasinglevalueforeachofitsattributes

ProposedbyE.F.Coddin1969/70

matthew graham

Page 13: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

transactions

Anatomicsequenceofactions(read/write)inthedatabase

Eachtransactionhastobeexecuted completely andmustleavethedatabaseinaconsistentstate

Ifthetransactionfailsorabortsmidway, thedatabaseis"rolledback"toitsinitialconsistentstate

Example:AuthorisePaypaltopay $100formyeBaypurchase:-Debitmyaccount $100-Credittheseller'saccount $100

matthew graham

Page 14: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

acid

Bydefinition, adatabasetransactionmustbe:

Atomic: allornothing

Consistent: nointegrityconstraintsviolated

Isolated: doesnot interferewithanyothertransaction

Durable: committed transaction effectspersist

matthew graham

Page 15: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

concurrency

DBMS ensuresthatinterleavedtransactionscomingfromdifferentclientsdonotcauseinconsistenciesinthedata

Itconvertstheconcurrenttransactionsetintoanewsetthatcanbeexecutedsequentially

Beforereading/writinganobject, eachtransactionwaitsforalock ontheobject

Eachtransactionreleasesallitslockswhenfinished

matthew graham

Page 16: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

locks

DMBS cansetandholdmultiplelockssimultaneouslyondifferentlevelsofthephysicaldatastructure

Granularity: atarowlevel, page(abasicdatablock), extent(multiplearrayofpages)orevenanentiretable

Exclusivevs. shared

Optimisticvs. pessimistic

matthew graham

Page 17: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

logs

Ensuresatomicityoftransactions

Recoveringafteracrash, effectsofpartiallyexecutedtransactionsareundoneusingthelog

Logrecord:

-- Header(transactionID,timestamp, ...)-- ItemID-- Type-- Oldandnewvalue

matthew graham

Page 18: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

partitions

Horizontal: differentrowsindifferenttables

Vertical: differentcolumnsindifferenttables(normalisation)

Range: rowswherevaluesinaparticularcolumnareinsideacertainrange

List: rowswherevaluesinaparticularcolumnmatchalistofvalues

Hash: rowswhereahashfunctionreturnsaparticularvalue

matthew graham

Page 19: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

structuredquerylanguage

Appearedin1974fromIBM

Firststandardpublishedin1986; mostrecentin2008

SQL92istakentobedefaultstandard

Differentflavours:

Microsoft/Sybase Transact-SQLMySQL MySQLOracle PL/SQLPostgreSQL PL/pgSQL

matthew graham

Page 20: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

create

CREATE DATABASE databaseNameCREATE TABLE tableName (name1type1, name2type2, . . . )

CREATE TABLE star (name varchar(20), ra float, dec float, vmag float)

Datatypes:• boolean, bit, tinyint, smallint, int, bigint;• real/float, double, decimal;• char, varchar, text, binary, blob, longblob;• date, time, datetime, timestamp

CREATE TABLE star (name varchar(20) not null, ra float default 0, ...)

matthew graham

Page 21: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

keysCREATE TABLE star (name varchar(20), ra float, dec float, vmag float,

CONSTRAINT PRIMARY KEY (name))

A primarykeyisauniqueidentifierforarowandisautomaticallynotnull

CREATE TABLE star (name varchar(20), ..., stellarType varchar(8),CONSTRAINT stellarType_fk FOREIGN KEY (stellarType)REFERENCES stellarTypes(id))

A foreignkeyisareferentialconstraintbetweentwotablesidentifyingacolumninonetablethatreferstoacolumninanothertable.

matthew graham

Page 22: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

insertINSERT INTO tableName VALUES(val1, val2, . . . )

INSERT INTO star VALUES('Sirius', 101.287, -16.716, -1.47)

INSERT INTO star(name, vmag) VALUES('Canopus', -0.72)

INSERT INTO starSELECT ...

matthew graham

Page 23: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

delete

DELETE FROM tableName WHERE conditionTRUNCATE TABLE tableNameDROP TABLE tableName

DELETE FROM star WHERE name = 'Canopus'

DELETE FROM star WHERE name LIKE 'C_n%'

DELETE FROM star WHERE vmag > 0 OR dec < 0

DELETE FROM star WHERE vmag BETWEEN 0 and 5

matthew graham

Page 24: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

update

UPDATE tableName SET columnName =val1WHERE condition

UPDATE star SET vmag = vmag + 0.5

UPDATE star SET vmag = -1.47 WHERE name LIKE 'Sirius'

matthew graham

Page 25: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

select

SELECT selectionList FROM tableList WHERE conditionORDER BY criteria

SELECT name, constellation FROM star WHERE dec > 0ORDER by vmag

SELECT * FROM star WHERE ra BETWEEN 0 AND 90

SELECT DISTINCT constellation FROM star

SELECT name FROM star LIMIT 5ORDER BY vmag

matthew graham

Page 26: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

joins

Innerjoin: combiningrelatedrows

SELECT * FROM star s INNER JOIN stellarTypes t ON s.stellarType = t.id

SELECT * FROM star s, stellarTypes t WHERE s.stellarType = t.id

Outerjoin: eachrowdoesnotneedamatchingrow

SELECT * from star s LEFT OUTER JOIN stellarTypes t ON s.stellarType = t.id

SELECT * from star s RIGHT OUTER JOIN stellarTypes t ON s.stellarType = t.id

SELECT * from star s FULL OUTER JOIN stellarTypes t ON s.stellarType = t.id

matthew graham

Page 27: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

aggregatefunctions

COUNT,AVG,MIN,MAX,SUM

SELECT COUNT(*) FROM star

SELECT AVG(vmag) FROM star

SELECT stellarType, MIN(vmag), MAX(vmag) FROM starGROUP BY stellarType

SELECT stellarType, AVG(vmag), COUNT(id) FROM starGROUP BY stellarTypeHAVING vmag > 14

matthew graham

Page 28: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

views

CREATE VIEW viewName AS . . .

CREATE VIEW region1View ASSELECT * FROM star WHERE ra BETWEEN 150 AND 170

AND dec BETWEEN -10 AND 10

SELECT id FROM region1View WHERE vmag < 10

CREATE VIEW region2View ASSELECT * FROM star s, stellarTypes t WHERE s.stellarType = t.id

AND ra BETWEEN 150 AND 170 AND dec BETWEEN -10 AND 10

SELECT id FROM regionView2 WHERE vmag < 10 and stellarType LIKE 'A%'

matthew graham

Page 29: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

indexes

CREATE INDEX indexName ON tableName(columns)

CREATE INDEX vmagIndex ON star(vmag)

A clusteredindexisoneinwhichtheorderingofdataentriesisthesameastheorderingofdatarecords

Onlyoneclusteredindexpertablebutmultipleunclusteredindexes

TypicallyimplementedasB+treesbutalternatetypessuchasbitmapindexforhighfrequencyrepeateddata

matthew graham

Page 30: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

cursorsDECLARE cursorName CURSOR FOR SELECT ...OPEN cursorNameFETCH cursorName INTO ...CLOSE cursorName

A cursorisacontrolstructureforsuccessivetraversalofrecordsinaresultset

Slowestwayofaccessingdata

matthew graham

Page 31: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

cursorsexample

Foreachrowintheresultset, updatetherelevantstellarmodel

DECLARE @name varchar(20)DECLARE @mag floatDECLARE starCursor CURSOR FOR

SELECT name, AVG(vmag) FROM starGROUP BY stellarType

OPEN starCursorFETCH starCursor INTO @name, @magEXEC updateStellarModel @name, @mag / CALL updateStellarModel(@name, @mag)

CLOSE starCursor

matthew graham

Page 32: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

triggersCREATE TRIGGER triggerName ONtableName ...

A trigger is procedural code that isautomatically executed in response tocertaineventsonaparticulartable:• INSERT• UPDATE• DELETE

CREATE TRIGGER starTrigger ON star FOR UPDATE ASIF @@ROWCOUNT = 0 RETURNIF UPDATE (vmag) EXEC refreshModels

GO

matthew graham

Page 33: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

storedprocedures

CREATE PROCEDURE procedureName @param1type, . . .AS ...

CREATE PROCEDURE findNearestNeighbour @starName varchar(20) ASBEGIN

DECLARE @ra, @dec floatDECLARE @name varchar(20)SELECT @ra = ra, @dec = dec FROM star WHERE name LIKE @starNameSELECT name FROM getNearestNeighbour(@ra, @dec)

END

EXEC findNearestNeighbour 'Sirius'

matthew graham

Page 34: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

normalisation

Firstnormalform: norepeatingelementsorgroupsofelementstablehasauniquekey(andnonullablecolumns)

Secondnormalform: nocolumnsdependentononlypartofthekey

StarName Constellation Area

Thirdnormalform: nocolumnsdependentonothernon-keycolumns

StarName Magnitude Flux

matthew graham

Page 35: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

programming

Java

import java.sql.*...String dbURL = "jdbc:mysql://127.0.0.1:1234/test";Connection conn = DriverManager.getConnection(dbUrl, "mjg", "mjg");Statement stmt = conn.createStatement();ResultSet res = stmt.executeQuery("SELECT * FROM star");...conn.close();

matthew graham

Page 36: Databases 101 - California Institute of Technologygeorge/aybi199/Graham_DB1.pdf · ADatabase Management System (DBMS)is a software package designed to store and manage databases matthew

programming

Python:

import MySQLdbCon = MySQLdb.connect(host="127.0.0.1", port=1234, user="mjg",

passwd="mjg", db="test")Cursor = Con.cursor()sql = "SELECT * FROM star"Cursor.execute(sql)Results = Cursor.fetchall()...Con.close()

matthew graham