msc bioinformatics 2006 biocomputing databases and perl 13 th february 2006

MSc Bioinformatics 2006

BiocomputingDatabases and Perl

13th February 2006


Client

HTTP

Web Server

CGI

Application ServerRequest

Response


e.g. PC + Apache +PERL (with CGI module) + MySQL

Client

HTTP

Web Server

CGI DB API

DB Server +DB

Application Server


The Database Tier• stores and retrieves data.

• manages updates, allows simultaneous (concurrent) access from web servers, provides security, ensures the integrity of data

• requires complex software.

– Database Management Systems (DBMSs)

• searches and manages data that's stored in databases

• a database is a collection of related data, and an application can have more than one database

– interface is accessed using SQL (Standard Query Language) that's used to define and manipulate databases and data


Why use a database server?

• more than one user who needs to access the data at the same time.

• a moderate amount of data• relationships between the stored data items• more than one kind of data object• constraints that must be rigidly enforced on the data,

such as field lengths, field types etc..• If data must be queried to produce reports or results.• large amount of data that must be searched quickly.• If security is important• When adding, deleting, or modifying data is a complex

process.


Relational databases

• Entities and Attributes– An entity is a person, place, event, or thing about

which data is collected– An entity set is a named collection of entities sharing

common characteristics– Attributes are characteristics of the entity– Attributes are also called fields

• Tables– A table holds related entities or an entity set– Also called relations– Comprised of rows and columns


Tables

• Two-dimensional structure with rows and columns• Each row (tuple) represents a single entity• Columns represent attributes• Row/column intersection represents single value• Tables must have an attribute to uniquely identify each

row, a unique key• Column values all have same data format• Each column has range of values called attribute domain• Order of the rows and columns is immaterial to the

DBMS


Fingerprint

Motif

Sequence


Entity-relationship model


Relational logical model


mySQL

• Command-line interpreter– UNIX/Mac

/usr/local/bin/mysql -uhugh -pshhh

– WindowsStart > All Programs > mySQL > mySQL Server 5.0 > mySQL command line client


Managing databases• Creating databases

mysql> CREATE DATABASE phyloprints;

• To use that databaseuse phyloprints;

• Creating tables

CREATE TABLE "fingerprint" ( "fprint_accn" varchar(15), "identifier" varchar(15), "motifs" int2, "date" date, "up_date" date, "family_title" text, "family_doc“ varchar(18), "category" text, "second_accn" varchar(15), "pseudo" bool);

SHOW CREATE TABLE wine;


Modifiers

• Are applied to attributes– NOT NULL (row can’t exist without this

attribute having a value)

identifier varchar(15) NOT NULL,

– DEFAULT (sets the data to the value that follows when no data is supplied)pseudo" bool DEFAULT f,


Keys

• Primary key – one or more attributes that uniquely identify a row in a table– It is essential that every table has one

PRIMARY KEY (fprint_accn), key names*fprint_accn,identifier) type=MyISAM;


Removing databases and tables

• DROP statement (make sure you want to do this – it doesn’t check!)

DROP TABLE seqn;

DROP DATABASE phyloprints;


Inserting/Updating/Deleting Data

• Inserting DataINSERT INTO fingerprint VALUES (‘PR90008’,

‘CRYSTALLIN’,0,2000-10-11,2000-10-11,’Crystallin pseudo-signature’,’PDOC90008’,’Others’,t);

• Inserting lots of data rowsINSERT INTO fingerprint VALUES (‘PR90008’,

‘CRYSTALLIN’,0,2000-10-11,2000-10-11,’Crystallin pseudo-signature’,’PDOC90008’,’Others’,t),(‘PR00240’,’ADRENRGCA1DR’,7,1996-08-14,2000-02-18,’Alpha-1D adrenergic receptor signature’,’PDOC00240’, f);


• DELETE statement– All data

DELETE FROM fingerprint; – Deleting specific rows – use WHEREDELETE FROM fingerprint WHERE identifer = ‘CRYSTALLIN’;

• UPDATE statementUPDATE fingerprint SET fprint_accn = ‘PR00001’ WHERE identifier=‘CRYSTALLIN’;


Querying with SQL

• Key operators– SELECT

– PROJECT

– JOIN

• Other operators– INTERSECT

– UNION (union compatible tables)

– DIFFERENCE

– PRODUCT

– DIVIDE


Querying databases with SQL

• SELECT statementSELECT identifier, fprint_accn FROM fingerprint;

SELECT * FROM fingerprint;

• WHERE clauseSELECT * from fingerprint WHERE fprint_accn=‘PR00001’;


• select identifier,fprint_accn from fingerprint where pseudo='t' ORDER BY fprint_accn;

• select fprint_accn,COUNT(*) from fingerprint where pseudo='t' GROUP BY fprint_accn;

• select DISTINCT seqn_accn from motif;• select distinct seqn_accn from motif LIMIT

10;


Join queries

• output data that's based on relationships between two or more tables


Diagram taken from Jane Mabey

MSc Bioinformatics 2006 Diagram taken from Jane Mabey


Querying databases using Perl

#! /usr/bin/perl -w # intro6.pl - connect to MySQL, retrieve data, write plain text output use strict; use DBI;

my ($dbh, $sth, $count); $dbh = DBI->connect ("DBI:mysql:host=localhost;database=webdb", "webdev", "webdevpass", {PrintError => 0, RaiseError => 1});

$sth = $dbh->prepare ("SELECT name, wins, losses FROM teams"); $sth->execute (); $count = 0; while (my @val = $sth->fetchrow_array ()) { printf "name = %s, wins = %d, losses = %d\n", $val[0], $val[1], $val[2]; ++

$count; } print "count rows total\n"; $sth->finish (); $dbh->disconnect (); exit (0);


%./intro6.pl name = Fargo-Moorhead Twins, wins = 36, losses = 16 name = Winnipeg Maroons, wins = 24, losses = 26 name = Minot Why Nots, wins = 19, losses = 23 name = Warren Wanderers, wins = 16, losses = 30 4 rows total


Perl example…use DBI; my $dbh = DBI->connect('DBI:MySql:payroll') or die "Couldn't connect to database: " .

DBI->errstr; my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?') or die "Couldn't

prepare statement: " . $dbh->errstr; print "Enter name> "; while ($lastname = <>) { # Read input from the user my @data; chomp $lastname; $sth->execute($lastname) # Execute the query or die "Couldn't execute statement: " . $sth->errstr; # Read the matching records and print them out while (@data = $sth->fetchrow_array()) {

my $firstname = $data[1]; my $id = $data[2]; print "\t$id: $firstname $lastname\n";

} if ($sth->rows == 0) {

print "No names matched `$lastname'.\n\n"; } $sth->finish; print "\n"; print "Enter name> "; } $dbh->disconnect;


Query databases over the Web

#! /usr/bin/perl -w # intro7.pl - connect to MySQL, retrieve data, write HTML output

use strict; use DBI; use CGI qw(:standard); my ($dbh, $sth, $count); $dbh = DBI->connect ("DBI:mysql:host=localhost;database=webdb", "webdev",

"webdevpass", {PrintError => 0, RaiseError => 1}); $sth = $dbh->prepare ("SELECT name, wins, losses FROM teams"); $sth->execute (); print header(), start_html ("team data"); $count = 0; while (my @val = $sth->fetchrow_array ()) {

print p (sprintf ("name = %s, wins = %d, losses = %d\n", $val[0], $val[1], $val[2])); ++$count;

} print p ("$count rows total"), end_html (); $sth->finish (); $dbh->disconnect (); exit (0);


More about data retrieval with DBI.pm

• fetchrow_arrayref

$sth = $dbh->prepare ("SELECT name, wins, losses FROM teams");

$sth->execute ();

while (my $ref = $sth->fetchrow_arrayref ()) {

printf "name = %s, wins = %d, losses = %d\n", $ref->[0], $ref->[1], $ref->[2];

}

$sth->finish ();


#!/usr/bin/perl

use DBI;

my $db = “sequence”;my $server = ‘localhost’;my $user = ‘root’;my $passwd = ‘passwd’;

my $dbConnection = DBI->connect(“dbi::mysql:$db:$server”,$user,$passwd)

my $query = “show tables”;my $sql = $dbConnection->prepare($query);$sql ->execute();while(my $row = $sql->fetchrow_arrayref){

print join(“\t”,@$row),”\n”;}$dbConnection->disconnect;exit;


Higher retrieval methods

• selectrow_array()– to retrieve a single row or a single column

value • selectcol_arrayref()

– returns the first column of a result set, as a reference to an array of values.

• selectall_arrayref() – retrieves the entire result set as a matrix and

returns a reference to it


Inserting data in the databasesub new_employee { # Arguments: database handle; first and last names of new

employee; # department ID number for new employee's work assignment

my ($dbh, $first, $last, $department) = @_; my ($insert_handle, $update_handle); my $insert_handle = $dbh->prepare_cached('INSERT INTO

employees VALUES (?,?,?)'); my $update_handle = $dbh->prepare_cached('UPDATE

departments SET num_members = num_members + 1 WHERE id = ?');

die "Couldn't prepare queries; aborting" unless defined $insert_handle && defined $update_handle;

$insert_handle->execute($first, $last, $department) or return 0; $update_handle->execute($department) or return 0; return 1; # Success

}


Inserting data in the databasesub new_employee { # Arguments: database handle; first and last names of new

employee; # department ID number for new employee's work assignment

my ($dbh, $first, $last, $department) = @_; my ($insert_handle, $update_handle); my $insert_handle = $dbh->prepare_cached('INSERT INTO employees

VALUES (?,?,?)'); my $update_handle = $dbh->prepare_cached('UPDATE departments SET

num_members = num_members + 1 WHERE id = ?'); die "Couldn't prepare queries; aborting" unless defined

$insert_handle && defined $update_handle; my $success = 1; $success &&= $insert_handle->execute($first, $last, $department);

$success &&= $update_handle->execute($department); my $result = ($success ? $dbh->commit : $dbh->rollback); unless

($result) { die "Couldn't finish transaction: " . $dbh->errstr } return $success;

}


• A short cut for DELETING, UPDATING and INSERTING data use the do statement

$dbh->do('DELETE FROM people WHERE age > 65');


Reminders

• Executable CGI files

• Configuring Apache to read cgi files from cgi-bin

• In httpd.conf

• ScriptAlias /cgi-bin/ "C:/Program Files/Apache Group/Apache2/cgi-bin/"

msc bioinformatics 2006 biocomputing databases and perl 13 th february 2006

Documents