2018 02 20_biological_databases_part2_v_upload
TRANSCRIPT
![Page 1: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/1.jpg)
![Page 2: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/2.jpg)
FBW
27-02-2018
Biological Databases
Wim Van Criekinge
![Page 3: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/3.jpg)
![Page 4: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/4.jpg)
![Page 5: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/5.jpg)
Syllabi left
![Page 6: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/6.jpg)
Project
![Page 7: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/7.jpg)
Big Data Volumes
![Page 8: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/8.jpg)
![Page 9: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/9.jpg)
Data Levels in Biological Research
![Page 10: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/10.jpg)
Primary dataDerived data
![Page 11: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/11.jpg)
Primary dataDerived dataInterpreted data/
knowledge
Experimental metadata
Analytical metadata
![Page 12: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/12.jpg)
Direct Queryability of Selected Bioinformatics Databases
Database Direct Querying? How? Modality DB Engine
ArrayExpressWarehouse Eventually http://www.ebi.ac.uk/aedw/ SQL Oracle
BioWarehouse Yeshttp://biowarehouse.ai.sri.com/
- need accountSQL Oracle/MySQL
Ensembl Yeshttp://www.ensembl.org/info/d
ata/download.htmlSQL MySQL
Mouse Genome Database Yes ask for account SQL Sybase
PharmGKB Yeshttp://www.pharmgkb.org/hom
e/projects/webservices/SOAP-based Oracle
Saccharomyces Genome
DatabaseEventuallyMaybe Oracle
Stanford Microarray Database No Oracle
![Page 13: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/13.jpg)
Example 3-tier model in biological database
![Page 14: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/14.jpg)
Rationale to study SQL
• Learning SQL to Query Biological Databases
• Are you getting too much data, or missing crucial data when querying via a Web interface? Many biological databases can be queried directly via SQL, thus bypassing the limitations of the database's interface. SQL is at the heart of biological databases as diverse as Ensembl, ArrayExpress and the Mouse Genome Database.
• Focus on understanding relational databases by studying a repository of biological data and learn how to query it using SQL.
![Page 15: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/15.jpg)
SQL consists of only 4 statements, sometimes
referred to as CRUD:
–Create - INSERT - to store new data
–Read - SELECT - to retrieve data
–Update - UPDATE - to change or modify
data.
–Delete - DELETE - delete or remove data
Structured Query Language
![Page 16: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/16.jpg)
Definitions
Definitions
• Database: Collection of tables • Table
– Collection of records that share a common fundamental characteristic
• E.g., patients and locations can each be stored in their own table
• Record
– Basic unit of information in a relation information
– A record is composed of fields
– E.g., 1 record per person
• Query
– Set of instructions to a database “engine” to retrieve, sort and format returning data.
• “find me all patients in my database”
![Page 17: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/17.jpg)
Table Characteristics
• Two-dimensional structure with rows and columns
• Rows (tuples) represent single entity
• Columns represent attributes
• Row/column intersection represents single value
• Tables must have an attribute to uniquely identify each row
• Column values all have same data format
• Each column has range of values called attribute domain
• Order of the rows and columns is immaterial to the DBMS
![Page 18: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/18.jpg)
Numeric Data Types
MySQL uses all the standard ANSI SQL numeric data types, so if you're coming to MySQL from a different
database system, these definitions will look familiar to you. The following list shows the common numeric data
types and their descriptions:
INT - A normal-sized integer that can be signed or unsigned. If signed, the allowable range is from -2147483648 to
2147483647. If unsigned, the allowable range is from 0 to 4294967295. You can specify a width of up to 11 digits.
TINYINT - A very small integer that can be signed or unsigned. If signed, the allowable range is from -128 to 127.
If unsigned, the allowable range is from 0 to 255. You can specify a width of up to 4 digits.
SMALLINT - A small integer that can be signed or unsigned. If signed, the allowable range is from -32768 to
32767. If unsigned, the allowable range is from 0 to 65535. You can specify a width of up to 5 digits.
MEDIUMINT - A medium-sized integer that can be signed or unsigned. If signed, the allowable range is from -
8388608 to 8388607. If unsigned, the allowable range is from 0 to 16777215. You can specify a width of up to 9
digits.
BIGINT - A large integer that can be signed or unsigned. If signed, the allowable range is from -
9223372036854775808 to 9223372036854775807. If unsigned, the allowable range is from 0 to
18446744073709551615. You can specify a width of up to 20 digits.
FLOAT(M,D) - A floating-point number that cannot be unsigned. You can define the display length (M) and the
number of decimals (D). This is not required and will default to 10,2, where 2 is the number of decimals and 10 is
the total number of digits (including decimals). Decimal precision can go to 24 places for a FLOAT.
DOUBLE(M,D) - A double precision floating-point number that cannot be unsigned. You can define the display
length (M) and the number of decimals (D). This is not required and will default to 16,4, where 4 is the number of
decimals. Decimal precision can go to 53 places for a DOUBLE. REAL is a synonym for DOUBLE.
DECIMAL(M,D) - An unpacked floating-point number that cannot be unsigned. In unpacked decimals, each
decimal corresponds to one byte. Defining the display length (M) and the number of decimals (D) is required.
NUMERIC is a synonym for DECIMAL.
![Page 19: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/19.jpg)
String Data Type
• Although numeric and date types are fun, most data you'll store will be in string format. This list describes
the common string datatypes in MySQL.
• CHAR(M) - A fixed-length string between 1 and 255 characters in length (for example CHAR(5)), right-
padded with spaces to the specified length when stored. Defining a length is not required, but the default
is 1.
• VARCHAR(M) - A variable-length string between 1 and 255 characters in length; for example
VARCHAR(25). You must define a length when creating a VARCHAR field.
• BLOB or TEXT - A field with a maximum length of 65535 characters. BLOBs are "Binary Large Objects"
and are used to store large amounts of binary data, such as images or other types of files. Fields defined
as TEXT also hold large amounts of data; the difference between the two is that sorts and comparisons
on stored data are case sensitive on BLOBs and are not case sensitive in TEXT fields. You do not specify
a length with BLOB or TEXT.
• TINYBLOB or TINYTEXT - A BLOB or TEXT column with a maximum length of 255 characters. You do
not specify a length with TINYBLOB or TINYTEXT.
• MEDIUMBLOB or MEDIUMTEXT - A BLOB or TEXT column with a maximum length of 16777215
characters. You do not specify a length with MEDIUMBLOB or MEDIUMTEXT.
• LONGBLOB or LONGTEXT - A BLOB or TEXT column with a maximum length of 4294967295
characters. You do not specify a length with LONGBLOB or LONGTEXT.
• ENUM - An enumeration, which is a fancy term for list. When defining an ENUM, you are creating a list of
items from which the value must be selected (or it can be NULL). For example, if you wanted your field to
contain "A" or "B" or "C", you would define your ENUM as ENUM ('A', 'B', 'C') and only those values (or
NULL) could ever populate that field.
![Page 20: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/20.jpg)
Built-in Data Types in SQL
• date: Dates, containing a (4 digit) year, month and
date
– Example: date ‘2005-7-27’
• time: Time of day, in hours, minutes and seconds.
– Example: time ‘09:00:30’ time ‘09:00:30.75’
• timestamp: date plus time of day
– Example: timestamp ‘2005-7-27 09:00:30.75’
• interval: period of time
– Example: interval ‘1’ day
– Subtracting a date/time/timestamp value from another gives
an interval value
– Interval values can be added to date/time/timestamp values
![Page 21: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/21.jpg)
Build-in Data Types in SQL (Cont.)
• Can extract values of individual fields from
date/time/timestamp
– Example: extract (year from r.starttime)
• Can cast string types to
date/time/timestamp
– Example: cast <string-valued-expression>
as date
– Example: cast <string-valued-expression>
as time
![Page 22: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/22.jpg)
User-Defined Types
• create type construct in SQL creates user-defined
type
create type Dollars as numeric (12,2) final
• create domain construct in SQL-92 creates user-
defined domain types
create domain person_name char(20) not null
• Types and domains are similar. Domains can have
constraints, such as not null, specified on them.
![Page 23: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/23.jpg)
Large-Object Types
• Large objects (photos, videos, CAD files, etc.)
are stored as a large object:
– blob: binary large object -- object is a large
collection of uninterpreted binary data (whose
interpretation is left to an application outside of
the database system)
– clob: character large object -- object is a large
collection of character data
– When a query returns a large object, a pointer is
returned rather than the large object itself.
![Page 24: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/24.jpg)
Creating a Table
• To create a table, use the CREATE TABLE command:
mysql> CREATE TABLE pet (
-> name VARCHAR(20),
-> owner VARCHAR(20),
-> species VARCHAR(20),
-> sex CHAR(1),
-> birth DATE, death DATE);
Query OK, 0 rows affected (0.04 sec)
![Page 25: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/25.jpg)
Keys
• One or more attributes that
determine other attributes
– Key attribute
– Composite key
• Full functional dependence
• Entity integrity
– Uniqueness
– No ‘null’ value in key
![Page 26: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/26.jpg)
Example Tables
![Page 27: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/27.jpg)
Simple Relational Database
![Page 28: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/28.jpg)
Keys (con’t.)
• Primary key
– Candidate key to uniquely identify all other
attributes in a given row
• Foreign key
– Values must match primary key in another
table
![Page 29: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/29.jpg)
Integrity Rules
• Entity integrity
– Ensures all entities are unique
– Each entity has unique key
• Referential integrity
– Foreign key must have null value or match
primary key values
– Makes it impossible to delete row whose
primary key has mandatory matching foreign
key values in another table
![Page 30: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/30.jpg)
Integrity Constraints
• Integrity constraints guard against
accidental damage to the database, by
ensuring that authorized changes to the
database do not result in a loss of data
consistency.
– A checking account must have a balance
greater than $10,000.00
– A salary of a bank employee must be at
least $4.00 an hour
– A customer must have a (non-null) phone
number
![Page 31: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/31.jpg)
Referential Integrity
• Ensures that a value that appears in one relation for a given set of
attributes also appears for a certain set of attributes in another
relation.
– Example: If “Perryridge” is a branch name appearing in one of the
tuples in the account relation, then there exists a tuple in the
branch relation for branch “Perryridge”.
• Primary and candidate keys and foreign keys can be specified as part
of the SQL create table statement:
– The primary key clause lists attributes that comprise the primary
key.
– The unique key clause lists attributes that comprise a candidate
key.
– The foreign key clause lists the attributes that comprise the
foreign key and the name of the relation referenced by the foreign
key. By default, a foreign key references the primary key attributes
of the referenced table.
![Page 32: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/32.jpg)
![Page 33: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/33.jpg)
![Page 34: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/34.jpg)
![Page 35: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/35.jpg)
![Page 36: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/36.jpg)
![Page 37: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/37.jpg)
![Page 38: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/38.jpg)
![Page 39: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/39.jpg)
![Page 40: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/40.jpg)
![Page 41: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/41.jpg)
![Page 42: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/42.jpg)
Install BIOSQL locally
• Get latest version of mysql (MAMP,
mariaDB)
• Download biosqldb-mysql.sql
• Remove type=innodb
• Launch database server
• Connect using toad (port 8889)
• Create database biosql;
• Set as active database
• Use worksheet to execute biosqldb-
mysql.sql
![Page 43: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/43.jpg)
Typical ODBC Architecture
![Page 44: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/44.jpg)
How ODBC Works?
• ODBC inserts a middle layer called a Client
Driver
• Purpose of the Client Driver is to translate the
applications queries into commands that the
DBMS understands
![Page 45: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/45.jpg)
Other Parts of the Architecture
• Application – calls functions defined in the ODBC API to access a data source
• Driver Manager – implements the ODBC API and provides information to the application
• Database – contains all the data
![Page 46: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/46.jpg)
Setting Up a Data Source in Windows
• A data source is just a database that ODBC connects to
• This allows the person to change database types without any changes to the program
• Step 1. Get a database.
– Using access for this example because it’s on this computer
![Page 47: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/47.jpg)
Conclusion
The four parts that make up ODBC:
![Page 48: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/48.jpg)
Conclusion
• Advantages:
– Allows access to different types of databases
– Uniform way of retrieving information
– Highly efficient
– Low memory requirements
• Disadvantages
– Complex and steep learning curve
– All data in database must look like a relational
database
![Page 49: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/49.jpg)
Conclusion
• ODBC changed the way people code their programs
that have to interact with databases.
• The efficiency of programmers in the business world
has increased due to ODBC because they no longer
are wasting time to create multiple copies of a
program.
• ODBC has vastly improved the way programmers
deal with databases.
![Page 50: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/50.jpg)
![Page 51: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/51.jpg)
![Page 52: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/52.jpg)
![Page 53: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/53.jpg)
![Page 54: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/54.jpg)
Application/utilities
![Page 55: 2018 02 20_biological_databases_part2_v_upload](https://reader031.vdocuments.us/reader031/viewer/2022022419/5aacd1dd7f8b9a1f368b45b5/html5/thumbnails/55.jpg)
Project