avancier methods (am)grahamberrisford.com/am 2 methods support...row in a table an rdbms suits...

29
Avancier Copyright Avancier Limited Avancier Methods (AM) From logical model to physical database Data structures It is illegal to copy, share or show this document (or other document published at http://avancier.co.uk) without the written permission of the copyright holder

Upload: others

Post on 15-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier

Copyright Avancier Limited

Avancier Methods (AM) From logical model to physical database

Data structures

It is illegal to copy, share or show this document

(or other document published at http://avancier.co.uk)

without the written permission of the copyright holder

Page 2: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Logical domain models

Data in Storage

► technology neutral,

► but usually focused on one data store

Copyright Avancier Limited 2015

Conceptual

Logical

Physical

Real

Domain Model

Physical Data Schema

Data Store

Page 3: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Physical data models

Data in Storage

Copyright Avancier Limited 2015

Conceptual

Logical

Physical

Real

Domain Model

Physical Data Schema

Data Store

► A physical data schema or model defines a

data structure in a way that is tuned to

specific technologies and NFRs

Page 4: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Physical data models: databases

► The physical data model (though it holds the same data as the

logical data model) can be different from the logical one

► Ranging from

■ Slightly denormalised relational database, to

■ Radically restructured non-relational database

Copyright Avancier Limited

Page 5: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Physical database schema types include

► Structured data

■ Relational (RDBMS)

■ Key-value store

■ Column store

■ Document store (e.g. XML or JSON

documents)

► Unstructured data (e.g. Hadoop)

■ Data can be anything from free-

form to structured

■ A benefit being you can store

whatever data is captured, then

glue different data sets together

later

► Non-relational databases are

sometimes misleadingly called

NoSQL databases.

► They need some kind of SQL if they

are to support the coding of queries.

Copyright Avancier Limited 2015

Page 6: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Relational (RDBMS)

Copyright Avancier Limited

► Entities appear in rows of one table

► A typical online transaction updates one

row in a table

► An RDBMS suits OLTP-like workloads

► With many discrete transactions

RowId Deptid EmpId Lastname Firstname Salary

001 1 10 Smith Joe 40000

002 7 12 Jones Mary 50000

003 5 11 Johnson Cathy 44000

004 3 22 Jones Bob 55000

HR Database

DEPARTMENT entity

DeptNumber P Key

DeptName

DeptManager

EMPLOYEE entity

DeptNumber * F Key

Empid P Key

LastName

FirstName

Salary

Page 7: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier RDBMS: Default data architecture solution?

► It is easy and natural to build a relational database to hold the data

defined in a logical data model.

► Many enterprise applications have been built on top of a single

relational database.

► It is the default data architecture for an enterprise application, and

should always be considered.

► You need good reasons to store data in a different way.

► And those reasons are usually non-functional requirements rather

than functional ones.

Copyright Avancier Limited

Page 8: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Optimisation the LDM to form a Physical Relational Data Model

► Identify processes to be supported, especially

■ batch input and output processes

■ predicted queries and required reports

■ processes that are frequent or have long access paths

► Ensure the data model contains data needed by those processes

► Facilitate process access paths

■ Do access path analysis

■ Add derivable data

■ Add derivable relationships

■ Otherwise denormalise the data structure

Copyright Avancier Limited

Page 9: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier

Customer Order History Customer id

Customer name and address Orders Placed Order id

Order value Products Ordered Product type Product amount Products Ordered End

Order Placed End Customer Order History END

Product Demand Product type

Amount on hand Products ordered Product amount Order id Products Ordered End

Product Demand End

Consider how data will be serialised into a required data flow

Every data store can be serialised

into a hierarchical / sequential data flow

Copyright Avancier Limited

Serialisation

Serialisation ORDER

Customer id *

Order id

Order value

LINE ITEM

Line item number

Order id *

Product type *

Product amount

PRODUCT

Product type

Amount in stock

Unit price

CUSTOMER

Customer id

Customer n & a

Page 10: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Don’t forget the numbers: international car parts manufacturer

Copyright Avancier Limited

Logical Data Model

Order Item

Order

Part Customer 4,500,000 6,000,000

Ave 10 (1 to 100)

Ave 4

Nation 25

Ave 5

Region 5

Ave 200,000

Part

Supplier

Supplier 300,000

24,000,000

45,000,000

179,998,372

Ave 30

Page 11: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier

Copyright Avancier Limited

Don’t forget the numbers: school stationary supplies

► Given a Logical Data Model, define

■ the volumes of kernel entities,

■ the population of each relationship,

■ expected growth rates.

The numbers influence

the physical design

■ Choice of database technology

Logical Data Model

Order

Item

Order

Product Customer 1,000 1,000

Ave 1,000 (1 to 5,000)

Ave 10

Ave 10,000 (1 to 50,000)

Page 12: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier

Customer Order History Customer id

Customer name and address Orders Placed Order id

Order value Products Ordered Product type Product amount Products Ordered End

Order Placed End Customer Order History END

Copyright Avancier Limited

Do access path analysis – is it feasible?

► “List all the products ordered by a customer”

► First, is it feasible? YES

Logical Data Model

Line

Item

Order

Product Customer

[skip if already in list]

Serialisation

Page 13: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier

Copyright Avancier Limited

Do access path analysis – is it fast enough?

► The typical customer (a school) has

placed 1,000 orders, each with 10 items,

for our stationery products

► But each as ordered only 15 products

► You can remove redundant

accesses by adding redundant data

or redundant relationships

Physical Data Model

Line

Item

Order

Product Customer

Customer

Ordered

Product

Logical Data Model

Line

Item

Order

Product Customer

[skip if already in list]

Page 14: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier

Copyright Avancier Limited

Don’t forget the numbers

► Given a Logical Data Model, define

■ the volumes of kernel entities,

■ the population of each relationship,

■ expected growth rates.

The numbers influence

the physical design

■ Choice of database technology

Logical Data Model

Line

Item

Order

Product Customer 1,000 1,000

Ave 1,000 (1 to 5,000)

Ave 10

Ave 10,000 (1 to 50,000)

Page 15: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Clustering related data on the same block/page

► Copying the price stops the price

change transaction affecting the

aggregate, which is probably the

business rule anyway

Copyright Avancier Limited

► An old-fashioned optimisation

technique

► Cluster detail entities with one of

several parents

► Rule of thumb: “the least

dependent occurrence rule”

► An Order has few Order Lines

► A Product has many

► So cluster the Order Lines on the

same block/page as the Order

ORDER

Customer id

Order id

Order value

LINE ITEM

Order id

Line item number

Product type *

Product amount

Unit price

PRODUCT TYPE

Product type

Amount in stock

Unit price

Page 16: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier

Copyright Avancier Limited

Separation of update and query data stores

► You can remove redundant accesses by adding redundant data or

redundant relationships.

► This includes adding indexes.

► Where adding redundant data and relations is not enough, then you

can separate data stores for updates and queries.

► This is a common and powerful technique, since you can de-

normalise the query database.

Page 17: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier

Copyright Avancier Limited

OLAP database OLTP database

Denormalisation for faster enquiry/report processes (naive picture)

CUSTOMER INDEX

Customer number Primary Key

ORDER FACT TABLE

Customer number Foreign Key

Order number Primary Key

Order amount

Customer name

Customer address

Product type Foreign Key

Product description

Unit price

PRODUCT INDEX

Product type Primary Key

CUSTOMER entity

Customer number Primary Key

Customer name

Customer address

ORDER entity

Customer number Foreign Key

Order number Primary Key

Order amount

Product type Foreign Key

PRODUCT entity

Product type Primary Key

Product description

Unit price

Page 18: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Queries that take a long time

► One attribute value may appear in only a few of many rows

► Q) List all employees with Lastname = Jones

► The query must look at every row in the whole table unless

■ a) we introduce an index with an entry for every attribute value

■ b) we use a data store designed to support such a query

Copyright Avancier Limited

RowId Deptid EmpId Lastname Firstname Salary

001 1 10 Smith Joe 40000

002 7 12 Jones Mary 50000

003 5 11 Johnson Cathy 44000

004 3 22 Jones Bob 55000

Page 19: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Alternatively

► Where de-normalising the query database is still not enough

■ meaning some queries still take too long or hog the resources.

► Then look at non-relational databases

■ that restructure the logical data model in a more radical way.

Copyright Avancier Limited

Page 20: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Column store

► Attribute values appears as rows.

► Rows appear as attributes

► Column Store

► Suits OLAP-like workloads (e.g. data warehouses)

► A smaller number of complex queries over lots of data

► Efficient where columns have many duplicate values (including null values)

Copyright Avancier Limited

Empid Rows

10 001

12 002

11 003

22 004

Lastname Rows

Smith 001

Jones 002, 004

Johnson 003

Firstname Rows

Joe 001

Mary 002

Cathy 003

Bob 004

Salary Rows

40000 001

5000 002

44000 003

55000 004

Page 21: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Column Store

► One application loads one million rail tickets every night into an RDBMS, for

apportionment of money between stakeholders

► Another application loads those tickets into a data warehouse for analysis.

► Much duplication of attribute values in different tickets/rows.

Copyright Avancier Limited

Ticketid From To Date Time Seat

001 Lond Newc 30122030 10.30 A36

002 Newc Edin 30122030 10.30 A36

003 Edin. Newc 30122030 18.30 A36

004 Newc Lond 30122030

From Tickets

Lond 001

Newc 002, 004

Edin 003

To Tickets

Lond 004

Newc 001, 003

Edin 002

Date Tickets

30122030 001, 002,003, 004

Seat Keys

A36 001, 002, 003,

Time Tickets

10.30 001, 002

18.30 003

Page 22: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Key Value database

► Like one giant database table, in which

every attribute value is a distinct row.

► Scalable to handle millions of queries

per second by adding more servers.

Copyright Avancier Limited

Key Value

001_empid 10

003_lastname Johnson

002_empid 12

003_empid 11

001_firstname Joe

004_empid 22

001_lastname Smith

002_lastname Jones

002_salary 5000

002_lastname Jones

003_firstname Cathy

004_firstname Bob

001_salary 40000

002_firstname Mary

003_salary 44000

004_salary 55000

Page 23: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Document store

► No pre-defined schema: documents are self-describing ■ <orderhistory>

● <firstname>Bob</firstname>

● <lastname>Smith</lastname>

● <email type=Home>[email protected]</email>

● <phone type=Cell>(123) 456-7890</phone>

● <phone type=Work>(890) 765-4321</phone>

● <order>

□ <ordernumber>123.</ordernumber>

□ <orderamount>200</orderamount>

□ <productnumber>4444</productnumber>

□ <productdescription>A4 ream</productdescription>

● </order>

● <order>

□ <ordernumber>124.</ordernumber>

□ <orderamount>70</orderamount

□ ….

□ ….

Copyright Avancier Limited

Order History SEQUENCE

Firstname

Lastname

Email addresses

Phone numbers

Set of Orders ITERATION

Order SEQUENCE

Order number

Order amount

Product type

Product description

Order END

Set of Orders END

Order History END

Page 24: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier A history of databases

► The history of databases is much about making it easier to

■ store data that is captured via data entry forms and transactions

■ retrieve the data that is wanted

■ store increasing volumes of data

■ ensure data quality and consistency

► Some of what follows is edited from “The history of data

processing” http://www.infoq.com/presentations/db-history-data-

processing

Copyright Avancier Limited

Page 25: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Databases since 1960

► Multi-Value, Hierarchical

■ PICK, IMS, IDS, ADABAS

► Relational

■ CODASYL, System R (SEQUEL), INGRES (QUEL), Mimer, Oracle

► RBMDS with SQL STANDARD

■ DB2, Teradata, Informix, Sybase, Postgres

► OODBMS, ORDBMS

■ Versant, Objectivity, Gemstone, Informix*, Oracle*

► MPP Query and NoSQL

■ Netezza, Paraccel, Vertica, MongoDB, CouchBase, Riak, Cassandra

► NewSQL

■ SciDB, NuoDB, JethroDB, Metanaulx

► Future?

■ Spanner, F1

Copyright Avancier Limited

Page 26: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Storing BIG data

► Horizontal partitioning

■ Divide the rows of a table into several tables and store them.

■ For example, 'User Table' with identical schema can be divided into

● 'User Table #0' in which users less than 13 years are stored

● 'User Table #1' in which users 13 or greater than 13 years old.

► Database sharding

■ Stores horizontal partitions in physically separate databases.

■ For example:

● Store users less than 13 years old in database 0

● Store users 13 or greater than 13 years old in database 1

► Done for performance and scalability

► Increases system complexity

Copyright Avancier Limited

Page 27: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier Increasing complexity

► Remember, anything not done by the DBMS becomes a

programmer’s task.

► An RDBMS is likely to provide all these services:

■ Standard API/Query Layer

■ Transactions/consistency

■ Query optimisation

■ Data navigation, joins

■ Data access *

■ Storage management *

► * A NoSQL database may give you only the last two.

► “Eventual consistency” is a nice way of saying “not correct”.

► “Transaction malleability” is a nice way of saying “broken”.

Copyright Avancier Limited

Page 28: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier When to use “NoSQL” databases?

► Is your data really just a collection forms, each accessible using

one key

■ Document store

► Is your data really just a giant hash lookup?

■ Key-value store

► Do you need full-text searching?

■ Text-indexing engine

► Will you get questions about your data that you can't predict?

■ Then better make sure your data also ends up in an RDBMS.

Copyright Avancier Limited

Page 29: Avancier Methods (AM)grahamberrisford.com/AM 2 Methods support...row in a table An RDBMS suits OLTP-like workloads With many discrete transactions RowId Deptid EmpId Lastname Firstname

Avancier KISS

► If you are storing structured business data

► Start with understanding the logical data model

► Look at whatever database best suits your data entry and

retrieval/reporting requirements

► You might end up needing an RDBMS for

■ Consistency

■ Unpredictable queries

Copyright Avancier Limited