what's so unique about a columnar database?
TRANSCRIPT
www.flydata.com
What's So Unique
About a Columnar
Database?
Copyright © 2015 FlyData Inc. All rights reserved. www.flydata.com
Types of Database Technologies
Copyright © 2015 FlyData Inc. All rights reserved.
● Looking for the right database technology to use?
Luckily there are many database technologies to
choose from…
○ Relational Databases (MySQL, Postgres)
○ NoSQL (MongoDB)
○ Columnar Databases (Amazon Redshift,
BigQuery)
○ and others…
● Each choice has its own pros and cons!
● Today let’s compare it against the more traditional
row-oriented database (e.g., MySQL).
www.flydata.com
Row-Oriented
Database
Copyright © 2015 FlyData Inc. All rights reserved. www.flydata.com
Row-Oriented Database
Copyright © 2015 FlyData Inc. All rights reserved. www.flydata.com
● Traditional, row-oriented databases
store data by row.
● The fields for each record are
sequentially stored. Let’s say you
have a table like this..
This two-dimensional table would be stored in a row-oriented database like this:
● As you can see, a record’s fields are stored one by one, then the next record’s
fields are stored, then the next, and so on.
Columnar Database
Copyright © 2015 FlyData Inc. All rights reserved. www.flydata.com
● Contrast the previous slide with how a columnar
database would store this data:
● Each field is stored by the column so that each ‘id’ is
stored, then the ‘name’ column, then the ‘zip codes’,
etc.
● So what implications are there when storing data in a
column-oriented fashion?
The Advantages of Columnar Databases, and the Disadvantages
● Imagine for example, that you wanted to know the
average age of all your users
● Instead of looking up the age for each record row-
by-row (row-oriented database), you can simply
jump to the area where the “age” data is stored
and read just the data you need.
● So in a nutshell, columnar storage lets you skip
over non-relevant data pretty quickly
● Hence, aggregation queries (queries where you
only need to look up small subsets of your total
data) could become really fast compared to row-
oriented databases .
● Since the data type for each column is similar:
○ Better compression on each column
○ Queries are even faster!
Cons
● Querying against a few rows
● Writing new data could take more time
○ You need to write each column one by
one
○ Inserting a record into a row-oriented
database can simply take just one
operation
○ Therefore, loading new data or updating
many values could take much more time.
Cons
● This is why you would want a row-oriented
database running the back-end of your web
app, etc.
● Once your app becomes huge, you might
want to consider having a columnar
database
○ like Amazon Redshift!
○ For BI analytics queries!
● We’ve seen many companies that make web
apps or mobile games go through this same
transition.
Conclusion
Copyright © 2015 FlyData Inc. All rights reserved. www.flydata.com
Conclusion
To summarize, columnar databases are good for:
● Queries that involve only a few columns
● Aggregation queries against vast amounts of data
● Column-wise compression
But are not so good at:
● Incremental data loading
● Online transaction processing (OLTP)
● Queries against only a few rows
www.flydata.com
Check us out!
-> http://flydata.com
Toll Free: 1-855-427-9787
http://flydata.com
We are an official data integration
partner of Amazon Redshift