advance sqlite3
TRANSCRIPT
Advance SQLite
Normal FormNormalization is the process of removing data duplication, more clearly defining key
relationships,and generally moving towards a more idealized database form. It is possible for
different tables in the same database to be at different levels.
Most people recognize five normal forms simply referred to as the First Normal Form
through the Fifth Normal Form. These are often abbreviated 1NF through 5NF. There
are also a few named forms, such as the Boyce-Codd Normal Form (BCNF). Most of
these other forms are roughly equivalent to one of the numbered forms. For example,
BCNF is a slight extension to the Third Normal Form. Some folks also recognize higher
levels of normalization, such as a Sixth Normal Form and beyond, but these extreme
levels of normalization are well beyond the practical concerns of most database
designers.
NormalizationThe normalization process is useful for two reasons. First, normalization specifies design
criteria that can act as a guide in the design process. If you have a set of tables that
are proving to be difficult to work with, that often points to a deeper design problem
or assumption. The normalization process provides a set of rules and conditions that
can help identify trouble spots, as well as provide possible solutions to reorganize the
data in a more consistent and clean fashion.
DenormalizationNormalizing a database and spreading the data out into different tables means that
queries usually involve joining several tables back together. This can occasionally lead
to performance concerns, especially for complex reports that require data from a large
number of tables. These concerns can sometimes lead to the process of denormalization,
where duplicate copies of the same data are intentionally introduced to reduce the
number of joins required for common queries. This is typically done on systems that
are primarily read-only, such as data-warehouse databases, and is often done by computing
temporary tables from properly normalized source data.
The First Normal Form
The First Normal Form, or 1NF, is the lowest level of normalization. It is primarily
concerned with making sure a table is in the proper format. There are three conditions
that must be met for a table to be in 1NF.
The first condition relates to ordering. To be in 1NF, the individual rows of a table
cannot have any meaningful or inherent order. Each row should be an isolated, standalone
record
The Second Normal FormThe Second Normal Form, or 2NF, deals with compound keys (multicolumn keys) and
how other columns relate to such keys. 2NF has only one condition: every column that
is not part of the primary key must be relevant to the primary key as a whole, and not
just a sub-part of the key.
The Third Normal Form
The Third Normal Form, or 3NF, extends the 2NF to eliminate transitive key dependencies.
A transitive dependency is when A depends on B, and B depends on C, and
therefore A depends on C. 3NF requires that each nonprimary key column has a direct
(nontransitive) dependency on the primary key
For example, consider an inventory database that is used to track laptops at a small
business. The laptop table will have a primary key that uniquely identifies each laptop,
such as an inventory control number. It is likely the table would have other columns
that include the make and model of the machine, the serial number, and perhaps a
purchase date. For our example, the laptop table will also include a responsible_
person_id column. When an employee is assigned a laptop, their employee ID number
is put in this column
Higher Normal FormsWe’re not going to get into the details of BCNF, or the Fourth or Fifth (or beyond)
Normal Forms, other than to mention that the Fourth and Fifth Normal Forms start
to deal with inter-table relationships and how different tables interact with each other.
Most database designers make a solid effort to get everything into 3NF and then stop
worrying about it. It turns out that if you get the hang of things and tend to turn out
table designs that are in 3NF, chances are pretty good that your tables will also meet
the conditions for 4NF and 5NF, if not higher. To a large extent, the higher Normal
Forms are formal ways of addressing some edge cases that are somewhat unusual,
especially in simpler designs.
IndexesIndexes (or indices) are auxiliary data structures used by the database system to enforce
unique constraints, speed up sort operations, and provide faster access to specific records.
They are often created in hopes of making queries run faster by avoiding table
scans and providing a more direct lookup method.
How They WorkEach index is associated with a specific table. Indexes can be either single column or
multicolumn, but all the columns of a given index must belong to the same table. There
is no limit to the number of indexes a table can have, nor is there a limit on the number
of indexes a column can belong to. You cannot create an index on a view or on a virtual
table.
Internally, the rows of a normal table are stored in an indexed structure. SQLite uses
a B-Tree for this purpose, which is a specific type of multi-child, balanced tree. The
details are unimportant, other than understanding that as rows are inserted into the
tree, the rows are sorted, organized, and optimized, so that a row with a specific, known
ROWID can be retrieved relatively directly and quickly.
When you create an index, the database system creates another tree structure to hold
the index data. Rather than using the ROWID column as the sort key, the tree is sorted
and organized using the column or columns you’ve specified in the index definition.
The index entry consists of a copy of the values from each of the indexed columns, as
well as a copy of the corresponding ROWID value. This allows an indexed entry to be
found very quickly using the values from the indexed columns.
If we have a table like this:
CREATE TABLE tbl ( a, b, c, d );
And then create an index that looks like this:
CREATE INDEX idx_tbl_a_b ON tbl ( a, b );
The database generates an internal data structure that is conceptually similar to:
SELECT a, b, ROWID FROM tbl ORDER BY a, b;
If SQLite needs to quickly find all the rows where, for example, a = 45, it can use the sorted index to quickly jump to that range of values and extract the relevant index entries. If it’s looking for the value of b, it can simply extract that from the index and be done. If we need any other value in the row, it needs to fetch the full row. This is done by looking up the ROWID. The last value of any index entry is the ROWID of its corresponding table row. Once SQLite has a list of ROWID values for all rows where a = 45, it can efficiently look up those rows in the original table and retrieve the full row. If everything works correctly, the process of looking up a small set of index entries, and then using those to look up a small set of table rows, will be much faster and more efficient than doing a full table scan.
INTEGER PRIMARY KEYsWhen you declare one or more columns to be a PRIMARY KEY, the database system
automatically creates a unique index over those columns. The fundamental purpose of
this index is to enforce the UNIQUE constraint that is implied with every PRIMARY KEY. It
also happens that many database operations typically involve the primary key, such as
natural joins, or conditional lookups in UPDATE or DELETE commands. Even if the index
wasn’t required to enforce the UNIQUE constraint, chances are good you would want an
index over those columns anyway.
Transferring Design ExperienceIf you have some experience designing application data structures or class hierarchies,
you may have noticed some similarities between designing runtime structures and
database tables. Many of the organization principles are the same and, thankfully, much
of the design knowledge and experience gained in the application development world
will transfer to the database world.
Tables Are TypesThe most common misconception is to think of tables as instances of a compound data
structure. A table looks a whole lot like an array or a dynamic list, so it is easy to make
this mistake
Keys Are Backwards PointersAnother stumbling block is the proper use of keys. Keys are very similar to pointers. A
primary key is used to identify a unique instance of a data structure. This is similar to
the address of a record. Anything that wants to reference that record needs to record
its address as a pointer or, in the cases of databases, as a foreign key. Foreign keys are
essentially database pointers.
ClosingAs with application development, database design is part science and part art. It may
seem quite complex, with keys to set up, different relationships to define, Normal
Forms to follow, and indexes to create.
Thankfully, the basics usually fall into place fairly quickly. If you start to get into larger
or more complex designs, some reading on more formal methods of data modeling and
database design might be in order, but most developers can get pretty far by just leveraging
their knowledge and experience in designing application data structures. In the
end, you’re just defining data structures and hooking them together.
C Programming InterfaceAPI Overview
Even when using the programming interface, the primary way of interacting with your
data is to issue SQL commands to the database engine. This chapter focuses on the
core of the API that is used to convey SQL command strings to the database engine. It
is important to understand that there are no public functions to walk the internal
structure of a table or, for example, access the tree structure of an index.
Structure
The C API for SQLite 3 includes a dozen-plus data structures, a fair number of constants,
and well over one hundred different function calls. While the API is somewhat
large, using it doesn’t have to be complex. A fair number of the functions are highly
specialized and infrequently used by most developers. Many of the remaining functions
are simple variations of the same basic operation. For example, there are a dozen
variations on the sqlite3_value_xxx() function, such as sqlite3_value_int(),
sqlite3_value_double(), and sqlite3_value_text(). All of these functions perform the
same basic operation and can be considered simple type variations of the same basic
interface
All public API function calls and datatypes have the prefix sqlite3_, indicating they
are part of version 3.x of the SQLite product. Most of the constants, such as error codes,
use the prefix SQLITE_. The design and API differences between SQLite 2.x and 3.x were
significant enough to warrant a complete change of all API names and structures. The
depth of these changes required anyone upgrading from SQLite 2 to SQLite 3 to modify
their application, so changing the names of the API functions only helped keep the
names distinct and keep any version questions clear. The distinct names also allowed
applications that were in transition to link to both libraries at the same time.
In addition to the sqlite3_ prefix, public function calls can be identified by the use of
lowercase letters and underscores in their names. Private functions use run-together
capitalized words (also known as CamelCase). For example, sqlite3_create_function()
is a public API function (used to register a user-defined SQL function), while
sqlite3CreateFunc() is an internal function that should never be called directly.
Internal functions are not in the public header file, are not documented, and are subject
to change at any time.
Strings and Unicode
There are a number of API functions that have a 16 variant. For instance, both an
sqlite3_column_text() function and an sqlite3_column_text16() function are available.
The first requests a text value in UTF-8 format, while the second will request a text
value in UTF-16.
Error CodesSQLite follows the convention of returning integer error codes in any situation when
there is a chance of failure. If data needs to be passed back to the function caller, it is
returned through a reference parameter.
In all cases, if a function succeeds, it will return the constant SQLITE_OK, which happens
to have the value zero. If something went wrong, API functions will return one of the
standard error codes to indicate the nature of the error.
Structures and AllocationsAlthough the native SQLite API is often referred to as a C/C++ API, technically the
interface is only available in C. As mentioned in “Building”, the SQLite
source code is strictly C based, and as such can only be compiled with a C compiler.
Once compiled, the library can be easily linked to, and called from, both C and C++
code, as well as any other language that follows the C linking conventions for your
platform.
Although the API is written in C, it has a distinct object-like flavor. Most of the program
state is held in a series of opaque data structures that act like objects. The most common
data structures are database connections and prepared statements. You should never
directly access the fields of these data structures. Instead, functions are provided to
create, destroy, and manipulate these structures in much the same way that object
methods are used to manipulate object instances.
Library InitializationBefore an application can use the SQLite library to do anything, the library must first
be initialized. This process allocates some standard resources and sets up any OSspecific
data structures. By default, most major API function calls will automatically
initialize the library, if it has not already been initialized. It is considered a good practice
to manually initialize the library, however.
int sqlite3_initialize( )
Initializes the SQLite library. This function should be called prior to any other
function in the SQLite API. Calling this function after the library has already been
initialized is harmless. This function can be called after a shutdown to reinitialize
the library. A return value of SQLITE_OK indicates success.
When an application is finished using the SQLite library, the library should be shut down.
int sqlite3_shutdown( )
Releases any resources allocated by sqlite3_initialize(). Calling this function
before the library has been initialized or after the library has already been shut
down is harmless. A return value of SQLITE_OK indicates success.
Closing
To close and release a database connection, call sqlite3_close().
int sqlite3_close( sqlite3 *db )
Closes a database connection and releases any associated data structures. All temporary
items associated with this connection will be deleted.
Any pointer returned by a call to sqlite3_open_xxx(), including a NULL pointer, can
be passed to sqlite3_close(). This function verifies there are no outstanding changes
to the database, then closes the file and frees the sqlite3 data structure
Example#include <stdlib.h>
int main( int argc, char **argv )
{
char *file = ""; /* default to temp db */
sqlite3 *db = NULL;
int rc = 0;
if ( argc > 1 )
file = argv[1];
sqlite3_initialize( );
rc = sqlite3_open_v2( file, &db, SQLITE_OPEN_READWRITE |
SQLITE_OPEN_CREATE, NULL );
if ( rc != SQLITE_OK) {
sqlite3_close( db );
exit( -1 );
}
/* perform database operations */
sqlite3_close( db );
sqlite3_shutdown( );
}
The default filename is an empty string. If passed to sqlite3_open_xxx(), this will result
in a temporary, on-disk database that will be deleted as soon as the database connection
is closed. If at least one argument is given, the first argument will be used as a filename.
If the database does not exist, it will be created and then opened for read/write access.
It is then immediately closed.
Prepared Statements
Once a database connection is established, we can start to execute SQL commands.
This is normally done by preparing and stepping through statements. Statements are
held in sqlite3_stmt data structures.
Statement Life CycleThe life cycle of a prepared statement is a bit complex. Unlike database connections,
which are typically opened, used for some period of time, and then closed, a statement
can be in a number of different states. A statement might be prepared, but not run, or
it might be in the middle of processing
/* create a statement from an SQL string */
sqlite3_stmt *stmt = NULL;
sqlite3_prepare_v2( db, sql_str, sql_str_len, &stmt, NULL );
/* use the statement as many times as required */
while( ... )
{
/* bind any parameter values */
sqlite3_bind_xxx( stmt, param_idx, param_value... );
...
/* execute statement and step over each row of the result set */
while ( sqlite3_step( stmt ) == SQLITE_ROW )
{
/* extract column values from the current result row */
col_val = sqlite3_column_xxx( stmt, col_index );
...
}
/* reset the statement so it may be used again */
sqlite3_reset( stmt );
sqlite3_clear_bindings( stmt ); /* optional */
}
/* destroy and release the statement */
sqlite3_finalize( stmt );
stmt = NULL;
The prepare process converts an SQL command string into a prepared statement. That
statement can then have values bound to any statement parameters. The statement is
then executed, or “stepped through.” In the case of a query, each step will make a new
results row available for processing.
The statement can then be reset, allowing it to be re-executed with a new set of bindings.
Preparing a statement can be somewhat costly, so it is a common practice to reuse
statements as much as possible.
Prepare
To convert an SQL command string into a prepared statement, use one of the
sqlite3_prepare_xxx() functions:
int sqlite3_prepare( sqlite3 *db, const char *sql_str, int sql_str_len,
sqlite3_stmt **stmt, const char **tail )
int sqlite3_prepare16( sqlite3 *db, const void *sql_str, int sql_str_len,
sqlite3_stmt **stmt, const void **tail )
It is strongly recommended that all new developments use the _v2 version of these
functions.
int sqlite3_prepare_v2( sqlite3 *db, const char *sql_str, int sql_str_len,
sqlite3_stmt **stmt, const char **tail )
int sqlite3_prepare16_v2( sqlite3 *db, const void *sql_str, int sql_str_len,
sqlite3_stmt **stmt, const void **tail )
Converts an SQL command string into a prepared statement. The first parameter
is a database connection. The second parameter is an SQL command encoded in
a UTF-8 or UTF-16 string. The third parameter indicates the length of the command
string in bytes. The fourth parameter is a reference to a statement pointer.
This is used to pass back a pointer to the new sqlite3_stmt structure.
If the length parameter is negative, the length will be automatically computed by the
prepare call. This requires that the command string be properly null-terminated. If the
length is positive, it represents the maximum number of bytes that will be parsed. For
optimal performance, provide a null-terminated string and pass a valid length value
that includes the null-termination character
Once a statement has been prepared, but before it is executed, you can bind parameter
values to the statement. Statement parameters allow you to insert a special token into
the SQL command string that represents an unspecified literal value. You can then bind
specific values to the parameter tokens before the statement is executed.
StepPreparing an SQL statement causes the command string to be parsed and converted
into a set of byte-code commands. This byte-code is fed into SQLite’s Virtual Database
Engine (VDBE) for execution. The translation is not a consistent one-to-one affair.
Depending on the database structure (such as indexes), the query optimizer may generate
very different VDBE command sequences for similar SQL commands. The size
and flexibility of the SQLite library can be largely attributed to the VDBE architecture
Reset and FinalizeWhen a call to sqlite3_step() returns SQLITE_DONE, the statement has successfully finished
execution. At that point, there is nothing further you can do with the statement.
If you want to use the statement again, it must first be reset.
int sqlite3_reset( sqlite3_stmt *stmt )
Resets a prepared statement so that it is ready for another execution. A statement
should be reset as soon as you’re done using it. This will ensure any locks are
released.
Statement TransitionsPrepared statements have a significant amount of state. In addition to the currently
bound parameter values and other details, every prepared statement is always in one
of three major states.
Knowing the current state of a statement is important. Although some API functions
can be called at any time (like sqlite3_reset()), other API functions can only be called
when a statement is in a specific state.
ExamplesHere are two examples of using prepared statements. The first example executes a
CREATE TABLE statement by first preparing the SQL string and then calling
sqlite3_step() to execute the statement:
sqlite3_stmt *stmt = NULL;
/* ... open database ... */
rc = sqlite3_prepare_v2( db, "CREATE TABLE tbl ( str TEXT )", -1, &stmt, NULL );
if ( rc != SQLITE_OK) exit( -1 );
rc = sqlite3_step( stmt );
if ( rc != SQLITE_DONE ) exit ( -1 );
sqlite3_finalize( stmt );
/* ... close database ... */
The CREATE TABLE statement is a DDL command that does not return any type of value
and only needs to be “stepped” once to fully execute the command. Remember to reset
or finalize statements as soon as they’re finished executing. Also remember that all
statements associated with a database connection must be fully finalized before the
connection can be closed.
Bound ParametersStatement parameters are special tokens that are inserted into the SQL command string
before it is passed to one of the sqlite3_prepare_xxx() functions
Once this statement has been prepared, it can be used to insert multiple rows. For each
row, simply bind the appropriate values, step through the statement, and then reset
the statement. After the statement has been reset, new values can be bound to the
parameters and the statement can be stepped again.
You can also use explicit index values:
INSERT INTO people (id, name) VALUES ( ?1, ?2 );
Using explicit parameter indexes has two major advantages. First, you can have multiple
instances of the same index value, allowing the same value to be bound to more
than one place in the same statement.
This level of abstraction can be taken even further by using named parameters. In this
case, you allow SQLite to assign parameter index values as it sees fit, in a similar fashion
to anonymous parameters. The difference is that you can ask SQLite to tell you the
index value of a specific parameter based off the name you’ve given it. Consider this
statement:
INSERT INTO people (id, name) VALUES ( :id, :name );
Binding ValuesWhen you first prepare a statement with parameters, all of the parameters start out
with a NULL assigned to them. Before you execute the statement, you can bind specific
values to each parameter using the sqlite3_bind_xxx() family of functions.
There are nine sqlite3_bind_xxx() functions available, plus a number of utility functions.
These functions can be called any time after the statement is prepared, but before
sqlite3_step() is called for the first time. Once sqlite3_step() has been called, these
functions cannot be called again until the statement is reset.
All the sqlite3_bind_xxx() functions have the same first and second parameters and
return the same result. The first parameter is always a pointer to an sqlite3_stmt, and
the second is the index of the parameter to bind. Remember that for anonymous parameters,
the first index value starts with one. For the most part, the third parameter
is the value to bind. The fourth parameter, if present, indicates the length of the data
value in bytes. The fifth parameter, if present, is a function pointer to a memory management
callback.
All the bind functions return an integer error code, which is equal to SQLITE_OK upon
success.
The bind functions are:
int sqlite3_bind_blob( sqlite3_stmt *stmt, int pidx,
const void *data, int data_len, mem_callback )
Binds an arbitrary length binary data BLOB.
int sqlite3_bind_double( sqlite3_stmt *stmt, int pidx, double data )
Binds a 64-bit floating point value.
int sqlite3_bind_int( sqlite3_stmt *stmt, int pidx, int data )
Binds a 32-bit signed integer value.
int sqlite3_bind_int64( sqlite3_stmt *stmt, int pidx, sqlite3_int64 )
Binds a 64-bit signed integer value.
int sqlite3_bind_null( sqlite3_stmt *stmt, int pidx )
Binds a NULL datatype.
int sqlite3_bind_text( sqlite3_stmt *stmt, int pidx,
const char *data, int data_len, mem_callback )
Database LockingSQLite employs a number of different locks to protect the database from race conditions.
These locks allow multiple database connections (possibly from different processes)
to access the same database file simultaneously without fear of corruption. The
locking system is used for both autocommit transactions (single statements) as well as
explicit transactions.
The locking system involves several different tiers of locks that are used to reduce contention
and avoid deadlocking. The details are somewhat complex, but the system
allows multiple connections to read a database file in parallel, but any write operation
requires full, exclusive access to the entire database file.
Most of the time the locking system works reasonably well, allowing applications to
easily and safely share the database file. If coded properly, most write operations only
last a fraction of a second. The library is able to get in, make the required modifications,
verify them, and then get out, quickly releasing any locks and making the database
available to other connections.
Busy handlersA busy handler is a callback function that is called by the SQLite library any time it is
unable to acquire a lock, but has determined it is safe to try and wait for it. The busy
handler can instruct SQLite to keep trying to acquire the lock, or to give up and return
an SQLITE_BUSY error.
DeadlocksSetting a busy handler will not fix every problem. There are some situations when
waiting for a lock will cause the database connection to deadlock. The deadlock happens
when a pair of database connections each have some set of locks and both need
to acquire additional locks to finish their task. If each connection is attempting to access
a lock currently held by the other connection, both connections will stall in a deadlock.
This can happen if two database connections both attempt to write to the database at
the same time. In this case, there is no point in both database connections waiting for
the locks to be released, since the only way to proceed is if one of the connections gives
up and releases all of its locks.
If SQLite detects a potential deadlock situation, it will skip the busy handler and will
have one of the database connections return SQLITE_BUSY immediately. This is done to
encourage the applications to release their locks and break the deadlock. Breaking the
deadlock is the responsibility of the application(s) involved—SQLite cannot handle
this situation for you.
Avoiding SQLITE_BUSYWhen developing code for a system that requires any degree of database concurrency,
the easiest approach is to use sqlite3_busy_timeout() to set a timeout value that is
reasonable for your application. Start with something between 250 to 2,000 milliseconds
and adjust from there. This will help reduce the number of SQLITE_BUSY response
codes, but it will not eliminate them.
Avoiding deadlocksThe rules to avoid deadlocks are fairly simple, although their application can cause
significant complexity in code.
First, the easy ones. The functions sqlite3_prepare_xxx(), sqlite3_backup_step(), and
sqlite3_blob_open() cannot cause a deadlock. If an SQLITE_BUSY code is returned from
one of these functions at any time, simply wait and call the function again.
If the function sqlite3_step(), sqlite3_reset(), or sqlite3_finalize() returns
SQLITE_BUSY from within a deferred transaction, the application must back off and try
again. For statements that are not part of an explicit transaction, the prepared statement
can simply be reset and re-executed. For statements that are inside an explicit deferred
transaction, the whole transaction must be rolled back and started over from the
beginning. In most cases, this will happen on the first attempt to modify the database.
When BUSY becomes BLOCKEDWhen a database connection needs to modify the database, a lock is placed that makes
the database read-only. This allows other connections to continue to read the database,
but prevents them from making modifications
Memory ManagementWhen SQLite needs to dynamically allocate memory, it normally calls the default
memory handler of the underlying operating system. This causes SQLite to allocate its
memory from the application heap
Regardless, you can access whatever memory manager SQLite is using with these
SQLite memory management functions:
void* sqlite3_malloc( int numBytes )
Allocates and returns a buffer of the size specified. If the memory cannot be allocated,
a NULL pointer is returned. Memory will always be 8-byte (64-bit) aligned.
This is a replacement for the standard C library malloc() function.
void* sqlite3_realloc( void *buffer, int numBytes )
Used to resize a memory allocation. Buffers can be made larger or smaller. Given
a buffer previously returned by sqlite3_malloc() and a byte count, *_realloc()
will allocate a new buffer of the specified size and copy as much of the old buffer
as will fit into the new buffer. It will then free the old buffer and return the new
buffer. If the new buffer cannot be allocated, a NULL is returned and the original
buffer is not freed.
If the buffer pointer is NULL, the call is equivalent to a call to sqlite3_malloc().
If the numBytes parameter is zero or negative, the call is equivalent to a call to
sqlite3_free().
This is a replacement for the standard C library realloc() function
void sqlite3_free( void *buffer )
Releases a memory buffer previously allocated by sqlite3_malloc() or
sqlite3_realloc(). Also used to free the results or buffers of a number of SQLite
API functions that call sqlite3_malloc() internally.
This is a replacement for the standard C library free() function.