advance sqlite3

Advance SQLite

Normal FormNormalization is the process of removing data duplication, more clearly defining key

relationships,and generally moving towards a more idealized database form. It is possible for

different tables in the same database to be at different levels.

Most people recognize five normal forms simply referred to as the First Normal Form

through the Fifth Normal Form. These are often abbreviated 1NF through 5NF. There

are also a few named forms, such as the Boyce-Codd Normal Form (BCNF). Most of

these other forms are roughly equivalent to one of the numbered forms. For example,

BCNF is a slight extension to the Third Normal Form. Some folks also recognize higher

levels of normalization, such as a Sixth Normal Form and beyond, but these extreme

levels of normalization are well beyond the practical concerns of most database

designers.

NormalizationThe normalization process is useful for two reasons. First, normalization specifies design

criteria that can act as a guide in the design process. If you have a set of tables that

are proving to be difficult to work with, that often points to a deeper design problem

or assumption. The normalization process provides a set of rules and conditions that

can help identify trouble spots, as well as provide possible solutions to reorganize the

data in a more consistent and clean fashion.

DenormalizationNormalizing a database and spreading the data out into different tables means that

queries usually involve joining several tables back together. This can occasionally lead

to performance concerns, especially for complex reports that require data from a large

number of tables. These concerns can sometimes lead to the process of denormalization,

where duplicate copies of the same data are intentionally introduced to reduce the

number of joins required for common queries. This is typically done on systems that

are primarily read-only, such as data-warehouse databases, and is often done by computing

temporary tables from properly normalized source data.

The First Normal Form

The First Normal Form, or 1NF, is the lowest level of normalization. It is primarily

concerned with making sure a table is in the proper format. There are three conditions

that must be met for a table to be in 1NF.

The first condition relates to ordering. To be in 1NF, the individual rows of a table

cannot have any meaningful or inherent order. Each row should be an isolated, standalone

record

The Second Normal FormThe Second Normal Form, or 2NF, deals with compound keys (multicolumn keys) and

how other columns relate to such keys. 2NF has only one condition: every column that

is not part of the primary key must be relevant to the primary key as a whole, and not

just a sub-part of the key.

The Third Normal Form

The Third Normal Form, or 3NF, extends the 2NF to eliminate transitive key dependencies.

A transitive dependency is when A depends on B, and B depends on C, and

therefore A depends on C. 3NF requires that each nonprimary key column has a direct

(nontransitive) dependency on the primary key

For example, consider an inventory database that is used to track laptops at a small

business. The laptop table will have a primary key that uniquely identifies each laptop,

such as an inventory control number. It is likely the table would have other columns

that include the make and model of the machine, the serial number, and perhaps a

purchase date. For our example, the laptop table will also include a responsible_

person_id column. When an employee is assigned a laptop, their employee ID number

is put in this column

Higher Normal FormsWe’re not going to get into the details of BCNF, or the Fourth or Fifth (or beyond)

Normal Forms, other than to mention that the Fourth and Fifth Normal Forms start

to deal with inter-table relationships and how different tables interact with each other.

Most database designers make a solid effort to get everything into 3NF and then stop

worrying about it. It turns out that if you get the hang of things and tend to turn out

table designs that are in 3NF, chances are pretty good that your tables will also meet

the conditions for 4NF and 5NF, if not higher. To a large extent, the higher Normal

Forms are formal ways of addressing some edge cases that are somewhat unusual,

especially in simpler designs.

IndexesIndexes (or indices) are auxiliary data structures used by the database system to enforce

unique constraints, speed up sort operations, and provide faster access to specific records.

They are often created in hopes of making queries run faster by avoiding table

scans and providing a more direct lookup method.

How They WorkEach index is associated with a specific table. Indexes can be either single column or

multicolumn, but all the columns of a given index must belong to the same table. There

is no limit to the number of indexes a table can have, nor is there a limit on the number

of indexes a column can belong to. You cannot create an index on a view or on a virtual

table.

Internally, the rows of a normal table are stored in an indexed structure. SQLite uses

a B-Tree for this purpose, which is a specific type of multi-child, balanced tree. The

details are unimportant, other than understanding that as rows are inserted into the

tree, the rows are sorted, organized, and optimized, so that a row with a specific, known

ROWID can be retrieved relatively directly and quickly.

When you create an index, the database system creates another tree structure to hold

the index data. Rather than using the ROWID column as the sort key, the tree is sorted

and organized using the column or columns you’ve specified in the index definition.

The index entry consists of a copy of the values from each of the indexed columns, as

well as a copy of the corresponding ROWID value. This allows an indexed entry to be

found very quickly using the values from the indexed columns.

If we have a table like this:

CREATE TABLE tbl ( a, b, c, d );

And then create an index that looks like this:

CREATE INDEX idx_tbl_a_b ON tbl ( a, b );

The database generates an internal data structure that is conceptually similar to:

SELECT a, b, ROWID FROM tbl ORDER BY a, b;

If SQLite needs to quickly find all the rows where, for example, a = 45, it can use the sorted index to quickly jump to that range of values and extract the relevant index entries. If it’s looking for the value of b, it can simply extract that from the index and be done. If we need any other value in the row, it needs to fetch the full row. This is done by looking up the ROWID. The last value of any index entry is the ROWID of its corresponding table row. Once SQLite has a list of ROWID values for all rows where a = 45, it can efficiently look up those rows in the original table and retrieve the full row. If everything works correctly, the process of looking up a small set of index entries, and then using those to look up a small set of table rows, will be much faster and more efficient than doing a full table scan.

INTEGER PRIMARY KEYsWhen you declare one or more columns to be a PRIMARY KEY, the database system

automatically creates a unique index over those columns. The fundamental purpose of

this index is to enforce the UNIQUE constraint that is implied with every PRIMARY KEY. It

also happens that many database operations typically involve the primary key, such as

natural joins, or conditional lookups in UPDATE or DELETE commands. Even if the index

wasn’t required to enforce the UNIQUE constraint, chances are good you would want an

index over those columns anyway.

Transferring Design ExperienceIf you have some experience designing application data structures or class hierarchies,

you may have noticed some similarities between designing runtime structures and

database tables. Many of the organization principles are the same and, thankfully, much

of the design knowledge and experience gained in the application development world

will transfer to the database world.

Tables Are TypesThe most common misconception is to think of tables as instances of a compound data

structure. A table looks a whole lot like an array or a dynamic list, so it is easy to make

this mistake

Keys Are Backwards PointersAnother stumbling block is the proper use of keys. Keys are very similar to pointers. A

primary key is used to identify a unique instance of a data structure. This is similar to

the address of a record. Anything that wants to reference that record needs to record

its address as a pointer or, in the cases of databases, as a foreign key. Foreign keys are

essentially database pointers.

ClosingAs with application development, database design is part science and part art. It may

seem quite complex, with keys to set up, different relationships to define, Normal

Forms to follow, and indexes to create.

Thankfully, the basics usually fall into place fairly quickly. If you start to get into larger

or more complex designs, some reading on more formal methods of data modeling and

database design might be in order, but most developers can get pretty far by just leveraging

their knowledge and experience in designing application data structures. In the

end, you’re just defining data structures and hooking them together.

C Programming InterfaceAPI Overview

Even when using the programming interface, the primary way of interacting with your

data is to issue SQL commands to the database engine. This chapter focuses on the

core of the API that is used to convey SQL command strings to the database engine. It

is important to understand that there are no public functions to walk the internal

structure of a table or, for example, access the tree structure of an index.

Structure

The C API for SQLite 3 includes a dozen-plus data structures, a fair number of constants,

and well over one hundred different function calls. While the API is somewhat

large, using it doesn’t have to be complex. A fair number of the functions are highly

specialized and infrequently used by most developers. Many of the remaining functions

are simple variations of the same basic operation. For example, there are a dozen

variations on the sqlite3_value_xxx() function, such as sqlite3_value_int(),

sqlite3_value_double(), and sqlite3_value_text(). All of these functions perform the

same basic operation and can be considered simple type variations of the same basic

interface

All public API function calls and datatypes have the prefix sqlite3_, indicating they

are part of version 3.x of the SQLite product. Most of the constants, such as error codes,

use the prefix SQLITE_. The design and API differences between SQLite 2.x and 3.x were

significant enough to warrant a complete change of all API names and structures. The

depth of these changes required anyone upgrading from SQLite 2 to SQLite 3 to modify

their application, so changing the names of the API functions only helped keep the

names distinct and keep any version questions clear. The distinct names also allowed

applications that were in transition to link to both libraries at the same time.

In addition to the sqlite3_ prefix, public function calls can be identified by the use of

lowercase letters and underscores in their names. Private functions use run-together

capitalized words (also known as CamelCase). For example, sqlite3_create_function()

is a public API function (used to register a user-defined SQL function), while

sqlite3CreateFunc() is an internal function that should never be called directly.

Internal functions are not in the public header file, are not documented, and are subject

to change at any time.

Strings and Unicode

There are a number of API functions that have a 16 variant. For instance, both an

sqlite3_column_text() function and an sqlite3_column_text16() function are available.

The first requests a text value in UTF-8 format, while the second will request a text

value in UTF-16.

Error CodesSQLite follows the convention of returning integer error codes in any situation when

there is a chance of failure. If data needs to be passed back to the function caller, it is

returned through a reference parameter.

In all cases, if a function succeeds, it will return the constant SQLITE_OK, which happens

to have the value zero. If something went wrong, API functions will return one of the

standard error codes to indicate the nature of the error.

Structures and AllocationsAlthough the native SQLite API is often referred to as a C/C++ API, technically the

interface is only available in C. As mentioned in “Building”, the SQLite

source code is strictly C based, and as such can only be compiled with a C compiler.

Once compiled, the library can be easily linked to, and called from, both C and C++

code, as well as any other language that follows the C linking conventions for your

platform.

Although the API is written in C, it has a distinct object-like flavor. Most of the program

state is held in a series of opaque data structures that act like objects. The most common

data structures are database connections and prepared statements. You should never

directly access the fields of these data structures. Instead, functions are provided to

create, destroy, and manipulate these structures in much the same way that object

methods are used to manipulate object instances.

Library InitializationBefore an application can use the SQLite library to do anything, the library must first

be initialized. This process allocates some standard resources and sets up any OSspecific

data structures. By default, most major API function calls will automatically

initialize the library, if it has not already been initialized. It is considered a good practice

to manually initialize the library, however.

int sqlite3_initialize( )

Initializes the SQLite library. This function should be called prior to any other

function in the SQLite API. Calling this function after the library has already been

initialized is harmless. This function can be called after a shutdown to reinitialize

the library. A return value of SQLITE_OK indicates success.

When an application is finished using the SQLite library, the library should be shut down.

int sqlite3_shutdown( )

Releases any resources allocated by sqlite3_initialize(). Calling this function

before the library has been initialized or after the library has already been shut

down is harmless. A return value of SQLITE_OK indicates success.

Closing

To close and release a database connection, call sqlite3_close().

int sqlite3_close( sqlite3 *db )

Closes a database connection and releases any associated data structures. All temporary

items associated with this connection will be deleted.

Any pointer returned by a call to sqlite3_open_xxx(), including a NULL pointer, can

be passed to sqlite3_close(). This function verifies there are no outstanding changes

to the database, then closes the file and frees the sqlite3 data structure

Example#include <stdlib.h>

int main( int argc, char **argv )

{

char *file = ""; /* default to temp db */

sqlite3 *db = NULL;

int rc = 0;

if ( argc > 1 )

file = argv[1];

sqlite3_initialize( );

rc = sqlite3_open_v2( file, &db, SQLITE_OPEN_READWRITE |

SQLITE_OPEN_CREATE, NULL );

if ( rc != SQLITE_OK) {

sqlite3_close( db );

exit( -1 );

}

/* perform database operations */

sqlite3_close( db );

sqlite3_shutdown( );

}

The default filename is an empty string. If passed to sqlite3_open_xxx(), this will result

in a temporary, on-disk database that will be deleted as soon as the database connection

is closed. If at least one argument is given, the first argument will be used as a filename.

If the database does not exist, it will be created and then opened for read/write access.

It is then immediately closed.

Prepared Statements

Once a database connection is established, we can start to execute SQL commands.

This is normally done by preparing and stepping through statements. Statements are

held in sqlite3_stmt data structures.

Statement Life CycleThe life cycle of a prepared statement is a bit complex. Unlike database connections,

which are typically opened, used for some period of time, and then closed, a statement

can be in a number of different states. A statement might be prepared, but not run, or

it might be in the middle of processing

/* create a statement from an SQL string */

sqlite3_stmt *stmt = NULL;

sqlite3_prepare_v2( db, sql_str, sql_str_len, &stmt, NULL );

/* use the statement as many times as required */

while( ... )

{

/* bind any parameter values */

sqlite3_bind_xxx( stmt, param_idx, param_value... );

...

/* execute statement and step over each row of the result set */

while ( sqlite3_step( stmt ) == SQLITE_ROW )

{

/* extract column values from the current result row */

col_val = sqlite3_column_xxx( stmt, col_index );

...

}

/* reset the statement so it may be used again */

sqlite3_reset( stmt );

sqlite3_clear_bindings( stmt ); /* optional */

}

/* destroy and release the statement */

sqlite3_finalize( stmt );

stmt = NULL;

The prepare process converts an SQL command string into a prepared statement. That

statement can then have values bound to any statement parameters. The statement is

then executed, or “stepped through.” In the case of a query, each step will make a new

results row available for processing.

The statement can then be reset, allowing it to be re-executed with a new set of bindings.

Preparing a statement can be somewhat costly, so it is a common practice to reuse

statements as much as possible.

Prepare

To convert an SQL command string into a prepared statement, use one of the

sqlite3_prepare_xxx() functions:

int sqlite3_prepare( sqlite3 *db, const char *sql_str, int sql_str_len,

sqlite3_stmt **stmt, const char **tail )

int sqlite3_prepare16( sqlite3 *db, const void *sql_str, int sql_str_len,

sqlite3_stmt **stmt, const void **tail )

It is strongly recommended that all new developments use the _v2 version of these

functions.

int sqlite3_prepare_v2( sqlite3 *db, const char *sql_str, int sql_str_len,

sqlite3_stmt **stmt, const char **tail )

int sqlite3_prepare16_v2( sqlite3 *db, const void *sql_str, int sql_str_len,

sqlite3_stmt **stmt, const void **tail )

Converts an SQL command string into a prepared statement. The first parameter

is a database connection. The second parameter is an SQL command encoded in

a UTF-8 or UTF-16 string. The third parameter indicates the length of the command

string in bytes. The fourth parameter is a reference to a statement pointer.

This is used to pass back a pointer to the new sqlite3_stmt structure.

If the length parameter is negative, the length will be automatically computed by the

prepare call. This requires that the command string be properly null-terminated. If the

length is positive, it represents the maximum number of bytes that will be parsed. For

optimal performance, provide a null-terminated string and pass a valid length value

that includes the null-termination character

Once a statement has been prepared, but before it is executed, you can bind parameter

values to the statement. Statement parameters allow you to insert a special token into

the SQL command string that represents an unspecified literal value. You can then bind

specific values to the parameter tokens before the statement is executed.

StepPreparing an SQL statement causes the command string to be parsed and converted

into a set of byte-code commands. This byte-code is fed into SQLite’s Virtual Database

Engine (VDBE) for execution. The translation is not a consistent one-to-one affair.

Depending on the database structure (such as indexes), the query optimizer may generate

very different VDBE command sequences for similar SQL commands. The size

and flexibility of the SQLite library can be largely attributed to the VDBE architecture

Reset and FinalizeWhen a call to sqlite3_step() returns SQLITE_DONE, the statement has successfully finished

execution. At that point, there is nothing further you can do with the statement.

If you want to use the statement again, it must first be reset.

int sqlite3_reset( sqlite3_stmt *stmt )

Resets a prepared statement so that it is ready for another execution. A statement

should be reset as soon as you’re done using it. This will ensure any locks are

released.

Statement TransitionsPrepared statements have a significant amount of state. In addition to the currently

bound parameter values and other details, every prepared statement is always in one

of three major states.

Knowing the current state of a statement is important. Although some API functions

can be called at any time (like sqlite3_reset()), other API functions can only be called

when a statement is in a specific state.

ExamplesHere are two examples of using prepared statements. The first example executes a

CREATE TABLE statement by first preparing the SQL string and then calling

sqlite3_step() to execute the statement:

sqlite3_stmt *stmt = NULL;

/* ... open database ... */

rc = sqlite3_prepare_v2( db, "CREATE TABLE tbl ( str TEXT )", -1, &stmt, NULL );

if ( rc != SQLITE_OK) exit( -1 );

rc = sqlite3_step( stmt );

if ( rc != SQLITE_DONE ) exit ( -1 );

sqlite3_finalize( stmt );

/* ... close database ... */

The CREATE TABLE statement is a DDL command that does not return any type of value

and only needs to be “stepped” once to fully execute the command. Remember to reset

or finalize statements as soon as they’re finished executing. Also remember that all

statements associated with a database connection must be fully finalized before the

connection can be closed.

Bound ParametersStatement parameters are special tokens that are inserted into the SQL command string

before it is passed to one of the sqlite3_prepare_xxx() functions

Once this statement has been prepared, it can be used to insert multiple rows. For each

row, simply bind the appropriate values, step through the statement, and then reset

the statement. After the statement has been reset, new values can be bound to the

parameters and the statement can be stepped again.

You can also use explicit index values:

INSERT INTO people (id, name) VALUES ( ?1, ?2 );

Using explicit parameter indexes has two major advantages. First, you can have multiple

instances of the same index value, allowing the same value to be bound to more

than one place in the same statement.

This level of abstraction can be taken even further by using named parameters. In this

case, you allow SQLite to assign parameter index values as it sees fit, in a similar fashion

to anonymous parameters. The difference is that you can ask SQLite to tell you the

index value of a specific parameter based off the name you’ve given it. Consider this

statement:

INSERT INTO people (id, name) VALUES ( :id, :name );

Binding ValuesWhen you first prepare a statement with parameters, all of the parameters start out

with a NULL assigned to them. Before you execute the statement, you can bind specific

values to each parameter using the sqlite3_bind_xxx() family of functions.

There are nine sqlite3_bind_xxx() functions available, plus a number of utility functions.

These functions can be called any time after the statement is prepared, but before

sqlite3_step() is called for the first time. Once sqlite3_step() has been called, these

functions cannot be called again until the statement is reset.

All the sqlite3_bind_xxx() functions have the same first and second parameters and

return the same result. The first parameter is always a pointer to an sqlite3_stmt, and

the second is the index of the parameter to bind. Remember that for anonymous parameters,

the first index value starts with one. For the most part, the third parameter

is the value to bind. The fourth parameter, if present, indicates the length of the data

value in bytes. The fifth parameter, if present, is a function pointer to a memory management

callback.

All the bind functions return an integer error code, which is equal to SQLITE_OK upon

success.

The bind functions are:

int sqlite3_bind_blob( sqlite3_stmt *stmt, int pidx,

const void *data, int data_len, mem_callback )

Binds an arbitrary length binary data BLOB.

int sqlite3_bind_double( sqlite3_stmt *stmt, int pidx, double data )

Binds a 64-bit floating point value.

int sqlite3_bind_int( sqlite3_stmt *stmt, int pidx, int data )

Binds a 32-bit signed integer value.

int sqlite3_bind_int64( sqlite3_stmt *stmt, int pidx, sqlite3_int64 )

Binds a 64-bit signed integer value.

int sqlite3_bind_null( sqlite3_stmt *stmt, int pidx )

Binds a NULL datatype.

int sqlite3_bind_text( sqlite3_stmt *stmt, int pidx,

const char *data, int data_len, mem_callback )

Database LockingSQLite employs a number of different locks to protect the database from race conditions.

These locks allow multiple database connections (possibly from different processes)

to access the same database file simultaneously without fear of corruption. The

locking system is used for both autocommit transactions (single statements) as well as

explicit transactions.

The locking system involves several different tiers of locks that are used to reduce contention

and avoid deadlocking. The details are somewhat complex, but the system

allows multiple connections to read a database file in parallel, but any write operation

requires full, exclusive access to the entire database file.

Most of the time the locking system works reasonably well, allowing applications to

easily and safely share the database file. If coded properly, most write operations only

last a fraction of a second. The library is able to get in, make the required modifications,

verify them, and then get out, quickly releasing any locks and making the database

available to other connections.

Busy handlersA busy handler is a callback function that is called by the SQLite library any time it is

unable to acquire a lock, but has determined it is safe to try and wait for it. The busy

handler can instruct SQLite to keep trying to acquire the lock, or to give up and return

an SQLITE_BUSY error.

DeadlocksSetting a busy handler will not fix every problem. There are some situations when

waiting for a lock will cause the database connection to deadlock. The deadlock happens

when a pair of database connections each have some set of locks and both need

to acquire additional locks to finish their task. If each connection is attempting to access

a lock currently held by the other connection, both connections will stall in a deadlock.

This can happen if two database connections both attempt to write to the database at

the same time. In this case, there is no point in both database connections waiting for

the locks to be released, since the only way to proceed is if one of the connections gives

up and releases all of its locks.

If SQLite detects a potential deadlock situation, it will skip the busy handler and will

have one of the database connections return SQLITE_BUSY immediately. This is done to

encourage the applications to release their locks and break the deadlock. Breaking the

deadlock is the responsibility of the application(s) involved—SQLite cannot handle

this situation for you.

Avoiding SQLITE_BUSYWhen developing code for a system that requires any degree of database concurrency,

the easiest approach is to use sqlite3_busy_timeout() to set a timeout value that is

reasonable for your application. Start with something between 250 to 2,000 milliseconds

and adjust from there. This will help reduce the number of SQLITE_BUSY response

codes, but it will not eliminate them.

Avoiding deadlocksThe rules to avoid deadlocks are fairly simple, although their application can cause

significant complexity in code.

First, the easy ones. The functions sqlite3_prepare_xxx(), sqlite3_backup_step(), and

sqlite3_blob_open() cannot cause a deadlock. If an SQLITE_BUSY code is returned from

one of these functions at any time, simply wait and call the function again.

If the function sqlite3_step(), sqlite3_reset(), or sqlite3_finalize() returns

SQLITE_BUSY from within a deferred transaction, the application must back off and try

again. For statements that are not part of an explicit transaction, the prepared statement

can simply be reset and re-executed. For statements that are inside an explicit deferred

transaction, the whole transaction must be rolled back and started over from the

beginning. In most cases, this will happen on the first attempt to modify the database.

When BUSY becomes BLOCKEDWhen a database connection needs to modify the database, a lock is placed that makes

the database read-only. This allows other connections to continue to read the database,

but prevents them from making modifications

Memory ManagementWhen SQLite needs to dynamically allocate memory, it normally calls the default

memory handler of the underlying operating system. This causes SQLite to allocate its

memory from the application heap

Regardless, you can access whatever memory manager SQLite is using with these

SQLite memory management functions:

void* sqlite3_malloc( int numBytes )

Allocates and returns a buffer of the size specified. If the memory cannot be allocated,

a NULL pointer is returned. Memory will always be 8-byte (64-bit) aligned.

This is a replacement for the standard C library malloc() function.

void* sqlite3_realloc( void *buffer, int numBytes )

Used to resize a memory allocation. Buffers can be made larger or smaller. Given

a buffer previously returned by sqlite3_malloc() and a byte count, *_realloc()

will allocate a new buffer of the specified size and copy as much of the old buffer

as will fit into the new buffer. It will then free the old buffer and return the new

buffer. If the new buffer cannot be allocated, a NULL is returned and the original

buffer is not freed.

If the buffer pointer is NULL, the call is equivalent to a call to sqlite3_malloc().

If the numBytes parameter is zero or negative, the call is equivalent to a call to

sqlite3_free().

This is a replacement for the standard C library realloc() function

void sqlite3_free( void *buffer )

Releases a memory buffer previously allocated by sqlite3_malloc() or

sqlite3_realloc(). Also used to free the results or buffers of a number of SQLite

API functions that call sqlite3_malloc() internally.

This is a replacement for the standard C library free() function.

advance sqlite3

Education