infovision anand s _ no sql workshop

Post on 23-Jan-2015

351 Views

Category:

Documents

3 Downloads

Preview:

Click to see full reader

DESCRIPTION

NoSQL, Non relational databases

TRANSCRIPT

NoSQLData ScientistGramener.com

Relational DatabasesNon

Relational DatabasesNon

Professor Hired on Course

Anand 10 Jan 2012 Maths

Bala 15 Jan 2012 Physics

Chandra 20 Jan 2012 Chemistry

Dileep 25 Jan 2012 ???

INSERT ANOMALY

UPDATE ANOMALY

Professor Hired on Course

Anand 10 Jan 2012 Maths

Bala 15 Jan 2012 Physics

Chandra 20 Jan 2012 Chemistry

Dileep 25 Jan 2012 Biology

DELETE ANOMALY

Name Address PhoneAnand 10 Mount Rd, Chennai 98765 43210

Bala 15, Janpath, New Delhi 90123 45678

Chandra 20, Marine Dr, Mumbai 91234 56780

Chandra 20, Marine Dr, Mumbai 91234 56781

WHY NOW?

DATA VOLUME IS GROWING

SEMI-STRUCTURED DATA DISTRIBUTED ARCHITECTURE

161253

397

623

988

2006 2007 2008 2009 2010

DATA IS INCREASINGLY NETWORKED

How many programmers?

… who’ve programmed NoSQL DBs

A POLL

How many non-IT folks?

data is stored in

TABLES

Key-value stores

Document databases

Graph databases

C

A

P

Brewer’s CAPTheorem

Pick Two

Consistency PartitionTolerance

Availability

Key-value stores

Document databases

Graph databases

Columnar databases

KEY VALUE STORES DOCUMENT DATABASES

COLUMNAR DATABASES GRAPH DATABASES

RedisCassandraMemcacheVoldemortDynamoTokyo Cabinet

CouchDBMongoDBSimpleDBRiakTerrastoreLotus Domino

CassandraBigTableHypertableHbaseVerticaInfiniDB

Neo4jFlockDBGraphDBOrientDBInfiniteGraphAllegroGraph

KEY VALUE STORES DOCUMENT DATABASES

COLUMNAR DATABASES GRAPH DATABASES

RedisCassandraMemcacheVoldemortDynamoTokyo Cabinet

CouchDBMongoDBSimpleDBRiakTerrastoreLotus Domino

CassandraBigTableHypertableHbaseVerticaInfiniDB

Neo4jFlockDBGraphDBOrientDBInfiniteGraphAllegroGraph

The first time round, the mistakes were around scalability. I used a SQL “ORDER BY RAND()” statement to return the next page to review. I knew this was an inefficient operation, but I assumed that it wouldn’t matter since the button would only be clicked occasionally.

Something like 90% of our database load turned out to be caused by that one SQL statement, and it only got worse as we loaded more pages in to the system. This caused multiple site slow downs and crashes.

The second time round I turned to my new favourite in-memory data structure server, redis, and its SRANDMEMBER command (a feature I requested a while ago with this exact kind of project in mind). The system maintains a redis set of all IDs that needed to be reviewed for an assignment to be complete, and a separate set of IDs of all pages had been reviewed. It then uses redis set intersection (the SDIFFSTORE command) to create a set of unreviewedpages for the current assignment and then SRANDMEMBER to pick one of those pages.

CouchDB

s.anand@gramener.com gramener.com s-anand.net @sanand0 on Twitter

+91 9741 552 552

EXERCISE: DESIGN THE SSLC MARKS DATABASE

Each student has an ID.There are totally 11 languages and 92 non-language subjects.

Students usually write 3 language and 3 non-language exams.

For example,

• (English, Hindi, Sanskrit), (Maths, Physics, Chemistry)• (Kannada, Urdu, Marathi), (Commerce, Accountancy, Economics)

You need to record their marks in all 6 subjects, and the total.

EXERCISE: DESIGN THE SSLC MARKS DATABASE

Common queries:

Who scored the highest in Maths?

Which subject had the highest fail %?

How many failed in 1 subject?

Some scenarios:

Access from multiple locations

Real-time marks updation

Guarantee of correctness

top related