db conan 1.0

33
A practical & intelligent enough data retrieval tool Raka Angga Jananuraga [email protected] http://jananuraga.blogspot.com

Upload: rakaangga

Post on 14-Jun-2015

444 views

Category:

Technology


0 download

DESCRIPTION

The software is available on http://www.box.net/shared/5ag8ja4eli

TRANSCRIPT

Page 1: DB Conan 1.0

A practical & intelligent enough data retrieval tool

Raka Angga [email protected]

http://jananuraga.blogspot.com

Page 2: DB Conan 1.0

How it can help you.Get specific records you want from a table, and

all related tables, quickly, in a repeatable manner.You wouldn’t have to crawl the database manually.Not even a single SQL-query to be typed.You will get all the results as a collection of CSV

files. Files stored in an automatically-created folder, ready for

packing and sending.“Data mozaic-ing” improves collaboration and

coordination.

Page 3: DB Conan 1.0

When it can help you.Software testing (esp. data verification)

You want to make sure the data have the correct (expected) state after the execution of an operation.

System-comprehension / reverse-engineering:You want to know / understand how an

operation affects the data.

Page 4: DB Conan 1.0

ScreenshotsPrototype is a pure console application,

written in Python, compiled to EXE using py2exe.

Consists of five small applications that make up the workflow.

Page 5: DB Conan 1.0

Screenshots (2 of 6)

Page 6: DB Conan 1.0

Screenshots (3 of 6)

Page 7: DB Conan 1.0

Screenshots (4 of 6)

Page 8: DB Conan 1.0

Screenshots (5 of 6)

Page 9: DB Conan 1.0

Screenshots (6 of 6)

Page 10: DB Conan 1.0

Brief description of each app.Schema builder: used to define a subset of the entire

database schema that you want to focus your (data)analysis on.

Exploration planner: used to define the starting point of your queries (over the schema), and filters for each table.

Exploration runner: used to run the exploration planner. It will traverse the schema, returns matching results from each table, and save them as CSV files.

Plan stitcher: used to stitch exploration plans together to form a mozaic (more on this later).

Mozaic renderer: used to render the mozaic (more on this later)

Page 11: DB Conan 1.0

Workflow (scenario 1)

Fire up the schema builder. It will load and analyze the structure of the database.

In the schema builder, specify name of a table that you want to be in your schema.

Starting from that table, add more tables to your list by following the associations.

Once you have all the tables you’re interested in in your list, save the schema as an SCM file.

Build schemaBuild

schema

Plan explorati

on

Plan explorati

on

Run explorati

on

Run explorati

on

scm

xpp

csv csv csv

Page 12: DB Conan 1.0

Workflow (scenario 1) – cont’d.

Run the exploration planner (feed in to it the SCM file you just created).

In the exploration planner, specify the name of “starting table”; that is the table where the crawling will be started from.

Additionaly, you can also specify filters in each table (except the starting table); to limit the results that you’d like to receive.

Save the plan as an XPP file.

Build schemaBuild

schema

Plan explorati

on

Plan explorati

on

Run explorati

on

Run explorati

on

scm

xpp

csv csv csv

Page 13: DB Conan 1.0

Workflow (scenario 1) – cont’d.

Execute the exploration runner (specify as parameter the XPP file you just created).

The exploration runner will ask you to enter the identifier (s) of the record(s) you’d like to match in the starting table.

Hit the ‘R’ button to start-off the crawling.The exploration runner will query all the

tables in the schema by following the associations.

The resulf of the query will be stored as XPP files (one for each table).

Build schemaBuild

schema

Plan explorati

on

Plan explorati

on

Run explorati

on

Run explorati

on

scm

xpp

csv csv csv

Page 14: DB Conan 1.0

Workflow (scenario 1) - demo

Page 15: DB Conan 1.0

Workflow (scenario 1) – cmds. Schema builder

dbconan_schemabuilder.exe –H 192.168.1.102 –p 1521 –n XE –u MYOHMY –a MYOHMY –o MYOHMY -H : the host where the Oracle DB is running. -p : the port where the Oracle DB is accepting connection on. -n : the name of the DB -u : the user for accessing the Oracle DB (must have read access to the DB catalog) -a : the password -o : the ownerd of the DB

Note: you can pass –s option, followed by path to an existing SCM file, to edit an existing schema.

Exploration planner dbconan_explorationplanner.exe –s e:\myschema_1.scm

-s : the path to the SCM file that contains the schema to be queried over. Note: you can pass –x option (instead of –s), followed by path to an existing

XPP file, to edit an existing exploration plan. Exploration runner

dbconan_explorationrunner.exe –x e:\myexplorationplan_1.xpp –a MYOHMY -x : the path to the XPP file to be “ran”. -a : the password for accessing the Oracle DB

Note: you can pass –H, -p, -n as well. Their values will override the values saved in the XPP.

Page 16: DB Conan 1.0

Workflow (scenario 2)

Build schemaBuild

schema

Plan explorati

on

Plan explorati

on Stitch plan

Stitch plan

scm (1)

xpp

(1)

mzc

Plan explorati

on

Plan explorati

on

xpp

(2)

Build schemaBuild

schema

Plan explorati

on

Plan explorati

on

xpp

(3)

scm (2)

Render mozaicRender mozaic

csv csv csv

csv csv

csv

Page 17: DB Conan 1.0

Workflow (scenario 2) – why?Three valid uses:

As a workaround for some constraints. Hint: no cycles are allowed in the schema.

To load data from multiple databases at once. Usually when there are correlations between the

content of those separate databases.Collaboration & coordination in a team.

Hint: split-and-(later)-merge way of working together.

Page 18: DB Conan 1.0

Mozaicing overcomes constraintThe constraint: your schema must not contain any cycles

(schemabuilder will ensure your schema wouldn’t break the constraint).

Consequence: you wouldn’t be able to load data of certain structures using single schema. E.g.: the typical “employee-manager” relationship; because it

involves a cycle (in this case a loop). Don’t worry: we’re talking about schema of this DBConan application;

not your database. You can design your database structure the way you want.

Workaround: for this particular “employee-manager” case, just as an example, create one schema, and create two* exploration plans out of that schema, and then stitch them together as a mozaic. The following diagram will make that point clear(er).

* Actually for this case you can make it with only one plan. It depends.

Page 19: DB Conan 1.0

The (data) mozaic

Mozaic element 1

Plan 1

Schema 1

Mozaicing overcome constraints

emp

emp

Mozaic element 2

Plan 2

Schema 1

emp

emp

trigger

Page 20: DB Conan 1.0

Mozaicing overcome constraints(what does the trigger say?)It’s something along this line:

From each record found in the employee table in mozaic-element* 1, take the value of column manager_id.

Use those values to query the employee table in mozaic-element 2 on emp_id column.

In generalg you can spell it this way:From each record found in the table A in mozaic-element

X, take the value of the columns that make up key M.Use those values to query the starting table of mozaic-

element Y, that we’ll refer to as table B. The key M must be compatible with the primary-key of table

B, in the sense that they are made up from the same number of columns.

Alternatively you can link by column (you will have to name the column in table A and table B that will be linked).

*) Mozaic-element is just a wrapper around exploration-plan, to contain additonal attributes (i.e.: links to another mozaic-elements)

Page 21: DB Conan 1.0

Another example of mozaicing(workaround the no-cycle constraint)Suppose we have the following database:

And our data retrieval task is: given a class, find the professor who delivers it, and all the students enrolled in the class. Additionaly find the mentor of each one of those students.

Student

ClassRegistration

Class

Professor

1 * * 1

1

* *

1delivered by

mentored by

Page 22: DB Conan 1.0

Another example of mozaicing(workaround the no-cycle constraint) – cont’d

You’ll have to create two separate schemas, and create a plan for each one of those schema, and finally stitch the two plans (by connecting the STUDENT table in plan A and PROFESSOR table in plan B, using the mentor_fk defined in STUDENT table).

Student

ClassRegistration

ClassProfesso

r

1 * * 1 * 1

Professor

Suppose the mozaic-rendering starts from mozaic-element A. Once the exploration on mozaic-element A is completed, exploration on mozaic-element B will be started, using the values extracted from the STUDENT table.

Page 23: DB Conan 1.0

Why introducing that constraint at the first place?Firstly, from my stand point as programmer: dealing

with cycle requires more programming, and I didn’t have time.

Secondly – this is more important I believe – I don’t want to confuse users by having the results of queries to a table, from various traversal paths, cramped into a single CSV file…, or having multiple CSV files for each query path, and twiddling with funky filenaming (e.g.: PROFESSOR_FROM_STUDENT_ROUTE_1.csv – I’ve tried that, and I said to myself wtf?). Each mozaic-element has its own folder, under a common

folder for the mozaic they belong to, for storing their CSV files.

So, I managed to convince myself cutting the cycle is a good idea after all.

Page 24: DB Conan 1.0

Mozaicing to load from multiple databases.This is possible because each plan that you

stitch together has its own database information (DB name, DB user, DB owner).The only limitation in the current version is: all

those databases must have the same password (as password information is not stored anywhere).

I actually use this capability in the current project, because there is separate database from each vendor of the products that we’re integrating.

Page 25: DB Conan 1.0

Mozaicing helps collaboration and coordination.In a large system there are several people, each

specializing / focusing on specific area of the system, looking only at a handful of tables (out of the 500++ tables in the system).

Of course, those parts don’t work in isolation. There are times we need to see how a change in one part (as a result of an operation) affects the other parts.

They’d put together the schema from each area / person, and define a data-mozaic out of them, and render it.

Page 26: DB Conan 1.0

Data mozaicing - demo

Page 27: DB Conan 1.0

Data mozaicing – cmds.Plan stitcher

dbconan_explorationplanner.exe e:\plan_1.xpp e:\plan_2.xpp f:\plan_3.xpp You can pass a variable a number of plans.

Mozaic rendererdbconan_mozaicrenderer.exe –a MYOHMY –m e:\

mymozaic_1.mzc -m : the path to the MZC file to be “rendered”. -a : the password for accessing the Oracle DB

Note: you can pass –H, -p, -n as well. Their values will override the values saved in the MZC (but of course it’s not recommended in case the plans in the mozaic are not from a single database).

Page 28: DB Conan 1.0

Can I trust the result?Yes you can . Allright, here’s an example

graph along with the walk. Suppose the exploration starts from B. Spot the zigzag manner in which the graph is traversed.

A B C

F

HG

D

I

J K

E

The walk: B C A F H D G E I K JThe walk: B C A F H D G E I K J

Page 29: DB Conan 1.0

Explanation of the walk.1. It zigzags. First, mark “going forward” as the current

direction (e.g.: from B’s standpoint, the link to C is a forward link).

2. Walk along the current direction. You’re zigging.3. Along the way, keep track of the nodes from which

there are link(s) going in the opposite direction (backward). Also mark the current node as “visited”. Do SQL query on the spot.

4. Once you hit the dead-end, mark “going backward” as the current direction.

5. From each and every node noted in step 3, go in current direction. You’re zagging. Do the same as in step 3, but now keep track of the nodes from which there are link(s) going forward.

6. Repeat the zigzagging until all the nodes in the graph are marked as visited.

Page 30: DB Conan 1.0

Explanation of the walk. I found the sequence of nodes doesn’t really matter.

It doesn’t matter if you go zagging first, or zigging first; the end result will be the same.

However, the way a node (table) is queried depends on whether it’s visited by a walk going “forward” or “backward”:

If the node is visited by a walk going “forward”, the node will be queried on its primary-key, using the values of the foreign-key in the previous node in the walk, that corresponds to the link that connects both nodes.

If the node is visited by a walk going “backward”, the node will be queried on its foreign-key, using the valus of the primary-key of the previous node in the walk.

Page 31: DB Conan 1.0

Downloading and running Download the application from

http://www.box.net/shared/5ag8ja4eli Just unzip it, you’ll find 5 EXE files in there. Currently this application works with Oracle database only.

So, here’s the possibilities: If you already have Oracle Express Edition installed, you

don’t have to do anything. Just execute the EXE from command line.

If you already have Oracle client 10.1 (or 10.2) installed, you don’t have to do anything else.

Otherwise: Download Oracle Instant Client 10.1 from Oracle’s website (free, but

you need to login, registration is free). Unzip the instant client on your computer (e.g.: c:\instantclient10.1) Execute the following lines on the command line before executing any

EXE of DBConan for the first time: SET ORACLE_HOME=C:\instantclient10.1 SET PATH=%ORACLE_HOME%;%PATH%

Page 32: DB Conan 1.0

What’s next?Obviously a rewrite, with a GUI, better user

experience, improved algorithm for graph analysis, more analytic tools, better output (preferrably an MS-Excel file, instead of a collection of CSV files. More handy for data analysis that works a alot with excel), etc.

More detail – such as manual / test cases – of this version will be made available on my site: http://jananuraga.blogspot.com

Feedbacks / participations in further development are gladly welcome. Please drop me an email: [email protected]

Page 33: DB Conan 1.0

Thank you, gracias, terima kasih, xie xie, matur suksma!