Posted on 18-Jan-2016


http://www.banrep.gov.co

Download the website to work locally

Tool: Surf Offline 1.0

Create a PERL program to extract the website structure information and store it in Oracle
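The slides do this extraction step in PERL and do not show the program. As a rough illustration only, the Python sketch below pulls the link structure out of one downloaded page: it collects every `<a href="...">` target, resolves relative links against the page URL and drops `#fragment` parts. The names `LinkExtractor` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute targets of <a href="..."> tags in one page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and strip "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(base_url, html):
    """Return the link targets found in `html`, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links("http://www.banrep.gov.co/index.html", '<a href="a.html">A</a>')` yields `["http://www.banrep.gov.co/a.html"]`; non-anchor tags such as `<img>` are ignored.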

Create an Oracle schema to store the data

Tools: TOAD 9.0, Oracle9i

Tools: PERL 5.8
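The deck names the tables (HTML_DOCUMENT, HTML_LINK, HTML_MATRIX) but not their columns. The sketch below is a minimal guess at the first two, using Python's built-in `sqlite3` as a stand-in for Oracle9i so it runs anywhere; every column name is an assumption, and HTML_MATRIX is left out because its layout is not shown.

```python
import sqlite3

# Table names come from the slides; all columns are hypothetical.
SCHEMA = """
CREATE TABLE HTML_DOCUMENT (
    doc_id INTEGER PRIMARY KEY,
    url    TEXT NOT NULL UNIQUE,
    title  TEXT
);
CREATE TABLE HTML_LINK (
    source_id INTEGER NOT NULL REFERENCES HTML_DOCUMENT (doc_id),
    target_id INTEGER NOT NULL REFERENCES HTML_DOCUMENT (doc_id),
    PRIMARY KEY (source_id, target_id)  -- each link pair is stored once
);
"""


def create_schema(conn):
    """Create the hypothetical tables on an open SQLite connection."""
    # SQLite only enforces REFERENCES clauses when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

In Oracle the same idea would be written with Oracle types (for example `NUMBER` and `VARCHAR2`); the composite primary key on HTML_LINK is one way to keep a page pair from being recorded twice.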

Create PERL programs to crawl the website and store the data in Oracle

Tools: PERL 5.8
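The crawl can be pictured as a breadth-first search over pages. This Python sketch is not the deck's crawler: it assumes a hypothetical `get_links(url)` hook that returns the absolute links on a page (for instance by parsing the locally downloaded copy), stays on the start URL's host, and stops adding pages at `max_pages`, whose default of 1000 is chosen only to match the "1000 Webpages" figure in the slides.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_pages=1000):
    """Breadth-first crawl restricted to the host of start_url.

    get_links(url) -> list of absolute URLs on that page (hypothetical hook).
    Returns (pages in BFS order, set of (source, target) links between kept pages).
    """
    host = urlparse(start_url).netloc
    pages = [start_url]
    seen = {start_url}
    edges = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for target in get_links(url):
            if urlparse(target).netloc != host:
                continue  # ignore links to other sites
            if target not in seen:
                if len(pages) >= max_pages:
                    continue  # page budget used up: neither keep the page nor its link
                seen.add(target)
                pages.append(target)
                queue.append(target)
            edges.add((url, target))
    return pages, edges
```

The two results map naturally onto a document table (one row per page) and a link table (one row per pair), which is what storing the crawl in Oracle would need.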

Create the matrix with the link structure

[Figure: 4 × 4 link matrix for pages P1, P2, P3, P4; a 1 in a cell marks a link between the row page and the column page]

Tools: Surf Offline 1.0, Excel
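A matrix like the one in the figure can be built from a list of pages and a set of (source, target) link pairs. This is a Python sketch, not the deck's method; it assumes the convention that rows are linking pages and columns are linked-to pages, and the example data is hypothetical.

```python
def link_matrix(pages, edges):
    """Return a 0/1 matrix m where m[i][j] == 1 if pages[i] links to pages[j]."""
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    m = [[0] * n for _ in range(n)]
    for source, target in edges:
        # Links to pages outside the list are ignored.
        if source in index and target in index:
            m[index[source]][index[target]] = 1
    return m


# Hypothetical four-page example.
pages = ["P1", "P2", "P3", "P4"]
edges = {("P1", "P2"), ("P2", "P3"), ("P2", "P4"),
         ("P3", "P1"), ("P3", "P4"), ("P4", "P1")}
for label, row in zip(pages, link_matrix(pages, edges)):
    print(label, row)
```

With this data the rows come out as `[0, 1, 0, 0]`, `[0, 0, 1, 1]`, `[1, 0, 0, 1]` and `[1, 0, 0, 0]`.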

...


What kinds of things can I do with this matrix?

• Visualize the website structure

...
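One way to visualize the structure is to hand the matrix to PAJEK, which the deck lists as a target. The Python sketch below writes a 0/1 link matrix in Pajek's plain-text `.net` format (a `*Vertices` section with 1-based ids and quoted labels, then an `*Arcs` section with one directed link per line); treating rows as link sources is an assumption.

```python
def to_pajek(pages, matrix):
    """Serialize a 0/1 link matrix as Pajek .net text (vertex ids are 1-based)."""
    lines = [f"*Vertices {len(pages)}"]
    for i, page in enumerate(pages, start=1):
        # Labels are quoted; a label containing '"' would need escaping first.
        lines.append(f'{i} "{page}"')
    lines.append("*Arcs")
    for i, row in enumerate(matrix, start=1):
        for j, value in enumerate(row, start=1):
            if value:
                lines.append(f"{i} {j}")
    return "\n".join(lines) + "\n"
```

For example, `to_pajek(["P1", "P2"], [[0, 1], [0, 0]])` produces a file with two vertices and the single arc `1 2`.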

[Figure: data-flow diagram]

Internet: http://www.banrep.gov.co

Collect Data: structure.pl, crawler.pl (PERL 5.8), PL/SQL → TABLES: HTML_DOCUMENT, HTML_LINK, HTML_MATRIX

Visualizing: VIEWS HTML_1_255, HTML_256_510, HTML_511_756, HTML_757_999 (1000 webpages) → excel.pl, SQL → Different Formats → Excel, PAJEK, NETDRAW → Graph
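The slides do not explain the four views. One plausible reading, which is an assumption, is that each view exposes a range of page columns of the matrix, possibly so that each piece fits within the 256-column limit of Excel worksheets of that era. The Python sketch below slices a matrix by the ranges encoded in the view names; `VIEW_RANGES` and `column_block` are hypothetical names.

```python
# 1-based, inclusive page ranges read off the view names in the diagram.
# As named, the four ranges together cover pages 1 to 999.
VIEW_RANGES = {
    "HTML_1_255": (1, 255),
    "HTML_256_510": (256, 510),
    "HTML_511_756": (511, 756),
    "HTML_757_999": (757, 999),
}


def column_block(matrix, first, last):
    """Return every row of `matrix` restricted to columns first..last (1-based, inclusive)."""
    return [row[first - 1:last] for row in matrix]
```

Each block keeps all rows, so the pieces can be laid side by side to rebuild the full matrix.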

http://oracle92.is.umbc.edu:7778/isqlplus

Surf Offline examples: example1, example2, example3

ARCHITECTURE

structure.pl

crawler.pl

PL/SQL

SQL Scripts to create input files with different formats
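The deck produces these input files with SQL scripts and excel.pl. As a language-neutral illustration, not the authors' scripts, this Python sketch writes a labelled link matrix as CSV, a format Excel opens directly; the page labels become both the header row and the first column.

```python
import csv
import io


def matrix_to_csv(pages, matrix):
    """Return the link matrix as CSV text with page labels on both axes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([""] + list(pages))  # empty top-left cell, then column labels
    for page, row in zip(pages, matrix):
        writer.writerow([page] + list(row))
    return buffer.getvalue()
```

For example, `matrix_to_csv(["P1", "P2"], [[0, 1], [1, 0]])` yields the three lines `,P1,P2`, `P1,0,1` and `P2,1,0`.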
