DEiXTo Powerful web data extraction tool Freeware GUI tool (built with Turbo Delphi, Windows-only)...


Upload: juniper-lee

Post on 23-Dec-2015



TRANSCRIPT

Page 1: DEiXTo Powerful web data extraction tool Freeware GUI tool (built with Turbo Delphi, Windows-only) Free, cross-platform Command Line Executor (in Perl)
Page 2

DEiXTo

Powerful web data extraction tool

Freeware GUI tool (built with Turbo Delphi, Windows-only)

Free, cross-platform Command Line Executor (in Perl)

DEiXToBot agent (implemented in Perl)

W3C Document Object Model (DOM)

DOM-based extraction rules (wrappers).

Extracted data can be exported to a wide variety of formats (tab delimited, XML, RSS, etc).

Command Line Executor:

has database support via the Database independent interface for Perl

supports additional formats: Excel, CSV, OpenDocument Spreadsheet (.ods), HTML
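As a rough illustration of what exporting extracted records to tab delimited and XML output can look like, here is a minimal Python sketch. It is not DEiXTo's own code or file layout: the sample records, the field names and the XML element names (`records`, `record`) are all assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical extracted records; field names are illustrative only.
records = [
    {"title": "First item", "url": "http://example.com/1"},
    {"title": "Second item", "url": "http://example.com/2"},
]

def to_tab_delimited(rows, fields):
    """Serialize rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for row in rows:
        writer.writerow([row.get(f, "") for f in fields])
    return buf.getvalue()

def to_xml(rows, fields, root_tag="records", item_tag="record"):
    """Serialize rows as a simple XML document (element names are assumptions)."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, item_tag)
        for f in fields:
            ET.SubElement(item, f).text = row.get(f, "")
    return ET.tostring(root, encoding="unicode")

fields = ["title", "url"]
tsv_text = to_tab_delimited(records, fields)
xml_text = to_xml(records, fields)
print(tsv_text)
print(xml_text)
```

Keeping the records as plain dictionaries means the same data can be handed to either serializer, which is the general idea behind offering several export formats.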

Page 3

GUI DEiXTo

user-friendly graphical interface

enhanced, tree based, extraction rules

HTML tag filtering

fast, flexible and high performance tree pattern matching algorithm

regular expression support

can follow "Next Page" links and submit simple forms

can export results to XML and tab delimited formats and create RSS feeds

XML encoded wrapper project files (.wpf) that can be executed at will

last but not least, it's freeware!
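To make the idea of tree based extraction rules with regular expression support more concrete, here is a small Python sketch. It is not DEiXTo's matching algorithm or its .wpf rule format: the `Node` class, the dictionary-based pattern layout and the greedy in-order child matching are all assumptions chosen to keep the example short, and the parser assumes reasonably well-nested HTML.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []
        self.text = ""

class TreeBuilder(HTMLParser):
    """Builds a minimal element tree; assumes properly nested HTML."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].text += data.strip()

def match(node, pattern):
    """Return captured fields if node matches the pattern tree, else None."""
    if node.tag != pattern["tag"]:
        return None
    if "regex" in pattern and not re.search(pattern["regex"], node.text):
        return None
    captured = {}
    pos = 0
    # Pattern children must match node children in order (greedy, no backtracking).
    for child_pat in pattern.get("children", []):
        found = None
        while pos < len(node.children):
            found = match(node.children[pos], child_pat)
            pos += 1
            if found is not None:
                break
        if found is None:
            return None
        captured.update(found)
    if "capture" in pattern:
        captured[pattern["capture"]] = node.text
    return captured

def find_all(node, pattern):
    """Try the pattern at every node of the tree and collect the matches."""
    results = []
    hit = match(node, pattern)
    if hit is not None:
        results.append(hit)
    for child in node.children:
        results.extend(find_all(child, pattern))
    return results

HTML = """
<ul>
  <li><a href="/a">Alpha</a><span>10 EUR</span></li>
  <li><a href="/b">Beta</a><span>n/a</span></li>
  <li><a href="/c">Gamma</a><span>25 EUR</span></li>
</ul>
"""

# Hypothetical rule: an <li> holding a link and a price that looks like "<digits> EUR".
pattern = {"tag": "li", "children": [
    {"tag": "a", "capture": "name"},
    {"tag": "span", "capture": "price", "regex": r"^\d+ EUR$"},
]}

builder = TreeBuilder()
builder.feed(HTML)
rows = find_all(builder.root, pattern)
print(rows)
```

In this sketch the second list item is skipped because its price text fails the regular expression, which shows how a regex constraint can filter otherwise structurally identical records.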

Page 4

DEiXTo Command Line Executor (CLE)

portable, efficient and fast command line executor of wrappers generated with GUI DEiXTo

provides options and flexibility that you cannot get with GUI DEiXTo

supports additional output formats such as CSV, Excel and OpenDocument Spreadsheet

provides database support via DBI (the Database independent interface for Perl)

supports HTML output using an HTML template processor and an editable template file

overwrite, append and prepend output modes for all supported formats

can be scheduled to execute wrappers automatically (e.g. using cron in GNU/Linux)

it is free and open source, distributed under the GNU General Public License (GPL) Version 3!
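The overwrite, append and prepend output modes can be pictured with a short Python sketch for a plain text file. This is only an assumption about what the three modes mean, not CLE's implementation or its command line options; the `write_output` helper and the file name are hypothetical, and structured formats such as XML or spreadsheets would need format-aware merging rather than string concatenation.

```python
import os
import tempfile

def write_output(path, text, mode="overwrite"):
    """Write text to path using one of three modes: overwrite, append, prepend."""
    if mode == "overwrite":
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
    elif mode == "append":
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(text)
    elif mode == "prepend":
        existing = ""
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                existing = fh.read()
        # New results go first, followed by whatever was already in the file.
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text + existing)
    else:
        raise ValueError(f"unknown mode: {mode}")

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "results.txt")  # hypothetical output file
    write_output(target, "run 2\n", "overwrite")
    write_output(target, "run 3\n", "append")
    write_output(target, "run 1\n", "prepend")
    with open(target, encoding="utf-8") as fh:
        final_text = fh.read()

print(final_text)
```

Append and prepend matter mostly for scheduled runs, where each execution adds fresh results to a file that earlier runs already produced.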

Page 5

DEiXToBot

A Mechanize agent (essentially a browser emulator) capable of extracting data of interest.

Flexible and efficient.

Allows extensive customization.

Supports multiple patterns on a single page and combination of their results.

Allows post-processing of the extracted data and enables you to transform it to any format you wish.

Programming skills are required to use it, though.
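The idea of running several patterns against one page, combining their results and post-processing them can be sketched in Python as follows. DEiXToBot itself is a Perl module and its API is not shown here; in this example plain regular expressions stand in for real extraction patterns, and the sample page, the field names and the JSON output are all assumptions.

```python
import json
import re

PAGE = """
<h1>Example Product</h1>
<p class="price">EUR 19.90</p>
<ul><li>Color: red</li><li>Weight: 2 kg</li></ul>
"""

# Each "pattern" is a plain regular expression standing in for a real extraction rule.
PATTERNS = {
    "title": re.compile(r"<h1>(.*?)</h1>"),
    "price": re.compile(r'<p class="price">EUR ([\d.]+)</p>'),
    "specs": re.compile(r"<li>(\w+): (.*?)</li>"),
}

def extract(page):
    """Apply every pattern to the same page and merge the results into one record."""
    title = PATTERNS["title"].search(page)
    price = PATTERNS["price"].search(page)
    specs = PATTERNS["specs"].findall(page)
    return {
        "title": title.group(1) if title else None,
        # Post-processing step: convert the price text to a number.
        "price": float(price.group(1)) if price else None,
        # Post-processing step: normalize spec names to lowercase keys.
        "specs": {key.lower(): value for key, value in specs},
    }

record = extract(PAGE)
print(json.dumps(record, indent=2))
```

Once the combined record is an ordinary data structure, converting it to any target format is a matter of choosing a different serializer.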

Page 6

Corgialenios Library use case

From unstructured HTML data to ESE format!

Page 7

DEiXTo Services

We can definitely help you to:

transform the contents of your digital library into OAI-PMH or another suitable format

quickly populate product catalogues with full specifications

search various web resources in real time and extract the results returned

prepare large, focused datasets for scientific tasks (e.g. data mining)

monitor prices of the competition

<your extraction task goes here!>

Page 8

Happy DEiXTo users!

For further information, please visit http://deixto.com