Using XML files as real corpora making an XML database with the dbXML program http://www.dbxml.com

Post on 19-Dec-2015


Using XML files as real corpora

making an XML database with the dbXML program

http://www.dbxml.com

The dbXML program

• The dbXML program is one of a range of programs that let you use a set of XML files as a database.

• The program is free and can be downloaded from the web.

• It is likely that many more programs like this will be springing up over the next couple of years.

Basic concepts

• Using a database involves the following basic concepts:

– the set of files you are looking at is called a collection

– a collection of files must be indexed so that the program can find things quickly

– you ask questions by posting queries to the database manager
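
As a rough illustration of these three concepts, independent of dbXML itself, the sketch below uses only Python's standard library. The file names, the element name `turn`, and the helpers `build_index` and `query` are invented for the example; dbXML's own storage and indexing are not shown here and will differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A "collection": a named set of XML documents (hypothetical sample data).
collection = {
    "dialogue1.xml": "<dialogue><turn speaker='A'>Hello</turn><turn speaker='B'>Hi</turn></dialogue>",
    "dialogue2.xml": "<dialogue><turn speaker='A'>Bye</turn></dialogue>",
}

def build_index(docs):
    """Map each element name to the set of documents that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for element in ET.fromstring(text).iter():
            index[element.tag].add(name)
    return index

def query(docs, index, tag):
    """Return the text of every <tag> element, visiting only indexed documents."""
    results = []
    for name in sorted(index.get(tag, ())):
        root = ET.fromstring(docs[name])
        results.extend(element.text for element in root.iter(tag))
    return results

index = build_index(collection)
turns = query(collection, index, "turn")
print(turns)
```

The index lets the query skip any document that cannot contain a match, which is the reason a collection is indexed before it is queried.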

Using the dbXML program to manage an XML database

• Our starting point assumes that we have some set of marked-up XML files that we want to manage.

• We first set up these files as a database.

• We then use the dbXML tool to extract information from this database.

Example XML files in our data set

Steps…

• Now we will see:

– how to add a collection of files to a database

– how to index those files

– how to ask queries to get information about the content of those files

Getting started… (1)

• First, we need to start up the dbXML server program. This is the program that does all the actual work.

To do this:

– Make sure you know where the dbxml folder is.

– Run the program startup-server.bat in that folder (e.g., by double-clicking on it).

– This should start the dbxml server with a message like:

dbXML 2.0 (Dragonfly)
Logging to E:\junk\logging\dbXML.out

Getting started… (2)

• Next, we turn a set of XML files into an XML database. To do this we must start the dbxml administration program and tell it which files to use.

– Start a DOS command window.

– Make sure you know where the dbxml folder is.

– Run the command ‘startup-command-line.bat’ that is in the dbxml folder.

– This should then start the dbxml program and you should get something that looks like the window on the next slide…

The program when it starts…

The DBXML administration actions

• Now you can tell the program which files you want to include in your database.

– To do this, you first have to log in to the program:

connect user= scott pass= tiger

You must use exactly this name and password for the moment!

– Then make a collection:

mkcol myXMLfiles

– Finally, go to the collection, say that everyone is allowed to look at it, and exit:

col myXMLfiles
grant admin READ WRITE EXECUTE CREATE
exit

The dbXML program proper

• With the administrative details out of the way, we can start the main program.

• Find the dbxml item in the normal program start menu from Windows and click on it.

• This should bring up the following window:

If it does not, or if you cannot find it, you will have to ask for help.

Finding your collection

Expand the items in the list under “localhost” until you find the collection that you made in the previous step.

Finding your collection

Adding files to your collection

Expand your collection to find the ‘documents’ folder.

Click on this.

Select ‘Documents>Import Documents’ from the menu bar.

You will then be asked which files are to be added to the collection.

Previous slide

When you have added your documents (select them all in one go if possible), you then have to index them…

Select the indexes folder in your collection…

Define an index as follows…

1. Give the index a name.

2. Then you must type “pattern=*@*” to index all ELEMENTS + ATTRIBUTES.

3. Click on create.
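
To make "all elements + attributes" concrete, here is a minimal sketch of the kind of lookup table such an index could record. It is a conceptual illustration in plain Python, not dbXML's implementation: the sample document, the element and attribute names, and the function `index_elements_and_attributes` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; the real corpus files will look different.
SAMPLE = (
    "<dialogue id='d1'>"
    "<turn speaker='A' n='1'>Hello</turn>"
    "<turn speaker='B' n='2'>Hi</turn>"
    "</dialogue>"
)

def index_elements_and_attributes(xml_text):
    """Collect the values seen for every (element, attribute) pair."""
    index = defaultdict(list)
    for element in ET.fromstring(xml_text).iter():
        for attribute, value in element.attrib.items():
            index[(element.tag, attribute)].append(value)
    return dict(index)

index = index_elements_and_attributes(SAMPLE)
print(index[("turn", "speaker")])
```

With a table like this, a question such as "which speakers occur in turns?" becomes a single lookup instead of a scan through every file.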

… you can now ask questions about their content using:

• XPath

• XSLT

• full text

QUERY WINDOW

RESULT WINDOW

Selecting all ‘turns’ in the corpus
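
The screenshot for this slide is not preserved in the transcript, so the exact query typed into dbXML is unknown. In standard XPath, every `turn` element at any depth is selected with `//turn`. The sketch below runs the equivalent search with Python's standard library on an invented mini-corpus; the element and attribute names are assumptions, not taken from the real data set.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus; the structure is invented for illustration.
CORPUS = """<corpus>
  <dialogue id="d1">
    <turn speaker="A">Where is the station?</turn>
    <turn speaker="B">Straight ahead.</turn>
  </dialogue>
  <dialogue id="d2">
    <turn speaker="A">Thanks.</turn>
  </dialogue>
</corpus>"""

root = ET.fromstring(CORPUS)

# ".//turn" is ElementTree's spelling of the XPath query "//turn":
# every <turn> element anywhere below the root.
turns = root.findall(".//turn")

for turn in turns:
    print(turn.get("speaker"), turn.text)
```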

Selecting all ‘attrib’ in the corpus
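
The transcript does not show whether ‘attrib’ is an element name or an attribute name in this corpus, so the sketch below covers both readings on invented data. In standard XPath these would be `//attrib` (elements called `attrib`) and `//*[@attrib]` (elements that carry an `attrib` attribute); the Python standard library supports the same two patterns in relative form.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus that uses "attrib" both ways, purely for illustration.
CORPUS = """<corpus>
  <turn speaker="A"><attrib type="question">Where is it?</attrib></turn>
  <turn speaker="B"><attrib type="answer">Over there.</attrib></turn>
  <note attrib="checked">Reviewed</note>
</corpus>"""

root = ET.fromstring(CORPUS)

# Reading 1: "attrib" is an element name.
as_elements = root.findall(".//attrib")

# Reading 2: "attrib" is an attribute name on some element.
with_attribute = root.findall(".//*[@attrib]")

print([element.get("type") for element in as_elements])
print([element.tag for element in with_attribute])
```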

The results…

• are presented as XML

• therefore you can pass them straight to a style sheet to look at them…
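
Because the results are themselves XML, any XML tool can post-process them. Python's standard library has no XSLT processor, so the sketch below substitutes a small hand-written rendering function for the style sheet; it only illustrates the idea of turning result XML into something readable. The shape of the result document and the function `render_as_html` are assumptions, not dbXML output.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical query result; real dbXML results may be wrapped differently.
RESULT = (
    "<results>"
    "<turn speaker='A'>Hello &amp; welcome</turn>"
    "<turn speaker='B'>Hi</turn>"
    "</results>"
)

def render_as_html(result_xml):
    """Render each <turn> as an HTML list item (a stand-in for a style sheet)."""
    root = ET.fromstring(result_xml)
    items = [
        f"<li><b>{escape(turn.get('speaker', '?'))}</b>: {escape(turn.text or '')}</li>"
        for turn in root.iter("turn")
    ]
    return "<ul>" + "".join(items) + "</ul>"

html_output = render_as_html(RESULT)
print(html_output)
```

A real XSLT style sheet would express the same mapping declaratively, with one template per element type.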