1 relaxation of xml documents wesley w. chu computer science department ucla [email protected] september...

42
1 Relaxation of XML Documents Wesley W. Chu Computer Science Department UCLA wwc@ucla. edu September 19, 2000

Post on 22-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

1

Relaxation of XML Documents

Wesley W. Chu

Computer Science Department

UCLA

[email protected]

September 19, 2000

2

Outline

Goals Transformation of XML to Relational Tables Relaxation Example queries Future directions

3

Goals

Given an XML file as sample data, map it into relational database

Generate TAHs on attributes from relational tables

Execute queries and perform relaxation on sample data

4

Architecture

SQL

XMLdoc

XMLdocXML

doc

1

RDB

TAH

query processingquery relaxation

23

RelaxedAnswers

XML Spy DTD

XPRESS

extract DTD from XML file

schema mapdata map

4 generate TAHs5

5

Extracting DTD From XML

Translation of XML to relational tables requires a DTD that provides schema for XML file

DTD for XML file can be extracted from the file using XML Spy tool

6

Ato.xml (provided by Stuart Hirshfield of AFRL [email protected])

<air_to_cofn> <operation_identification_data setid="OPER"> <operation_codeword> DESERT STORM (DECLASSIFIED) </operation_codeword> </operation_identification_data> <message_identification setid="MSGID">

<message_text_format_identifier> ATOCONF </message_text_format_identifier> <originator> USCENTAF_CMBT PLANS </originator> </message_identification> <effective_day_time_period setid="PERID"> … </effective_day_time_period>

7

Ato.xml (Cont)

<air_tasking_group> <air_tasking setid="AIRTASK"> UNIT TASKING </air_tasking> <task_unit_loc_gp>

<task_unit_and_location setid="TASKUNIT"> <tasked_unit_designator> C41 </tasked_unit_designator> </task_unit_and_location> <air_mission_data_group> <air_mission_data setid="MSNDAT">

<mission_number> 7353 </mission_number>

<package_identification> ZZN …

8

Ato.xml (Cont)

<air_mission_data_subgroup> <iff_sif_mode_and_code> 32053 </iff_sif_mode_and_code> </air_mission_data_subgroup> </air_mission_data> <mission_location setid="MSNLOC"> <mission_start_day_time> 240100Z </mission_start_day_time>

<mission_location_subgroup> <point_name> DAGGER </point_name>

</mission_location_subgroup> </mission_location> <amplification setid="AMPN"> <free_text> REMARK IDENTIFIER(S): G …

9

Ato.dtd (Generated From XML File)

<!ELEMENT air_to_cofn (operation_identification_data, message_identification, effective_day_time_period, air_tasking_group)>

<!ELEMENT operation_identification_data (operation_codeword)>

<!ELEMENT operation_codeword (#PCDATA)><!ELEMENT message_identification

(message_text_format_identifier, originator)><!ATTLIST message_identification setid CDATA #REQUIRED><!ELEMENT originator (#PCDATA)><!ELEMENT message_text_format_identifier (#PCDATA)><!ELEMENT air_tasking_group (air_tasking,

task_unit_loc_gp+)><!ELEMENT air_tasking (#PCDATA)><!ELEMENT task_unit_loc_gp (task_unit_and_location,

air_mission_data_group+)>

10

Ato.dtd (Cont)

<!ELEMENT air_mission_data_group (air_mission_data+ | mission_location+ | target_location+ | recon_mis_info_gp+ | refueling_information+ | air_ref_info+ | amplification+ | narrative_information+ | remarks+)+>

<!ELEMENT air_mission_data (mission_number, package_identification, aircraft_call_sign, num_type_aircraft_model, mission_type, alert_status, primary_configuration_code, secondary_configuration_code, air_mission_data_subgroup+)>

<!ELEMENT mission_number (#PCDATA)> …<!ELEMENT mission_location (mission_start_day_time,

mission_stop_day_time, mission_location_name, altitude_in_hundreds_of_feet?, air_support_request_number?, mission_location_subgroup?)>

11

Ato.dtd (Cont)

<!ELEMENT amplification (free_text)><!ATTLIST amplification setid CDATA #REQUIRED><!ELEMENT target_location (on_target_day_time,

off_target_day_time, target_identifier_be_number?, mission_location_name?, target_type, dmpi_lat_long_seconds?, dmpi_lat_long_minutes?, air_support_request_number?, dmpi_lat_long_deciseconds?, target_comments?, air_support_request_number?)>

<!ELEMENT recon_mis_info (request_number, mission_priority, on_target_day_time, latest_date_time, recon_mis_type, type_recon_sur_coverage, imagery_type, image_qualifier, coverage_extent_and_mode, recon_target_code, scale_of_prints, recon_mis_info_subgroup)> …

12

XPRESS

XPRESS (Xml Processing and Relaxation in rElational Storage System) project @ UCLA/CSD Transfers XML file with DTD to relational tables

Efficiently: handles a large number of XML documents

Accurately: retains structure & constraint Automatically: does not require human

intervention The sample Ato XML file is 366KB, translating into 15

tables and ~900 tuples. Generated DTD is 6KB Xpress has been tested with over 60MB of XML data,

translating into over half a million tuples

13

Steps

Schema mapping: CREATE TABLE statements are generated by XPRESS from the DTD

An intermediate Map object is created to provide XPRESS data mapping information

Data mapping: Using the Map object, XPRESS transfers the XML file into the database

14

Generated Tables for Ato Data

air_to_cofn (1) task_unit_loc_gp (24)

• air_mission_data_group (129)– target_location– narrative_information– amplification– remarks– air_mission_data

» air_mission_data_subgroup– recon_mis_info_gp

» location_trace_plot– refueling_information– air_ref_info

» air_ref_info_subgroup– mission_location

15

Query Relaxation for XML File

When no answer or not enough answers are found, derive approximate answers by relaxing the query condition TAH: knowledge structure to aid relaxation TAHs are generated from translated relational

tables

16

CoBase Architecture

17

Example Query 1

Find the id of any task unit location group that has an air refueling subgroup with 25,000 pounds of offload fuel. Also list the amount offload fuel

SELECT DISTINCT TASK_UNIT_LOC_GP.ID, AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB

FROM AIR_REF_INFO_SUBGROUP, AIR_REF_INFO, AIR_MISSION_DATA_GROUP, TASK_UNIT_LOC_GP

WHERE FK_AIR_REF_INFO = AIR_REF_INFO.ID

AND FK_AIR_MISSION_DATA_GROUP = AIR_MISSION_DATA_GROUP.ID

AND FK_TASK_UNIT_LOC_GP = TASK_UNIT_LOC_GP.ID

AND AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB = 25

Answer ID TOTAL_OFFLOAD_FUEL_THOU_LB

-- --------------------------

2524116 25

18

Relax Query 1

Relax previous query to an offload fuel of approximately 25,000 lbs, and at least 5 answers

SELECT DISTINCT TASK_UNIT_LOC_GP.ID, AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB…AND AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB = ~25

AT-LEAST 5

Answer ID TOTAL_OFFLOAD_FUEL_THOU_LB --------------- -------------------------- 2524048 10 2524048 20 2524048 26 2524116 14 2524116 20 2524116 24 2524116 25 --------------- -------------------------- # of answer tuples: 7

19

TAH total_offload_fuel_thou_lb

20

Example Query 2

Find the call sign of all aircraft that has a 10 minute alert status. Also list its tasked unit designator and alert status

SELECT AIRCRAFT_CALL_SIGN, TASKED_UNIT_DESIGNATOR, ALERT_STATUSFROM AIR_MISSION_DATA, AIR_MISSION_DATA_GROUP, TASK_UNIT_LOC_GPWHERE FK_AIR_MISSION_DATA_GROUP = AIR_MISSION_DATA_GROUP.IDAND FK_TASK_UNIT_LOC_GP = TASK_UNIT_LOC_GP.IDAND AIR_MISSION_DATA.ALERT_STATUS = 10

AnswerCALL_SIGN DESIGNATOR ALERT_STATUS

--------- ---------- ------------ VOLVO 51 4TFW 10

21

Relax Query 2

Relax previous query to an alert status of approximately 10 minutes, and at least 3 answers

SELECT AIRCRAFT_CALL_SIGN, TASKED_UNIT_DESIGNATOR, ALERT_STATUS

…AND AIR_MISSION_DATA.ALERT_STATUS = ~10

AT-LEAST 3 Answer CALL_SIGN DESIGNATOR ALERT_STATUS

--------------- --------------- --------------- VOLVO 51 4TFW 10

ARCO 05 1TFW 5 ARCO 03 1TFW 5 STANDARD 63 33TFW 5 STANDARD 61 33TFW 5 ARCO 01 1TFW 5 --------------- --------------- --------------- # of answer tuples: 6

22

TAH alert_status

23

Example Query 3

Find the mission location at an altitude of 22,000 ft. Also list any remarks on amplification

SELECT MISSION_LOCATION_NAME, ALTITUDE_IN_HUNDREDS_OF_FEET, FREE_TEXTFROM MISSION_LOCATION, AMPLIFICATIONWHERE MISSION_LOCATION.FK_AIR_MISSION_DATA_GROUP = AMPLIFICATION.FK_AIR_MISSION_DATA_GROUPAND ALTITUDE_IN_HUNDREDS_OF_FEET = 220

AnswerMISSION_LOC_NAME ALTITUDE (HUND FT) FREE_TEXT---------------- ------------------ ----------no rows selected

24

Relax Query 3

Relax previous query to an altitude of approximately 22,000 ft, and at least 7 answers

SELECT MISSION_LOCATION_NAME, ALTITUDE_IN_HUNDREDS_OF_FEET, FREE_TEXT…AND ALTITUDE_IN_HUNDREDS_OF_FEET = ~220AT-LEAST 7

Answer MISSION_LOC_NAME ALTITUDE (HUND FT) FREE_TEXT

---------------- ------------------ ---------- ABCCC 4 190 IDENTIFIER(S): E F H

ABCCC 5 200 IDENTIFIER(S): A C F G ABCCC 5 200 IDENTIFIER(S): C F G

CADDY HIGH 210 IDENTIFIER(S): A E RAISIN PRE 190 IDENTIFIER(S): A W F RAISIN PRE 200 IDENTIFIER(S): A W F RAISIN PRE 210 IDENTIFIER(S): A W F

25

TAH altitude

26

Future Directions

More XML test data Current query relaxation is performed by

CoBase with CoSQL Future research direction

Investigating query relaxation via CoXQL Generating non-numerical TAHs from semi-

structured data

27

Appendix – Generated Tables

28

air_to_cofn

CREATE TABLE air_to_cofn (

originator VARCHAR(100),

to_day_time VARCHAR(100),

id NUMBER,

operation_codeword VARCHAR(100),

flddesc VARCHAR(100),

root_elm VARCHAR(100),

from_day_time VARCHAR(100),

air_tasking VARCHAR(100), message_text_format_identifier VARCHAR(100),

setid VARCHAR(100),

PRIMARY KEY (id)

);

29

task_unit_loc_gp

CREATE TABLE task_unit_loc_gp (

fk_air_to_cofn NUMBER, tasked_unit_designator VARCHAR(100),

id NUMBER,

setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100),

FOREIGN KEY (fk_air_to_cofn) REFERENCES air_to_cofn(id), PRIMARY KEY (id)

);

30

air_mission_data_group

CREATE TABLE air_mission_data_group (

fk_task_unit_loc_gp NUMBER,

id NUMBER,

root_elm VARCHAR(100),

FOREIGN KEY (fk_task_unit_loc_gp) REFERENCES task_unit_loc_gp(id),

PRIMARY KEY (id)

);

31

target_location

CREATE TABLE target_location (on_target_day_time VARCHAR(100), id NUMBER, air_support_request_number VARCHAR(100), fk_air_mission_data_group NUMBER, dmpi_lat_long_minutes VARCHAR(100), target_type VARCHAR(100), target_identifier_be_nmber VARCHAR(100), root_elm VARCHAR(100), off_target_day_time VARCHAR(100), mission_location_name VARCHAR(100), target_comments VARCHAR(100), dmpi_lat_long_seconds VARCHAR(100), dmpi_lat_long_deciseconds VARCHAR(100), setid VARCHAR(100), FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));

32

narrative_information

CREATE TABLE narrative_information (

free_text VARCHAR(100), fk_air_mission_data_group NUMBER,

setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100),

id NUMBER,

FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id),

PRIMARY KEY (id)

);

33

amplification

CREATE TABLE amplification (

free_text VARCHAR(100), fk_air_mission_data_group NUMBER,

setid VARCHAR(100),

root_elm VARCHAR(100),

id NUMBER,

FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id),

PRIMARY KEY (id)

);

34

remarks

CREATE TABLE remarks (

free_text VARCHAR(100), fk_air_mission_data_group NUMBER,

setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100),

id NUMBER,

FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id),

PRIMARY KEY (id)

);

35

air_mission_data

CREATE TABLE air_mission_data (mission_number VARCHAR(100), primary_configuration_code VARCHAR(100), id NUMBER, fk_air_mission_data_group NUMBER, package_identification VARCHAR(100), num_type_aircraft_model VARCHAR(100), mission_type VARCHAR(100), root_elm VARCHAR(100), alert_status NUMBER, aircraft_call_sign VARCHAR(100), secondary_configuration_code VARCHAR(100), setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));

36

recon_mis_info_gp

CREATE TABLE recon_mis_info_gp (coverage_extent_and_mode VARCHAR(100), on_target_day_time VARCHAR(100), id NUMBER, imagery_type VARCHAR(100), fk_air_mission_data_group NUMBER, flddesc VARCHAR(100), delivery_address VARCHAR(100), type_recon_sur_coverage VARCHAR(100), recon_target_code VARCHAR(100), scale_of_prints NUMBER, root_elm VARCHAR(100), image_qualifier VARCHAR(100), latest_date_time VARCHAR(100), mission_priority NUMBER, request_number VARCHAR(100), recon_mis_type VARCHAR(100), setid VARCHAR(100), FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));

37

location_trace_plot

CREATE TABLE location_trace_plot (

location_lat_long_seconds VARCHAR(100), fk_recon_mis_info_gp NUMBER,

setid VARCHAR(100),

root_elm VARCHAR(100),

id NUMBER,

FOREIGN KEY (fk_recon_mis_info_gp) REFERENCES recon_mis_info_gp(id),

PRIMARY KEY (id)

);

38

refueling_information

CREATE TABLE refueling_information (id NUMBER, fk_air_mission_data_group NUMBER, total_offload_fuel_thou_lb NUMBER, primary_frequency_designator VARCHAR(100), flddesc VARCHAR(100), root_elm VARCHAR(100), air_ref_cont_pn VARCHAR(100), air_refueling_control_day_time VARCHAR(100), tanker_mission_number VARCHAR(100), primary_frequency_in_megahertz NUMBER, tanker_call_sign VARCHAR(100), altitude_in_hundreds_of_feet NUMBER, setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id)

);

39

air_mission_data_subgroup

CREATE TABLE air_mission_data_subgroup (

iff_sif_mode_and_code NUMBER,

fk_air_mission_data NUMBER,

root_elm VARCHAR(100),

id NUMBER,

FOREIGN KEY (fk_air_mission_data) REFERENCES air_mission_data(id),

PRIMARY KEY (id)

);

40

air_ref_info

CREATE TABLE air_ref_info (

fk_air_mission_data_group NUMBER,

id NUMBER,

setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100),

FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id),

PRIMARY KEY (id)

);

41

air_ref_info_subgroup

CREATE TABLE air_ref_info_subgroup (mission_number VARCHAR(100), id NUMBER, tanker_assignment VARCHAR(100), refueling_fuel_type VARCHAR(100), total_offload_fuel_thou_lb NUMBER, num_type_aircraft_model VARCHAR(100), fk_air_ref_info NUMBER, root_elm VARCHAR(100), air_refueling_control_day_time VARCHAR(100), aircraft_call_sign VARCHAR(100), FOREIGN KEY (fk_air_ref_info) REFERENCES air_ref_info(id), PRIMARY KEY (id)

);

42

mission_location

CREATE TABLE mission_location (

mission_stop_day_time VARCHAR(100), area_coor_lat_long_min VARCHAR(100),

id NUMBER, air_support_request_number VARCHAR(100), fk_air_mission_data_group NUMBER,

point_name VARCHAR(100),

flddesc VARCHAR(100),

root_elm VARCHAR(100), mission_location_name VARCHAR(100), mission_start_day_time VARCHAR(100), altitude_in_hundreds_of_feet NUMBER,

setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));