1 relaxation of xml documents wesley w. chu computer science department ucla [email protected] september...
Post on 22-Dec-2015
214 views
TRANSCRIPT
1
Relaxation of XML Documents
Wesley W. Chu
Computer Science Department
UCLA
September 19, 2000
2
Outline
Goals Transformation of XML to Relational Tables Relaxation Example queries Future directions
3
Goals
Given an XML file as sample data, map it into relational database
Generate TAHs on attributes from relational tables
Execute queries and perform relaxation on sample data
4
Architecture
SQL
XMLdoc
XMLdocXML
doc
1
RDB
TAH
query processingquery relaxation
23
RelaxedAnswers
XML Spy DTD
XPRESS
extract DTD from XML file
schema mapdata map
4 generate TAHs5
5
Extracting DTD From XML
Translation of XML to relational tables requires a DTD that provides schema for XML file
DTD for XML file can be extracted from the file using XML Spy tool
6
Ato.xml (provided by Stuart Hirshfield of AFRL [email protected])
<air_to_cofn> <operation_identification_data setid="OPER"> <operation_codeword> DESERT STORM (DECLASSIFIED) </operation_codeword> </operation_identification_data> <message_identification setid="MSGID">
<message_text_format_identifier> ATOCONF </message_text_format_identifier> <originator> USCENTAF_CMBT PLANS </originator> </message_identification> <effective_day_time_period setid="PERID"> … </effective_day_time_period>
7
Ato.xml (Cont)
<air_tasking_group> <air_tasking setid="AIRTASK"> UNIT TASKING </air_tasking> <task_unit_loc_gp>
<task_unit_and_location setid="TASKUNIT"> <tasked_unit_designator> C41 </tasked_unit_designator> </task_unit_and_location> <air_mission_data_group> <air_mission_data setid="MSNDAT">
<mission_number> 7353 </mission_number>
<package_identification> ZZN …
8
Ato.xml (Cont)
<air_mission_data_subgroup> <iff_sif_mode_and_code> 32053 </iff_sif_mode_and_code> </air_mission_data_subgroup> </air_mission_data> <mission_location setid="MSNLOC"> <mission_start_day_time> 240100Z </mission_start_day_time>
<mission_location_subgroup> <point_name> DAGGER </point_name>
</mission_location_subgroup> </mission_location> <amplification setid="AMPN"> <free_text> REMARK IDENTIFIER(S): G …
9
Ato.dtd (Generated From XML File)
<!ELEMENT air_to_cofn (operation_identification_data, message_identification, effective_day_time_period, air_tasking_group)>
<!ELEMENT operation_identification_data (operation_codeword)>
<!ELEMENT operation_codeword (#PCDATA)><!ELEMENT message_identification
(message_text_format_identifier, originator)><!ATTLIST message_identification setid CDATA #REQUIRED><!ELEMENT originator (#PCDATA)><!ELEMENT message_text_format_identifier (#PCDATA)><!ELEMENT air_tasking_group (air_tasking,
task_unit_loc_gp+)><!ELEMENT air_tasking (#PCDATA)><!ELEMENT task_unit_loc_gp (task_unit_and_location,
air_mission_data_group+)>
10
Ato.dtd (Cont)
<!ELEMENT air_mission_data_group (air_mission_data+ | mission_location+ | target_location+ | recon_mis_info_gp+ | refueling_information+ | air_ref_info+ | amplification+ | narrative_information+ | remarks+)+>
<!ELEMENT air_mission_data (mission_number, package_identification, aircraft_call_sign, num_type_aircraft_model, mission_type, alert_status, primary_configuration_code, secondary_configuration_code, air_mission_data_subgroup+)>
<!ELEMENT mission_number (#PCDATA)> …<!ELEMENT mission_location (mission_start_day_time,
mission_stop_day_time, mission_location_name, altitude_in_hundreds_of_feet?, air_support_request_number?, mission_location_subgroup?)>
11
Ato.dtd (Cont)
<!ELEMENT amplification (free_text)><!ATTLIST amplification setid CDATA #REQUIRED><!ELEMENT target_location (on_target_day_time,
off_target_day_time, target_identifier_be_number?, mission_location_name?, target_type, dmpi_lat_long_seconds?, dmpi_lat_long_minutes?, air_support_request_number?, dmpi_lat_long_deciseconds?, target_comments?, air_support_request_number?)>
<!ELEMENT recon_mis_info (request_number, mission_priority, on_target_day_time, latest_date_time, recon_mis_type, type_recon_sur_coverage, imagery_type, image_qualifier, coverage_extent_and_mode, recon_target_code, scale_of_prints, recon_mis_info_subgroup)> …
12
XPRESS
XPRESS (Xml Processing and Relaxation in rElational Storage System) project @ UCLA/CSD Transfers XML file with DTD to relational tables
Efficiently: handles a large number of XML documents
Accurately: retains structure & constraint Automatically: does not require human
intervention The sample Ato XML file is 366KB, translating into 15
tables and ~900 tuples. Generated DTD is 6KB Xpress has been tested with over 60MB of XML data,
translating into over half a million tuples
13
Steps
Schema mapping: CREATE TABLE statements are generated by XPRESS from the DTD
An intermediate Map object is created to provide XPRESS data mapping information
Data mapping: Using the Map object, XPRESS transfers the XML file into the database
14
Generated Tables for Ato Data
air_to_cofn (1) task_unit_loc_gp (24)
• air_mission_data_group (129)– target_location– narrative_information– amplification– remarks– air_mission_data
» air_mission_data_subgroup– recon_mis_info_gp
» location_trace_plot– refueling_information– air_ref_info
» air_ref_info_subgroup– mission_location
15
Query Relaxation for XML File
When no answer or not enough answers are found, derive approximate answers by relaxing the query condition TAH: knowledge structure to aid relaxation TAHs are generated from translated relational
tables
17
Example Query 1
Find the id of any task unit location group that has an air refueling subgroup with 25,000 pounds of offload fuel. Also list the amount offload fuel
SELECT DISTINCT TASK_UNIT_LOC_GP.ID, AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB
FROM AIR_REF_INFO_SUBGROUP, AIR_REF_INFO, AIR_MISSION_DATA_GROUP, TASK_UNIT_LOC_GP
WHERE FK_AIR_REF_INFO = AIR_REF_INFO.ID
AND FK_AIR_MISSION_DATA_GROUP = AIR_MISSION_DATA_GROUP.ID
AND FK_TASK_UNIT_LOC_GP = TASK_UNIT_LOC_GP.ID
AND AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB = 25
Answer ID TOTAL_OFFLOAD_FUEL_THOU_LB
-- --------------------------
2524116 25
18
Relax Query 1
Relax previous query to an offload fuel of approximately 25,000 lbs, and at least 5 answers
SELECT DISTINCT TASK_UNIT_LOC_GP.ID, AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB…AND AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB = ~25
AT-LEAST 5
Answer ID TOTAL_OFFLOAD_FUEL_THOU_LB --------------- -------------------------- 2524048 10 2524048 20 2524048 26 2524116 14 2524116 20 2524116 24 2524116 25 --------------- -------------------------- # of answer tuples: 7
20
Example Query 2
Find the call sign of all aircraft that has a 10 minute alert status. Also list its tasked unit designator and alert status
SELECT AIRCRAFT_CALL_SIGN, TASKED_UNIT_DESIGNATOR, ALERT_STATUSFROM AIR_MISSION_DATA, AIR_MISSION_DATA_GROUP, TASK_UNIT_LOC_GPWHERE FK_AIR_MISSION_DATA_GROUP = AIR_MISSION_DATA_GROUP.IDAND FK_TASK_UNIT_LOC_GP = TASK_UNIT_LOC_GP.IDAND AIR_MISSION_DATA.ALERT_STATUS = 10
AnswerCALL_SIGN DESIGNATOR ALERT_STATUS
--------- ---------- ------------ VOLVO 51 4TFW 10
21
Relax Query 2
Relax previous query to an alert status of approximately 10 minutes, and at least 3 answers
SELECT AIRCRAFT_CALL_SIGN, TASKED_UNIT_DESIGNATOR, ALERT_STATUS
…AND AIR_MISSION_DATA.ALERT_STATUS = ~10
AT-LEAST 3 Answer CALL_SIGN DESIGNATOR ALERT_STATUS
--------------- --------------- --------------- VOLVO 51 4TFW 10
ARCO 05 1TFW 5 ARCO 03 1TFW 5 STANDARD 63 33TFW 5 STANDARD 61 33TFW 5 ARCO 01 1TFW 5 --------------- --------------- --------------- # of answer tuples: 6
23
Example Query 3
Find the mission location at an altitude of 22,000 ft. Also list any remarks on amplification
SELECT MISSION_LOCATION_NAME, ALTITUDE_IN_HUNDREDS_OF_FEET, FREE_TEXTFROM MISSION_LOCATION, AMPLIFICATIONWHERE MISSION_LOCATION.FK_AIR_MISSION_DATA_GROUP = AMPLIFICATION.FK_AIR_MISSION_DATA_GROUPAND ALTITUDE_IN_HUNDREDS_OF_FEET = 220
AnswerMISSION_LOC_NAME ALTITUDE (HUND FT) FREE_TEXT---------------- ------------------ ----------no rows selected
24
Relax Query 3
Relax previous query to an altitude of approximately 22,000 ft, and at least 7 answers
SELECT MISSION_LOCATION_NAME, ALTITUDE_IN_HUNDREDS_OF_FEET, FREE_TEXT…AND ALTITUDE_IN_HUNDREDS_OF_FEET = ~220AT-LEAST 7
Answer MISSION_LOC_NAME ALTITUDE (HUND FT) FREE_TEXT
---------------- ------------------ ---------- ABCCC 4 190 IDENTIFIER(S): E F H
ABCCC 5 200 IDENTIFIER(S): A C F G ABCCC 5 200 IDENTIFIER(S): C F G
CADDY HIGH 210 IDENTIFIER(S): A E RAISIN PRE 190 IDENTIFIER(S): A W F RAISIN PRE 200 IDENTIFIER(S): A W F RAISIN PRE 210 IDENTIFIER(S): A W F
26
Future Directions
More XML test data Current query relaxation is performed by
CoBase with CoSQL Future research direction
Investigating query relaxation via CoXQL Generating non-numerical TAHs from semi-
structured data
28
air_to_cofn
CREATE TABLE air_to_cofn (
originator VARCHAR(100),
to_day_time VARCHAR(100),
id NUMBER,
operation_codeword VARCHAR(100),
flddesc VARCHAR(100),
root_elm VARCHAR(100),
from_day_time VARCHAR(100),
air_tasking VARCHAR(100), message_text_format_identifier VARCHAR(100),
setid VARCHAR(100),
PRIMARY KEY (id)
);
29
task_unit_loc_gp
CREATE TABLE task_unit_loc_gp (
fk_air_to_cofn NUMBER, tasked_unit_designator VARCHAR(100),
id NUMBER,
setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100),
FOREIGN KEY (fk_air_to_cofn) REFERENCES air_to_cofn(id), PRIMARY KEY (id)
);
30
air_mission_data_group
CREATE TABLE air_mission_data_group (
fk_task_unit_loc_gp NUMBER,
id NUMBER,
root_elm VARCHAR(100),
FOREIGN KEY (fk_task_unit_loc_gp) REFERENCES task_unit_loc_gp(id),
PRIMARY KEY (id)
);
31
target_location
CREATE TABLE target_location (on_target_day_time VARCHAR(100), id NUMBER, air_support_request_number VARCHAR(100), fk_air_mission_data_group NUMBER, dmpi_lat_long_minutes VARCHAR(100), target_type VARCHAR(100), target_identifier_be_nmber VARCHAR(100), root_elm VARCHAR(100), off_target_day_time VARCHAR(100), mission_location_name VARCHAR(100), target_comments VARCHAR(100), dmpi_lat_long_seconds VARCHAR(100), dmpi_lat_long_deciseconds VARCHAR(100), setid VARCHAR(100), FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));
32
narrative_information
CREATE TABLE narrative_information (
free_text VARCHAR(100), fk_air_mission_data_group NUMBER,
setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100),
id NUMBER,
FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id),
PRIMARY KEY (id)
);
33
amplification
CREATE TABLE amplification (
free_text VARCHAR(100), fk_air_mission_data_group NUMBER,
setid VARCHAR(100),
root_elm VARCHAR(100),
id NUMBER,
FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id),
PRIMARY KEY (id)
);
34
remarks
CREATE TABLE remarks (
free_text VARCHAR(100), fk_air_mission_data_group NUMBER,
setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100),
id NUMBER,
FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id),
PRIMARY KEY (id)
);
35
air_mission_data
CREATE TABLE air_mission_data (mission_number VARCHAR(100), primary_configuration_code VARCHAR(100), id NUMBER, fk_air_mission_data_group NUMBER, package_identification VARCHAR(100), num_type_aircraft_model VARCHAR(100), mission_type VARCHAR(100), root_elm VARCHAR(100), alert_status NUMBER, aircraft_call_sign VARCHAR(100), secondary_configuration_code VARCHAR(100), setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));
36
recon_mis_info_gp
CREATE TABLE recon_mis_info_gp (coverage_extent_and_mode VARCHAR(100), on_target_day_time VARCHAR(100), id NUMBER, imagery_type VARCHAR(100), fk_air_mission_data_group NUMBER, flddesc VARCHAR(100), delivery_address VARCHAR(100), type_recon_sur_coverage VARCHAR(100), recon_target_code VARCHAR(100), scale_of_prints NUMBER, root_elm VARCHAR(100), image_qualifier VARCHAR(100), latest_date_time VARCHAR(100), mission_priority NUMBER, request_number VARCHAR(100), recon_mis_type VARCHAR(100), setid VARCHAR(100), FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));
37
location_trace_plot
CREATE TABLE location_trace_plot (
location_lat_long_seconds VARCHAR(100), fk_recon_mis_info_gp NUMBER,
setid VARCHAR(100),
root_elm VARCHAR(100),
id NUMBER,
FOREIGN KEY (fk_recon_mis_info_gp) REFERENCES recon_mis_info_gp(id),
PRIMARY KEY (id)
);
38
refueling_information
CREATE TABLE refueling_information (id NUMBER, fk_air_mission_data_group NUMBER, total_offload_fuel_thou_lb NUMBER, primary_frequency_designator VARCHAR(100), flddesc VARCHAR(100), root_elm VARCHAR(100), air_ref_cont_pn VARCHAR(100), air_refueling_control_day_time VARCHAR(100), tanker_mission_number VARCHAR(100), primary_frequency_in_megahertz NUMBER, tanker_call_sign VARCHAR(100), altitude_in_hundreds_of_feet NUMBER, setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id)
);
39
air_mission_data_subgroup
CREATE TABLE air_mission_data_subgroup (
iff_sif_mode_and_code NUMBER,
fk_air_mission_data NUMBER,
root_elm VARCHAR(100),
id NUMBER,
FOREIGN KEY (fk_air_mission_data) REFERENCES air_mission_data(id),
PRIMARY KEY (id)
);
40
air_ref_info
CREATE TABLE air_ref_info (
fk_air_mission_data_group NUMBER,
id NUMBER,
setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100),
FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id),
PRIMARY KEY (id)
);
41
air_ref_info_subgroup
CREATE TABLE air_ref_info_subgroup (mission_number VARCHAR(100), id NUMBER, tanker_assignment VARCHAR(100), refueling_fuel_type VARCHAR(100), total_offload_fuel_thou_lb NUMBER, num_type_aircraft_model VARCHAR(100), fk_air_ref_info NUMBER, root_elm VARCHAR(100), air_refueling_control_day_time VARCHAR(100), aircraft_call_sign VARCHAR(100), FOREIGN KEY (fk_air_ref_info) REFERENCES air_ref_info(id), PRIMARY KEY (id)
);
42
mission_location
CREATE TABLE mission_location (
mission_stop_day_time VARCHAR(100), area_coor_lat_long_min VARCHAR(100),
id NUMBER, air_support_request_number VARCHAR(100), fk_air_mission_data_group NUMBER,
point_name VARCHAR(100),
flddesc VARCHAR(100),
root_elm VARCHAR(100), mission_location_name VARCHAR(100), mission_start_day_time VARCHAR(100), altitude_in_hundreds_of_feet NUMBER,
setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));