Goal 2, Activities 4, 6, 7. Letizia Tanca, Politecnico di Milano

Post on 26-Mar-2015

Page 1: Goal 2 Activities 4, 6, 7 Letizia Tanca Politecnico di Milano

Goal 2
Activities 4, 6, 7

Letizia Tanca

Politecnico di Milano

Page 2:

Goal 2: Knowledge Management (Polimi)

Activity 4: Knowledge extraction from natural language actions (Polimi + IBM + Bari)

Activity 6: Knowledge extraction, modeling and integration from semi-structured information sources, driven by domain ontologies (PoliMI)

Activity 7: Knowledge fusion, “tailoring” and dissemination for business model redesign (PoliMI)

Page 3:

Context-aware Web Portal

Page 4:

Contextual data analysis

At GialloRosso the oenologist and the agronomist interact with the data related to harvesting and to the wine ageing
– The information they interact with depends on their role and on the workflow phase
– The agronomist inserts information related to the nature of the natural phenomena
– The agronomist and the oenologist ask for information related to the phase

At BiancoRosso the sales manager:
– analyzes sales data
– at a different moment analyzes the market trends, then
– reads similar information in natural language from the web

GialloRosso performs market analyses by accessing its own information combined with market information collected by its ally BiancoRosso

Page 5:

GialloRosso Logical Schema

Page 6:

BiancoRosso Logical Schema

VINO(ID_Vino, nome, vinificazione, invecchiamento, denominazione, temperatura, min_temp, note)

EVENTO(ID_Evento, nome, tipo, data, luogo)

TRENDSETTER(ID_Trend, nome, professione)

FONTE(ID_Fonte, nome, uri, tipo, rilevanza, provenienza, descrizione)

DOCUMENTO(ID_Doc, riassunto, url, data, autore, titolo, argomento, descrittore, ID_Fonte)

VALUTAZIONEMERITO(ID_valutazione, descrizione, giudizio, lingua)

RISULTATORICERCA(ID_risultato, ID_vino, ID_evento, ID_trend, ID_fonte, ID_doc, posizione, ID_valutazione)

Page 7:

Context-aware data tailoring

Page 8:

Data tailoring via view composition

Page 9:

Context Dimension Tree

Page 10:

Some relevant areas

Page 11:

AT GIALLOROSSO THE OENOLOGIST AND THE AGRONOMIST INTERACT WITH THE DATA RELATED TO CULTIVATION AND TO THE CELLAR

Page 12:

A PORTION OF THE CDT OF OUR SCENARIO

[Figure: portion of the Context Dimension Tree; surviving node label: oenologist]

Page 13:

Some contextual views

C1 = <role=agronomist, *, phase=harvesting>
C2 = <role=agronomist, *, phase=ageing>
C3 = <role=oenologist, *, phase=harvesting>
C4 = <role=oenologist, *, phase=ageing>

Page 14:

Some contextual queries

The agronomist during the harvesting phase (context C1) wants to collect all the available information coming from sensors:

SELECT m.date_time, m.value, s.s_id, s.meas_unit
FROM sensor s, measure_data m
WHERE s.s_id = m.s_id;

S/he obtains only the information from sensors placed in the vineyards (see Rel(C1))
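The effect of Rel(C1) on the query above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the base table names, the "location" column and the sample rows are hypothetical, and the slides do not say that contextual views are implemented this way. The slide's query is run unchanged (apart from an ORDER BY added for stable output) against views that expose only vineyard sensors.

```python
import sqlite3

# Hypothetical base tables: the "location" column and all sample rows are
# assumptions made for illustration, not part of the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base_sensor (s_id INTEGER PRIMARY KEY, meas_unit TEXT, location TEXT);
CREATE TABLE base_measure_data (s_id INTEGER, date_time TEXT, value REAL);
INSERT INTO base_sensor VALUES
  (1, 'C', 'vineyard'), (2, '%', 'vineyard'), (3, 'C', 'cellar');
INSERT INTO base_measure_data VALUES
  (1, 'day1 10:00', 24.5), (2, 'day1 10:00', 61.0), (3, 'day1 10:00', 14.0);

-- Assumed Rel(C1): views named like the relations the user queries,
-- exposing only sensors placed in the vineyards and their measures.
CREATE VIEW sensor AS
  SELECT s_id, meas_unit FROM base_sensor WHERE location = 'vineyard';
CREATE VIEW measure_data AS
  SELECT m.s_id, m.date_time, m.value
  FROM base_measure_data m JOIN sensor s ON s.s_id = m.s_id;
""")

# The agronomist's query from the slide, with ORDER BY added for determinism.
query = """
SELECT m.date_time, m.value, s.s_id, s.meas_unit
FROM sensor s, measure_data m
WHERE s.s_id = m.s_id
ORDER BY s.s_id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The cellar sensor (s_id 3) never appears in the result, because the query only sees the tailored views.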

Page 15:

Some contextual queries

The oenologist during the harvesting phase (context C3) wants to collect all the available information about bottles of “Aglianico” wine:

SELECT * FROM bottle b WHERE b.appellation="aglianico";

But the query is out of context: in context C3 only information about vineyards and grapevines is available to the oenologist.

Page 16:

Some more contextual queries

The previous query makes sense in context C4, where the oenologist is in the ageing phase:

SELECT * FROM bottle b WHERE b.appellation="aglianico";

It produces a non-empty result.
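One way to picture the in-context / out-of-context distinction is a check of the relations a query mentions against the relations visible in a context. The sketch below is hypothetical: the REL mapping is an assumed stand-in for Rel(C3) and Rel(C4), and the table-name extraction is deliberately rough (it only catches the first name after FROM or JOIN).

```python
import re

# Assumed relation sets per context; only "bottle", "vineyard" and
# "grapevine" are suggested by the slides, the mapping itself is illustrative.
REL = {
    "C3": {"vineyard", "grapevine"},  # oenologist, harvesting
    "C4": {"bottle"},                 # oenologist, ageing
}

def tables_in(sql):
    """Rough extraction of the relation names that follow FROM or JOIN."""
    names = re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", sql, flags=re.I)
    return {name.lower() for name in names}

def in_context(sql, context):
    """True when every relation used by the query is visible in the context."""
    return tables_in(sql) <= REL[context]

query = 'SELECT * FROM bottle b WHERE b.appellation="aglianico";'
print(in_context(query, "C3"))  # the harvesting context does not expose bottle
print(in_context(query, "C4"))  # the ageing context does
```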

Page 17:

AT BIANCOROSSO:

1. THE SALES MANAGER ANALYZES SALES DATA
2. THE OENOLOGIST ANALYZES WINE FEATURES TO DESIGN A NEW WINE
3. THEN S/HE READS SIMILAR INFORMATION IN NATURAL LANGUAGE FROM THE WEB
4. ALSO INTENSIONAL QUERIES ARE PERFORMED

Page 18:

Sales and promotions planning (Q1)

Sales and promotions planning for events and festivals

The sales manager of BiancoRosso wants to select the wines to promote for each event or festival
– For each event or type of event he/she needs to identify the most related wines
– Interesting wines for each event can be obtained by analyzing frequent rules in the form
• EventType=value → Wine=value
• E.g., EventType=“Summer party” → Wine=“White wine” support=20%, confidence=36%
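For readers unfamiliar with the two rule measures, the following sketch computes support and confidence for a rule of the form EventType=value → Wine=value using the standard definitions. The records are invented sample data and do not reproduce the 20% / 36% figures on the slide; the function name is hypothetical.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of records matching both sides of the rule
    confidence = fraction of antecedent-matching records that also match
                 the consequent
    """
    matches_ante = [r for r in records if antecedent.items() <= r.items()]
    matches_both = [r for r in matches_ante if consequent.items() <= r.items()]
    support = len(matches_both) / len(records)
    confidence = len(matches_both) / len(matches_ante) if matches_ante else 0.0
    return support, confidence

# Invented sample: 10 records, 5 summer parties, 2 of them with white wine.
records = (
    [{"EventType": "Summer party", "Wine": "White wine"}] * 2
    + [{"EventType": "Summer party", "Wine": "Red wine"}] * 3
    + [{"EventType": "Trade fair", "Wine": "Red wine"}] * 5
)

support, confidence = rule_stats(
    records, {"EventType": "Summer party"}, {"Wine": "White wine"}
)
print(f"support={support:.0%}, confidence={confidence:.0%}")
```

The same function applies to the rules of Q2 to Q5 by changing the attribute names on either side.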

Page 19:

Sales and promotions planning (Q2)

Sales and promotions planning depending on time periods

The sales manager wants to plan specific promotions for each time period of the year
– For each time period (e.g., month) the manager needs to select the most related wines
– Interesting wines can be obtained by analyzing frequent rules in the form
• Month=value → Wine=value
• E.g., Month=“June” → Wine=“White wine” support=20%, confidence=36%

Page 20:

Design of wine (Q3)

Analysis of the main characteristics of wines

The oenologist of BiancoRosso wants to produce new wines

He/she needs to know the main characteristics of each wine to select the most interesting wines to produce
– He/she obtains the characteristics of each wine by exploiting rules in the form
• Wine=value → Characteristic=value
• E.g., Wine=“White wine” → Characteristic=“Mainly drunk in a specific time period” support=6%, confidence=100%

Page 21:

Design of wine (Q4)

Identification of correlations between wines and time periods

The time period in which each wine is mainly consumed is useful to select the wines to produce

For each wine the oenologist wants to obtain the time period (e.g., month) in which the wine is mainly consumed
– Allows selecting wines related to time periods not already covered by the wines currently produced by BiancoRosso
– He/she uses rules in the form
• Wine=value → Month=value
• E.g., Wine=“White wine” → Month=“June” support=20%, confidence=100%

Page 22:

Design of wine (Q5)

Identification of correlations between wines and information sources

Once the oenologist has selected the new wines to be produced, he/she needs to identify the sources containing documents related to the selected wines
– The oenologist identifies the sources containing information about the wines of his/her interest by exploiting the following rules
• Wine=value → Source=value
• E.g., Wine=“Montello e colli asolani cabernet superiore” → Source=“Gambero Rosso” support=11%, confidence=100%

Page 23:

DIESIRAE

A semantic search engine based on Natural Language Processing

Page 24:

Knowledge Management

Page 25:

Knowledge Indexing & Extraction: Goals

Domain model → Ontology (W3C OWL standard)
– Describes the concepts of the domain

Domain vocabulary → Semantic Network
– Describes the lemmas of the domain

Mapping model → Stochastic model
– 2nd-order HMM-inspired model
– Transition probabilities approximated by means of MaxEnt models
– Solves mapping ambiguities

Queries:
– Keyword-based (AND/OR; max probability/exhaustive)
– Phrase-based (Disambiguated Word queries and Ontological queries)

Page 26:

Knowledge indexing & extraction: Functionalities

Training
Indexing, querying, and extending

Page 27:

Knowledge indexing & extraction: Information Extraction Engine

Training
Indexing, querying, and extending

Linguistic Context Extractor:
– Calls linguistic tools (Stanford Parser, FreeLing, JavaRAP, …)
– words Wi → (lemmas Li, linguistic context information Ii)

MaxEnt Models:
– Calculate HMM transition probabilities (taking into account the linguistic context info)

Extended Viterbi:
– (Li, Ii) → concepts Ci

TF-IDF:
– Document ranking, based on concept frequencies
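The last step, ranking documents by concept frequencies, can be illustrated with a small TF-IDF sketch. The weighting variant used here (relative term frequency times log(N / document frequency)) is an assumption, since the slides do not specify one; the concept names and documents are invented, and the concepts are taken as already extracted by the earlier steps.

```python
import math
from collections import Counter

def tfidf_rank(docs, query_concepts):
    """Rank document ids by the summed TF-IDF weight of the query concepts.

    docs maps a document id to the list of concepts extracted from it.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each concept occurs.
    df = Counter()
    for concepts in docs.values():
        df.update(set(concepts))

    scores = {}
    for doc_id, concepts in docs.items():
        tf = Counter(concepts)
        total = len(concepts) or 1  # avoid division by zero on empty documents
        scores[doc_id] = sum(
            (tf[c] / total) * math.log(n_docs / df[c])
            for c in query_concepts
            if df[c]  # concepts absent from the collection contribute nothing
        )
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores

# Invented sample collection of already-extracted concepts.
docs = {
    "d1": ["Wine", "Tannin", "Tannin"],
    "d2": ["Wine", "Taste"],
    "d3": ["Event"],
}
ranking, scores = tfidf_rank(docs, ["Tannin"])
print(ranking[0], round(scores["d1"], 3))
```

Working on concepts rather than surface words means that two documents using different lemmas for the same concept are scored alike.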

Page 28:

Art deco Wine Domain Ontology

Page 29:

Keyword-based queries

Sequence of isolated words
– No linguistic structure

Exhaustive AND/OR keywords
– No concept disambiguation
– Searches for multiple tuples
– Examples:
• light wine → several meanings found…
• country wine → search for instances…
• taste wine → search for subclasses…

Max probability AND/OR keywords
– Searches for a single tuple
– Exploits the a-priori concept probabilities
– Example: [light wine] → max probability meaning
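The difference between the two keyword modes can be sketched as follows. The lexicon, the concept names and the probabilities are all hypothetical, and choosing the most probable concept independently for each keyword is a simplifying assumption, not necessarily how the engine combines a-priori probabilities.

```python
from itertools import product

# Hypothetical lexicon: keyword -> {candidate concept: a-priori probability}.
LEXICON = {
    "light": {"LightBody": 0.7, "LightColor": 0.3},
    "wine": {"Wine": 1.0},
}

def exhaustive(keywords):
    """All concept tuples: no disambiguation, every meaning is kept."""
    candidates = [sorted(LEXICON[k]) for k in keywords]
    return list(product(*candidates))

def max_probability(keywords):
    """A single tuple: the most probable concept for each keyword."""
    return tuple(max(LEXICON[k], key=LEXICON[k].get) for k in keywords)

print(exhaustive(["light", "wine"]))
print(max_probability(["light", "wine"]))
```

The exhaustive mode yields one search tuple per combination of meanings, while the max-probability mode searches for exactly one.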

Page 30:

Phrase-based queries

Phrase
– Linguistic structure
– Context-based disambiguation

Disambiguated Word queries
– Context used for concept disambiguation
• Index the phrase (→ extract concepts)
• Search for AND-ed concepts
– Example: (fruit taste) → disambiguates fruit

Ontological queries
– Context used to select the request to the ontology
• Index the sentences
• Select the request; search the ontology for the mapped concepts
– Example: “type of tannins in wine” → instance list

Page 31:

GIALLOROSSO PERFORMS MARKET ANALYSES BY ACCESSING ITS OWN INFORMATION COMBINED WITH MARKET INFORMATION COLLECTED BY ITS ALLY BIANCOROSSO

Page 32:

The Integration problem from the user point of view

[Figure labels: User; APPLICATION; query / answer; GLOBAL KNOWLEDGE INTERFACE; DATA SOURCE 1 (RDBMS); DATA SOURCE 2 (XML); DATA SOURCE 3 (WWW); DATA SOURCE 4 (Base station)]

Page 33:

Information integration in ART DECO

Page 34:

Knowledge retrieval from the sources

In order to integrate the two original sources, we define the following query to populate the ontology:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
SELECT ?w1 ?w2 ?wn1 ?wn2 ?wb ?bq ?dse ?dso ?sn
FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
WHERE {
  ?w1 rdf:type do:WineInFarm .
  ?wb rdf:type do:WineBottle .
  ?wb do:containsWine ?w1 .
  ?wb do:bottleQuantity ?bq .
  ?w1 do:appellationInFarm ?wn1 .
  ?w2 do:appellationInDocument ?wn2 .
  ?w2 rdf:type do:WineInDocument .
  ?dse rdf:type do:DocSearch .
  ?dso rdf:type do:DocSource .
  ?dse do:searchWineID ?w2 .
  ?dse do:searchSrcID ?dso .
  ?dso do:docSrcName ?sn .
}

Page 35:

Query 1

Quantity of bottles (in the GialloRosso DB) available for each wine cited by the web source “Percorsi di Vino” (stored in the BiancoRosso DB):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
SELECT ?wine_name sum(?bottle_quantity)
FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
WHERE {
  ?w1 rdf:type do:WineInFarm .
  ?wb rdf:type do:WineBottle .
  ?wb do:containsWine ?w1 .
  ?wb do:bottleQuantity ?bottle_quantity .
  ?w1 do:appellationInFarm ?wn1 .
  ?w2 do:appellationInDocument ?wine_name .
  ?w2 rdf:type do:WineInDocument .
  ?dse rdf:type do:DocSearch .
  ?dso rdf:type do:DocSource .
  ?dse do:searchWineID ?w2 .
  ?dse do:searchSrcID ?dso .
  ?dso do:docSrcName ?source_name .
  FILTER regex(?source_name, "PercorsiDiVino")
  FILTER fn:contains(?wine_name, ?wn1)
}
GROUP BY ?wine_name ?source_name
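The join that Query 1 performs across the two sources can be mimicked on plain Python data to make its logic explicit. Everything below is a hypothetical stand-in: the sample tuples are invented, substring tests replace regex and fn:contains, and no SPARQL engine is involved.

```python
from collections import defaultdict

# Invented GialloRosso side: (appellationInFarm, bottleQuantity) per bottle lot.
farm_bottles = [("Aglianico", 120), ("Aglianico", 30), ("Cabernet", 50)]

# Invented BiancoRosso side: (appellationInDocument, docSrcName) per search hit.
doc_citations = [
    ("Aglianico del Vulture", "PercorsiDiVino"),
    ("Cabernet superiore", "Gambero Rosso"),
]

def bottles_per_cited_wine(source):
    """Sum farm bottle quantities for each wine cited by the given source."""
    totals = defaultdict(int)
    for wine_name, source_name in doc_citations:
        if source not in source_name:      # stands in for FILTER regex(...)
            continue
        for farm_name, quantity in farm_bottles:
            if farm_name in wine_name:     # stands in for fn:contains(?wine_name, ?wn1)
                totals[wine_name] += quantity
    return dict(totals)

result = bottles_per_cited_wine("PercorsiDiVino")
print(result)
```

The match between the two sources relies only on the document's wine name containing the farm's appellation, so it is as loose as the fn:contains filter in the query.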

Page 36:

Query 2

Which sources (from BiancoRosso) cite wines of which we (GialloRosso) have at least one bottle available?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
SELECT ?wine_name ?source_name
FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
WHERE {
  ?w1 rdf:type do:WineInFarm .
  ?wb rdf:type do:WineBottle .
  ?wb do:containsWine ?w1 .
  ?wb do:bottleQuantity ?bottle_quantity .
  ?w1 do:appellationInFarm ?wn1 .
  ?w2 do:appellationInDocument ?wine_name .
  ?w2 rdf:type do:WineInDocument .
  ?dse rdf:type do:DocSearch .
  ?dso rdf:type do:DocSource .
  ?dse do:searchWineID ?w2 .
  ?dse do:searchSrcID ?dso .
  ?dso do:docSrcName ?source_name .
  FILTER (?bottle_quantity > 0)
  FILTER fn:contains(?wine_name, ?wn1)
}
GROUP BY ?wine_name ?source_name

Page 37:

Q & A

(If you see this slide we’ve not run out of time)

Page 38:

Part 3 of the book

Ontology-based knowledge elicitation: an architecture (Chapter editor Licia Sbattella, Roberto Tedesco, Giorgio Orsi, Politecnico di Milano, Marcello Montedoro, IBM Italia)

Knowledge extraction from Natural Language (Chapter editor Licia Sbattella, Roberto Tedesco, Politecnico di Milano)

Knowledge extraction from event flows (Chapter editor Alberto Sillitti, Università di Bolzano)

Context-aware knowledge querying in a networked enterprise (Chapter editor Cristiana Bolchini, Elisa Quintarelli, Fabio A. Schreiber, Politecnico di Milano, Teresa Baldassare, Università di Bari)

On-the-fly and Context-Aware Integration of Heterogeneous Data Sources (Chapter editors Giorgio Orsi, Letizia Tanca, Politecnico di Milano)

A methodology for context-driven data-warehouse design (Chapter editor Cristiana Bolchini, Elisa Quintarelli, Letizia Tanca, Politecnico di Milano)