automatic data transformation|breaching the walled gardens...

10
Automatic Data Transformation—Breaching the Walled Gardens of Social Network Platforms Martin Wischenbart 2 Stefan Mitsch 1 Elisabeth Kapsammer 1 Angelika Kusel 1 Stephan Lechner 3 Birgit Pr¨ oll 1 Werner Retschitzegger 1 Johannes Sch¨ onb¨ ock 2 Wieland Schwinger 1 Manuel Wimmer 2 1 Department of Cooperative Information Systems Johannes Kepler University, Linz, Austria, Email: {firstname.lastname}@jku.at 2 Business Informatics Group, Institute of Software Technology and Interactive Systems Vienna University of Technology, Vienna, Austria, Email: {lastname}@big.tuwien.ac.at 3 Netural GmbH, Linz, Austria, Email: [email protected] Abstract Although many social networks on the Web allow ac- cess via dedicated apis, the extraction of instance data for further use by applications is often a tedious task. As a result, instance data transformation to Linked Data in the form of owl, as well as the inte- gration with other data sources, are aggravated. To alleviate these problems, this paper proposes a model- driven approach to overcome data model heterogene- ity by automatically transforming schemas and in- stance data from json to owl/xml, utilizing the se- mantic features of owl and Jena inference rules. We present a prototypical implementation on the basis of the Eclipse Modeling Framework. This implementa- tion is applied and evaluated on data from Facebook, Google+, and LinkedIn. Finally, we provide prospects for semantic integration and managing evolution, as well as a discussion of how to generalize our approach to other domains and transformations between arbi- trary technical spaces. Keywords: Schema and instance transformation, data model heterogeneity, model driven approach, social network data integration, json to rdf & owl 1 Introduction In recent years, online social networks have gained great popularity amongst internet users. These net- works serve different purposes and communities, for instance, socializing on Facebook or Google+, or es- tablishing professional networks in LinkedIn 1 (Kim et al. 2010). Since users are members of several social networks, integrated profiles from multiple networks are desired to achieve a comprehensive view on users, which would, for instance, increase the quality of per- sonalized recommendations (Abel et al. 2011), or sup- port users’ search activities (Bozzon et al. 2012). In our research project TheHiddenU 2 we try to build such comprehensive user profiles in owl, enriching This work has been funded by the Austrian Federal Ministry of Transport, Innovation, and Technology (BMVIT) under grant FIT-IT 825070 and BRIDGE 832160. Copyright c 2013, Australian Computer Society, Inc. This pa- per appeared at the 9th Asia-Pacific Conference on Concep- tual Modelling (APCCM 2013), Adelaide, Australia, January- February 2013. Conferences in Research and Practice in Infor- mation Technology, Vol. 143. F. Ferrarotti and G. Grossmann, Eds. Reproduction for academic, not-for profit purposes per- mitted provided this text is included. them with machine learning and information extrac- tion methods, for use in recommender applications. On our platform, non-experts shall express transfor- mations of social network data to their preferred tar- get format. Furthermore, prime goals are to make the system transparent and trustworthy, by (i) providing provenance information whenever demanded, and (ii) by respecting and supporting users’ privacy needs at all times. For building comprehensive user profiles, in prin- ciple, existing user models may be reused as target schemas for the user profiles to be integrated, such as grapple (Aroyo & Houben 2010) or others (Vi- viani et al. 2010). However, in contrast to social net- works, these approaches often base on common on- tologies expressed in owl (foaf, etc. 3 ). Social net- works would benefit from using such ontologies, Se- mantic Web technologies, and Linked Data, for in- stance, to solve portability issues and to enable data reuse (Razmerita et al. 2009). Thus, to ultimately extract Linked Data and breach the walled gardens of social networks, the resulting difference in techni- cal spaces demands, as depicted in Fig. 1, that we first tackle technical, syntactic (cf. 1 ), and data model heterogeneity (cf. 2 ), before structural and semantic heterogeneity (cf. 3 ) can be resolved, to finally build integrated user profiles that are (i) com- plete, concise, and consistent (Bleiholder & Naumann 2009), (ii) within aligned ontologies (Parundekar et al. 2010), and (iii) in the form of Linked Data (Heath & Bizer 2011). Dividing these required transforma- tion steps helps to cope with evolution, and facilitates reuse across data sources, because changes are kept local (e.g., json replaces xml, while semantic map- pings remain unchanged). For handling all these kinds of heterogeneities, ex- isting tools in the research area of the Semantic Web, such as Virtuoso 4 , Triplr 5 , and Aperture 6 , can be ex- tended with components for user profile extraction. Typically, these components must be configured (i) on the schema level with respect to the target schema, which often is an existing ontology (e. g., foaf) com- plemented with a manually created one, and (ii) on the instance level with respect to transformation spec- 1 www.facebook.com, plus.google.com, www.linkedin.com 2 www.social-nexus.net 3 foaf: foaf-project.org, sioc: sioc-project.org, Relationship Ontology: vocab.org/relationship 4 virtuoso.openlinksw.com 5 triplr.org 6 aperture.sourceforge.net Proceedings of the Ninth Asia-Pacific Conference on Conceptual Modelling (APCCM 2013), Adelaide, Australia 89

Upload: hoangthu

Post on 12-Sep-2018

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automatic Data Transformation|Breaching the Walled Gardens ...crpit.com/confpapers/CRPITV143Wischenbart.pdf · Automatic Data Transformation|Breaching the Walled Gardens of Social

Automatic Data Transformation—Breaching the Walled Gardens ofSocial Network Platforms

Martin Wischenbart2 Stefan Mitsch1 Elisabeth Kapsammer1

Angelika Kusel1 Stephan Lechner3 Birgit Proll1 Werner Retschitzegger1

Johannes Schonbock2 Wieland Schwinger1 Manuel Wimmer2

1 Department of Cooperative Information SystemsJohannes Kepler University, Linz, Austria, Email: {firstname.lastname}@jku.at

2 Business Informatics Group, Institute of Software Technology and Interactive SystemsVienna University of Technology, Vienna, Austria, Email: {lastname}@big.tuwien.ac.at

3 Netural GmbH, Linz, Austria, Email: [email protected]

Abstract

Although many social networks on the Web allow ac-cess via dedicated apis, the extraction of instancedata for further use by applications is often a tedioustask. As a result, instance data transformation toLinked Data in the form of owl, as well as the inte-gration with other data sources, are aggravated. Toalleviate these problems, this paper proposes a model-driven approach to overcome data model heterogene-ity by automatically transforming schemas and in-stance data from json to owl/xml, utilizing the se-mantic features of owl and Jena inference rules. Wepresent a prototypical implementation on the basis ofthe Eclipse Modeling Framework. This implementa-tion is applied and evaluated on data from Facebook,Google+, and LinkedIn. Finally, we provide prospectsfor semantic integration and managing evolution, aswell as a discussion of how to generalize our approachto other domains and transformations between arbi-trary technical spaces.

Keywords: Schema and instance transformation, datamodel heterogeneity, model driven approach, socialnetwork data integration, json to rdf & owl

1 Introduction

In recent years, online social networks have gainedgreat popularity amongst internet users. These net-works serve different purposes and communities, forinstance, socializing on Facebook or Google+, or es-tablishing professional networks in LinkedIn1 (Kim etal. 2010). Since users are members of several socialnetworks, integrated profiles from multiple networksare desired to achieve a comprehensive view on users,which would, for instance, increase the quality of per-sonalized recommendations (Abel et al. 2011), or sup-port users’ search activities (Bozzon et al. 2012). Inour research project TheHiddenU2 we try to buildsuch comprehensive user profiles in owl, enriching

This work has been funded by the Austrian Federal Ministry ofTransport, Innovation, and Technology (BMVIT) under grantFIT-IT 825070 and BRIDGE 832160.

Copyright c©2013, Australian Computer Society, Inc. This pa-per appeared at the 9th Asia-Pacific Conference on Concep-tual Modelling (APCCM 2013), Adelaide, Australia, January-February 2013. Conferences in Research and Practice in Infor-mation Technology, Vol. 143. F. Ferrarotti and G. Grossmann,Eds. Reproduction for academic, not-for profit purposes per-mitted provided this text is included.

them with machine learning and information extrac-tion methods, for use in recommender applications.On our platform, non-experts shall express transfor-mations of social network data to their preferred tar-get format. Furthermore, prime goals are to make thesystem transparent and trustworthy, by (i) providingprovenance information whenever demanded, and (ii)by respecting and supporting users’ privacy needs atall times.

For building comprehensive user profiles, in prin-ciple, existing user models may be reused as targetschemas for the user profiles to be integrated, suchas grapple (Aroyo & Houben 2010) or others (Vi-viani et al. 2010). However, in contrast to social net-works, these approaches often base on common on-tologies expressed in owl (foaf, etc.3). Social net-works would benefit from using such ontologies, Se-mantic Web technologies, and Linked Data, for in-stance, to solve portability issues and to enable datareuse (Razmerita et al. 2009). Thus, to ultimatelyextract Linked Data and breach the walled gardensof social networks, the resulting difference in techni-cal spaces demands, as depicted in Fig. 1, that wefirst tackle technical, syntactic (cf. 1 ), and datamodel heterogeneity (cf. 2 ), before structural andsemantic heterogeneity (cf. 3 ) can be resolved, tofinally build integrated user profiles that are (i) com-plete, concise, and consistent (Bleiholder & Naumann2009), (ii) within aligned ontologies (Parundekar etal. 2010), and (iii) in the form of Linked Data (Heath& Bizer 2011). Dividing these required transforma-tion steps helps to cope with evolution, and facilitatesreuse across data sources, because changes are keptlocal (e.g., json replaces xml, while semantic map-pings remain unchanged).

For handling all these kinds of heterogeneities, ex-isting tools in the research area of the Semantic Web,such as Virtuoso4, Triplr5, and Aperture6, can be ex-tended with components for user profile extraction.Typically, these components must be configured (i)on the schema level with respect to the target schema,which often is an existing ontology (e. g., foaf) com-plemented with a manually created one, and (ii) onthe instance level with respect to transformation spec-

1www.facebook.com, plus.google.com, www.linkedin.com2www.social-nexus.net3foaf: foaf-project.org, sioc: sioc-project.org,

Relationship Ontology: vocab.org/relationship4virtuoso.openlinksw.com5triplr.org6aperture.sourceforge.net

Proceedings of the Ninth Asia-Pacific Conference on Conceptual Modelling (APCCM 2013), Adelaide, Australia

89

Page 2: Automatic Data Transformation|Breaching the Walled Gardens ...crpit.com/confpapers/CRPITV143Wischenbart.pdf · Automatic Data Transformation|Breaching the Walled Gardens of Social

Ls1: FB Schema

Ls2: LI Schema

Ls3: G+ Schema

conforms to

Lss‘: OWL

Ls1‘: FB T-Box

Ls2‘: LI T-Box

conforms to

Transformation (data model heterogeneity) 2

Ls1‘‘: Integrated T-Box (e.g. FOAF)

Lss: JSON Schema

Integration (structural and semantic heterogeneity) 3

Lx2: LI Instance

Lx3: G+ Instance

Lx1‘: FB A-Box

Lx2‘: LI A-Box

Lx3‘: G+ A-Box

Lx‘‘: Integrated A-Box

Lx1: FB Instance

Extr

actio

n (t

echn

ical

and

sy

ntac

tic h

eter

ogen

eity

)

Ls3‘: G+ T-Box

Existing tools (Example: Virtuoso Sponger cartridge for Google+) Legend Graph API REST API G+ API 1

M2

M1

M0

Figure 1: Overview of heterogeneities during user profile integration

ifications between the source schema of the extracteddata and the desired target schema, in order to trans-form instances. Often, such components mix the res-olution of data model, structural, and semantic het-erogeneities in a single transformation step, therebyaggravating maintenance and modifications. Further-more, in the face of new social networks arising fre-quently and evolution of existing ones (i. e., changesof schemas and apis), manual creation of schemasand transformation specifications for all possible datasources is not an adequate option, not least sinceschemas may be extensive in size.

Existing automated transformation approaches,for instance, by (Atzeni et al. 2005) or our previ-ous work (Kapsammer et al. 2012), focus on resolv-ing data model heterogeneity on the schema level.They extract schemas in the source technical spaceand transform them into schemas of the desired tar-get technical space. Thus, in this paper, we com-plement the above approaches with a model-drivenone, automating the configuration of instance datatransformation processes, with a focus on resolvingdata model heterogeneity and a clear separation fromother tasks during integration and from the resolv-ing of other kinds of heterogeneities. We assume,that technical heterogeneities are already resolved, forinstance, by building appropriate adaptors employ-ing social network apis (e. g., using http, OAuth,etc.). For resolving structural and semantic hetero-geneities later on, established integration techniquesmay be utilized, which are supported by general pur-pose modeling tools, such as Enterprise Architect7,and by a multitude of dedicated integration tools,such as coma++ (Massmann et al. 2011) for simi-larity matching, or MapForce8 for mapping.

This model-driven approach is applied to instancedata transformation between json data from socialnetworks as source, and owl, as proposed by theLinked Data Initiative (Heath & Bizer 2011), astarget schema. json was used due to its popularityin social networks (also, existing approaches did notconsider it much yet). owl was chosen over rdf tobe extensible for integration mappings to consolidateprofiles. Exploiting the semantic features of owl,generic instance transformation specifications maybe applied, which are independent of concrete socialnetwork schemas and owl ontologies.

7www.enterprisearchitect.at8www.altova.com/mapforce

Structure of the paper. In the next section, wediscuss related research, before in Section 3 and Sec-tion 4 we present our approach and an implementa-tion thereof using the Eclipse Modeling Framework.In Section 5 we discuss results and lessons learnedfrom the application on actual user profiles from Face-book, Google+, and LinkedIn. Finally, in Section 6we give an outlook on integration of user profile data,present prospects for handling evolution of sourcesand the propagation of changes, and discuss the gen-eralization of the approach for arbitrary source andtarget technical spaces beside json and owl.

2 Related Work

In this section, we discuss closely related work withrespect to instance transformation between json andowl for overcoming data model heterogeneity, ontol-ogy engineering, as well as related approaches fromthe data and model engineering domains.

Concerning the specific transformation of jsondata into rdf graphs or owl ontologies, most closelyrelated work can be found in terms of so-calledrdfizers. Such rdfizers are available for a plethoraof different sources, ranging from specific websitesover social networks to relational databases and plainfiles. rdfizers are implemented, for instance, as partof Virtuoso (an enterprise data integration server),Aperture for extracting and querying several informa-tion systems in the Semantic Desktop project (Dengel2007), in d2r and r2rml for publishing relationaldatabases to the Semantic Web (Bizer & Cyganiak2006, Das et al. 2012), for transforming various mi-croformats into rdf using, for instance, Triplr orAny239, as well as for extracting information fromthe Old rest api of Facebook (Rowe & Ciravegna2008) and from Twitter (Mendes et al. 2010). Allthese frameworks either must be configured manu-ally with respect to data extraction, transformationto rdf, and integration into common ontologies, orsimply utilize hard-coded rules for this purpose (spec-ified manually). Also, these approaches often mix theresolving of data model, structural, and semantic het-erogeneities, incurring the disadvantages already dis-cussed above. For example, in Virtuoso, so-calledcartridges are responsible for extracting data fromvarious sources, and for transforming them into rdfgraphs. These graphs utilize the vocabularies pro-

9developers.any23.org

CRPIT Volume 143 - Conceptual Modelling 2013

90

Page 3: Automatic Data Transformation|Breaching the Walled Gardens ...crpit.com/confpapers/CRPITV143Wischenbart.pdf · Automatic Data Transformation|Breaching the Walled Gardens of Social

vided by various target ontologies, for instance, foafand a complementing Facebook ontology provided byOpenLink, which captures the concepts of Facebooknot present in foaf. For this, Virtuoso requires thetarget ontologies and xslt definitions of transforma-tions to these target ontologies, which are often spec-ified manually. Semion (Nuzzolese et al. 2010) goesbeyong such triplifiers by focusing on a customiz-able triplification process. Keeping in mind that datasource schemas may frequently change, creating con-figurations, hard-coded rules, and target ontologiesmanually is a laborious and error-prone task. Ourapproach, in contrast, separates overcoming differentkinds of heterogeneities into sequential transforma-tion and integration steps, also aiming at automatingthese steps.

Addressing the challenges of ontology engineering,the application of model driven architectures for de-velopment of Semantic Web ontologies has been pro-posed by (Gasevic et al. 2009). An in-depth dis-cussion of meta-models in combination with ontolo-gies for software engineering was done by (Henderson-Sellers 2011), including extensive related work. Themethod by (Cranefield & Pan 2007) employs Jenarules to create rdf from mof-based models, whereas(Gasevic et al. 2007) use xslt to generate owl fromuml. The approach of (Glimm et al. 2010) uses owl2for meta-modeling, specifying class constraints andrelationships as owl axioms, and synchronizing themwith individuals’ role assertions. Our approach com-bines and generalizes these ontology engineering con-cepts into a model-driven approach that may be con-figured to arbitrary source and target technical spaces.

Concerning the individual steps in the transfor-mation process, in particular with respect to schemaand instance transformation, in the data engineer-ing domain a considerable amount of research hasbeen conducted. Existing generic approaches mapdifferent schemas of the same technical space and ex-change data between these schemas (e. g., cf. (Doan& Halevy 2005, Fagin et al. 2009, Legler & Naumann2007) for surveys on such approaches). More spe-cific approaches (i) map between relational and xmlschemas and instances, cf. (Yahia et al. 2004), (ii)map structured sources into rdf, such as (Knoblocket al. 2012, Speiser & Harth 2010), (iii) transform xmlto json or xml to owl/rdf (Bohring & Auer 2005,Cardoso & Bussler 2011, Kobeissy et al. 2007, Bischofet al. 2011), and (iv) align the individuals of differ-ent ontologies, such as (Noy 2004). To date, how-ever, most approaches rely on manually created andspecifically tailored transformation specifications andas a consequence, are vulnerable to evolution of sourceand target schemas. The requirements mentionedabove (schema and metamodel evolution, platform-independent transformations, etc.) that drove ourchoice of a model-driven architecture pare of minorimportance in these references.

Being applicable also in presence of schema evo-lution, especially interesting are the approach of(Bohring & Auer 2005) and clio (Fagin et al. 2009,Haas et al. 2005). These approaches generate trans-formation code in the form of xslt, XQuery, and sqlqueries (depending on the source and target technicalspace) in order to overcome technical and syntactic,data model, structural, and semantic heterogeneityin a single step. The ideas of such transformationcode generation are the basis for the instance transfor-mation step in our model-driven transformation ap-proach for social network data integration, which, asalready noted above, separates overcoming differentkinds of heterogeneities into sequential steps.

Concerning schema and instance transformation

in the model engineering domain, the general ideaof bridging several modeling layers within one ap-proach has been presented, for instance, in the well-known work by (Atzeni et al. 2005). For bridging themeta-model layer, a so-called supermodel has beenproposed allowing to transform schemas between dif-ferent technical spaces, such as oo, or, er, uml, andxsd. For actually transforming the corresponding in-stances thereof, so-called down functions have beenrealized (Atzeni et al. 2006), allowing to transformmeta-model translation rules down to instances. Incontrast, we exploit the inference capabilities of owl,and thus, need not create instance transformationsfrom schema transformations.

3 Architectural Overview

An initial overview of required steps for resolvingall kinds of heterogeneities was shown in Fig. 1above. In order to provide transformations indepen-dently of input (e. g., json) and output formats (e. g.,owl/xml), we propose a model-driven approach tobridging data model heterogeneity. In the follow-ing, the approach is detailed by means of a sourcejson document, as extracted from a social network,and a target owl/xml document, including the cor-responding transformation of schemas. The proposedmodel-driven transformation process for resolving thisdata model heterogeneity is depicted in Fig. 2, anddiscussed in the following.

Meta-modeling layers and transformations.Our approach anchors the transformations of schemasand instances along the four meta-modeling layers ofmof (Object Management Group 2011). The bottomlayer (m0) describes actual instances (e. g., user pro-file instance data from Facebook in json or in an owlontology A-Box). These instances conform to modelsat the model layer (m1, e. g., the Facebook Graphapi’s schema and json in general, or its representa-tion as Facebook T-Box). In turn, these models con-form to so-called meta-models (m2, e. g., json Schemaas a language for defining the schemas of json doc-uments or owl as language for defining ontologies inthe Semantic Web technology stack). Finally, meta-models are described in terms of a meta-meta-model(m3), such as Ecore10.

In this four-layer representation, transformationsfor bridging technical spaces on a particular layer arealways specified on the superior layer: thus, transfor-mations on the m1 layer (e. g., from Facebook schemato a Facebook T-Box) are specified on the m2 layer,and transformations on the m0 layer (e. g., from Face-book instances to individuals in a Facebook A-Box)are specified on the m1 layer, as depicted in Fig. 2.

Schema and instance transformation. Forautomatically transforming schemas, a transforma-tion tss : Lm2 → Lm2′ has to be specified, for in-stance, from json Schema to owl T-Box axioms,which allows to transform the corresponding schemas.For instance, a Facebook schema may be transformedto an according T-Box by executing the transforma-tion specification, i. e., Lm1s′ = tss(Lm1s).

For actual instance data, existing approaches re-quire specific instance transformation specificationstsis : Lm1s → Lm1s′ between pairs of source Lm1s andtarget Lm1s′ models. Thus, to automatically trans-form Facebook instance data into Facebook A-Boxaxioms, a transformation specification between theFacebook Schema and the Facebook T-Box would berequired, and for transforming data from LinkedIn

10Ecore is the realization of mof in the Eclipse Modeling Frame-work (emf) www.eclipse.org/modeling/emf

Proceedings of the Ninth Asia-Pacific Conference on Conceptual Modelling (APCCM 2013), Adelaide, Australia

91

Page 4: Automatic Data Transformation|Breaching the Walled Gardens ...crpit.com/confpapers/CRPITV143Wischenbart.pdf · Automatic Data Transformation|Breaching the Walled Gardens of Social

Lm3 (e.g., Ecore)

Lm1s (e.g. FB Schema)

Lm2‘(e.g.,OWL T-Box)

Lm2(e.g., JSON Schema)

Lm1s‘ (e.g. FB T-Box)

Schema Transformation

Specification (tss)

Conceptual

Lm0 (e.g. FB Instance)

Lm1

execute

<Onto><Decl><Class IRI="#...

merge

Lm1g (e.g. JSON)

Lm1g‘(e.g. OWL A-Box)

Generic Instance Transformation

Specification (tsig)

<Onto><ClassAssertion><Class …

FB A-Box OWL/XML

seria

lize

<Onto><Decl><Class IRI="#...

FB T-Box OWL/XML

seria

lize

{"id": “3485“,"name": "Rich…

FB JSON Instance

{"type": "object","id": "user",

FB JSONSchema de

seria

lize

execute

dese

rializ

e

Lm0‘ (e.g. FB A-Box)

conforms to

M3

M2

M1

M0FB Ont.

OWL/XML

JSON Schema

OWL

merge

Figure 2: Overview of the model driven transformation process

and Google+, another two specifications would beneeded. We argue, that instead of multiple specific in-stance transformation specifications tsis, generic onestsig : Lm1g → Lm1g′ can be defined, if the sourceinstances carry certain partial schema informationwhen being interpreted in terms of a generic sourcelanguage Lm1g, and the target language Lm1g′ sup-ports dl and rule inference. Markup languages, suchas json and xml, used in today’s online social net-work apis, and owl satisfy this requirement.

In the next section, we discuss our transformationapproach for resolving data model heterogeneity be-tween json Schema and owl T-Box axioms on themodel layer (m1), as well as between json and owlA-Box axioms on the instance layer (m0). This trans-formation approach makes use of owl inference ca-pabilities to enable the specification of schema trans-formations on the meta-model layer, as well as in-stance transformations on the model layer, which re-place manually created transformation specificationsfor each kind of data source.

4 JSON to OWL Transformation

In principle, our approach as depicted in Fig. 2comprises two steps on the four-layer meta-modelingstack, transforming (i) json Schema to owl T-Boxaxioms, and (ii) json instances to owl A-Box axioms.Both these steps can be realized using model trans-formation techniques. The execution of a schematransformation specification, first, processes a sourcemodel (e. g., a Facebook schema deserialized from itstextual representation) to create a corresponding tar-get model (e. g., a Facebook T-Box, serializable intoa textual representation). Second, when executed,a generic instance transformation specification (i. e.,independent of Facebook Schema and T-Box) pro-cesses source instances (e. g., Facebook instances de-serialized from json responses of Facebook) to createthe target instances (e. g., a Facebook A-Box), whichare serialized into a textual representation, such asowl/xml. Finally, the result of the schema transfor-mation execution is merged with the instance trans-formation execution result into a coherent Facebook

ontology. The specifics of json and owl, which areexploited in these generic transformations, are de-tailed below. The actual transformation specifica-tions are given in Sect. 4.2.

4.1 JSON to OWL by Example

Let us consider user information from a social network(e. g., Facebook) extracted in json, as depicted inFig. 3. This snippet, which is rather simple for thesake of understandability, shows a json object witha single property name, whose value is ‘Jane Doe’.The json object conforms to a simplified Facebookschema, defining that every User is of type objectand comprises a property name.

Transformation of JSON Schema to OWLT-Box. In order to provide the Facebook schema ina T-Box (e. g., to support querying and reasoning), ina first transformation step, the schema is transformedinto corresponding owl T-Box axioms. These axiomsdefine that User is equivalent with the class of thingsthat have a name property of type String (in owl,this may be specified by the domain and range of adata property).

Transformation of JSON to OWL A-Box.In a second transformation step, the Facebook in-stance is transformed into corresponding owl A-Boxaxioms, which again are specified in dl notation. Asbasic schema information, the format of the nameproperty’s value in our sample json snippet allowsus to derive the property’s type: json distinguishesbetween string, number, boolean, object, and array.Further schema information, such as the concretetype of object (e. g., a person vs. an address), isnot available in the instances. Anyhow, this is whereowl, implementing the family SROIQ of descriptionlogic (Grau et al. 2008), is a perfect fit on the tar-get side: description logic reasoners are specificallydesigned to classify objects according to their roleassertions, and hence, are able to infer the schemainformation that is not explicitly present in json in-stances (e. g., given a sample T-Box axiom specifiedin description logic notation User ≡ ∃name.String,a description logic reasoner infers that anything with

CRPIT Volume 143 - Conceptual Modelling 2013

92

Page 5: Automatic Data Transformation|Breaching the Walled Gardens ...crpit.com/confpapers/CRPITV143Wischenbart.pdf · Automatic Data Transformation|Breaching the Walled Gardens of Social

{

"type": "object",

"id": "User",

"properties": {

"name": {"type": "string"}

}

} facebook_schema.schema

conforms to

facebook_t-box.owl

Thing(a).

name(a, "Jane Doe").

facebook_a-box.owl

transformed to

merged with

Facebook knowledge base

User(a).

String("Jane Doe").

asserted

asserted

inferredinference

transformed to{

"name": "Jane Doe"

} facebook_instance.json

Figure 3: Sample JSON to OWL transformation

at least one name is a user). Solutions to cope withpotential ambiguities will be discussed in Section 4.4below. As a result, the instance transformation spec-ifications do not necessarily need information from aconcrete model (e. g., a Facebook schema) for trans-formation. Thus, all json objects can be transformedinto generic concept assertions of the kind Thing(a)(instead of specific ones, such as User(a)). In asimilar manner, every value of a primitive property(e. g., ‘Jane Doe’) can be transformed into a con-cept assertion of the kind Literal. Finally, theconnections between objects and the values of theirprimitive and complex properties (e. g., the fact thatour sample user has the name ‘Jane Doe’) have tobe transformed into role assertions (e. g., name(a,‘Jane Doe’)). In case that a primitive or complexproperty allows an array of values, every array ele-ment must be represented with a corresponding roleassertion.

Merging of OWL T-Box and A-Box. Whenbeing merged with the T-Box axioms, a descriptionlogic reasoner, such as HermiT11, infers concept asser-tions, as explained above. As a result, we can specifyall instance transformations tsig : Lm1g → Lm1g′ be-tween json as source model and owl A-Box axiomsas target model in a generic manner. The specifica-tions for both, generic schema and instance transfor-mation, are detailed below.

4.2 Transformation Specifications

For specifying the actual schema and instance trans-formations, one may resort to existing (model) trans-formation languages, such as atl (Jouault et al.2008), qvt (Object Management Group 2009), orxslt. Since existing tools for publishing social net-work data in rdf format use various transformationlanguages, we utilize our Mapping Operator language(MOps) (Wimmer et al. 2010b), which is designed asa suitable basis for creating transformation specifica-tions in different kinds of transformation languages.Also, in our prototype, we use an executable imple-mentation of MOps to perform the actual transfor-mation. The basic building blocks of the MOps lan-guage for specifying transformations are so-called ker-nel MOps, which are composed to reusable higher-level transformations, denoted as composite MOps.Kernel MOps, their interplay, as well as the composi-tion to composite MOps are explained below, as partof their application to schema and instance transfor-mation specification. For a detailed description ofkernel and composite MOps we refer to (Wimmer etal. 2010a,b).

11hermit-reasoner.com

Schema transformation specification. Thespecification of schema transformations from jsonSchema to owl T-Box using MOps is depicted inFig. 4. Each MOp has input ports for accepting inputon the left side and output ports for producing targetobjects on the right side. Ports are typed to classes(C) or attributes (A).

The json Schema meta-model is a subset of thejson Schema Internet Draft12, which was selected forease of presentation in a straightforward manner. Forthe owl T-Box meta-model, we based on omg’s On-tology Definition Metamodel13. Note, that for pre-sentation purposes, the owl T-Box meta-model wassimplified, i. e., axioms for domain and range of prop-erties are modeled as relationships (instead of sub-classes of Axiom).

In principle, every source element (Schema andProperty) results in a target Declaration, as in-dicated by two Copier MOps in Fig. 4. The kindof declared entity depends on the type of the trans-formed source Schema: (i) complex schemas (typehas value “object” or “array”) can best be repre-sented as instances of Class in owl, while (ii) primi-tive schemas naturally become instances of Datatype.Since the topmost complex schema must be addition-ally transformed into an instance of Ontology (everyowl ontology is represented with one such instance),a composite MOp in terms of a horizontal partitioneris used. Such an HPartitioner comprises severalCondCopiers, which restrict the output of a Copierto a subset satisfying a certain condition—in our casean ontology is only output for the topmost schema,and classes are only output for those schemas withtype “array” or “object”. For transforming primitiveschemas (i. e., simple types), we create an instance ofDatatype for each distinct value of the type propertyin the source model (i. e., one datatype for “string”,one for “boolean”, and another one for “number”).Hence, the composite MOp ObjGenerator is used,which is depicted in white-box view, exposing thecomprised kernel MOps. It includes (i) an A2C ker-nel MOp (transforms the value of an attribute of thesource meta-model into a class instance of the targetmeta-model) for creating the instance of Datatype,and (ii) an A2A kernel MOp (copies an attribute valueinto another attribute value) for setting the iri (In-ternationalized Resource Identifier) of the newly cre-ated datatype. Since there are dependencies betweenthese kernel MOps (i. e., the iri can only be set af-ter the datatype has been created), the A2C MOp isadditionally linked to the A2A MOp through a traceport (T), which provides context information about

12tools.ietf.org/html/draft-zyp-json-schema-0313www.omg.org/spec/ODM/1.0

Proceedings of the Ninth Asia-Pacific Conference on Conceptual Modelling (APCCM 2013), Adelaide, Australia

93

Page 6: Automatic Data Transformation|Breaching the Walled Gardens ...crpit.com/confpapers/CRPITV143Wischenbart.pdf · Automatic Data Transformation|Breaching the Walled Gardens of Social

Schema

JSON Schema Metamodel OWL T-Box Metamodel

description:String id:String type:String

properties

0..*

Property

propertyName:String

type

items

0..*

Ontology

axioms 0..*

Class

Datatype

DataProperty

Mapping

JSON Schema -> OWL T-Box

{xor}

DataProperty

Domain DataProperty

Range

ObjectProperty

Range

ObjectProperty

Domain

Axiom

Entity

iri:String

range domain

domain

range

entity 1

1

1

1

1 C C

C

C C

C

C C

C A

T

2 A C

C

A A 2 A A

C C

C C

Declaration

ObjectProperty

Figure 4: Transformation of JSON Schema to OWL T-Box using MOps

the produced output objects. Finally, instances ofProperty must either be transformed to instancesof ObjectProperty (in case they reference a com-plex schema) or to DataProperty (in all other cases).Analogously to transforming instances of Schema, anHPartitioner can be used for that.

Instance transformation specification. Fig. 5summarizes the instance transformations betweenjson and owl. The meta-models were built on basisof the same specifications as those for schema trans-formation.

As already discussed in the transformation exam-ple above, json objects are transformed in a straight-forward manner to owl individuals of unspecifiedtype (i. e., Things). Hence, one Copier creates anIndividual for each Object, while another one cre-ates a ClassAssertion referring to the entity Thing.Node identifiers (iri) are generated, if present, fromnested key properties (“id”, “key”), or, otherwise,from the filename (for root objects) or employinga simple heuristic, which uses the hash value ofall nested members. Thereby, same objects can bematched (i. e., get equal identifiers), as long as theydo not contain another layer of nested objects. Inthat case, distinct iris are generated.

Analogously to the schema transformation speci-fication, on the instance level we distinguish betweenmembers of complex and of primitive type. Con-sequently, we utilize an HPartitioner to create anObjectPropertyAssertion for members with com-plex values and a DataPropertyAssertion for mem-bers with primitive values. These assertions must ref-erence the corresponding entities that were createdduring schema transformation (e. g., a “firstName”member must be transformed into an assertion ofthe corresponding “firstName” data property14). An-other Copier, again depicted in white-box view, cre-ates new Entity instances from Member instances (thename of the member serves as the entity’s iri). Duringmerging of T-Box and A-Box, iri equivalence ensuresthat the A-Box entities can be connected to theircounterparts in the T-Box. Finally, every primitivevalue (Boolean, Number, and String) must be copiedto an instance of Literal, but only if it was not cre-

14Keeping in mind our focus on resolving data model hetero-geneity, both the source and target technical space have the samestructure, and, thus, we can safely assume name equivalence be-tween json members and owl properties.

ated before (hence, the HMerger contains in fact threeCondCopiers).

4.3 Implementation in EMF

As implementation platform, we chose the EclipseModeling Framework (emf), including Xtext andXtend for text (de-)serialization and transformation,due to its maturity and large community support. Forimplementing MOps and transformations in Xtend,meta-models in Ecore are required. The meta-modelsfor json, json Schema, and owl were generated fromXtext grammars according to the specifications intro-duced above. These Ecore meta-models are automat-ically translated by emf into Java classes, which canthen be used in Xtend. To switch to a different sourcetechnical space — in the past Facebook and the Twit-ter streaming API switched from xml to json — ourimplementation would only require Xtext grammarsto generate the new meta-models for de-serializationto Ecore.

In order to utilize existing ontology tools and de-scription logic reasoners, the Ecore representations ofT-Box and A-Box resulting from applying the Xtendimplementation of our MOps, then, must be serialized(e. g., as rdf/xml or owl/xml) and merged. Again,Xtend is used for this purpose. For increased compat-ibility we implemented serializers for both, owl/xml,e. g., for loading in Protege15, and rdf/xml, as re-quired for Apache Jena16.

4.4 Type Inference & Reasoning

Loading the generated files in Protege enables the ap-plication of the included HermiT reasoner for typeinference. Generic instance transformation specifi-cations not taking into account schema information,however, may result in ambiguities during classifica-tion by a description logic reasoner. First, classeshaving the same mandatory, but different optionalproperties, cannot be matched unambiguously, in casethat an object thereof is described in terms of themandatory properties, only. Second, equally namedand typed properties with different constraints intwo different classes cannot be distinguished (e. g., a

15protege.stanford.edu, used version: 4.2 beta (build 276)16jena.apache.org, used version: 2.7.0

CRPIT Volume 143 - Conceptual Modelling 2013

94

Page 7: Automatic Data Transformation|Breaching the Walled Gardens ...crpit.com/confpapers/CRPITV143Wischenbart.pdf · Automatic Data Transformation|Breaching the Walled Gardens of Social

JSON Metamodel OWL A-Box Metamodel

Member

name:String

0..* values

1 1

value

Ontology

axioms

0..*

Assertion

DataProperty

Assertion

ObjectProperty

Assertion

Mapping

JSON -> OWL A-Box

Array

Boolean

value:boolean

Value

Number

value:number

String

value:String

Object

JSON

1

{xor}

0..*

members

Literal

lexicalValue:String

Class

Assertion

Axiom

Entity

iri:String

Individual

iri:String

individual

class

object Property target

source

target

source data Property

1 1 1

1

1

1

1

1

C C

C C

C

C C

C C

C C

C

C

C C

T

2 C C

C

A A 2 A A

C C

Figure 5: Transformation of JSON to OWL A-Box using MOps

class AustrianAddress with a 4-digit zip code vs. aGermanAddress with 5 digits).

To alleviate this problem, we propose to encodeadditional meta information from instance data inan owl T-Box — in a similar manner as previ-ously shown for schema extraction (Kapsammer etal. 2012). For example, the property category ofFacebook pages (a key concept in Facebook data,used universally) may be used to define equiva-lences with specific classes: Restaurant ≡ Page u∃category.{restaurant}. Previously (Kapsammer etal. 2012), various heuristics for tackling this problemhave been proposed: for example, IdFromValue in-fers a class name from property values (e. g., “type”or “category” in Facebook, and property “kind” inGoogle+), while IdFromReferenceName infers a classname from the names of references to nested individu-als (e. g., used in LinkedIn). These different heuristicscan be exploited during instance transformation aswell. Therefore, each heuristic is encoded as a genericdeclarative Jena rule with accompanying built-ins forimperative computations. For instance, regardingconnections in Facebook, the api’s json response, ifrequested, contains an object metainfo, containingan array connections, which in turn contains mul-tiple properties, having uris as values. These arelinks for other requests to receive further data, butthey would actually resemble a connection betweenthe response’s root object, and the root object ofanother json object (e. g., a list of friends). ThisLinkPatternFromValue strategy can be expressed asa Jena rule, thereby providing a direct edge betweensuch objects (resulting in one edge per connection,instead of two intermediate nodes and three edges).As an alternative to built-ins, such definitions can beautomatically generated by our previously proposedschema extraction process (Kapsammer et al. 2012)in terms of T-Box axioms, or as reasoning rules for se-mantic web rule reasoners (e. g., Jena inference rules).

5 Results & Evaluation

In this section, we evaluate our prototypical imple-mentation by means of transforming data from Face-

book, Google+, and LinkedIn to corresponding owlontologies. Thereby, we will discuss several aspects:completeness, consistency, conciseness, as well as per-formance and scalability.

Evaluation Setup. In order to obtain compa-rable results, equal test user profiles were created ineach of the selected social networks and extracted viatheir api. These profiles contain basic user informa-tion, jobs, education, as well as a connected friendfor direct communication and interaction within agroup. Note, that these data sets, since they werecreated manually, do not reflect the complete informa-tion available in real social network profiles. Nonethe-less, as they were created in a consistent manner,they are suited for a first evaluation. (cf. (Kap-sammer et al. 2012) for details on user profiles andgenerated schemas). The input data sets from Face-book, Google+, and LinkedIn as json files, as well asa simple introductory example, are available online17.Furthermore, the files include generated json Schemafiles, as explained by (Kapsammer et al. 2012), T-Boxand A-Box in Ecore format, the merged serializationthereof (in rdf/xml & owl/xml), as well as (forthe three social networks) Jena rules and reasoningresults.

Completeness. The completeness of the ex-tracted data from social network apis was alreadydiscussed previously (Kapsammer et al. 2012). Therequirement of all information from the json inputto be present in the output is fulfilled by the pro-posed transformation approach: all json objects, ar-rays, and simple types are transformed to owl (i. e.,all input is present in output). This was evaluated bymanually comparing the input to the generated in-stances, object properties, and datatype properties,on multiple samples from all four data sets.

Consistency. To evaluate the consistency of thetransformations, we compared the outputs of multi-ple runs on the same input data. First, the more orless random serialization order of assertions is not aproblem, since Protege shows classes and individualsin alphabetical order anyway. Second, as discussed

17social-nexus.net/publications

Proceedings of the Ninth Asia-Pacific Conference on Conceptual Modelling (APCCM 2013), Adelaide, Australia

95

Page 8: Automatic Data Transformation|Breaching the Walled Gardens ...crpit.com/confpapers/CRPITV143Wischenbart.pdf · Automatic Data Transformation|Breaching the Walled Gardens of Social

Table 1: Description of input, output, and execution times—including count of files and individuals, file sizes(plain/zip-compressed), generated ids and id mismatches, as well as average execution times for transformationsand reasoning.

Input Simple Ex. Facebook Google+ LinkedInNumber of input instance files (JSON) 1 192 11 21Size of input files (JSON Schema + JSON) plain/compressed 1/1 kB 399/128 kB 31/12 kB 41 (14) kBOutput Simple Ex. Facebook Google+ LinkedInNumber of individuals (in A-Box) 4 1549 104 260Size of output files (T-Box, A-Box & inferred triples in RDF/XML) 13/2 kB 626/63 kB 58/6 kB 113/9 kBNumber of individuals without unique ID from JSON input 2 1078 64 187Mismatches of generated IRI for objects with equal content 0 53 4 25Transformation & Reasoning Time Simple Ex. Facebook Google+ LinkedInSum of transformation & reasoning time 196 ms 8621 ms 2198 ms 3186 msJSON Schema to OWL T-Box Ecore 45 ms 107 ms 64 ms 67 msJSON to OWL A-Box Ecore 62 ms 2541 ms 282 ms 559 msOWL Ecore to RDF/XML & OWL/XML 90 ms 2322 ms 397 ms 774 msJena Rule Reasoning on RDF/XML - 3651 ms 1455 ms 1785 ms

earlier, for individuals without a unique id from thejson input, an identifier has to be generated. Thesegenerated iris may differ between runs, but are al-ways consistent within a generated ontology. In thiscontext, Table 1 also counts mismatches of generatediris for objects with equal content (i. e., the heuris-tic fails, if json objects contain nested objects, asdiscussed in Sect. 4.2).

Conciseness. Comparing the file size for jsoninput and owl output, Table 1 shows that plainrdf/xml files are larger than the json counter-parts. Zip-compression, however, works much moreefficiently for the rdf/xml format.

Regarding the conciseness of transformed outputitself, we compare the generated serialization to theminimal representation of the same facts. Such a min-imal owl A-Box would contain one assertion for eachindividual, datatype property, and object property.In this sense, the generated rdf/xml is not mini-mal, as we assert all individuals as Things, and thereasoner adds inferred assertions with more specifictypes. For properties, however, the transformationis minimal. Concerning the json tree structure, inwhich data is provided by social network apis, thelength of paths is crucial for navigation. Our im-plementation converts these trees, being in fact aspecial case of graphs, to tree-like ontology graphs,which then allow the introduction of shortcuts (e. g.,for Facebook connections, as discussed in Sect. 4.4).On one hand, these shortcuts represent additional as-sertions, but, on the other hand, materializing theseshortcuts enable faster queries that are also easier toexpress.

Performance & Scalability. To evaluate thetime complexity of our approach, we measured theexecution times18 for the different data sets in ourprototype, as shown in Table 1. From these measure-ments we can observe that the size of the json Schemacorrelates with its transformation time to Ecore. Thesame applies for the transformation of json instancesto Ecore. Concerning the Jena inference step, clearlythe reasoning takes most of the time in the overallprocess, with the benefit of being able to infer infor-mation that is not present in the source data explic-itly. Note, that the times in the above table includefile I/O. In a separate run without those json filesfrom the Facebook input, which contain almost no

18average of at least 5 runs, on a notebook PC (Intel i7-Q740, 8GB RAM, running Windows 7 64-bit)

members (i. e., empty objects and arrays), the trans-formation of instances to Ecore was sped up by 37%,with remaining 92% of individuals within one third offiles. Thus, to make the transformation more efficient,all steps may be integrated into a single applicationwith reduced file system access for input and interme-diate files. Concerning scalability, the average trans-formation times for larger inputs grow, from a certainpoint, linearly (transformation & serialzation), andexponentially for the reasoning step. However, usingspecific rules that do not require imperative compu-tations (cf. Sect. 4.4), the magnitude would probablybe reduced. On average, for 62.000 individuals plusproperties, the transformation to Ecore took 37 sec-onds, the Jena inferencing took 184 seconds.

Finally, memory complexity was measured interms of ram consumption. Whereas for transforma-tion of these 62.000 individuals plus properties theJava process consumed 1025 mb, growing linearly, forthe reasoning it peaked at 335 mb, almost constantly(i. e., almost no growth).

6 Conclusion and Future Work

We presented an approach for model driven trans-formation of schemas and instances between differ-ent technical spaces. Our method requires the trans-formation specification to be done only once, for in-stance, from json Schema to owl T-Box axioms.This specification can then be executed for differentdata sources, such as different social networks provid-ing json data via their apis. Neglecting semantics,json’s tree-like structure can be transformed to owlin a straightforward manner (e.g., json slots of simpledatatype as datatype properties, complex types andarrays as object properties). To go beyond syntacticequivalence, and to consider semantics of apis, someconfiguration was required, which obviously dependson the data source. For instance, semantic equiva-lence from json slots is not generally possible, butin Facebook id slots can be used to define semanticequivalence.

In the evaluation section we applied our approachto comparable user profiles from Facebook, Google+,and LinkedIn. Not surprisingly, time and memorycomplexity were relatively high (especially for reason-ing and for larger user profiles), but clearly there ispotential for optimizations, and an evaluation on ex-tensive data sets would be interesting for future work.

CRPIT Volume 143 - Conceptual Modelling 2013

96

Page 9: Automatic Data Transformation|Breaching the Walled Gardens ...crpit.com/confpapers/CRPITV143Wischenbart.pdf · Automatic Data Transformation|Breaching the Walled Gardens of Social

Our focus was on the pre-requisite steps for dataintegration and consolidation from different socialnetworks: the goal was to overcome data model het-erogeneity, in order to facilitate structural and se-mantic integration later on. In contrast to relatedtransformation approaches, using model driven archi-tectures allows to build graphical editors and to copewith evolution, for instance when apis change. Also,they enable source/target formats exchange with-out influencing transformation rules, and platform-independent MOps allow replacing the transforma-tion platform.

Generalization to arbitrary source and tar-get models. For generalizing the presented approachto technical spaces other than json as source andowl as target, several modifications need to be takeninto account. Without the reasoning capabilities ofowl and Jena rules, instances need to be loadedaccording to their concrete source schema (insteadof some generic meta-model), therefore, preventinggeneric instance transformations that are indepen-dent of source models. As a consequence, firstly, ded-icated (de-)serializers for every single source and tar-get schema are required, and, secondly, the generic in-stance transformation specification (specified on themodel layer) must be replaced with specific onesfor each social network schema. In order to auto-mate this whole process, the deserializers for sourceschemas should be generated automatically. Further-more, the instance transformation specifications onthe model layer are foreseen to be created automat-ically as artifacts as well—just like the target modelclasses are generated—during the execution of thetransformation from source to target model (speci-fied on the meta-model layer). This means, a soletransformation specification on the meta-model layermay result in potentially many transformation spec-ifications on the model layer. However, as a resultof the limitations of current model engineering soft-ware frameworks (specifically, the fact that Eclipsemodeling spans three of the four meta-modeling lay-ers, only), the models created during schema transfor-mation must be lifted to the meta-model layer first.This means, that instances in the schema transforma-tion specifications must become models in the instancetransformation specification.

Semantic integration of schemas. Having re-solved data model heterogeneity between differentsocial networks by transformation to owl, instancebased schema matching tools, such as coma++(Massmann et al. 2011), may be used to supporthumans in defining semantic correspondences. Suchtool support is especially helpful for large or unknownschemas. However, unlike transformations for resolv-ing technical heterogeneity, which were shown in thispaper to be specifiable in a generic manner, transfor-mations resolving semantic heterogeneity still needmanual intervention. Therefore, we can resort toSemantic Web technologies to integrate the trans-formed user models in owl/xml, for instance, bydefining equivalences between the owl classes Userfrom Facebook and Person from foaf. Again, em-ploying reasoners allows to retrieve instances mate-rialized as Facebook A-Box from queries using thefoaf vocabulary.

Source Schema Evolution. Once such map-pings are specified, an especially interesting questiontherefore is, how to automate or support evolutionof these transformations. A first idea in this direc-tion is the design of a meta-model of possible schemachanges in emf, including generic operations to prop-agate changes to dependent artifacts, such as queriesand integration rules.

References

Abel, F., Araujo, S., Gao, Q. & Houben, G. J. (2011),Analyzing cross-system user modeling on the socialweb, in ‘Proc. of the 11th int. conf. on Web engi-neering’, ICWE’11, Springer, pp. 28–43.

Aroyo, L. & Houben, G.-J. (2010), ‘User model-ing and adaptive Semantic Web’, Semantic Web1(1), 105–110.

Atzeni, P., Cappellari, P. & Bernstein, P. A. (2005),ModelGen: Model Independent Schema Transla-tion, in ‘Proc. of the 21st Int. Conf. on DataEngineering’, ICDE ’05, IEEE Computer Society,pp. 1111–1112.

Atzeni, P., Cappellari, P. & Bernstein, P. A. (2006),Model-Independent Schema and Data TranslationAdvances in Database Technology - EDBT 2006,Vol. 3896 of LNCS, Springer, pp. 368–385.

Bischof, S., Decker, S., Krennwallner, T., Lopes, N.& Polleres, A. (2011), Mapping between RDF andXML with XSPARQL, Technical report, DERI.

Bizer, C. & Cyganiak, R. (2006), D2R Server – Pub-lishing Relational Databases on the Semantic Web,in ‘Proceedings of the 5th International SemanticWeb Conference’.

Bleiholder, J. & Naumann, F. (2009), ‘Data Fusion’,ACM Comput. Surv. 41(1), 1–41.

Bohring, H. & Auer, S. (2005), Mapping XML toOWL Ontologies, in K. P. Jantke, K.-P. Fahnrich& W. S. Wittig, eds, ‘Leipziger Informatik-Tage’,Vol. 72 of LNI, GI, pp. 147–156.

Bozzon, A., Brambilla, M. & Ceri, S. (2012), Answer-ing search queries with CrowdSearcher, in ‘Pro-ceedings of the 21st international conference onWorld Wide Web’, WWW ’12, ACM, New York,NY, USA, pp. 1009–1018.

Cardoso, J. & Bussler, C. (2011), ‘Mapping betweenheterogeneous XML and OWL transaction repre-sentations in B2B integration’, Data & KnowledgeEngineering 70(12), 1046–1069.

Cranefield, S. & Pan, J. (2007), ‘Bridging the gapbetween the model-driven architecture and on-tology engineering’, Int. J. Hum.-Comput. Stud.65(7), 595–609.

Das, S., Sundara, S. & Cyganiak, R. (2012), ‘R2RML:RDB to RDF Mapping Language’, W3C WorkingDraft, http://www.w3.org/TR/r2rml/.

Dengel, A. R. (2007), Knowledge Technologies for theSocial Semantic Desktop, Vol. 4798 of Lecture Notesin Computer Science, Springer Berlin Heidelberg,Berlin, Heidelberg, chapter 2, pp. 2–9.

Doan, A. & Halevy, A. Y. (2005), ‘Semantic-integration research in the database community’,AI Mag. 26(1), 83–94.

Fagin, R., Haas, L. M., Hernandez, M., Miller, R. J.,Popa, L. & Velegrakis, Y. (2009), Conceptual Mod-eling: Foundations and Applications, Springer-Verlag, Berlin, Heidelberg, chapter Clio: SchemaMapping Creation and Data Exchange, pp. 198–236.

Gasevic, D., Djuric, D. & Devedzic, V. (2007), ‘MDA-based Automatic OWL Ontology Development’,Int. J. Softw. Tools Technol. Transf. 9(2), 103–117.

Proceedings of the Ninth Asia-Pacific Conference on Conceptual Modelling (APCCM 2013), Adelaide, Australia

97

Page 10: Automatic Data Transformation|Breaching the Walled Gardens ...crpit.com/confpapers/CRPITV143Wischenbart.pdf · Automatic Data Transformation|Breaching the Walled Gardens of Social

Gasevic, D., Djuric, D. & Devedzic, V. (2009), ModelDriven Engineering and Ontology Development,2nd edn, Springer Publishing Company, Incorpo-rated.

Glimm, B., Rudolph, S. & Volker, J. (2010), Inte-grated metamodeling and diagnosis in OWL 2, in‘Proceedings of the 9th international semantic webconference on The semantic web - Volume PartI’, ISWC’10, Springer-Verlag, Berlin, Heidelberg,pp. 257–272.

Grau, B. C., Horrocks, I., Motik, B., Parsia, B.,Patel-Schneider, P. & Sattler, U. (2008), ‘OWL2: The next step for OWL’, Web Semantics: Sci-ence, Services and Agents on the World Wide Web6(4), 309–322.

Haas, L. M., Hernandez, M. A., Ho, H., Popa, L.& Roth, M. (2005), Clio grows up: from researchprototype to industrial tool, in ‘Proceedings of the2005 ACM SIGMOD international conference onManagement of data’, SIGMOD ’05, ACM, NewYork, NY, USA, pp. 805–810.

Heath, T. & Bizer, C. (2011), Linked Data: Evolvingthe Web into a Global Data Space, Vol. 1, Morgan& Claypool Publishers.

Henderson-Sellers, B. (2011), ‘Bridging metamodelsand ontologies in software engineering’, J. Syst.Softw. 84(2), 301–313.

Jouault, F., Allilaire, F., Bezivin, J. & Kurtev, I.(2008), ‘ATL: A model transformation tool’, Sci-ence of Computer Programming 72(1-2), 31–39.

Kapsammer, E., Kusel, A., Lechner, S., Mitsch,S., Proll, B., Retschitzegger, W., Schonbock, J.,Schwinger, W., Wimmer, M. & Wischenbart, M.(2012), User Profile Integration Made Easy -Model-Driven Extraction and Transformation ofSocial Network Schemas, in ‘International Work-shop on Interoperability of User Profiles in Multi-Application Web Environments (MultiA-Pro) atWWW 2012’, ACM.

Kim, W., Jeong, O.-R. & Lee, S.-W. (2010), ‘On so-cial Web sites’, Information Systems 35(2), 215–236.

Knoblock, C. A., Szekely, P. A., Ambite, J. L., Goel,A., Gupta, S., Lerman, K., Muslea, M., Taheriyan,M. & Mallick, P. (2012), Semi-automatically Map-ping Structured Sources into the Semantic Web,in E. Simperl, P. Cimiano, A. Polleres, O. Corcho& V. Presutti, eds, ‘ESWC’, Vol. 7295 of LectureNotes in Computer Science, Springer, pp. 375–390.

Kobeissy, N., Girod Genet, M. & Zeghlache, D.(2007), Mapping XML to OWL for seamless infor-mation retrieval in context-aware environments, in‘IEEE International Conference on Pervasive Ser-vices’, IEEE, pp. 361–366.

Legler, F. & Naumann, F. (2007), ‘A Classifica-tion of Schema Mappings and Analysis of Map-ping Tools’, Proceedings of Datenbanksysteme inBusiness, Technologie und Web (BTW 2007) (Ger-many, Aachen, March 7-9).

Massmann, S., Raunich, S., Aumuller, D., Arnold, P.& Rahm, E. (2011), Evolution of the COMA MatchSystem, in ‘The Sixth International Workshop onOntology Matching’.

Mendes, P. N., Passant, A. & Kapanipathi, P. (2010),Twarql: tapping into the wisdom of the crowd, in‘Proceedings of the 6th International Conference onSemantic Systems’, I-SEMANTICS ’10, ACM, NewYork, NY, USA.

Noy, N. F. (2004), ‘Semantic integration: a sur-vey of ontology-based approaches’, SIGMOD Rec.33(4), 65–70.

Nuzzolese, A. G., Gangemi, A., Presutti, V. &Ciancarini, P. (2010), ‘Fine-tuning triplificationwith Semion’, EKAW workshop on KnowledgeInjection into and Extraction from Linked Data(KIELD2010).

Object Management Group (2009),‘Meta object facility (MOF) 2query/view/transformation specification’,www.omg.org/spec/QVT/1.1/Beta2/PDF/.

Object Management Group (2011), ‘Meta ob-ject facility (MOF) 2 core specification’,www.omg.org/spec/MOF/2.4.1/PDF.

Parundekar, R., Knoblock, C. A. & Ambite, J. L.(2010), Linking and Building Ontologies of LinkedData, in ‘Proceedings of the 9th international se-mantic web conference on The semantic web - Vol-ume Part I’, ISWC’10, Springer-Verlag, Berlin, Hei-delberg, pp. 598–614.

Razmerita, L., Firantas, R. & Jusevicius, M. (2009),Towards a New Generation of Social Networks:Merging Social Web with Semantic Web, in ‘5thInternational Conference on Semantic Systems (I-Semantics 2009)’.

Rowe, M. & Ciravegna, F. (2008), Getting to Me - Ex-porting Semantic Social Network Information fromFacebook, in ‘1st Social Data on the Web workshop(SDoW2008)’.

Speiser, S. & Harth, A. (2010), Taking the LIDS offdata silos, in ‘Proceedings of the 6th InternationalConference on Semantic Systems’, I-SEMANTICS’10, ACM, New York, NY, USA.

Viviani, M., Bennani, N. & Egyed-Zsigmond, E.(2010), A Survey on User Modeling in Multi-Application Environments, in ‘Proceedings of the3rd International Conference on Advances inHuman-Oriented and Personalized Mechanisms,Technologies and Services’, IEEE, Nice, France,pp. 111–116.

Wimmer, M., Kappel, G., Kusel, A., Retschitzegger,W., Schonbock, J. & Schwinger, W. (2010a), Plug& Play Model Transformations - A DSL for Resolv-ing Structural Metamodel Heterogeneities, in ‘Pro-ceedings of the 10th Workshop on Domain-SpecificModeling (DSM’10) @ Splash 2010’.

Wimmer, M., Kappel, G., Kusel, A., Retschitzegger,W., Schonbock, J. & Schwinger, W. (2010b), Sur-viving the Heterogeneity Jungle with CompositeMapping Operators, in ‘Proc. of the 3rd Int. Conf.on Model Transformation’, Springer, pp. 260–275.

Yahia, S. A., Du, F. & Freire, J. (2004), A compre-hensive solution to the XML-to-relational mappingproblem, in ‘Proceedings of the 6th annual ACM in-ternational workshop on Web information and datamanagement’, WIDM ’04, ACM, New York, NY,USA, pp. 31–38.

CRPIT Volume 143 - Conceptual Modelling 2013

98