Post on 16-Dec-2015


ATTRIBUTION

Department of Computer Science

Software Engineering Research Group, Berlin,

Germany

Abdul Saboor

WELCOME TO THE PRESENTATION


Main Agenda For The Presentation

What is Attribution and Data Attribution?
The Importance of Attribution
Why should the Attribution be used?
Main reasons of Data Attribution
Key Elements of Attribution
Approaches used for Attribution
Issues and Challenges in Attribution
Summary of Presentation
References

AGSE MEQ 2013, Attribution of Open Data


What is Attribution? Definitions

Attribution often involves identifying the author or source of written material or a work of art.[1]

Attribution is the acknowledgement of the use of someone else's information, data or other work.[2]

Attribution is about crediting a copyright holder according to the terms of a copyright licence, usually crediting an author's or artist's work such as music, fiction, video and photography.

The act of attribution establishes a relationship between the users and the creator(s) of a work. Citing other authors means referring to them or their work.


Data Attribution?

Data Attribution means acknowledging the data creators and indicating the availability of the data.

Granularity: Which features inside the dataset are being referred to.

Versioning: For dynamic or regularly updated data, which version needs to be attributed.

Location of Data: A persistent link such as a Digital Object Identifier (DOI).

Acknowledging Creators: Ensure that credit is given to the authors who deserve it.


Data From Various Sectors

Adapted from Christine L. Borgman, UCLA, Developing Data Attribution and Citation Practices and Standards


Importance of Attribution – 1

Social Practice
◎ Data reusability
◎ Reproducing research
◎ Replicating findings (facts and figures)

Need of data attribution
◎ Social expectations/requirements
◎ Legal responsibility
◎ The identifier(s) must be specified

Purpose of data attribution

Usability of Attributed Objects
◎ Identify the form and content
◎ Interpret
◎ Evaluate
◎ Open
◎ Read
◎ Combine
◎ Describe
◎ Reuse
◎ Compute upon
◎ Annotate


Importance of Attribution – 2

Identity and Persistence of Digital Objects

Identity

Identifier
✬ DOI, URI, URL

Naming and Namespaces
✬ Authors/Creators – ORCID (FUB, TUB, . . .), ISNI (People, Legal Entities, . . .)
✬ Generic/Specific – Registry Number . . .

Description
✬ Self-description
✬ Metadata augmentation description

Persistence
✬ Permanent
✬ Long-lived
✬ Scratch spaces
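To make the identifier types above concrete, here is a minimal Python sketch that turns a DOI into a persistent link through the public resolver at https://doi.org/. The function name, the list of accepted prefixes and the loose validity check are assumptions made for illustration; the sample DOI is used only as an example value.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"
# Prefixes people commonly paste in front of a DOI (illustrative list).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI given in any of the common spellings."""
    bare = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if bare.lower().startswith(prefix):
            bare = bare[len(prefix):]
            break
    # Loose sanity check only: DOIs start with "10." and contain a slash.
    if not bare.startswith("10.") or "/" not in bare:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are not safe inside a URL path.
    return DOI_RESOLVER + quote(bare, safe="/")


print(doi_to_url("doi:10.1000/182"))
```

The point of the sketch is that an attribution stores the identifier itself, while the URL is derived from it, so the link survives when the data moves to another host.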


Importance of Attribution – 3

Discoverability
◎ Identify – The existence of data objects with specified characteristics (data creators, data creation date, data creation method . . .)
◎ Locate – Depends on the description and representation of data and on the tools and services to search the data objects
◎ Retrieve – A variety of approaches to discover and reach data descriptions via standard web protocols: Semantic Web technologies, web crawlers and search engines

Provenance
◎ The chain of keeping and using data
◎ The transformations from the original state of datasets

Relationships
◎ Identification of units
◎ The links between various units
◎ Actions on relationships


Importance of Attribution – 4

Intellectual Property

Who owns the data rights?

What rights are associated?
✬ Reuse
✬ Reproduce
✬ Attribute

How open are the data?
✬ Open Data
✬ Open bibliography

Policy for Digital Objects

Whose policy? Data repositories, publishers, universities, investigators, funding agencies

Types of policy: What to release? What kind of description? What attribution? What citation? Who can describe, annotate . . .


The Importance of Metadata

Metadata: the main question is how to create durable links.

Metadata plays a prominent role: it is the documentation necessary to understand the data.

Questionnaires, user guides, methodology descriptions and record layouts are also provided.

Metadata is heterogeneous in format and is often the most unstructured data.

The Data Documentation Initiative (DDI) aims to provide a structured metadata standard.

Adapted from Mary Vardigan, Inter-University Consortium for Political and Social Research (ICPSR)


Main Reasons of Data Attribution

Data Quality
◎ Proper sources of datasets
◎ Accuracy or correctness of datasets
◎ Completeness of datasets

Previous work for verification and reuse
◎ Allowing others to access the underlying data
◎ Allowing researchers to check for mistakes and inconsistencies

Maintaining the research record
◎ Understanding what has been done before
◎ Attribution of existing work
◎ Understanding how a subject has changed over time


The Elements for Data Attribution - 1

Element: Description

Dataset Name: Specify a distinct name for each dataset within an organization, e.g. the EU Coral Reef dataset.

Authors' Names and Contact Details: Specify the name(s) of the author(s) of the data and their contact details, e.g. organization name and address, telephone number, e-mail address.

Data Description: Describe the contents of the dataset accurately.

Data Formats: Specify the supported data formats, such as XML, RDF, N-Triples, Turtle, CSV, XLS.

Data Handling Rules: Describe the data handling rules or policies that apply to the data and must be followed, such as Creative Commons CC0 1.0.

Data Access Methods: Specify how someone can get access to the data, either via a URL or via an API (SOAP or REST web service).


The Elements for Data Attribution - 2

Element: Description

Dataset Size: Specify an estimated size of the dataset, e.g. less than 10 MB, more than 100 MB, or greater.

Data Time-period: Specify the time period that the data covers, e.g. 2005-2010.

Data Status: Explain how often the dataset is updated, e.g. weekly, monthly or annually.

Data Factors: Specify the names of the factors in the dataset, e.g. time, year, square meter.

Data Availability: State whether the data already exists and is available to users and, if not, how it will become available on the web.

Language of Data: Specify whether the data is available in one language or also supports other languages.
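The elements from the two tables can be collected in a single record. The following Python sketch is only one possible arrangement: the class name, the field names, the format of the credit line and all sample values (including the example.org addresses) are assumptions made for illustration, not a standard.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetAttribution:
    """One record holding the attribution elements of a dataset."""
    name: str
    authors: list[str]
    contact: str
    description: str
    formats: list[str]
    handling_rules: str           # e.g. a licence name
    access_url: str               # URL or API endpoint
    size: str = ""
    time_period: str = ""
    update_frequency: str = ""    # the "Data Status" element
    factors: list[str] = field(default_factory=list)
    availability: str = ""
    languages: list[str] = field(default_factory=list)

    def credit_line(self) -> str:
        """Build a short human-readable attribution from the key elements."""
        parts = [f"{self.name} by {', '.join(self.authors)}"]
        if self.time_period:
            parts.append(f"covering {self.time_period}")
        parts.append(f"licensed under {self.handling_rules}")
        parts.append(f"available at {self.access_url}")
        return "; ".join(parts)


# Hypothetical sample values, loosely based on the examples in the tables.
record = DatasetAttribution(
    name="EU Coral Reef dataset",
    authors=["Example Organization"],
    contact="data@example.org",
    description="Survey data on coral reefs.",
    formats=["csv", "rdf"],
    handling_rules="Creative Commons CC0 1.0",
    access_url="https://example.org/datasets/coral-reef",
    time_period="2005-2010",
)
print(record.credit_line())
```

Keeping the elements in structured fields rather than in free text lets the same record feed both a human-readable credit line and machine-readable metadata.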


Vocabularies that Support Attribution - 1

Element: Description

dcterms:creator: An entity primarily responsible for making the resource. This property can be used to obtain information about the creators of a data item.

dcterms:source: A related resource from which the described resource is derived. This makes it possible to create provenance elements in which source data are associated with a data creation element.

dcterms:modified: The date on which the resource was changed. A modification of a data item can be treated as a data creation that produces a new, modified version of the original data item.

dcterms:publisher: An entity responsible for making the resource available. This property can be used to obtain information about the provider of an information resource where the actual information provider remains uncertain.

dcterms:provenance: A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, etc.


Vocabularies that Support Attribution - 2

Elements Description

sioc:has_creator, sioc:creator_of, sioc:has_modifier, sioc:modifier_of

sioc:has_owner, sioc:owner_of

sioc:earlier_version, sioc:later_version, sioc:next_version, sioc:previous_version

The Friend of a Friend (FOAF)

Semantic Web Publishing vocabulary (SWP)

The Web Of Trust (WOT)

The Ontology Metadata Vocabulary (OMV)

The Changeset Vocabulary.


Approaches for Attribution - 1

Several approaches are used to support the attribution of data:

Dublin Core Vocabulary

The Dublin Core (DC) approach provides a vocabulary for describing resources. DC relies on shared usage across different repositories and organizations. Distributed applications use DC terms to communicate about resources. Dublin Core consists of a core set of metadata elements and a set of qualifiers, which make it possible to interpret the elements semantically.

In the context of attribution, a subset of the elements and qualifiers can be employed; e.g. there are terms for the creator of a resource, for its publisher, and for the dates of its publication. A typical metadata statement consists of:

⚛ An identifier for the resource being described
⚛ A term from the Dublin Core vocabulary
⚛ The annotation value
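A three-part metadata statement of this kind maps naturally onto a triple. The Python sketch below writes such statements in N-Triples syntax using the DCMI Terms namespace http://purl.org/dc/terms/. The helper name, the dataset URL and the annotation values are hypothetical, and all values are written as plain string literals for simplicity.

```python
DCTERMS = "http://purl.org/dc/terms/"


def dc_statement(resource: str, term: str, value: str) -> str:
    """Serialize (resource identifier, Dublin Core term, value) as one N-Triples line."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    literal = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{resource}> <{DCTERMS}{term}> "{literal}" .'


dataset = "https://example.org/datasets/coral-reef"  # hypothetical identifier
statements = [
    dc_statement(dataset, "creator", "Example Organization"),
    dc_statement(dataset, "publisher", "Example Data Portal"),
    dc_statement(dataset, "modified", "2013-01-15"),
]
print("\n".join(statements))
```

In a real deployment the creator and publisher would more likely be identified by URIs than by strings, so that they can be linked to further descriptions.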


Approaches for Attribution - 2

Open Provenance Model

The Open Provenance Model (OPM) describes the processes by which data is produced or transformed into a new state, and it can represent the provenance of one or more data items from an old to a new state.

OPM describes provenance as a graph whose edges denote the primary relationships between the occurrences represented by its nodes. An OPM graph explains how multiple events led to the production of some data and shows how one piece of data was derived from another.

OPM classifies nodes into three types:

Artifacts – Pieces of data with a fixed value and context that represent an entity in a given state

Processes – Actions performed on artifacts in order to produce other artifacts

Agents – Entities that control the processes, such as users
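The three node types and their relationships can be sketched as a small typed graph. The edge names below (used, wasGeneratedBy, wasControlledBy, wasDerivedFrom, wasTriggeredBy) follow the dependency types of the OPM specification; the class, its methods and the sample nodes are assumptions made for illustration and are not part of OPM itself.

```python
NODE_KINDS = {"artifact", "process", "agent"}

# relation name -> (kind of the effect node, kind of the cause node)
EDGE_RULES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasDerivedFrom": ("artifact", "artifact"),
    "wasTriggeredBy": ("process", "process"),
}


class ProvenanceGraph:
    def __init__(self):
        self.nodes = {}   # node id -> kind
        self.edges = []   # (effect, relation, cause)

    def add_node(self, node_id, kind):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def add_edge(self, effect, relation, cause):
        effect_kind, cause_kind = EDGE_RULES[relation]
        if self.nodes[effect] != effect_kind or self.nodes[cause] != cause_kind:
            raise ValueError(f"{relation} cannot link {effect} to {cause}")
        self.edges.append((effect, relation, cause))

    def sources_of(self, artifact):
        """All artifacts the given artifact was derived from, directly or indirectly."""
        found, stack = set(), [artifact]
        while stack:
            current = stack.pop()
            for effect, relation, cause in self.edges:
                if effect == current and relation == "wasDerivedFrom" and cause not in found:
                    found.add(cause)
                    stack.append(cause)
        return found


# Hypothetical example: a user cleans raw data, and a report is derived from the result.
g = ProvenanceGraph()
for node_id, kind in [("raw", "artifact"), ("cleaned", "artifact"), ("report", "artifact"),
                      ("cleaning", "process"), ("alice", "agent")]:
    g.add_node(node_id, kind)
g.add_edge("cleaning", "used", "raw")
g.add_edge("cleaned", "wasGeneratedBy", "cleaning")
g.add_edge("cleaning", "wasControlledBy", "alice")
g.add_edge("cleaned", "wasDerivedFrom", "raw")
g.add_edge("report", "wasDerivedFrom", "cleaned")
print(sorted(g.sources_of("report")))
```

Walking the wasDerivedFrom edges backwards yields every dataset that should be credited when the final artifact is attributed.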


Current Issues and Challenges in Attribution

Several issues need to be considered to make the attribution process more suitable for tracking data. Data attribution is a main success factor for the adoption of data sharing, so these issues should be addressed when implementing it.

Granularity

◎ A dataset can consist of several files; each file contains many tables, records and data points.

◎ Additional subsets, such as features and parameters, are also used.

◎ A practical solution is to list the dataset at whatever level of granularity the host repository has chosen for assigning identifiers.

◎ If the repository provides identifiers at several levels of granularity, the finest-grained level that fulfills the requirements of the attribution should be used.

Contributor Identifiers

◎ Every contributor should be uniquely identifiable: each institute keeps a unique identifier for each contributor, to be used in connection with data contributions. Two schemes are used for attribution:

◎ The Open Researcher and Contributor ID (ORCID) is a scheme used specifically for academic authors.

◎ The International Standard Name Identifier (ISNI) is a standard for registering the public identities of parties, such as people and legal entities, involved in the creation or distribution of intellectual property.
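Contributor identifiers can be checked mechanically before they are stored with an attribution. An ORCID iD has 16 characters, the last of which is a check character computed with the ISO 7064 MOD 11-2 algorithm. The Python sketch below validates only the format and the checksum; it does not look the iD up in the ORCID registry, and the function names are chosen for illustration.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and the checksum of an ORCID iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A checksum catches typing errors only. Whether the iD belongs to the intended contributor still has to be confirmed against the registry.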


Current Issues and Challenges in Attribution

Micro-Attribution

◎ Micro-attribution credits contributors in a more compact way in order to keep the process manageable.

◎ It is used to credit people or organizations whose contributions do not fit the roles of data creator or compiler.

◎ Standard identifiers for both contributors and contributions are used to abbreviate the entries, and a table is included in the document's supplementary data.


Current Implementation Issues in Attribution

There are a couple of implementation issues concerning data repositories:

Manual and Automatic Use of Attribution

◎ The URL in a data attribution should lead to a landing page for the dataset rather than download the dataset directly.

◎ The landing page enables users to check that they have located the right dataset. It also creates a more consistent user experience between datasets available through direct access and those available through referred access.

◎ Deep linking provides direct access to specific datasets through the hierarchical structure of a website.

◎ Data are also processed by software tools, which support the reader: they can download selectively with regard to versions and formats, select particular files or datasets, and avoid data with license restrictions.

Versioning

◎ An important feature of an attribution system is that it allows a reader to identify and retrieve exactly the same resource that the author used.

◎ Several versions may be available to choose from, since data from various stages of processing can be made available as different versions.

◎ Data repositories should ensure that different versions are attributed independently, each with its own identifier.

◎ Problems arise when repositories have to deal with rapidly changing datasets. The different versions can be managed through time slices and snapshots.
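The snapshot idea can be sketched in a few lines of Python. In this sketch every snapshot receives its own identifier, and a reader can ask which version was current on a given date. The identifier scheme (name, date and a short content hash), the class and the sample data are assumptions made for illustration; a real repository would mint a persistent identifier such as a DOI for each version.

```python
import bisect
import hashlib
from datetime import date


class VersionedDataset:
    """Keeps dated snapshots of a dataset, each with its own identifier."""

    def __init__(self, name: str):
        self.name = name
        self._dates = []       # snapshot dates, kept sorted
        self._snapshots = {}   # date -> (identifier, content)

    def snapshot(self, when: date, content: bytes) -> str:
        """Store a snapshot and return the identifier assigned to it."""
        digest = hashlib.sha256(content).hexdigest()[:12]
        identifier = f"{self.name}@{when.isoformat()}#{digest}"
        if when not in self._snapshots:
            bisect.insort(self._dates, when)
        self._snapshots[when] = (identifier, content)
        return identifier

    def as_of(self, when: date) -> str:
        """Identifier of the latest snapshot taken on or before the given date."""
        index = bisect.bisect_right(self._dates, when)
        if index == 0:
            raise LookupError(f"no snapshot of {self.name} on or before {when}")
        return self._snapshots[self._dates[index - 1]][0]


# Hypothetical dataset with two monthly snapshots.
reef = VersionedDataset("coral-reef")
v1 = reef.snapshot(date(2013, 1, 1), b"site,cover\nA,0.41\n")
v2 = reef.snapshot(date(2013, 2, 1), b"site,cover\nA,0.39\n")
print(reef.as_of(date(2013, 1, 15)) == v1)
```

Because the identifier includes a hash of the content, two snapshots with different data can never share an identifier, which is what lets a reader retrieve exactly the version the author used.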


Conclusion

Attribution is the process of giving credit to the original creator of a dataset.

Attribution helps to make the research process more transparent and verifiable.

The attribution process helps to maintain data quality and integrity: previous work can be verified and reused, and a proper research record is kept.

Various elements are used to make an attribution, and several approaches exist to perform it.

Various issues still need to be resolved to make attribution processes more convenient.


Thank you for your attention!

Any questions?


References

1. Tony Rogers, Attribution Definition, How to use attribution in a new story. http://www.vocabulary.com/dictionary/attribution and http://journalism.about.com/od/writing/a/attribution.htm.

2. The Mind Wobbles, Attribution vs Citation: Do you know the difference? http://themindwobbles.wordpress.com/2009/07/10/attribution-vs-citation-do-you-know-the-difference/, July 2009.

3. Christine L. Borgman, Why are the attribution and citation of scientific data important? Report from Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop, January 2012.

4. W3C Website, What is provenance? http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance, modified November 2010.

5. W3C Website, A Working Definition of Provenance. http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance#A_Working_Definition_of_provenance, modified November 2010.

6. W3C Website, Provenance, Metadata, and Trust. http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance#Provenance.2C_Metadata.2C_and_Trust, modified November 2010.

7. Edzard Hofig, Jens Klessmann, Nils Barnickel (Fraunhofer), Open Innovation mechanism in Smart Cities, Revision: A, v1.6, July 2011.

8. Alex Ball and Monica Duke (2012), How to Cite Datasets and Link to Publications, revised June 2012.

9. D.G. Campbell, The use of Dublin Core in web annotation programs. In Proceedings of the International Conference on Dublin Core and Metadata Applications, Florence, Italy, 2002, pp. 105-110.

10. Simon Miles, Mapping Attribution Metadata to the Open Provenance Model. Future Generation Computer Systems 27 (6), King's College London, UK, pp. 806-811, 2011.

11. Dublin Core Metadata Initiative Usage Board, DCMI Metadata Terms. http://dublincore.org/documents/dcmi-terms/, January 2008.

12. Olaf Hartig, Provenance information in the Web of Data, Humboldt University Zu Berlin. In Proceedings of the 2nd Workshop on Linked Data on the Web (LDOW2009), April 2009.

13. D. Brickley and L. Miller, FOAF Vocabulary Specification. http://xmlns.com/foaf/spec/, November 2007.

14. U. Bojars and J. G. Breslin, SIOC Core Ontology Specification, Revision 1.30. http://rdfs.org/sioc/spec/, January 2009.

15. J. J. Carroll, C. Bizer, P. Hayes, and P. Stickler, Named Graphs, Provenance and Trust. In Proceedings of the 14th International World Wide Web Conference, ACM Press, pp. 613-622, May 2005.

16. D. Brickley, Web of Trust RDF Ontology. http://www.w3.org/tr/rdf-schema/, February 2004.

17. R. Palma, J. Hartmann, and P. Haase, OMV - Ontology Metadata Vocabulary for the Semantic Web, v2.4. http://omv2.sourceforge.net/, January 2008.

18. S. Tunnicliffe and I. Davis, Changeset Vocabulary. http://vocab.org/changeset/schema.html, March 2006.

19. Li Ding, James Michaelis, Jim McCusker, and Deborah L. McGuinness, Linked Provenance Data: A Semantic Web-based approach to interoperable workflow traces. Elsevier, Future Generation Computer Systems, Vol. 27, October 2010.

20. Y. Simmhan, B. Plale, and D. Gannon, A Survey of Data Provenance in e-Science. SIGMOD Record, Computer Science Department, Indiana University, Vol. 34, Issue No. 3, pp. 31-36, ACM, September 2005.

21. P. Buneman, S. Khanna, and W. C. Tan, Data Provenance: Some Basic Issues. In Proceedings of the 20th Conference on Foundations of Software Technology and Theoretical Computer Science (FST TCS), pp. 87-93, Springer, December 2000.

22. M. Hausenblas, W. Slany, and D. Ayers, A Performance and Scalability Metric for Virtual RDF Graphs. In Proceedings of the 3rd Workshop on Scripting for the Semantic Web (SFSW) at ESWC, June 2007.