ATTRIBUTION
Department of Computer Science
Software Engineering Research Group, Berlin, Germany
Abdul Saboor
WELCOME TO THE PRESENTATION
Here comes your footer Slide 2
Main Agenda For The Presentation
◎ What Are Attribution and Data Attribution?
◎ The Importance of Attribution
◎ Why Should Attribution Be Used?
◎ Main Reasons for Data Attribution
◎ Key Elements of Attribution
◎ Approaches Used for Attribution
◎ Issues and Challenges in Attribution
◎ Summary of the Presentation
◎ References
AGSE MEQ 2013
Attribution of Open Data
What is Attribution? Definitions
Attribution often involves identifying the author or source of a piece of written material or a work of art.[1]
Attribution is the acknowledgement of the use of someone else's information, data, or other work.[2]
Attribution is about crediting a copyright holder according to the terms of a copyright licence, usually crediting the author or artist of a work such as music, fiction, video, or photography.
The act of attribution is defined as establishing a relationship between users and the creator(s) of some work. Citing other authors means referring to them or to their work.
Data Attribution?
Data attribution means acknowledging the data creators and indicating the availability of the data.
Granularity: the features inside the datasets that are being referred to.
Versioning: for dynamic or regularly updated data, which version needs to be attributed.
Location of Data: a persistent link, such as a Digital Object Identifier (DOI).
Acknowledging Creators: ensure that credit is given to the authors who deserve it.
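As a rough illustration, the four aspects above can be captured in one record per citation. This is a minimal sketch under our own assumptions: the class `DataAttribution`, its field names, the citation layout, and the DOI `10.1234/example` are hypothetical placeholders, not a standard citation format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataAttribution:
    """Hypothetical attribution record covering the four aspects above."""
    title: str
    creators: List[str]           # acknowledging creators
    version: str                  # versioning: the exact version that was used
    doi: str                      # location of data: persistent identifier
    subset: Optional[str] = None  # granularity: the feature or subset referred to

    def citation(self) -> str:
        """Render a simple, human-readable citation string."""
        who = ", ".join(self.creators)
        what = self.title if self.subset is None else f"{self.title}, {self.subset}"
        return f"{who}. {what} (version {self.version}). https://doi.org/{self.doi}"


record = DataAttribution(
    title="EU Coral Reef dataset",
    creators=["A. Author", "B. Author"],
    version="2.1",
    doi="10.1234/example",
    subset="Table 3",
)
print(record.citation())
# A. Author, B. Author. EU Coral Reef dataset, Table 3 (version 2.1). https://doi.org/10.1234/example
```

Keeping version and subset as explicit fields means two citations of different versions or tables of the same dataset never collapse into the same string.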
Data From Various Sectors
Adapted from Christine L. Borgman, UCLA, Developing Data Attribution and Citation Practices and Standards
Importance of Attribution – 1
Need for Data Attribution
◎ Social expectations/requirements
◎ Legal responsibility
◎ The identifier(s) must be specified
Purpose of Data Attribution
Social Practice
◎ Data reusability
◎ Reproducing research
◎ Replicating findings (facts & figures)
Usability of Attributed Objects
◎ Identify the form and content
◎ Interpret, evaluate, open, read, combine, describe, reuse, compute upon, annotate
Importance of Attribution – 2
Identity and Persistence of Digital Objects
Identity
◎ Identifier
✬ DOI, URI, URL
◎ Naming and Namespaces
✬ Authors/Creators – ORCID (FUB, TUB, . . .), ISNI (people, legal entities, . . .)
✬ Generic/Specific – Registry Number . . .
◎ Description
✬ Self-description
✬ Metadata augmentation description
Persistence
◎ Permanent
◎ Long-lived
◎ Scratch spaces
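A DOI is one of the identifiers listed above. The following sketch assumes a simplified syntax check (the `10.` prefix with a numeric registrant code, a slash, and a non-empty suffix); it splits a DOI into prefix and suffix and builds a persistent link through the `https://doi.org/` resolver. The function names and the regular expression are illustrative assumptions, not the full DOI syntax rules.

```python
import re
from typing import Tuple

# Loose DOI check (an assumption, not the full DOI syntax):
# "10." + numeric registrant code, then "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9}(?:\.\d+)*)/(\S+)$")
RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/",
                     "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def split_doi(text: str) -> Tuple[str, str]:
    """Return (prefix, suffix) of a DOI, accepting common written forms."""
    doi = text.strip()
    for prefix in RESOLVER_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    match = DOI_PATTERN.match(doi)
    if match is None:
        raise ValueError(f"not a DOI: {text!r}")
    return match.group(1), match.group(2)


def resolver_url(text: str) -> str:
    """Build a persistent link that resolves through the DOI proxy."""
    prefix, suffix = split_doi(text)
    return f"https://doi.org/{prefix}/{suffix}"


print(resolver_url("doi:10.1234/coral-reef.v2"))  # https://doi.org/10.1234/coral-reef.v2
```

Normalising every written form to one resolver link is what keeps the reference stable even if the landing page behind it moves.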
Importance of Attribution – 3
Discoverability
◎ Identify: determine whether data objects with specified characteristics exist (data creators, data creation date, data creation method . . .)
◎ Locate: depends on the description and representation of the data and on the tools and services used to search for data objects
◎ Retrieve: a variety of approaches to discover and reach data descriptions via standard web protocols, Semantic Web technologies, web crawlers, and search engines
Provenance
◎ The chain of custody and use of the data
◎ The transformations from the original state of the datasets
Relationships
◎ Identification of units
◎ The links between various units
◎ Actions on relationships
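Discoverability by specified characteristics can be pictured as filtering a catalogue on metadata fields. The sketch below assumes a hypothetical in-memory catalogue (`DataObject`, `discover`) holding the three characteristics named above; a real repository would answer such queries through its own search services.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DataObject:
    """Hypothetical catalogue entry with the characteristics named above."""
    identifier: str
    creator: str
    created: date
    method: str


def discover(objects: Iterable[DataObject],
             creator: Optional[str] = None,
             created_after: Optional[date] = None,
             method: Optional[str] = None) -> List[str]:
    """Return identifiers of objects that match every characteristic given."""
    hits = []
    for obj in objects:
        if creator is not None and obj.creator != creator:
            continue
        if created_after is not None and obj.created <= created_after:
            continue
        if method is not None and obj.method != method:
            continue
        hits.append(obj.identifier)
    return hits


catalogue = [
    DataObject("ds-1", "Alice", date(2011, 5, 2), "survey"),
    DataObject("ds-2", "Alice", date(2012, 9, 14), "sensor"),
    DataObject("ds-3", "Bob", date(2013, 1, 7), "survey"),
]
print(discover(catalogue, creator="Alice", created_after=date(2012, 1, 1)))  # ['ds-2']
```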
Importance of Attribution – 4
Intellectual Property
Policy for Digital Objects
◎ Whose policy? Data repositories, publishers, universities, investigators, funding agencies
◎ What rights are associated?
✬ Reuse
✬ Reproduce
✬ Attribute
◎ Who owns the data rights?
◎ How open are the data?
✬ Open data
✬ Open bibliography
Types of Policy
◎ What to release? What kind of description? What attribution? What citation? Who can describe, annotate . . .
The Importance of Metadata
Main purpose of metadata: how to create durable links?
Metadata play a prominent role: documentation is necessary to understand the data
Questionnaires, user guides, methodology descriptions, and record layouts are also provided
Documentation is heterogeneous in format and is the most unstructured part of the data
The Data Documentation Initiative (DDI) provides a structured metadata standard
Adapted from Mary Vardigan, Inter-University Consortium for Political and Social Research (ICPSR)
Main Reasons for Data Attribution
Data Quality
◎ Proper sources of datasets
◎ Accuracy or correctness of datasets
◎ Completeness of datasets
Previous Work for Verification and Reuse
◎ Allowing others to access the underlying data
◎ Allowing researchers to check for mistakes and inconsistencies
Maintaining the Research Record
◎ Understanding what has been done before
◎ Attribution of existing work
◎ Understanding how a subject has changed over time
The Elements for Data Attribution - 1
Element: Description
Dataset Name: Give each dataset published by an organization a specific name, e.g., EU Coral Reef dataset.
Authors' Names and Contact Details: Specify the name(s) of the data author(s) and their contact details, e.g., organization name and address, telephone number, e-mail address, etc.
Data Description: Describe the contents of the datasets accurately.
Data Formats: Specify the supported data formats, such as XML, RDF, N-Triples, Turtle, CSV, XLS, etc.
Data Handling Rules: Describe the data handling rules or policies that apply to the data and must be followed, such as Creative Commons CC0 1.0.
Data Access Methods: Specify how someone can access the data, either via a URL or via an API (SOAP or REST web service).
The Elements for Data Attribution - 2
Element: Description
Dataset Size: Specify the estimated size of the dataset, e.g., less than 10 MB or more than 100 MB.
Data Time Period: Specify the time period that the data describe, e.g., 2005-2010.
Data Status: Explain how often the dataset is updated, e.g., weekly, monthly, or annually.
Data Factors: Specify the names of the factors in the dataset, e.g., time, year, square meter, etc.
Data Availability: State whether the data already exist and are available to users and, if not, how the data will become available on the web.
Language of Data: Specify whether the data are available in one language or in several languages.
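The two element tables can be used as a checklist before a dataset is published. This minimal sketch assumes hypothetical snake_case keys for the twelve elements and treats a missing or empty value as an incomplete element; it is not a published metadata schema, and the record values are placeholders.

```python
from typing import Dict, List

# Hypothetical keys for the twelve elements in the two tables above.
ATTRIBUTION_ELEMENTS = [
    "dataset_name", "authors", "description", "formats", "handling_rules",
    "access_methods", "size", "time_period", "status", "factors",
    "availability", "language",
]


def missing_elements(record: Dict[str, object]) -> List[str]:
    """List elements that are absent or empty (None, "", [] ...) in a record."""
    return [name for name in ATTRIBUTION_ELEMENTS if not record.get(name)]


def completeness(record: Dict[str, object]) -> float:
    """Fraction of the elements that are filled in."""
    missing = len(missing_elements(record))
    return 1 - missing / len(ATTRIBUTION_ELEMENTS)


record = {
    "dataset_name": "EU Coral Reef dataset",
    "authors": [{"name": "A. Author", "email": "a.author@example.org"}],
    "description": "",
    "formats": ["csv", "rdf"],
    "handling_rules": "CC0 1.0",
    "access_methods": ["https://example.org/api/coral"],
    "time_period": "2005-2010",
    "status": "monthly",
    "language": ["en"],
}
print(missing_elements(record))  # ['description', 'size', 'factors', 'availability']
```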
Vocabularies that Support Attribution - 1
Element: Description
dcterms:creator: An entity primarily responsible for making the resource. This property can be used to capture information about the creators of a data item.
dcterms:source: A related resource from which the described resource is derived. This makes it possible to create provenance elements that are associated, as source data, with a data creation element.
dcterms:modified: The date on which the resource was changed. Modifying a data item can be treated as a data creation that produces a new, modified version of the original data item.
dcterms:publisher: An entity responsible for making the resource available. This property can be used to capture information about the provider of an information resource when the actual information provider remains uncertain.
dcterms:provenance: This property links a resource to a statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, etc.
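Because `dcterms:source` points from a derived resource to the resource it came from, following these links recovers the derivation chain, and `dcterms:creator` on each step says whom to credit. The sketch below stores triples as plain Python tuples (an assumption made to stay dependency-free), assumes at most one source per step, and uses hypothetical `ex:` resource names; only the DCMI terms namespace is real.

```python
from typing import Dict, List, Tuple

DCTERMS = "http://purl.org/dc/terms/"
Triple = Tuple[str, str, str]  # (subject, predicate, object)


def values(triples: List[Triple], subject: str, term: str) -> List[str]:
    """All objects of `subject` for the given Dublin Core term."""
    return [o for s, p, o in triples if s == subject and p == DCTERMS + term]


def lineage(triples: List[Triple], resource: str) -> List[str]:
    """Follow dcterms:source links back towards the original resource."""
    chain = [resource]
    seen = {resource}
    sources = values(triples, resource, "source")
    while sources and sources[0] not in seen:  # assumes one source per step
        current = sources[0]
        chain.append(current)
        seen.add(current)
        sources = values(triples, current, "source")
    return chain


def credits(triples: List[Triple], resource: str) -> Dict[str, List[str]]:
    """Creators to acknowledge at every step of the derivation chain."""
    return {r: values(triples, r, "creator") for r in lineage(triples, resource)}


triples = [
    ("ex:v3", DCTERMS + "source", "ex:v2"),
    ("ex:v3", DCTERMS + "creator", "ex:carol"),
    ("ex:v3", DCTERMS + "modified", "2013-05-01"),
    ("ex:v2", DCTERMS + "source", "ex:v1"),
    ("ex:v2", DCTERMS + "creator", "ex:bob"),
    ("ex:v1", DCTERMS + "creator", "ex:alice"),
]
print(lineage(triples, "ex:v3"))  # ['ex:v3', 'ex:v2', 'ex:v1']
```

The `seen` set stops the walk if the data accidentally contain a cycle of source links.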
Vocabularies that Support Attribution - 2
SIOC properties:
sioc:has_creator, sioc:creator_of, sioc:has_modifier, sioc:modifier_of
sioc:has_owner, sioc:owner_of
sioc:earlier_version, sioc:later_version, sioc:next_version, sioc:previous_version
Other vocabularies:
The Friend of a Friend (FOAF) vocabulary
The Semantic Web Publishing vocabulary (SWP)
The Web of Trust (WOT) vocabulary
The Ontology Metadata Vocabulary (OMV)
The Changeset Vocabulary
Approaches for Attribution - 1
Several approaches are used to support the attribution of data:
Dublin Core Vocabulary
The Dublin Core (DC) approach provides a vocabulary for describing resources. DC relies on shared usage across different repositories and organizations: distributed applications use DC terms to communicate about resources. Dublin Core consists of a core set of metadata elements and a set of qualifiers, which make it possible to interpret the elements semantically.
In the context of attribution, a subset of the elements and qualifiers can be employed; e.g., there are terms for the creator of a resource, for its publisher, and for the dates of its publication. A typical metadata statement consists of:
⚛ An identifier for the resource being described
⚛ A term from the Dublin Core vocabulary
⚛ The annotation value
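Each metadata statement above (identifier, term, value) maps naturally onto one RDF triple. The sketch below serializes such statements as N-Triples using the DCMI terms namespace `http://purl.org/dc/terms/`; the resource URI and values are hypothetical placeholders, and all values are written as plain literals for simplicity.

```python
from typing import List, Tuple

DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def statement(resource: str, term: str, value: str) -> str:
    """One statement: resource identifier, Dublin Core term, annotation value."""
    return f'<{resource}> <{DCTERMS}{term}> "{escape_literal(value)}" .'


def describe(resource: str, annotations: List[Tuple[str, str]]) -> str:
    """Serialize several statements about one resource as N-Triples lines."""
    return "\n".join(statement(resource, term, value) for term, value in annotations)


print(describe("http://example.org/dataset/coral-reef", [
    ("creator", "EU Coral Reef Team"),
    ("publisher", "Example Data Portal"),
    ("issued", "2013-05-01"),
]))
```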
Approaches for Attribution - 2
Open Provenance Model
The Open Provenance Model (OPM) describes processes in which data are produced or transformed into a new state, and it can represent the provenance of one or more data items from an old to a new state.
The OPM graph model describes provenance as a graph whose nodes represent occurrences and whose edges denote the primary relationships between them. An OPM graph explains how multiple events led to the production of some data and shows how one piece of data was derived from another.
OPM classifies nodes into three kinds:
Artifacts – pieces of data with a fixed value and context that represent an entity in a given state
Processes – actions performed on artifacts in order to produce other artifacts
Agents – entities that control the processes, such as users
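The three node kinds can be sketched as a small in-memory graph. The example below assumes three of OPM's edge types (`used`, `wasGeneratedBy`, `wasControlledBy`), oriented from effect to cause as in OPM, and walks them to find everything an artifact depends on, including the agents who should be credited. The class, method, and node names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NODE_KINDS = ("artifact", "process", "agent")


class OPMGraph:
    """Tiny in-memory sketch of an OPM graph; edges point from effect to cause."""

    def __init__(self) -> None:
        self.kind: Dict[str, str] = {}
        self.edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, node: str, kind: str) -> None:
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.kind[node] = kind

    def used(self, process: str, artifact: str) -> None:
        self.edges[process].append(("used", artifact))

    def was_generated_by(self, artifact: str, process: str) -> None:
        self.edges[artifact].append(("wasGeneratedBy", process))

    def was_controlled_by(self, process: str, agent: str) -> None:
        self.edges[process].append(("wasControlledBy", agent))

    def ancestry(self, node: str) -> Set[str]:
        """Every node reachable by walking edges from effect to cause."""
        seen: Set[str] = set()
        stack = [node]
        while stack:
            for _label, cause in self.edges.get(stack.pop(), []):
                if cause not in seen:
                    seen.add(cause)
                    stack.append(cause)
        return seen

    def of_kind(self, nodes: Set[str], kind: str) -> Set[str]:
        return {n for n in nodes if self.kind.get(n) == kind}


g = OPMGraph()
for node, kind in [("raw.csv", "artifact"), ("clean.csv", "artifact"),
                   ("chart.png", "artifact"), ("clean", "process"),
                   ("plot", "process"), ("alice", "agent"), ("bob", "agent")]:
    g.add(node, kind)
g.used("clean", "raw.csv")
g.was_generated_by("clean.csv", "clean")
g.was_controlled_by("clean", "alice")
g.used("plot", "clean.csv")
g.was_generated_by("chart.png", "plot")
g.was_controlled_by("plot", "bob")

history = g.ancestry("chart.png")
print(sorted(g.of_kind(history, "agent")))     # ['alice', 'bob']
print(sorted(g.of_kind(history, "artifact")))  # ['clean.csv', 'raw.csv']
```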
Granularity
◎ A dataset can consist of several files, and each file contains many tables, records, and data points.
◎ Additional subsets, such as features and parameters, are also used.
◎ A practical solution is to list the dataset at whatever level of granularity the host repository has chosen for assigning identifiers.
◎ If the repository provides identifiers at several levels of granularity, the finest-grained level that fulfils the requirements of attribution should be used.
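The last rule can be read as: among the levels that carry an identifier, choose the finest one that still covers everything being cited. This sketch assumes that levels are nested and listed from coarse to fine, and that each level records which items it covers; the level names and identifiers are made up.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# (level name, identifier or None, items covered), ordered coarse -> fine.
Level = Tuple[str, Optional[str], FrozenSet[str]]


def pick_identifier(levels: List[Level], cited: Set[str]) -> Tuple[str, str]:
    """Finest identified level that still covers everything being cited."""
    best: Optional[Tuple[str, str]] = None
    for name, identifier, covers in levels:
        if identifier is not None and cited <= covers:
            best = (name, identifier)  # later matches are finer, so they win
    if best is None:
        raise LookupError("no identified level covers the cited data")
    return best


levels: List[Level] = [
    ("dataset", "10.1234/reef", frozenset({"t1", "t2", "t3"})),
    ("file", "10.1234/reef.f1", frozenset({"t1", "t2"})),
    ("table", None, frozenset({"t1"})),  # repository assigns no identifier here
]
print(pick_identifier(levels, {"t1"}))        # ('file', '10.1234/reef.f1')
print(pick_identifier(levels, {"t1", "t3"}))  # ('dataset', '10.1234/reef')
```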
Current Issues and Challenges in Attribution
Several issues need to be considered to make the attribution process more suitable for tracking data. Data attribution is a key success factor for the adoption of data sharing, and the issues below are relevant when implementing it.
Contributor Identifiers
◎ Every contributor is unique in their organizational activities, and every institute has a unique identifier for each contributor, to be used in connection with data contributions. Two schemes are used for attribution:
◎ The Open Researcher and Contributor ID (ORCID) is a scheme specifically for academic authors.
◎ The International Standard Name Identifier (ISNI) is a standard for identifying the public identities of parties, such as people and legal entities, involved in the creation or distribution of intellectual property.
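ORCID iDs end in a check character computed with ISO 7064 MOD 11-2 over the first 15 digits, which lets software catch a mistyped identifier before an attribution is recorded. A minimal sketch of that check follows; the function names are our own.

```python
def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, digits and check character of an iD like 0000-0002-1825-0097."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or any(c not in "0123456789" for c in compact[:15]):
        return False
    return orcid_check_char(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A result of 10 is written as the letter X, so a valid iD may end in X rather than a digit.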
Micro-Attribution
◎ Credits contributors in a more compact way in order to keep the process manageable.
◎ It is used to credit people or organizations whose contributions do not fit the roles of data creator or compiler.
◎ Standard identifiers for both contributors and contributions are used to abbreviate the entities, and a table listing them is included in the document's supplementary data.
Manual and Automatic Use of Attribution
◎ For data attribution, the URL should lead to a landing page for the dataset rather than to a direct download of the dataset.
◎ The landing page enables users to make sure that they have located the right dataset. Landing pages also create a more consistent user experience between datasets available through direct access and those available through referred access.
◎ Deep linking provides direct access to specific datasets within the hierarchical structure of a website.
◎ Data are processed by software tools, and these tools can support the reader: they can download selectively with regard to versions and formats, select particular files or datasets, and avoid data with license restrictions.
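Selective retrieval by software tools can be sketched as a filter over the files listed for a dataset. The `Distribution` record layout, the license labels, and the `example.org` URLs below are hypothetical assumptions, not any repository's actual API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Distribution:
    """Hypothetical description of one downloadable file of a dataset."""
    version: str
    fmt: str
    license: str
    url: str


def select_downloads(dists: Iterable[Distribution], version: str,
                     formats: Set[str], allowed_licenses: Set[str]) -> List[str]:
    """URLs a tool would fetch: the right version, a wanted format,
    and only licenses the user has accepted."""
    return [d.url for d in dists
            if d.version == version
            and d.fmt in formats
            and d.license in allowed_licenses]


dists = [
    Distribution("2.0", "csv", "CC0-1.0", "https://example.org/reef/2.0/data.csv"),
    Distribution("2.0", "rdf", "CC0-1.0", "https://example.org/reef/2.0/data.rdf"),
    Distribution("2.0", "xls", "restricted", "https://example.org/reef/2.0/data.xls"),
    Distribution("1.0", "csv", "CC0-1.0", "https://example.org/reef/1.0/data.csv"),
]
print(select_downloads(dists, "2.0", {"csv", "xls"}, {"CC0-1.0"}))
# ['https://example.org/reef/2.0/data.csv']
```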
Current Implementation Issues in Attribution
There are a couple of issues concerning data repositories:
Versioning
◎ An important feature of an attribution system is that it lets a reader identify and retrieve exactly the same resource that the author used.
◎ Several versions may be available to choose from, since data from various stages of processing can be made available as different versions.
◎ Data repositories ensure that different versions are attributed independently, each with its own identifier.
◎ Problems arise when repositories have to deal with rapidly changing datasets; such versions can be managed through time slices and snapshots.
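With time slices, a repository can answer "which version was current when the author used the data?" by looking up the latest snapshot taken at or before a given moment. This sketch assumes snapshots are kept as a sorted list of (timestamp, identifier) pairs; the identifiers are hypothetical.

```python
import bisect
from datetime import datetime
from typing import List, Tuple


def version_at(snapshots: List[Tuple[datetime, str]], when: datetime) -> str:
    """Identifier of the snapshot that was current at `when`.

    `snapshots` must be sorted by time; each entry is (taken_at, identifier).
    """
    times = [taken_at for taken_at, _ in snapshots]
    i = bisect.bisect_right(times, when)  # number of snapshots taken at or before `when`
    if i == 0:
        raise LookupError("no snapshot existed at that time")
    return snapshots[i - 1][1]


snapshots = [
    (datetime(2013, 1, 1), "reef-2013-01"),
    (datetime(2013, 3, 1), "reef-2013-03"),
    (datetime(2013, 6, 1), "reef-2013-06"),
]
print(version_at(snapshots, datetime(2013, 4, 15)))  # reef-2013-03
```

Binary search keeps the lookup cheap even when a rapidly changing dataset accumulates many snapshots.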
Conclusion
Attribution is the process of giving credit to the original creator(s) of a dataset.
Attribution helps to make the research process more transparent and authenticated
The attribution process maintains data quality and integrity, allows previous work to be verified and reused, and maintains a proper research record.
Various elements are used to make an attribution, and several approaches are used to perform it.
Various issues still need to be resolved to make attribution processes more convenient.
Thanks for your attention!
Any Questions?
References
1. Tony Rogers, Attribution Definition, How to use attribution in a news story. http://www.vocabulary.com/dictionary/attribution and http://journalism.about.com/od/writing/a/attribution.htm.
2. The Mind Wobbles, Attribution vs Citation: Do you know the difference? http://themindwobbles.wordpress.com/2009/07/10/attribution-vs-citation-do-you-know-the-difference/, July 2009.
3. Christine L. Borgman, Why are the attribution and citation of scientific data important? Report from Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop, January 2012.
4. W3C Website, What Is Provenance? http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance, modified November 2010.
5. W3C Website, A Working Definition of Provenance. http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance#A_Working_Definition_of_provenance, modified November 2010.
6. W3C Website, Provenance, Metadata, and Trust. http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance#Provenance.2C_Metadata.2C_and_Trust, modified November 2010.
7. Edzard Hofig, Jens Klessmann, Nils Barnickel (Fraunhofer), Open Innovation mechanism in Smart Cities, Revision A, v1.6, July 2011.
8. Alex Ball and Monica Duke (2012), How to Cite Datasets and Link to Publications, revised June 2012.
9. D. G. Campbell, The use of Dublin Core in web annotation programs. In Proceedings of the International Conference on Dublin Core and Metadata Applications, Florence, Italy, 2002, pp. 105-110.
10. Simon Miles (King's College London, UK), Mapping Attribution Metadata to the Open Provenance Model. Future Generation Computer Systems 27(6), pp. 806-811, 2011.
11. Dublin Core Metadata Initiative Usage Board, DCMI Metadata Terms. http://dublincore.org/documents/dcmi-terms/, January 2008.
12. Olaf Hartig (Humboldt-Universität zu Berlin), Provenance Information in the Web of Data. In Proceedings of the 2nd Workshop on Linked Data on the Web (LDOW2009), April 2009.
13. D. Brickley and L. Miller, FOAF Vocabulary Specification. http://xmlns.com/foaf/spec/, November 2007.
14. U. Bojars and J. G. Breslin, SIOC Core Ontology Specification, Revision 1.30. http://rdfs.org/sioc/spec/, January 2009.
15. J. J. Carroll, C. Bizer, P. Hayes, and P. Stickler, Named Graphs, Provenance and Trust. In Proceedings of the 14th International World Wide Web Conference, ACM Press, pp. 613-622, May 2005.
16. D. Brickley, Web of Trust RDF Ontology. http://www.w3.org/tr/rdf-schema/, February 2004.
17. R. Palma, J. Hartmann, and P. Haase, OMV: Ontology Metadata Vocabulary for the Semantic Web, v2.4. http://omv2.sourceforge.net/, January 2008.
18. S. Tunnicliffe and I. Davis, Changeset Vocabulary. http://vocab.org/changeset/schema.html, March 2006.
19. Li Ding, James Michaelis, Jim McCusker, and Deborah L. McGuinness, Linked Provenance Data: A Semantic Web-based Approach to Interoperable Workflow Traces. Future Generation Computer Systems, Vol. 27, Elsevier, October 2010.
20. Y. Simmhan, B. Plale, and D. Gannon (Computer Science Department, Indiana University), A Survey of Data Provenance in e-Science. SIGMOD Record, Vol. 34, Issue No. 3, pp. 31-36, ACM, September 2005.
21. P. Buneman, S. Khanna, and W. C. Tan, Data Provenance: Some Basic Issues. In Proceedings of the 20th Conference on Foundations of Software Technology and Theoretical Computer Science (FST TCS), pp. 87-93, Springer, December 2000.
22. M. Hausenblas, W. Slany, and D. Ayers, A Performance and Scalability Metric for Virtual RDF Graphs. In Proceedings of the 3rd Workshop on Scripting for the Semantic Web (SFSW) at ESWC, June 2007.