Resource Curation and Automated Resource Discovery
![Page 1: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/1.jpg)
Resource Curation and Automated Resource Discovery
![Page 2: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/2.jpg)
NIF Resources
• NIF is cataloging websites that house information about databases, atlases, software tools, data, transgenic mice and other things that we consider of value to the neuroscience community.
![Page 3: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/3.jpg)
Definition of Resource
• Individual resource boundary: a website shall be considered an individual resource if it is maintained by a single entity and comprises one or more individual web pages that are related by a theme and HTML links.
![Page 4: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/4.jpg)
Resource Nomination
Registry (4,500)
Public Registry (2,100)
NIF Web (499,952)
Level 2/3 (24)
User Feedback
*Automated tools
Web Crawl
Registry Subset
Nomination
Check:
- Links
- Annotation
- Vocabulary
*Automated updates Level 2 tools
*In Development
![Page 5: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/5.jpg)
Resource is Nominated: NIF Staff, Contact at Meetings, Web Form
In NIF already?
Assign Metadata
- short name, long name, url
- description (short description 1-3 sentences, longer description)
- parent organization (physical location, university)
- support (grant numbers)
- keywords (species, technique, structure, age, level, disease, topic)
Decision: Should it be included?
Assign resource type
Do not include
Keep Record
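The nomination steps above can be sketched as a small record type plus one curation function. This is only an illustration, not NIF's actual implementation: the field types, the `curate` function, and the choice to key the registry on the url are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResourceRecord:
    # Field names follow the metadata list on the slide; types are assumed.
    short_name: str
    long_name: str
    url: str
    short_description: str = ""   # 1-3 sentences
    long_description: str = ""
    parent_organization: str = ""  # physical location, university
    support: List[str] = field(default_factory=list)  # grant numbers
    keywords: Dict[str, List[str]] = field(default_factory=dict)  # e.g. "species" -> [...]
    resource_type: Optional[str] = None
    included: Optional[bool] = None


def curate(record, registry, include, resource_type=None):
    """Walk one nominated record through the flowchart steps.

    registry is a dict mapping url -> ResourceRecord (hypothetical storage).
    """
    # "In NIF already?"
    if record.url in registry:
        return "already in NIF"
    # "Decision: Should it be included?"
    record.included = include
    if include:
        # "Assign resource type"
        record.resource_type = resource_type
    # The record is kept either way ("Do not include" / "Keep Record").
    registry[record.url] = record
    return "included" if include else "not included, record kept"
```

Keeping rejected records in the same store means a later nomination of the same url is caught by the "In NIF already?" check instead of being reviewed again.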
![Page 6: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/6.jpg)
Resources Difficult to Categorize
• Link aggregates
• Large organizations (NIH)
• Poorly documented databases
• Private data sites
• Clinical trials that are still recruiting
– Experimental protocol
• Commercial entities
• Journals
– JOVE
– supplemental materials
![Page 7: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/7.jpg)
CINdy the resource curation tool
![Page 8: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/8.jpg)
![Page 9: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/9.jpg)
Resource Ontology (BRO)
• Data Resource: provides access to data; database, atlas, book
• Software Resource: software programs or source code
• Material Resource: reagents, tissue samples or organisms
• Funding Resource: grants or contracts
• Training Resource: educational materials, training programs
• Job Resource: employment opportunities
• People Resource: access to individual people’s web sites
![Page 10: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/10.jpg)
NIF Service vs BRO Service
![Page 11: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/11.jpg)
Solutions Consolidating Classes
• Synonyms where appropriate: e.g., Material storage service vs. Material storage repository.
• Temporary mapping, where appropriate
– *Deprecated terms must be maintained*
• Data loss
• Moving forward with a joint descriptive terminology!
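One way to picture the synonym and temporary-mapping approach is a pair of lookup tables that are consulted before a term is used. The sketch below is an assumption-laden illustration: which of the two "Material storage" labels is preferred is not stated on the slide, and the deprecated entry is a made-up placeholder.

```python
# Hypothetical tables. Which label is preferred is an assumption.
SYNONYMS = {
    "material storage service": "material storage repository",
}

# Deprecated terms are kept here, not deleted, so old annotations still resolve.
DEPRECATED = {
    "example deprecated term": "example replacement term",  # placeholder entry
}


def resolve_term(label):
    """Return (preferred_label, status) for a class label."""
    key = label.strip().lower()
    if key in DEPRECATED:
        return DEPRECATED[key], "deprecated"
    if key in SYNONYMS:
        return SYNONYMS[key], "synonym"
    return key, "preferred"
```

Returning the status alongside the label lets a curator see when an annotation still relies on a deprecated or synonymous term.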
![Page 12: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/12.jpg)
Evolution of the NIF Resource Ontology
Object
• Materials
– Biomaterials
– Reagents
• Software
• People
• Grants
• Jobs
• Information
Function
• Service
– Storage
– Production
• Funding
• Job Service
• Community-building
Target Audience
• General
• Kids
• Student
• Medical
• Researcher
Data Type
• Structured
– Database
– Atlas
• Unstructured
– Journal
– Webpage
Data Format
• Text
• RDF Text
• Picture
• Video
![Page 13: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/13.jpg)
![Page 14: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/14.jpg)
Resource Boundary?
• Software Library
– Software tool
• Plugin: I2B2
• Our solution: use url as a uniqueness qualifier
– Our problem: a single url may house several resources
– Individual plugins can have individual urls
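Using the url as a uniqueness qualifier, and the problem it creates, can be sketched as follows. The normalization rules (lowercasing the host, dropping a trailing slash and the fragment) and the function names are assumptions for illustration; the example urls are invented.

```python
from urllib.parse import urlsplit, urlunsplit


def url_key(url):
    """Normalize a url so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, drop the fragment, keep the query string.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def group_by_url(nominations):
    """Group (name, url) pairs by normalized url.

    Any group with more than one name is a case where a single url
    houses several resources and a curator has to decide the boundary.
    """
    groups = {}
    for name, url in nominations:
        groups.setdefault(url_key(url), []).append(name)
    return groups
```

A plugin with its own url lands in its own group, while a library and a plugin that share one page collide and are flagged for human review.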
![Page 15: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/15.jpg)
Boundary cont.
• Individual resource boundary: a website shall be considered an individual resource if it is maintained by a single entity and comprises one or more individual web pages that are related by a theme and HTML links.
• Solution to random boundary problem: Human Curator
![Page 16: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/16.jpg)
Issues of Scope
• Single line or short paragraph + keywords
– Resource discovery problem
*The Stanford ontologies description is very short (as are many); finding this resource by keyword will be difficult unless we index the content of the website.
• Data dump
– Small vs. Large databases
– Updates
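The discovery problem with very short descriptions can be shown with a toy keyword match. Everything here is illustrative: the matching rule (all query words must appear) is a deliberately naive assumption, and the description and page text in the usage note are invented, not the actual Stanford entry.

```python
import re


def tokens(text):
    """Lowercased word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query, *texts):
    """True if every query word appears somewhere in the given texts."""
    available = set()
    for text in texts:
        available |= tokens(text)
    return tokens(query) <= available
```

With an invented description "A collection of ontologies." and invented page text "Browse anatomy and disease ontologies.", `matches("disease ontologies", description)` fails, while `matches("disease ontologies", description, page_text)` succeeds once the page content is indexed as well.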
![Page 17: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/17.jpg)
Internal referencing
• Stanford example:
– License: “same as bioportal” – does not match any license types in any list.
– Problem: non-standard terminology, reference to another project (no url), can create loops
• also true in publications: e.g., used same protocol as paper X, which used the same protocol as paper Y
– Automated text mining tools have a hard time recognizing these
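Following a "same as ..." reference until a concrete value is found, while guarding against loops, might look like the sketch below. The function and both dictionaries are hypothetical; they assume the references have already been recognized and recorded, which is the part the slide says automated tools struggle with.

```python
def resolve_reference(name, refers_to, values):
    """Follow 'same as ...' references until a concrete value is found.

    refers_to: dict mapping an item to the item it points at
    values:    dict mapping an item to its concrete value (license, protocol, ...)
    """
    seen = []
    current = name
    while current not in values:
        if current in seen:
            # The chain came back to an item already visited.
            raise ValueError("reference loop: " + " -> ".join(seen + [current]))
        seen.append(current)
        if current not in refers_to:
            # Dangling reference, e.g. another project named without a url.
            raise KeyError(current)
        current = refers_to[current]
    return values[current]
```

For the publication case, a study that cites paper X, which in turn cites paper Y, resolves to paper Y's protocol in two hops; a pair of items that cite each other raises an error instead of looping forever.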
![Page 18: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/18.jpg)
What can we gain from automated systems?
• Basic information: Name, url, contact info
• Some keywords
• Some descriptive text
• No resource boundary
• No resource description
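As a rough sketch of what an automated pass can recover from a single page, the parser below pulls the title, meta keywords, meta description and mailto contacts using Python's standard `html.parser`. The class name and the choice of tags to read are assumptions; it makes no attempt to decide where a resource begins or ends, which matches the limits listed above.

```python
from html.parser import HTMLParser


class BasicInfoParser(HTMLParser):
    """Collect title, meta keywords/description and mailto links from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.keywords = []
        self.emails = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name == "description":
                self.description = content
            elif name == "keywords":
                self.keywords = [k.strip() for k in content.split(",") if k.strip()]
        elif tag == "a":
            href = attrs.get("href") or ""
            if href.lower().startswith("mailto:"):
                # Strip the scheme and any "?subject=..." suffix.
                self.emails.append(href[len("mailto:"):].split("?")[0])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
```

Calling `feed()` with a page's HTML fills in the four attributes; pages without meta tags simply leave the keyword and description fields empty, which is why a curator still has to write the resource description.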
![Page 19: Resource Curation and Automated Resource Discovery](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649ead5503460f94bb4404/html5/thumbnails/19.jpg)
How do we help the computers?
• Common naming project (neurocommons) http://sharedname.org/page/Main_Page
• Automated URIs
• Community building:
– Shared data models
– Shared ontology
– RDF entity tags? (mouse vs mouse)