![Page 1: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/1.jpg)
The life-sciences as a pathfinder in data-intensive research practice
Dr Andrew Treloar, Director of Technology
11 April 2023 CC-BY-SA atreloar 1
Structure of presentation
- Research Lifecycles
- Functions of Scholarly Communication
- Pointers to the future
- Characterising the future
- Pathfinder problems
- Conclusions
So many lifecycles…
11 April 2023 CC-BY-SA hvdsomp and atreloar 3
Minimal Research Lifecycle
- Think
- Do
- Share
Sharing: Scholarly Communication System and its Functions
- Registration
- Certification
- Awareness
- Archiving
(Roosendaal and Geurts 1997)
System of Journals
- Registration: submission of manuscript
- Certification: peer-review (pre-publication), commentary (post-publication)
- Awareness: discovery services
- Archiving: libraries (print), publishers (electronic), special purpose organisations (e.g. Portico)
Pointers to the future
“the future is already here – it’s just not very evenly distributed”
William Gibson, NPR interview
Registration: bioRxiv
Registration: GitHub
Registration: WikiPathways
Registration: NeuroLex
Registration: Nanopublications
Registration: some observations
- Decoupling registration from certification
- Timestamping, versioning
- Registration of various types of objects
- Machines as creators and contributors
Certification: PubMed Commons
Certification: PubPeer
Certification: Publons
Certification: some observations
- Peer-review decoupled from publication process
- Certification of various types of objects
- Machines validating form
- Social endorsement
Awareness: myExperiment
Awareness: eLabNotebook RSS
Awareness: Twitter
Awareness: some observations
- Awareness for various types of objects
- Real time awareness
- Awareness support targeted at machines
- Awareness through social media
Archiving: PDB
Archiving: GenBank
Characterising the future
Fundamental changes
- The research process (objects, social dimension) is becoming more exposed
- Articles and books are no longer the only relevant objects for research communication
- Objects are no longer static
- Machines are joining humans as (co-)creators and consumers of research objects
Pathfinder problems
- Integrity of the scholarly record
- The three obsolescences: hardware, file format, software
System of Journals: Archiving
Web of Objects: Archiving
Not just citation relationships
The problem of obsolescence
- The life-science research environment can be viewed as undergoing a process of accelerated evolution
- Other disciplines will hit these problems in time
Cambrian explosion
Hardware obsolescence: Roche 454
Software obsolescence: too much choice, not enough support
Abandonware
“Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date, having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”
Sam Jaffe, Scientists Abandon their Software, The Scientist, Feb 16 2004
File format obsolescence: Illumina
- The probability of error in base calling is encoded using ASCII codes to reduce file size
- The meaning of the ASCII codes changed over the life cycle, so data generated at different points in time may have its quality encoded differently
“If you get an error like ‘Invalid quality score value’, your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You'll need to add the option -Q33 to your FASTX Toolkit arguments.”
Obviously…
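To make the offset problem on the slide above concrete, here is a minimal Python sketch of how one quality string decodes under the two offsets the quote mentions (33 and 64). The function names, the heuristic threshold of 59 (the lowest character that offset-64 data is assumed to use) and the sample strings are illustrative assumptions, not part of the talk or of the FASTX Toolkit.

```python
SANGER_OFFSET = 33    # "Sanger (offset 33)" in the quote above
ILLUMINA_OFFSET = 64  # "Illumina (ASCII offset 64)" in the quote above

# Assumption: offset-64 files never contain characters below ';' (ASCII 59).
LOWEST_OFFSET_64_CHAR = 59


def decode_quality(quality: str, offset: int) -> list[int]:
    """Convert a FASTQ quality string into integer scores for a given offset."""
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        # The same characters are invalid when read with the wrong offset.
        raise ValueError(f"Invalid quality score value for offset {offset}")
    return scores


def error_probability(score: int) -> float:
    """Phred scale: Q = -10 * log10(p), hence p = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


def guess_offset(quality: str) -> int:
    """Heuristic only: low characters rule out offset 64; high ones prove nothing."""
    if any(ord(ch) < LOWEST_OFFSET_64_CHAR for ch in quality):
        return SANGER_OFFSET
    return ILLUMINA_OFFSET


if __name__ == "__main__":
    sample = "IIII"  # hypothetical quality string
    print(decode_quality(sample, SANGER_OFFSET))    # read as offset 33
    print(decode_quality(sample, ILLUMINA_OFFSET))  # read as offset 64
    print(guess_offset("5555"))
```

The same four characters yield very different scores depending on the assumed offset, and a string containing only high characters is ambiguous. That ambiguity is why the encoding has to be recorded alongside the data rather than inferred later.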
Everett Rogers, Diffusion of Innovations, 1962
Conclusions
- Need to move to a smaller number of standard file formats
- Need to move to a more sustainable model of software development and maintenance
- Need to encourage platform manufacturers to innovate around the hardware, not the software
- NOTE: other disciplines are looking to the life sciences to work out how to solve some of these problems
On best practices in the development of bioinformatics software, Front Genet, 02 Jul 14
- Source code available to reviewers
- Software indexed, citable, available
- Source code documented
- Source code managed
- Test libraries, sample data and dataset repositories available
Questions
andrew.treloar@ands.org.au
atreloar
https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice
![Page 5: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/5.jpg)
Sharing Scholarly Communication System and its Functions
Registration Certification Awareness Archiving
(Rosendaal and Geurts 1997)
11 April 2023 CC-BY-SA hvdsomp and atreloar 5
System of Journals Registration
submission of manuscript
Certification peer-review (pre-publication) commentary (post-publication)
Awareness discovery services
Archiving libraries (print) publishers (electronic) special purpose organisations (eg Portico)
11 April 2023 CC-BY-SA hvdsomp and atreloar 6
Pointers to the future
ldquothe future is already here ndash itrsquos just not very evenly distributedrdquo
William Gibson NPR interview
11 April 2023 CC-BY-SA hvdsomp and atreloar 7
Registration BioRxiv
11 April 2023 CC-BY-SA hvdsomp and atreloar 8
Registration Github
11 April 2023 CC-BY-SA hvdsomp and atreloar 9
Registration WikiPathways
11 April 2023 CC-BY-SA hvdsomp and atreloar 10
Registration NeuroLex
11 April 2023 CC-BY-SA hvdsomp and atreloar 11
Registration Nanopublications
11 April 2023 CC-BY-SA hvdsomp and atreloar 12
Registration some observations Decoupling registration from certification Timestamping versioning Registration of various types of objects Machines as creators and contributors
11 April 2023 CC-BY-SA hvdsomp and atreloar 13
Certification PubMed Commons
11 April 2023 CC-BY-SA hvdsomp and atreloar 14
Certification PubPeer
![Page 10: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/10.jpg)
Registration WikiPathways
11 April 2023 CC-BY-SA hvdsomp and atreloar 10
Registration NeuroLex
11 April 2023 CC-BY-SA hvdsomp and atreloar 11
Registration Nanopublications
11 April 2023 CC-BY-SA hvdsomp and atreloar 12
Registration some observations Decoupling registration from certification Timestamping versioning Registration of various types of objects Machines as creators and contributors
11 April 2023 CC-BY-SA hvdsomp and atreloar 13
Certification PubMed Commons
11 April 2023 CC-BY-SA hvdsomp and atreloar 14
Certification PubPeer
11 April 2023 CC-BY-SA hvdsomp and atreloar 15
Certification Publons
11 April 2023 CC-BY-SA hvdsomp and atreloar 16
Certification some observations Peer-review decoupled from publication process Certification of various types of objects Machines validating form Social endorsement
11 April 2023 CC-BY-SA hvdsomp and atreloar 17
Awareness myExperiment
11 April 2023 CC-BY-SA hvdsomp and atreloar 18
Awareness eLabNotebook RSS
11 April 2023 CC-BY-SA hvdsomp and atreloar 19
Awareness Twitter
11 April 2023 CC-BY-SA hvdsomp and atreloar 20
Awareness some observations Awareness for various types of objects Real time awareness Awareness support targeted at machines Awareness through social media
11 April 2023 CC-BY-SA hvdsomp and atreloar 21
Archiving PDB
11 April 2023 CC-BY-SA hvdsomp and atreloar 22
Archiving GenBank
11 April 2023 CC-BY-SA hvdsomp and atreloar 23
Characterising the future
11 April 2023 CC-BY-SA hvdsomp and atreloar 24
Fundamental changes The research process (objects social
dimension) is becoming more exposed Articles books are no longer the only
relevant objects for research communication Objects are no longer static Machines are joining humans as
(co-)creators and consumers of research objects
11 April 2023 CC-BY-SA hvdsomp and atreloar 25
Pathfinder problems Integrity of the scholarly record The three obsolescences
hardware file format software
11 April 2023 CC-BY-SA atreloar 26
System of Journals Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 27
Web of Objects Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 28
Not just citation relationships
11 April 2023 CC-BY-SA hvdsomp and atreloar 29
Your Text Here
The problem of obsolescence Lifescience research environment can be viewed as
undergoing a process of accelerated evolution Other disciplines will hit these problems in time
11 April 2023 CC-BY-SA atreloar 30
Cambrian explosion
11 April 2023 31
Hardware obsolescence Roche 454
11 April 2023 CC-BY-SA atreloar 32
Software obsolescence too much choice not enough support
11 April 2023 CC-BY-SA atreloar 33
Abandonware
"Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date, having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly."
Sam Jaffe, "Scientists Abandon their Software", The Scientist, Feb 16, 2004

File format obsolescence: Illumina
- The probability of error in base calling is encoded using ASCII characters to reduce file size
- The meaning of the ASCII code changed along the life cycle, so data generated at different points in time may encode quality differently
- "If you get an error like 'Invalid quality score value', your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You'll need to add the option -Q33 to your FASTX Toolkit arguments." Obviously…

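To make the offset problem concrete, here is a minimal Python sketch of how the same quality string decodes to different scores under offset 33 and offset 64. The function names and the sample string are hypothetical, chosen only for illustration. The sketch assumes Phred-scaled scores and ignores the early Solexa scale, which used a different formula.

```python
from typing import List


def decode_quality(quality: str, offset: int) -> List[int]:
    """Convert a FASTQ quality string into Phred scores for a given ASCII offset."""
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        # A negative score means the string cannot have been written with this offset.
        raise ValueError(f"Invalid quality score value for offset {offset}")
    return scores


def error_probability(score: int) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


if __name__ == "__main__":
    quality = "IIII"  # hypothetical quality string, not taken from a real run
    print(decode_quality(quality, 33))  # read as Sanger-style, offset 33
    print(decode_quality(quality, 64))  # read as older Illumina-style, offset 64
```

Nothing in the file itself says which offset was used, so a reader that assumes the wrong one either fails outright or silently reports very different error probabilities for the same bases.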
Everett Rogers, Diffusion of Innovations, 1962

Conclusions
- Need to move to a smaller number of standard file formats
- Need to move to a more sustainable model of software development and maintenance
- Need to encourage platform manufacturers to innovate around the hardware, not the software
- NOTE: other disciplines are looking to the life sciences to work out how to solve some of these problems

On best practices in the development of bioinformatics software, Front. Genet., 02 Jul 14
- Source code available to reviewers
- Software indexed, citable, available
- Source code documented
- Source code managed
- Test libraries, sample data and dataset repositories available

Questions
andrew.treloar@ands.org.au
atreloar
https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice

- The life-sciences as a pathfinder in data-intensive research pr
- Structure presentation
- So many lifecycles…
- Minimal Research Lifecycle
- Sharing Scholarly Communication System and its Functions
- System of Journals
- Pointers to the future
- Registration BioRxiv
- Registration Github
- Registration WikiPathways
- Registration NeuroLex
- Registration Nanopublications
- Registration some observations
- Certification PubMed Commons
- Certification PubPeer
- Certification Publons
- Certification some observations
- Awareness myExperiment
- Awareness eLabNotebook RSS
- Awareness Twitter
- Awareness some observations
- Archiving PDB
- Archiving GenBank
- Characterising the future
- Fundamental changes
- Pathfinder problems
- System of Journals Archiving
- Web of Objects Archiving
- Not just citation relationships
- The problem of obsolescence
- Cambrian explosion
- Hardware obsolescence Roche 454
- Software obsolescence too much choice not enough support
- Abandonware
- File format obsolescence Illumina
- Everett Rogers Diffusion of Innovation 1962
- Conclusions
- On best practices in the development of bioinformatics software
- Questions
![Page 16: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/16.jpg)
Certification: Publons
11 April 2023 CC-BY-SA hvdsomp and atreloar 16
Certification: some observations
- Peer review decoupled from publication process
- Certification of various types of objects
- Machines validating form
- Social endorsement
11 April 2023 CC-BY-SA hvdsomp and atreloar 17
Awareness: myExperiment
11 April 2023 CC-BY-SA hvdsomp and atreloar 18
Awareness: eLabNotebook RSS
11 April 2023 CC-BY-SA hvdsomp and atreloar 19
Awareness: Twitter
11 April 2023 CC-BY-SA hvdsomp and atreloar 20
Awareness: some observations
- Awareness for various types of objects
- Real-time awareness
- Awareness support targeted at machines
- Awareness through social media
11 April 2023 CC-BY-SA hvdsomp and atreloar 21
Archiving: PDB
11 April 2023 CC-BY-SA hvdsomp and atreloar 22
Archiving: GenBank
11 April 2023 CC-BY-SA hvdsomp and atreloar 23
Characterising the future
11 April 2023 CC-BY-SA hvdsomp and atreloar 24
Fundamental changes
- The research process (objects, social dimension) is becoming more exposed
- Articles and books are no longer the only relevant objects for research communication
- Objects are no longer static
- Machines are joining humans as (co-)creators and consumers of research objects
11 April 2023 CC-BY-SA hvdsomp and atreloar 25
Pathfinder problems
- Integrity of the scholarly record
- The three obsolescences: hardware, file format, software
11 April 2023 CC-BY-SA atreloar 26
System of Journals: Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 27
Web of Objects: Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 28
Not just citation relationships
11 April 2023 CC-BY-SA hvdsomp and atreloar 29
The problem of obsolescence
- The life-science research environment can be viewed as undergoing a process of accelerated evolution
- Other disciplines will hit these problems in time
11 April 2023 CC-BY-SA atreloar 30
Cambrian explosion
11 April 2023 31
Hardware obsolescence: Roche 454
11 April 2023 CC-BY-SA atreloar 32
Software obsolescence: too much choice, not enough support
11 April 2023 CC-BY-SA atreloar 33
Abandonware
“Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date, having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”
Sam Jaffe, "Scientists Abandon their Software", The Scientist, Feb 16, 2004
11 April 2023 CC-BY-SA atreloar 34
File format obsolescence Illumina Probability of error in basecalling encoded using ascii code
to reduce file size Meaning of the ascii code changed along the life cycle and
for data generated at different time points the quality might be encoded differently
ldquoIf you get an error like Invalid quality score value your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores Youll need to add the option -Q33 to your FASTX Toolkit argumentsrdquo Obviouslyhellip
11 April 2023 CC-BY-SA atreloar 35
Everett Rogers Diffusion of Innovation 1962
11 April 2023 CC-BY-SA atreloar 36
Conclusions Need to move to a smaller number of standard file
formats Need to move to a more sustainable model of
software development and maintenance Need to encourage platform manufacturers to
innovate around the hardware not the software NOTE other disciplines are looking to lifesciences
to work out how to solve some of these problems11 April 2023 CC-BY-SA atreloar 37
On best practices in the development of bioinformatics software Front Genet 02 Jul 14
Source code available to reviewers Software indexed citable available Source code documented Source code managed Test libraries sample data and dataset repositories
available
11 April 2023 CC-BY-SA atreloar 38
Questions andrewtreloarandsorgau
atreloar
httpswwwslidesharenetatreloarthe-lifesciences-as-a-pathfinder-in-dataintensive-research-practice
11 April 2023 CC-BY-SA atreloar 39
- The life-sciences as a pathfinder in data-intensive research pr
- Structure presentation
- So many lifecycleshellip
- Minimal Research Lifecycle
- Sharing Scholarly Communication System and its Functions
- System of Journals
- Pointers to the future
- Registration BioRxiv
- Registration Github
- Registration WikiPathways
- Registration NeuroLex
- Registration Nanopublications
- Registration some observations
- Certification PubMed Commons
- Certification PubPeer
- Certification Publons
- Certification some observations
- Awareness myExperiment
- Awareness eLabNotebook RSS
- Awareness Twitter
- Awareness some observations
- Archiving PDB
- Archiving GenBank
- Characterising the future
- Fundamental changes
- Pathfinder problems
- System of Journals Archiving
- Web of Objects Archiving
- Not just citation relationships
- The problem of obsolescence
- Cambrian explosion
- Hardware obsolescence Roche 454
- Software obsolescence too much choice not enough support
- Abandonware
- File format obsolescence Illumina
- Everett Rogers Diffusion of Innovation 1962
- Conclusions
- On best practices in the development of bioinformatics software
- Questions
-
![Page 17: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/17.jpg)
Certification some observations Peer-review decoupled from publication process Certification of various types of objects Machines validating form Social endorsement
11 April 2023 CC-BY-SA hvdsomp and atreloar 17
Awareness myExperiment
11 April 2023 CC-BY-SA hvdsomp and atreloar 18
Awareness eLabNotebook RSS
11 April 2023 CC-BY-SA hvdsomp and atreloar 19
Awareness Twitter
11 April 2023 CC-BY-SA hvdsomp and atreloar 20
Awareness some observations Awareness for various types of objects Real time awareness Awareness support targeted at machines Awareness through social media
11 April 2023 CC-BY-SA hvdsomp and atreloar 21
Archiving PDB
11 April 2023 CC-BY-SA hvdsomp and atreloar 22
Archiving GenBank
11 April 2023 CC-BY-SA hvdsomp and atreloar 23
Characterising the future
11 April 2023 CC-BY-SA hvdsomp and atreloar 24
Fundamental changes The research process (objects social
dimension) is becoming more exposed Articles books are no longer the only
relevant objects for research communication Objects are no longer static Machines are joining humans as
(co-)creators and consumers of research objects
11 April 2023 CC-BY-SA hvdsomp and atreloar 25
Pathfinder problems Integrity of the scholarly record The three obsolescences
hardware file format software
11 April 2023 CC-BY-SA atreloar 26
System of Journals Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 27
Web of Objects Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 28
Not just citation relationships
11 April 2023 CC-BY-SA hvdsomp and atreloar 29
Your Text Here
The problem of obsolescence Lifescience research environment can be viewed as
undergoing a process of accelerated evolution Other disciplines will hit these problems in time
11 April 2023 CC-BY-SA atreloar 30
Cambrian explosion
11 April 2023 31
Hardware obsolescence Roche 454
11 April 2023 CC-BY-SA atreloar 32
Software obsolescence too much choice not enough support
11 April 2023 CC-BY-SA atreloar 33
Abandonware ldquoLast summer a member of the biology department of the
University of Udine in Italy approached Nicola Vitacolonna with an intriguing project The ANREP program which annotates structural motifs in gene or protein sequences was out of date having been written more than a decade ago Although still used by molecular biologists its slow computing ability meant a straightforward multiple search could take all night on a desktop PC The Udine biologist wanted Vitacolonna a postdoctoral fellow in computational biology to write a program that could do the job more quicklyrdquo Sam Jaffe Scientists Abandon their Software The Scientist Feb 16 2004
11 April 2023 CC-BY-SA atreloar 34
File format obsolescence Illumina Probability of error in basecalling encoded using ascii code
to reduce file size Meaning of the ascii code changed along the life cycle and
for data generated at different time points the quality might be encoded differently
ldquoIf you get an error like Invalid quality score value your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores Youll need to add the option -Q33 to your FASTX Toolkit argumentsrdquo Obviouslyhellip
11 April 2023 CC-BY-SA atreloar 35
Everett Rogers Diffusion of Innovation 1962
11 April 2023 CC-BY-SA atreloar 36
Conclusions Need to move to a smaller number of standard file
formats Need to move to a more sustainable model of
software development and maintenance Need to encourage platform manufacturers to
innovate around the hardware not the software NOTE other disciplines are looking to lifesciences
to work out how to solve some of these problems11 April 2023 CC-BY-SA atreloar 37
On best practices in the development of bioinformatics software Front Genet 02 Jul 14
Source code available to reviewers Software indexed citable available Source code documented Source code managed Test libraries sample data and dataset repositories
available
11 April 2023 CC-BY-SA atreloar 38
Questions andrewtreloarandsorgau
atreloar
httpswwwslidesharenetatreloarthe-lifesciences-as-a-pathfinder-in-dataintensive-research-practice
11 April 2023 CC-BY-SA atreloar 39
- The life-sciences as a pathfinder in data-intensive research pr
- Structure presentation
- So many lifecycleshellip
- Minimal Research Lifecycle
- Sharing Scholarly Communication System and its Functions
- System of Journals
- Pointers to the future
- Registration BioRxiv
- Registration Github
- Registration WikiPathways
- Registration NeuroLex
- Registration Nanopublications
- Registration some observations
- Certification PubMed Commons
- Certification PubPeer
- Certification Publons
- Certification some observations
- Awareness myExperiment
- Awareness eLabNotebook RSS
- Awareness Twitter
- Awareness some observations
- Archiving PDB
- Archiving GenBank
- Characterising the future
- Fundamental changes
- Pathfinder problems
- System of Journals Archiving
- Web of Objects Archiving
- Not just citation relationships
- The problem of obsolescence
- Cambrian explosion
- Hardware obsolescence Roche 454
- Software obsolescence too much choice not enough support
- Abandonware
- File format obsolescence Illumina
- Everett Rogers Diffusion of Innovation 1962
- Conclusions
- On best practices in the development of bioinformatics software
- Questions
-
![Page 18: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/18.jpg)
Awareness myExperiment
11 April 2023 CC-BY-SA hvdsomp and atreloar 18
Awareness eLabNotebook RSS
11 April 2023 CC-BY-SA hvdsomp and atreloar 19
Awareness Twitter
11 April 2023 CC-BY-SA hvdsomp and atreloar 20
Awareness some observations Awareness for various types of objects Real time awareness Awareness support targeted at machines Awareness through social media
11 April 2023 CC-BY-SA hvdsomp and atreloar 21
Archiving PDB
11 April 2023 CC-BY-SA hvdsomp and atreloar 22
Archiving GenBank
11 April 2023 CC-BY-SA hvdsomp and atreloar 23
Characterising the future
11 April 2023 CC-BY-SA hvdsomp and atreloar 24
Fundamental changes The research process (objects social
dimension) is becoming more exposed Articles books are no longer the only
relevant objects for research communication Objects are no longer static Machines are joining humans as
(co-)creators and consumers of research objects
11 April 2023 CC-BY-SA hvdsomp and atreloar 25
Pathfinder problems Integrity of the scholarly record The three obsolescences
hardware file format software
11 April 2023 CC-BY-SA atreloar 26
System of Journals Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 27
Web of Objects Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 28
Not just citation relationships
11 April 2023 CC-BY-SA hvdsomp and atreloar 29
Your Text Here
The problem of obsolescence Lifescience research environment can be viewed as
undergoing a process of accelerated evolution Other disciplines will hit these problems in time
11 April 2023 CC-BY-SA atreloar 30
Cambrian explosion
11 April 2023 31
Hardware obsolescence Roche 454
11 April 2023 CC-BY-SA atreloar 32
Software obsolescence too much choice not enough support
11 April 2023 CC-BY-SA atreloar 33
Abandonware ldquoLast summer a member of the biology department of the
University of Udine in Italy approached Nicola Vitacolonna with an intriguing project The ANREP program which annotates structural motifs in gene or protein sequences was out of date having been written more than a decade ago Although still used by molecular biologists its slow computing ability meant a straightforward multiple search could take all night on a desktop PC The Udine biologist wanted Vitacolonna a postdoctoral fellow in computational biology to write a program that could do the job more quicklyrdquo Sam Jaffe Scientists Abandon their Software The Scientist Feb 16 2004
11 April 2023 CC-BY-SA atreloar 34
File format obsolescence Illumina Probability of error in basecalling encoded using ascii code
to reduce file size Meaning of the ascii code changed along the life cycle and
for data generated at different time points the quality might be encoded differently
ldquoIf you get an error like Invalid quality score value your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores Youll need to add the option -Q33 to your FASTX Toolkit argumentsrdquo Obviouslyhellip
11 April 2023 CC-BY-SA atreloar 35
Everett Rogers Diffusion of Innovation 1962
11 April 2023 CC-BY-SA atreloar 36
Conclusions Need to move to a smaller number of standard file
formats Need to move to a more sustainable model of
software development and maintenance Need to encourage platform manufacturers to
innovate around the hardware not the software NOTE other disciplines are looking to lifesciences
to work out how to solve some of these problems11 April 2023 CC-BY-SA atreloar 37
On best practices in the development of bioinformatics software Front Genet 02 Jul 14
Source code available to reviewers Software indexed citable available Source code documented Source code managed Test libraries sample data and dataset repositories
available
11 April 2023 CC-BY-SA atreloar 38
Questions andrewtreloarandsorgau
atreloar
httpswwwslidesharenetatreloarthe-lifesciences-as-a-pathfinder-in-dataintensive-research-practice
11 April 2023 CC-BY-SA atreloar 39
- The life-sciences as a pathfinder in data-intensive research pr
- Structure presentation
- So many lifecycleshellip
- Minimal Research Lifecycle
- Sharing Scholarly Communication System and its Functions
- System of Journals
- Pointers to the future
- Registration BioRxiv
- Registration Github
- Registration WikiPathways
- Registration NeuroLex
- Registration Nanopublications
- Registration some observations
- Certification PubMed Commons
- Certification PubPeer
- Certification Publons
- Certification some observations
- Awareness myExperiment
- Awareness eLabNotebook RSS
- Awareness Twitter
- Awareness some observations
- Archiving PDB
- Archiving GenBank
- Characterising the future
- Fundamental changes
- Pathfinder problems
- System of Journals Archiving
- Web of Objects Archiving
- Not just citation relationships
- The problem of obsolescence
- Cambrian explosion
- Hardware obsolescence Roche 454
- Software obsolescence too much choice not enough support
- Abandonware
- File format obsolescence Illumina
- Everett Rogers Diffusion of Innovation 1962
- Conclusions
- On best practices in the development of bioinformatics software
- Questions
-
![Page 19: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/19.jpg)
Awareness eLabNotebook RSS
11 April 2023 CC-BY-SA hvdsomp and atreloar 19
Awareness Twitter
11 April 2023 CC-BY-SA hvdsomp and atreloar 20
Awareness some observations Awareness for various types of objects Real time awareness Awareness support targeted at machines Awareness through social media
11 April 2023 CC-BY-SA hvdsomp and atreloar 21
Archiving PDB
11 April 2023 CC-BY-SA hvdsomp and atreloar 22
Archiving GenBank
11 April 2023 CC-BY-SA hvdsomp and atreloar 23
Characterising the future
11 April 2023 CC-BY-SA hvdsomp and atreloar 24
Fundamental changes The research process (objects social
dimension) is becoming more exposed Articles books are no longer the only
relevant objects for research communication Objects are no longer static Machines are joining humans as
(co-)creators and consumers of research objects
11 April 2023 CC-BY-SA hvdsomp and atreloar 25
Pathfinder problems Integrity of the scholarly record The three obsolescences
hardware file format software
11 April 2023 CC-BY-SA atreloar 26
System of Journals Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 27
Web of Objects Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 28
Not just citation relationships
11 April 2023 CC-BY-SA hvdsomp and atreloar 29
Your Text Here
The problem of obsolescence Lifescience research environment can be viewed as
undergoing a process of accelerated evolution Other disciplines will hit these problems in time
11 April 2023 CC-BY-SA atreloar 30
Cambrian explosion
11 April 2023 31
Hardware obsolescence Roche 454
11 April 2023 CC-BY-SA atreloar 32
Software obsolescence too much choice not enough support
11 April 2023 CC-BY-SA atreloar 33
Abandonware ldquoLast summer a member of the biology department of the
University of Udine in Italy approached Nicola Vitacolonna with an intriguing project The ANREP program which annotates structural motifs in gene or protein sequences was out of date having been written more than a decade ago Although still used by molecular biologists its slow computing ability meant a straightforward multiple search could take all night on a desktop PC The Udine biologist wanted Vitacolonna a postdoctoral fellow in computational biology to write a program that could do the job more quicklyrdquo Sam Jaffe Scientists Abandon their Software The Scientist Feb 16 2004
11 April 2023 CC-BY-SA atreloar 34
File format obsolescence Illumina Probability of error in basecalling encoded using ascii code
to reduce file size Meaning of the ascii code changed along the life cycle and
for data generated at different time points the quality might be encoded differently
ldquoIf you get an error like Invalid quality score value your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores Youll need to add the option -Q33 to your FASTX Toolkit argumentsrdquo Obviouslyhellip
11 April 2023 CC-BY-SA atreloar 35
Everett Rogers Diffusion of Innovation 1962
11 April 2023 CC-BY-SA atreloar 36
Conclusions Need to move to a smaller number of standard file
formats Need to move to a more sustainable model of
software development and maintenance Need to encourage platform manufacturers to
innovate around the hardware not the software NOTE other disciplines are looking to lifesciences
to work out how to solve some of these problems11 April 2023 CC-BY-SA atreloar 37
On best practices in the development of bioinformatics software Front Genet 02 Jul 14
Source code available to reviewers Software indexed citable available Source code documented Source code managed Test libraries sample data and dataset repositories
available
11 April 2023 CC-BY-SA atreloar 38
Questions andrewtreloarandsorgau
atreloar
httpswwwslidesharenetatreloarthe-lifesciences-as-a-pathfinder-in-dataintensive-research-practice
11 April 2023 CC-BY-SA atreloar 39
- The life-sciences as a pathfinder in data-intensive research pr
- Structure presentation
- So many lifecycleshellip
- Minimal Research Lifecycle
- Sharing Scholarly Communication System and its Functions
- System of Journals
- Pointers to the future
- Registration BioRxiv
- Registration Github
- Registration WikiPathways
- Registration NeuroLex
- Registration Nanopublications
- Registration some observations
- Certification PubMed Commons
- Certification PubPeer
- Certification Publons
- Certification some observations
- Awareness myExperiment
- Awareness eLabNotebook RSS
- Awareness Twitter
- Awareness some observations
- Archiving PDB
- Archiving GenBank
- Characterising the future
- Fundamental changes
- Pathfinder problems
- System of Journals Archiving
- Web of Objects Archiving
- Not just citation relationships
- The problem of obsolescence
- Cambrian explosion
- Hardware obsolescence Roche 454
- Software obsolescence too much choice not enough support
- Abandonware
- File format obsolescence Illumina
- Everett Rogers Diffusion of Innovation 1962
- Conclusions
- On best practices in the development of bioinformatics software
- Questions
-
![Page 20: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/20.jpg)
Awareness Twitter
11 April 2023 CC-BY-SA hvdsomp and atreloar 20
Awareness some observations Awareness for various types of objects Real time awareness Awareness support targeted at machines Awareness through social media
11 April 2023 CC-BY-SA hvdsomp and atreloar 21
Archiving PDB
11 April 2023 CC-BY-SA hvdsomp and atreloar 22
Archiving GenBank
11 April 2023 CC-BY-SA hvdsomp and atreloar 23
Characterising the future
11 April 2023 CC-BY-SA hvdsomp and atreloar 24
Fundamental changes The research process (objects social
dimension) is becoming more exposed Articles books are no longer the only
relevant objects for research communication Objects are no longer static Machines are joining humans as
(co-)creators and consumers of research objects
11 April 2023 CC-BY-SA hvdsomp and atreloar 25
Pathfinder problems Integrity of the scholarly record The three obsolescences
hardware file format software
11 April 2023 CC-BY-SA atreloar 26
System of Journals Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 27
Web of Objects Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 28
Not just citation relationships
11 April 2023 CC-BY-SA hvdsomp and atreloar 29
Your Text Here
The problem of obsolescence Lifescience research environment can be viewed as
undergoing a process of accelerated evolution Other disciplines will hit these problems in time
11 April 2023 CC-BY-SA atreloar 30
Cambrian explosion
11 April 2023 31
Hardware obsolescence Roche 454
11 April 2023 CC-BY-SA atreloar 32
Software obsolescence too much choice not enough support
11 April 2023 CC-BY-SA atreloar 33
Abandonware ldquoLast summer a member of the biology department of the
University of Udine in Italy approached Nicola Vitacolonna with an intriguing project The ANREP program which annotates structural motifs in gene or protein sequences was out of date having been written more than a decade ago Although still used by molecular biologists its slow computing ability meant a straightforward multiple search could take all night on a desktop PC The Udine biologist wanted Vitacolonna a postdoctoral fellow in computational biology to write a program that could do the job more quicklyrdquo Sam Jaffe Scientists Abandon their Software The Scientist Feb 16 2004
11 April 2023 CC-BY-SA atreloar 34
File format obsolescence Illumina Probability of error in basecalling encoded using ascii code
to reduce file size Meaning of the ascii code changed along the life cycle and
for data generated at different time points the quality might be encoded differently
ldquoIf you get an error like Invalid quality score value your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores Youll need to add the option -Q33 to your FASTX Toolkit argumentsrdquo Obviouslyhellip
11 April 2023 CC-BY-SA atreloar 35
Everett Rogers Diffusion of Innovation 1962
11 April 2023 CC-BY-SA atreloar 36
Conclusions Need to move to a smaller number of standard file
formats Need to move to a more sustainable model of
software development and maintenance Need to encourage platform manufacturers to
innovate around the hardware not the software NOTE other disciplines are looking to lifesciences
to work out how to solve some of these problems11 April 2023 CC-BY-SA atreloar 37
On best practices in the development of bioinformatics software Front Genet 02 Jul 14
Source code available to reviewers Software indexed citable available Source code documented Source code managed Test libraries sample data and dataset repositories
available
11 April 2023 CC-BY-SA atreloar 38
Questions andrewtreloarandsorgau
atreloar
httpswwwslidesharenetatreloarthe-lifesciences-as-a-pathfinder-in-dataintensive-research-practice
11 April 2023 CC-BY-SA atreloar 39
- The life-sciences as a pathfinder in data-intensive research pr
- Structure presentation
- So many lifecycleshellip
- Minimal Research Lifecycle
- Sharing Scholarly Communication System and its Functions
- System of Journals
- Pointers to the future
- Registration BioRxiv
- Registration Github
- Registration WikiPathways
- Registration NeuroLex
- Registration Nanopublications
- Registration some observations
- Certification PubMed Commons
- Certification PubPeer
- Certification Publons
- Certification some observations
- Awareness myExperiment
- Awareness eLabNotebook RSS
- Awareness Twitter
- Awareness some observations
- Archiving PDB
- Archiving GenBank
- Characterising the future
- Fundamental changes
- Pathfinder problems
- System of Journals Archiving
- Web of Objects Archiving
- Not just citation relationships
- The problem of obsolescence
- Cambrian explosion
- Hardware obsolescence Roche 454
- Software obsolescence too much choice not enough support
- Abandonware
- File format obsolescence Illumina
- Everett Rogers Diffusion of Innovation 1962
- Conclusions
- On best practices in the development of bioinformatics software
- Questions
-
![Page 21: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/21.jpg)
Awareness some observations Awareness for various types of objects Real time awareness Awareness support targeted at machines Awareness through social media
11 April 2023 CC-BY-SA hvdsomp and atreloar 21
Archiving PDB
11 April 2023 CC-BY-SA hvdsomp and atreloar 22
Archiving GenBank
11 April 2023 CC-BY-SA hvdsomp and atreloar 23
Characterising the future
11 April 2023 CC-BY-SA hvdsomp and atreloar 24
Fundamental changes The research process (objects social
dimension) is becoming more exposed Articles books are no longer the only
relevant objects for research communication Objects are no longer static Machines are joining humans as
(co-)creators and consumers of research objects
11 April 2023 CC-BY-SA hvdsomp and atreloar 25
Pathfinder problems Integrity of the scholarly record The three obsolescences
hardware file format software
11 April 2023 CC-BY-SA atreloar 26
System of Journals Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 27
Web of Objects Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 28
Not just citation relationships
11 April 2023 CC-BY-SA hvdsomp and atreloar 29
Your Text Here
The problem of obsolescence Lifescience research environment can be viewed as
undergoing a process of accelerated evolution Other disciplines will hit these problems in time
11 April 2023 CC-BY-SA atreloar 30
Cambrian explosion
11 April 2023 31
Hardware obsolescence Roche 454
11 April 2023 CC-BY-SA atreloar 32
Software obsolescence too much choice not enough support
11 April 2023 CC-BY-SA atreloar 33
Abandonware ldquoLast summer a member of the biology department of the
University of Udine in Italy approached Nicola Vitacolonna with an intriguing project The ANREP program which annotates structural motifs in gene or protein sequences was out of date having been written more than a decade ago Although still used by molecular biologists its slow computing ability meant a straightforward multiple search could take all night on a desktop PC The Udine biologist wanted Vitacolonna a postdoctoral fellow in computational biology to write a program that could do the job more quicklyrdquo Sam Jaffe Scientists Abandon their Software The Scientist Feb 16 2004
11 April 2023 CC-BY-SA atreloar 34
File format obsolescence Illumina Probability of error in basecalling encoded using ascii code
to reduce file size Meaning of the ascii code changed along the life cycle and
for data generated at different time points the quality might be encoded differently
ldquoIf you get an error like Invalid quality score value your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores Youll need to add the option -Q33 to your FASTX Toolkit argumentsrdquo Obviouslyhellip
11 April 2023 CC-BY-SA atreloar 35
Everett Rogers Diffusion of Innovation 1962
11 April 2023 CC-BY-SA atreloar 36
Conclusions Need to move to a smaller number of standard file
formats Need to move to a more sustainable model of
software development and maintenance Need to encourage platform manufacturers to
innovate around the hardware not the software NOTE other disciplines are looking to lifesciences
to work out how to solve some of these problems11 April 2023 CC-BY-SA atreloar 37
On best practices in the development of bioinformatics software Front Genet 02 Jul 14
Source code available to reviewers Software indexed citable available Source code documented Source code managed Test libraries sample data and dataset repositories
available
11 April 2023 CC-BY-SA atreloar 38
Questions andrewtreloarandsorgau
atreloar
httpswwwslidesharenetatreloarthe-lifesciences-as-a-pathfinder-in-dataintensive-research-practice
11 April 2023 CC-BY-SA atreloar 39
- The life-sciences as a pathfinder in data-intensive research pr
- Structure presentation
- So many lifecycleshellip
- Minimal Research Lifecycle
- Sharing Scholarly Communication System and its Functions
- System of Journals
- Pointers to the future
- Registration BioRxiv
- Registration Github
- Registration WikiPathways
- Registration NeuroLex
- Registration Nanopublications
- Registration some observations
- Certification PubMed Commons
- Certification PubPeer
- Certification Publons
- Certification some observations
- Awareness myExperiment
- Awareness eLabNotebook RSS
- Awareness Twitter
- Awareness some observations
- Archiving PDB
- Archiving GenBank
- Characterising the future
- Fundamental changes
- Pathfinder problems
- System of Journals Archiving
- Web of Objects Archiving
- Not just citation relationships
- The problem of obsolescence
- Cambrian explosion
- Hardware obsolescence Roche 454
- Software obsolescence too much choice not enough support
- Abandonware
- File format obsolescence Illumina
- Everett Rogers Diffusion of Innovation 1962
- Conclusions
- On best practices in the development of bioinformatics software
- Questions
-
![Page 22: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/22.jpg)
Archiving PDB
11 April 2023 CC-BY-SA hvdsomp and atreloar 22
Archiving GenBank
11 April 2023 CC-BY-SA hvdsomp and atreloar 23
Characterising the future
11 April 2023 CC-BY-SA hvdsomp and atreloar 24
Fundamental changes
- The research process (objects, social dimension) is becoming more exposed
- Articles, books are no longer the only relevant objects for research communication
- Objects are no longer static
- Machines are joining humans as (co-)creators and consumers of research objects
11 April 2023 CC-BY-SA hvdsomp and atreloar 25
Pathfinder problems
- Integrity of the scholarly record
- The three obsolescences: hardware, file format, software
11 April 2023 CC-BY-SA atreloar 26
System of Journals Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 27
Web of Objects Archiving
11 April 2023 CC-BY-SA hvdsomp and atreloar 28
Not just citation relationships
11 April 2023 CC-BY-SA hvdsomp and atreloar 29
The problem of obsolescence
- The life-science research environment can be viewed as undergoing a process of accelerated evolution
- Other disciplines will hit these problems in time
11 April 2023 CC-BY-SA atreloar 30
Cambrian explosion
11 April 2023 31
Hardware obsolescence: Roche 454
11 April 2023 CC-BY-SA atreloar 32
Software obsolescence: too much choice, not enough support
11 April 2023 CC-BY-SA atreloar 33
Abandonware
“Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date, having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”
Sam Jaffe, “Scientists Abandon their Software”, The Scientist, Feb 16 2004
11 April 2023 CC-BY-SA atreloar 34
File format obsolescence: Illumina
- The probability of error in basecalling is encoded using ASCII codes to reduce file size
- The meaning of the ASCII code changed along the life cycle, so for data generated at different time points the quality might be encoded differently
- “If you get an error like ‘Invalid quality score value’, your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You'll need to add the option -Q33 to your FASTX Toolkit arguments.” Obviously…
11 April 2023 CC-BY-SA atreloar 35
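The offset problem on the previous slide can be made concrete with a short sketch. It is not taken from the talk: the function names are hypothetical, and the offset-guessing thresholds (59 and 74) are a common rule of thumb rather than a guarantee. The sketch assumes the standard Phred definition, Q = -10 log10(p), and shows how the same quality string yields different scores depending on whether offset 33 (Sanger) or offset 64 (older Illumina) is assumed.

```python
from typing import List, Optional


def decode_quality(quality: str, offset: int) -> List[int]:
    """Convert a FASTQ quality string to Phred scores for a given ASCII offset."""
    scores = [ord(ch) - offset for ch in quality]
    if scores and min(scores) < 0:
        # A negative score means the string cannot have been written with this offset.
        raise ValueError(f"quality string is not valid for offset {offset}")
    return scores


def error_probability(phred: int) -> float:
    """Invert Q = -10 * log10(p) to recover the base-calling error probability."""
    return 10 ** (-phred / 10)


def guess_offset(quality: str) -> Optional[int]:
    """Heuristic guess at the offset (assumption: typical score ranges).

    Characters below ASCII 59 are taken to imply offset 33; characters
    above ASCII 74 are taken to imply offset 64. Anything else is ambiguous.
    """
    codes = [ord(ch) for ch in quality]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None  # cannot tell from this string alone
```

Decoding the string `II` with offset 33 gives a score of 40 per base, while offset 64 gives 9, so a wrong guess silently shifts every score by 31 unless the tool rejects the file outright.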
Everett Rogers, Diffusion of Innovations (1962)
11 April 2023 CC-BY-SA atreloar 36
Conclusions
- Need to move to a smaller number of standard file formats
- Need to move to a more sustainable model of software development and maintenance
- Need to encourage platform manufacturers to innovate around the hardware, not the software
- NOTE: other disciplines are looking to the life sciences to work out how to solve some of these problems
11 April 2023 CC-BY-SA atreloar 37
On best practices in the development of bioinformatics software, Front. Genet., 02 Jul 14
- Source code available to reviewers
- Software indexed, citable, available
- Source code documented
- Source code managed
- Test libraries, sample data and dataset repositories available
11 April 2023 CC-BY-SA atreloar 38
Questions
andrew.treloar@ands.org.au
atreloar
https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice
11 April 2023 CC-BY-SA atreloar 39
![Page 23: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/23.jpg)
![Page 24: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/24.jpg)
![Page 25: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/25.jpg)
![Page 26: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/26.jpg)
![Page 27: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/27.jpg)
![Page 28: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/28.jpg)
![Page 29: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/29.jpg)
Not just citation relationships
11 April 2023 CC-BY-SA hvdsomp and atreloar 29
![Page 30: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/30.jpg)
The problem of obsolescence
- The life-science research environment can be viewed as undergoing a process of accelerated evolution
- Other disciplines will hit these problems in time
11 April 2023 CC-BY-SA atreloar 30
![Page 31: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/31.jpg)
Cambrian explosion
11 April 2023 31
![Page 32: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/32.jpg)
Hardware obsolescence: Roche 454
11 April 2023 CC-BY-SA atreloar 32
![Page 33: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/33.jpg)
Software obsolescence: too much choice, not enough support
11 April 2023 CC-BY-SA atreloar 33
![Page 34: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/34.jpg)
Abandonware
“Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date, having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”
Sam Jaffe, Scientists Abandon their Software, The Scientist, Feb 16 2004
11 April 2023 CC-BY-SA atreloar 34
![Page 35: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/35.jpg)
File format obsolescence: Illumina
- Probability of error in basecalling is encoded using ASCII codes to reduce file size
- The meaning of the ASCII code changed over the life cycle, and for data generated at different time points the quality might be encoded differently
“If you get an error like ‘Invalid quality score value’, your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You’ll need to add the option -Q33 to your FASTX Toolkit arguments.”
Obviously…
11 April 2023 CC-BY-SA atreloar 35
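To make the offset problem on this slide concrete, here is a minimal Python sketch of how the same quality string decodes under the two offsets the quote mentions (33 and 64). It is not taken from the talk: the function names are hypothetical, and the thresholds in `guess_phred_offset` are a rough heuristic that assumes offset-33 data from common instruments rarely exceeds the character `J`. The relation between a Phred score Q and the error probability, P = 10^(-Q/10), is the standard definition.

```python
from typing import List, Optional


def decode_quality(quality: str, offset: int) -> List[int]:
    """Turn a FASTQ quality string into numeric scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality]


def error_probability(score: int) -> float:
    """Standard Phred relation: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


def guess_phred_offset(quality: str) -> Optional[int]:
    """Heuristically guess the offset; return None when the string is ambiguous.

    Assumption (not from the slides): characters below ASCII 59 cannot occur
    in offset-64 data, and characters above 'J' (ASCII 74) are treated as a
    sign of offset-64 data.
    """
    if not quality:
        return None
    codes = [ord(ch) for ch in quality]
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None  # every character is valid under both encodings


if __name__ == "__main__":
    for qual in ("!''*5", "hhhh", "IIII"):
        print(qual, guess_phred_offset(qual),
              decode_quality(qual, 33), decode_quality(qual, 64))
```

The ambiguous case is the point of the slide: a string such as `IIII` is valid under both encodings, so a file without provenance metadata cannot always be interpreted reliably from its contents alone.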
![Page 36: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/36.jpg)
Everett Rogers, Diffusion of Innovations (1962)
11 April 2023 CC-BY-SA atreloar 36
![Page 37: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/37.jpg)
Conclusions
- Need to move to a smaller number of standard file formats
- Need to move to a more sustainable model of software development and maintenance
- Need to encourage platform manufacturers to innovate around the hardware, not the software
- NOTE: other disciplines are looking to the life sciences to work out how to solve some of these problems
11 April 2023 CC-BY-SA atreloar 37
![Page 38: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/38.jpg)
On best practices in the development of bioinformatics software (Front Genet, 02 Jul 14)
- Source code available to reviewers
- Software indexed, citable, available
- Source code documented
- Source code managed
- Test libraries, sample data and dataset repositories available
11 April 2023 CC-BY-SA atreloar 38
![Page 39: The life-sciences as a pathfinder in data-intensive research practice](https://reader035.vdocuments.us/reader035/viewer/2022062418/554e83fcb4c90573338b4597/html5/thumbnails/39.jpg)
Questions
andrew.treloar@ands.org.au
atreloar
https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice
11 April 2023 CC-BY-SA atreloar 39