Software Heritage: Building the Universal Software Archive for Open Science. Roberto Di Cosmo, September 17th, 2018. THE GREAT LIBRARY OF SOURCE CODE. www.dicosmo.org (CC-BY 4.0) www.softwareheritage.org

Software Heritage
Building the Universal Software Archive for Open Science

Roberto Di Cosmo

September 17th, 2018

THE GREAT LIBRARY OF SOURCE CODE

Roberto Di Cosmo www.dicosmo.org (CC-BY 4.0) www.softwareheritage.org September 17th, 2018

Outline

1 Introductions

2 Software is everywhere. . .

3 . . . and we are not taking care of it!

4 The Software Heritage initiative

5 Architecture

6 Using the Software Heritage archive

7 Open Science

8 Building for the long term

9 Conclusion

Short Bio: Roberto Di Cosmo

Computer Science professor in Paris, now working at INRIA

30 years of research (Theor. CS, Programming, Software Engineering, Erdos #: 3)

20 years of Free and Open Source Software

10 years building and directing structures for the common good

1999 DemoLinux – first live GNU/Linux distro

2007 Free Software Thematic Group: 150 members, 40 projects, 200 M€

2008 Mancoosi project www.mancoosi.org
2010 IRILL www.irill.org
2015 Software Heritage at INRIA

Software is everywhere

Software

Source code is executable and human readable knowledge

a growing part of our Cultural Heritage

Source code is special

Harold Abelson, Structure and Interpretation of Computer Programs

“Programs must be written for people to read, and only incidentally for machines to execute.”

[Figures: Quake III source code (excerpt); network queue in Linux (excerpt)]

Len Shustek, Computer History Museum

“Source code provides a view into the mind of the designer.”

~ 50 years, a lightning fast growth

Apollo 11 Guidance Computer (~60,000 lines), 1969

"When I first got into it, nobody knew what it was that we were doing. It was like the Wild West."

Margaret Hamilton

Linux Kernel

. . . now in your pockets!

are we taking care of all this?

Software is spread all around

Software is fragile

Software lacks its own research infrastructure

Photo: ALMA(ESO/NAOJ/NRAO), R. Hills

Research software: a long way to go!

ICSE (Zannier, Melnik, Maurer, 2006)

complete absence of replication studies

ACM TOSEM 2001 to 2006, C. Ghezzi http://bit.ly/tosemreprod
60% of all papers have tools: only 20% installable

Collberg’s 2015 study http://reproducibility.cs.arizona.edu/
601 mainstream papers: 508 with tools, only 40% installable

Main reasons
source code (or the right version of it) cannot be found

URL decay disrupts the web of reference

Web links are not permanent (even permalinks)

"there is no general guarantee that a URL. . . which at one time points to a given object continues to do so"
T. Berners-Lee et al. Uniform Resource Locators. RFC 1738.

URLs used in articles decay!

Analysis of IEEE Computer (Computer) and the Communications of the ACM (CACM): 1995-1999

the half-life of a referenced URL is approximately 4 years from its publication date
D. Spinellis. The Decay and Failures of URL References. Communications of the ACM, 46(1):71-77, January 2003.

Similar findings in Lawrence, S. et al. Persistence of Web References in Scientific Research, IEEE Computer, 34(2), pp. 26–31, 2001.
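
To make the 4-year half-life concrete, here is a minimal sketch of what it would imply for a reference list over time. It assumes plain exponential decay, which is only what the word "half-life" suggests; the function name and the model are illustrative assumptions, not taken from the cited study.

```python
def surviving_fraction(years: float, half_life_years: float = 4.0) -> float:
    """Fraction of referenced URLs expected to still work after `years`.

    Assumes simple exponential decay; real link rot need not follow
    this model exactly.
    """
    if years < 0 or half_life_years <= 0:
        raise ValueError("years must be >= 0 and half_life_years > 0")
    return 0.5 ** (years / half_life_years)


if __name__ == "__main__":
    # Print the expected share of live links at a few ages.
    for years in (0, 4, 8, 12):
        share = surviving_fraction(years)
        print(f"after {years:2d} years: {share:.1%} of URLs expected alive")
```

Under this assumption, a paper that is 8 years old would be expected to have about a quarter of its URL references still working.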

Scholar roster of broken links

An example from Astronomy

How Do Astronomers Share Data?
Pepe, Goodman, Muench, Crosas, Erdmann, PLOS, August 28, 2014
dx.doi.org/10.1371/journal.pone.0104798

DOI limitations

Example: doi:10.1109/MSR.2015.10

to find what 10.1109/MSR.2015.10 is, go to a resolver (e.g. doi.org)

this returns http://ieeexplore.ieee.org/document/7180064/
at this URL we find . . .

Architecture of the DOI infrastructure
DOI resolution can change

content at URL can change

no intrinsic way of noticing

persistence based on good will of multiple parties
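
The points above can be illustrated with a small, self-contained sketch. The `resolver` and `web` dictionaries are hypothetical in-memory stand-ins for the DOI resolver and the publisher's web server, and SHA-256 is used only as an example of an identifier computed from the content; none of this is a real DOI or Software Heritage API.

```python
import hashlib

# Hypothetical stand-ins: DOI -> URL (resolver) and URL -> bytes (web server).
resolver = {"10.1109/MSR.2015.10": "http://ieeexplore.ieee.org/document/7180064/"}
web = {"http://ieeexplore.ieee.org/document/7180064/": b"original landing page"}


def fetch_by_doi(doi: str) -> bytes:
    """Follow both indirections; nothing here can notice a change."""
    return web[resolver[doi]]


def intrinsic_id(content: bytes) -> str:
    """An identifier derived from the content itself (SHA-256 as an example)."""
    return hashlib.sha256(content).hexdigest()


def fetch_verified(doi: str, expected_id: str) -> bytes:
    """Fetch through the resolver, then check the bytes against the intrinsic id."""
    content = fetch_by_doi(doi)
    if intrinsic_id(content) != expected_id:
        raise ValueError("content no longer matches the identifier")
    return content


if __name__ == "__main__":
    doi = "10.1109/MSR.2015.10"
    recorded = intrinsic_id(fetch_by_doi(doi))   # what a citing paper could record
    web[resolver[doi]] = b"something else"       # the content at the URL changes
    print(fetch_by_doi(doi))                     # the change goes unnoticed
    try:
        fetch_verified(doi, recorded)
    except ValueError as err:
        print("detected:", err)
```

The DOI alone keeps "working" after the change, while the check against the content-derived identifier fails, which is the sense in which an extrinsic identifier has no intrinsic way of noticing.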

No catalog, no archive, no references: we are at a turning point

Looking at the past

a lot of old software misplaced, lost, or behind barriers, but. . .

most founding fathers are still here, and willing to share

urgent to collect their knowledge

Only a few years left.

Looking at the future

software development and use skyrockets: more programmers, and more code!

essential to provide a universal platform for all the future software source code

Every year that goes by makes the problem worse.

it is urgent to take action!

The Software Heritage Project www.softwareheritage.org

Our mission
Collect, preserve and share the source code of all the software that is available

Past, present and future

Preserving the past, enhancing the present, preparing the future

A principled infrastructure http://bit.ly/swhpaper

Technology

transparency and FOSS

replicas all the way down

Content
intrinsic identifiers

facts and provenance

Organization

non-profit

mirror network

Automation, and storage

[Figure: data flow of the archive. Listers (GitHub, GitLab, Debian, PyPI, . . .) enumerate software origins (forges, package repos, distros) in full or incremental mode; scheduling dispatches loaders (Git, Mercurial, Debian source package, tar, . . .) over git, hg, svn, dsc, tar and zip sources; loading and deduplication feed the Software Heritage Archive (Merkle DAG + blob storage).]

full development history permanently archived
origins: GitHub (auto), Debian (auto), Gitlab.com, Gitorious, Google Code, GNU
~200 TB raw contents, ~10 TB graph (10 Bn nodes, 100 Bn edges)
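
The lister, scheduler, loader and deduplication stages can be sketched as a toy pipeline. Everything below is a hypothetical illustration: the `UPSTREAM` data, the example URLs, and the `Archive` class are invented for this sketch and do not reflect the actual Software Heritage code base.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple

Origin = Tuple[str, str]   # (kind, url)
Files = Dict[str, bytes]   # path -> file contents

# Hypothetical upstream data standing in for a forge and a distro mirror.
UPSTREAM: Dict[str, Files] = {
    "https://forge.example/a.git": {"README": b"hello\n", "main.c": b"int main(){}\n"},
    "https://distro.example/a.tar": {"README": b"hello\n", "main.c": b"int main(){}\n"},
}


def forge_lister() -> Iterable[Origin]:
    """Listing: enumerate the origins hosted on one forge."""
    yield ("git", "https://forge.example/a.git")


def distro_lister() -> Iterable[Origin]:
    """Listing: enumerate the source packages of one distribution."""
    yield ("tar", "https://distro.example/a.tar")


def fetch(url: str) -> Files:
    """Stand-in for a real loader, which would speak git, hg, svn, and so on."""
    return UPSTREAM[url]


# One loader per origin kind; both map to the same stub in this sketch.
LOADERS: Dict[str, Callable[[str], Files]] = {"git": fetch, "tar": fetch}


class Archive:
    """Content-addressed blob store plus one manifest per origin."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.snapshots: Dict[str, Dict[str, str]] = {}

    def add(self, url: str, files: Files) -> None:
        manifest: Dict[str, str] = {}
        for path, data in files.items():
            key = hashlib.sha1(data).hexdigest()
            self.blobs.setdefault(key, data)  # deduplication: stored once per content
            manifest[path] = key
        self.snapshots[url] = manifest


def run(listers: List[Callable[[], Iterable[Origin]]], archive: Archive) -> None:
    """Scheduling reduced to a FIFO queue of listed origins."""
    queue = [origin for lister in listers for origin in lister()]
    for kind, url in queue:
        archive.add(url, LOADERS[kind](url))
```

Running `run([forge_lister, distro_lister], Archive())` records two origins but stores each distinct file only once, since both origins carry identical contents.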

Much more than an archive!
Merkle tree (R. C. Merkle, Crypto 1979)

Combination of

tree

hash function

Classical cryptographic construction

fast, parallel signature of large data structures

widely used (e.g., Git, blockchains, IPFS, . . . )

built-in deduplication
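
A minimal sketch of the construction, assuming a directory tree held in nested Python dictionaries. The serialization used here (the `leaf` and `tree` prefixes with SHA-256) is an illustrative choice for this sketch and is not the object format used by Git or by Software Heritage.

```python
import hashlib
from typing import Dict, Union

# A tree maps names to either file contents (bytes) or a nested tree.
Tree = Dict[str, Union[bytes, "Tree"]]


def node_hash(node: Union[bytes, Tree]) -> str:
    """Hash a file from its bytes, and a directory from its children's hashes."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"leaf\0" + node).hexdigest()
    h = hashlib.sha256(b"tree\0")
    for name in sorted(node):  # sorted, so the hash ignores insertion order
        child = node_hash(node[name]).encode("ascii")
        h.update(name.encode("utf-8") + b"\0" + child + b"\n")
    return h.hexdigest()


if __name__ == "__main__":
    v1: Tree = {"README": b"doc v1", "src": {"a.c": b"x", "b.c": b"y"}}
    v2: Tree = {"README": b"doc v2", "src": {"a.c": b"x", "b.c": b"y"}}
    # The unchanged subtree has the same hash in both versions (deduplication) ...
    print(node_hash(v1["src"]) == node_hash(v2["src"]))
    # ... while the change to README propagates up to the root hash.
    print(node_hash(v1) == node_hash(v2))
```

Because a node's identifier depends only on what lies below it, identical subtrees are stored once, and any modification changes every hash on the path to the root.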

The archive in pictures

Reference archive for all software

A "wayback machine" for software source code . . . with intrinsic identifiers!

http://archive.softwareheritage.org/browse
http://bit.ly/swhpids for persistent identifiers

Demo time: let’s highlight some features. . .

Origin search
Directory browsing
Revisions as diffs

Web UI: syntax highlighting and selection

Supporting more accessible and reproducible science

A global library referencing all software used in all research fields

completes the infrastructure for Open Access in science

provides intrinsic persistent identifiers for scientific reproducibility

enables large scale, verifiable software studies

Demo links

Paper points to lost gitorious.org repo saved in Software Heritage

https://www.openaire.eu/search/publication?articleId=dedup_wf_001::cd996f0b6236b90659f84f99feb62bcc
https://gitorious.org/parmap
https://archive.softwareheritage.org/browse/search/?url=%22gitorious.org/parmap%22

Deposit Scientific Software

Deposit software in HAL http://hal.inria.fr/hal-01738741
Generic mechanism:

SWORD based

review process

versioning

How to do it:

today: deposit .zip or .tar.gz file (guide)
tomorrow:

provide SWH id and metadata
include metadata file for automatic metadata extraction
. . .

September 2018: open to all on https://hal.archives-ouvertes.fr/

The way to go to archive and reference scientific software

All features of Software Heritage for free

intrinsic IDs (integrity, not dependent on resolvers!)
specification: http://bit.ly/swhpids
iPres2018 paper: http://bit.ly/swhpidpaper

browse, download (now)

metadata, licenses, provenance (plagiarism detection), classification (wip), . . .

Coverage and uniformity

one archive for all domains (industry included)

reference any software, not just the deposited ones

git-compatible identifiers greatly simplify workflows
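
As a sketch of what "git-compatible" can mean for a single file: Git names a file's contents by the SHA-1 of a short header followed by the bytes, and the identifier below reuses that hash. The `swh:1:cnt:` prefix follows the persistent identifier specification linked above as I understand it; treat the exact format as an assumption and check the specification before relying on it.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the SHA-1 that Git assigns to a file's contents.

    Git hashes the header "blob <size>" plus a NUL byte, followed by the data.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def content_swhid(data: bytes) -> str:
    """Build a content identifier from the Git blob hash (prefix is an assumption)."""
    return "swh:1:cnt:" + git_blob_sha1(data)


if __name__ == "__main__":
    with open(__file__, "rb") as f:
        print(content_swhid(f.read()))
```

The hexadecimal part should agree with what `git hash-object <file>` prints for the same file, so the identifier can be computed locally, without asking any resolver.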

Sustainability . . . doors are open!

one infrastructure
independent non-profit foundation
worldwide mirrors

Growing Support

Landmark Inria Unesco agreement, April 3rd, 2017

Sharing the vision
Contributing to the mission

>= 100 K€/year

>= 50 K€/year

>= 25 K€/year

>= 10 K€/year

The next steps

The Software Heritage Foundation

independent

long term mission

multistakeholder

The community

academia: Open Access, research

industry: better software

cultural heritage: all the software history

The mirror network
resilience

biodiversity

“Let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.”

Thomas Jefferson

You can help!

Many scientific and technological challenges

object storage, machine learning, classification, efficient graph queries, mirror protocols, . . .

Contribute
forge.softwareheritage.org

Funding

become a partner/sponsor/mirror: sponsorship.softwareheritage.org
give your own contribution: www.softwareheritage.org/donate

Spread the word!

use the archive and help others do

tell everybody about Software Heritage

Come in, we’re open!

www.softwareheritage.org @swheritage

Library of Alexandria of code

recover the past

structure the future

A CERN for Software
build better software

for industry
for society as a whole

All the source code

[Figure: map of all the source code along two axes, online vs. offline and open vs. closed]

All the source code: strategy

[Figure: the same online/offline, open/closed map, annotated with the strategies Embargo, Focused Search, Automation and Crowdsourcing]

A bird’s eye view

Outline

10 Science of Software

Big Code = Big data + AI

Large scale repeatable software studies. . .

vulnerability detection

dependency analysis

pattern elicitation

automatic classification . . .

. . . need a uniform representation

Software Heritage has one data model for all forges/VCS. . .
. . . yes, we do data normalization of software evolution!
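
A rough sketch of what normalization across version control systems could look like. The `Revision` record and the input dictionary keys (`author`, `user`, `desc`, `parent_nodes`, and so on) are hypothetical names chosen for illustration; they are not the actual Software Heritage data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Revision:
    """Hypothetical VCS-neutral record; field names are illustrative only."""
    author: str
    message: str
    parents: Tuple[str, ...]


def from_git(commit: Dict) -> Revision:
    """Map a Git-style commit description onto the common record."""
    return Revision(commit["author"], commit["message"].strip(), tuple(commit["parents"]))


def from_hg(changeset: Dict) -> Revision:
    """Map a Mercurial-style changeset description onto the common record."""
    return Revision(changeset["user"], changeset["desc"].strip(),
                    tuple(changeset["parent_nodes"]))


def commits_per_author(revisions: List[Revision]) -> Dict[str, int]:
    """One analysis, written once, that works whatever the original VCS was."""
    counts: Dict[str, int] = {}
    for rev in revisions:
        counts[rev.author] = counts.get(rev.author, 0) + 1
    return counts
```

Once every origin is mapped to the same record type, a study such as `commits_per_author` does not need one code path per forge or VCS.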

Breaking news: soon an Amazon public data set!
