Intellectual Property and Mining Software Repositories
Ninja Research Lab, University of Victoria
Daniel M German
Mining Software Archives, Ascona, 2010
18 March 2010
Daniel M German Ninja Research Lab, University of Victoria
FOSS has fulfilled the goals of COTS
1 FOSS is a thriving ecosystem
2 Widely used in industry
3 But comes with a price
Use cases
1 Is my system properly honouring the licenses of all of its components?
2 Given my intentions, can I use this component?
3 Is any of this code derived from FOSS?
Software is complex; auditing its IP is challenging
Auditing IP:
1 Is my system properly honouring the licenses of all of its components?
  1 What components is it using?
    Not trivial!
  2 What is the license of each component?
  3 What is the license of each file in each component?
  4 How do the licenses of the files of a system interact with the license of the system?
License identification challenges

| Type | Challenge |
|---|---|
| Finding the license statement | F1. License statements are usually mixed with other text |
| | F2. Files might reference another file where the license is located |
| | F3. Files might contain multiple licenses |
| Language related | L1. Licensing statements contain spelling errors |
| | L2. A given license is referred to in different ways |
| | L3. Licensors change the spelling/grammar of the license statement |
| License customization | C1. Several licenses must be customized when used |
| | C2. Licensors modify, add or remove conditions of well-known licenses |
| | C3. Licensors modify licenses for various intents |

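
Some of these challenges (F1, and L1 to L3) can be partly handled by normalizing text before matching it against known license sentences. Below is a minimal Python sketch of that idea; the spelling table, the single catalogued sentence, and the function names are hypothetical, and this is not a description of how Ninka works internally.

```python
import re
from typing import Optional

# Hypothetical table of known misspellings (challenge L1); a real tool needs a curated list.
SPELLING_FIXES = {"licence": "license", "liscense": "license", "warrenty": "warranty"}

# Hypothetical catalogue: license identifier -> one characteristic sentence.
KNOWN_SENTENCES = {
    "GPLv2+": (
        "you can redistribute it and/or modify it under the terms of the GNU General "
        "Public License as published by the Free Software Foundation; either version 2 "
        "of the License, or (at your option) any later version"
    ),
}


def normalize_statement(text: str) -> str:
    """Lowercase, drop comment markers and punctuation, fix known misspellings."""
    # Words, version numbers such as "2.1", and "+" survive; "/*", "*", "#" and similar do not.
    words = re.findall(r"[a-z0-9+]+(?:\.[0-9]+)?", text.lower())
    return " ".join(SPELLING_FIXES.get(w, w) for w in words)


def identify(text: str) -> Optional[str]:
    """Return the first license whose sentence occurs in the text, or None (unknown)."""
    padded = f" {normalize_statement(text)} "
    for license_id, sentence in KNOWN_SENTENCES.items():
        if f" {normalize_statement(sentence)} " in padded:
            return license_id
    return None  # report "unknown" instead of guessing
```

Returning None for text that matches no known sentence, rather than picking the closest license, mirrors the "avoid making mistakes at the cost of recall" goal stated for Ninka.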
Current developments: Ninka
1 License identification system
  Capable of identifying more than 100 FOSS licenses
  Designed to avoid making mistakes (at the cost of recall)
  Faster than the competition
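
| | Ninka | FOSSology | ohcount | OSLC |
|---|---|---|---|---|
| Correct | 200 | 137 | 83 | 57 |
| Incorrect | 7 | 112 | 167 | 193 |
| Unknown | 43 | 1 | 0 | 0 |
| Recall | 82.3% | 99.2% | 100.0% | 100.0% |
| Precision | 96.6% | 55.0% | 33.2% | 22.8% |
| F-measure | 0.889 | 0.708 | 0.498 | 0.371 |
| Execution time | 22 s | 923 s | 27 s | 372 s |
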
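
The precision and F-measure rows can be checked against the counts. The Python sketch below assumes precision = Correct / (Correct + Incorrect); this matches the Ninka, FOSSology and ohcount precision values, and for OSLC it gives 57 / 250 = 22.8%, which is also the value consistent with its F-measure of 0.371. Recall is taken as reported, because the slide does not say how it was computed.

```python
def precision(correct: int, incorrect: int) -> float:
    # Assumption: "Unknown" answers are excluded from precision.
    return correct / (correct + incorrect)


def f_measure(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)


# Correct, Incorrect and (reported) Recall values from the table.
RESULTS = {
    "Ninka": (200, 7, 0.823),
    "FOSSology": (137, 112, 0.992),
    "ohcount": (83, 167, 1.000),
    "OSLC": (57, 193, 1.000),
}

for tool, (correct, incorrect, recall) in RESULTS.items():
    p = precision(correct, incorrect)
    print(f"{tool:<10} precision={p:6.1%}  F-measure={f_measure(p, recall):.3f}")
```

Run as is, it prints F-measures of 0.889, 0.708, 0.498 and 0.371, matching the F-measure row.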
Debian 5.0.2 licenses: Most common licenses per number of applications in which they appear

| License | Apps | Proportion of apps |
|---|---|---|
| NONE | 8241 | 74.2% |
| GPLv2+ | 5486 | 49.4% |
| SeeFile | 1252 | 11.3% |
| LibraryGPLv2+ | 1150 | 10.4% |
| SameAsPerl | 791 | 7.1% |
| LesserGPLv2.1+ | 767 | 6.9% |
| MITX11 | 601 | 5.4% |
| BSD3 | 646 | 5.8% |
| GPLv2 | 582 | 5.2% |
| LesserGPLv2+ | 470 | 4.2% |
| GPLnoVersion | 334 | 3.0% |
| BSD2 | 255 | 2.3% |
| publicDomain | 244 | 2.2% |

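
A per-application count like this one can be sketched as follows: an application is counted once for each license found anywhere in it, which is why the proportions add up to more than 100%. The input format (application name mapped to the set of licenses detected in its files) and the sample data are hypothetical.

```python
from collections import Counter


def license_prevalence(app_licenses: dict[str, set[str]]) -> list[tuple[str, int, float]]:
    """Return (license, apps containing it, proportion of apps), most common first."""
    # Each application contributes at most once per license, however many files use it.
    counts = Counter(lic for lics in app_licenses.values() for lic in lics)
    total = len(app_licenses)
    return [(lic, n, n / total) for lic, n in counts.most_common()]


# Hypothetical scan output: application -> licenses detected in its files.
apps = {
    "app-a": {"GPLv2+", "NONE"},
    "app-b": {"GPLv2+", "BSD3", "NONE"},
    "app-c": {"MITX11"},
    "app-d": {"GPLv2+"},
}

for lic, n, prop in license_prevalence(apps):
    print(f"{lic:<8} {n:>3} {prop:>6.1%}")
```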
Fedora 12: Most common licenses found by file
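
| License | No. of files | Share of files |
|---|---|---|
| NONE | 61475 | 19% |
| EPLv1 | 40310 | 12% |
| GPLv2+ | 31392 | 10% |
| UNKNOWN | 23202 | 7% |
| Apachev2 | 18059 | 6% |
| GPLv2 | 15173 | 5% |
| LesserGPLv3 | 12616 | 4% |
| LesserGPLv2.1+ | 9342 | 3% |
| LibraryGPLv2+ | 9320 | 3% |
| GPLv3+ | 7475 | 2% |
| SeeFile | 6163 | 2% |
| boostV1 | 4802 | 1% |
| BSD3 | 4460 | 1% |
| MITX11noNotice | 4219 | 1% |
| CDDLv1orGPLv2 | 3651 | 1% |
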
Fedora 12: Most common declared licenses
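
| Declared license | Source license | # Src pkgs | # Bin pkgs |
|---|---|---|---|
| gplv2+ | GPLv2+ | 118 | 145 |
| asl 2.0 | Apachev2 | 28 | 48 |
| lgplv2+ | LesserGPLv2.1+ | 27 | 36 |
| mit | MITX11noNotice | 21 | 30 |
| mit | MITold | 18 | 23 |
| lgplv2+ | LibraryGPLv2+ | 16 | 23 |
| gpl+ or artistic | SameAsPerl | 14 | 14 |
| gplv2 | GPLv2 | 11 | 12 |
| bsd | BSD3 | 11 | 11 |
| gplv2 | GPLv2+ | 10 | 14 |
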
Applications that had Errors in their Licensing
Files without a license that should have one
Cutting-and-pasting the wrong license statement
Inconsistent license clauses
Incorrect name of the license
License statements can only be edited by their copyright owners
License Maintenance: Requirements
Editing of the license statements.
Verifying the validity of the license statements.
Summarizing licenses in source code files.
Tracking of copyright owners.
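
The last requirement, tracking copyright owners, starts with extracting copyright notices. A minimal Python sketch, assuming one-line notices of the form "Copyright (C) years holder"; the regular expression and function name are hypothetical simplifications, and real notices vary far more.

```python
import re

# Assumed notice shape: "Copyright [(C)|©] <years> [by] <holder> [<email>]", on one line.
COPYRIGHT_RE = re.compile(
    r"copyright\s+(?:\(c\)\s*|©\s*)?"
    r"(?P<years>\d{4}(?:\s*[-,]\s*\d{4})*),?\s+"
    r"(?:by\s+)?(?P<holder>[^\n<]+?)[ \t]*(?:<[^>\n]*>)?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def copyright_holders(text: str) -> dict[str, list[str]]:
    """Map each copyright holder to the year ranges that appear next to it."""
    holders: dict[str, list[str]] = {}
    for m in COPYRIGHT_RE.finditer(text):
        holder = m.group("holder").rstrip(". ")
        holders.setdefault(holder, []).append(m.group("years"))
    return holders
```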
Current Development: Auditing Fedora 12
1 Determining the license of a component is sometimes easy:
  All files share the same license
2 But sometimes it is extremely difficult:
  Same source package splits into different binary packages, each with a different license
  Sometimes licenses are in documentation
  Errors in licenses!
    Sometimes by developers
    Sometimes by packagers
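
The easy case above can be detected mechanically. A minimal Python sketch, assuming per-file license labels (as a license identifier would produce) are already available; treating NONE and SeeFile files as neutral is a hypothetical policy, and the harder cases on this slide (split binary packages, licenses in documentation) are not handled.

```python
from collections import Counter

# Hypothetical policy: files without their own license statement do not decide the outcome.
NEUTRAL = {"NONE", "SeeFile"}


def summarize_component(file_licenses: dict[str, str]) -> tuple[str, Counter]:
    """Classify a component as "easy" (one license across all licensed files) or "needs review"."""
    counts = Counter(lic for lic in file_licenses.values() if lic not in NEUTRAL)
    status = "easy" if len(counts) == 1 else "needs review"
    return status, counts


status, counts = summarize_component({"main.c": "GPLv2+", "util.c": "GPLv2+", "README": "NONE"})
print(status, dict(counts))
```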
Fedora 12: Licenses for source packages having code with one license

| Declared license | Source license | # Src pkgs | # Bin pkgs |
|---|---|---|---|
| gplv2+ | GPLv2+ | 118 | 145 |
| asl 2.0 | Apachev2 | 28 | 48 |
| lgplv2+ | LesserGPLv2.1+ | 27 | 36 |
| mit | MITX11noNotice | 21 | 30 |
| mit | MITold | 18 | 23 |
| lgplv2+ | LibraryGPLv2+ | 16 | 23 |
| gpl+ or artistic | SameAsPerl | 14 | 14 |
| gplv2 | GPLv2 | 11 | 12 |
| bsd | BSD3 | 11 | 11 |
| gplv2 | GPLv2+ | 10 | 14 |
| lgplv2+ | LesserGPLv2+ | 8 | 9 |
| gplv3+ | GPLv3+ | 8 | 9 |
| mit | X11mit | 7 | 12 |
| epl | EPLv1 | 6 | 6 |
| mit | X11 | 5 | 6 |
| lgplv2+ | SeeFile | 5 | 6 |
| mit | SeeFile | 4 | 5 |
| bsd | BSD2 | 4 | 6 |
| bsd | BSD4 | 4 | 4 |
| asl 1.1 | Apachev1.1 | 4 | 4 |

Example 2: packages with one license that is inconsistent with the declared license

| Warning level | Issue | Source package | Declared license | Source license |
|---|---|---|---|---|
| OK | Incorrect license identification | mysql-connector-java | gplv2 with exceptions | GPLv2 |
| | | glade3 | gplv2+ and (gplv2+ and lgplv2+) and lgplv2 | GPLv2+ |
| | | imagemagick | imagemagick | LesserGPLv2+ |
| | | gzip | gplv2 and gfdl | GPLv2+ |
| | | mpfr | lgplv2+ and gplv2+ and gfdl | LesserGPLv2.1+ |
| | | libpng | zlib | GPLv2+ |
| OK | Optional component | libpng | zlib | GPLv2+ |
| OK | Used as a component | opensp | mit | LibraryGPLv2+ |
| OK | Inconsistent declared license | automake | gplv2+ and gfdl and mit | GPLv2+ |

Example 2 (continued): packages with one license that is inconsistent with the declared license

| Warning level | Issue | Source package | Declared license | Source license |
|---|---|---|---|---|
| Suspicious | Fedora false positive | eclipse-cdt | epl and cpl | EPLv1 |
| Suspicious | License change | bsf | asl 1.1 | Apachev2 |
| | | mtools | gplv2+ | GPLv3+ |
| Unknown | License was not found | ortp | lgplv2+ and vsl | LesserGPLv2.1+ |

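
Candidate inconsistencies like those in Example 2 can be flagged by comparing the declared license tag with the license found in the source. The Python sketch below uses a hypothetical mapping seeded only from declared/source pairs that appear in the Fedora 12 tables; a mismatch is only a warning, since several flagged packages were rated OK after inspection (for example libpng as an optional component).

```python
import re

# Hypothetical mapping from declared (Fedora) tags to acceptable source licenses,
# seeded from pairs listed in the Fedora 12 tables; a real table would be curated.
TAG_TO_SOURCE = {
    "gplv2+": {"GPLv2+"},
    "gplv2": {"GPLv2"},
    "gplv3+": {"GPLv3+"},
    "lgplv2+": {"LesserGPLv2.1+", "LibraryGPLv2+", "LesserGPLv2+"},
    "asl 2.0": {"Apachev2"},
    "asl 1.1": {"Apachev1.1"},
    "mit": {"MITX11noNotice", "MITold", "X11mit", "X11"},
    "bsd": {"BSD2", "BSD3", "BSD4"},
    "epl": {"EPLv1"},
    "gpl+ or artistic": {"SameAsPerl"},
}


def allowed_sources(tag: str) -> set[str]:
    """Source licenses accepted for a declared tag; compound tags are split on and/or."""
    tag = tag.lower().strip()
    if tag in TAG_TO_SOURCE:
        return set(TAG_TO_SOURCE[tag])
    parts = (p.strip(" ()") for p in re.split(r"\b(?:and|or)\b", tag))
    return set().union(*(TAG_TO_SOURCE.get(p, set()) for p in parts if p))


def check_package(declared: str, source_license: str) -> str:
    """Return "consistent" or "review" (a candidate for manual inspection)."""
    return "consistent" if source_license in allowed_sources(declared) else "review"


print(check_package("asl 1.1", "Apachev2"))  # the bsf case
```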
Example 3: Packages under the GPL with code under the BSD-4

| Warning level | Issue | Packages |
|---|---|---|
| Ok | Copyright by UofC | ftp, guile, kernel, nmap, rpm, squid |
| | Copyright by NetBSD | exiv2, rpcbind |
| | Sample code | bash |
| Suspicious | Files using BSD-4 | cups, isdn4k-utils, xen |

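
Part of the Example 3 classification can be automated: detect the advertising clause that distinguishes BSD-4 from BSD-3, then check whether the copyright holder is one the slide marks as Ok. A minimal Python sketch; the phrase and holder patterns are assumptions, and cases such as the sample code in bash still need a human.

```python
import re

# Key phrase of the BSD advertising clause (present in BSD-4, absent from BSD-3).
ADVERTISING_CLAUSE = re.compile(
    r"all advertising materials mentioning features or use of this software", re.IGNORECASE
)

# Holders that Example 3 lists under warning level "Ok"; the patterns are assumptions.
OK_HOLDERS = {
    "UofC": re.compile(r"regents of the university of california", re.IGNORECASE),
    "NetBSD": re.compile(r"netbsd foundation", re.IGNORECASE),
}


def classify_bsd4(text: str) -> str:
    # Collapse whitespace and comment decoration so phrases split across lines still match.
    flat = re.sub(r"[\s*#/]+", " ", text)
    if not ADVERTISING_CLAUSE.search(flat):
        return "no BSD-4 clause"
    for label, pattern in OK_HOLDERS.items():
        if pattern.search(flat):
            return f"Ok (copyright by {label})"
    return "Suspicious"
```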
Example 4: Source packages that contain files distributed with inconsistent GPL versions

| Warning level | Issue | Package | Declared license | Source license |
|---|---|---|---|---|
| Suspicious | License evolution | fetchmail | gplv1+ | GPLv2+ |
| | | iptables | gplv1+ | GPLv2+ |
| | | cvs | gplv1+ | GPLv2+ |
| | | bash | gplv2+ | GPLv3+ |
| | | bison | gplv2+ | GPLv3+ |
| | Some inconsistent files | mtools | gplv2+ | GPLv3+ |
| | | vinagre | gplv2+ | GPLv3+ |
| | Contradictory documentation | vinagre | gplv2+ | GPLv3+ |

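
The check behind Example 4 can be sketched in Python: every GPL version the package declares must also be permitted by each file. The label format (gplvN or gplvN+) follows the slides; the subset rule and the function names are hypothetical simplifications that ignore exceptions and non-GPL licenses.

```python
import re
from typing import Optional


def parse_gpl(label: str) -> Optional[tuple[int, bool]]:
    """Parse labels such as 'gplv2+' or 'GPLv3' into (version, or_later); None otherwise."""
    m = re.fullmatch(r"gplv(\d)(\+?)", label.strip(), re.IGNORECASE)
    if m is None:
        return None
    return int(m.group(1)), m.group(2) == "+"


def permits(file_gpl: tuple[int, bool], declared: tuple[int, bool]) -> bool:
    """True if every GPL version the declaration allows is also allowed by the file."""
    f_ver, f_later = file_gpl
    d_ver, d_later = declared
    if f_later:
        # File allows f_ver or any later version.
        return f_ver <= d_ver
    # File allows exactly f_ver, so the declaration must name that version only.
    return not d_later and f_ver == d_ver


def inconsistent_files(declared: str, file_labels: dict[str, str]) -> list[str]:
    """Names of GPL-licensed files whose terms do not cover the declared GPL version(s)."""
    decl = parse_gpl(declared)
    if decl is None:
        raise ValueError(f"not a GPL label: {declared!r}")
    flagged = []
    for name, label in file_labels.items():
        parsed = parse_gpl(label)
        if parsed is not None and not permits(parsed, decl):
            flagged.append(name)
    return flagged
```

With this rule, a package declared gplv1+ whose files are GPLv2+ (the fetchmail, iptables and cvs rows) is flagged, as is a gplv2+ package containing GPLv3+ files.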
Results of interactions with Fedora and Upstream

| Status | Issue | Source package |
|---|---|---|
| Resolved upstream | Incorrect license in sources | enchant, kdesdk, wireshark |
| Resolved independently | Incorrect license in sources | xen |
| Resolved by Fedora | Incorrect declared license | abrt |
| | Dynamic linking with GPL | php |
| Acknowledged by Fedora | Dynamic linking with GPL | lvm2, pilot-link |
| Reported upstream | Incorrect license in sources | cups, isdn4k-utils |
| Reported to Fedora | Incorrect declared license | alsa-utils, bison, eclipse-cdt, fetchmail, firstboot, iproute, iptables, kdebindings, mtools, ortp, rpcbind, vinagre, vino, yum |

Current Development: Ultra-fast 1-to-n clone detection
1 Windows 7 USB/DVD Download Tool contains GPL code, but it is distributed with a proprietary license!
2 How can I know if my source code contains FOSS source code?
3 Running ccfinder on 0.5 million files of Debian 5.0.2 took 35 days!
  Other tools simply run out of memory
  We hit a worst case: it took 1.5 days to analyze 1 file for clones (in itself only!)
Current Development: Yocca
1. Yocca is a system for verifying whether clones exist between a file and a large corpus of code (potentially millions of files)
  1 Based on n-grams
  2 Performs syntactic clone detection
  3 Runs in time O(n log n)

| Corpus size | 1st Qu. | Median | Mean | 3rd Qu. | Max |
|---|---|---|---|---|---|
| 100 | 0.270 | 0.375 | 0.519 | 0.495 | 2.880 |
| 1,000 | 0.260 | 0.370 | 0.557 | 0.530 | 3.490 |
| 10,000 | 0.260 | 0.540 | 0.822 | 0.865 | 5.560 |
| 100,000 | 0.308 | 3.700 | 9.420 | 9.535 | 100.510 |

Times in seconds.
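
The slides say only that Yocca is based on n-grams and performs syntactic clone detection. As a rough illustration of 1-to-n matching with n-grams (not Yocca's actual design), the Python sketch below tokenizes files crudely, indexes every token 5-gram of the corpus, and reports corpus files that share enough n-grams with a query file; the n-gram length, the threshold, and all names are hypothetical.

```python
import re
from collections import defaultdict

N = 5  # hypothetical n-gram length, in tokens


def tokens(source: str) -> list[str]:
    # Crude lexer: identifiers, integer literals, and single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def ngrams(toks: list[str], n: int = N) -> set[tuple[str, ...]]:
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


class NgramIndex:
    """Inverted index from token n-grams to the corpus files that contain them."""

    def __init__(self) -> None:
        self.postings: dict[tuple[str, ...], set[str]] = defaultdict(set)

    def add(self, name: str, source: str) -> None:
        for g in ngrams(tokens(source)):
            self.postings[g].add(name)

    def query(self, source: str, min_shared: int = 3) -> dict[str, int]:
        """Count shared n-grams per corpus file; keep files sharing at least min_shared."""
        shared: dict[str, int] = defaultdict(int)
        for g in ngrams(tokens(source)):
            for name in self.postings.get(g, ()):
                shared[name] += 1
        return {name: k for name, k in shared.items() if k >= min_shared}


idx = NgramIndex()
idx.add("gpl/util.c", "int add(int a, int b) { return a + b; }")
idx.add("other.c", 'void f(void) { puts("hi"); }')
print(idx.query("static int add(int a, int b) { return a + b; }"))
```

Query cost grows with the length of the posting lists each query n-gram touches, so in practice very common n-grams (boilerplate shared by many files) would need to be filtered out.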
Future work
Legal issues will not go away
We are scratching the surface
Several areas of future work:
  Architecture recovery
  Origin analysis
    particularly at the assembly level
  Dependency analysis
    What does my application really need?
Acknowledgements
This work is being done in collaboration with:
Ahmed Hassan, Giulio Antoniol, Julius Davis, Katsuro Inoue, Massimiliano Di Penta, Simone Livieri, Yann-Gael Gueheneuc, Yuki Manabe