In search of anti-commons: Academic patenting and patent-paper pairs in biotechnology. An analysis of citation flows.
Tom Magerman, Bart Van Looy, Koenraad Debackere
INCENTIM (International Centre for Studies in Entrepreneurship and Innovation Management)
K.U.Leuven Managerial Economics, Strategy & Innovation
ECOOM (Centre for R&D Monitoring)
ESF-APE-INV workshop Scientists & Inventors 10-11/5/2012
1957
TECHNOLOGY
SCIENCE
University-Industry linkages
Scientification of technology
Commercialization of science(Entrepreneurial University)
University-Industry linkages
Positive effects (+)
• Complementarities
• Generation of new research ideas
• Additional funding
• Create a market of ideas
Negative effects (-)
• Crowding out
• Quality
• Research orientation
• Anti-commons and the end of open science
Anti-commons and the end of open science
"If I have seen a little further [than you and Descartes] it is by standing on the shoulders of Giants."
Isaac Newton, letter to Robert Hooke (originated from John of Salisbury)
Anti-commons and the end of open science
Tragedy of the anticommons: underuse of scarce resources because too many owners can block each other => more intellectual property rights may paradoxically lead to fewer useful products
On the one hand, an incentive to undertake risky research. On the other hand, too many owners hold rights in previous discoveries that constitute obstacles to future research => high transaction costs lead to inefficiencies
Biomedical research has been moving from a commons model toward a privatization model => risk of an anticommons tragedy
• Influenced by the patent system: what is patentable (e.g. patents on gene fragments)
• Influenced by the patent owner: licensing behavior (e.g. use of reach-through license agreements)
Transition or tragedy? Find ways to lower the transaction costs of bundling rights (intermediate organizations; patent pools; cross-licensing)
Anti-commons and the end of open science
Expansion of IPR is privatizing the scientific commons and limiting scientific progress
– Heller and Eisenberg (1998); Argyres and Liebeskind (1998); David (2000); Lessig (2002); Etzkowitz (1998); Krimsky (2003)
Murray and Stern (2007): “Do formal intellectual property rights hinder the free flow of scientific knowledge? An empirical test of the anti-commons hypothesis”
• How do IPRs affect the propensity of future researchers to build upon knowledge?
• Compare citation patterns of publications in the pre-grant period and after grant
• 169 patent-paper pairs (Nature Biotechnology)
• Modest anti-commons effect: decline in citation rate by 10 to 20%
Detection of patent-publication pairs
Text Mining
Text mining refers to the automated extraction of knowledge and information from text by means of revealing relationships and patterns present, but not obvious, in a document collection.
Related to data mining, but with additional issues:
• other scale of dimensionality (100,000+ ‘variables’)
• different kind of variables (not really independent, and very, very sparse: 99.99%)
• language issues (homonymy/polysemy and synonymy)
Latent Semantic Analysis (LSA)
LSA was developed in the late 1980s at BellCore/Bell Laboratories by Landauer and his Cognitive Science Research team:
“Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the meaning of words. Meaning is estimated using statistical computations applied to a large corpus of text. The corpus embodies a set of mutual constraints that largely determine the semantic similarity of words and sets of words. These constraints can be solved using linear algebra methods, in particular, singular value decomposition.”
• LSA is a technique for analyzing text: extract (underlying or latent) meaning from text
• LSA is a theory of meaning: meaning is acquired by solving an enormous set of simultaneous equations that capture the contextual usage of words
• LSA is a new approach to cognitive science: use large text corpora to test cognitive theories
Linear algebra problem
The meaning of a passage of text must be the sum of the meanings of its words.
LSA models a large corpus of text as a large set of simultaneous equations.
The solution is in the form of a set of vectors, one for each word and passage, in a semantic space
Similarity of meaning of two words is measured by the cosine between the vectors, and the similarity of two passages as the same measure on the sum or average of all its contained words
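The cosine measure on this slide can be sketched in a few lines. This is a minimal illustration only: the three-dimensional word vectors below are invented for the example (real LSA vectors come out of the SVD of a large corpus), and the names `cosine`, `passage_vector` and `WORD_VECTORS` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

# Hypothetical word vectors in a 3-dimensional "semantic space" (assumption).
WORD_VECTORS = {
    "gene": [0.9, 0.1, 0.0],
    "protein": [0.8, 0.3, 0.1],
    "patent": [0.0, 0.2, 0.9],
}

def passage_vector(words):
    """Represent a passage as the component-wise sum of its word vectors."""
    vectors = [WORD_VECTORS[w] for w in words]
    return [sum(component) for component in zip(*vectors)]

# Word-to-word similarity.
word_similarity = cosine(WORD_VECTORS["gene"], WORD_VECTORS["protein"])

# Passage-to-passage similarity: same measure on the summed word vectors.
passage_similarity = cosine(
    passage_vector(["gene", "protein"]),
    passage_vector(["gene", "patent"]),
)
```

Using the average instead of the sum gives the same cosine, since the cosine ignores vector length.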
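SVD dimensionality reduction
Singular Value Decomposition:
$A = U \Sigma V^T$, with $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$ a diagonal matrix of singular values
Dimensionality reduction by taking the first k singular values (rank-k approximation):
$A_{m \times n} \approx A_k = U_{m \times k} \, \Sigma_{k \times k} \, V^T_{k \times n}$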
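A minimal sketch of the rank-k approximation, assuming NumPy is available. The random 6 × 5 matrix stands in for a (much larger) document-by-term matrix and is purely illustrative; `rank_k_approximation` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 5))  # toy stand-in for a document-by-term matrix (assumption)

# Thin SVD: A = U @ diag(s) @ Vt, singular values in s sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def rank_k_approximation(k):
    """Keep only the k largest singular values and their singular vectors."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A_2 = rank_k_approximation(2)

# The Frobenius-norm error of the rank-k approximation equals the square root
# of the sum of the squared discarded singular values (Eckart-Young theorem).
error_2 = np.linalg.norm(A - A_2)
tail_2 = np.sqrt(np.sum(s[2:] ** 2))
```

Keeping all singular values reproduces A exactly; lowering k trades reconstruction accuracy for a smaller "concept" space.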
Practical application?
Choices to make: term weighting; SVD truncation
Even when using LSA/SVD as text mining method, many options remain!
Assessment of 40 measure variants
4 weighting methods × (9 SVD truncation levels + no SVD) = 40 similarity measures based on SVD and cosine
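The slides name four weighting methods (RAW, TF-IDF, BIN, IDF) without defining them. The sketch below uses common textbook definitions as an assumption: raw counts, binary presence, count × log(N/df), and presence × log(N/df). The authors' exact formulas may differ, and `weight_matrix` is a hypothetical helper.

```python
import math

SCHEMES = ("RAW", "TF-IDF", "BIN", "IDF")

def weight_matrix(counts, scheme):
    """Weight a document-by-term matrix of raw counts (rows are documents).

    Assumed definitions (not confirmed by the slides):
      RAW    -> raw term frequency
      BIN    -> 1 if the term occurs in the document, else 0
      TF-IDF -> term frequency * log(N / document frequency)
      IDF    -> binary presence * log(N / document frequency)
    """
    if scheme not in SCHEMES:
        raise ValueError(f"unknown weighting scheme: {scheme}")
    n_docs = len(counts)
    n_terms = len(counts[0])
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df) if df > 0 else 0.0 for df in doc_freq]

    weighted = []
    for row in counts:
        if scheme == "RAW":
            weighted.append([float(c) for c in row])
        elif scheme == "BIN":
            weighted.append([1.0 if c > 0 else 0.0 for c in row])
        elif scheme == "TF-IDF":
            weighted.append([c * w for c, w in zip(row, idf)])
        else:  # "IDF"
            weighted.append([w if c > 0 else 0.0 for c, w in zip(row, idf)])
    return weighted

# Number of LSA-based variants: 4 weightings x (9 truncation levels + no SVD).
N_VARIANTS = len(SCHEMES) * (9 + 1)

toy_counts = [[2, 0, 1], [0, 1, 1]]  # two documents, three terms (illustrative)
toy_tfidf = weight_matrix(toy_counts, "TF-IDF")
```

Note that a term occurring in every document gets an IDF of zero under these definitions, so it drops out of the TF-IDF and IDF variants.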
Full process
Construct DbT matrix
• Create full text index with stop word removal and stemming (Lucene)
• Convert full text index to document-by-term matrix (Matlab)
• Weight DbT matrix (4 variants)
SVD truncation
• Decompose weighted DbT matrix into UΣV using the 1,000 largest singular values
• Generate document-by-concept matrix VΣ
• Truncate document-by-concept matrix (take first 1000, 500, …, 5 concepts)
Similarity calculation
• Normalise DbT and DbC matrices
• Calculate distance matrix (all patents to all publications) by calculating inner product of vectors
• Retain closest publication for every patent for all of the 43 variants
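The similarity-calculation step can be sketched as follows, assuming NumPy and tiny made-up vectors. After row normalisation the inner product equals the cosine, so the "closest" publication is the one with the largest inner product. The function names are hypothetical; the original work used Lucene and Matlab.

```python
import numpy as np

def l2_normalise(matrix):
    """Scale every row to unit length (all-zero rows are left untouched)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return matrix / norms

def closest_publications(patents, publications):
    """For each patent row, return the index and cosine of the closest publication."""
    P = l2_normalise(np.asarray(patents, dtype=float))
    Q = l2_normalise(np.asarray(publications, dtype=float))
    similarity = P @ Q.T                      # patents x publications
    best = similarity.argmax(axis=1)          # closest publication per patent
    return best, similarity[np.arange(len(best)), best]

# Illustrative document vectors (rows) over three terms or concepts.
toy_patents = [[1, 0, 2], [0, 3, 0]]
toy_publications = [[0, 1, 0], [2, 0, 4], [1, 1, 1]]

best_index, best_score = closest_publications(toy_patents, toy_publications)
```

At the scale reported in the slides (tens of billions of combinations) the full similarity matrix would not fit in memory, so a real run would presumably process patents in blocks.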
Expert validation
R² per measure, by weighting method and SVD truncation level

Measure | RAW | TF-IDF | BIN | IDF
No SVD | 0.61 | 0.71 | 0.77 | 0.80
SVD 1000 | 0.34 | 0.45 | 0.65 | 0.63
SVD 500 | 0.31 | 0.34 | 0.63 | 0.57
SVD 300 | 0.30 | 0.26 | 0.58 | 0.54
SVD 200 | 0.31 | 0.21 | 0.51 | 0.51
SVD 100 | 0.30 | 0.17 | 0.45 | 0.49
SVD 25 | 0.22 | 0.14 | 0.38 | 0.46
SVD 5 | 0.11 | 0.11 | 0.20 | 0.21

Measure | R²
Common terms (weighted by min number of terms) | 0.82
Common terms (weighted by max number of terms) | 0.68
Common terms (weighted by avg number of terms) | 0.75
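The common-terms measures are not spelled out on the slide. A plausible reading, taken here as an assumption, is the number of distinct terms two documents share, divided by the smaller, larger, or average number of distinct terms in the two documents. `common_terms_scores` is a hypothetical helper.

```python
def common_terms_scores(terms_a, terms_b):
    """Share of common distinct terms, weighted by min, max and avg document size."""
    a, b = set(terms_a), set(terms_b)
    if not a or not b:
        return {"min": 0.0, "max": 0.0, "avg": 0.0}
    common = len(a & b)
    return {
        "min": common / min(len(a), len(b)),
        "max": common / max(len(a), len(b)),
        "avg": common / ((len(a) + len(b)) / 2),
    }

# Illustrative stemmed term sets for one patent and one publication.
patent_terms = ["gene", "protein", "express", "cell"]
publication_terms = ["gene", "protein", "vector", "plasmid", "cell", "clone"]

scores = common_terms_scores(patent_terms, publication_terms)
```

Under this reading the min-weighted score is high whenever the shorter document is largely contained in the longer one, while the max-weighted score also requires the two documents to be of similar size.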
Methodology and data: Publication data
Selection of biotechnology publications from the Web of Science based on the subject classification (1991-2008):
• Core set of 243,361 publications: subject category Biotechnology & Applied Microbiology
• Extended set of 683,674 publications: publications of the following subject categories citing or cited by a publication of the core set: Biochemical Research Methods; Biochemistry & Molecular Biology; Biophysics; Plant sciences; Cell Biology; Developmental Biology; Food sciences & Technology; Genetics & Heredity; Microbiology Materials
• Multidisciplinary set of 97,970 publications: publications from the multidisciplinary journals Nature; Science; and Proceedings of the National Academy of Sciences of the United States of America
1,025,005 publications in total (948,432 suited for text mining)
478,361 publications published between 1991 and 2000
Methodology and data: Patent data
Selection of all granted EPO and USPTO biotechnology patents, applied for between 1991 and 2008, from PATSTAT using IPC codes as listed in the OECD definition of biotechnology (‘A Framework for Biotechnology Statistics’, OECD, Paris, 2005)
27,241 EPO patents and 91,775 USPTO patents
119,016 patents in total (88,248 suited for text mining)
Methodology and data: Matching
Original document combinations: 83,697,227,136 patent-publication combinations
CommonTermsMin ≥ 0.60: 27,250 patent-publication combinations
And CommonTermsMax ≥ 0.30: 645 patent-publication combinations
And at least one shared inventor/author: 584 patent-publication pairs
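The matching cascade amounts to three successive filters. The sketch below assumes the two common-terms scores are already computed per candidate and that a "shared inventor/author" can be detected by exact name match, which is a simplification (real name matching is harder). The candidate records and `is_pair` are hypothetical. The first lines check that the number of original combinations equals the product of the publication and patent counts reported as suited for text mining.

```python
N_PUBLICATIONS = 948_432   # publications suited for text mining (from the slides)
N_PATENTS = 88_248         # patents suited for text mining (from the slides)
N_COMBINATIONS = N_PUBLICATIONS * N_PATENTS

def is_pair(candidate, min_threshold=0.60, max_threshold=0.30):
    """Apply the three criteria from the slide to one patent-publication candidate."""
    shared_names = set(candidate["inventors"]) & set(candidate["authors"])
    return (
        candidate["common_terms_min"] >= min_threshold
        and candidate["common_terms_max"] >= max_threshold
        and bool(shared_names)
    )

# Invented candidates, for illustration only.
candidates = [
    {"patent": "P1", "publication": "A1", "common_terms_min": 0.72,
     "common_terms_max": 0.41, "inventors": ["Smith J"], "authors": ["Smith J", "Lee K"]},
    {"patent": "P2", "publication": "A2", "common_terms_min": 0.65,
     "common_terms_max": 0.12, "inventors": ["Smith J"], "authors": ["Smith J"]},
    {"patent": "P3", "publication": "A3", "common_terms_min": 0.80,
     "common_terms_max": 0.50, "inventors": ["Jones A"], "authors": ["Brown B"]},
]

pairs = [c for c in candidates if is_pair(c)]
```

Only the first candidate survives: the second fails the CommonTermsMax threshold and the third has no shared inventor/author.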
Methodology and data: Pairs
584 patent-publication pairs identified
• 17 patents linked to multiple publications (up to 3)
• 115 publications linked to multiple patents (up to 7) (patent families)
• 566 distinct patents paired with a publication
• 400 distinct publications paired with a patent
Patentee type
• 292 University
• 128 Government / Non profit
• 126 Company
• 38 Hospital
• 21 Individual
(42 patents have multiple patentees from different sectors)
Publication and citation numbers
Citation analysis
Match publications to deal with quality differences
Paired and non-paired publications matched by year and journal (1991-2000)
VY | SO | PAIRS: PUB | AVG_AU | AVG_CIT | NONPAIRS: PUB | AVG_AU | AVG_CIT
1991 | BIOCHEMISTRY | 1 | 5.00 | 65.00 | 625 | 4.03 | 57.20
1991 | BIOTECHNIQUES | 1 | 2.00 | 64.00 | 125 | 3.24 | 40.27
… | …
1992 | BIOSCIENCE BIOTECH AND BIOCHEMISTRY | 1 | 2.00 | 4.00 | 543 | 4.24 | 8.07
1992 | BIOTECHNIQUES | 1 | 4.00 | 147.00 | 144 | 3.07 | 26.17
… | …
Total | | 328 | 5.18 | 130.47 | 117,909 | 4.42 | 67.03
328 paired publications versus 106,027 biotechnology publications
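To make the matching concrete, the sketch below computes the ratio of average citations (pairs over non-pairs) for the four year-journal cells shown in the table, plus the ratio of the two totals. Treating the ratio per year-journal cell as the quantity of interest is an assumption about how the comparison was set up; only the numbers themselves come from the slide.

```python
# (year, journal, avg citations of paired pubs, avg citations of non-paired pubs)
ROWS = [
    (1991, "BIOCHEMISTRY", 65.00, 57.20),
    (1991, "BIOTECHNIQUES", 64.00, 40.27),
    (1992, "BIOSCIENCE BIOTECH AND BIOCHEMISTRY", 4.00, 8.07),
    (1992, "BIOTECHNIQUES", 147.00, 26.17),
]

def citation_ratio(avg_pairs, avg_nonpairs):
    """Average citations of paired publications relative to matched non-pairs."""
    return avg_pairs / avg_nonpairs

ratios = {
    (year, journal): citation_ratio(pairs, nonpairs)
    for year, journal, pairs, nonpairs in ROWS
}

# Ratio of the totals row: 130.47 versus 67.03 average citations.
overall_ratio = citation_ratio(130.47, 67.03)

for key, value in ratios.items():
    print(key, round(value, 2))
print("overall", round(overall_ratio, 2))
```

A ratio above 1 means the paired publications are cited more than their journal-year peers; the individual cells vary widely because most contain a single paired publication.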
Before and after publication and grant
Variable | Class | N | Lower CL mean | Mean | Upper CL mean
Ratio average citations pairs/non-pairs | Pre-grant | 288 | 1.42 | 1.71 | 2.00
Ratio average citations pairs/non-pairs | Post-grant | 288 | 1.48 | 1.74 | 2.00
Diff (1-2) | | | -0.43 | -0.03 | 0.36

T-TESTS
Variable | Method | Variances | DF | t value | Pr > |t|
Ratio average citations pairs/non-pairs | Pooled | Equal | 574 | -0.17 | 0.8666
Ratio average citations pairs/non-pairs | Satterthwaite | Unequal | 565 | -0.17 | 0.8666

EQUALITY OF VARIANCES
Variable | Method | Num DF | Den DF | F value | Pr > F
Ratio average citations pairs/non-pairs | Folded F | 287 | 287 | 1.29 | 0.0299
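As a rough plausibility check, the t and F statistics can be approximately rebuilt from the rounded means and confidence limits in the table. The critical value 1.97 (two-sided 95%, 287 degrees of freedom) is an approximation introduced here, and because the inputs are rounded to two decimals the results only roughly match the reported -0.17 and 1.29.

```python
import math

T_CRIT = 1.97  # approx. two-sided 95% critical value of Student's t, 287 df (assumption)

def standard_error_from_ci(lower, upper, t_crit=T_CRIT):
    """Recover the standard error of a mean from its 95% confidence limits."""
    return (upper - lower) / (2 * t_crit)

n = 288
pre_mean, post_mean = 1.71, 1.74
se_pre = standard_error_from_ci(1.42, 2.00)
se_post = standard_error_from_ci(1.48, 2.00)

# Pooled t-test: degrees of freedom are n1 + n2 - 2.
df_pooled = 2 * n - 2

# With equal group sizes the pooled and unequal-variance standard errors of the
# difference coincide: sqrt(se_pre^2 + se_post^2).
se_diff = math.sqrt(se_pre ** 2 + se_post ** 2)
t_value = (pre_mean - post_mean) / se_diff

# Folded F: larger variance over smaller variance (equal n, so SEs can be used).
f_value = (max(se_pre, se_post) / min(se_pre, se_post)) ** 2
```

The reconstructed t is small and negative, in line with the slide's finding of no significant pre-grant versus post-grant difference.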
Paired sample t-tests
Test | N | Mean 1 | Mean 2 | Difference | t value | Pr > |t|
Paired vs non-paired
Forward citations | 190 | 130.47 | 74.24 | 56.23 | 4.33 | 0.0001
Without self citations | 190 | 116.01 | 65.02 | 50.99 | 4.07 | 0.0001
Paired vs non-paired (at least 2 paired publications)
Forward citations | 59 | 224.97 | 131.63 | 93.34 | 3.12 | 0.0028
Without self citations | 59 | 202.7 | 117.88 | 84.82 | 2.97 | 0.0043
Paired and grey zone vs all others
Forward citations | 764 | 60.57 | 42.69 | 17.88 | 5.72 | 0.0001
Without self citations | 764 | 53.09 | 36.48 | 16.61 | 5.59 | 0.0001
Paired and grey zone vs all others (at least 2 paired or grey zone publications)
Forward citations | 281 | 96.41 | 59.64 | 36.77 | 5.57 | 0.0001
Without self citations | 281 | 85.85 | 51.76 | 34.09 | 5.43 | 0.0001
Multivariate analysis (negative binomial)
Parameter | B | Std. Error | 95% Wald CI Lower | 95% Wald CI Upper | Wald Chi-Square | df | Sig.
(Intercept) | 2.966 | .1258 | 2.719 | 3.213 | 555.643 | 1 | .000
Pair (Y/N) | .450 | .0506 | .350 | .549 | 78.945 | 1 | .000
Document type: Article | -.574 | .0113 | -.596 | -.552 | 2589.688 | 1 | .000
Document type: Letter | -.774 | .0590 | -.890 | -.659 | 172.469 | 1 | .000
Document type: Note | -.567 | .0175 | -.601 | -.533 | 1051.989 | 1 | .000
Document type: Review | 0 | . | . | . | . | . | .
Number of backward publication citations | .013 | .0001 | .013 | .014 | 10416.453 | 1 | .000
Number of authors | .033 | .0005 | .032 | .034 | 4613.407 | 1 | .000
Time | .125 | .0015 | .122 | .128 | 7191.199 | 1 | .000
Time² | -.012 | .0001 | -.013 | -.012 | 29450.994 | 1 | .000
Journal dummies (n=104) | Included
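Negative binomial coefficients are on a log scale, so exponentiating them gives multiplicative effects on the expected citation count. The short calculation below applies this to the Pair (Y/N) coefficient and locates the turning point implied by the Time and Time² terms. The unit of "Time" is not stated on the slide, so the turning point is expressed in unspecified time units.

```python
import math

def incidence_rate_ratio(b):
    """Multiplicative effect on the expected count for a one-unit increase."""
    return math.exp(b)

# Pair (Y/N): B = .450, 95% Wald CI .350 to .549 (values from the table).
pair_irr = incidence_rate_ratio(0.450)       # about 1.57
pair_irr_low = incidence_rate_ratio(0.350)   # about 1.42
pair_irr_high = incidence_rate_ratio(0.549)  # about 1.73

# Quadratic time profile b1*t + b2*t^2 peaks at t = -b1 / (2 * b2).
b_time, b_time_sq = 0.125, -0.012
time_peak = -b_time / (2 * b_time_sq)

print(f"Pair effect: x{pair_irr:.2f} (95% CI x{pair_irr_low:.2f} to x{pair_irr_high:.2f})")
print(f"Time profile peaks after about {time_peak:.1f} time units")
```

Read this way, being part of a patent-paper pair goes with roughly 57% more expected citations, holding the other covariates fixed, which is the opposite of what an anti-commons effect would predict.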
Sector analysis
Pub sector | Pat sector | N | Mean | Median | Var | SD
COM | COM | 21 | 71.6 | 34.0 | 5,999.6 | 77.5
KGI | COM | 25 | 70.5 | 49.0 | 3,212.6 | 56.7
KGI+COM | COM | 15 | 106.7 | 80.0 | 18,605.8 | 136.4
KGI | KGI | 227 | 179.2 | 67.0 | 95,544.4 | 309.1
KGI+COM | KGI | 16 | 282.0 | 131.5 | 231,467.6 | 481.1
KGI | KGI+COM | 6 | 219.2 | 93.5 | 66,633.4 | 258.1
KGI+COM | KGI+COM | 5 | 85.0 | 67.0 | 3,546.5 | 59.6
Total | | 315 | 164.4 | 66.0 | 84,846.9 | 291.3

Parameter | B | Std. Error | z | P>z | [95% Conf. Interval]
(Intercept) | 4.326 | 0.292 | 14.800 | 0.000 | 3.753 | 4.899
Document type: Article |
Document type: Note | 0.114 | 0.524 | 0.220 | 0.827 | -0.913 | 1.141
Document type: Review | 0.309 | 1.130 | 0.270 | 0.784 | -1.905 | 2.523
Number of backward publication citations | 0.046 | 0.008 | 5.990 | 0.000 | 0.031 | 0.061
Number of authors | 0.141 | 0.019 | 7.350 | 0.000 | 0.103 | 0.179
Pat sector: KGI | 0.000 | . | . | . | . | .
Pat sector: COM | -0.627 | 0.206 | -3.050 | 0.002 | -1.030 | -0.223
Pat sector: KGI+COM | -0.917 | 0.355 | -2.590 | 0.010 | -1.612 | -0.222
Aff sector: KGI | 0.000 | . | . | . | . | .
Aff sector: COM | 0.051 | 0.314 | 0.160 | 0.870 | -0.563 | 0.666
Aff sector: KGI+COM | 0.176 | 0.214 | 0.820 | 0.413 | -0.245 | 0.596
Time | -0.301 | 0.122 | -2.470 | 0.013 | -0.539 | -0.063
Time² | 0.015 | 0.010 | 1.420 | 0.156 | -0.006 | 0.035
Sector analysis
THE REGENTS OF THE UNIVERSITY OF CALIFORNIA | US | 26
THE JOHNS HOPKINS UNIVERSITY | US | 26
THE SALK INSTITUTE FOR BIOLOGICAL STUDIES | US | 15
BOARD OF REGENTS, THE UNIVERSITY OF TEXAS SYSTEM | US | 12
THE SCRIPPS RESEARCH INSTITUTE | US | 10
THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY | US | 9
JOHNS HOPKINS UNIVERSITY | US | 9
CITY OF HOPE | US | 8
PRESIDENT AND FELLOWS OF HARVARD COLLEGE | US | 8
WASHINGTON UNIVERSITY | US | 8
INSTITUT PASTEUR | FR | 8
THE ROCKEFELLER UNIVERSITY | US | 7
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF AGRICULTURE | US | 7
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE DEPARTMENT OF HEALTH | US | 7
UNIVERSITY OF UTAH RESEARCH FOUNDATION | US | 7
OKLAHOMA MEDICAL RESEARCH FOUNDATION | US | 6
MASSACHUSETTS INSTITUTE OF TECHNOLOGY | US | 6
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF | US | 6
THE JOHNS HOPKINS UNIVERSITY SCHOOL OF MEDICINE | US | 6
ST. JUDE CHILDREN'S RESEARCH HOSPITAL | US | 6
Conclusions science-technology interactions
• We do not observe lower citation rates for publications that are part of a patent application (neither before versus after grant, nor when matched by journal, nor when matched by author)
• Significant impact of KGIs at the patent side
• We miss patent-publication pairs
• Dig deeper into the sector dynamics
• Citation patterns are only one aspect of the diffusion of knowledge
Overview