Analyses of the Effect of Patent Category Diversity on Patent Quality
Wenping Wang1; Alan Porter2,3; Ismael Rafols4; Nils Newman5; Yun Liu1
1. School of Management and Economics, Beijing Institute of Technology, Beijing, China
2. Technology Policy and Assessment Center, Georgia Institute of Technology, Atlanta, USA
3. Search Technology, Inc., Atlanta, USA
4. SPRU - Science and Technology Policy Research, University of Sussex, Brighton, UK
5. IISC, Inc., Atlanta, USA
Research Objectives
• Aim: To gauge the effect of patent category diversity (PCD) on patent quality
• We address three research questions:
  ▪ How to measure PCD?
  ▪ How to measure patent quality?
  ▪ Does high PCD lead to higher patent quality?
• Two cases studied: “Measuring chemical, physical properties” and “Optical measurement”
Patent Category Diversity (PCD)
• Try the counterpart of our ‘Integration indicator’ (Porter et al., 2007) on patents (Rao-Stirling diversity)
• Measure how integrative particular patents are, based on the patents they cite
• A patent will have a higher PCD if there is greater heterogeneity among its cited patent categories.
Patent Category Diversity
• Patent Category Diversity: diversity of the cited patents across different categories (e.g., NBER technology classes or International Patent Classes – IPCs)
• Characteristics of diversity:
  ▪ Variety: number of distinct categories
  ▪ Balance: evenness of the distribution
  ▪ Disparity: degree to which the categories are different
Selected Measures of PCD
• (based on Rao, 1982; Stirling, 1998, 2007; Rafols and Meyer, 2010)
Notation
Proportion of cited patents in category $i$: $p_i$
Distance between categories $i$ and $j$: $d_{ij}$
Similarity between categories $i$ and $j$: $s_{ij}$
Indices
Number of cited categories (Variety): $N$
Simpson diversity, measuring a combination of Variety and Balance: $1 - \sum_i p_i^2 = \sum_{i \neq j} p_i p_j$
Rao-Stirling diversity, incorporating Variety, Balance, and Disparity: $\sum_{i \neq j} p_i p_j d_{ij}$
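A case to compute PCD
[Figure: a focal patent cites four patents (Ref1 to Ref4), which fall into patent categories A, B, and C.]
Indices of patent category diversity: Variety $N$, Simpson diversity $1 - \sum_i p_i^2$, and Rao-Stirling diversity $\sum_{i \neq j} p_i p_j d_{ij}$, where $d_{ij} = 1 - s_{ij}$ and $s_{ij}$ is the cosine measure of similarity between patent categories $i$ and $j$.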
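The three indices can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the assignment of Ref1 to Ref4 to categories A, B, and C and the similarity values are hypothetical, since the original figure's values are not recoverable, and the distance is assumed to be one minus the similarity.

```python
from collections import Counter


def category_shares(cited_categories):
    """Proportion p_i of cited patents falling in each category i."""
    counts = Counter(cited_categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def variety(shares):
    """Number of distinct cited categories."""
    return len(shares)


def simpson(shares):
    """Simpson diversity: 1 - sum_i p_i^2."""
    return 1.0 - sum(p * p for p in shares.values())


def rao_stirling(shares, similarity):
    """Rao-Stirling diversity: sum over i != j of p_i * p_j * (1 - s_ij)."""
    total = 0.0
    for i, p_i in shares.items():
        for j, p_j in shares.items():
            if i != j:
                s_ij = similarity[frozenset((i, j))]
                total += p_i * p_j * (1.0 - s_ij)
    return total


# Hypothetical example: Ref1 and Ref2 in category A, Ref3 in B, Ref4 in C.
cited = ["A", "A", "B", "C"]
# Hypothetical pairwise similarities between categories.
similarity = {
    frozenset(("A", "B")): 0.5,
    frozenset(("A", "C")): 0.2,
    frozenset(("B", "C")): 0.0,
}
shares = category_shares(cited)
pcd = {
    "variety": variety(shares),
    "simpson": simpson(shares),
    "rao_stirling": rao_stirling(shares, similarity),
}
```

With these made-up inputs, Variety is 3, Simpson diversity is 0.625, and Rao-Stirling diversity is about 0.45. The Rao-Stirling value is lower than Simpson because similar category pairs are discounted.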
Selected Measures of Patent Quality
• Times Cited: most typical indicator of patent quality
• Patent H-index: a patent has index h if at least h of its forward-citing patents have each been cited at least h times.
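As a rough sketch of the Patent H-index definition, the function below takes the citation counts of a focal patent's forward-citing patents and returns the largest h that satisfies it. The function name and the sample counts are illustrative assumptions.

```python
def patent_h_index(forward_citation_counts):
    """Largest h such that at least h forward-citing patents are each cited >= h times."""
    ranked = sorted(forward_citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical focal patent cited by five later patents,
# which themselves received 10, 5, 3, 3, and 0 citations.
example_h = patent_h_index([10, 5, 3, 3, 0])
```

Here three citing patents have at least three citations each, but there are not four with at least four, so the index is 3.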
Data source
• Database: 2006 edition of the NBER patent database
  ▪ Advantage: detailed patent classification and multiple generations of patent citations
  ▪ Limitations:
    − Covers ONLY the citation relations among utility patents granted by the USPTO in 1976-2006
    − Contains ONLY basic information about each patent
• Sample from two categories:
  ▪ IPC4=G01N -- Measuring chemical, physical properties (MCPP)
  ▪ IPC4=G02B -- Optical measurement (OM)
• Timespan: Grant Year from 1996 to 2006
• Country: First Assignee’s Country
Patent Categories
• Patent category: 4-digit International Patent Category (IPC4)
  ▪ The main IPC4 is adopted as the unique IPC4 of each patent.
  ▪ A finer classification will lead to higher diversity measures.
• Threshold: Number of cited categories > 2
• Patent category similarity matrix: square root of the cosine similarity between IPC4s (constructed by Rafols, 2011)
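A similarity matrix of this kind could be built roughly as follows. This sketch assumes each IPC4 is described by a vector of citation counts toward a common set of categories. The slides do not say how Rafols (2011) built the profiles, so the profile vectors and the function names here are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-negative count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def sqrt_cosine_matrix(profiles):
    """Square root of the cosine similarity for every ordered pair of categories."""
    cats = sorted(profiles)
    return {
        (i, j): math.sqrt(max(0.0, cosine_similarity(profiles[i], profiles[j])))
        for i in cats
        for j in cats
    }


# Hypothetical citation profiles (counts are made up for illustration).
profiles = {
    "G01N": [8, 2, 0],
    "G02B": [2, 8, 0],
    "H01M": [0, 0, 5],
}
sim = sqrt_cosine_matrix(profiles)
```

Taking the square root raises every similarity strictly between 0 and 1, so moderately related categories count as closer than under the plain cosine.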
Regression Variables
• Unit of analysis: Individual patent
• Timespan: 1996-2004
• Variables:
  ▪ Dependent Variable: Times Cited
  ▪ Explanatory Variables: # of cited categories; Simpson diversity; Rao-Stirling diversity
  ▪ Control Variables: Grant Year; First Assignee's country; Number of Patent References
Temporal Change of PCD
• The number of cited IPC4s for both MCPP and OM increases modestly over time.
[Figure: number of cited IPC4s by grant year. Labeled values: 4.69 and 6.66 (MCPP); 4.50 and 6.25 (OM).]
Temporal Change of PCD
• The Simpson diversity of MCPP seems to increase from 0.62 to 0.66 in small steps, whereas that of OM shows no significant change.
[Figure: Simpson diversity by grant year. Labeled values: 0.62 and 0.66 (MCPP); 0.60 and 0.62 (OM).]
Temporal Change of PCD
• Annual Rao-Stirling diversity range: 0.55 – 0.59
• It is difficult to identify a trend in Rao-Stirling diversity for MCPP or OM.
[Figure: Rao-Stirling diversity by grant year, 1996-2006. Labeled values: 0.56, 0.58, and 0.55 (MCPP); 0.57, 0.59, and 0.58 (OM).]
MCPP vs. OM
        Times Cited   N. Cited IPC4   Simpson   Rao-Stirling
MCPP    L             H               H         L*
OM      H             L               L         H*
Note: H: higher; L: lower; * Rao-Stirling diversity for OM is higher than that for MCPP in 6 of 9 years.
• Patent citations and diversity measures vary across technology fields.
• Even though the patents in MCPP receive fewer citations than those in OM, the patents they cite span more distinct categories, with higher Simpson diversity and slightly lower Rao-Stirling diversity.
Initial Estimation of the Effect of PCD on Patent Citations
• Scatter diagram: Times Cited vs. Simpson diversity
  ▪ Looks like a cloud
  ▪ A bell with its top leaning to the right
Times Cited
• Times Cited of a given patent is count data (with many zeros); its frequency distribution follows a power law.
Data Source: USA-assigned patents in the field of Optical measurement granted in 1996, NBER Patent Database
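Both properties of count data named above, the excess of zeros and the heavy tail, can be checked with simple descriptive statistics. The sketch below uses a made-up list of citation counts, not values from the NBER database, and the function name is an assumption.

```python
from statistics import mean, pvariance


def dispersion_summary(times_cited):
    """Mean, variance, variance-to-mean ratio, and share of zeros for count data."""
    m = mean(times_cited)
    v = pvariance(times_cited)
    zero_share = sum(1 for c in times_cited if c == 0) / len(times_cited)
    ratio = v / m if m > 0 else float("nan")
    return {"mean": m, "variance": v, "variance_to_mean": ratio, "zero_share": zero_share}


# Hypothetical Times Cited values for twelve patents.
sample = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 13, 40]
summary = dispersion_summary(sample)
```

A variance-to-mean ratio well above 1 signals overdispersion relative to a Poisson model, and a large share of zeros suggests a zero-inflated specification.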
Regression Model
• Why do we choose the Zero-inflated Negative Binomial (ZINB) regression model?
• Ordinary Least Squares regression? Count data are highly non-normal.
• Zero-inflated Poisson regression? Times Cited is too dispersed -- i.e., its variance is much larger than its mean.
• Ordinary count models? Too many zeros.
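To make the model choice concrete, here is a minimal sketch of the ZINB probability mass function. It assumes the common mean/dispersion parametrization (mean mu, dispersion theta) plus a separate probability pi_zero of a structural zero. In a regression, mu would typically depend on the covariates through a log link. The slides do not spell out the parametrization, so treat these names as assumptions.

```python
import math


def nb_log_pmf(k, mu, theta):
    """Log-probability of count k under a negative binomial with mean mu, dispersion theta."""
    return (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )


def zinb_pmf(k, mu, theta, pi_zero):
    """ZINB probability: extra mass pi_zero at zero, negative binomial otherwise."""
    nb = math.exp(nb_log_pmf(k, mu, theta))
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb


def zinb_neg_log_likelihood(counts, mu, theta, pi_zero):
    """Negative log-likelihood (-lnL) of observed counts for fixed parameters."""
    return -sum(math.log(zinb_pmf(k, mu, theta, pi_zero)) for k in counts)
```

The zero-inflation term handles the excess zeros, while a small theta lets the variance exceed the mean, which addresses the two objections to the simpler count models.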
Results of ZINB
Table: Results of the ZINB models on Times Cited for OM
Dependent Variable: Times Cited. First Assignee’s Country: USA.

                      Correlation Coefficient                                                 Chi-Squared Test
Gyear  N obs.         Intercept    N. PatRef    N. CitedIPC4  Simpson      Log(theta)   -lnL  Chisq   P(>Chisq)
1996   320     Coef.  2.440183     0.013166     0.021603      0.260093     0.008014     1252  9.945   0.01904 **
               Sig.   <2e-16***    0.157        0.678         0.656        0.918
1998   456     Coef.  2.4416882    -0.0008684   0.0760176     -0.1545466   -0.0643583   1730  49.47   1.036e-10 ***
               Sig.   <2e-16***    0.86784      0.00767***    0.72938      0.32802
2000   488     Coef.  2.601868     -0.001104    0.090167      -1.080047    -0.108602    1692  52.358  2.513e-11 ***
               Sig.   <2e-16***    0.7392       0.00021***    0.01424**    0.10182
2002   807     Coef.  2.02632      0.00406      0.03936       -0.97084     -0.17598     2281  55.438  5.536e-12 ***
               Sig.   <2e-16***    0.24604      0.05789*      0.00182***   0.00219***
2004   1012    Coef.  0.803044     -0.009134    0.107365      -1.456021    -0.476171    1682  56.14   3.922e-12 ***
               Sig.   2.29e-05***  1.11e-05***  4.86e-10***   5.99e-05***  2.05e-10***

Note: (1) *** sig. 0.01, ** sig. 0.05, * sig. 0.1. (2) N. PatRef: Number of patent references; N. CitedIPC4: Number of cited categories; lnL: Log likelihood. (3) The ZINB regression was run in R, software for statistical computing and graphics (downloaded at www.r-project.org/).
Results of ZINB
Table: Results of the ZINB models on Times Cited for MCPP
Dependent Variable: Times Cited. First Assignee’s Country: USA.

                      Correlation Coefficient                                   Chi-Squared Test
Gyear  N obs.         Intercept    N. PatRef    Stirling     Log(theta)   -lnL  Chisq   P(>Chisq)
1996   511     Coef.  2.05916      0.01475      0.46837      -0.21063     1825  12.803  0.001659 **
               Sig.   <2e-16***    0.004725***  0.202704     0.000637***
1998   773     Coef.  1.862895     0.016005     0.314082     -0.2492      2583  34.004  4.132e-08 ***
               Sig.   <2e-16***    5.79e-07***  0.261        1.46e-06***
2000   737     Coef.  2.143242     0.017708     -0.5194      -0.17906     2358  68.209  1.544e-15 ***
               Sig.   <2e-16***    1.05e-10***  0.08212*     0.00141***
2002   909     Coef.  2.309078     0.008992     -2.13953     -0.17336     2128  123.37  < 2.2e-16 ***
               Sig.   <2e-16***    7.34e-10***  2.01e-15***  0.00518***
2004   1000    Coef.  0.52944      0.01098      -1.43317     -0.56239     1317  50.847  9.094e-12 ***
               Sig.   0.0212**     4.72e-07***  8.08e-05***  1.31e-09***

Note: (1) *** sig. 0.01, ** sig. 0.05, * sig. 0.1. (2) N. PatRef: Number of patent references; Stirling: Rao-Stirling diversity; lnL: Log likelihood. (3) The ZINB regression was run in R, software for statistical computing and graphics (downloaded at www.r-project.org/).
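The P(>Chisq) column can be reproduced from the Chisq column if the degrees of freedom are known. The slides do not state them. The sketch below assumes they equal the number of non-intercept regressors shown in each table (2 for MCPP, 3 for OM), an inference that happens to match the reported 1996 values. Closed-form upper-tail probabilities are used so that only the standard library is needed.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Chisq values copied from the 1996 rows of the two ZINB tables.
p_mcpp_1996 = chi2_sf_df2(12.803)  # assumed df = 2 (N. PatRef, Stirling)
p_om_1996 = chi2_sf_df3(9.945)     # assumed df = 3 (N. PatRef, N. CitedIPC4, Simpson)
```

Under these assumed degrees of freedom the two results agree with the tabulated 0.001659 and 0.01904 to the precision shown.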
Effect of PCD on TC
• The number of cited categories has a modest positive effect on Times Cited (TC).
• Simpson diversity has a slightly negative correlation with TC.
• The effect of Rao-Stirling diversity on TC depends upon the categories.

Correlation: PCD vs. Times Cited
Patent Category  Indicator of PCD  Positive  Negative  Significant  Relation  Sig. Chisq
MCPP             N. Cited IPC4     9         0         8            +         9
                 Simpson           2         7         5            -
                 Rao-Stirling      3         6         5            -
OM               N. Cited IPC4     9         0         7            +         9
                 Simpson           0         9         6            -
                 Rao-Stirling      7         2         4            +
Note: Sig. Chisq is reported once per patent category.
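The sign and significance counts in such a summary can be tallied mechanically from the yearly regression output. The sketch below uses the Rao-Stirling rows of the MCPP ZINB table for the five grant years shown on the slides, so its totals cover five years rather than the nine behind the summary table. The 0.1 threshold follows the table notes, and the function name is an assumption.

```python
# (grant year, coefficient, p-value) for the Rao-Stirling term in the MCPP ZINB table.
stirling_mcpp = [
    (1996, 0.46837, 0.202704),
    (1998, 0.314082, 0.261),
    (2000, -0.5194, 0.08212),
    (2002, -2.13953, 2.01e-15),
    (2004, -1.43317, 8.08e-05),
]


def tally_signs(rows, alpha=0.1):
    """Count positive, negative, and significant (p < alpha) coefficients."""
    positive = sum(1 for _, coef, _ in rows if coef > 0)
    negative = sum(1 for _, coef, _ in rows if coef < 0)
    significant = sum(1 for _, _, p in rows if p < alpha)
    return {"positive": positive, "negative": negative, "significant": significant}


tally = tally_signs(stirling_mcpp)
```

For these five years the tally is two positive, three negative, and three significant coefficients, in line with the overall negative relation reported for Rao-Stirling diversity in MCPP.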
Discussion (1)
• Different measures of diversity have different effects on citations.
  ▪ The diversity of different technology fields shows slight differences.
  ▪ In both MCPP and OM, the number of cited categories (Variety) slightly favors patent quality, while Simpson diversity (incorporating both Variety and Balance) has a modest negative effect.
  ▪ Rao-Stirling diversity (comprising Variety, Balance, and Disparity) shows opposite effects on TC for MCPP and OM.
Discussion (2)
• The effect of PCD on patent quality depends upon the categories.
  ▪ The correlations for "Electric battery" (IPC4=H01M), "Electrography" (IPC4=G03G), and "Medical preparations, toiletries" (IPC4=A61K) are less significant than those for MCPP and OM.
• The analysis results vary across patent category systems.
  ▪ A finer classification leads to higher diversity measures.
  ▪ There is no significant effect of PCD on citations if the NBER technology category system (Hall et al. 2001), a coarser system, is selected as the patent category.
Limitations and further research
• Limitations:
  ▪ Patent category diversity is assessed on the basis of problematic predefined categories (IPC4).
  ▪ Patent citations only include the citation relations among utility patents granted by the USPTO. Patents that are not yet granted, or that were granted by other patent offices, are not considered.
  ▪ Due to the limitations of the NBER patent database, we currently select only Times Cited and Patent H-index as the indices of patent quality.
• Further research:
  ▪ A more appropriate patent category system
  ▪ Case study in another patent database (e.g., EPO)
References
• Chen, C., & Hicks, D. (2004). Tracing knowledge diffusion. Scientometrics, 59(2), 199-211.
• Hall, B. H., Jaffe, A. B., & Trajtenberg, M. (2001). The NBER patent citation data file: Lessons, insights and methodological tools. NBER Working Papers 8498. http://www.nber.org/papers/w8498
• Bessen, J. (2009). Matching Patent Data to Compustat Firms. NBER PDP Project User Documentation. http://www.nber.org/~jbessen/matchdoc.pdf Accessed 09-01-2010.
• Porter, A. L., Cohen, A. S., Roessner, J. D., & Perreault, M. (2007). Measuring researcher interdisciplinarity. Scientometrics, 72(1), 117–147.
• Rafols, I., & Meyer, M. (2010). Diversity and network coherence as indicators of interdisciplinarity: Case studies in bionanoscience. Scientometrics, 82, 263-287.
• Rao, C. R. (1982). Diversity and dissimilarity coefficients: A unified approach. Theoretical Population Biology, 21, 24–43.
• Stirling, A. (1998). On the economics and analysis of diversity. SPRU Electronic Working Paper. http://www.sussex.ac.uk/Units/spru/publications/imprint/sewps/sewp28/sewp28.pdf Accessed 10-20-2011.
• Stirling, A. (2007). A general framework for analysing diversity in science, technology and society. Journal of the Royal Society Interface, 4(15), 707–719.
• Yegros, A., Amat, C. B., D'Este, P., Porter, A. L., & Rafols, I. (2011). Does interdisciplinary research lead to higher scientific impact? Atlanta Conference on Science and Innovation Policy, 2011. http://www.idr.gatech.edu/doc/Yegros-Final.pdf Accessed 10-10-2011.
Thank you!