R in the statistical office: The case of UNIDO
V. Todorov
United Nations Industrial Development Organization, Vienna
New Techniques and Technologies for Statistics 2017
Brussels, Belgium 14-16 March, 2017
Todorov (UNIDO) R in UNIDO NTTS’2017 1 / 47
Outline
1 About UNIDO, UNIDO Statistics and R
2 R for Data Exchange
3 R as a graphical engine: package yearbook
4 Imputation of Key Indicators: package unidoCIP2
5 REST APIs
6 IO Analysis, WIOD and the package rwiot
7 Industrial statistics for business structure: package indstat
8 Competitive Industrial Performance (CIP) index: package CItools
9 Maintenance of UNIDO databases with R
10 Technical assistance
11 Summary and conclusions
About UNIDO, UNIDO Statistics and R
About UNIDO
• UNIDO was set up in 1966
• Became a specialized agency of the UN in 1985
• Promote industrialization throughout the developing world
• 168 Member States (as of January 2017)
• Headquarters in Vienna
• Represented in 35 developing countries
About UNIDO Statistics
• Service Module "Industrial Governance and Statistics":
  - monitor, benchmark and analyse industrial performance and capabilities
  - formulate, implement and monitor strategies, policies and programmes to improve the contribution of industry to productivity growth and the achievement of the Sustainable Development Goals (SDG)
  - UNIDO is a custodian agency for six indicators in Goal 9.
• Building capabilities in industrial statistics - providing technical assistance to:
  - Introduce best practice methodologies and software systems
  - Enhance the quality and consistency of the industrial statistics databases
R for Data Exchange
R as a graphical engine: package yearbook
Imputation of Key Indicators: package unidoCIP2
Manufacturing and industrial statistics
• Industrial development is a driver of structural change, which is key in the process of economic development.
• Industrial statistics make it possible to identify and rank the key production sectors, major economic zones in the country and major size classes.
• Specialized and structural statistics on industry (as well as on other economic sectors) are demanded more than ever by researchers and analysts to assess the implications of globalization for individual countries:
  - Synthesized data on world development trends.
  - Internationally comparable data to assess the growth and structure of one region in the world vis-a-vis others.
  - A complete set of data on their field of interest to avoid measurement discrepancies.
  - Regular data production to update/correct policy measures.
Structural statistics for industry: UNIDO databases
UNIDO databases
• Cover the manufacturing sector
• Refer to economic statistics, mainly production and trade
related, not technological or environmental data
• Include statistical data from annual observations within the
quality assurance framework (no experimental or one-time study
data)
• Official data supplied by NSOs (in accordance with the resolution
of the UN Statistical Commission)
• Further details:
http://www.unido.org/index.php?id=1002103
• Follow the UNIDO Quality Framework (Upadhyaya and Todorov,
2008, 2012)
UNIDO databases: summary
• INDSTAT DB (by ISIC and by country)
  - Number of establishments
  - Number of employees
  - Number of female employees
  - Wages and salaries
  - Gross output
  - Value added
  - Gross fixed capital formation
  - Index numbers of industrial production
• MVA DB (by country)
  - GDP at current prices
  - GDP at constant prices
  - MVA at current prices
  - MVA at constant prices
  - Population
• IDSB (by ISIC and by country)
  - Output = Y
  - Import = M
  - Export = X
  - Apparent consumption = C, where C = Y + M − X
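The apparent consumption identity can be illustrated with a minimal R sketch. The data frame, its column names and the figures are made up for illustration; they do not reflect the actual IDSB layout or data.

```r
# Toy figures (hypothetical, not IDSB data): output Y, imports M, exports X
idsb <- data.frame(
  isic   = c("15", "17"),
  output = c(1000, 400),
  import = c(200, 150),
  export = c(300, 50)
)

# Apparent consumption: C = Y + M - X
idsb$consumption <- with(idsb, output + import - export)
print(idsb)
```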
UNIDO Statistics online portal
http://stat.unido.org/
Imputation in international statistics
Survey data (micro)
• Multiple variables observed for a sample of observation units
from a population at one point in time
• Gaps in the data are classified as:
  - Item non-response
  - Unit non-response
  - Variables not included in the survey
Time series data (macro)
• Contain data for multiple time periods
• Contain data for aggregate (or macro) units (sections)
• Sections are usually countries
• Variables are usually statistical indicators (like GDP, MVA, etc.)
Imputation INDSTAT: Cross-sectional time series data
• Four different types of time series data structures (Denk and Weber, 2011):
1. Single univariate time series
2. Single multivariate time series
3. Cross-sectional univariate time series
4. Cross-sectional multivariate time series
• Missingness patterns: the relevance and applicability of missing data techniques depend on the type of gap:
1. missing items,
2. missing periods,
3. missing variables, and
4. missing sections (countries).
Imputation INDSTAT: Description of the data set
Variables of interest
1. GO - Gross output
2. VA - Value added
3. WS - Wages and salaries
4. EMP - Number of employees
Auxiliary variables
1. IIP - Index of Industrial Production
2. MVA - Manufacturing Value Added (from SNA)
3. IMVA - Index of MVA
4. CPI - Consumer price index
The following variables will not be considered:
• GFCF - Gross fixed capital formation—the economic relation to
GO and VA is too weak
• EST - Number of establishments—too heterogeneous due to
difference in definitions
Imputation INDSTAT: Analysis of the missingness
Package VIM
• VIM: "Visualization and Imputation of Missing Values"
• An R package (Templ et al., 2010)
• Tools for visualization of missing values, useful for exploring the
data and the structure of the missing values
• May help to identify the mechanism generating the missing values
What to analyze
• Time series evolution of missingness
• The multivariate dependence in the missingness across the
variables
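Both questions can be explored numerically before plotting. The following is a minimal base-R sketch on a made-up panel; the column names and values are hypothetical. The VIM package offers graphical counterparts (for example its aggr() function), which are not called here so that the sketch has no dependencies.

```r
# Toy panel (hypothetical): one row per country-year, NA marks a gap
x <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year    = rep(2001:2003, times = 3),
  GO      = c(10, NA, 12, 20, 21, NA, NA, NA, 32),
  VA      = c(4, NA, 5, NA, 9, 9, 12, NA, 13)
)
vars <- c("GO", "VA")

# Time series evolution: proportion of missing values per year and variable
miss_by_year <- aggregate(is.na(x[vars]), by = list(year = x$year), FUN = mean)
print(miss_by_year)

# Multivariate dependence: how often the variables are missing together
combos <- table(GO_missing = is.na(x$GO), VA_missing = is.na(x$VA))
print(combos)
```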
INDSTAT: Time series evolution of missingness (main)
[Figure: proportion of missing values by year, 1991-2010, one panel each for Employment, Wages and Salaries, Gross Output and Value Added; vertical axis "missing" from 0.0 to 1.0.]
INDSTAT: Time series evolution of missingness (auxiliary)
[Figure: proportion of missing values by year, 1991-2010, one panel each for IIP, CPI and IMVA; vertical axis "missing" from 0.0 to 1.0.]
INDSTAT: Multivariate dependence of missingness across variables
[Figure: two plots of missingness, one for Output, IIP and CPI and one for Output, IMVA and CPI. Each shows the proportion of missings per variable (axis from 0.00 to 0.25) and the frequencies of the missingness combinations: 1352, 301, 258, 207, 97, 89, 89, 67 for the first set and 1592, 507, 163, 118, 38, 23, 18, 1 for the second.]
Imputation INDSTAT: Deterministic approach based on economic relations
• Impute the four variables of interest using economic relationships
between the variables.
• Start with estimation of the missing observations for Gross
output based on available production indexes or Value added.
• Estimate Value added, Wages and salaries and Employment on
the basis of past trends in the relationships between output and
these three variables.
• At total manufacturing level.
Deterministic approach: algorithm
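STEP 1
Imputation of GO using IIP and CPI:
• $\mathrm{EGO}_t = \mathrm{GO}_{t-1}\left(1 + \frac{\mathrm{IIP}_{t:0}\,\mathrm{CPI}_{t:0} - \mathrm{IIP}_{t-1:0}\,\mathrm{CPI}_{t-1:0}}{\mathrm{IIP}_{t-1:0}\,\mathrm{CPI}_{t-1:0}}\right)$
STEP 2
Imputation of GO using VA and lagged ratio GO/VA
• $\mathrm{EGO}_t = \frac{\mathrm{VA}_t\,\mathrm{GO}_{t-1}}{\mathrm{VA}_{t-1}}$
STEP 3
Imputation of GO using IMVA and CPI
• $\mathrm{EGO}_t = \mathrm{GO}_{t-1}\left(1 + \frac{\mathrm{IMVA}_{t:0}\,\mathrm{CPI}_{t:0} - \mathrm{IMVA}_{t-1:0}\,\mathrm{CPI}_{t-1:0}}{\mathrm{IMVA}_{t-1:0}\,\mathrm{CPI}_{t-1:0}}\right)$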
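Steps 1 to 3 can be sketched in R as follows. This is a minimal illustration, not the unidoCIP2 implementation: the function names and the toy numbers are hypothetical, and the subscript t:0 is read here as the index value in year t relative to a base period 0, which is an assumption.

```r
# Steps 1 and 3: extrapolate GO_{t-1} with the growth of a volume index
# (IIP or IMVA) multiplied by a price index (CPI)
impute_go_from_index <- function(go_prev, idx_t, idx_prev, cpi_t, cpi_prev) {
  nominal_t    <- idx_t * cpi_t
  nominal_prev <- idx_prev * cpi_prev
  go_prev * (1 + (nominal_t - nominal_prev) / nominal_prev)
}

# Step 2: apply the lagged ratio GO/VA to the current value added
impute_go_from_va <- function(va_t, go_prev, va_prev) {
  va_t * go_prev / va_prev
}

# Toy numbers (hypothetical)
ego_index <- impute_go_from_index(go_prev = 100, idx_t = 110, idx_prev = 100,
                                  cpi_t = 105, cpi_prev = 100)
ego_va    <- impute_go_from_va(va_t = 44, go_prev = 100, va_prev = 40)
print(c(from_index = ego_index, from_va = ego_va))
```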
B. Imputation INDSTAT: Deterministic approach: algorithm II
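STEP 4
Imputation of VA using GO and lagged ratio VA/GO
• $\mathrm{EVA}_t = \frac{\mathrm{GO}_t\,\mathrm{VA}_{t-1}}{\mathrm{GO}_{t-1}}$
STEP 5
Imputation of WS using VA and lagged ratio WS/VA
• $\mathrm{EWS}_t = \frac{\mathrm{VA}_t\,\mathrm{WS}_{t-1}}{\mathrm{VA}_{t-1}}$
STEP 6
Imputation of EMP using real VA and lagged ratio EMP/real VA
• $\mathrm{EEMP}_t = \frac{(\mathrm{VA}_t/\mathrm{CPI}_t)\,\mathrm{EMP}_{t-1}}{\mathrm{VA}_{t-1}/\mathrm{CPI}_{t-1}}$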
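Steps 4 to 6 all apply a lagged ratio to a current driver value, so one helper covers them. The sketch below is illustrative only; the helper name and the toy numbers are hypothetical.

```r
# Generic lagged-ratio imputation: target_t = driver_t * target_{t-1} / driver_{t-1}
impute_by_lagged_ratio <- function(driver_t, target_prev, driver_prev) {
  driver_t * target_prev / driver_prev
}

# Toy values (hypothetical)
go_t <- 110; go_prev <- 100
va_prev <- 40; ws_prev <- 10; emp_prev <- 50
cpi_t <- 110; cpi_prev <- 100

# Step 4: VA from GO and the lagged ratio VA/GO
eva <- impute_by_lagged_ratio(go_t, va_prev, go_prev)

# Step 5: WS from VA and the lagged ratio WS/VA
ews <- impute_by_lagged_ratio(eva, ws_prev, va_prev)

# Step 6: EMP from real VA (VA deflated by CPI) and the lagged ratio EMP/real VA
eemp <- impute_by_lagged_ratio(eva / cpi_t, emp_prev, va_prev / cpi_prev)

print(c(EVA = eva, EWS = ews, EEMP = eemp))
```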
B. Imputation INDSTAT: Deterministic approach: algorithm III
STEP 7
Imputation at industry level: will be based on the observed share of the industry
in the manufacturing total. There are three ways to compute these shares:
• Historical average share. This method is based on the average share
observed over the full history of the series and does not take into account
time-variation in the industrial structure of the country. It is also sensitive
to outliers.
• Historical median share. The share is estimated by taking the median of the
whole history of the series. It is less sensitive to outliers than the average,
but also does not take into account time-variation in the industrial structure
of the country.
• Lagged share. This method takes the (imputed) share of the preceding year.
It takes the time-varying structure of the economy into account, but is a
less efficient estimate since it is based on only one observation and sensitive
to outliers in that one observation.
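The three share estimators can be compared on a toy series. The share history and the manufacturing total below are made up, with one deliberate outlier to show how the estimators react; this is a sketch, not the package's code.

```r
# Toy history (hypothetical): share of one industry in total manufacturing,
# with an outlier (0.30) in the fourth year
share_hist <- c(0.10, 0.12, 0.11, 0.30, 0.13)
total_t    <- 200   # imputed manufacturing total for the year to be filled

shares <- c(
  average = mean(share_hist),                 # sensitive to the outlier
  median  = median(share_hist),               # robust, but ignores structural change
  lagged  = share_hist[length(share_hist)]    # follows structure, one observation only
)

# Industry-level imputation: share times the manufacturing total
imputed <- shares * total_t
print(imputed)
```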
Imputation INDSTAT: Deterministic approach: Example 1: Egypt
Imputation of all missing values using IIP and CPI
Imputation INDSTAT: Deterministic approach: Example 2: imputation by industry
REST APIs
Accessing international statistical databases with R
• Economic studies (e.g. competitiveness analysis or benchmarking) need access to different sources of data.
• Many international organizations maintain statistical databases which cover certain types of data:
  - COMTRADE, UNCTAD and WTO for international trade data,
  - World Development Indicators (WDI) from the World Bank,
  - World Economic Outlook (WEO) and International Financial Statistics (IFS) from the International Monetary Fund (IMF),
  - Industrial statistics databases (INDSTAT) by UNIDO and many more.
• Some of these organizations already provide an application programming interface (API) for accessing the data.
• How to use these APIs in R?
World Development Indicators (WDI)
• A comprehensive collection of cross-country comparable
development indicators
• Compiled from officially-recognized international sources.
• Contains more than 1300 time series for more than 200
economies, for more than 50 years.
• The R package WDI makes it easy to search and download data
from the WDI.
• The package is available from CRAN.
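A download with the WDI package might look like the sketch below. It needs the package installed and network access, so the call is wrapped in a function and not run here. The wrapper name fetch_gdp, the country codes and the choice of indicator (NY.GDP.MKTP.CD, GDP in current US$) are illustrative assumptions.

```r
# Sketch only: requires the CRAN package WDI and an internet connection
fetch_gdp <- function(countries = c("AT", "BG"), from = 2000, to = 2010) {
  if (!requireNamespace("WDI", quietly = TRUE)) {
    stop("Package 'WDI' is not installed")
  }
  # Indicator codes can be looked up first, e.g. WDI::WDIsearch("gdp")
  WDI::WDI(country = countries, indicator = "NY.GDP.MKTP.CD",
           start = from, end = to)
}

# Usage (not run): gdp <- fetch_gdp(); head(gdp)
```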
UNIDO REST API
• dbList()—Returns the list of all available data sets (currently 9)
• dbInfo(db)—Returns the info about the content of a data set: countries,
variables, years, ISIC
• dbData(db, ...)—Retrieves data from data set ’db’.
• Example:
> dblist <- dbList() ## assumed: dblist holds the result of dbList()
> for(db in dblist) ## print the names of all data sets
+ print(dbInfo(db=db)$dbname)
[1] "INDSTAT 2 2016, ISIC Revision 3"
[1] "INDSTAT 4 2016, ISIC Revision 3"
[1] "INDSTAT 4 2016, ISIC Revision 4"
[1] "IDSB 2016, ISIC Revision 3"
[1] "IDSB 2016, ISIC Revision 4"
[1] "MINSTAT 2016 ISIC Revision 3"
[1] "MINSTAT 2016 ISIC Revision 4"
[1] "MVA 2016"
[1] "CIP 2016"
• Retrieve data:
dbData(db=dblist[1], ct=100, variable=20, from=2000,
to=2006, isic=15)
country variable isic isicComb year value
1 100 20 15 NULL 2000 233600844
2 100 20 15 NULL 2001 232982867
3 100 20 15 NULL 2002 236882397
4 100 20 15 NULL 2003 350320309
5 100 20 15 NULL 2004 452031922
6 100 20 15 NULL 2005 547604073
7 100 20 15 NULL 2006 642608400
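Once retrieved, such a result can be processed like any other data frame. The sketch below rebuilds the year and value columns from the output printed above (assuming dbData() returns a data frame of that shape) and computes year-on-year growth; the column name growth_pct is introduced here for illustration.

```r
# Values copied from the printed dbData() output above
res <- data.frame(
  year  = 2000:2006,
  value = c(233600844, 232982867, 236882397, 350320309,
            452031922, 547604073, 642608400)
)

# Year-on-year growth in percent; undefined (NA) for the first year
res$growth_pct <- c(NA, 100 * diff(res$value) / head(res$value, -1))
print(res)
```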
IO Analysis, WIOD and the package rwiot
Industrial statistics for business structure: package indstat
Competitive Industrial Performance (CIP) index: package CItools
Competitive Industrial Performance (CIP) index
• The Competitive Industrial Performance (CIP) Index developed by UNIDO
aims at benchmarking industrial performance at the country level.
• In contrast to other competitiveness indices currently available, the CIP
index provides a unique cross-country industrial performance benchmarking
and ranking based on quantitative indicators and a selected number of
industrial performance indicators.
• Rankings are provided at the global and regional levels, as well as by
adopting different country groupings for 144 countries in 2016.
• This offers governments the possibility to compare their country’s
competitive industrial performance with relevant comparators, that is, not
only with countries from the same region but also with countries at the
same stage of economic or industrial development across the globe.
• More at stat.unido.org
Competitive Industrial Performance (CIP) index (2)
• The CIP index combines 3 dimensions (comprising 8 indicators) of industrial performance into a single measure:
1. Capacity to produce and export manufactures (2)
2. Structural change towards manufactures and technology intensive sectors (4)
3. Impact in world MVA and in world manufactures (2)
• Only quantitative indicators are considered.
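How several indicators can be combined into one score and a ranking is sketched below. The slides do not state the normalisation or aggregation used for the CIP index, so the min-max scaling and equal-weight geometric mean here are assumptions of this sketch, as are the indicator names and the toy values.

```r
# Toy indicator values (hypothetical, not UNIDO data): 3 countries, 2 indicators
ind <- data.frame(
  mva_per_capita     = c(5000, 1200, 300),
  exports_per_capita = c(8000, 900, 150),
  row.names = c("A", "B", "C")
)

# Min-max normalisation of each indicator to [0, 1] (assumed scaling)
norm01 <- function(v) (v - min(v)) / (max(v) - min(v))
z <- sapply(ind, norm01)
rownames(z) <- rownames(ind)

# Equal-weight geometric mean across indicators (assumed aggregation)
score <- apply(z, 1, function(r) prod(r)^(1 / length(r)))

# Rank 1 = best performer
ranking <- rank(-score)
print(cbind(score = round(score, 3), rank = ranking))
```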
CIP Ranking
CIP profile I
CIP profile II
Maintenance of UNIDO databases with R
Data screening
[Figure: indicator value $x_t$ plotted against time $t$ (2004-2012), with bands around the start value $x_{S^*}$: relevant change $x_{S^*} \pm \delta$ and significant change $x_{S^*} \pm \left(\delta + 2\sqrt{s_x^2 + s_{x_{S^*}}^2}\right)$.]
Technical assistance
Summary and conclusions
Challenges
• Awareness of the importance of computation in official statistics
• Staff - limited resources
• Rapid release cycle of R
• Package dependence
• Regular support
• Training and training materials
• IT infrastructure and support