![Page 1: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/1.jpg)
ESSnet on Datawarehousing - the business register
Pieter Vlag – Statistics Netherlands
![Page 2: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/2.jpg)
2
Outline of the presentation
• DataWareHouse and importance population frame
• relationship population frame - business register
- (default) target population, statistical units
• other crucial datasources: “backbones”
- turnover + employment
• datalinking : the statistical unit base
• conflicting information between datasources
- when correcting in statistical DWH
- when correcting in backbones
- when feedback to business register
ESSnet DWH – business register
![Page 3: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/3.jpg)
3
Definition of a statistical Datawarehouse (according to the FPA)
ESSnet DWH – business register
The broad definition of a data warehouse to be used in this ESSnet is therefore:
‘A common conceptual model for managing all available data of interest, enabling the NSI to (re)use this data to create new data/new outputs, to produce the necessary information and perform reporting and analysis, regardless of the data’s source.’
![Page 4: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/4.jpg)
4
A DataWarehouse: the general idea
ESSnet DWH – business registerAs staging area is “core business” for NSIs, term statistical DWH is used for staging area + WareHouse
![Page 5: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/5.jpg)
5
The statistical DataWarehouse: architecture and layers
ESSnet DWH – business register
![Page 6: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/6.jpg)
6
The statistical DataWareHouse:processing steps the GSBPM model
ESSnet DWH – business register
process
input DWH / int. data
5.1a: link data5.1b: integrate datasee presentation Fursova
Calculate aggregates
![Page 7: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/7.jpg)
Titel van de presentatie 7datasource 1 datasource 2 datasource 3
Output 1 Output 2 Output 3
Linking
Processing (integration layer)
Integrated data
p.analysep.analyse
4
GSBPM-step
5.1
5.2-
5.6
5.7
6
7
![Page 8: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/8.jpg)
8
A datawarehouse without population frame
ESSnet DWH – business register
Datasource I:Admin data
Datasource I:Survey 1
Datasource I:Survey 2
Datasource I:BIG DATA
• different sources cover different enterprises -> information about ?• timing of availability sources differs -> when complete desc. available ?
![Page 9: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/9.jpg)
9
A Datawarehouse with a population frame
ESSnet DWH – business register
Pop
ulat
ion.
Dat
asou
rce
1: a
dmin
dat
a 1
Dat
asou
rce
2: B
IG D
ATA
Dat
asou
rce
3: s
urve
y 1
Dat
asou
rce
4:
surv
ey 2
ADVANTAGE:the coverage of DWH is known (e.g. which enterprises are included in a DWH)
![Page 10: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/10.jpg)
10
Units and target population
The population should be known for the
• datawarehouse; e.g. “about which enterprises info”
• its preparation phase ; e.g. when linking data sources
Challenges are:
• units may differ between the data sources
- decision: which unit used for linking
• what is the reference population- decision: how is the default target population defined
ESSnet DWH – business register
![Page 11: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/11.jpg)
11
Proposals
• Only statistical unit (=enterprise) is used
- for data-linking
- in processing phase of the statistical - DWH
- justification: most obvious, ESSnet on Consistency, maintenance
• Default target population :
all enterprises with economic activity in reference period (e.g. year)
- justification: SBS-regulation
- widest definition of enterprises from which flexible outputs for subpopulations can be derived
- term default is used: as subpopulations do have a target population, tooESSnet DWH – business register
![Page 12: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/12.jpg)
Titel van de presentatie 124
GSBPM-step
5.1
5.2-
5.6
5.7
6
7
Linked data
Integrated data
Processing on stat. unit + default target population only
flexible datasources with different populations and units
Weighting to flexible pop.
flexible output for different populations, and units
![Page 13: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/13.jpg)
13
Population frame and the Business Register
Determination of the default target population in SDWH in 2 steps:
• the population frame, i.e. a list of enterprises with a certain kind of activity during a period.
• confirmation which enterprises of the list really performed economic activities during a period
The business register provides information for the population frame.
Therefore, the statistical Business Register is an indirect datasource for the statistical-DWH
ESSnet DWH – business register
![Page 14: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/14.jpg)
14
Information needed from stat.business register
Recommended information for the population frame :• the frame reference year
• the statistical enterprises unit, including national ID and EGR ID
• the name and address of the enterprise
• the national identification number (ID) of the enterprise
• the date in population (mm/yr)
• the date out of population (mm/yr)
• the NACE-code
• the institutional sector code
• a size class
ESSnet DWH – business register
![Page 15: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/15.jpg)
15
Other backbones
ESSnet AdminData: VAT and social security admin • almost complete for quarter and annual
• can be used for high-quality estimates for turnover + employment respectively.
ESSnet DWH: VAT and social security data are crucial• to confirm the activity status of enterprises
implictly to determine the default target population
• to integrate data suitable for flexible outputs
measurement errors are reduced of sample survey (or data about subpopulation) if weighting to pop.numbers + VAT-turnover + employment
Proposal: to include these admin data as backbones in a stat-DWH
ESSnet DWH – business register
![Page 16: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/16.jpg)
16
Source layer
Int. + Analyses layer
Access layer
Integration layer
SBR
Pop-framedata 1 data 2
VAT empl.
GSBPM 5.1: link & integrate
GSBPM 5.2-5.6: “process”
GSBPM 5.7-5.8: calculate aggregates
Check processing
GSBPM 6: analyse / “DATAWAREHOUSE”
GSBPM 7-9: disseminate
Backbones in a statistical-DWH
Backbones are crucial for data-linking and data-integration;-> need to be checked/cleaned by source in the source layer
![Page 17: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/17.jpg)
17
Source layer
Int. + Analyses layerIntegration layer
SBR
Pop-framedata 1 data 2
VAT
empl.
GSBPM 5.1: link & integrate
GSBPM 5.2-5.6: “process”
GSBPM 5.7-5.8: calculate aggregates
Check processing
GSBPM 6: analyse
GSBPM 7-9: disseminate
Observed: admin data incorporated in BR
When choosing this option, - important part of linking process outside the S-DWH- unless S-DWH integral part of S-DWH (maintenance ?)
![Page 18: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/18.jpg)
18
Determining default target population
ESSnet DWH – business register
If statistical-DWH covers annual statistics only
• relatively straightforward
- derive population frame from business register at the end of reference year t
- determine active or non-active as soon as VAT and/or employment data become available
If STS included in statistical-DWH more complicated:
- updating necessary !
![Page 19: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/19.jpg)
19
Updating population
ESSnet DWH – business registernov aug
jan apr jul okt jan apr jul
FATS survey population year T
FATS frame population of active
units year T
Undercoverage
Overcoverage
In both frame populations
FATS frame population of active
units year T (revised)
Frame error procedure
novRelease 3
julRelease 4
decRelease 1
julRelease 2
![Page 20: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/20.jpg)
20SBR
Pop-frame
data 1 data 2
VAT empl.
GSBPM 5.1: link & integrate
GSBPM 5.2-5.6: “process”
GSBPM 5.7-5.8: calculate aggregates
Check processing
“DATAWAREHOUSE”
The largest enterprises
L.E.
output 1output 2
output 3
If a team within a NSI produces consistent microdata for largest enterprises-> consider this source as backbone
![Page 21: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/21.jpg)
21
Units: ideal situation
ESSnet DWH – business register
• enterprise has a unique ID
• enterprise group has a unique ID
enterprise and enterprise group
• correspond with statistical definitions
• are used in all data sources
In practice more complex situations do exist (especially when using more admin data)
![Page 22: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/22.jpg)
Titel van de presentatie 224
GSBPM-step
5.1
5.2-
5.6
5.7
6
7
Linked data
Integrated data
processing on one unit + one population only
flexible datasources with different population and units
Flexible output for different populations, and units
Key question: how to manage these different in- and output units and their relationships to the statistical unit
![Page 23: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/23.jpg)
ESSnet DWH – business register 23
ENTERPRISE(=statistical unit)
ENTERPRISE GROUP
Legal unit
Legal unit
“Accountìng” unit
“Accountìng” unit
“VAT-unit”
other units
“other tax”units
enterprise
Enterprise
Local unit
LKAU
KAU
Enterprise group
INPUT IN S-DWHprocessing
OUTPUT
![Page 24: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/24.jpg)
24
The unit base
ESSnet DWH – business register
Some remarks:
• Complexity of unit base depends on
- scope of statistical-DWH
- national legislation (practices) with respect to enterprise units
• Unit base closely related to Business Register. Main motivation to place this base outside the Business registers
- more flexible in case of new in- and outputs
- more transparent in case of linking errors
![Page 25: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/25.jpg)
25SBR
Pop-frame
VAT empl.
GSBPM 5.1: link & integrate
GSBPM 5.2-5.6: “process”
GSBPM 5.7-5.8: calculate aggregates
Check processing
“DATAWAREHOUSE”
Position of Business Register in stat -DWH
L.E.
output 1output 2
output 3
survey
units
tax BIG DATA
other
![Page 26: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/26.jpg)
26
Feedback to Business Register
ESSnet DWH – business register
In case of conflicting information between datasources and conclusion is influential error in backbones (and indirectly SBR)
• When incorporating corrections in statistical DWH ?
• When incorporating corrections in backbones ?
• When incorporating corrections in SBR?
![Page 27: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/27.jpg)
27SBR
Pop-frame
survey otherunits
VAT empl.
GSBPM 5.1: link & integrate
GSBPM 5.1-5.6: “process”
GSBPM 5.7-5.8: calculate aggregates
Check processing
L.E.
Correction of information
In SDWH: corrections at 5.6
In backbones themselves: timing most important revisions
In SBR: after end of year (for consistency) – exception major impact
“DATAWAREHOUSE”
output 1output 2
output 3
![Page 28: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062308/56649eef5503460f94bff494/html5/thumbnails/28.jpg)
28
Conclusions
ESSnet DWH – business register
Requirements for statistical-DWH
• Population well defined
• Use of one unit in processing
Backbones desired for
• populations, VAT-turnover, admin data employment, large enterprises
Business Register is indirect input for statistical DWH
• population frame, unit base, survey
Timing of corrections errors (backbone information)
• in DWH: before weighting
• in backbone: when revising
• in Business Register: end of year