5 June 2013 SDMX Technical Working GroupLuxembourg
15 June 2013 SDMX Technical Working GroupLuxembourg
1
WP Item 6
The Expressions Language of Banca d’Italia (EXL)
5 June 2013 SDMX Technical Working GroupLuxembourg
25 June 2013 SDMX Technical Working GroupLuxembourg
2
History
Mid nineties: Banca d’Italia designed a Language for validations and calculations
2009: A new version of the EXL was released as part of the new Infostat software platform, containing the operators needed for validation and basic calculation
On-going: progressive upgrade of the EXL for supporting the data compilation, for example:– Operators for time series manipulation
– Operators for data analysis
– Operators’ syntax upgrade
5 June 2013 SDMX Technical Working GroupLuxembourg
35 June 2013 SDMX Technical Working GroupLuxembourg
3
Basic example of validation rule
Check () special operator for checks
- subtract the multidimensional data
<= comparison operator
EXPRESSIONS:
C3 = get ( C1, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))
C4 = get ( C2, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))
C5 = check ( C3 – C4 <= given_threshold )
Two collected data:C1: Loans - Date, Entity, Sector, AmountC2: Loans - Date, Entity, Geo_Area, Amount
Check rule: C1 and C2 should be equal if aggregated on their common dimensions (for less than a small amount)
EXL operators:
5 June 2013 SDMX Technical Working GroupLuxembourg
45 June 2013 SDMX Technical Working GroupLuxembourg
4
Example of sum
Get () read the specified data
Keep () keep the specified dimensions
Sum () sum the specified measure (if quantitative)
+ sum the multidimensional data
EXPRESSIONS:
C3 = get ( C1, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))
C4 = get ( C2, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))
C5 = C3 + C4
Two collected data:C1 (Current Accounts): Date, Entity, Sector, AmountC2 (Mortgages): Date, Entity, Geo_Area, Amount
The desired result is Loans (= Current Accounts + Mortgages):
C5 (Loans): Date, Entity, Amount
EXL operators:
5 June 2013 SDMX Technical Working GroupLuxembourg
55 June 2013 SDMX Technical Working GroupLuxembourg
5
5 June 2013 SDMX Technical Working GroupLuxembourg
65 June 2013 SDMX Technical Working GroupLuxembourg
6
5 June 2013 SDMX Technical Working GroupLuxembourg
75 June 2013 SDMX Technical Working GroupLuxembourg
7
5 June 2013 SDMX Technical Working GroupLuxembourg
85 June 2013 SDMX Technical Working GroupLuxembourg
8
5 June 2013 SDMX Technical Working GroupLuxembourg
95 June 2013 SDMX Technical Working GroupLuxembourg
9
5 June 2013 SDMX Technical Working GroupLuxembourg
105 June 2013 SDMX Technical Working GroupLuxembourg
10
5 June 2013 SDMX Technical Working GroupLuxembourg
115 June 2013 SDMX Technical Working GroupLuxembourg
11
5 June 2013 SDMX Technical Working GroupLuxembourg
125 June 2013 SDMX Technical Working GroupLuxembourg
12
5 June 2013 SDMX Technical Working GroupLuxembourg
135 June 2013 SDMX Technical Working GroupLuxembourg
13
5 June 2013 SDMX Technical Working GroupLuxembourg
145 June 2013 SDMX Technical Working GroupLuxembourg
14
5 June 2013 SDMX Technical Working GroupLuxembourg
155 June 2013 SDMX Technical Working GroupLuxembourg
15
5 June 2013 SDMX Technical Working GroupLuxembourg
165 June 2013 SDMX Technical Working GroupLuxembourg
16
Validation
Formal (Structural)– assurance that the formal structure of the data observations
matches the Data Structure Definition, in term of concepts, their roles and their admissible values; the formal validation is not treated as a calculation and is not defined through an expression;
Of the Information Content (Plausibility)– Assurance that the data content gives right information about the
real world (as much as possible); to this end, it is possible to use the a-priori information about the real world and the possible redundancies of the data (e.g. the integrity rules, coherence rules, plausibility rules); this kind of validation rules is normally performed through calculations,
5 June 2013 SDMX Technical Working GroupLuxembourg
175 June 2013 SDMX Technical Working GroupLuxembourg
1717
Validations as calculations
• Use of the same language of the calculations
• Validations possible in any phase of the process
• Results of the Validations like any other data
• are defined and stored • can be inquired and disseminated• can be further processed
5 June 2013 SDMX Technical Working GroupLuxembourg
185 June 2013 SDMX Technical Working GroupLuxembourg
18
SDMX Compliance
• The SDMX 2.0 and 2.1 versions already envisaged the introduction of a standard language for validations and calculations
• The SDMX 2.1 package n. 13 (Transformations and Expressions) is a generic model aimed to track the validation and the calculation of data, derived from the CWM (Common Warehouse Metamodel), a OMG standard (Object Management Group)
• However this model is not operational in-itself, because it requires a language to specify the validation and calculation expressions
• The EXL is designed according with the SDMX package n. 13 – Transformations and Expressions
5 June 2013 SDMX Technical Working GroupLuxembourg
195 June 2013 SDMX Technical Working GroupLuxembourg
19
SDMX IM – Package 13
5 June 2013 SDMX Technical Working GroupLuxembourg
205 June 2013 SDMX Technical Working GroupLuxembourg
20
Transformations; internal view
Einstein equation E = MC2 E = M*(C**2)
Operator:
**
b
pOperand:
2
Operand:
C
Result: EOperator:
*
f
fOperand:
M
Constant node
Reference nodes
Operator nodes
Expression nodes
0..*
5 June 2013 SDMX Technical Working GroupLuxembourg
215 June 2013 SDMX Technical Working GroupLuxembourg
21
Transformations: User view
Einstein equation E = MC2 E = M*(C**2)
Operand:
2
Operand:
C
Result: EExpression:
E = M*(C**2)
Operand:
M
5 June 2013 SDMX Technical Working GroupLuxembourg
225 June 2013 SDMX Technical Working GroupLuxembourg
22
Notes on Transformations
The Operands may be:• Artefacts of the model (e.g. Statistical Data)
• Constants
• Operator nodes
The property of “Closure”• The result is an artefact of the model (e.g.
Statistical Data)
• The result may be operand of other calculations
5 June 2013 SDMX Technical Working GroupLuxembourg
235 June 2013 SDMX Technical Working GroupLuxembourg
23
Graph of the calculations
External Institutions
C1
C2
C3C4
C5
T1
T3
T2
C10
C12C13
C15
C17
C16T13
T12
T14
Banks & OFI’s reports
C.C.R.
C21
C22
C23
C24T22
T21
C51
C52
T53
T52
T51
Economic research models
C54
C53
T54
C60C61
Statistical bulletin
T60T61
Statistical products
C70T71
T70T72C71
C72
C41
T42
T41
C42
Supervision models
5 June 2013 SDMX Technical Working GroupLuxembourg
245 June 2013 SDMX Technical Working GroupLuxembourg
24
Software Tools
• Dictionary, that is a data base containing all the definitions• Warehouse, that is the complex of data archives containing the
data, logically unique but also physically heterogeneous and distributed
• Tool for the administration of the metadata (create, modify, etc.), including the expressions for calculations and controls (this package is built in-house)
• Tool for validation of the expressions syntax and consistency and for translation of the expressions in the language of the calculation tools (based on the open source ANTLR under the control of a software built in-house)
• Execution of the expressions, that is the calculation engine of the software platform, based on a software layer developed in-house that interfaces and controls the calculation software, which in turn can be various: currently it is used the open-source R, the SQL, some software built in-house and optimized for specific purposes.
5 June 2013 SDMX Technical Working GroupLuxembourg
255 June 2013 SDMX Technical Working GroupLuxembourg
25
Allowed Data
The EXL is applied to any kind of data of interest in the Bank of Italy
statistical environment, like
– Dimensional data, including as particular cases
• time series
• cross sections
– Questionnaires
– Registers
the Bank of Italy is gradual extending the use of EXL to the whole statistical information system to support its industrial processing
5 June 2013 SDMX Technical Working GroupLuxembourg
262626
End to End processing
1. Data are integrated with information relevant to the collected security codes
2. Missing observations are estimated
3. data are aggregated
Use case: production of the information for the ECB concerning the balance sheet of the monetary and financial institutions sector
CollectCollect
data on securities from MFI on a security by security basis.
securities register data
Dataflow to ECB
Structural and integrity checks
DisseminateProcessCollectBuildDesign
checkcheck
CollectCollect
processprocessprocessprocessprocessprocess
disseminatedisseminate
checkcheck
5 June 2013 SDMX Technical Working GroupLuxembourg
275 June 2013 SDMX Technical Working GroupLuxembourg
27
Some other characteristics
Formal – expressed in Backus-Naur formDeals with historicity:
– Takes into account the time validity of the artefacts – Allows defining changes of the algorithm with reference to the
timeMay deal with
– Mono and multi-measure data – Data attributes having a definable behaviour– Operands having different dimensionality– Subsets of dimensional cubes– Implicit / explicit zeros
Allows – Persistent and non persistent results – Expressions as operands of other expressions – Invocation of external routines
5 June 2013 SDMX Technical Working GroupLuxembourg
28
Operators used in the validation (1)
Data retrieval / storage (Get, Put)
Projection (drop, keep …)
Filter (=, <, <=, >, >=, <>, like, between …)
Aggregation (sum, avg, min, max, first, last …)
Other manipulators of the data structure (rename, calc)
Join (merge)
Algebraic and string manipulation ( +, -, *, /)
Comparison (=, <, <=, >, >=, <>)
5 June 2013 SDMX Technical Working GroupLuxembourg
29
Operations used for validation (2)
Logical (and, or, not)
Tailored for Validation:– Check of a generic condition– Existence and referential integrity check– Completeness check– Imbalance– Error severity level
Conditional execution (case)
Currency conversion
Date-time (year, month, day, time shift)