5 june 2013 sdmx technical working group luxembourg 1 5 june 2013 sdmx technical working group...

30
5 June 2013 SDMX Technical Working Group Luxembourg 1 5 June 2013 SDMX Technical Working Group Luxembourg 1 WP Item 6 The Expressions Language of Banca d’Italia (EXL)

Upload: leo-sanders

Post on 31-Dec-2015

219 views

Category:

Documents


2 download

TRANSCRIPT

5 June 2013 SDMX Technical Working GroupLuxembourg

15 June 2013 SDMX Technical Working GroupLuxembourg

1

WP Item 6

The Expressions Language of Banca d’Italia (EXL)

5 June 2013 SDMX Technical Working GroupLuxembourg

25 June 2013 SDMX Technical Working GroupLuxembourg

2

History

Mid nineties: Banca d’Italia designed a Language for validations and calculations

2009: A new version of the EXL was released as part of the new Infostat software platform, containing the operators needed for validation and basic calculation

On-going: progressive upgrade of the EXL for supporting the data compilation, for example:– Operators for time series manipulation

– Operators for data analysis

– Operators’ syntax upgrade

5 June 2013 SDMX Technical Working GroupLuxembourg

35 June 2013 SDMX Technical Working GroupLuxembourg

3

Basic example of validation rule

Check () special operator for checks

- subtract the multidimensional data

<= comparison operator

EXPRESSIONS:

C3 = get ( C1, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))

C4 = get ( C2, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))

C5 = check ( C3 – C4 <= given_threshold )

Two collected data:C1: Loans - Date, Entity, Sector, AmountC2: Loans - Date, Entity, Geo_Area, Amount

Check rule: C1 and C2 should be equal if aggregated on their common dimensions (for less than a small amount)

EXL operators:

5 June 2013 SDMX Technical Working GroupLuxembourg

45 June 2013 SDMX Technical Working GroupLuxembourg

4

Example of sum

Get () read the specified data

Keep () keep the specified dimensions

Sum () sum the specified measure (if quantitative)

+ sum the multidimensional data

EXPRESSIONS:

C3 = get ( C1, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))

C4 = get ( C2, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))

C5 = C3 + C4

Two collected data:C1 (Current Accounts): Date, Entity, Sector, AmountC2 (Mortgages): Date, Entity, Geo_Area, Amount

The desired result is Loans (= Current Accounts + Mortgages):

C5 (Loans): Date, Entity, Amount

EXL operators:

5 June 2013 SDMX Technical Working GroupLuxembourg

55 June 2013 SDMX Technical Working GroupLuxembourg

5

5 June 2013 SDMX Technical Working GroupLuxembourg

65 June 2013 SDMX Technical Working GroupLuxembourg

6

5 June 2013 SDMX Technical Working GroupLuxembourg

75 June 2013 SDMX Technical Working GroupLuxembourg

7

5 June 2013 SDMX Technical Working GroupLuxembourg

85 June 2013 SDMX Technical Working GroupLuxembourg

8

5 June 2013 SDMX Technical Working GroupLuxembourg

95 June 2013 SDMX Technical Working GroupLuxembourg

9

5 June 2013 SDMX Technical Working GroupLuxembourg

105 June 2013 SDMX Technical Working GroupLuxembourg

10

5 June 2013 SDMX Technical Working GroupLuxembourg

115 June 2013 SDMX Technical Working GroupLuxembourg

11

5 June 2013 SDMX Technical Working GroupLuxembourg

125 June 2013 SDMX Technical Working GroupLuxembourg

12

5 June 2013 SDMX Technical Working GroupLuxembourg

135 June 2013 SDMX Technical Working GroupLuxembourg

13

5 June 2013 SDMX Technical Working GroupLuxembourg

145 June 2013 SDMX Technical Working GroupLuxembourg

14

5 June 2013 SDMX Technical Working GroupLuxembourg

155 June 2013 SDMX Technical Working GroupLuxembourg

15

5 June 2013 SDMX Technical Working GroupLuxembourg

165 June 2013 SDMX Technical Working GroupLuxembourg

16

Validation

Formal (Structural)– assurance that the formal structure of the data observations

matches the Data Structure Definition, in term of concepts, their roles and their admissible values; the formal validation is not treated as a calculation and is not defined through an expression;

Of the Information Content (Plausibility)– Assurance that the data content gives right information about the

real world (as much as possible); to this end, it is possible to use the a-priori information about the real world and the possible redundancies of the data (e.g. the integrity rules, coherence rules, plausibility rules); this kind of validation rules is normally performed through calculations,

5 June 2013 SDMX Technical Working GroupLuxembourg

175 June 2013 SDMX Technical Working GroupLuxembourg

1717

Validations as calculations

• Use of the same language of the calculations

• Validations possible in any phase of the process

• Results of the Validations like any other data

• are defined and stored • can be inquired and disseminated• can be further processed

5 June 2013 SDMX Technical Working GroupLuxembourg

185 June 2013 SDMX Technical Working GroupLuxembourg

18

SDMX Compliance

• The SDMX 2.0 and 2.1 versions already envisaged the introduction of a standard language for validations and calculations

• The SDMX 2.1 package n. 13 (Transformations and Expressions) is a generic model aimed to track the validation and the calculation of data, derived from the CWM (Common Warehouse Metamodel), a OMG standard (Object Management Group)

• However this model is not operational in-itself, because it requires a language to specify the validation and calculation expressions

• The EXL is designed according with the SDMX package n. 13 – Transformations and Expressions

5 June 2013 SDMX Technical Working GroupLuxembourg

195 June 2013 SDMX Technical Working GroupLuxembourg

19

SDMX IM – Package 13

5 June 2013 SDMX Technical Working GroupLuxembourg

205 June 2013 SDMX Technical Working GroupLuxembourg

20

Transformations; internal view

Einstein equation E = MC2 E = M*(C**2)

Operator:

**

b

pOperand:

2

Operand:

C

Result: EOperator:

*

f

fOperand:

M

Constant node

Reference nodes

Operator nodes

Expression nodes

0..*

5 June 2013 SDMX Technical Working GroupLuxembourg

215 June 2013 SDMX Technical Working GroupLuxembourg

21

Transformations: User view

Einstein equation E = MC2 E = M*(C**2)

Operand:

2

Operand:

C

Result: EExpression:

E = M*(C**2)

Operand:

M

5 June 2013 SDMX Technical Working GroupLuxembourg

225 June 2013 SDMX Technical Working GroupLuxembourg

22

Notes on Transformations

The Operands may be:• Artefacts of the model (e.g. Statistical Data)

• Constants

• Operator nodes

The property of “Closure”• The result is an artefact of the model (e.g.

Statistical Data)

• The result may be operand of other calculations

5 June 2013 SDMX Technical Working GroupLuxembourg

235 June 2013 SDMX Technical Working GroupLuxembourg

23

Graph of the calculations

External Institutions

C1

C2

C3C4

C5

T1

T3

T2

C10

C12C13

C15

C17

C16T13

T12

T14

Banks & OFI’s reports

C.C.R.

C21

C22

C23

C24T22

T21

C51

C52

T53

T52

T51

Economic research models

C54

C53

T54

C60C61

Statistical bulletin

T60T61

Statistical products

C70T71

T70T72C71

C72

C41

T42

T41

C42

Supervision models

5 June 2013 SDMX Technical Working GroupLuxembourg

245 June 2013 SDMX Technical Working GroupLuxembourg

24

Software Tools

• Dictionary, that is a data base containing all the definitions• Warehouse, that is the complex of data archives containing the

data, logically unique but also physically heterogeneous and distributed

• Tool for the administration of the metadata (create, modify, etc.), including the expressions for calculations and controls (this package is built in-house)

• Tool for validation of the expressions syntax and consistency and for translation of the expressions in the language of the calculation tools (based on the open source ANTLR under the control of a software built in-house)

• Execution of the expressions, that is the calculation engine of the software platform, based on a software layer developed in-house that interfaces and controls the calculation software, which in turn can be various: currently it is used the open-source R, the SQL, some software built in-house and optimized for specific purposes.

5 June 2013 SDMX Technical Working GroupLuxembourg

255 June 2013 SDMX Technical Working GroupLuxembourg

25

Allowed Data

The EXL is applied to any kind of data of interest in the Bank of Italy

statistical environment, like

– Dimensional data, including as particular cases

• time series

• cross sections

– Questionnaires

– Registers

the Bank of Italy is gradual extending the use of EXL to the whole statistical information system to support its industrial processing

5 June 2013 SDMX Technical Working GroupLuxembourg

262626

End to End processing

1. Data are integrated with information relevant to the collected security codes

2. Missing observations are estimated

3. data are aggregated

Use case: production of the information for the ECB concerning the balance sheet of the monetary and financial institutions sector

CollectCollect

data on securities from MFI on a security by security basis.

securities register data

Dataflow to ECB

Structural and integrity checks

DisseminateProcessCollectBuildDesign

checkcheck

CollectCollect

processprocessprocessprocessprocessprocess

disseminatedisseminate

checkcheck

5 June 2013 SDMX Technical Working GroupLuxembourg

275 June 2013 SDMX Technical Working GroupLuxembourg

27

Some other characteristics

Formal – expressed in Backus-Naur formDeals with historicity:

– Takes into account the time validity of the artefacts – Allows defining changes of the algorithm with reference to the

timeMay deal with

– Mono and multi-measure data – Data attributes having a definable behaviour– Operands having different dimensionality– Subsets of dimensional cubes– Implicit / explicit zeros

Allows – Persistent and non persistent results – Expressions as operands of other expressions – Invocation of external routines

5 June 2013 SDMX Technical Working GroupLuxembourg

28

Operators used in the validation (1)

Data retrieval / storage (Get, Put)

Projection (drop, keep …)

Filter (=, <, <=, >, >=, <>, like, between …)

Aggregation (sum, avg, min, max, first, last …)

Other manipulators of the data structure (rename, calc)

Join (merge)

Algebraic and string manipulation ( +, -, *, /)

Comparison (=, <, <=, >, >=, <>)

5 June 2013 SDMX Technical Working GroupLuxembourg

29

Operations used for validation (2)

Logical (and, or, not)

Tailored for Validation:– Check of a generic condition– Existence and referential integrity check– Completeness check– Imbalance– Error severity level

Conditional execution (case)

Currency conversion

Date-time (year, month, day, time shift)

305 June 2013 SDMX Technical Working GroupLuxembourg

305 June 2013 SDMX Technical Working GroupLuxembourg

WP Item 6

Expressions and Calculations