etl for the masses with power query and m

33
ETL for the Masses Régis Baccaro – IBM @regbac

Upload: regis-baccaro

Post on 24-May-2015

1.489 views

Category:

Technology


3 download

DESCRIPTION

This is the slide deck for my presentation given at SQL Saturday 280 in Vienna.on March 6th 2014.

TRANSCRIPT

Page 1: ETL for the masses with Power Query and M

ETL for the MassesRégis Baccaro – IBM

@regbac

Page 2: ETL for the masses with Power Query and M

Our Sponsors

Page 3: ETL for the masses with Power Query and M

Introduction

Régis Baccaro @regbachttp://Theblobfarm.wordpress.com

http://Thelovefarm.wordpress.com

[email protected]

• Founder and lead organizer of SQL Saturday Denmark

• PASS Regional Mentor

• Works for IBM

• Passionate about the community

• .Net developer, BI dude, SharePoint fellow and accidental DBA

Page 4: ETL for the masses with Power Query and M

Agenda

• Power Query and the M language

• E and T and L with Power Query

• Data refresh techniques with PQ

• Next step

Page 5: ETL for the masses with Power Query and M

Introduction

• Power Query• Get data experience

• Filter and combine

• Embedded M for repeatable mashup

• Power Query Formula Language (aka M)• Mostly pure

• Higher-order

• Dynamically typed

• Partially lazy

• Functional programming language

Page 6: ETL for the masses with Power Query and M

Elements of language

• Expressions – central construct• Evaluated to a single vlaue

• Values• Primitives

• List – ordered seq.

• Record – set of fields

• Table

• Function

Page 7: ETL for the masses with Power Query and M

Evaluation

• Excel-like (surprise !)

• Nested records• In Records

• In Lists

• Lazy evaluation• Lists and Records (and let)

• Eager evaluation• Everything else

Page 8: ETL for the masses with Power Query and M

Functions and Standard Library

• Mapping from a set of values to a single value• (named parameters) => function body

• Common set of definitions

Page 9: ETL for the masses with Power Query and M

Operators

• Meaning varies depending on kind of value

• & = text or list concatenation and records merge

Page 10: ETL for the masses with Power Query and M

Metadata

• Information about a value that is associated with a value

• A record

• Exists for every value

• Unobtrusive way to add information

• Accessed with Value.Metadata

Page 11: ETL for the masses with Power Query and M

Let .....in expression

• So far only literal values

• Let allows a set of value to be:• Computed

• Named

• Used in subsequent expressions that follows the in

let

Source = Web.Page(Web.Contents("http://www.cvr.dk/Site/Forms/CompanySearch/CompanySearch.aspx?......),

RowCount = Table.RowCount(Source)

in

RowCount

Page 12: ETL for the masses with Power Query and M

IF expression

• Select between 2 expression based on logical condition

Page 13: ETL for the masses with Power Query and M

Error expression

• When an expression evaluation cannot yield a value

• Raised with error

• Handled with try

• Produces an Error record

• try...otherwise Used with default values

Page 14: ETL for the masses with Power Query and M

Keywords and Operators

• and as each else error false if in is let meta not otherwise or section shared then true try type #binary #date #datetime #datetimezone #duration #infinity #nan #sections #shared #table #time

• , ; = < <= > >= <> + - * / & ( ) [ ] { } @ ! ? => .. ...

Page 15: ETL for the masses with Power Query and M

The ”E” - Why is Power Query great for Extracting data

• Multiple data sources

Hey wait ! Where is PDW ?

Page 16: ETL for the masses with Power Query and M

Query folding - A step toward declarative ETL approach

• Declarative vs Imperative

• Query folding similar to predicate pushdown

• Does Power Query have a Query Optimizer ?

• Demo

Query folding - the unofficial list:• SQL Databases• OData and OData based sources, such

as the Windows Azure Marketplace and SharePoint Lists

• Active Directory• HDFS.Files, Folder.Files, and

Folder.Contents (for basic operations on paths)

• Column removal• Renaming• Joins• Type conversions

Page 17: ETL for the masses with Power Query and M

Real life scenario – ETL for the masses

• Seen a lot of demos

• Build a lot of demos

• They are always so clean !

Page 18: ETL for the masses with Power Query and M

Real life scenario

Page 19: ETL for the masses with Power Query and M

Transform

• M is how the magic happens!

• Data manipulation• Records

• Lists

• Tables

• Merging

• Function calls

Page 20: ETL for the masses with Power Query and M

What about our scenario?

• Where should I get my data from?• Pure Excel

• Excel and MDS/DQS/SSIS/SQL

• Web, SQL, XML, ?

• Let me show you ! Input

• (cvr web)

Page 21: ETL for the masses with Power Query and M

Let’s go to homegrown data?

• Bad web service

• Bad HTML structure

• Let’s go with local data that we can control• SQL Server

• Excel

• Let’s Query!

Isolated DB

Local storage

Page 22: ETL for the masses with Power Query and M

Clean up before you merge!

• DQSKnowledge base with CVR

+ Cleansing project with LinkedIn input

________________________________________

= Demo2.1_AndreasStrandbyClean

• Hit ratio increased...

+

=

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0

50

100

150

200

250

Cleanjoin

Nestedjoin

Merge

Hit Total

Page 23: ETL for the masses with Power Query and M

Smarter Power Query

• Expression.Evaluate()

• Examples• Load query text from file

• Load function from file

• Passing parameters (as constants)

• Demo

Page 24: ETL for the masses with Power Query and M

Refreshing Power Query data

• Different solutions

• All with flaws !

Page 25: ETL for the masses with Power Query and M

Refreshing Power Query data – with VB6 !

• Back from 2006

• From Kim GreenLee

Plus Minus

Can be scheduled VB6 – are you kidding ?

More robust than the non-technical solution

Page 26: ETL for the masses with Power Query and M

Refreshing Power Query data – with PowerShell

Plus Minus

Robust Hard to troubleshoot

Can not run in a task in windows task scheduler unless the user has checked that the user has to be logged on to run

Page 27: ETL for the masses with Power Query and M

Refreshing Power Query data – The non-technical way

• Let me show you !

Plus Minus

Very easy Not very corporate !

The spreadsheet needs to be open

Excel file not saved

Locked out when it refreshes

Page 28: ETL for the masses with Power Query and M

Refreshing Power Query data – The non-technical way part 2

• Let me show you !

Plus Minus

Very easy Not very corporate !

Uses technique from previous The spreadsheet needs to be open

Page 29: ETL for the masses with Power Query and M

Refreshing Power Query data – with SSIS

Plus Minus

Robust Requires a SQL Server (wait, it’s a plus!)

Needs a SSIS / C# developer

Page 30: ETL for the masses with Power Query and M

Refreshing Power Query data – with SSIS

• Using DQS for cleansing input

• Let me show you !

Page 31: ETL for the masses with Power Query and M

How is Power query going to be used?

• Data store accumulating interesting data points

• Hook into read only data for reporting purposes or data marts

• One file to accumulate (Produce)

• Multiple files or programs to report (Consume)

• I don’t believe in “Data Steward”

• I believe someone will be in charge of procuring and monitoring data stores of disparate data (such as IT or DBA’s).

Page 32: ETL for the masses with Power Query and M

Conclusion

• A step toward declarative ETL approach

• Still much work to do !

We have• A declarative data integration language• Only surfaced in Power Query• Can push data to an Excel spreadsheetImagine.....• Connection to heterogenous data sources

Page 33: ETL for the masses with Power Query and M

THANK YOU!@REGBACHTTP://[email protected]