data integration with server side mashups

Data Integration with Server Side Mashups Juergen Brendel Principal Software Engineer OSDC 2007, Brisbane

Upload: jbrendel

Post on 19-May-2015

DESCRIPTION

The open source SnapLogic data integration framework. Overview, examples, screenshots.

TRANSCRIPT

Page 1: Data Integration with server side Mashups

Data Integration with Server Side Mashups

Juergen Brendel
Principal Software Engineer

OSDC 2007, Brisbane

Page 2: Data Integration with server side Mashups

Agenda

• The SnapLogic project
• Client-side mashups
• Problems and solutions
• Data integration with SnapLogic

Page 3: Data Integration with server side Mashups

The SnapLogic project

• Founded 2005, data integration background
• Vision:
  – Reusable data integration resources
  – REST
  – Web-based GUI
  – Programmatic interface
  – Open Source
• Python... Why not?
• www.snaplogic.com

Page 4: Data Integration with server side Mashups

What's a mashup?

• A 'Web 2.0 kind of thing'
• Combine, aggregate, visualise
  – Multiple sources
  – Multiple dimensions
• Typically on the client side
  – Browser
  – Ajax

Page 5: Data Integration with server side Mashups

Self-made mashups

• Hand coded
• Mashup editors
  – GUI mashup-logic editor
  – Wiki-style
  – Hosted

Page 6: Data Integration with server side Mashups

Benefits for the enterprise?

Yeah, right...

Enable knowledge workers!!!
Situational applications!
Avoid the IT bottleneck!!

Page 7: Data Integration with server side Mashups

Problems with client-side mashups

• Skill
• Internal data often not web-friendly
• Maintenance
• Security
• Performance

Page 8: Data Integration with server side Mashups

Solution: Server-side mashups

• Flexible access
• Security
• Performance

Page 9: Data Integration with server side Mashups

SnapLogic data integration philosophy

• Clearly defined REST resources
• Data reuse and integration
• Pipelines
• Framework for resource-specific scripting
• Open source and community

Page 10: Data Integration with server side Mashups

Example: Resources

[Diagram: a client sends an HTTP request to the SnapLogic Server at HTTP://server1.example.com/customer_list and gets the response back as JSON. On the server, a component driven by a resource definition reads from databases, files, applications, or Atom / RSS feeds.]

Resource definition:

• Resource name
• HTTP://server1.example.com/customer_list
• SQL query or filename
• Credentials
• Parameters
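From the client's side, a resource like this is just a URL that returns records. The following is a minimal sketch in Python, assuming the server answers with a JSON array of records. The query parameter `city`, the helper names, and the sample payload are made up for illustration and are not taken from SnapLogic's interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Resource URL taken from the slide; everything else here is illustrative.
RESOURCE_URL = "http://server1.example.com/customer_list"


def build_request_url(base_url, params=None):
    """Append optional resource parameters as a query string."""
    if not params:
        return base_url
    return base_url + "?" + urlencode(params)


def parse_records(body):
    """Decode a JSON response body into a list of record dicts."""
    records = json.loads(body)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


def fetch_records(base_url, params=None):
    """GET the resource and return its records (needs a reachable server)."""
    with urlopen(build_request_url(base_url, params)) as response:
        return parse_records(response.read().decode("utf-8"))


# Offline demonstration with a made-up response body.
sample_body = '[{"name": "Acme", "city": "Brisbane"}]'
print(build_request_url(RESOURCE_URL, {"city": "Brisbane"}))
print(parse_records(sample_body))
```

`fetch_records` is not called here because server1.example.com is a placeholder host; the two helper functions can be exercised without a network.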

Page 11: Data Integration with server side Mashups

Example: Pipelines

[Diagram: a client sends an HTTP request to the SnapLogic Server at HTTP://server1.example.com/processed_customer_list and gets the response back as JSON. On the server, three components, each with its own resource definition, form a pipeline (Read, Geocode, Sort) over data from databases, files, applications, or Atom / RSS feeds.]
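The Read, Geocode, Sort chain can be pictured as records flowing through three stages. The sketch below uses plain Python generators, not the SnapLogic API; the function names, the sample CSV data, and the lookup table that stands in for a geocoding service are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical lookup table standing in for a real geocoding service.
FAKE_GEOCODER = {
    "Brisbane": (-27.47, 153.03),
    "Sydney": (-33.87, 151.21),
}


def read(source):
    """Read stage: yield one dict per CSV row."""
    yield from csv.DictReader(source)


def geocode(records):
    """Geocode stage: attach coordinates looked up by city name."""
    for record in records:
        lat, lon = FAKE_GEOCODER.get(record["city"], (None, None))
        yield dict(record, lat=lat, lon=lon)


def sort_by(records, field):
    """Sort stage: needs every record before it can emit the first one."""
    return sorted(records, key=lambda record: record[field])


customers = io.StringIO("name,city\nZed Ltd,Sydney\nAcme,Brisbane\n")
processed = sort_by(geocode(read(customers)), "name")
for record in processed:
    print(record)
```

Read and Geocode handle one record at a time, while Sort has to buffer its whole input, which is worth keeping in mind when ordering the stages of a pipeline.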

Page 12: Data Integration with server side Mashups

A simple pipeline: Filtering leads

Page 13: Data Integration with server side Mashups

Linking fields in a pipeline

Page 14: Data Integration with server side Mashups

Reusing a pipeline as a resource

Page 15: Data Integration with server side Mashups

Reusing a pipeline as a resource

Page 16: Data Integration with server side Mashups

Reusing a pipeline as a resource

Page 17: Data Integration with server side Mashups

Adding new components

• For access logic
• For data transformations
• Independent of data format
• Currently written in Python

Page 18: Data Integration with server side Mashups

A simple processing component

    class IncreaseSalary(DataComponent):

        def init(self):
            '''Called when the component is started.'''
            self.increase = float(self.moduleProperties['percent_increase'])

        def processRecord(self, record):
            '''Called for every record.'''
            record.fields['salary'] *= (1 + self.increase/100)
            self.writeRecord(record)
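To try the component's logic outside the server, a small stand-in harness is enough. In the sketch below, `Record`, `DataComponent`, and the `run` method are hypothetical minimal replacements written for this example; the real base class comes from the SnapLogic framework and will differ.

```python
class Record:
    """Stand-in for the framework's record type: just a bag of fields."""

    def __init__(self, **fields):
        self.fields = dict(fields)


class DataComponent:
    """Hypothetical minimal base class; the real one is provided by SnapLogic."""

    def __init__(self, moduleProperties):
        self.moduleProperties = moduleProperties
        self.output = []

    def writeRecord(self, record):
        # The real framework would pass the record downstream; here we collect it.
        self.output.append(record)

    def run(self, records):
        # Mimic the lifecycle described on the slide: init once, then per record.
        self.init()
        for record in records:
            self.processRecord(record)
        return self.output


class IncreaseSalary(DataComponent):
    def init(self):
        self.increase = float(self.moduleProperties['percent_increase'])

    def processRecord(self, record):
        record.fields['salary'] *= (1 + self.increase / 100)
        self.writeRecord(record)


component = IncreaseSalary({'percent_increase': '10'})
result = component.run([Record(name='Smith', salary=50000.0)])
print(result[0].fields['salary'])
```

The property arrives as a string and is converted once in `init`, so `processRecord` only does the arithmetic.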

Page 19: Data Integration with server side Mashups

An Apache log file reader

    import apachelog

    class LogReader(DataComponent):

        def startReading(self):
            '''Called when component does not have input stream.'''
            logfile = open(self._filename, 'rbU')
            format = self.moduleProperties['log_format']

            if format == 'COMMON':
                p = apachelog.parser(apachelog.formats['common'])
            elif ...

            # Read all lines in the logfile
            for line in logfile:
                out_rec = Record(self.getSingleOutputView())
                raw_rec = p.parse(line)
                out_rec.fields['remote_host'] = raw_rec['%h']
                out_rec.fields['client_id'] = raw_rec['%l']
                out_rec.fields['user'] = raw_rec['%u']
                out_rec.fields['server_status'] = int(raw_rec['%>s'])
                out_rec.fields['bytes'] = int(raw_rec['%b'])
                ...

                self.writeRecord(out_rec)
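The field mapping in the reader can be tried without the framework or the apachelog module. The sketch below swaps in a standard-library regular expression for the Common Log Format; the pattern, the function name, and the sample line are illustrative and are not how the slide's component does its parsing.

```python
import re

# Common Log Format: host ident user [time] "request" status bytes
COMMON_LOG = re.compile(
    r'(?P<remote_host>\S+) (?P<client_id>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<server_status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_common_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMMON_LOG.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields['server_status'] = int(fields['server_status'])
    # Apache writes '-' when no body bytes were sent.
    fields['bytes'] = 0 if fields['bytes'] == '-' else int(fields['bytes'])
    return fields


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_common_log_line(sample))
```

The explicit check for '-' matters because a plain `int()` conversion of the bytes field raises a ValueError on responses without a body.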

Page 20: Data Integration with server side Mashups

Programmatic access

• GUI is nice, but still limiting
• SnapScript: An API library
• Python, PHP, more to come

Page 21: Data Integration with server side Mashups

Creating a resource

    # Create a new resource
    staff_res_def = Resource(component='SnapLogic.Components.CsvRead')
    staff_res_def.props.URI = '/SnapLogic/Resources/Staff'
    staff_res_def.props.description = 'Read from the employee file'
    staff_res_def.props.title = 'Staff'
    staff_res_def.props.delimiter = '$?{DELIMITER}'
    staff_res_def.props.filename = '$?{INPUTFILE}'
    staff_res_def.props.parameters = (
        ('INPUTFILE', Param.Required, ''),
        ('DELIMITER', Param.Optional, ',')
    )

    # Define the output view of the resource
    staff_res_def.props.outputview.output1 = (
        ('Last_Name', 'string', 'Employee last name'),
        ('First_Name', 'string', 'Employee first name'),
        ('Salary', 'number', 'Annual income')
    )
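The `$?{NAME}` placeholders tie resource properties to the declared parameters. The slide does not show how the server resolves them, so the sketch below simply assumes textual substitution with a fallback to the declared default; `resolve` and the shape of the `declared` dictionary are hypothetical.

```python
import re

# Matches placeholders of the form $?{NAME}.
PLACEHOLDER = re.compile(r'\$\?\{(\w+)\}')


def resolve(template, declared, supplied):
    """Replace each $?{NAME} using supplied values or declared defaults.

    declared maps a parameter name to a (required, default) pair.
    """
    def substitute(match):
        name = match.group(1)
        if name in supplied:
            return supplied[name]
        required, default = declared[name]
        if required:
            raise KeyError('missing required parameter: ' + name)
        return default

    return PLACEHOLDER.sub(substitute, template)


declared = {
    'INPUTFILE': (True, ''),
    'DELIMITER': (False, ','),
}
supplied = {'INPUTFILE': 'file://foo/staff.csv'}

print(resolve('$?{INPUTFILE}', declared, supplied))
print(resolve('$?{DELIMITER}', declared, supplied))
```

Under this assumption, a caller has to supply INPUTFILE, while DELIMITER falls back to a comma when omitted.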

Page 22: Data Integration with server side Mashups

Creating a pipeline

    # Create a new pipeline
    p = Pipeline()
    p.props.URI = '/SnapLogic/Pipelines/empl_salary_inc'
    p.props.title = 'Employee_Salary_Increase'

    # Select the resources in the pipeline
    # (increase_salary_res_def is assumed to be defined like staff_res_def)
    p.resources.Staff = staff_res_def.instance()
    p.resources.PayRaise = increase_salary_res_def.instance()

    # Link the resources in the pipeline
    link = (
        ('Last_Name', 'last'),
        ('First_Name', 'first'),
        ('Salary', 'salary')
    )
    # Use the name the resource was given above ('PayRaise')
    p.linkViews('Staff', 'output1', 'PayRaise', 'input1', link)
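The link tuple pairs each field of the upstream output view with the field name the downstream input view expects. As a rough illustration of that renaming, the following sketch applies such a link to a plain dictionary; `apply_link` and the sample record are made up and are not part of the SnapScript library.

```python
def apply_link(record, link):
    """Copy fields from an upstream record into the names a downstream view expects."""
    return {target: record[source] for source, target in link}


link = (
    ('Last_Name', 'last'),
    ('First_Name', 'first'),
    ('Salary', 'salary'),
)

staff_record = {'Last_Name': 'Smith', 'First_Name': 'Jo', 'Salary': 50000}
print(apply_link(staff_record, link))
```

Fields that are not named in the link are dropped, so the downstream component sees only what was wired to it.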

Page 23: Data Integration with server side Mashups

Pipeline parameters

    # Define the user-visible parameters of the pipeline
    p.props.parameters = (
        ('INCREASE', Param.Required, ''),
    )

    # Map values to the parameters of the pipeline's resources
    p.props.parammap = (
        (Param.Parameter, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
        (Param.Constant, 'file://foo/staff.csv', 'Staff', 'INPUTFILE')
    )

    # Confirm correctness and publish as a new resource
    p.check()
    p.saveToServer(connection)
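One way to read the parameter map: each entry either forwards a pipeline parameter to a resource parameter or pins a resource parameter to a constant. The sketch below works through that reading on plain data. The constants `PARAMETER` and `CONSTANT` and the function `resolve_parammap` are hypothetical stand-ins, and the server's actual resolution logic is not shown in the slides.

```python
# Hypothetical stand-ins for the Param.Parameter / Param.Constant markers.
PARAMETER = 'parameter'
CONSTANT = 'constant'


def resolve_parammap(parammap, pipeline_args):
    """Return {resource_name: {resource_param: value}} for one pipeline run."""
    resolved = {}
    for kind, source, resource, resource_param in parammap:
        if kind == PARAMETER:
            value = pipeline_args[source]   # look up the caller's argument
        elif kind == CONSTANT:
            value = source                  # the entry carries the value itself
        else:
            raise ValueError('unknown mapping kind: %r' % (kind,))
        resolved.setdefault(resource, {})[resource_param] = value
    return resolved


parammap = (
    (PARAMETER, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
    (CONSTANT, 'file://foo/staff.csv', 'Staff', 'INPUTFILE'),
)

print(resolve_parammap(parammap, {'INCREASE': '5'}))
```

With this mapping, a caller of the pipeline sees only INCREASE; the input file stays an internal detail of the published resource.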

Page 24: Data Integration with server side Mashups

The end

Any questions?
