data integration with server side mashups

Post on 19-May-2015

DESCRIPTION

The open source SnapLogic data integration framework. Overview, examples, screenshots.

TRANSCRIPT

Data Integration with Server Side Mashups

Juergen Brendel, Principal Software Engineer

OSDC 2007, Brisbane

Slide 2

Agenda

• The SnapLogic project
• Client-side mashups
• Problems and solutions
• Data integration with SnapLogic

Slide 3

The SnapLogic project

• Founded 2005, data integration background
• Vision:
  – Reusable data integration resources
  – REST
  – Web-based GUI
  – Programmatic interface
  – Open Source
• Python... Why not?
• www.snaplogic.com

Slide 4

What's a mashup?

• A 'Web 2.0 kind of thing'
• Combine, aggregate, visualise
  – Multiple sources
  – Multiple dimensions
• Typically on the client side
  – Browser
  – Ajax

Slide 5

Self-made mashups

• Hand coded
• Mashup editors
  – GUI mashup-logic editor
  – Wiki-style
  – Hosted

Slide 6

Benefits for the enterprise?

Yeah, right...

Enable knowledge workers!!!

Situational applications!

Avoid the IT bottleneck!!

Slide 7

Problems with client-side mashups

• Skill
• Internal data often not web-friendly
• Maintenance
• Security
• Performance

Slide 8

Solution: Server-side mashups

• Flexible access
• Security
• Performance

Slide 9

SnapLogic data integration philosophy

• Clearly defined REST resources
• Data reuse and integration
• Pipelines
• Framework for resource-specific scripting
• Open source and community

Slide 10

Example: Resources

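[Diagram: a client sends an HTTP request for HTTP://server1.example.com/customer_list to the SnapLogic Server and receives a JSON response. Inside the server, a component and its resource definition sit between the HTTP interface and the data sources: databases, files, applications, Atom / RSS.]

Resource Definition:

• Resource Name
• HTTP://server1.example.com/customer_list
• SQL Query or filename
• Credentials
• Parameters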
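From the client's side, a resource like this one is simply a URL that returns data. The sketch below builds a request URL with parameters and decodes a JSON reply. The parameter name `region`, the sample payload and its field names are assumptions made for illustration; the slide does not show the actual response format.

```python
import json
from urllib.parse import urlencode

RESOURCE_URL = "http://server1.example.com/customer_list"  # URL from the slide


def build_request_url(base_url, params):
    """Append query parameters to a resource URL."""
    if not params:
        return base_url
    return base_url + "?" + urlencode(sorted(params.items()))


def parse_records(body):
    """Decode a JSON response body into a list of record dicts."""
    records = json.loads(body)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


# Hypothetical response body. A real client would fetch it over HTTP,
# for example with urllib.request.urlopen(url).read().
SAMPLE_BODY = (
    '[{"name": "Acme", "city": "Brisbane"},'
    ' {"name": "Initech", "city": "Sydney"}]'
)

url = build_request_url(RESOURCE_URL, {"region": "AU"})
customers = parse_records(SAMPLE_BODY)
```

Because the client only sees a URL and a JSON document, the database query, file name and credentials stay on the server.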

Slide 11

Example: Pipelines

[Diagram: a client sends an HTTP request for HTTP://server1.example.com/processed_customer_list to the SnapLogic Server and receives a JSON response. Inside the server, three components, each with its own resource definition, form a pipeline: Read, Geocode, Sort. Data sources: databases, files, applications, Atom / RSS.]
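The Read, Geocode, Sort chain can be pictured as three small steps that each consume and produce a stream of records. The following is a rough sketch of that idea using plain Python generators, not SnapLogic code; the CSV sample, the field names and the `CITY_COORDS` lookup table (with approximate coordinates) are all assumptions for illustration.

```python
import csv
import io

# Hypothetical lookup table standing in for a real geocoding service.
CITY_COORDS = {
    "Brisbane": (-27.47, 153.03),
    "Sydney": (-33.87, 151.21),
}


def read(csv_text):
    """Read step: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(csv_text))


def geocode(records):
    """Geocode step: add lat/lon fields looked up by city."""
    for rec in records:
        lat, lon = CITY_COORDS.get(rec["city"], (None, None))
        yield {**rec, "lat": lat, "lon": lon}


def sort_by(records, field):
    """Sort step: consumes the whole stream before emitting anything."""
    return sorted(records, key=lambda rec: rec[field])


SAMPLE = "name,city\nInitech,Sydney\nAcme,Brisbane\n"
result = sort_by(geocode(read(SAMPLE)), "name")
```

Each step only knows about records, not about where they came from, which is what lets the whole chain be published again under a single URL.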

Slide 12

A simple pipeline: Filtering leads

Slide 13

Linking fields in a pipeline

Slide 14

Reusing a pipeline as a resource

Slide 15

Reusing a pipeline as a resource

Slide 16

Reusing a pipeline as a resource

Slide 17

Adding new components

• For access logic
• For data transformations
• Independent of data format
• Currently written in Python

Slide 18

A simple processing component

    class IncreaseSalary(DataComponent):

        def init(self):
            '''Called when the component is started.'''
            self.increase = float(self.moduleProperties['percent_increase'])

        def processRecord(self, record):
            '''Called for every record.'''
            # Raise the salary by the configured percentage, then pass the record on
            record.fields['salary'] *= (1 + self.increase/100)
            self.writeRecord(record)
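To try the component above without a SnapLogic server, one option is a tiny stand-in for the framework. In the sketch below, `DataComponent` and `Record` are hypothetical stubs written only for this example. They imitate the three things the slide relies on (`moduleProperties`, `init()` and `writeRecord()`) and are not the real SnapLogic classes.

```python
class Record:
    """Stand-in record: a thin wrapper around a dict of fields."""

    def __init__(self, fields):
        self.fields = dict(fields)


class DataComponent:
    """Hypothetical stub of the framework base class (not the real one)."""

    def __init__(self, module_properties):
        self.moduleProperties = module_properties
        self.output = []  # collects every record passed to writeRecord()
        self.init()

    def init(self):
        """Overridden by components that need start-up work."""

    def writeRecord(self, record):
        self.output.append(record)


class IncreaseSalary(DataComponent):
    """The component from the slide, unchanged in behaviour."""

    def init(self):
        self.increase = float(self.moduleProperties['percent_increase'])

    def processRecord(self, record):
        record.fields['salary'] *= (1 + self.increase / 100)
        self.writeRecord(record)


component = IncreaseSalary({'percent_increase': '10'})
for salary in (50000, 80000):
    component.processRecord(Record({'salary': salary}))

# Round to cents to hide floating-point noise from the multiplication.
salaries = [round(rec.fields['salary'], 2) for rec in component.output]
```

The property arrives as a string here, which is why the component converts it with `float()` before using it.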

Slide 19

An Apache log file reader

    import apachelog

    class LogReader(DataComponent):

        def startReading(self):
            '''Called when component does not have input stream.'''
            logfile = open(self._filename, 'rbU')
            format = self.moduleProperties['log_format']

            if format == 'COMMON':
                p = apachelog.parser(apachelog.formats['common'])
            # elif ... (further formats are elided on the slide)

            # Read all lines in the logfile
            for line in logfile:
                out_rec = Record(self.getSingleOutputView())
                raw_rec = p.parse(line)
                out_rec.fields['remote_host'] = raw_rec['%h']
                out_rec.fields['client_id'] = raw_rec['%l']
                out_rec.fields['user'] = raw_rec['%u']
                out_rec.fields['server_status'] = int(raw_rec['%>s'])
                out_rec.fields['bytes'] = int(raw_rec['%b'])
                # ... (further fields are elided on the slide)

                self.writeRecord(out_rec)

Slide 20

Programmatic access

• GUI is nice, but still limiting
• SnapScript: An API library
• Python, PHP, more to come

Slide 21

Creating a resource

    # Create a new resource
    staff_res_def = Resource(component='SnapLogic.Components.CsvRead')
    staff_res_def.props.URI = '/SnapLogic/Resources/Staff'
    staff_res_def.props.description = 'Read from the employee file'
    staff_res_def.props.title = 'Staff'
    staff_res_def.props.delimiter = '$?{DELIMITER}'
    staff_res_def.props.filename = '$?{INPUTFILE}'
    staff_res_def.props.parameters = (
        ('INPUTFILE', Param.Required, ''),
        ('DELIMITER', Param.Optional, ',')
    )

    # Define the output view of the resource
    staff_res_def.props.outputview.output1 = (
        ('Last_Name', 'string', 'Employee last name'),
        ('First_Name', 'string', 'Employee first name'),
        ('Salary', 'number', 'Annual income')
    )
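The `$?{DELIMITER}` and `$?{INPUTFILE}` placeholders in the properties refer to the declared parameters. How SnapLogic expands them is not shown on the slide, so the following is only a guess at the behaviour: a supplied argument wins, an optional parameter falls back to its default, and a missing required parameter is an error. The `resolve` function and the `REQUIRED`/`OPTIONAL` constants are hypothetical names for this sketch.

```python
import re

# Stand-ins for Param.Required / Param.Optional from the slide.
REQUIRED, OPTIONAL = "required", "optional"

PLACEHOLDER = re.compile(r"\$\?\{(\w+)\}")


def resolve(value, declared, supplied):
    """Replace each $?{NAME} in value with a supplied argument or a default."""
    spec = {name: (kind, default) for name, kind, default in declared}

    def substitute(match):
        name = match.group(1)
        if name not in spec:
            raise KeyError("undeclared parameter: " + name)
        kind, default = spec[name]
        if name in supplied:
            return supplied[name]
        if kind == REQUIRED:
            raise ValueError("missing required parameter: " + name)
        return default

    return PLACEHOLDER.sub(substitute, value)


declared = (
    ("INPUTFILE", REQUIRED, ""),
    ("DELIMITER", OPTIONAL, ","),
)
args = {"INPUTFILE": "file://foo/staff.csv"}

filename = resolve("$?{INPUTFILE}", declared, args)
delimiter = resolve("$?{DELIMITER}", declared, args)
```

With these arguments `filename` takes the supplied value and `delimiter` falls back to the declared default.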

Slide 22

Creating a pipeline

    # Create a new pipeline
    p = Pipeline()
    p.props.URI = '/SnapLogic/Pipelines/empl_salary_inc'
    p.props.title = 'Employee_Salary_Increase'

    # Select the resources in the pipeline
    p.resources.Staff = staff_res_def.instance()
    p.resources.PayRaise = increase_salary_res_def.instance()

    # Link the resources in the pipeline:
    # each pair maps a Staff output field to a PayRaise input field
    link = (
        ('Last_Name', 'last'),
        ('First_Name', 'first'),
        ('Salary', 'salary')
    )
    p.linkViews('Staff', 'output1', 'PayRaise', 'input1', link)
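The `link` tuple pairs each field of the upstream output view with a field of the downstream input view. One way to picture its effect on a single record is a simple rename, sketched below. `apply_link` is a hypothetical helper, not part of SnapScript, and dropping fields that are not listed in the link is an assumption.

```python
def apply_link(record, link):
    """Rename the fields of an upstream record using (source, target) pairs."""
    return {target: record[source] for source, target in link}


link = (
    ('Last_Name', 'last'),
    ('First_Name', 'first'),
    ('Salary', 'salary'),
)

# Hypothetical row as it might leave the Staff resource's output1 view.
staff_row = {'Last_Name': 'Smith', 'First_Name': 'Jo', 'Salary': 50000}
linked = apply_link(staff_row, link)
```

After the rename, the downstream component can read `salary` without knowing that the CSV file called the column `Salary`.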

Slide 23

Pipeline parameters

    # Define the user-visible parameters of the pipeline
    p.props.parameters = (
        ('INCREASE', Param.Required, ''),
    )

    # Map values to the parameters of the pipeline's resources
    p.props.parammap = (
        (Param.Parameter, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
        (Param.Constant, 'file://foo/staff.csv', 'Staff', 'INPUTFILE')
    )

    # Confirm correctness and publish as a new resource
    p.check()
    p.saveToServer(connection)
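Each `parammap` entry names a kind, a source, a target resource and a target parameter. A plausible reading is that `Param.Parameter` forwards a pipeline argument while `Param.Constant` pins a fixed value. The sketch below works through that reading. `resolve_parammap` and the two constants are hypothetical stand-ins, and the real resolution logic may differ.

```python
# Stand-ins for Param.Parameter / Param.Constant from the slide.
PARAMETER, CONSTANT = "parameter", "constant"


def resolve_parammap(parammap, pipeline_args):
    """Return {resource_name: {resource_param: value}} for one pipeline call."""
    per_resource = {}
    for kind, source, resource, target in parammap:
        if kind == PARAMETER:
            value = pipeline_args[source]  # forward the caller's argument
        elif kind == CONSTANT:
            value = source                 # the entry itself carries the value
        else:
            raise ValueError("unknown mapping kind: %r" % (kind,))
        per_resource.setdefault(resource, {})[target] = value
    return per_resource


parammap = (
    (PARAMETER, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
    (CONSTANT, 'file://foo/staff.csv', 'Staff', 'INPUTFILE'),
)

resolved = resolve_parammap(parammap, {'INCREASE': '5'})
```

Under this reading a caller of the pipeline supplies only `INCREASE`, and the input file stays hidden behind the pipeline's URL.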

Slide 24

The end

Any questions?

jbrendel@snaplogic.org
