Data Integration with Server Side Mashups
DESCRIPTION
The open source SnapLogic data integration framework. Overview, examples, screenshots.
TRANSCRIPT
Data Integration with Server Side Mashups
Juergen Brendel
Principal Software Engineer
OSDC 2007, Brisbane
Slide 2
Agenda
• The SnapLogic project
• Client-side mashups
• Problems and solutions
• Data integration with SnapLogic
Slide 3
The SnapLogic project
• Founded 2005, data integration background
• Vision:
  – Reusable data integration resources
  – REST
  – Web-based GUI
  – Programmatic interface
  – Open Source
• Python... Why not?
• www.snaplogic.com
Slide 4
What's a mashup?
• A 'Web 2.0 kind of thing'
• Combine, aggregate, visualise
  – Multiple sources
  – Multiple dimensions
• Typically on the client side
  – Browser
  – Ajax
Slide 5
Self-made mashups
• Hand coded
• Mashup editors
  – GUI mashup-logic editor
  – Wiki-style
  – Hosted
Slide 6
Benefits for the enterprise?
Yeah, right...
• Enable knowledge workers !!!
• Situational applications !
• Avoid the IT bottleneck !!
Slide 7
Problems with client-side mashups
• Skill
• Internal data often not web-friendly
• Maintenance
• Security
• Performance
Slide 8
Solution: Server-side mashups
• Flexible access
• Security
• Performance
Slide 9
SnapLogic data integration philosophy
• Clearly defined REST resources
• Data reuse and integration
• Pipelines
• Framework for resource-specific scripting
• Open source and community
Slide 10
Example: Resources
[Diagram: a client exchanges HTTP requests and responses (JSON) with the SnapLogic Server at HTTP://server1.example.com/customer_list. Inside the server, a component paired with a resource definition reads from databases, files, applications, or Atom / RSS feeds.]
Resource definition:
• Resource name
• HTTP://server1.example.com/customer_list
• SQL query or filename
• Credentials
• Parameters
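Because the resource on this slide is addressed by a plain HTTP URL and answers with JSON, any HTTP client can consume it. The following is a minimal client-side sketch, not SnapLogic's own client library. The URL comes from the slide; the `region` parameter in the usage note and the assumption that the response body is a JSON list of records are illustrative only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Resource URL taken from the slide; everything else here is illustrative.
RESOURCE_URL = "http://server1.example.com/customer_list"


def build_resource_url(base_url, params=None):
    """Append resource parameters to the URL as a query string."""
    if not params:
        return base_url
    return base_url + "?" + urlencode(sorted(params.items()))


def parse_records(body):
    """Decode a response body, assumed here to be a JSON list of records."""
    records = json.loads(body)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of records")
    return records


def fetch_records(base_url, params=None, timeout=10):
    """GET the resource and return its records (needs a reachable server)."""
    url = build_resource_url(base_url, params)
    with urlopen(url, timeout=timeout) as response:
        return parse_records(response.read().decode("utf-8"))
```

A call such as `fetch_records(RESOURCE_URL, {"region": "QLD"})` would request the resource with a hypothetical `region` parameter; the client never needs to know whether a database, a file, or a feed sits behind the URL.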
Slide 11
Example: Pipelines
[Diagram: a client exchanges HTTP requests and responses (JSON) with the SnapLogic Server at HTTP://server1.example.com/processed_customer_list. Inside the server, three components, each with its own resource definition, form a pipeline: Read → Geocode → Sort. The pipeline draws on databases, files, applications, or Atom / RSS feeds.]
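The Read → Geocode → Sort chain on this slide can be pictured as a sequence of stages that each consume and emit records. The sketch below uses plain Python generators to show that idea; it is not how the SnapLogic server implements pipelines. The record layout, the `city` and `name` fields, and the lookup table that stands in for a real geocoder are all assumptions.

```python
def read(rows):
    """Source stage: yield one record (a dict) per input row."""
    for row in rows:
        yield dict(row)  # copy, so later stages do not modify the input


def geocode(records, lookup):
    """Transform stage: attach coordinates from a lookup table.

    The table is a stand-in for a real geocoding service.
    """
    for record in records:
        record["lat_lon"] = lookup.get(record["city"])
        yield record


def sort_by(records, field):
    """Final stage: sorting needs the whole stream, so it buffers every record."""
    return sorted(records, key=lambda record: record[field])


def processed_customer_list(rows, lookup):
    """Chain the three stages in the order shown on the slide."""
    return sort_by(geocode(read(rows), lookup), "name")
```

Each stage only knows about records, not about where they came from, which is what lets a whole pipeline be published under a single URL and reused like any other resource.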
Slide 12
A simple pipeline: Filtering leads
Slide 13
Linking fields in a pipeline
Slide 14
Reusing a pipeline as a resource
Slide 15
Reusing a pipeline as a resource
Slide 16
Reusing a pipeline as a resource
Slide 17
Adding new components
• For access logic
• For data transformations
• Independent of data format
• Currently written in Python
Slide 18
A simple processing component
class IncreaseSalary(DataComponent):

    def init(self):
        '''Called when the component is started.'''
        self.increase = float(self.moduleProperties['percent_increase'])

    def processRecord(self, record):
        '''Called for every record.'''
        record.fields['salary'] *= (1 + self.increase/100)
        self.writeRecord(record)
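To try the component's logic outside the server, one option is to run it against small stand-in classes. In the sketch below, `DataComponent` and `Record` are hypothetical stubs written for this example, not the framework's real classes; they provide only the members the slide's code touches (`moduleProperties`, `init`, `writeRecord`, `fields`). The `run` helper is likewise an assumption.

```python
class Record:
    """Hypothetical stand-in: a record is just a bag of named fields."""

    def __init__(self, fields):
        self.fields = dict(fields)


class DataComponent:
    """Hypothetical stand-in for the framework base class."""

    def __init__(self, module_properties):
        self.moduleProperties = module_properties
        self.written = []  # collects records instead of streaming them on
        self.init()

    def init(self):
        pass

    def writeRecord(self, record):
        self.written.append(record)


class IncreaseSalary(DataComponent):
    """Same logic as on the slide, repeated so this block runs on its own."""

    def init(self):
        self.increase = float(self.moduleProperties['percent_increase'])

    def processRecord(self, record):
        record.fields['salary'] *= (1 + self.increase / 100)
        self.writeRecord(record)


def run(component, rows):
    """Feed each row through the component and return the output fields."""
    for row in rows:
        component.processRecord(Record(row))
    return [record.fields for record in component.written]
```

For example, `run(IncreaseSalary({'percent_increase': '10'}), [{'salary': 50000.0}])` returns one record whose salary is raised by ten percent (subject to floating-point rounding).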
Slide 19
An Apache log file reader
import apachelog

class LogReader(DataComponent):

    def startReading(self):
        '''Called when component does not have input stream.'''
        logfile = open(self._filename, 'rbU')
        format = self.moduleProperties['log_format']

        # Pick the parser that matches the configured log format
        if format == 'COMMON':
            p = apachelog.parser(apachelog.formats['common'])
        elif ...

        # Read all lines in the logfile
        for line in logfile:
            out_rec = Record(self.getSingleOutputView())
            raw_rec = p.parse(line)
            out_rec.fields['remote_host'] = raw_rec['%h']
            out_rec.fields['client_id'] = raw_rec['%l']
            out_rec.fields['user'] = raw_rec['%u']
            out_rec.fields['server_status'] = int(raw_rec['%>s'])
            out_rec.fields['bytes'] = int(raw_rec['%b'])
            ...

            self.writeRecord(out_rec)
Slide 20
Programmatic access
• GUI is nice, but still limiting
• SnapScript: An API library
• Python, PHP, more to come
Slide 21
Creating a resource
# Create a new resource
staff_res_def = Resource(component='SnapLogic.Components.CsvRead')
staff_res_def.props.URI = '/SnapLogic/Resources/Staff'
staff_res_def.props.description = 'Read from the employee file'
staff_res_def.props.title = 'Staff'
staff_res_def.props.delimiter = '$?{DELIMITER}'
staff_res_def.props.filename = '$?{INPUTFILE}'
staff_res_def.props.parameters = (
    ('INPUTFILE', Param.Required, ''),
    ('DELIMITER', Param.Optional, ',')
)

# Define the output view of the resource
staff_res_def.props.outputview.output1 = (
    ('Last_Name', 'string', 'Employee last name'),
    ('First_Name', 'string', 'Employee first name'),
    ('Salary', 'number', 'Annual income')
)
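The `$?{INPUTFILE}` and `$?{DELIMITER}` placeholders in the resource definition suggest that property values are filled in from the declared parameters when the resource is used. The slides do not spell out the exact rules, so the sketch below assumes the simplest reading: a required parameter must be supplied, and an optional one falls back to its declared default. The names `REQUIRED`, `OPTIONAL`, `resolve_parameters`, and `substitute` are hypothetical and are not part of SnapScript.

```python
import re

# Hypothetical stand-ins for Param.Required and Param.Optional.
REQUIRED, OPTIONAL = "required", "optional"

# Matches placeholders of the form $?{NAME}.
_PLACEHOLDER = re.compile(r"\$\?\{(\w+)\}")


def resolve_parameters(declared, supplied):
    """Combine declared (name, kind, default) triples with supplied values."""
    values = {}
    for name, kind, default in declared:
        if name in supplied:
            values[name] = supplied[name]
        elif kind == OPTIONAL:
            values[name] = default
        else:
            raise ValueError("missing required parameter: " + name)
    return values


def substitute(template, values):
    """Replace every $?{NAME} in the template with its resolved value."""
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

With the declarations from the slide, supplying only `INPUTFILE` would leave the delimiter at its default of `','`, while omitting `INPUTFILE` would raise an error under this reading.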
Slide 22
Creating a pipeline
# Create a new pipeline
p = Pipeline()
p.props.URI = '/SnapLogic/Pipelines/empl_salary_inc'
p.props.title = 'Employee_Salary_Increase'

# Select the resources in the pipeline
# (increase_salary_res_def is assumed to be defined like staff_res_def)
p.resources.Staff = staff_res_def.instance()
p.resources.PayRaise = increase_salary_res_def.instance()

# Link the resources in the pipeline
link = (
    ('Last_Name', 'last'),
    ('First_Name', 'first'),
    ('Salary', 'salary')
)
# Use the name the resource was given above ('PayRaise')
p.linkViews('Staff', 'output1', 'PayRaise', 'input1', link)
Slide 23
Pipeline parameters
# Define the user-visible parameters of the pipeline
p.props.parameters = (
    ('INCREASE', Param.Required, ''),
)

# Map values to the parameters of the pipeline's resources
p.props.parammap = (
    (Param.Parameter, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
    (Param.Constant, 'file://foo/staff.csv', 'Staff', 'INPUTFILE')
)

# Confirm correctness and publish as a new resource
p.check()
p.saveToServer(connection)
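The parameter map reads as a routing table: each row either forwards a pipeline parameter or pins a constant onto one parameter of one resource. The sketch below shows that reading in plain Python. It is an interpretation of the slide, not the framework's implementation, and `PARAMETER`, `CONSTANT`, and `map_parameters` are hypothetical names.

```python
# Hypothetical stand-ins for Param.Parameter and Param.Constant.
PARAMETER, CONSTANT = "parameter", "constant"


def map_parameters(parammap, pipeline_args):
    """Work out which value each resource parameter receives.

    Each row is (kind, source, resource, target): for PARAMETER rows the
    source names a pipeline parameter, for CONSTANT rows it is the value.
    """
    per_resource = {}
    for kind, source, resource, target in parammap:
        value = pipeline_args[source] if kind == PARAMETER else source
        per_resource.setdefault(resource, {})[target] = value
    return per_resource
```

Called with the two rows from the slide and `{'INCREASE': '5'}`, this would hand `PERC_INCREASE = '5'` to `PayRaise` and the fixed file URL to `Staff`, so callers of the pipeline only ever see the single `INCREASE` parameter.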
Slide 24
The end
Any questions?