first indico workshop conversion server thomas baron 29-27 may 2013 cern

17
First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

Upload: edwina-doyle

Post on 24-Dec-2015

217 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

First Indico Workshop

Conversion ServerThomas Baron

29-27 May 2013 CERN

Page 2: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

Service DescriptionArchitectureConversion AlternativesFuture Directions

Page 3: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

Service description

Page 4: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

GoalProvide a PDF version of all textual documents uploaded to Indico

Long-term preservationMulti-platform reading

Converted formats: .ppt, .pptx, .doc, .docx, .sxi, .odp

Page 5: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

InterfaceOn user request only

Page 6: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

Service parametersAsynchronousAbout 30 seconds in average

At CERN: An average of 165 conversions per day

16%

74%

2%6%2%

number of con-versions .ppt

.pptx

.doc

Page 7: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

architecture

Page 8: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

General overview

Conversion serverIndico

PDFs

files to convert

HTT

P AP

I

Page 9: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

Integration to indicoCurrently all entangled to Indico’s coreConfiguration in indico.conf

conversion server URL: FileConverter[‘conversion_server’]

callback URL: FileConverter[‘response_url’]

Conversion handled by the Makac.export.fileConverter class

convert function : sends the filestoreConvertedFile function: gets the converted file back

Conversion serverIndico

HTT

P AP

I

Makac.export.fileConverter

convert

storeConvertedFile

Indico.conf

PDFs

files to convert

Page 10: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

Conversion server sideA dedicated server running non-indico code and softwareWeb server: IIS (previously Apache)Listener script: getSegFile.py ; python; receives the file, saves it locally, creates the conversion task (a text file)Conversion Daemon: Engine.py ; python script in scheduled tasks; parses the conversion task files, and for each of them launch the conversion, wait for its completion and send the file back to the callback URL (to Indico)Conversion software: performs the conversion Conversion

serverIndico

HTT

P AP

I

Makac.export.fileConverter

convert

storeConvertedFile

Configuration.py

Conv

ersi

on s

oftw

are

getSegFile.py

Engine.py

www

task master

PDFs

files to convert

Page 11: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

Conversion alternatives

Page 12: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

Fully home madeWas the case at CERN until 2009

Using direct OLE-automation of Microsoft Office applicationspython scriptsExample

Page 13: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

Using commercial productsExample at CERN: Neevia Document Converter ProOLE automation Pros:

More reliableBetter error

managementRegular updatesMore extensible

More formatsHot foldersCan be used for other

services

Features: • supports 300 file types• Com, hot folder, email

interfaces• Watermark, stamping etc.• Convert to PDF,

PostScript, TIFF (including Class F), BMP, PNG, PCX, JPEG

Page 14: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

NEEVIA doc Converter ProSimplified automation code in python

NDocConverter = win32com.client.Dispatch("docConverter.docConverterClass") NDocConverter.DocumentOutputFormat = "PDF" NDocConverter.DocumentOutputFolder = output_dir NDocConverter.JobOption = "printer" rv = NDocConverter.SubmitFile( file_path , "") rv = NDocConverter.CheckStatus( file_path , "")

Page 15: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

Future directions

Page 16: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

What should be comingUnfortunately the feature is not directly usable by external instances

Rewrite the conversion server-side codeVery old!Improve the conversion server monitoring at

CERNNot planned yet

Replace current implementation with a plugin on the Indico side

v1.5 (2015?)

Page 17: First Indico Workshop Conversion Server Thomas Baron 29-27 May 2013 CERN

Thomas [email protected]