case study: rfa migration

Post on 02-Jan-2016


Case Study: RFA Migration

How I migrated 208,566 news stories from Bricolage to Plone.

Alex Clark • http://aclark.net

March 12, 2008 • Plone Symposium East

Who Am I?

• Plone Consultant
  – Non-profits in DC

• Foundation Member

• Zope/Python Users Group of DC (ZPUGDC) Events Organizer

• “UNIX guy”, sysadmin, Bachelor of Science in Computer Science, not really a programmer.

What is this?

• An example of a “successful” migration, YMMV (your mileage may vary).

• Inspiration-a-palooza! If I can do it, anyone can.

• An opportunity to learn from my mistakes.
  – Analysis at the end.

• XXX: News ‘story’, not ‘news item’ ;-)
  – i.e. the rfasite product’s ‘story’ content type, not Plone’s default ‘news item’ content type.

• A medium-to-large migration.

What this is not

• Plone vs. Bricolage.

• How to: <your migration>.

• Best practice (OK, maybe some best practice.)

Radio Free Asia

• RFA is a private, nonprofit corporation that broadcasts news and information in nine native Asian languages to listeners who do not have access to full and free news media. The purpose of RFA is to provide a forum for a variety of opinions and voices from within these Asian countries.

• Our Web site adds a global dimension to this objective. If you have comments, questions or suggestions, please contact us…

Before

After

• Not yet! ;-)

Pre-migration decisions

i.e., how do we get the data out of the old site?

• Relational database “content”?
  – No one understood the Bricolage data model.

• HTTP?
  – I didn’t want to crawl the website.

• “Baked” content on the filesystem.
  – Provided the clearest migration path.
  – find /var/www/rfa -name index.html
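The `find` command above has a direct Python counterpart. Below is a minimal sketch, written for Python 3 (the original migration ran under Python 2 via zopectl); `find_index_files` is a hypothetical helper name, not code from the talk:

```python
import os
from typing import Iterator


def find_index_files(root: str, name: str = "index.html") -> Iterator[str]:
    """Yield every file called `name` under `root`.

    Roughly equivalent to: find <root> -name index.html
    (hypothetical helper, not the talk's own code).
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable, sorted order
        if name in filenames:
            yield os.path.join(dirpath, name)
```

Usage: `for path in find_index_files("/var/www/rfa"): ...`. Because it is a generator, paths are produced one at a time instead of being collected into one large list first.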

Zopectl run, then what?

• Need a way to structure the migration of 10 different language services.
  – e.g. zopectl run mandarin.py

• Need to ‘walk’ the file system.
  – i.e. how do we find the stories?

• Need a way to parse the HTML on the file system.
  – i.e. we can’t shove the entire index.html into the body via setText().

• Need to do Unicode conversions.
  – e.g. from Big5, euc_kr, gb2312, and ASCII to Unicode.
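Each language service needs its raw bytes decoded with the right codec. A minimal Python 3 sketch follows (in Python 2 the equivalent call is `unicode(raw, encoding)`). The codec names are standard Python encodings, but the service-to-encoding mapping is a hypothetical example; the slides list the encodings without saying which service used which:

```python
# HYPOTHETICAL mapping from language service to the codec of its baked pages;
# the talk names the encodings but not which service used which.
SERVICE_ENCODINGS = {
    "english": "ascii",
    "mandarin": "gb2312",
    "cantonese": "big5",
    "korean": "euc_kr",
}


def to_unicode(raw: bytes, service: str) -> str:
    """Decode the raw bytes of a baked page into a Python str.

    Decoding is strict: bytes that are invalid in the service's encoding
    raise UnicodeDecodeError instead of being silently mangled.
    """
    return raw.decode(SERVICE_ENCODINGS[service])
```

Keeping the decode strict means a mislabelled page fails loudly, which ties in with the "silent failures" discussed later.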

Zopectl run, then what?

• Use Stepper, a framework for performing asynchronous tasks, http://www.simplistix.co.uk/software/zope/stepper

• Use os.walk, http://docs.python.org/lib/os-file-dir.html (in particular cb2_examples/cb2_2_16_sol_1.py)

• Use HTML parsing, http://docs.python.org/lib/module-sgmllib.html (in particular diveintopython-5.4/py/BaseHTMLProcessor.py)

• Use Unicode conversions, http://docs.python.org/lib/standard-encodings.html
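`sgmllib` (and the Dive Into Python `BaseHTMLProcessor` built on it) no longer exists in Python 3, so the sketch below swaps in the standard library's `html.parser.HTMLParser` to show the same idea: pull the title and the story body out of a baked page instead of shoving the whole file into `setText()`. The body container `<div id="storytext">` is an assumption; the real Bricolage templates are not shown in the talk:

```python
import html
from html.parser import HTMLParser


class StoryExtractor(HTMLParser):
    """Collect the <title> text and the inner HTML of one body <div>.

    ASSUMPTION: the story body sits in <div id="storytext">; the real
    Bricolage templates are not shown in the talk.
    """

    def __init__(self, body_id: str = "storytext") -> None:
        super().__init__(convert_charrefs=True)
        self.body_id = body_id
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._depth = 0  # > 0 while inside the body <div>; counts nested divs

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif self._depth:
            if tag == "div":
                self._depth += 1
            self.body_parts.append(self.get_starttag_text())
        elif tag == "div" and dict(attrs).get("id") == self.body_id:
            self._depth = 1  # entering the body; its own tag is not copied

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> inside the body are copied verbatim.
        if self._depth:
            self.body_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif self._depth:
            if tag == "div":
                self._depth -= 1
                if self._depth == 0:
                    return  # closing tag of the body <div> itself
            self.body_parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._depth:
            # The parser decoded entities; re-escape them for HTML output.
            self.body_parts.append(html.escape(data, quote=False))


def extract_story(page: str, body_id: str = "storytext"):
    """Return (title, body_html) from one baked index.html."""
    parser = StoryExtractor(body_id)
    parser.feed(page)
    parser.close()
    return "".join(parser.title_parts).strip(), "".join(parser.body_parts).strip()
```

The title comes back as plain text for `setTitle()`, while the body stays HTML suitable for `setText()`.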

Stepper Basics

• Allows you to break your migration into pieces.

• Commits transactions for you.

• zopectl run run.py site-object steps-or-chains
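Stepper's own API is not shown on the slides, so the sketch below does not use it. It is a hypothetical helper that illustrates the underlying idea: process items in pieces and commit after each batch, rather than holding 208,566 stories in one transaction. Under `zopectl run`, `commit` could be `transaction.commit` from the `transaction` package; that wiring, and the batch size of 500, are assumptions:

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_in_batches(items: Iterable[T], step: Callable[[T], None],
                   commit: Callable[[], None], batch_size: int = 500) -> int:
    """Apply `step` to each item, calling `commit` after every `batch_size`
    items and once more for any final partial batch. Returns the item count.
    (Hypothetical helper illustrating batched commits; not Stepper's API.)"""
    count = 0
    for item in items:
        step(item)
        count += 1
        if count % batch_size == 0:
            commit()
    if count % batch_size:
        commit()  # flush the last, partial batch
    return count
```

Smaller batches mean less work is lost when a run dies partway through, at the cost of more commits.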

Stepper config.py

Basic Results

• The ‘create’ step creates the site structure based on a list of categories defined in categories.py

• The ‘migrate’ step walks the file system looking for index.html files, then
  – Extracts the contents.
  – Calls invokeFactory in the context of the category to create the new object.
  – Calls mutators to insert content into fields,
    • e.g. obj.setTitle(title_extracted)
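The ‘create’ and ‘migrate’ steps can be sketched as below. To keep the sketch runnable without Zope, `FakeFolder` is a hypothetical stand-in that mimics only the calls used here (`invokeFactory`, `objectIds`, item access, `setTitle`, `setText`). The contents of `categories.py` and the portal type name `"Story"` are also assumptions:

```python
class FakeFolder:
    """Tiny stand-in for a Plone folder so this sketch runs without Zope.

    It mimics only the calls used below: invokeFactory, objectIds,
    item access, and the setTitle/setText mutators.
    """

    def __init__(self, id, portal_type="Folder"):
        self.id = id
        self.portal_type = portal_type
        self.title = ""
        self.text = ""
        self._children = {}

    def invokeFactory(self, type_name, id):
        self._children[id] = FakeFolder(id, type_name)
        return id

    def objectIds(self):
        return list(self._children)

    def __getitem__(self, id):
        return self._children[id]

    def setTitle(self, title):
        self.title = title

    def setText(self, text):
        self.text = text


# HYPOTHETICAL stand-in for categories.py: slash-separated folder paths.
CATEGORIES = ["english/news", "english/news/politics", "mandarin/news"]


def create_step(site, categories):
    """'create' step: make one folder per path segment, skipping existing ones."""
    for path in categories:
        folder = site
        for part in path.split("/"):
            if part not in folder.objectIds():
                folder.invokeFactory("Folder", part)
            folder = folder[part]


def migrate_story(category, story_id, title, body, type_name="Story"):
    """'migrate' step for one story: create it inside its category folder,
    then fill its fields through mutators ("Story" is an assumed type name)."""
    new_id = category.invokeFactory(type_name, story_id)
    story = category[new_id]
    story.setTitle(title)
    story.setText(body)
    return story
```

Because `create_step` skips folders that already exist, it can be re-run safely after a partial failure.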

Intermediate Results (How to: Promise Too Much)

• Slug-i-fication: turning
  – /english/news/symposium_talks_rfa/2008/03/12/index.html into
  – /english/news/20080312-symposium_talks_rfa.html

• Change “category” names, e.g. from
  – /english/news to
  – /english/exciting_news.

• Import audio and image files from the file system
  – Insert them into story fields and/or story folders (stories are folderish).
    • Featured audio or image vs. inline audio or image.
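The slug-i-fication and category renaming above can be sketched as a single path transformation. This assumes every story path has the shape `<category>/<slug>/YYYY/MM/DD/index.html`, as in the slide's example; `slugify_path` is a hypothetical name, and the one-entry rename table is just the slide's example:

```python
from typing import Mapping

# Rename table; this single entry is the example from the slide.
CATEGORY_RENAMES = {"/english/news": "/english/exciting_news"}


def slugify_path(old_path: str, renames: Mapping[str, str] = CATEGORY_RENAMES) -> str:
    """Map <category>/<slug>/YYYY/MM/DD/index.html to <category>/YYYYMMDD-<slug>.html,
    renaming the category if it appears in `renames`."""
    parts = old_path.strip("/").split("/")
    # Need at least one category segment, the slug, Y/M/D, and index.html.
    if len(parts) < 6 or parts[-1] != "index.html":
        raise ValueError(f"unexpected story path: {old_path!r}")
    *category, slug, year, month, day, _ = parts
    if not (year.isdigit() and month.isdigit() and day.isdigit()):
        raise ValueError(f"no YYYY/MM/DD date in: {old_path!r}")
    category_path = "/" + "/".join(category)
    category_path = renames.get(category_path, category_path)
    return f"{category_path}/{year}{month}{day}-{slug}.html"
```

For example, `slugify_path("/english/news/symposium_talks_rfa/2008/03/12/index.html", {})` returns `/english/news/20080312-symposium_talks_rfa.html`. With the default table, the same path becomes `/english/exciting_news/20080312-symposium_talks_rfa.html`. The function uses plain string methods rather than regular expressions.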

Advanced Results (How to: Really Promise Too Much)

• Related Links
  – At the bottom of each story are related links.
  – Slug-i-fy them, then insert them inline.
  – Slug-i-fy them, change the category, then insert them inline.

No, Really…

• I promised too much.

The RFA Migration Story

• 10 Language Services

• 208,566 stories

• 5 Different encodings

• 70GB of content on the file system

• Hundreds of categories

The RFA Migration - E! True Hollywood Story

• Images everywhere
  – /english/category/story/2008/01/01/index.html has image
    • /english/category/story/2008/01/01/foo.jpg and
    • /english/images/foo.jpg

• Audio everywhere

• Duplicate stories everywhere
  – Stories published as
    • /english/category/story/2008/01/01/index.html were also published as
    • /english/category2/story/2008/01/01/index.html.
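One way to catch stories published under several categories is to hash each story's extracted body and keep only the first path per digest. Hashing the extracted body rather than the whole file is an assumption made here, because the surrounding page chrome (category navigation, for instance) likely differs between copies. This is a sketch with a hypothetical function name, not the fix used in the talk:

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def find_duplicates(stories: Iterable[Tuple[str, str]]) -> Tuple[List[str], Dict[str, str]]:
    """Given (path, extracted_body) pairs, keep the first path for each distinct
    body and map every later copy to the path that was kept."""
    first_seen: Dict[str, str] = {}  # body digest -> path kept
    duplicates: Dict[str, str] = {}  # duplicate path -> path kept
    for path, body in stories:
        # Collapse whitespace so trivial re-wrapping does not hide a copy.
        normalized = " ".join(body.split())
        digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if digest in first_seen:
            duplicates[path] = first_seen[digest]
        else:
            first_seen[digest] = path
    return list(first_seen.values()), duplicates
```

The duplicate-to-kept mapping is also what a redirect table or a related-links rewrite would need.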

Sidebar: Buildout vs. Buildit

• Shortly after this project began, Buildout became the de facto standard for deploying a Plone site.

• Deploy migration code and sample data with your buildout.
  – e.g. bin/buildout -c migration.cfg
    • where migration.cfg installs your migration code and sample data
  – Even better: bin/migrate

And now the moment you have all been waiting for!

• Run buildout

• Add site

• Configure migration

• Run migration

Run buildout and add site

Configure migration ; run migration

Runme.py

Site wide results

Individual story results

Showcase of all language services

Wrap up

• Unexpected results

• Avoidable problems

• General wrap up

Unexpected results

• Missing content

• Wrong content

• Silent failures

Quick Fix for date!

Quick Fix for duplicates!

Quick Fix for broken content!

Avoidable problems

• Don’t promise too much

• Don’t write bad code (read: bare try/excepts, etc.)

• Don’t write slow code (use string methods over regular expressions, etc.)
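A bare `try/except` is the usual source of silent failures: it also swallows programming errors and even Ctrl-C. Below is a minimal sketch of the alternative, with a hypothetical function and logger name. It catches only the errors that bad input should produce, logs each one with its traceback, and returns a record of what failed:

```python
import logging
from typing import Callable, Dict, Iterable

log = logging.getLogger("rfa.migration")  # hypothetical logger name


def migrate_all(paths: Iterable[str], migrate_one: Callable[[str], None]) -> Dict[str, str]:
    """Call migrate_one(path) for every path and return {path: error} for failures.

    Only errors expected from bad input are caught, and each is logged with
    its traceback. Anything else (a typo, an AttributeError, Ctrl-C) still
    stops the run, unlike a bare `except:`.
    """
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            migrate_one(path)
        except (OSError, ValueError) as exc:  # ValueError also covers UnicodeDecodeError
            log.exception("could not migrate %s", path)
            failures[path] = repr(exc)
    return failures
```

The returned dictionary gives a concrete list of missing or broken stories to check after the run.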

General Wrap-up

• Client is happy.

• May actually launch soon.

• Huge rewards
  – Great learning experience
  – This talk
  – Help others

• Things I would do differently?
  – unrestrictedTraverse instead of app.rfa['english']['news']['20080101-slug.html']

Questions/Comments?

• Email me: aclark@aclark.net

• http://aclark.net

• ACLARK.NET, LLC
