Case Study: RFA Migration
How I migrated 208,566 news stories from Bricolage to Plone.
Alex Clark • http://aclark.net
March 12, 2008 • Plone Symposium East
Who Am I?
• Plone Consultant
  – Non-profits in DC
• Foundation Member
• Zope/Python Users Group of DC (ZPUGDC) Events Organizer
• “UNIX guy”, sysadmin, Bachelor of Science in Computer Science, not really a programmer.
What is this?
• An example of a “successful” migration, YMMV (your mileage may vary).
• Inspiration-a-palooza! If I can do it, anyone can.
• An opportunity to learn from my mistakes.
  – Analyses at the end.
• XXX: News ‘story’ not ‘news item’ ;-)
  – i.e. rfasite product ‘story’ content type, not Plone default content type ‘news item’.
• Medium to large size migration
What this is not
• Plone vs. Bricolage.
• How to: <your migration>.
• Best practice (OK, maybe some best practice.)
Radio Free Asia
• RFA is a private, nonprofit corporation that broadcasts news and information in nine native Asian languages to listeners who do not have access to full and free news media. The purpose of RFA is to provide a forum for a variety of opinions and voices from within these Asian countries.
• Our Web site adds a global dimension to this objective. If you have comments, questions or suggestions, please contact us…
Before
After
• Not yet! ;-)
Pre-migration decisions
i.e. how to get the data out of the old site?
• Relational database “content”?
  – No one understood the Bricolage data model.
• http?
  – I didn’t want to crawl the website.
• “Baked” content on the filesystem.
  – provided the clearest migration path.
  – find /var/www/rfa -name index.html
Zopectl run, then what?
• Need a way to structure the migration of 10 different language services
  – e.g. zopectl run mandarin.py.
• Need to ‘walk’ the file system.
  – i.e. how do we find the stories.
• Need a way to parse the html on the file system,
  – i.e. we can’t shove the entire index.html into the body via setText()
• Need to do Unicode conversions.
  – E.g. from Big5, euc_kr, gb2312, ascii to Unicode.
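The Unicode conversion need in the last bullet can be sketched as follows. This is a minimal illustration, not the migration's actual code: `SERVICE_ENCODINGS` and `to_unicode` are hypothetical names, the pairing of language services with encodings is assumed purely for the example, and it is written for Python 3 rather than the Python 2 of the time.

```python
# Hypothetical mapping of language service to source encoding.
# The encodings are the ones named on the slide; which service
# used which encoding is an assumption made for this example.
SERVICE_ENCODINGS = {
    "cantonese": "big5",
    "korean": "euc_kr",
    "mandarin": "gb2312",
    "english": "ascii",
}


def to_unicode(raw, service):
    """Decode the raw bytes of a baked page into a Unicode string."""
    encoding = SERVICE_ENCODINGS[service]
    # The default error handling is "strict": bytes that are invalid
    # in the chosen encoding raise UnicodeDecodeError instead of
    # being silently dropped.
    return raw.decode(encoding)


sample = "自由亚洲电台".encode("gb2312")
print(to_unicode(sample, "mandarin"))
```

Keeping the mapping in one place means each per-language script only has to name its service.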
Zopectl run, then what?
• Use Framework for performing asynchronous tasks, http://www.simplistix.co.uk/software/zope/stepper
• Use os.walk, http://docs.python.org/lib/os-file-dir.html (in particular cb2_examples/cb2_2_16_sol_1.py)
• Use HTML parsing, http://docs.python.org/lib/module-sgmllib.html (in particular diveintopython-5.4/py/BaseHTMLProcessor.py)
• Use Unicode conversions, http://docs.python.org/lib/standard-encodings.html
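A rough sketch of how the file-system walk and the HTML parsing might fit together. It is not the talk's code: `find_stories`, `TitleExtractor` and `extract_title` are hypothetical names, and because `sgmllib` was removed in Python 3, the sketch swaps in the standard library's `html.parser` instead.

```python
import os
from html.parser import HTMLParser


def find_stories(root):
    """Yield the path of every index.html below root,
    much like `find root -name index.html`."""
    for dirpath, dirnames, filenames in os.walk(root):
        if "index.html" in filenames:
            yield os.path.join(dirpath, "index.html")


class TitleExtractor(HTMLParser):
    """Collect only the text inside <title>, not the whole page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def extract_title(html):
    """Return the stripped <title> text of an already decoded page."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.title_parts).strip()
```

The same parser pattern extends to other fields: track which element you are inside and collect its data, so only the extracted pieces are passed on to the story's mutators.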
Stepper Basics
• Allows you to break your migration into pieces.
• Commits transactions for you.
• zopectl run run.py site-object steps-or-chains
Stepper config.py
Basic Results
• The ‘create’ step creates the site structure based on a list of categories defined in categories.py
• The ‘migrate’ step walks the file system looking for index.html files, then
  – Extracts the contents
  – Invokes the Factory on the new object in the context of the category.
  – Calls mutators to insert content into fields,
    • E.g. obj.setTitle(title_extracted)
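The ‘migrate’ step described above could look roughly like this. It is a hedged sketch, not the author's code: `migrate_story` is a hypothetical helper, the portal type name `"story"` is assumed from the content type mentioned earlier, and `setText` is assumed to be the body mutator. The function only relies on the category object behaving like a Plone folder.

```python
def migrate_story(category, story_id, title, body):
    """Create one story inside a category folder and fill in its fields.

    `category` is expected to be a folderish Plone object; `story_id`,
    `title` and `body` are values already extracted from index.html.
    """
    # Assumption: the custom content type is registered as "story".
    category.invokeFactory("story", story_id)
    obj = category[story_id]
    # Mutators insert the extracted content into the new object's fields.
    obj.setTitle(title)
    obj.setText(body)
    return obj
```

Inside a `zopectl run` script, `category` would be looked up from the site object before calling the helper for each story found on the file system.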
Intermediate Results (How to: Promise Too Much)
• Slug-i-fication: Turning
  – /english/news/symposium_talks_rfa/2008/03/12/index.html into
  – /english/news/20080312-symposium_talks_rfa.html
• Change “category” names, e.g. from
  – /english/news to
  – /english/exciting_news.
• Import audio and image files from file system
  – insert into story fields and/or story folders (stories are folderish).
• Featured audio or image, vs. inline audio or image.
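The slug-i-fication and category rename above can be sketched with plain string methods. This is an illustration only: `slugify_path` and `category_map` are hypothetical names, and the sketch assumes every baked story path ends in `<slug>/<yyyy>/<mm>/<dd>/index.html`.

```python
def slugify_path(path, category_map=None):
    """Turn a baked story path into its slug-i-fied form.

    category_map optionally maps old category paths to new ones.
    """
    parts = path.strip("/").split("/")
    # Need at least: category, slug, year, month, day, index.html
    if len(parts) < 6 or parts[-1] != "index.html":
        raise ValueError("not a baked story path: %r" % path)
    slug, year, month, day = parts[-5:-1]
    if not (year.isdigit() and month.isdigit() and day.isdigit()):
        raise ValueError("no date in path: %r" % path)
    category = "/" + "/".join(parts[:-5])
    if category_map:
        # Fall back to the old name when no rename is configured.
        category = category_map.get(category, category)
    return "%s/%s%s%s-%s.html" % (category, year, month, day, slug)


old = "/english/news/symposium_talks_rfa/2008/03/12/index.html"
print(slugify_path(old))
print(slugify_path(old, {"/english/news": "/english/exciting_news"}))
```

Raising on unexpected paths, rather than guessing, keeps malformed input visible during the run.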
Advanced Results (How to: Really Promise Too Much)
• Related Links
  – At the bottom of each story are related links.
  – Slug-I-fy then insert them inline.
  – Slug-I-fy, change the category, then insert them inline.
No, Really…
• I promised too much.
The RFA Migration Story
• 10 Language Services
• 208,566 stories
• 5 Different encodings
• 70GB of content on the file system
• Hundreds of categories
The RFA Migration - E! True Hollywood Story
• Images everywhere
  – /english/category/story/2008/01/01/index.html has image
    • /english/category/story/2008/01/01/foo.jpg and
    • /english/images/foo.jpg
• Audio everywhere
• Duplicate stories everywhere
  – Stories published as
    • /english/category/story/2008/01/01/index.html were also published as
    • /english/category2/story/2008/01/01/index.html.
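One way to spot such duplicates is to group paths by everything except the category, so the same slug and date under two categories land in one bucket. This is an illustrative sketch, not the fix used in the project: `find_duplicates` is a hypothetical name, and it assumes the `/<service>/<category>/<slug>/<yyyy>/<mm>/<dd>/index.html` layout shown above.

```python
from collections import defaultdict


def find_duplicates(paths):
    """Group story paths that share service, slug and date.

    Returns a dict mapping (service, slug, year, month, day) to the
    list of paths, keeping only keys with more than one path.
    """
    groups = defaultdict(list)
    for path in paths:
        parts = path.strip("/").split("/")
        # Only baked story pages take part in the comparison;
        # images and other files are deliberately left out.
        if len(parts) < 7 or parts[-1] != "index.html":
            continue
        key = (parts[0],) + tuple(parts[-5:-1])
        groups[key].append(path)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

The result can be reviewed by hand before deciding which copy of each story to keep.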
Sidebar: Buildout vs. Buildit
• Shortly after this project began, Buildout became the de facto standard for deploying a Plone site.
• Deploy migration code and sample data with your buildout.
  – e.g. bin/buildout -c migration.cfg
    • where migration.cfg installs your migration code and sample data
  – Even better: bin/migrate
And now the moment you have all been waiting for!
• Run buildout
• Add site
• Configure migration
• Run migration
Run buildout and add site
Configure migration; run migration
Runme.py
Site wide results
Individual story results
Showcase of all language services
Wrap up
• Unexpected results
• Avoidable problems
• General wrap up
Unexpected results
• Missing content
• Wrong content
• Silent failures
Quick Fix for date!
Quick Fix for duplicates!
Quick Fix for broken content!
Avoidable problems
• Don’t promise too much
• Don’t write bad code (read: bare try/excepts, etc.)
• Don’t write slow code (use string methods over regular expressions, etc.)
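The last two bullets can be illustrated with a small sketch. The names (`extract_between`, `migrate_all`) and the HTML markers are hypothetical; the point is only the pattern: plain string methods for simple extraction, and narrow `except` clauses that record failures instead of hiding them.

```python
def extract_between(html, start, end):
    """Return the text between two markers using str.find, no regex."""
    i = html.find(start)
    if i == -1:
        raise ValueError("start marker %r not found" % start)
    i += len(start)
    j = html.find(end, i)
    if j == -1:
        raise ValueError("end marker %r not found" % end)
    return html[i:j].strip()


def migrate_all(pages, migrate_one):
    """Run migrate_one(path, html) for each page.

    Returns (migrated, failures). Only the expected error types are
    caught and recorded; anything else propagates, so real bugs are
    not silently swallowed the way a bare try/except would do.
    """
    migrated, failures = [], []
    for path, html in pages.items():
        try:
            migrate_one(path, html)
        except (ValueError, UnicodeDecodeError) as exc:
            failures.append((path, str(exc)))
        else:
            migrated.append(path)
    return migrated, failures
```

Reviewing the `failures` list after each run is one way to catch missing or wrong content early.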
General Wrap-up
• Client is happy
• May actually launch soon
• Huge rewards
  – Great learning experience
  – This talk
  – Help others
• Things I would do differently?
  – unrestrictedTraverse instead of app.rfa['english']['news']['20080101-slug.html']