2001: Bridging the gap between RSS and Java, old-school style
DESCRIPTION
Before Atom, RSS and similar formats had really caught on, many people were looking for ways to handle syndicated content. This was a pretty successful talk that I ended up giving quite a bit.
TRANSCRIPT
Enabling Live Newsfeeds using RSS, Servlets and Transformations
Russell Castagnaro
Introduction
Presenter
- Russell Castagnaro
- Chief Mentor
  - 4Charity.com
  - SyncTank Solutions, Inc
- [email protected]
- Experience
Introduction
4Charity.com
- Application Service Provider for the non-profit industry
- Pure Java development
- http://www.4charity.com
- Locations:
  - San Francisco, CA (HQ)
  - Honolulu, HI (Tech Team)
Goals
- Leverage the Servlet 2.2 API
- Employ XML for data and configuration
- Use Resource Description Framework (RDF) for content data
- Format XML using XSL Transformations
- Eliminate hard-coded values
What’s the deal?
- Newsfeeds are becoming a requirement for portal sites.
- Easy integration with existing web services is a key requirement.
- How can we avoid writing custom code for information providers?
- Can we avoid applets?!
Background
- In 1999 I wrote an information portal application.
- Live newsfeeds seemed like a good idea.
- I wrote custom parsers and employed an open-source tool called Cocoon.
- Every time the HTML changed, I had to recode!
Code Example
- Needed a different ‘ParseSpec’ for each content provider
- URLToXMLConsumer.java
- SpaceProducer.java
- These worked great for 2 months...
‘ParseSpec’
    #HeadlineEntry
    cacheTime=6000
    HeadlineEntry=start=\n,end=<p>,attributes=Link,URL,Headline,Source,Date
    HeadlineEntry.Link=start=<a href=",end=">
    HeadlineEntry.Headline=start=">,end=</a>
    HeadlineEntry.Source=start=<font size="-1">,end=</font>
    #HeadlineEntry.Description=start=<br>,end=<br>
    HeadlineEntry.Date=start=- <i>,end=</i>
    HeadlineEntry.DTD="http://space.synctank.com/dtds/newsfeed.dtd "
    HeadlineEntry.Doctype=Newsfeed
    HeadlineEntry.URL=http://search.news.yahoo.com/search/news?p=space+aerospace&n=
    HeadlineEntry.QTY=10
    HeadlineEntry.XML=version="1.0"
    HeadlineEntry.Header=\
    <?xml-stylesheet href="http://space.synctank.com/xsl/spacenews.xsl" type="text/xsl"?>\n\
    <?cocoon-process type="xslt"?>\n\
    <!-- ============================================================ -->\n\
    <!-- spacenews.xml -->\n\
    <!-- Simple XML file that uses the Newsfeed DTD. -->\n\
    <!-- Author: XML Loader Russell Castagnaro Thu Nov 18 22:59:07 HST 1999 -->\n\
    <!-- ============================================================ -->\n\
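The start/end pairs in a ParseSpec describe plain delimiter scraping. The original producer classes are not shown in this talk, so the following is only a rough sketch of the idea: DelimiterExtractor, between and allBetween are hypothetical names, and the sample HTML string is made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of start/end delimiter scraping; not the original producer code. */
public class DelimiterExtractor {

    /** Returns the text between the first start marker at or after 'from' and the next end marker, or null. */
    public static String between(String html, String start, String end, int from) {
        int s = html.indexOf(start, from);
        if (s < 0) return null;
        s += start.length();
        int e = html.indexOf(end, s);
        if (e < 0) return null;
        return html.substring(s, e);
    }

    /** Collects every non-overlapping start/end match, in document order. */
    public static List<String> allBetween(String html, String start, String end) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (true) {
            int s = html.indexOf(start, pos);
            if (s < 0) break;
            s += start.length();
            int e = html.indexOf(end, s);
            if (e < 0) break;
            found.add(html.substring(s, e));
            pos = e + end.length();
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up headline entry, shaped like the delimiters in the ParseSpec.
        String entry = "<a href=\"http://example.com/story.html\">Shuttle launch</a> - <i>Dec 22</i><p>";
        System.out.println(between(entry, "<a href=\"", "\">", 0)); // Link
        System.out.println(between(entry, "\">", "</a>", 0));       // Headline
        System.out.println(between(entry, "- <i>", "</i>", 0));     // Date
    }
}
```

Every marker is tied to the provider's exact HTML, which is why any change in the page layout breaks the extraction.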
Java Code
URLToXMLProducer.xml and subclasses
Nice Features
- All search providers' content was converted to one XML document type
- Once the XML was created, all search engines' results were handled easily with XSLT
Document Type Definition
    <?xml version="1.0" encoding="US-ASCII" ?>
    <!-- Newsfeed.dtd -->
    <!-- Simple DTD that defines a grammar for news Feeds. -->
    <!-- Author: Russell Castagnaro Nov 15 1999 -->
    <!ELEMENT Newsfeed (HeadlineEntry)+>
    <!ELEMENT HeadlineEntry (Link, Headline, Source, Description, Date)>
    <!ELEMENT Link (#PCDATA)>
    <!ELEMENT Headline (#PCDATA)>
    <!ELEMENT Source (#PCDATA)>
    <!ELEMENT Description (#PCDATA)>
    <!ELEMENT Date (#PCDATA)>
NewsFeed Content (XML)
    <?xml version="1.0"?>
    <?xml-stylesheet href="spacenews.xsl" type="text/xsl"?>
    <?cocoon-process type="xslt"?>
    <Newsfeed>
      <HeadlineEntry>
        <Link>http://dailynews.yahoo.com/h/ap/19991222/sc/space_shuttle_77.html</Link>
        <Headline>Shuttle Astronauts Begin <b>Space</b>walk</Headline>
        <Source>(Associated Press)</Source>
        <Date>Dec 22 6:08 PM EST</Date>
      </HeadlineEntry>
      <HeadlineEntry>
        <Link>http://biz.yahoo.com/rf/991222/xr.html</Link>
        <Headline>RESEARCH ALERT - Boeing raised to buy</Headline>
        <Source>(Reuters)</Source>
        <Date>Dec 22 12:03 PM EST</Date>
      </HeadlineEntry>
    </Newsfeed>
Transforming the Newsfeed
Make the news feed human readable:
- Create a Stylesheet using the XML DOCTYPE rules
- Transform the XML Document using the XSL Document
* Specifics on transformations coming soon!
The StyleSheet
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html" indent="no"/>
      <xsl:template match="/">
        <TABLE width="100%" cellpadding="0" cellspacing="0" border="0">
          <TR>
            <TD bgcolor="#3366CC" align="left" valign="middle">
              <font face="helvetica, arial" size="2" color="#FFFFFF"><nobr><b>News</b></nobr></font>
            </TD>
            <TD align="right" bgcolor="#3366CC" valign="top">
              <a href="/space/news/spacenews.xml"><font face="helvetica, arial" size="1" color="#FFFFFF">View</font></a>
              <IMG SRC="/space/images/spacer2.gif" BORDER="0" WIDTH="5" HEIGHT="2"/>
            </TD>
          </TR>
          <TR>
            <TD>
              <font size="2" face="Arial, Helvetica, sans-serif"><b>Space and Aerospace News</b></font><BR/>
              <xsl:apply-templates/>
            </TD>
          </TR>
        </TABLE>
      </xsl:template>
      <xsl:template match="HeadlineEntry">
        <B><FONT face="helvetica, arial" size="1"><A HREF="{Link}"><xsl:value-of select="Headline"/></A></FONT></B> - <I><FONT size="-2" face="Arial, Helvetica, sans-serif"><xsl:value-of select="Source"/></FONT></I><BR/>
      </xsl:template>
    </xsl:stylesheet>
HTML Content
Then the Display Format Changed
- Simple changes in the format from any site required significant changes
- Changing the parsing rules was not trivial
- Eventually this became boring and tiresome
Interesting Points
- I was not interested in manipulating XML documents within Java*
- I did not want to deal with DOM or SAX
- I was interested in displaying data in a clean, efficient manner
- The producer code I created was a bit embarrassing
*I was not lazy. I had a very full schedule at the time... Sheesh!
Time Warp (Oct 2000)
- None of my parsing instructions still worked
- I had no interest in using the old code
- There had to be a better way
- I heard about O’Reilly’s Meerkat project…
Enter RDF Site Summary
- Preliminary format was v0.91 from Netscape (remember them?)
- RDF Site Summary (RSS 0.91): http://my.netscape.com/publish/formats/rss-0.91.dtd
- Eliminates the need to parse through HTML for content
- Standard: the W3C has now recommended version 1.0
RSS Example
    <?xml version="1.0" encoding="iso-8859-1"?>
    <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
      "http://my.netscape.com/publish/formats/rss-0.91.dtd">
    <rss version="0.91">
      <channel>
        <title> Space science news</title>
        <link>http://www.moreover.com</link>
        <description>Space science news - news headlines from around the web, refreshed every 15 minutes</description>
        <language>en-us</language>
        <image>
          <title>moreover...</title>
          <url>http://i.moreover.com/pics/rss.gif</url>
          <link>http://www.moreover.com</link>
          <width>144</width>
          <height>16</height>
          <description>News headlines from more than 1,800 sources, harvested every 15 minutes...</description>
        </image>
        <item>
          <title>NASA releases space station crew logs</title>
          <link>http://c.moreover.com/click/here.pl?r16768175</link>
          <description>floridatoday.com Mar 22 2001 12:20AM ET</description>
        </item>
        <item>
          <title>Tough love but support for space by George W. Bushs team</title>
          <link>http://c.moreover.com/click/here.pl?r16768185</link>
          <description>floridatoday.com Mar 22 2001 12:20AM ET</description>
        </item>
      </channel>
    </rss>
C:\development\Castagnaro\space\space-moreover.xml
RSS Stylesheet Example
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="rss">
        <!-- version is an attribute of rss, so it is selected with @version -->
        <Foo bar="{@version}">
          <xsl:apply-templates/>
        </Foo>
      </xsl:template>
      <xsl:template match="channel">
        <TABLE width="100%" cellpadding="0" cellspacing="0" border="0">
          <TR>
            <TH align="left" bgcolor="#3366CC" valign="top"><a alt="{description}" href="{link}"><font face="helvetica, arial" size="2" color="#FFFFFF"><nobr><xsl:value-of select="title"/></nobr></font></a></TH>
          </TR>
          <xsl:apply-templates select="image"/>
          <xsl:apply-templates select="item"/>
        </TABLE>
      </xsl:template>
      <xsl:template match="image">
        <TR><TD align="right"><a href="{link}"><IMG SRC="{url}" BORDER="0" WIDTH="{width}" HEIGHT="{height}"/></a></TD></TR>
      </xsl:template>
      <xsl:template match="item">
        <TR><TD colspan="2"><B><FONT face="helvetica, arial" size="1"><A HREF="{link}"><xsl:value-of select="title"/></A></FONT></B> - <I><FONT size="-2" face="Arial, Helvetica, sans-serif"><xsl:value-of select="description"/></FONT></I></TD></TR>
      </xsl:template>
    </xsl:stylesheet>
Newsfeed HTML
Access to RSS Feeds
Where do you find providers?
- Directory of open RSS providers:
  - http://www.superopendirectory.com/directory/4/standards/rss/sources
- RSS Providers
  - 10.am
    - http://10.am/search/-rss?search=<your term here>
    - List of topics: http://10.am/extra/ocsdirectory.xml
  - echofactor
    - http://www.echofactor.com/feed_categories.html?format=RSS
  - MoreOver
    - http://w.moreover.com/categories/category_list.html
Now we need to make this content readable!
Transforming XML to HTML
We have many options for performing XSL Transformations:
- Depend on the client’s browser to transform the XML
- Write a Servlet to handle the transformation
- Use software that is widely available and standards based
Issues:
- IE 5.x is one of the few browsers that support XSL transformations
- Publicly available software has many merits too
- Servlets are easy enough. Transformations can be done in < 10 lines
Transformation in a Servlet
    public void service(HttpServletRequest req, HttpServletResponse res)
            throws IOException, ServletException {
        // Set the content type before obtaining the writer.
        res.setContentType("text/html");
        PrintWriter out = res.getWriter();
        File xmlFile = new File(sourcePath, req.getParameter("XML"));
        File xslFile = new File(sourcePath, req.getParameter("XSL"));
        try {
            XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
            processor.process(new XSLTInputSource(new FileReader(xmlFile)),
                              new XSLTInputSource(new FileReader(xslFile)),
                              new XSLTResultTarget(out));
        } catch (Exception e) {
            out.println("Error: " + e.getMessage());
        }
        out.flush();
    }
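The servlet code in this talk uses the XSLTProcessor classes (these look like the Xalan 1.x API). As a rough sketch only, the same transformation can be written against the standard JAXP package javax.xml.transform, which ships with the JDK. JaxpTransformSketch and its transform method are hypothetical names, and the tiny feed and stylesheet are cut-down stand-ins for the examples above.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical JAXP equivalent of the transformation step; not the talk's original code. */
public class JaxpTransformSketch {

    /** Applies the stylesheet read from xsl to the document read from xml and writes the result to out. */
    public static void transform(Reader xml, Reader xsl, Writer out) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(new StreamSource(xsl));
        transformer.transform(new StreamSource(xml), new StreamResult(out));
    }

    public static void main(String[] args) throws TransformerException {
        // Cut-down RSS 0.91 feed with a single item.
        String xml = "<rss version=\"0.91\"><channel><title>Space science news</title>"
                + "<item><title>NASA releases space station crew logs</title>"
                + "<link>http://c.moreover.com/click/here.pl?r16768175</link></item>"
                + "</channel></rss>";
        // Cut-down stylesheet: one list entry per item.
        String xsl = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"html\"/>"
                + "<xsl:template match=\"/\"><ul><xsl:apply-templates select=\"rss/channel/item\"/></ul></xsl:template>"
                + "<xsl:template match=\"item\"><li><a href=\"{link}\"><xsl:value-of select=\"title\"/></a></li></xsl:template>"
                + "</xsl:stylesheet>";
        StringWriter out = new StringWriter();
        transform(new StringReader(xml), new StringReader(xsl), out);
        System.out.println(out);
    }
}
```

In a servlet, the Writer would be the response writer and the readers would wrap the feed and the stylesheet file.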
One Problem
- We have to get the XML (RSS) file from the content provider!
- Use the networking classes to access the URL
- Be considerate of your provider!
New Code
    public void doGet(HttpServletRequest req, HttpServletResponse res) {
        try {
            // Set the content type before obtaining the writer.
            res.setContentType("text/html");
            PrintWriter out = res.getWriter();
            URLConnection con;
            DataInputStream in;
            // Load the RSS document from the content provider.
            URL url = new URL(sourceURL);
            con = url.openConnection();
            con.connect();
            String type = null;
            in = new DataInputStream(con.getInputStream());
            FileReader fr = new FileReader(xslsrc);
            try {
                XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
                processor.process(new XSLTInputSource(in), new XSLTInputSource(fr),
                                  new XSLTResultTarget(out));
            } catch (Exception e) {
                log("Error: " + e.getMessage());
            } finally {
                in.close();
                fr.close();
            }
            out.flush();
        } catch (Exception e) { … }
    }
XSLT Model
[Diagram: Request -> Servlet -> Response. The URL-loaded XML and the XSL document go into the XSLTProcessor, which produces the HTML newsfeed.]
Setting up your servlet
- Most app servers and web servers support WARs and Deployment Descriptors
- You create a WebApp, which has servlets, parameters and servlet mappings
Deployment Descriptor
    <web-app>
      <servlet>
        <servlet-name>newsServlet</servlet-name>
        <servlet-class>com.synctank.http.servlets.RSSServlet</servlet-class>
        <init-param>
          <param-name>ERROR_URL</param-name>
          <param-value>/error.jsp</param-value>
          <description>The error page for this app.</description>
        </init-param>
        <init-param>
          <param-name>SOURCE_SERVLET_URI</param-name>
          <param-value>http://www.moreover.com/cgi-local/page?o=rss&amp;c=Space%20science%20news</param-value>
          <description>An absolute url that points to your XML</description>
        </init-param>
Deployment Descriptor (continued)
        <init-param>
          <param-name>STYLESHEET</param-name>
          <param-value>/xsl/rss.xsl</param-value>
          <description>The Stylesheet for presentation of the headlines. Should be a subdirectory of the war. The default is /xsl/rss.xsl </description>
        </init-param>
        <load-on-startup>0</load-on-startup>
      </servlet>
      <servlet-mapping>
        <servlet-name>newsServlet</servlet-name>
        <url-pattern>/newsy</url-pattern>
      </servlet-mapping>
      <welcome-file-list>
        <welcome-file>/foo/news.html</welcome-file>
      </welcome-file-list>
      <error-page>
        <error-code>404</error-code>
        <location>/error.jsp</location>
      </error-page>
    </web-app>
WAR directory structure
Root
- WEB-INF
  - web.xml
  - classes
    - com\synctank\http\servlets\RSSServlet.class
- xsl
  - rss.xsl
- docs
  - Index.html
- error.jsp
Moving Forward
- RSS version 1.0 has been recommended by the W3C
- 1.0 has more flexibility
- Once more providers support
Review
- Don’t do the time!
- Leverage RSS and open content providers
- Use XSL to transform XML content to your format of choice
- Cache requests to content providers (keep them free!)
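Caching is the one review point with no code in this talk. A minimal sketch, assuming the rendered HTML may be reused for a fixed period: CachedFeed is a hypothetical class, and the 15-minute lifetime is only an illustrative choice borrowed from the refresh interval mentioned in the sample feed.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical time-based cache for the rendered newsfeed; all names are illustrative. */
public class CachedFeed {
    private final Supplier<String> loader;  // fetches and transforms the feed (the expensive part)
    private final long maxAgeMillis;        // how long a cached copy stays valid
    private final LongSupplier clock;       // injectable time source, in milliseconds
    private String cached;
    private long loadedAt;

    public CachedFeed(Supplier<String> loader, long maxAgeMillis, LongSupplier clock) {
        this.loader = loader;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    /** Returns the cached HTML, calling the loader only when nothing is cached or the copy is too old. */
    public synchronized String get() {
        long now = clock.getAsLong();
        if (cached == null || now - loadedAt >= maxAgeMillis) {
            cached = loader.get();
            loadedAt = now;
        }
        return cached;
    }

    public static void main(String[] args) {
        int[] fetches = {0};
        CachedFeed feed = new CachedFeed(
                () -> "<ul>fetch #" + (++fetches[0]) + "</ul>",  // stand-in for fetch plus transform
                15 * 60 * 1000L,
                System::currentTimeMillis);
        System.out.println(feed.get());
        System.out.println(feed.get());  // served from the cache, the provider is not contacted again
    }
}
```

A servlet could hold one such object per feed, so the provider is contacted at most once per interval regardless of how many page requests arrive.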
Finally
- Thanks for attending
- Source Code Available
  - http://www.synctank.com/xmldevcon
  - [email protected]
Aloha