Android RSS Tutorial Revisited - Some Questions Asked..

I was going to write a post about unit testing Spring web apps, but I have had a few questions recently about the Android RSS parser that I posted a while back, so I thought I should answer those quickly now (the explanation seemed a little too long-winded to add as a comment).

The questions were specifically about how to extend the example code provided to parse additional tags. These are really just core SAX questions, so the answer is basically more detail about SAX, and it applies equally to parsers that have nothing to do with RSS or Android.

The two questions I wanted to tackle were:
1) How to extend the code to handle additional tags
2) How to extend the code if there are tags that should be processed inside other tags (for example, an <img> tag inside the <description> tag)

Both tasks require an understanding of the three core methods in the SAX Parser:
- startElement() – this is called every time an opening XML tag of any kind is found
- endElement() – this is called every time a closing XML tag of any kind is found
- characters() – this is called for text found between start and end tags, but it may be called several times for a single run of text, so one call does not necessarily deliver the complete text of a node (this is very important to remember!)
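To make the callback order concrete, here is a minimal sketch that runs on a plain desktop JVM rather than on Android. The class name SaxEventTrace, the trace() helper and the sample XML are all made up for illustration; only the three DefaultHandler callbacks come from SAX itself. The factory is set to namespace-aware so that localName is populated, which is what the original example relies on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace {

    /** Hypothetical handler that records each callback in the order SAX fires it. */
    static class TraceHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        int charactersCalls = 0;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start every element with an empty buffer
            events.add("start:" + localName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may split one run of text over several calls,
            // so we only accumulate here and read the result in endElement().
            charactersCalls++;
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + localName + "=" + text);
            text.setLength(0); // do not let this text leak into the parent's end event
        }
    }

    static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for localName to be filled in
        TraceHandler handler = new TraceHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        System.out.println("characters() was called " + handler.charactersCalls + " time(s)");
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<item><title>Hello &amp; welcome</title></item>")) {
            System.out.println(event);
        }
    }
}
```

The events come out as start:item, start:title, end:title=Hello & welcome, end:item=. How many times characters() fires for the title depends on the parser, which is exactly why the text is accumulated and only read once the closing tag arrives.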

In our very simple example, we are assuming that we are only ever interested in the lowest level leaf nodes of the XML feed, and that they will only ever contain text. Given this assumption, the startElement() method does not need to work out which tag we are in; it can just reset the StringBuffer (the StringBuffer is used to gather the text in between tags). By resetting the buffer every time we start an element, we know that when we close an element we are interested in (which is always a leaf node) the buffer holds exactly the text contents of that node.

If you inspect the endElement() method from the original example, you will see a chained IF block. This block examines the name of the tag being closed and decides whether we are interested in the content of the node that is closing. If we are interested, we take the text from the buffer and save it in our context object; otherwise we just ignore it, and the buffer will be cleared when the next XML node is opened.

Therefore, to extend just to add another node, we can just add another condition to the IF block with the name of the new tag. Easy!
public void endElement(String uri, String localName, String qName) throws SAXException {

        if (localName.equalsIgnoreCase("title"))
        {
            Log.d("LOGGING RSS XML", "Setting article title: " + chars.toString());
            currentArticle.setTitle(chars.toString());

        }
        else if (localName.equalsIgnoreCase("description"))
        {
            Log.d("LOGGING RSS XML", "Setting article description: " + chars.toString());
            currentArticle.setDescription(chars.toString());
        }
        else if (localName.equalsIgnoreCase("the name of any new node goes here!"))
        {
            // Here you can handle the node contents as you like; chars.toString()
            // will give you all the text contents of the node.
        }
}


The second issue is a little more challenging, as it requires some additional processing, and we have to turn to our startElement() method.

Let's imagine we are processing XML that looks as follows:

    
    <item>
        <title>example feed title</title>
        <pubDate>16/02/2011</pubDate>
        <description>example description of feed, and here is a picture <img>http://example.com/image.jpg</img> i hope you like the feed!</description>
    </item>
        
    


The title and date are processed by our existing code easily, as when the parser reaches the opening title or pubDate tag it clears the buffer, and then when it reaches the closing tag for each node, it checks the IF block and saves the text in the node into our context object.

The description tag is more complex, as we want to keep the text, but currently, as soon as the inner img tag is reached by our parser, the buffer is cleared, and this clears all the description text we have up until that point.
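The problem is easy to reproduce off-device. The sketch below is not the original handler: SingleBufferDemo, LeafHandler and the Map used in place of the Article object are stand-ins invented for this illustration, and the XML is a shortened version of the sample above. It keeps the same single-buffer logic, resetting on every opening tag.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SingleBufferDemo {

    /** Hypothetical handler with one shared buffer that is reset on every opening tag. */
    static class LeafHandler extends DefaultHandler {
        final Map<String, String> values = new LinkedHashMap<>();
        private StringBuffer chars = new StringBuffer();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            chars = new StringBuffer(); // this also fires for the nested <img> tag
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            chars.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (localName.equalsIgnoreCase("title") || localName.equalsIgnoreCase("description")) {
                values.put(localName, chars.toString());
            }
        }
    }

    static Map<String, String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so that localName is populated
        LeafHandler handler = new LeafHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item><title>example feed title</title>"
                + "<description>here is a picture <img>http://example.com/image.jpg</img>"
                + " i hope you like it</description></item>";
        System.out.println(parse(xml).get("description"));
    }
}
```

The description comes back as "http://example.com/image.jpg i hope you like it". Everything before the img tag ("here is a picture ") has been thrown away by the reset in startElement().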

To handle this scenario, the easiest way is to detect the opening description tag in the startElement() method and use a secondary buffer that exists specifically for the description node. When we open the description node we reset the description buffer, but it is not reset at the img node. Then, in endElement(), the description condition reads from this new buffer when the tag closes. We also need to update our characters() method to consume the text between the nodes; a flag tells it to write to the description-specific buffer only while we are inside the description node, which avoids unnecessary appends. The full code to handle the above scenario is below:

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

import android.util.Log;

public class RSSHandler extends DefaultHandler {

    // Feed and Article objects to use for temporary storage
    private Article currentArticle = new Article();
    private List<Article> articleList = new ArrayList<Article>();

    // Number of articles added so far
    private int articlesAdded = 0;

    // Number of articles to download
    private static final int ARTICLES_LIMIT = 15;

    // Current characters being accumulated
    StringBuffer chars = new StringBuffer();

    /**
     * THIS IS A NEW BUFFER SPECIFICALLY FOR DESCRIPTION
     **/
    StringBuffer descriptionChars = new StringBuffer();
    boolean processingDescription = false;

    /*
     * This method is called every time a start element is found (an opening XML marker).
     * Here we always reset the characters StringBuffer, as we are only currently interested
     * in the text values stored at leaf nodes.
     *
     * (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)
     */
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        chars = new StringBuffer();

        // IF DESCRIPTION THEN SET FLAG TO START PROCESSING SPECIFIC BUFFER
        if (localName.equalsIgnoreCase("description")) {
            descriptionChars = new StringBuffer();
            processingDescription = true;
        }
    }

    /*
     * This method is called every time an end element is found (a closing XML marker).
     * Here we check what element is being closed. If it is a relevant leaf node that we are
     * checking, such as Title, then we get the characters we have accumulated in the StringBuffer
     * and set the current Article's title to the value.
     *
     * If this is closing the "Item", it means it is the end of the article, so we add that to the list
     * and then reset our Article object for the next one on the stream.
     *
     * (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#endElement(java.lang.String, java.lang.String, java.lang.String)
     */
    public void endElement(String uri, String localName, String qName) throws SAXException {

        if (localName.equalsIgnoreCase("title")) {
            Log.d("LOGGING RSS XML", "Setting article title: " + chars.toString());
            currentArticle.setTitle(chars.toString());
        } else if (localName.equalsIgnoreCase("description")) {
            /**
             * ALSO PROCESS HERE THE DESCRIPTION SPECIFIC STRING BUFFER
             **/
            Log.d("LOGGING RSS XML", "Setting article description: " + descriptionChars.toString());
            currentArticle.setDescription(descriptionChars.toString());

            // ALSO SET DESCRIPTION FLAG TO FALSE
            processingDescription = false;
        } else if (localName.equalsIgnoreCase("pubDate")) {
            Log.d("LOGGING RSS XML", "Setting article published date: " + chars.toString());
            currentArticle.setPubDate(chars.toString());
        } else if (localName.equalsIgnoreCase("encoded")) {
            Log.d("LOGGING RSS XML", "Setting article content: " + chars.toString());
            currentArticle.setEncodedContent(chars.toString());
        }
        /**
         * THIS IS NEW TO HANDLE IMAGE TAG
         **/
        else if (localName.equalsIgnoreCase("img")) {
            Log.d("LOGGING RSS XML", "Setting article description image: " + chars.toString());
            // NOTE: this value is replaced when the enclosing description tag closes;
            // store it in a dedicated Article field if you need to keep the image URL.
            currentArticle.setDescription(chars.toString());
        } else if (localName.equalsIgnoreCase("link")) {
            try {
                Log.d("LOGGING RSS XML", "Setting article link url: " + chars.toString());
                currentArticle.setUrl(new URL(chars.toString()));
            } catch (MalformedURLException e) {
                Log.e("RSA Error", e.getMessage());
            }
        }

        // Check if looking for article, and if article is complete
        if (localName.equalsIgnoreCase("item")) {
            articleList.add(currentArticle);
            currentArticle = new Article();

            // Let's check if we've hit our limit on number of articles
            articlesAdded++;
            if (articlesAdded >= ARTICLES_LIMIT) {
                // Abort parsing early; this is caught in getLatestArticles()
                throw new SAXException();
            }
        }
    }

    /*
     * This method is called when characters are found in between XML markers. However, there is no
     * guarantee that this will be called at the end of the node, or that it will be called only once,
     * so we just accumulate these and then deal with them in endElement() to be sure we have all the
     * text.
     *
     * (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#characters(char[], int, int)
     */
    public void characters(char ch[], int start, int length) {
        chars.append(ch, start, length);

        // Only feed the description buffer while we are inside the description node
        if (processingDescription) {
            descriptionChars.append(ch, start, length);
        }
    }

    /**
     * This is the entry point to the parser and creates the feed to be parsed
     *
     * @param feedUrl
     * @return
     */
    public List<Article> getLatestArticles(String feedUrl) {
        URL url = null;
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            XMLReader xr = sp.getXMLReader();

            url = new URL(feedUrl);

            xr.setContentHandler(this);
            xr.parse(new InputSource(url.openStream()));
        } catch (IOException e) {
            Log.e("RSS Handler IO", e.getMessage() + " >> " + e.toString());
        } catch (SAXException e) {
            Log.e("RSS Handler SAX", e.toString());
        } catch (ParserConfigurationException e) {
            Log.e("RSS Handler Parser Config", e.toString());
        }

        return articleList;
    }
}
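If you want to try the two-buffer idea without an Android device, the following is a rough sketch of the same logic on a plain JVM. It is not the handler above: DescriptionBufferDemo, TwoBufferHandler and the Map standing in for the Article object are assumptions made for this illustration, and the image URL is stored under its own "img" key rather than through any real Article setter.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DescriptionBufferDemo {

    /** Hypothetical handler: one general buffer plus one that survives nested tags. */
    static class TwoBufferHandler extends DefaultHandler {
        final Map<String, String> values = new LinkedHashMap<>();
        private StringBuffer chars = new StringBuffer();
        private StringBuffer descriptionChars = new StringBuffer();
        private boolean processingDescription = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            chars = new StringBuffer(); // reset for every tag, as before
            if (localName.equalsIgnoreCase("description")) {
                descriptionChars = new StringBuffer(); // reset only when description opens
                processingDescription = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            chars.append(ch, start, length);
            if (processingDescription) {
                descriptionChars.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (localName.equalsIgnoreCase("title")) {
                values.put("title", chars.toString());
            } else if (localName.equalsIgnoreCase("img")) {
                values.put("img", chars.toString()); // text of the nested tag only
            } else if (localName.equalsIgnoreCase("description")) {
                values.put("description", descriptionChars.toString());
                processingDescription = false;
            }
        }
    }

    static Map<String, String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so that localName is populated
        TwoBufferHandler handler = new TwoBufferHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item><title>example feed title</title>"
                + "<description>here is a picture <img>http://example.com/image.jpg</img>"
                + " i hope you like it</description></item>";
        for (Map.Entry<String, String> entry : parse(xml).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

With this input the description keeps its full text, "here is a picture http://example.com/image.jpg i hope you like it", and the img entry holds just the URL. Note that the text of the nested tag is still part of the description, because characters() appends to both buffers while the flag is set.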

rob hinds


2 comments:

  1. I am currently trying to use the original project to develop an application for news using RSS feeds. When I use the original RSS feed provided by you it runs fine. Any time I use any other feed I get a "JSON Error" message, as it is programmed in. I am guessing it has something to do with how the RSS is set up? Here is the feed I am trying to access: http://gamemakerblog.com/feed/

  2. Same thing with my app .. :( JSON EXCEPTION... nothing happening after that :(
