06 June 2013

HPPLE and accessing paths in XPATH

WHY?? WHY OH WHY???

Ray Wenderlich's tutorial on using HPPLE is freaking amazing! I highly recommend these classes for parsing HTML and other XML. Using HPPLE felt very similar to parsing JSON, although a wee bit more uhhh... challenging. Unfortunately, I was not able to find a good XPATH viewer of the kind that is freely available online for JSON. Nonetheless, a little experimentation and testing went a long way toward finding paths to the locations I wanted.

Here are the three techniques I used most while parsing HTML with HPPLE:

1- Test, experiment, repeat

Accessing a path comes with numerous challenges. First and foremost, make sure you are using the correct path and that it is spelled correctly. The Google Chrome Developer Tools are very helpful for viewing the path. Make sure your quotes are aligned and balanced. Also be sure that you are parsing the correct URL. Check, and double check, everything about your XPATH NSString and your URL.

For instance, in the Chrome Developer Tools you can highlight most sections of a webpage and choose "Inspect Element". The path is then shown at the bottom of the developer window, and you enter that information into your NSString. For example, you might see:

div#sectionOne, div#sectionTwo, a, p, h2

That would be translated to an NSString as (note the @ in front of id, which is how XPATH refers to an attribute):
@"//div[@id='sectionOne']/div[@id='sectionTwo']/a/p/h2";
If your breakpoint or NSLog shows a nil/null object, don't despair! Check your path again, check your quotes, and make sure you are naming the correct tag and the correct attribute (id, class, etc.). If you still have trouble, shorten your path. For instance, in the above, you would begin by removing the h2, and if that still doesn't work, just keep removing steps until you get back an array that you can use.
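
The "keep removing until something comes back" routine can be automated. Here is a minimal sketch, not anything HPPLE ships with: ShorterXPathCandidates and FirstNonEmptyMatch are names I made up, the splitting is naive (it assumes no "/" inside a predicate such as [@href='/x']), and the code assumes ARC. The search itself is passed in as a block so the helper only needs Foundation.

```objc
#import <Foundation/Foundation.h>

// Returns the path itself, then the same path with one trailing step removed
// each time. Hypothetical helper; naive about "/" inside predicates.
static NSArray *ShorterXPathCandidates(NSString *xpath) {
    NSMutableArray *candidates = [NSMutableArray array];
    NSString *current = xpath;
    // Stop once nothing is left but the leading "/" or "//".
    while (current.length > 0 && ![current hasSuffix:@"/"]) {
        [candidates addObject:current];
        NSRange lastSlash = [current rangeOfString:@"/" options:NSBackwardsSearch];
        if (lastSlash.location == NSNotFound) {
            break;
        }
        current = [current substringToIndex:lastSlash.location];
    }
    return candidates;
}

// Runs `search` on each candidate, logging the hit count, and returns the
// first non-empty result (or an empty array if nothing matched at all).
static NSArray *FirstNonEmptyMatch(NSString *xpath, NSArray *(^search)(NSString *query)) {
    for (NSString *candidate in ShorterXPathCandidates(xpath)) {
        NSArray *nodes = search(candidate);
        NSLog(@"%@ -> %lu node(s)", candidate, (unsigned long)nodes.count);
        if (nodes.count > 0) {
            return nodes;
        }
    }
    return @[];
}
```

With HPPLE, the block would simply wrap the parser you already have, for example by returning [parser searchWithXPathQuery:query], where parser is your own TFHpple instance. The log then shows exactly which step of the path stops matching.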

2- Access arrays, and access dictionaries

Sometimes the array that your XPATH query returns is freakish looking: a monster string that is way too long to inspect in one piece. In such cases, use fast enumeration! You can print out your array or dictionary one element at a time via:

for (TFHppleElement *element in yourArray) {
 NSLog(@"\n%@\n\n", element);
}

You will often notice that your array is a combination of strings, dictionaries, and/or arrays. If the data structure gives you trouble, a couple of approaches may help. One is to add a custom method to your TFHppleElement header and implementation files for the node(s) you keep running into in your XPATH. Another is to identify whether you are dealing with a string, an array, or a dictionary, and then use the proper tool for each. The common ones are the 'content' method on TFHppleElement, array brackets, and key lookups such as valueForKeyPath:@"string". You will get much further by breaking a large data structure down one piece at a time than by trying to narrow it down in one stroke.
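
To make the "identify what you are dealing with" step concrete, here is a small sketch that walks a nested structure and writes down the kind of object at each level. DescribeStructure is a hypothetical helper of my own, it assumes ARC, and it assumes dictionary keys are strings so they can be sorted.

```objc
#import <Foundation/Foundation.h>

// Appends one line per node to `output`, indented by depth, naming the kind
// of object found so you know which accessor to reach for next.
static void DescribeStructure(id object, NSUInteger depth, NSMutableString *output) {
    NSString *indent = [@"" stringByPaddingToLength:depth * 2
                                         withString:@" "
                                    startingAtIndex:0];
    if ([object isKindOfClass:[NSDictionary class]]) {
        NSDictionary *dictionary = object;
        [output appendFormat:@"%@dictionary (%lu keys)\n", indent, (unsigned long)dictionary.count];
        // Sort the keys (assumed to be strings) so the printout is stable.
        NSArray *keys = [dictionary.allKeys sortedArrayUsingSelector:@selector(compare:)];
        for (id key in keys) {
            [output appendFormat:@"%@  %@:\n", indent, key];
            DescribeStructure(dictionary[key], depth + 2, output);
        }
    } else if ([object isKindOfClass:[NSArray class]]) {
        NSArray *array = object;
        [output appendFormat:@"%@array (%lu items)\n", indent, (unsigned long)array.count];
        for (id item in array) {
            DescribeStructure(item, depth + 1, output);
        }
    } else if ([object isKindOfClass:[NSString class]]) {
        [output appendFormat:@"%@string: %@\n", indent, object];
    } else {
        // Anything else is reported by class name only.
        [output appendFormat:@"%@%@\n", indent, NSStringFromClass([object class])];
    }
}
```

To use it, create an NSMutableString, pass in whatever piece you are stuck on (an element's attributes dictionary, for example), and NSLog the result. Once you can see which level is a dictionary and which is an array, picking between a key lookup and array brackets gets a lot less mysterious.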

3- Regular expressions and string methods

Despite the awesomeness of HPPLE, you will sometimes reach a dead end where you are no longer working with XML and are left with excessive whitespace and/or nasty tags. NSString methods and/or regular expressions can take care of such mishigas. Deal with what you have, and you will find at least some form of happiness with HPPLE, as many iOS programmers have before you.
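
As an illustration of that kind of cleanup, here is a rough sketch using only Foundation. CleanScrapedText is a name I invented, it assumes ARC, and the tag pattern is deliberately crude: it is good enough for stray leftovers like <b> or <br/>, not for parsing real HTML.

```objc
#import <Foundation/Foundation.h>

// Replaces leftover tags with a space, collapses runs of whitespace into a
// single space, and trims both ends. Hypothetical helper for illustration.
static NSString *CleanScrapedText(NSString *raw) {
    if (raw.length == 0) {
        return @"";
    }
    NSRegularExpression *tags =
        [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:0 error:NULL];
    NSString *withoutTags =
        [tags stringByReplacingMatchesInString:raw
                                       options:0
                                         range:NSMakeRange(0, raw.length)
                                  withTemplate:@" "];

    NSRegularExpression *whitespace =
        [NSRegularExpression regularExpressionWithPattern:@"\\s+" options:0 error:NULL];
    NSString *collapsed =
        [whitespace stringByReplacingMatchesInString:withoutTags
                                             options:0
                                               range:NSMakeRange(0, withoutTags.length)
                                        withTemplate:@" "];

    return [collapsed stringByTrimmingCharactersInSet:
                [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
```

For example, passing in a messy string such as @"  <b>Price:</b>\n\t $19.99 <br/> " gives back @"Price: $19.99".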

Alright!!

I hope this gives you a little bit of motivation in your pursuit of parsing XML, including HTML. If you are in the middle of a project right now, hang in there! There are great alternatives to HPPLE, but honestly your choice will probably not change the growing pains that much, AND HPPLE is a great and powerful tool!

Sometimes, though, the XML is embedded and not accessible. You can parse everything that is accessible by using /html as the path, and then do a basic find to see whether or not your element is in there. If it is not, look for a different URL, or you may have to turn to another tool and/or approach for your immediate needs. For more, check out the example project I made to parse data from Toys R Us on GitHub.
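
A related quick check is to do the "basic find" on the downloaded page source itself, before any XPATH is involved. This is only a sketch of that idea, not the /html query described above: PageSourceContains is a made-up name, it assumes ARC, and it assumes the page is UTF-8 with a Latin-1 fallback.

```objc
#import <Foundation/Foundation.h>

// Returns YES if `needle` appears anywhere in the downloaded page source.
// Hypothetical helper; case-insensitive plain-text search, no parsing.
static BOOL PageSourceContains(NSData *htmlData, NSString *needle) {
    if (htmlData.length == 0 || needle.length == 0) {
        return NO;
    }
    NSString *source = [[NSString alloc] initWithData:htmlData
                                             encoding:NSUTF8StringEncoding];
    if (source == nil) {
        // Not valid UTF-8; fall back to Latin-1.
        source = [[NSString alloc] initWithData:htmlData
                                       encoding:NSISOLatin1StringEncoding];
    }
    if (source == nil) {
        return NO;
    }
    NSRange found = [source rangeOfString:needle options:NSCaseInsensitiveSearch];
    return found.location != NSNotFound;
}
```

If a search for something like @"id=\"sectionTwo\"" comes back NO on the same NSData you hand to HPPLE, no amount of path tweaking will help, which is the cue to look for a different URL.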
