BlogMatrix
 

Almost Universal Microformats Parser and hResume

David P. Janes, 2006-09-16 20:21 UTC, 15 comments

One of the projects we've been pursuing over the last year is the "Almost Universal Microformats Parser" (AUMFP), a Python library that does a pretty good job of breaking apart microformats. You can run the AUMFP against any webpage here or download the source (currently an old version) here.

We've made quite a few modifications in the last two weeks and we're getting ready to release a new version of the source, along with a few tools based on this project. We'll wait until we have more documentation and testing in place before we do the release, though. The parser now handles the include-pattern, and the interface makes a lot more sense.
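For anyone who hasn't met the include-pattern: it lets a microformat pull in an element defined elsewhere on the page, using `<object class="include" data="#id">` or `<a class="include" href="#id">`. The following is only a minimal sketch of the idea, not the AUMFP's actual code. It assumes well-formed XHTML that `xml.etree.ElementTree` can parse, and the function name `resolve_includes` is made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def resolve_includes(root):
    """Replace include-pattern elements with a copy of the element they point to.

    Hypothetical helper for illustration; it handles only same-document
    references ("#id") and does not resolve includes nested inside the
    copied element.
    """
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}

    # Snapshot the tree first so we can safely swap children while looping.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            classes = (child.get("class") or "").split()
            if "include" not in classes:
                continue
            # <object> carries the reference in data, <a> in href.
            ref = child.get("data") if child.tag == "object" else child.get("href")
            if not ref or not ref.startswith("#"):
                continue
            target = by_id.get(ref[1:])
            if target is None:
                continue  # dangling reference: leave the include untouched
            clone = copy.deepcopy(target)
            clone.attrib.pop("id", None)  # avoid duplicate ids in the tree
            clone.tail = child.tail       # keep the text that followed the include
            parent[index] = clone
    return root
```

After this pass, a parser can treat the included element as if it had been written inline, which is handy for hResume, where the same contact details are often referenced from several places.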

As part of the update, we've extended the parser to handle hResume and tested it against the samples pointed to on the Wiki. We also tried to identify places where documents don't conform to the proposed standard and to document them as quirks (in general, we write our software to fail-as-last-resort). Here are the results of our testing:

Note that the "contact" quirk may be a misinterpretation on my part.
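To give a feel for what "fail-as-last-resort" with quirks might look like, here is a rough sketch. It is not the AUMFP's real interface: the function `parse_hresume`, the result layout, and the quirk names are all assumptions invented for this example, and it again assumes markup that `xml.etree.ElementTree` can parse. The idea is to record each deviation and keep going, giving up only when there is no hResume at all.

```python
import xml.etree.ElementTree as ET


def has_class(el, name):
    """True if the element's class attribute contains the given class name."""
    return name in (el.get("class") or "").split()


def find_by_class(root, name):
    """All elements (root included) carrying the class, in document order."""
    return [el for el in root.iter() if has_class(el, name)]


def parse_hresume(root):
    """Lenient hResume contact extraction (hypothetical, for illustration).

    Returns None only when no hresume element exists; every other
    problem is recorded in result["quirks"] instead of raising.
    """
    resumes = find_by_class(root, "hresume")
    if not resumes:
        return None  # the last resort: nothing to parse

    result = {"contact": None, "quirks": []}
    cards = find_by_class(resumes[0], "vcard")
    if not cards:
        # Quirk names here are made up, not the AUMFP's own labels.
        result["quirks"].append("no-contact")
        return result

    card = cards[0]
    if not has_class(card, "contact"):
        # An hCard is present but not marked as the contact; use it anyway.
        result["quirks"].append("contact-class-missing")

    names = find_by_class(card, "fn")
    if names:
        result["contact"] = "".join(names[0].itertext()).strip()
    else:
        result["quirks"].append("contact-no-fn")
    return result
```

A caller gets usable data plus a list of what was off, which can then be reported per sample document.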