Validating XML requests with Python and lxml

While working on pycsw, we found that a significant amount of code was involved in processing the HTTP POST requests coming across as XML. Since lxml is used for XML support, why not use its native XML validation facilities? We implemented this rather quickly, but found that validation was taking up to 10 seconds. Why?
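For reference, this is roughly what validating a POST body with lxml's XMLSchema class looks like. It is a minimal sketch, assuming a tiny hypothetical schema (the `http://example.org/req` namespace and `validate_request` are made up for illustration) in place of the real CSW schemas, which are far larger and import many other OGC schemas.

```python
from lxml import etree

# Hypothetical stand-in for a CSW request schema; the real OGC schemas are far larger.
REQUEST_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://example.org/req"
    targetNamespace="http://example.org/req"
    elementFormDefault="qualified">
  <xs:element name="GetRecords">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="maxRecords" type="xs:positiveInteger"/>
      </xs:sequence>
      <xs:attribute name="service" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Compile once at import time; compiling per request would repeat the work.
REQUEST_SCHEMA = etree.XMLSchema(etree.fromstring(REQUEST_XSD))


def validate_request(body):
    """Check an XML POST body (bytes) against REQUEST_SCHEMA.

    Returns (True, "valid") or (False, <reason>)."""
    try:
        doc = etree.fromstring(body)
    except etree.XMLSyntaxError as err:
        return False, "not well-formed: %s" % err
    if REQUEST_SCHEMA.validate(doc):
        return True, "valid"
    # error_log holds the errors from the most recent validate() call.
    return False, REQUEST_SCHEMA.error_log.last_error.message
```

Returning a reason string rather than raising keeps it easy to turn a failure into an error response. lxml can also fold validation into parsing by passing the schema to `etree.XMLParser(schema=...)`, in which case an invalid document raises `XMLSyntaxError`.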

In lxml, you have to specify the XML Schema to validate against explicitly, even if the document already names one in xsi:schemaLocation. Being a purist, I set this to fetch the schema on the fly from http://schemas.opengis.net. The fetch was causing much of the bottleneck, so I decided to download all required OGC CSW schemas locally and ship them as part of the implementation. That should work, right? Validation was down to about 6 seconds.
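Loading a schema from a local file instead of a URL only changes what is passed to `etree.XMLSchema`. A minimal sketch, assuming a hypothetical `capabilities.xsd` written to a temporary directory; `load_local_schema` is an illustrative helper that also caches the compiled schema so it is built once rather than on every request.

```python
import functools
import os
import tempfile

from lxml import etree


@functools.lru_cache(maxsize=None)
def load_local_schema(path):
    """Compile an XML Schema from a file on disk once and reuse it.

    No HTTP fetch is needed for this file itself, although xs:import
    declarations inside it may still point at remote URLs."""
    return etree.XMLSchema(file=path)


# Demo setup: write a tiny hypothetical schema to a temporary directory.
SCHEMA_DIR = tempfile.mkdtemp()
SCHEMA_PATH = os.path.join(SCHEMA_DIR, "capabilities.xsd")
with open(SCHEMA_PATH, "wb") as fh:
    fh.write(b'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
             b'<xs:element name="Capabilities" type="xs:string"/>'
             b'</xs:schema>')

schema = load_local_schema(SCHEMA_PATH)
```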

The issue was that even though the schemas were local, many of the xs:import declarations within them still pointed back to absolute URLs at schemas.opengis.net, so remote fetches were still happening. After modifying the schemas to point to relative locations, validation was extremely fast (well under a second).
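The rewrite itself can be scripted. A minimal sketch, assuming the local copies mirror the directory layout of schemas.opengis.net (so a file fetched from `http://schemas.opengis.net/filter/1.1.0/filter.xsd` lives at `filter/1.1.0/filter.xsd` locally); `localize_imports` and the example paths are hypothetical. It rewrites `schemaLocation` on xs:import and xs:include elements in one schema file to paths relative to that file.

```python
import posixpath

from lxml import etree

XS = "http://www.w3.org/2001/XMLSchema"
REMOTE_BASE = "http://schemas.opengis.net/"  # prefix to rewrite (assumed mirror root)


def localize_imports(xsd_bytes, schema_relpath):
    """Rewrite absolute schemaLocation URLs under REMOTE_BASE into paths
    relative to schema_relpath, the schema's own path inside the local
    mirror. Returns (rewritten_bytes, number_of_rewrites)."""
    root = etree.fromstring(xsd_bytes)
    here = posixpath.dirname(schema_relpath) or "."
    count = 0
    for el in root.iter("{%s}import" % XS, "{%s}include" % XS):
        loc = el.get("schemaLocation")
        if loc and loc.startswith(REMOTE_BASE):
            target = loc[len(REMOTE_BASE):]
            # e.g. filter/1.1.0/filter.xsd seen from csw/2.0.2 -> ../../filter/1.1.0/filter.xsd
            el.set("schemaLocation", posixpath.relpath(target, here))
            count += 1
    return etree.tostring(root, xml_declaration=True, encoding="UTF-8"), count
```

Relative locations are resolved against the importing file's own location, which is why the path is computed from the schema's directory rather than from the mirror root.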

Lesson learned: just because XML schemas are local doesn’t mean they don’t point to remote URLs (though I’m not exactly sure why one would build a schema with non-local imports if one doesn’t have to).
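One way to act on this is to check a local schema directory for leftover remote references before deploying. A minimal sketch; `find_remote_refs` is a hypothetical helper that walks a directory and flags every xs:import, xs:include, or xs:redefine whose `schemaLocation` is still an absolute http(s) URL. It only parses each file as plain XML, so it does not trigger any of those fetches itself.

```python
import os

from lxml import etree

XS = "http://www.w3.org/2001/XMLSchema"
REF_TAGS = ["{%s}%s" % (XS, t) for t in ("import", "include", "redefine")]


def find_remote_refs(schema_dir):
    """Return (file, schemaLocation) pairs for schema references in
    schema_dir that still point at absolute http(s) URLs."""
    hits = []
    for dirpath, _dirs, files in os.walk(schema_dir):
        for name in sorted(files):
            if not name.endswith(".xsd"):
                continue
            path = os.path.join(dirpath, name)
            root = etree.parse(path).getroot()
            for el in root.iter(*REF_TAGS):
                loc = el.get("schemaLocation", "")
                if loc.startswith(("http://", "https://")):
                    hits.append((path, loc))
    return hits
```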


Modified: 21 April 2011 08:44:07 EST