Validating XML requests with Python and lxml

While working on pycsw, we found that there was a significant amount of code involved in processing the HTTP POST requests coming across as XML.  Since lxml is already used for XML support, why not use its native XML validation facilities?  We implemented this rather quickly, but found validation was taking up to 10 seconds.  Why?

In lxml, you have to specify an XML Schema to validate against, even if one is specified in xsi:schemaLocation.  Being a purist, I set this to fetch the schema on the fly from http://schemas.opengis.net.  The fetch was causing much of the bottleneck, so I decided to download all required OGC CSW schemas locally and have them as part of the implementation.  That should work, right?  Validation was down to about 6 seconds.
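To make the "specify a schema" step concrete, here is a minimal sketch of validating a request body against an explicitly loaded schema.  It is not the pycsw code: the tiny inline schema and the `is_valid` helper are made up for illustration, and a real setup would parse a local copy of the CSW schema from disk instead.

```python
from io import BytesIO

from lxml import etree

# Hypothetical stand-in for a locally stored schema; a real deployment
# would call etree.parse() on a local .xsd file instead.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="GetRecords">
    <xs:complexType>
      <xs:attribute name="service" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Compile the schema once at import time rather than on every request.
SCHEMA = etree.XMLSchema(etree.parse(BytesIO(XSD)))


def is_valid(request_body: bytes) -> bool:
    """Return True if the request is well-formed and valid against SCHEMA."""
    try:
        doc = etree.fromstring(request_body)
    except etree.XMLSyntaxError:
        return False
    return SCHEMA.validate(doc)
```

Compiling the schema once and reusing it keeps any import resolution out of the per-request path, whatever the imports end up pointing at.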

The issue here was that even though the schemas were local, many xs:import definitions within them were pointing back to absolute URLs at schemas.opengis.net.  After modifying the schemas to point to relative locations, validation was extremely fast (way under a second).
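The rewrite itself can be scripted.  The following is a rough sketch, not what was actually run on the pycsw schemas: it assumes the local copies mirror the directory layout of schemas.opengis.net, and the `relativize_imports` name and its arguments are hypothetical.  It works on the raw text so that namespace prefixes in the schema are left untouched.

```python
import posixpath
import re

BASE_URL = "http://schemas.opengis.net/"

# Matches schemaLocation="http://schemas.opengis.net/<path>" as found on
# xs:import / xs:include elements (double-quoted attributes only).
_LOCATION = re.compile(
    r'(schemaLocation\s*=\s*")' + re.escape(BASE_URL) + r'([^"]+)"'
)


def relativize_imports(xsd_text: str, schema_path: str) -> str:
    """Rewrite absolute schemaLocation URLs as paths relative to this schema.

    schema_path is the schema's own path relative to the mirror root,
    e.g. "csw/2.0.2/CSW-discovery.xsd".
    """
    here = posixpath.dirname(schema_path) or "."

    def _swap(match):
        relative = posixpath.relpath(match.group(2), start=here)
        return f'{match.group(1)}{relative}"'

    return _LOCATION.sub(_swap, xsd_text)
```

For a schema stored at csw/2.0.2/CSW-discovery.xsd, an import of http://schemas.opengis.net/ows/1.0.0/owsAll.xsd would become ../../ows/1.0.0/owsAll.xsd.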

Lesson learned: just because XML schemas are local, doesn’t mean they don’t point to remote URLs (though I’m not exactly sure why one would build a schema with non-local imports if they don’t have to).

3 Comments so far »

  1. Stephan Meissl said,

    Wrote on April 22, 2011 @ 09:26:09

    Hi Tom,

    let me point you to XML catalogs, which do the rewriting for you on the fly. You just have to write an XML catalog file similar to the example below and point the environment variable XML_CATALOG_FILES to it (export XML_CATALOG_FILES="").

    cu
    Stephan

  2. Stephan Meissl said,

    Wrote on April 22, 2011 @ 09:29:17

    Apparently XML is not allowed here so I posted the example at: http://pastebin.ca/2049366

    cu
    Stephan

  3. Jorge de Jesus said,

    Wrote on September 19, 2011 @ 17:17:24

    Hi tom

    I also had similar problems with OGC validation + WSDL validation, and since my systems don’t have catalogs I had to set up a reverse proxy. See the OGC/W3C reverse proxy section of the Taverna installation page in the pywps wiki:

    http://wiki.rsg.pml.ac.uk/pywps/Installation_Taverna

Modified: 21 April 2011 08:44:07 EST