CSW and repository thoughts

CSW allows for querying various metadata models (e.g. Dublin Core, ISO).  In pycsw, our current model is to manage one repository per metadata model (or ‘typename’ in CSW speak).  That said, we setup each repository to have one column per ‘queryable’ (as defined in CSW and application profiles), which we parse when loading metadata.  We also store the full metadata record as is (for GetRecords ElementSetName=’full’ requests).

Complexity increases as we start thinking about support for more information models, and transforming to/from requested information models (via CSW GetRecords/GetRecordById ‘outputSchema’ parameter).  Having said this, I’ve started to think about a core, agnostic information model which any metadata format could map to (for lowest common denominator).  This way, pycsw will always know the core information model queryables, which could be stored in columns as we currently do now.  The underlying queries would always query against the queryable columns.  Aside: it would be great to have a GDAL for metadata (MDAL anyone?).

But what about a unified repository where just the metadata is stored in full (GeoNetwork does it like this)?  In this scenario, we would need heavy use of XPath queries on the full XML document in realtime.  The advantage would be a.) less parsing on metadata loading b.) one repository is always loaded/queried c.) less configuration for the catalog administrator.

I like the use of XPath, but wonder about how this scales as additional databases are supported.  We currently support SQLite, which is great for simplicity (and Python SQLite bindings allow for mapping Python functions).  SQLite has no XPath support (but we could support this with Python bindings).  PostgreSQL does (if you build with libxml2), as does MySQL.  As well, I’m not sure about the performance implications (and how deep XPath queries are in the database fetching, i.e. the entire XML document would have to be serialized before XPath queries are executed).

Thoughts on a Friday morning.  Anyone have any advice/insight?


Leave a Comment

Name: (Required)

E-mail: (Required)



Modified: 6 May 2011 09:21:39 EST