{"id":559,"date":"2011-05-06T09:19:46","date_gmt":"2011-05-06T13:19:46","guid":{"rendered":"http:\/\/www.kralidis.ca\/blog\/?p=559"},"modified":"2011-05-06T09:21:39","modified_gmt":"2011-05-06T13:21:39","slug":"csw-and-repository-thoughts","status":"publish","type":"post","link":"https:\/\/www.kralidis.ca\/blog\/2011\/05\/06\/csw-and-repository-thoughts\/","title":{"rendered":"CSW and repository thoughts"},"content":{"rendered":"<p><a title=\"CSW\" href=\"http:\/\/www.opengeospatial.org\/standards\/cat\">CSW<\/a> allows for querying various metadata models (e.g. Dublin Core, ISO).\u00a0 In <a title=\"pycsw\" href=\"http:\/\/pycsw.org\/\">pycsw<\/a>, our current model is to manage one repository per metadata model (or &#8216;typename&#8217; in CSW speak).\u00a0 That said, we set up each repository to have one column per &#8216;queryable&#8217; (as defined in CSW and application profiles), which we parse when loading metadata.\u00a0 We also store the full metadata record as is (for GetRecords ElementSetName=&#8216;full&#8217; requests).<\/p>\n<p>Complexity increases as we start thinking about support for more information models, and transforming to\/from requested information models (via CSW GetRecords\/GetRecordById &#8216;outputSchema&#8217; parameter).\u00a0 Having said this, I&#8217;ve started to think about a core, agnostic information model which any metadata format could map to (as a lowest common denominator).\u00a0 This way, pycsw will always know the core information model queryables, which could be stored in columns as we do now.\u00a0 The underlying queries would always run against the queryable columns.\u00a0 Aside: it would be great to have a GDAL for metadata (MDAL anyone?).<\/p>\n<p>But what about a unified repository where just the metadata is stored in full (<a title=\"GeoNetwork\" href=\"http:\/\/geonetwork-opensource.org\/\">GeoNetwork<\/a> does it like this)?\u00a0 In this scenario, we would need heavy use of XPath queries on the full 
XML document in real time.\u00a0 The advantages would be a.) less parsing on metadata loading, b.) a single repository is always loaded\/queried, and c.) less configuration for the catalog administrator.<\/p>\n<p>I like the use of XPath, but wonder how this scales as additional databases are supported.\u00a0 We currently support SQLite, which is great for simplicity (and Python SQLite bindings allow for mapping Python functions).\u00a0 SQLite has no XPath support (but we could support this with Python bindings).\u00a0 PostgreSQL does (if you build with <a title=\"libxml2\" href=\"http:\/\/xmlsoft.org\/\">libxml2<\/a>), as does MySQL.\u00a0 As well, I&#8217;m not sure about the performance implications (and how deep in the database fetch XPath queries are evaluated, i.e. the entire XML document would have to be serialized before XPath queries are executed).<\/p>\n<p>Thoughts on a Friday morning.\u00a0 Does anyone have any advice\/insight?<\/p>\n<p>&nbsp;<\/p>\n<link rel=\"stylesheet\" href=\"http:\/\/cdn.leafletjs.com\/leaflet-0.5\/leaflet.css\" \/>\n<!--[if lte IE 8]>\n  <link rel=\"stylesheet\" href=\"http:\/\/cdn.leafletjs.com\/leaflet-0.5\/leaflet.ie.css\" \/>\n<![endif]-->\n<script src=\"http:\/\/cdn.leafletjs.com\/leaflet-0.5\/leaflet.js\"><\/script>\n<style type=\"text\/css\">#map559 { width: 300px; height: 200px; }<\/style>\n\n<div id=\"map559\"><\/div>\n<script type=\"text\/javascript\">\n  var map559 = L.map('map559').setView([43.620495, -79.513198], 10);\n  L.tileLayer('http:\/\/{s}.tile.osm.org\/{z}\/{x}\/{y}.png', {\n      attribution: '&copy; <a href=\"http:\/\/osm.org\/copyright\">OpenStreetMap<\/a> contributors'\n  }).addTo(map559);\n<\/script>\n","protected":false},"excerpt":{"rendered":"<p>CSW allows for querying various metadata models (e.g. 
Dublin Core, ISO).\u00a0 In pycsw, our current model is to manage one repository per metadata model (or &#8216;typename&#8217; in CSW speak).\u00a0 That said, we set up each repository to have one column per &#8216;queryable&#8217; (as defined in CSW and application profiles), which we parse when loading metadata.\u00a0 We [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,7,3,11],"tags":[],"class_list":["post-559","post","type-post","status-publish","format-standard","hentry","category-geospatial","category-open-source","category-technology","category-web"],"_links":{"self":[{"href":"https:\/\/www.kralidis.ca\/blog\/wp-json\/wp\/v2\/posts\/559","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kralidis.ca\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kralidis.ca\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kralidis.ca\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kralidis.ca\/blog\/wp-json\/wp\/v2\/comments?post=559"}],"version-history":[{"count":3,"href":"https:\/\/www.kralidis.ca\/blog\/wp-json\/wp\/v2\/posts\/559\/revisions"}],"predecessor-version":[{"id":561,"href":"https:\/\/www.kralidis.ca\/blog\/wp-json\/wp\/v2\/posts\/559\/revisions\/561"}],"wp:attachment":[{"href":"https:\/\/www.kralidis.ca\/blog\/wp-json\/wp\/v2\/media?parent=559"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kralidis.ca\/blog\/wp-json\/wp\/v2\/categories?post=559"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kralidis.ca\/blog\/wp-json\/wp\/v2\/tags?post=559"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}