Archive for open source

Friday Metadata Thoughts

Not the most exciting topic, but I’ve found myself knee deep in metadata standards as they pertain to CSW in the last couple of weeks.

I’ve made some recommendations in the past for OWS metadata, which have helped in establishing publishing requirements for cataloguing.

Starting to look at ISO metadata (data, service) makes you quickly realize the pros and cons which come with making a standard flexible and exhaustive.  Let’s take 19139; almost everything in the schema is optional.  I think this is where profiles (such as ISO North American Profile) start to become especially important.

19119 is in the same boat.  Aside: then you start to wonder about the overlap between 19119 and OWS Capabilities metadata.  Wouldn’t it be nice if GetCapabilities returned a 19119 document instead?  Which could plop nicely in a CSW query response as well. Oh wait, it already does.  But then try to validate the document instance.  You’ll find that OGC CSW and ISO use different versions of GML (follow the refs in the .xsd’s and you’ll see them soon enough), yet apply them to the same namespace.  So validation fails.  Harmonization required!

Having said this, this is very complicated metadata which can be addressed by intelligent tools.  Tools that:

  • integrate with GIS systems which can automagically populate by:
    • fetching spatial extents
    • fetching reference system definitions
    • establishing hierarchy (this would be tough as it would be tied to the data management of the system)
    • fetching contact information from a given user profile (how about getting this from the network’s email / global address book against the logged in user?)
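
As a rough illustration of the auto-population idea above, here is a minimal sketch, assuming the GIS has already handed over a list of (x, y) coordinates, a reference system code and a user profile.  Every name in it (autopopulate, the dictionary keys, the sample contact) is hypothetical and not tied to any particular GIS or metadata standard:

```python
def autopopulate(coords, crs, profile):
    """Build a partial metadata record from information a GIS already has.

    coords  -- iterable of (x, y) tuples taken from the dataset's geometry
    crs     -- reference system identifier, e.g. 'EPSG:4326'
    profile -- dict describing the logged in user (hypothetical keys)
    """
    coords = list(coords)  # allow generators, since we iterate twice
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {
        'extent': (min(xs), min(ys), max(xs), max(ys)),  # minx, miny, maxx, maxy
        'reference_system': crs,
        'contact': {
            'name': profile.get('name'),
            'email': profile.get('email'),
        },
    }


record = autopopulate(
    [(-75.0, 45.0), (-79.4, 43.7), (-123.1, 49.3)],
    'EPSG:4326',
    {'name': 'Jane Doe', 'email': 'jane@example.org'},
)
print(record['extent'])
```

The point is only that extent, reference system and contact can be derived without the user typing anything; hierarchy is left out because it depends on how the system manages its data.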

Then again, what about keeping it simple and mainstream friendly?  The toughest part is metadata creation, so let’s make it as easy as possible to do so!

How do your activities try to make metadata easier to create?

new stuff in OWSLib

I’ve been spending a lot of time lately writing a CSW client library in Python, which was committed today to OWSLib.  CSW requests can be tricky to construct correctly, so this contribution attempts to provide an easy enough entry point to querying OGC Catalogues.

At this point, you can query your favourite CSW server with:

>>> from owslib import csw
>>> c = csw.request('http://example.org/csw')
>>> c.GetCapabilities() # constructs XML request, in c.request
>>> c.fetch() # HTTP call.  Result in c.response
>>> c.GetRecords('dataset','birds',[-152,42,-52,84]) # birds datasets in Canada       
>>> c.fetch()
>>> c.GetRecords('service','frog') # look for services with frogs, anywhere
>>> c.fetch()

That’s pretty much all there is to it.  There’s also support for DescribeRecord, GetRecordById and GetDomain for the adventurous.

I hope this will be a valuable addition.  Because CSW uses Filter, I broke things out into a module per standard, so that other code can reuse, say, filter for request building and response parsing.  A colleague is using this functionality to write a QGIS CSW search plugin.
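
To give a flavour of why these requests are fiddly, here is a minimal sketch of building a CSW 2.0.2 GetRecords request body by hand with ElementTree.  This is not OWSLib's implementation: the function name build_getrecords and its parameters are made up for illustration, and a full text search against csw:AnyText is just one of many possible filters.

```python
import xml.etree.ElementTree as ET

CSW = 'http://www.opengis.net/cat/csw/2.0.2'
OGC = 'http://www.opengis.net/ogc'


def build_getrecords(keyword, max_records=10):
    """Return a GetRecords request (bytes) doing a full text search."""
    # use the conventional prefixes instead of ns0, ns1, ...
    ET.register_namespace('csw', CSW)
    ET.register_namespace('ogc', OGC)

    root = ET.Element('{%s}GetRecords' % CSW, {
        'service': 'CSW',
        'version': '2.0.2',
        'resultType': 'results',
        'maxRecords': str(max_records),
    })
    query = ET.SubElement(root, '{%s}Query' % CSW, {'typeNames': 'csw:Record'})
    ET.SubElement(query, '{%s}ElementSetName' % CSW).text = 'full'

    # the constraint wraps an OGC Filter (version 1.1.0) expression
    constraint = ET.SubElement(query, '{%s}Constraint' % CSW, {'version': '1.1.0'})
    flt = ET.SubElement(constraint, '{%s}Filter' % OGC)
    like = ET.SubElement(flt, '{%s}PropertyIsLike' % OGC, {
        'wildCard': '%', 'singleChar': '_', 'escapeChar': '\\',
    })
    ET.SubElement(like, '{%s}PropertyName' % OGC).text = 'csw:AnyText'
    ET.SubElement(like, '{%s}Literal' % OGC).text = '%' + keyword + '%'

    return ET.tostring(root)


request = build_getrecords('birds')
print(request.decode('utf-8'))
```

Note that typeNames and PropertyName carry the csw: prefix inside plain text, so the prefix declared on the root element has to match; that is one of the easy things to get wrong when writing these by hand.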

My next goal will be to put in some response handling.  This will be tricky given the various outputSchemas a given CSW advertises.  For now, I will concentrate on the default csw:Record (a glorified Dublin Core with ows:BoundingBox).
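
For a sense of what handling the default csw:Record could look like, here is a minimal sketch with ElementTree.  It is an assumption about one possible approach, not what OWSLib does: the sample response is hand-written and trimmed, and parse_records and corner are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

NS = {
    'csw': 'http://www.opengis.net/cat/csw/2.0.2',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'ows': 'http://www.opengis.net/ows',
}

# Hand-written sample response, trimmed to the parts parsed below.
SAMPLE = """<csw:GetRecordsResponse
  xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:ows="http://www.opengis.net/ows">
 <csw:SearchResults numberOfRecordsMatched="1" numberOfRecordsReturned="1">
  <csw:Record>
   <dc:identifier>abc-123</dc:identifier>
   <dc:title>Bird survey</dc:title>
   <ows:BoundingBox crs="urn:ogc:def:crs:EPSG::4326">
    <ows:LowerCorner>42 -152</ows:LowerCorner>
    <ows:UpperCorner>84 -52</ows:UpperCorner>
   </ows:BoundingBox>
  </csw:Record>
 </csw:SearchResults>
</csw:GetRecordsResponse>"""


def corner(bbox, name):
    """Return a corner as a list of floats, in document order, or None."""
    if bbox is None:
        return None
    return [float(v) for v in bbox.findtext(name, default='', namespaces=NS).split()]


def parse_records(xml_text):
    """Pull identifier, title and bounding box out of each csw:Record."""
    records = []
    root = ET.fromstring(xml_text)
    for rec in root.findall('.//csw:Record', NS):
        bbox = rec.find('ows:BoundingBox', NS)
        records.append({
            'identifier': rec.findtext('dc:identifier', namespaces=NS),
            'title': rec.findtext('dc:title', namespaces=NS),
            'lower': corner(bbox, 'ows:LowerCorner'),
            'upper': corner(bbox, 'ows:UpperCorner'),
        })
    return records


for record in parse_records(SAMPLE):
    print(record)
```

The corners are kept in document order on purpose, since which axis comes first depends on the advertised crs.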

So try it out;  comments, feedback and suggestions would be most valued.

Oh ya, thank you etree!

MapServer 5.4.0 released

Announced yesterday, this release closes 92 bugs, and adds some new goodies.

Next stop: MapServer 6.0

Sun, Oracle and MySQL

http://www.sun.com/third-party/global/oracle/.  I wonder what this will mean for MySQL?

Creating sitemap files for GeoNetwork

Sitemaps are a valuable way to index your content for web crawlers.  GeoNetwork is a great tool for metadata management and a portal environment for discovery.  I wanted to push all metadata resources out as a sitemap so that content can be found by web crawlers.  Python to the rescue:

#!/usr/bin/python
import MySQLdb
# connect to db
db=MySQLdb.connection(host='127.0.0.1', user='foo',passwd='foo',db='geonetwork')
# print out XML header
print """<?xml version="1.0" encoding="UTF-8"?>
<urlset
 xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
 xmlns:geo="http://www.google.com/geo/schemas/sitemap/1.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">"""

# fetch all metadata
db.query("""select id, schemaId, changeDate from Metadata where isTemplate = 'n'""")
r = db.store_result()

for row in r.fetch_row(0): # write out a url element
    if row[1] == 'fgdc-std':
        url = 'http://devgeo.cciw.ca/geonetwork/srv/en/fgdc.xml'
    elif row[1] == 'iso19139':
        url = 'http://devgeo.cciw.ca/geonetwork/srv/en/iso19139.xml'
    else: # no XML service known for this schema; skip the record
        continue
    print """ <url>
  <loc>%s?id=%s</loc>
  <lastmod>%s</lastmod>
  <geo:geo>
   <geo:format>%s</geo:format>
  </geo:geo>
 </url>""" % (url, row[0], row[2], row[1])
print '</urlset>'

Done!  It would be great if this were an out-of-the-box feature of GeoNetwork.
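
A possible variation on the script above, sketched under a few assumptions: it builds the sitemap with ElementTree so that values are escaped automatically, and it takes plain (id, schemaId, changeDate) tuples so it can run without a database.  The function name build_sitemap and the example.org base URL are placeholders, not anything GeoNetwork provides.

```python
import xml.etree.ElementTree as ET

SM = 'http://www.sitemaps.org/schemas/sitemap/0.9'
GEO = 'http://www.google.com/geo/schemas/sitemap/1.0'

# schemaId -> XML service name, as in the script above
SERVICES = {'fgdc-std': 'fgdc.xml', 'iso19139': 'iso19139.xml'}


def build_sitemap(rows, base='http://example.org/geonetwork/srv/en'):
    """Return sitemap XML (str) for rows of (id, schemaId, changeDate)."""
    ET.register_namespace('', SM)      # sitemap elements go in the default namespace
    ET.register_namespace('geo', GEO)
    urlset = ET.Element('{%s}urlset' % SM)
    for record_id, schema_id, change_date in rows:
        service = SERVICES.get(schema_id)
        if service is None:  # skip schemas with no known XML service
            continue
        url = ET.SubElement(urlset, '{%s}url' % SM)
        loc = '%s/%s?id=%s' % (base, service, record_id)
        ET.SubElement(url, '{%s}loc' % SM).text = loc
        ET.SubElement(url, '{%s}lastmod' % SM).text = change_date
        geo = ET.SubElement(url, '{%s}geo' % GEO)
        ET.SubElement(geo, '{%s}format' % GEO).text = schema_id
    return ET.tostring(urlset, encoding='unicode')


sitemap = build_sitemap([
    (1, 'iso19139', '2009-04-20T10:00:00'),
    (2, 'dublin-core', '2009-04-21T10:00:00'),  # skipped: no service mapped
])
print(sitemap)
```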

Using Python to parse config files

A lot of tools out there have some sort of configuration which is read at run time and used in the process accordingly.  When writing tools, my config file format has always been something like:

title: My Tool
# commented out line

description: This is my tool.  # another comment

Since I’m using Python for much of my scripting these days, I decided to write a small parser to handle this type of config.  So here’s what I’ve come up with:

import fileinput

def parse(file=None, delim=':'):
    '''
        Parses a config file formatted like:
        foo: bar
        # comments: out line
        - comments allowed (#)
        - empty lines allowed
        - spaces allowed

    '''

    d = {}

    if file is None:
        return -1

    for line in fileinput.input(file):
        line = line.split('#', 1)[0].strip() # drop full-line and trailing comments
        if not line: # skip empty, space padded or commented lines
            continue
        if delim not in line: # skip lines without a key and value pair
            continue
        key, value = line.split(delim, 1) # split on the first delimiter only
        d[key.strip()] = value.strip()
    return d

Seems to work well so far.  I wonder if there’s a config file standard out there?
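
One widely used convention is the INI format, which Python's standard library reads via configparser (named ConfigParser in Python 2).  A minimal sketch, assuming a [tool] section header is added to the file (configparser requires sections; the section name and the parse_ini helper are made up here) and that trailing # comments are switched on explicitly:

```python
import configparser

# Same content as the sample config above, plus a section header.
SAMPLE = """
[tool]
title: My Tool
# commented out line

description: This is my tool.  # another comment
"""


def parse_ini(text, section='tool'):
    """Return the key/value pairs of one section as a plain dict."""
    # inline comments are off by default, so enable '#' explicitly
    parser = configparser.ConfigParser(inline_comment_prefixes=('#',))
    parser.read_string(text)
    return dict(parser[section])


config = parse_ini(SAMPLE)
print(config['title'])
print(config['description'])
```

The trade-off is the mandatory section header; in exchange you get both ':' and '=' delimiters, comments and type helpers such as getint and getboolean for free.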

MapServer Disaster: you have got to be kidding me

http://n2.nabble.com/FW%3A-MapServer-enhancements-refactoring-project-td2571268.html

I’m beyond words at this point.

fun with Shapelib

We have some existing C modules which do a bunch of data processing, and wanted the ability to spit out shapefiles on demand.  Shapelib is a C library which allows for reading and writing shapefiles and dbf files.  Thanks to the API docs, here’s a pared down version of how to write a new point shapefile (with, in this case, one record):

#include <stdio.h>
#include <stdlib.h>
#include <libshp/shapefil.h>
/*
 build with: gcc -O -Wall -ansi -pedantic -g foo.c -L/usr/local/lib -lshp
*/
int main(void) {
    double *x;
    double *y;

    SHPHandle  hSHP;
    SHPObject *oSHP;
    DBFHandle  hDBF;

    x = malloc(sizeof(*x));
    y = malloc(sizeof(*y));

    /* create shapefile and dbf */
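    hSHP = SHPCreate("bar", SHPT_POINT);
    hDBF = DBFCreate("bar");
    if (hSHP == NULL || hDBF == NULL) { /* bail out if either file could not be created */
        if (hSHP != NULL) SHPClose(hSHP);
        if (hDBF != NULL) DBFClose(hDBF);
        free(x);
        free(y);
        return 1;
    }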

    DBFAddField(hDBF, "stationid", FTString, 25, 0);

    /* add record */
    x[0] = -75;
    y[0] = 45;
    oSHP = SHPCreateSimpleObject(SHPT_POINT, 1, x, y, NULL);
    SHPWriteObject(hSHP, -1, oSHP);
    DBFWriteStringAttribute(hDBF, 0, 0, "abcdef");

    /* destroy */
    SHPDestroyObject(oSHP);

    /* close shapefile and dbf */
    SHPClose(hSHP);
    DBFClose(hDBF);
    free(x);
    free(y);

    return 0;
}

Done!

Less Than 4 Hours

A benefit of open source.

< 4 hours.  That’s how long it took to address a MapServer bug in WMS 1.3.0.  Having been on the other side of these many times, it’s gratifying to bang out quick fixes as well.

Committing often 🙂

MapServer Code Sprint Progress

MapServer action from the Toronto Code Sprint 2009:

Paul has full details on his blog (day 1, day 2, day 3, day 4, post-mortem).  More details from Chris (day 1, day 2, day 3, day 4).  Also check out some pictures from the event.

Personally, I was happy to bang out fixes for:

  • optionally disabling SLD for WMS (#1395)
  • support for resultType=hits for WFS (#2907)
  • working code for WFS spatial filters against the new GEOS thread safe C API (#2929)
  • WFS 1.1.0 supporting OWS Common 1.0.0 instead of 1.1.0 (#2925)
  • The beginnings of support for correct axis ordering for WFS 1.1.0 (#2899)

Good times!

UPDATE 12 March 2009: here’s a Camptocamp report of the event.

Modified: 13 October 2009 14:30:20 EST