Geoprocessing with OGR and PyWPS

PyWPS is a neat Python package supporting the OGC Web Processing Service standard.  Basic setup and configuration can be found in the documentation, or Tim’s useful post.

I’ve been working on a demo to expose the OGR Python bindings for geoprocessing (buffer, centroid, etc.).

Here’s an example process to buffer a geometry (input as WKT), and output either GML, JSON, or WKT:

from pywps.Process import WPSProcess
import osgeo.ogr as ogr


class Buffer(WPSProcess):
    def __init__(self):
        WPSProcess.__init__(self,
            identifier='buffer',
            title='Buffer generator',
            metadata=['http://www.kralidis.ca/'],
            profile='OGR / GEOS geoprocessing',
            abstract='Buffer generator',
            version='0.0.1',
            storeSupported='true',
            statusSupported='true')

        # inputs: geometry as WKT, output format, and buffer distance
        self.wkt = self.addLiteralInput(identifier='wkt',
            title='Well Known Text', type=str)
        self.format = self.addLiteralInput(identifier='format',
            title='Output format', type=str)
        self.buffer = self.addLiteralInput(identifier='buffer',
            title='Buffer Value', type=int)
        self.out = self.addLiteralOutput(identifier='output',
            title='Buffered Feature', type=str)

    def execute(self):
        # buffer the input geometry and serialize it in the requested format
        buffered = ogr.CreateGeometryFromWkt(
            self.wkt.getValue()).Buffer(self.buffer.getValue())
        self.out.setValue(_genOutputFormat(buffered, self.format.getValue()))
        buffered.Destroy()


def _setNamespace(xml, prefix, uri):
    # declare the namespace on the first (root) element of the fragment
    return xml.replace('>', ' xmlns:%s="%s">' % (prefix, uri), 1)


def _genOutputFormat(geom, format):
    # returns None for any format other than gml, json or wkt
    if format == 'gml':
        return _setNamespace(geom.ExportToGML(), 'gml',
            'http://www.opengis.net/gml')
    if format == 'json':
        return geom.ExportToJson()
    if format == 'wkt':
        return geom.ExportToWkt()

Notes:

  • _setNamespace is a workaround: OGR’s ExportToGML doesn’t declare the gml namespace prefix / URI in its output, and the unbound prefix would make XML parsers choke on the ExecuteResponse
  • _genOutputFormat is a utility function, which can be applied to any OGR geometry object
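
To see why the first note matters, here is a small sketch using only the Python standard library.  The GML string is a hand-written sample that is assumed to resemble an OGR export (it is not captured output), and set_namespace simply mirrors _setNamespace from the process above.

```python
import xml.etree.ElementTree as ET


def set_namespace(xml, prefix, uri):
    # same trick as _setNamespace: declare the prefix on the first element
    return xml.replace('>', ' xmlns:%s="%s">' % (prefix, uri), 1)


def parses(xml):
    # True if a namespace-aware parser accepts the fragment
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


# hand-written sample; assumed to resemble what ExportToGML emits
sample = '<gml:Point><gml:coordinates>1,2</gml:coordinates></gml:Point>'
fixed = set_namespace(sample, 'gml', 'http://www.opengis.net/gml')

print(parses(sample))  # the gml: prefix is unbound here
print(parses(fixed))   # the prefix is now declared on the root element
```

The string replacement only touches the first ">" in the fragment, so it relies on the root element being the first thing in the output.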

As you can see, this is very easy to pull off, and it integrates and extends easily.  Kudos to the OGR and PyWPS teams!

Tips on Finding a Job

Great post by Dave here on his experience and suggestions / ideas on finding a job.  Upbeat, positive and encouraging.  Congratulations and good post Dave!

Displaying GRIB data with MapServer

I recently had the opportunity to prototype WMS visualization of meteorological data.  MapServer, GDAL and Python to the rescue!  Here are the steps I took to make it happen.

The data (GRIB) is in a GDAL-supported format, so MapServer can process it.  The goal here was to create a LAYER object.  First thing was to figure out the projection, then figure out the band pixel values/ranges and correlate them to MapServer classes (in this case I just used a simple greyscale approach).

Here’s the hack:

import sys
import osgeo.gdal as gdal
import osgeo.osr as osr

if len(sys.argv) < 3:
    print('Usage: %s <file> <numclasses>' % sys.argv[0])
    sys.exit(1)

cvr = 256  # range of RGB values
numclasses = int(sys.argv[2])  # number of classifiers

ds = gdal.Open(sys.argv[1])

# get proj4 def and write out PROJECTION object
p = osr.SpatialReference()
p.ImportFromWkt(ds.GetProjection())
p2 = p.ExportToProj4().split()

print('  PROJECTION')
for i in p2:
    print('   "%s"' % i.replace('+', ''))
print('  END')

# get band pixel data ranges and classify
band = ds.GetRasterBand(1)
pixel_min = band.GetMinimum()
pixel_max = band.GetMaximum()

if pixel_min is None or pixel_max is None:  # compute automagically
    (pixel_min, pixel_max) = band.ComputeRasterMinMax(1)

# calculate range of pixel values
pixel_value_range = float(pixel_max - pixel_min)
# calculate the intervals of values based on classes specified
pixel_interval = pixel_value_range / numclasses
# calculate the intervals of color values
color_interval = (pixel_interval * cvr) / pixel_value_range

color = cvr - 1  # valid RGB components run from 0 to 255
for i in range(numclasses):
    print('''  CLASS
  NAME "%.2f to %.2f"
  EXPRESSION ([pixel] >= %.2f AND [pixel] < %.2f)
  STYLE
   COLOR %s %s %s
  END
 END''' % (pixel_min, pixel_min + pixel_interval,
           pixel_min, pixel_min + pixel_interval,
           color, color, color))
    pixel_min += pixel_interval
    color -= int(color_interval)

Running this script outputs various bits for MapServer mapfile configuration.  Passing more classes to the script creates more CLASS objects, resulting in a smoother looking image.
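
The classification step can also be tried without GDAL.  The sketch below is my own illustration of the same equal-interval idea, not the script above: the 0 to 100 value range is made up, the greys are spread over 0 to 255, and the last class uses <= so the maximum pixel value is not left unclassified (a design choice of this sketch).

```python
def class_breaks(vmin, vmax, numclasses):
    """Return (low, high, grey) tuples for equal-interval classes."""
    interval = (vmax - vmin) / float(numclasses)
    steps = max(numclasses - 1, 1)
    breaks = []
    for i in range(numclasses):
        low = vmin + i * interval
        high = vmin + (i + 1) * interval
        grey = 255 - (i * 255) // steps  # 255 (white) down to 0 (black)
        breaks.append((low, high, grey))
    return breaks


def class_block(low, high, grey, last=False):
    """Render one MapServer CLASS object as text."""
    upper = '<=' if last else '<'
    return '\n'.join([
        '  CLASS',
        '   NAME "%.2f to %.2f"' % (low, high),
        '   EXPRESSION ([pixel] >= %.2f AND [pixel] %s %.2f)'
        % (low, upper, high),
        '   STYLE',
        '    COLOR %d %d %d' % (grey, grey, grey),
        '   END',
        '  END',
    ])


# made-up value range and class count, for illustration only
breaks = class_breaks(0.0, 100.0, 4)
for n, (low, high, grey) in enumerate(breaks):
    print(class_block(low, high, grey, last=(n == len(breaks) - 1)))
```

Doubling the class count halves both the value interval and the grey step, which is where the smoother look comes from.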

Here’s an example GetMap request:

[Image: Meteorological data in GRIB format via MapServer WMS]

Users can query pixel values (water temperature in this case) via GetFeatureInfo.  Given that these model outputs are produced frequently, we can use the WMS GetMap TIME parameter to create time series maps of the models.
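
As a rough sketch of the time series idea, the snippet below builds one WMS 1.1.1 GetMap URL per time step.  The endpoint, layer name, extent and time values are all hypothetical, and it assumes the layer has already been set up for time support on the server.

```python
from urllib.parse import urlencode


def getmap_url(endpoint, layer, bbox, time, width=600, height=400):
    """Build a WMS 1.1.1 GetMap URL for one time step."""
    params = [
        ('SERVICE', 'WMS'),
        ('VERSION', '1.1.1'),
        ('REQUEST', 'GetMap'),
        ('LAYERS', layer),
        ('STYLES', ''),
        ('SRS', 'EPSG:4326'),
        ('BBOX', ','.join(str(v) for v in bbox)),
        ('WIDTH', str(width)),
        ('HEIGHT', str(height)),
        ('FORMAT', 'image/png'),
        ('TIME', time),
    ]
    return endpoint + '?' + urlencode(params)


# hypothetical endpoint, layer name, extent and time steps
times = ['2009-12-01T00:00:00Z', '2009-12-01T06:00:00Z']
urls = [getmap_url('http://localhost/cgi-bin/mapserv', 'water_temp',
                   (-180, -90, 180, 90), t) for t in times]
for url in urls:
    print(url)
```

Fetching each URL in order and stitching the images together would give a simple animation of the model run.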

OWSLib CSW Updates and Implementation Thoughts

I’ve had some time to work on CSW support in OWSLib in the last few days.  Some thoughts and updates:

FGDC Support Added

Some CSW endpoints out there serve up GetRecords responses in FGDC CSDGM format.  This has now been added to trunk (mandatory elements + eainfo).  Note that csw:Record (DCMI + ows:BoundingBox) and ISO 19139 are already supported.  One tricky bit here is that FGDC was/is mostly implemented without a namespace, which CSW requires as an outputSchema parameter value.  I’ve used http://www.fgdc.gov for now.

Both FGDC and ISO (more so ISO) have deep and complex content models, so if there are elements that you don’t see supported in OWSLib, please file a feature request ticket in trac, and I’ll make sure to implement them (and add them to the doctests).

Metadata Identifiers are Important!

When parsing GetRecords responses, we store records in a Python dict, keyed by /gmd:MD_Metadata/gmd:fileIdentifier (for ISO), /csw:Record/dc:identifier (for CSW’s baseline) or /metadata/idinfo/datasetid (for FGDC).  Some responses return metadata without these ids for whatever reason.  This sets the dict key to Python’s None, so each record without an id overwrites the previous one stored under that key.  Not good.  I implemented a fix to set a random, non-persistent identifier so as not to lose data.
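
The collision is easy to reproduce in plain Python.  This is a minimal sketch of the idea, not OWSLib’s actual code: index_records and the sample records are hypothetical, and uuid is just one way to produce a random, non-persistent key.

```python
import uuid


def index_records(records):
    """Index parsed records by identifier without losing unidentified ones."""
    indexed = {}
    for identifier, record in records:
        if identifier is None:
            # random, non-persistent key so the record is not overwritten
            identifier = 'generated-%s' % uuid.uuid4()
        indexed[identifier] = record
    return indexed


# hypothetical parsed (identifier, record) pairs; two lack an identifier
parsed = [('abc-123', {'title': 'Roads'}),
          (None, {'title': 'Rivers'}),
          (None, {'title': 'Lakes'})]

naive = dict(parsed)          # both None-keyed records collide
safe = index_records(parsed)  # every record survives

print(len(naive), len(safe))
```

Because the generated keys change on every run, they are only good for holding the records in memory, not for referring back to them later.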

Of course, the best solution here would be for providers to set identifiers accordingly from the start.  Then again, what happens when CSW endpoints harvest other CSWs and identifiers are the same?  Perhaps a namespace of some sort for the CSW would help.  This would be an interesting interoperability experiment.

Harvest Support Added

I implemented a first pass at supporting Harvest operations.  CSW endpoints usually require authentication here, which may vary by implementation.  Therefore, I’ve left this logic out of OWSLib as it’s not part of the standard per se.
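
For a feel of what marshalling a Harvest request involves, here is a standalone sketch that builds a CSW 2.0.2 Harvest body with the standard library.  It is not OWSLib’s implementation; harvest_request and the source URL are hypothetical, and authentication is deliberately left out, as above.

```python
import xml.etree.ElementTree as ET

CSW_NS = 'http://www.opengis.net/cat/csw/2.0.2'


def harvest_request(source, resource_type, resource_format='application/xml'):
    """Serialize a minimal csw:Harvest request as a string."""
    ET.register_namespace('csw', CSW_NS)
    root = ET.Element('{%s}Harvest' % CSW_NS, service='CSW', version='2.0.2')
    # child order follows the CSW 2.0.2 schema
    ET.SubElement(root, '{%s}Source' % CSW_NS).text = source
    ET.SubElement(root, '{%s}ResourceType' % CSW_NS).text = resource_type
    ET.SubElement(root, '{%s}ResourceFormat' % CSW_NS).text = resource_format
    return ET.tostring(root, encoding='unicode')


# hypothetical source URL pointing at an ISO metadata record
body = harvest_request('http://example.org/metadata/record.xml',
                       'http://www.isotc211.org/2005/gmd')
print(body)
```

The resulting string would be sent as the body of an HTTP POST, with whatever credentials the endpoint expects added by the client.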

Transaction Support Coming

Sebastian Benthall of OpenGeo has indicated that they are using OWSLib’s CSW support for some of their projects (awesome!), and has kindly submitted a patch for an optional element issue (thanks Sebastian!).  He also indicated that Transaction support would be of interest, so I’ve started to think about this one.  As with the Harvest operation, Transaction support also requires some sort of authentication, which we’ll leave to the client implementation.  Most of the work will be with marshalling the request, as response handling is very similar to Harvest responses.

Give it a go, submit bugs and enhancements to the OWSLib trac.  Enjoy!

Batch Centroid Calculations with Python and OGR

I recently had a question on how to do batch centroid calculations against GIS data. OGR to the rescue again!

Using OGR’s Python bindings (GDAL/OGR needs to be built --with-geos=yes), one can process, say, an ESRI Shapefile, and calculate a centroid for each feature.

The script below does exactly this, and writes out a new dataset (any input / output format supported by OGR).

import sys
import osgeo.ogr as ogr

# process args
if len(sys.argv) < 4:
    print('Usage: %s <format> <input> <output>' % sys.argv[0])
    sys.exit(1)

# open input file
dataset_in = ogr.Open(sys.argv[2])
if dataset_in is None:
    print('Open failed.\n')
    sys.exit(2)

layer_in = dataset_in.GetLayer(0)

# create output
driver_out = ogr.GetDriverByName(sys.argv[1])
if driver_out is None:
    print('%s driver not available.\n' % sys.argv[1])
    sys.exit(3)

dataset_out = driver_out.CreateDataSource(sys.argv[3])
if dataset_out is None:
    print('Creation of output file failed.\n')
    sys.exit(4)

layer_out = dataset_out.CreateLayer(sys.argv[3], None, ogr.wkbPoint)
if layer_out is None:
    print('Layer creation failed.\n')
    sys.exit(5)

# setup attributes: copy the input field definitions to the output layer
feature_in_defn = layer_in.GetLayerDefn()

for i in range(feature_in_defn.GetFieldCount()):
    field_def = feature_in_defn.GetFieldDefn(i)
    if layer_out.CreateField(field_def) != 0:
        print('Creating %s field failed.\n' % field_def.GetNameRef())

layer_in.ResetReading()
feature_in = layer_in.GetNextFeature()

# loop over input features, calculate centroid and output features
while feature_in is not None:
    feature_out_defn = layer_out.GetLayerDefn()
    feature_out = ogr.Feature(feature_out_defn)
    # copy the attribute values
    for i in range(feature_out_defn.GetFieldCount()):
        feature_out.SetField(feature_out_defn.GetFieldDefn(i).GetNameRef(),
                             feature_in.GetField(i))
    # once per feature (not once per field): compute and write the centroid
    geom = feature_in.GetGeometryRef()
    centroid = geom.Centroid()
    feature_out.SetGeometry(centroid)
    if layer_out.CreateFeature(feature_out) != 0:
        print('Failed to create feature.\n')
        sys.exit(6)
    feature_in = layer_in.GetNextFeature()

# cleanup
dataset_in.Destroy()
dataset_out.Destroy()
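
For intuition about what a polygon centroid is, here is a pure-Python sketch of the standard area-weighted formula for a single simple ring.  It is an illustration only and not how OGR / GEOS computes Centroid(); ring_centroid is a hypothetical helper, and holes and multi-part geometries are ignored.

```python
def ring_centroid(ring):
    """Area-weighted centroid of a simple polygon ring of (x, y) tuples."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError('degenerate ring with zero area')
    return cx / (3.0 * area2), cy / (3.0 * area2)


# made-up test shapes
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
triangle = [(0, 0), (6, 0), (0, 3)]

print(ring_centroid(square))
print(ring_centroid(triangle))
```

The result is the same whichever way round the ring is wound, because the sign of the area cancels out in the division.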

Easy CSW with eXcat

If you have existing metadata and simply want a CSW interface as a means to search and discover your geospatial metadata, eXcat provides a simple solution.  Following the installation steps, it’s quite simple to populate your CSW:

$ cd excat/csw/WEB-INF/harvest
$ tar zxf my_metadata_files.tgz
$ for i in *.xml
> do
> lwp-download "http://localhost/excat/csw?request=Harvest&service=CSW&\
> version=2.0.2&namespace=xmlns(csw=http://www.opengis.net/cat/csw)&\
> source=$i&resourceFormat=application/xml\
> &resourceType=http://www.isotc211.org/2005/gmd"
> done

That’s pretty much it.  A lightweight, simple approach, particularly for those who already have metadata management tools in place and need CSW.
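
The same Harvest URLs can be generated from Python, which takes care of the quoting.  This is only a sketch: the file names are hypothetical stand-ins for the unpacked metadata files, the parameters are copied from the shell loop above, and it assumes the server accepts percent-encoded values.  It builds the URLs without fetching them.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def harvest_url(endpoint, source):
    """Build a CSW Harvest KVP URL for one metadata file."""
    params = [
        ('request', 'Harvest'),
        ('service', 'CSW'),
        ('version', '2.0.2'),
        ('namespace', 'xmlns(csw=http://www.opengis.net/cat/csw)'),
        ('source', source),
        ('resourceFormat', 'application/xml'),
        ('resourceType', 'http://www.isotc211.org/2005/gmd'),
    ]
    return endpoint + '?' + urlencode(params)


# hypothetical file names standing in for the unpacked metadata files
files = ['record1.xml', 'record2.xml']
urls = [harvest_url('http://localhost/excat/csw', name) for name in files]

# decode one URL again to check that the parameters round-trip
decoded = parse_qs(urlsplit(urls[0]).query)
print(decoded['source'], decoded['version'])
```

Either way, eXcat itself still does the actual harvesting on the server side.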

/me thinks it would be nice to have a Python port of this for those in Java-less environments (why does it seem most CSW server implementations are in Java?)

NYC Sprint is Upon Us

Building on the Toronto Code Sprint 2009 (I had the honour of helping Paul set this up), this year MapServer, GDAL, PostGIS, etc. devs are headed to the Big Apple for the New York Code Sprint 2010.  Having participated in last year’s event, I can say that it is a fun, spirited and productive event.  Though I won’t be able to make it there in person this year, I will be among those ‘present in spirit’ on #tosprint over the weekend.

Keep an eye on Paul’s blog for sprint updates.  Have fun guys!

GeoScript

GeoScript looks like a neat effort to leverage GeoTools into Python (an increasingly widely used language for GIS scripting) and JavaScript.

I love this for JavaScript, and I wonder how this relates to the other Python work out there (like Shapely and WorldMill); Sean?

Why XML Libraries Rock

msautotest is MapServer’s way of unit testing and sanity checking various features and bug fixes.

When testing the addition of AuthorityURL and Identifier support in WMS Capabilities XML, I found an issue with the output being invalid XML, which was tested and fixed.  Another fix was then added to ensure valid XML (isn’t open source great?).

MapServer outputs XML by way of a modified printf, as well as using libxml2 for newer code.  Here was a case of a feature being added to older code.  I’ve always pushed for libxml2, as it avoids printing XML by hand via printf, which IMHO is error prone, can lead to poorly formed and invalid XML, and needs tons of printf’s for closing elements.  Something like libxml2 trims down your code so you don’t have to do that (just declare the element, and libxml2 will close it for you).  Same goes for etree for Python folks.
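
Here is a small Python illustration of the difference, using etree from the standard library.  The element and values are made up for the example (loosely modelled on a WMS Identifier element) and are not MapServer output.

```python
import xml.etree.ElementTree as ET


def parses(xml):
    # True if the string is well-formed XML
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


# hypothetical values; the ampersand is what trips up naive formatting
authority = 'example'
value = 'roads & rivers'

# printf style: the caller must escape and close everything by hand
by_hand = '<Identifier authority="%s">%s</Identifier>' % (authority, value)

# library style: escaping and closing tags are handled for us
elem = ET.Element('Identifier', authority=authority)
elem.text = value
by_library = ET.tostring(elem, encoding='unicode')

print(parses(by_hand))
print(by_library)
```

The hand-formatted string looks fine at a glance, which is exactly why this kind of bug tends to surface only when a parser rejects the response.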

At the same time, using something like libxml2 can yield heavy processing, especially for huge XML responses (did someone say WFS GetFeature responses?).

What do you use for outputting XML in your development environment?

Why I Love Linux

$ uptime
 21:20:01 up 112 days,  9:06,  1 user,  load average: 0.05, 0.09, 0.18

Modified: 11 December 2009 21:48:16 EST