Scope
This lab covers issues dealing with geocoding a point dataset, and overlaying the points on a polygonal vector map to determine how many points fall within each polygon.
Software
Desktop Mapping | MapInfo |
Operating System | Windows NT |
Data Inputs
Region of Interest | Ottawa, Ontario, Canada |
Projection | geographic |
Census Tract data | ottct.tab |
Postal Code Conversion data | pccf.tab |
Postal Code data file | postal.tab |
Method
The flowchart illustrates the process involved of this lab, in Gane-Sarson notation.
The following SQL was propagated to initially geocode the postal data, using the Lat and Long field data from the pccf data:
update postal add column 'latitude' float update postal add column 'longitude' float update postal set latitude = pccf.lat where postal.postal_ = pccf.postal update postal set longitude = pccf.long where postal.postal_ = pccf.postal..followed by the MapInfo Table -> Create Points function, mapping fields postal.longitude and postal.latitude to X and Y respectively.
When initially displaying the postal data with Ottawa Census Tract data, the points are not plotted as expected. Upon inspecting the postal data, we find that the latitude and longitude fields are not representative of common latitude and longitude values for the area. Here's a sample record of the current postal data:
Postal_ | latitude | longitude |
---|---|---|
K1B4E7 | 454230 | 756084 |
K1B4H1 | 454234 | 756059 |
K1B4H6 | 453992 | 756166 |
K1B4J5 | 453995 | 756186 |
Further investigation into the Statistics Canada Postal Code Conversion file documentation reveals that the latitude and longitude values must be either divided by 10000 or multiplied by 0.0001 to output decimal degrees values. Furthermore, we must multiply all longitudinal values by -1 to express coordinates west of central meridian.
Why this occurs seems to be a mystery, and at the discretion of the data provider. This can depend on specific data models, database schemas, etc. Luckily, the data provider supplies adequate documentation for end users. Regardless, the documentation provides this information to users so that these values are correctly converted for visualization or query purposes.
Below is some SQL to update the postal table as per above. Note that the SQL below sets these values and corrects them from the initial data ingest stage. So, the first steps to correctly populate the postal data are as follows:
update postal set latitude = pccf.lat * .0001 where postal.postal_ = pccf.postal update postal set longitude = (pccf.long * .0001) * -1 where postal.postal_ = pccf.postal OR update postal set latitude = pccf.lat / 10000 where postal.postal_ = pccf.postal update postal set longitude = (pccf.long / 10000) * -1 where postal.postal_ = pccf.postal..or one can work on the existing (incorrect) values:
update postal set latitude = latitude / 10000 update postal set longitude = (longitude / 10000) * -1 OR update postal set latitude = latitude * .0001 update postal set longitude = (longitude * .0001) * -1..whichever approach is taken, this results in output values indicative of the sample data snippet below:
Postal_ | latitude | longitude |
---|---|---|
K1B4E7 | 45.423000000 | -75.608400000 |
K1B4H1 | 45.423400000 | -75.605900000 |
K1B4H6 | 45.399200000 | -75.616600000 |
K1B4J5 | 45.399500000 | -75.618600000 |
Using SQL to Determine Data Points within Polygons
The following SQL was executed to derive how many postal code point locations from the postal data are within each census tract data polygon.
select ottct.cma_ct, count(*) from ottct, postal where ottct.obj contains postal.obj group by ottct.cma_ct into selectionThe group by option, combined with the SQL count function, enables sum counts for each census tract in the table.
The output table of this query is included here.
A map of the Ottawa Census Tracts and the geocoded postal code points is included here.
About Postal Codes and Data EncodingIn this instance, supported with documentation, the postal code data serves as a useful data product. However, what happens if the documentation cannot be found, or is misplaced? In addition, what happens if a user does not have MapInfo software, or a MapInfo data reader?
Below is an example of GML encoding of a postal code information geocoding service, hosted by the GeoConnections Discovery Portal (CEONet). This service is web accessible, and accepts 3 character forward sorting area (FSA) codes as input, and outputs GML.1 format, which includes geographic coordinates of the given FSA. Below is a sample request URL, which can be typed into a common web browser:
http://ceonet.gc.ca/postalcodeserver?code=k2c&content-type=xml&..the output of this request is as per below:
<PostalCode sortArea="FSA"> <SpatialReferenceSystem srsName="epsg:4326" Dimension="2"> <Geographic name="epsg:6326"> <Datum> <DatumName>WGS 84</DatumName> <Authority>EPSG</Authority> </Datum> <PrimeMeridian ID="epsg:8901"> <Name>Greenwich Meridian</Name> </PrimeMeridian> <CoordinateAxis ID="epsg:9902"> <Name>Geodetic longitude</Name> <AngularUnit ID="epsg:9102"> <ConversionFactor>pi/180</ConversionFactor> </AngularUnit> </CoordinateAxis> <CoordinateAxis ID="epsg:9901"> <Name>Geodetic latitude</Name> <AngularUnit ID="epsg:9102"> <Name>Decimal Degree</Name> <ConversionFactor>pi/180</ConversionFactor> </AngularUnit> </CoordinateAxis> </Geographic> </SpatialReferenceSystem> <gml:name>K2C</gml:name> <status>SUCCESS</status> <placeName>Ottawa:+Nepean</placeName> <province>ON</province> <gml:centerOf> <gml:Point srsName="epsg:4326" Name="Geographic"> <gml:coordinates>-75.718277,45.372131</gml:coordinates> </gml:Point> </gml:centerOf> </PostalCode>
This type of encoding is self-describing and documenting, resulting in a vast array of opportunities for data services, such as the online services provided by CEONet. A sample scenario here would be to use this web service within a given application (such as MapInfo or ESRI ArcView), and plot postal code points dynamically, from the web, into a given view / map. Such interoperability enables application neutral data and services, as well as data closest to the source.