Point-in-Polygon Statistics by Geocoding using MapInfo


Scope
This lab covers issues dealing with geocoding a point dataset, and overlaying the points on a polygonal vector map to determine how many points fall within each polygon.

Software
Desktop Mapping MapInfo
Operating System Windows NT

Data Inputs
Region of Interest Ottawa, Ontario, Canada
Projection geographic
Census Tract data ottct.tab
Postal Code Conversion data pccf.tab
Postal Code data file postal.tab

Method
The flowchart illustrates the process involved of this lab, in Gane-Sarson notation.

Analysis

The following SQL was propagated to initially geocode the postal data, using the Lat and Long field data from the pccf data:

update postal add column 'latitude' float
update postal add column 'longitude' float

update postal set latitude  = pccf.lat  where postal.postal_ = pccf.postal
update postal set longitude = pccf.long where postal.postal_ = pccf.postal
   
..followed by the MapInfo Table -> Create Points function, mapping fields postal.longitude and postal.latitude to X and Y respectively.

When initially displaying the postal data with Ottawa Census Tract data, the points are not plotted as expected. Upon inspecting the postal data, we find that the latitude and longitude fields are not representative of common latitude and longitude values for the area. Here's a sample record of the current postal data:

Postal_ latitude longitude
K1B4E7 454230 756084
K1B4H1 454234 756059
K1B4H6 453992 756166
K1B4J5 453995 756186

Further investigation into the Statistics Canada Postal Code Conversion file documentation reveals that the latitude and longitude values must be either divided by 10000 or multiplied by 0.0001 to output decimal degrees values. Furthermore, we must multiply all longitudinal values by -1 to express coordinates west of central meridian.

Why this occurs seems to be a mystery, and at the discretion of the data provider. This can depend on specific data models, database schemas, etc. Luckily, the data provider supplies adequate documentation for end users. Regardless, the documentation provides this information to users so that these values are correctly converted for visualization or query purposes.

Below is some SQL to update the postal table as per above. Note that the SQL below sets these values and corrects them from the initial data ingest stage. So, the first steps to correctly populate the postal data are as follows:

update postal set latitude  = pccf.lat  * .0001 where postal.postal_ = pccf.postal
update postal set longitude = (pccf.long * .0001) * -1 where postal.postal_ = pccf.postal

OR

update postal set latitude  = pccf.lat  / 10000 where postal.postal_ = pccf.postal
update postal set longitude = (pccf.long / 10000) * -1 where postal.postal_ = pccf.postal
   
..or one can work on the existing (incorrect) values:
update postal set latitude  = latitude / 10000
update postal set longitude = (longitude / 10000) * -1

OR

update postal set latitude  = latitude * .0001
update postal set longitude = (longitude * .0001) * -1
   
..whichever approach is taken, this results in output values indicative of the sample data snippet below:

Postal_ latitude longitude
K1B4E7 45.423000000 -75.608400000
K1B4H1 45.423400000 -75.605900000
K1B4H6 45.399200000 -75.616600000
K1B4J5 45.399500000 -75.618600000

Using SQL to Determine Data Points within Polygons

The following SQL was executed to derive how many postal code point locations from the postal data are within each census tract data polygon.

select ottct.cma_ct, count(*) from ottct, postal where ottct.obj contains postal.obj group by ottct.cma_ct into selection
   
The group by option, combined with the SQL count function, enables sum counts for each census tract in the table.

The output table of this query is included here.

A map of the Ottawa Census Tracts and the geocoded postal code points is included here.

About Postal Codes and Data Encoding

In this instance, supported with documentation, the postal code data serves as a useful data product. However, what happens if the documentation cannot be found, or is misplaced? In addition, what happens if a user does not have MapInfo software, or a MapInfo data reader?

Below is an example of GML encoding of a postal code information geocoding service, hosted by the GeoConnections Discovery Portal (CEONet). This service is web accessible, and accepts 3 character forward sorting area (FSA) codes as input, and outputs GML.1 format, which includes geographic coordinates of the given FSA. Below is a sample request URL, which can be typed into a common web browser:

http://ceonet.gc.ca/postalcodeserver?code=k2c&content-type=xml&
   
..the output of this request is as per below:
<PostalCode sortArea="FSA">
  
  <SpatialReferenceSystem srsName="epsg:4326" Dimension="2">
    <Geographic name="epsg:6326">
      <Datum>
        <DatumName>WGS 84</DatumName>
        <Authority>EPSG</Authority>
      </Datum>
      <PrimeMeridian ID="epsg:8901">
        <Name>Greenwich Meridian</Name>
      </PrimeMeridian>
      <CoordinateAxis ID="epsg:9902">
        <Name>Geodetic longitude</Name>
        <AngularUnit ID="epsg:9102">
          <ConversionFactor>pi/180</ConversionFactor>
        </AngularUnit>
      </CoordinateAxis>
      <CoordinateAxis ID="epsg:9901">
        <Name>Geodetic latitude</Name>
        <AngularUnit ID="epsg:9102">
          <Name>Decimal Degree</Name>
          <ConversionFactor>pi/180</ConversionFactor>
        </AngularUnit>
      </CoordinateAxis>
    </Geographic>
  </SpatialReferenceSystem>

  <gml:name>K2C</gml:name>
  <status>SUCCESS</status>
  <placeName>Ottawa:+Nepean</placeName>
  <province>ON</province>
  <gml:centerOf>
    <gml:Point srsName="epsg:4326" Name="Geographic">
      <gml:coordinates>-75.718277,45.372131</gml:coordinates>
    </gml:Point>
  </gml:centerOf>
</PostalCode>
   

This type of encoding is self-describing and documenting, resulting in a vast array of opportunities for data services, such as the online services provided by CEONet. A sample scenario here would be to use this web service within a given application (such as MapInfo or ESRI ArcView), and plot postal code points dynamically, from the web, into a given view / map. Such interoperability enables application neutral data and services, as well as data closest to the source.



Analytical and Computer Cartography Home