How To Add Google Maps To Your Review Site

GEOCODING -- Review Foundry Tutorial 08


GEOCODING

How Does The Geocoder Work?

As mentioned earlier, geocoding is the process of turning a human-friendly physical address, like the one that appears on letters mailed to you, into a pair of latitude and longitude coordinates that can be used to generate a geographical map of the region surrounding the physical address.

Under normal circumstances, when you have activated the Google Maps feature for a given table, geocoding takes place when a new record is added. It is not carried out when a record is edited, since that would involve sending a request to Google, and in most instances this is unnecessary because the record has already been geocoded. If you need to override this behavior, temporarily set the configuration variable gmaps_geocode_when_editing to Yes.

If you have existing records in your table that have yet to be geocoded, you can run the geocoder from the command line and geocode records in bulk. More on that later.

The geocoder works by formatting the addressing columns of the record into a string that can be sent off to Google for geocoding. Google returns an XML structure that contains various pieces of data about the geocoded address. Among them are the latitude and longitude, and also the geo_accuracy Google has assigned to the coordinates. You do not need to know anything about that XML data structure, which is parsed automatically for the latitude, longitude and geo_accuracy values. But you do need to know how the addressing columns are formatted into the string that is sent to Google for geocoding.

Formatting The Address String

The Google geocoder API requires the kind of physical address you would add to a parcel to get it delivered somewhere in the world. Review Foundry generates a string of this type by concatenating a number of addressing columns that are available for the table to be geocoded. Column values are joined with a comma (and a space), so that the string might look like "1600 Amphitheatre Parkway, Mountain View, CA, 94043", which is Google's address. If a column value is missing, it is simply omitted from the concatenation, possibly resulting in a geocoded result that has a lower geo_accuracy.

For example, your Supplier table might have the following columns available for addressing (with example values shown):

addr_street   1600 Amphitheatre Parkway
addr_city     Mountain View
state_id      CA
addr_zipcode  94043
country_id    223

Which of these columns is used in the formatting of the addressing string is up to you. If appropriate, use the columns shown above, and in that order, since this most closely represents a mailing address and is likely to result in geocoded records with the maximum accuracy (which is represented by the number 8 in Google's accuracy measure).

Column selection is performed on the Configure > Google Maps page, where you select columns by transferring them from a left menu to a right menu. The order of the columns in the right menu is important. For example, if you inverted the order of the first two columns shown above, addr_street and addr_city, the addressing string you send to Google would look like:

Mountain View, 1600 Amphitheatre Parkway, CA, 94043, United States

When Google receives a string like that, which it cannot fully make sense of, it does the best job it can. It might decide that the parts it can understand are:

Mountain View, CA, 94043, United States

So you get back latitude and longitude coordinates for the post office at zipcode 94043, which is not what you wanted, and only has a Google accuracy of 5 (which means to the nearest couple of miles).

Note that in the example above, the country_id column value is 223, which, if we look at the record in the Country table with id = 223, is the United States. It is possible to perform such lookups of foreign key values if that needs to be done. You can specify such lookups from the Configure > Google Maps page AFTER you have chosen the addressing columns. Note that in Review Foundry the state_id column holds two-letter state values, like 'CA' which can be used as part of the addressing string without converting them to another form, like 'California', which would require a lookup.
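The formatting rules described in this section can be sketched in a few lines of Perl. This is only an illustration, not the code Review Foundry actually uses: the sub name format_address, the %record hash, and the %lookup hash standing in for the Country table are all assumptions made for the example.

```perl
use strict;
use warnings;

## Hypothetical sketch: build the addressing string from an ordered
## list of column names. None of these names come from Review Foundry.
sub format_address {
    my ($record, $columns, $lookup) = @_;
    $lookup ||= {};
    my @parts;
    for my $col (@$columns) {
        my $value = $record->{$col};
        ## a missing or empty column value is simply omitted
        next unless defined $value && length $value;
        ## swap a foreign key value for its looked-up name, if a
        ## lookup has been specified for this column
        if ($lookup->{$col} && exists $lookup->{$col}{$value}) {
            $value = $lookup->{$col}{$value};
        }
        push @parts, $value;
    }
    ## column values are joined with a comma and a space
    return join ', ', @parts;
}

my %record = (
    addr_street  => '1600 Amphitheatre Parkway',
    addr_city    => 'Mountain View',
    state_id     => 'CA',
    addr_zipcode => '94043',
    country_id   => 223,
);
my %lookup  = ( country_id => { 223 => 'United States' } );
my @columns = qw(addr_street addr_city state_id addr_zipcode country_id);

print format_address(\%record, \@columns, \%lookup), "\n";
```

Because the string is built by walking @columns in order, swapping two entries in that list reorders the output in the same way that reordering the right menu does.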

Geocoding Records In Bulk

To geocode all the records in an Item, Member, or Supplier table, telnet into your web account and go to the admin directory for Review Foundry. If you wanted to geocode every record in the Item table you would type the following:

> cd /some/path/to/foundry/do/admin
> perl geocoder.pl --table Item --range all

The range parameter determines which records in the table should be geocoded. Here is the documentation for that variable, taken from the RedQueen::Admin::Paid::Geocode::Geocoder class:

Additionally you must add a 'range' specification for the geocoding
which lets the geocoder know which records are to be treated. Here
are examples of the possibilities: specified either by an integer,
or by a recognized string:

    range = 3             e.g. geocode the first 3 records in the table
    range = 'all'         e.g. geocode everything
    range = 'all_null'    e.g. geocode everything that has not been geocoded already
    range = 'from_4'      e.g. geocode records from id = 4 onwards
    range = 'from_4_to_8' e.g. geocode records from id = 4 to id = 8

or even these:

    range = 'all_null_from_4'      e.g. handle non-geocoded records from id = 4 on
    range = 'all_null_from_4_to_8' e.g. handle non-geocoded records from id = 4 to id = 8

These options allow you to:

    (1) test geocoding for a few records.
    (2) attempt to geocode everything.
    (3) attempt to geocode everything that has not been geocoded already.
        all records are selected and then only those with a NULL for
        the geo_accuracy column are processed.
    (4) geocode from a lower limit specified by a stated record ID
        value--which may be necessary if the geocoder dies part way
        through for any reason and you need to restart where it stopped.
    (5) geocode a range of records.
    (6) geocode non-geocoded records starting at a given record.
    (7) geocode non-geocoded records in a given range of record ID values.

So a full specification of the geocoder invocation might look
like this (geocoding the Item table records with ID 4 through 8):

> perl geocoder.pl --table Item --range from_4_to_8

Or if Google drops your connection because it determines you are making
too many requests, you can repeatedly run the following command which
picks out all records not already geocoded and attempts to geocode them:

> perl geocoder.pl --table Item --range all_null

By the way, you can read the full documentation for the class by typing the following at the command line when in the admin directory:

> perldoc RedQueen::Admin::Paid::Geocode::Geocoder
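To make the range syntax concrete, here is a rough sketch of how such a specification could be decoded. It is not taken from the Geocoder class, whose real parsing may differ: the sub name parse_range and the keys limit, null_only, from and to are invented for this example.

```perl
use strict;
use warnings;

## Hypothetical sketch of decoding a range specification.
## The real parsing lives in the Geocoder class and may differ.
sub parse_range {
    my ($range) = @_;
    ## a bare integer means "the first N records in the table"
    return { limit => $range } if $range =~ /^\d+$/;
    ## 'all' or 'all_null'
    if ($range =~ /^(all|all_null)$/) {
        return { null_only => ($1 eq 'all_null' ? 1 : 0) };
    }
    ## 'from_N' or 'from_N_to_M', optionally prefixed with 'all_null_'
    if ($range =~ /^(all_null_)?from_(\d+)(?:_to_(\d+))?$/) {
        my %spec = ( null_only => ($1 ? 1 : 0), from => $2 );
        $spec{to} = $3 if defined $3;
        return \%spec;
    }
    die "unrecognized range: $range\n";
}

for my $range (3, 'all_null', 'from_4_to_8', 'all_null_from_4') {
    my $spec = parse_range($range);
    print "$range => ",
        join(', ', map { "$_=$spec->{$_}" } sort keys %$spec), "\n";
}
```

Here null_only corresponds to processing only records with a NULL geo_accuracy, while from and to are the lower and upper record ID limits.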

Unfortunately Google rate-limits your geocoding requests, so it will stop sending back coordinates after a few thousand records (15,000 as of time of writing). The process will abort at this stage with an error message which includes the phrase [no geocoder data]. You can confirm that you are getting no response from Google by invoking this script in your browser:

http://yourdomain.com/cgi-bin/rs/foundry/do/admin/geocoder_test.cgi

If Google is sending back an empty string for the geocoded data you will just see something like:

    <pre></pre>

whereas a non-empty string would display an XML data structure between the pre tags. When you see a full XML record you can expect your bulk record geocoder to function correctly again (at least for another thousand or so records).

As the geocoder chugs through records you will see output that looks like the following:

updated Item.id 6922    accuracy: 8   lat: 28.065010   long: -82.459358
updated Item.id 6923    accuracy: 8   lat: 28.869348   long: -81.252468
...

It is normal to see an accuracy of 8 reported if your addressing string contains a street address, since 8 represents address-level accuracy in Google's accuracy scheme. Occasionally you will see lower scores like 5 which, upon inspection, may reveal that you have supplied a post office box for the address. Or your address simply may not include a street number (e.g. a state park might be specified by a street name only).
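If you capture the geocoder output, a short script can pick out the records that came back with a lower accuracy so you can inspect their addresses. The following is a sketch only: the sub name parse_update_line is hypothetical, it assumes the output lines look exactly like the two shown above, and the third sample line (with accuracy 5) is invented for the demonstration.

```perl
use strict;
use warnings;

## Hypothetical helper: pull the fields out of one line of geocoder
## output. Returns nothing for lines that do not match the format.
sub parse_update_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^updated \s+ (\w+)\.id \s+ (\d+) \s+
        accuracy: \s* (\d+) \s+
        lat:  \s* (-?[\d.]+) \s+
        long: \s* (-?[\d.]+)
    }x;
    return { table => $1, id => $2, accuracy => $3, lat => $4, long => $5 };
}

my @sample = (
    'updated Item.id 6922    accuracy: 8   lat: 28.065010   long: -82.459358',
    'updated Item.id 6923    accuracy: 8   lat: 28.869348   long: -81.252468',
    'updated Item.id 6924    accuracy: 5   lat: 28.000000   long: -82.000000',  ## invented line
);

## report any record geocoded below the chosen accuracy threshold
my $threshold = 8;
for my $line (@sample) {
    my $rec = parse_update_line($line) or next;
    print "check $rec->{table}.id $rec->{id} (accuracy $rec->{accuracy})\n"
        if $rec->{accuracy} < $threshold;
}
```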

Result Of A Successfully Geocoded Record

When you have geocoded a record, the result is simply the population of the latitude, longitude, and geo_accuracy columns. So those columns for a single record might look like this:

latitude       34.143963
longitude     -118.107300
geo_accuracy   7

Once these values are available for a record it is possible for Review Foundry to construct the necessary JavaScript to place on a public page to create a Google Map. If a record is listed on a page as a result of a proximity search, but it has no geocoded coordinates, it will not appear on the Google Map (even if the record has a well-defined zipcode or postcode).

Testing Without The Geocoder

You can test whether Google Maps work for your Review Foundry installation WITHOUT purchasing the Geocoder module. You can simply add some latitude, longitude, and geo_accuracy values by hand to your existing records. Here are some well-known addresses and the corresponding coordinates for them obtained from geocoding. Note: these addresses are all located fairly close to each other.

White House address: 1600 Pennsylvania Ave NW, Washington, DC 20006
latitude       38.898774
longitude     -77.036655
geo_accuracy   8

Library Of Congress address: 101 Independence Ave SE, Washington, DC 20003
latitude       38.887556
longitude     -77.005921
geo_accuracy   8

Capitol Building address: E Capitol St NE & 1st St NE, Washington, DC 20515
latitude       38.889975
longitude     -76.960739
geo_accuracy   6

How To Handle Missing Coordinate Data

Not all requests sent to Google via the geocoder will result in responses with useful latitude and longitude coordinates. Sometimes Google cannot send back information it has on an address because of legalities involving the surveying companies it uses to provide data. When that is the case you might see a 603 response for the status code displayed in the XML record received from Google (this status code is recorded in the geo_status_code column for every record that the geocoder handles). In such cases your only resort may be to find the latitude and longitude information some other way.

For example, if you use the geocoder to attempt extraction of Irish coordinate data you may be disappointed to see you get nothing back. In this case (at the time of writing, at least) it appears that Ordnance Survey (the UK government agency that provides Google with its England, Scotland and Wales mapping data) will not allow Google to make this geographic addressing information public.

To get around this problem you can manually enter latitude and longitude information for a given record that contains information about an Irish address. You need to enter these latitude and longitude values as decimal degrees. Do NOT enter them in degrees, minutes and seconds.
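If your source gives coordinates in degrees, minutes and seconds, you need to convert them to decimal degrees before entering them. A minimal sketch of the conversion follows. The sub name dms_to_decimal is made up for this example and the input values are illustrative only. It rounds to six decimal places, matching the coordinate values shown elsewhere in this tutorial.

```perl
use strict;
use warnings;

## Convert degrees, minutes and seconds plus a hemisphere letter
## (N, S, E or W) to decimal degrees. South and west are negative.
sub dms_to_decimal {
    my ($deg, $min, $sec, $hemisphere) = @_;
    my $decimal = $deg + $min / 60 + $sec / 3600;
    $decimal = -$decimal if $hemisphere =~ /^[SW]$/i;
    return sprintf '%.6f', $decimal;
}

## illustrative values only, not taken from any real record
print dms_to_decimal(53, 20, 52, 'N'), "\n";   ## prints 53.347778
print dms_to_decimal(6, 15, 35, 'W'), "\n";    ## prints -6.259722
```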

Review Foundry only requires that latitude and longitude column values exist, and that the geo_accuracy column value be greater than zero, for the location to generate a marker on the associated Google Map. Use 8 for the geo_accuracy value if you are reading the coordinates off a GPS unit (which is generally good to 10 meters or so), or if you are using a city map of the area.
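Expressed as code, the marker rule just described might look like the sketch below. The sub name has_marker is an assumption for illustration and is not the name of any Review Foundry routine. The record is represented as a plain hash keyed by column name.

```perl
use strict;
use warnings;

## Hypothetical check mirroring the rule described above: a record gets
## a marker only if both coordinates exist and geo_accuracy is above zero.
sub has_marker {
    my ($record) = @_;
    for my $col (qw(latitude longitude)) {
        return 0 unless defined $record->{$col} && length $record->{$col};
    }
    return ($record->{geo_accuracy} || 0) > 0 ? 1 : 0;
}

my %hand_entered = ( latitude => 34.143963, longitude => -118.107300, geo_accuracy => 8 );
my %not_geocoded = ( latitude => undef,     longitude => undef,       geo_accuracy => undef );

print "hand_entered: ", (has_marker(\%hand_entered) ? "marker" : "no marker"), "\n";
print "not_geocoded: ", (has_marker(\%not_geocoded) ? "marker" : "no marker"), "\n";
```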

One important point you need to be aware of when adding coordinates manually is that if you set the gmaps_geocode_when_editing configuration variable to Yes, rather than its default value of No, an attempt will be made to geocode the record whenever you modify any of the content for that record. However, if Google does not send back usable latitude and longitude information, no attempt will be made to automatically update these column values, so your hand-entered coordinate data will be preserved. Should Google suddenly start providing the information, of course, the new coordinate values would replace your hand-entered data.

Finally, the geo_status_code column value is just an indicator of Google's response to a geocode request. So you do not need to set that to any particular value when manually entering coordinate data. In fact you might want to leave it as 602, or whatever, to remind you that the record has problems when geocoding via Google.

One more thing. If Google decides that you are requesting geocode data too rapidly it sends back a 620 (not a 602) response. The Review Foundry geocoder is designed to spot this and slow the rate at which queries are made, increasing the delay time between requests by 1/10th of a second with each 620 that is returned. This should be sufficient to keep the geocoder running rapidly, but not too rapidly.
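The throttling behavior can be pictured with the following sketch. It is not the geocoder's actual code: the sub name next_delay, the starting delay of zero, and the sequence of status codes are all assumptions chosen to show how the delay grows.

```perl
use strict;
use warnings;

## Hypothetical sketch of the throttling rule: every 620 response adds
## one tenth of a second to the pause between requests.
sub next_delay {
    my ($delay, $status_code) = @_;
    return $status_code == 620 ? $delay + 0.1 : $delay;
}

## simulate a run of responses; these codes are made up for illustration
my $delay = 0;
for my $code (200, 620, 200, 620, 620, 200) {
    $delay = next_delay($delay, $code);
    printf "status %d -> wait %.1f seconds before the next request\n",
        $code, $delay;
    ## a real loop would pause here, for example with
    ## select(undef, undef, undef, $delay);
}
```

Note that in this sketch the delay only ever increases, which matches the description above. Whether the real geocoder ever reduces the delay again is not stated.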

Troubleshooting

The geocoder makes use of several fairly standard Perl modules to perform geocoding. These are listed below. You likely have them already installed. If not, have them installed before proceeding. The XML::SAX module appears to be required by XML::Simple which is used to convert the XML record sent back from Google into an equivalent Perl hash that can be parsed and used for the record import.

    use Fcntl;
    use Socket;
    use Getopt::Long;   ## used to parse the command line arguments
    use LWP::Simple;    ## used for file fetching
    use XML::Simple;    ## note: use of XML::Simple required XML::SAX, which
    use XML::SAX;       ## in turn required Encode::ConfigLocal

I am not entirely sure that XML::SAX needs to be installed on your system if XML::Parser is available instead (XML::Simple makes use of the first class if available, else attempts to use the second class). If your system is missing XML::SAX you might try commenting out the line in this file (a little way down from here) which reads:

    use XML::SAX;

Place a # at the beginning of the line to comment it out. If it appears you still need XML::SAX to process your imports, ask your web hosting company to install it for you.

Also, if you receive an error about a missing Encode::ConfigLocal class (which is used by XML::Simple), this is probably because your web hosting company has an ancient copy of the Encode class installed. Ask them to issue the following command to rebuild the class:

    enc2xs -C
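Before running the geocoder you may want to check which of the modules mentioned in this section can actually be loaded on your server. The short script below is one way to do that. It is not part of Review Foundry, and the sub name module_available is just a name chosen for this example.

```perl
use strict;
use warnings;

## Returns 1 if the named module can be loaded on this system, else 0.
sub module_available {
    my ($module) = @_;
    ## turn Some::Module into Some/Module.pm, the form require expects
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Fcntl Socket Getopt::Long LWP::Simple
                   XML::Simple XML::SAX XML::Parser)) {
    printf "%-14s %s\n", $module,
        module_available($module) ? 'ok' : 'MISSING';
}
```

Any module reported as MISSING is one to raise with your web hosting company.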

Next Section: CUSTOMIZING MARKERS


Copyright © 2004 Random Mouse Software. All Rights Reserved.