|
Red Queen Review Engine User Manual
ADDING COLUMNS TO A TABLE
|
|
« Table of Contents
|
Obtain Red Queen »
ADDING COLUMNS TO A TABLE
More than likely you will want to add custom columns to at least the Item,
Member, and Supplier tables. Possibly you might at some stage think
you also need to add a column or two to the Category, Team, or
YellowPage tables, but most Red Queen users will probably find that
unnecessary. Either way, the procedure for modifying the structure of a table
is the same, regardless of the table involved.
There are quite a few tables with a PRIMARY KEY in Red Queen, and each of these
has been made available for inspection in the Database menu of the
Database control panel. However, that does not mean you should consider
modifying their structure. In general, unless you have a really good reason
for doing it, you should restrict table modifications to those mentioned in the
previous paragraph, and probably just the first 3 at that (the "Thing" tables).
To add a column to a table you first select the "Columns" option of the
Database menu, as well as the name of the table to which the column will
be added. Clicking on the submit button for this menu will bring up the
Column Properties frame for the table, summarizing the existing columns.
At the bottom of this frame you will find a link labeled "Add Column". When you
click on that you will be presented with a form something like the following
(here we'll assume we are dealing with the Item table). The same javascript
constraints that are normally present are embedded in this page too--at least
for the following set of form elements, though not for others later in the
page--so you can try different combinations of properties to see if they are
accepted (although the constraints aren't so exhaustive that you won't be able to
come up with inputs that wouldn't be accepted by Red Queen in practice):
For the full definition of each of the form inputs shown above, see
COLUMN PROPERTIES. On this page will be
provided some examples on how to fill out this form in order to add one column type
or another. Not all possibilities will be covered, just the ones that you are
more likely to use. These are the column types
TEXT,
INT,
ENUM,
DATE,
FLOAT.
TEXT Column
The TEXT
column is used when you want to collect a sizeable block of text as one of
your record elements (up to 64 kB). For the Item table we already have an
Item.description column. But you might want to add more such columns for
any number of different reasons. At the very least, if you break textual input into
a number of different columns you can position these independently of one another
when it comes to displaying the record. Everything we say here about specifying
a TEXT column also applies to the
CHAR,
VARCHAR,
TINYTEXT, MEDIUMTEXT, and LONGTEXT
column type too (these are are string column formats but differ in the
maximum input length of the string and the amount of memory required to store the data).
Let's pretend we want to add a Design Specification field which the manufacturer of
an Item used to blueprint the Item before the manfacturing stage. However, let's
also assume that the Design Specification might not always be known or available,
so we want to allow for the possibility that the spec will not be available when
we come to creating a new record. We'll insert the new column after the existing
Item.description column, and to make things interesting we'll make the
new column a searchable column too, so that our search engine can find query terms
embedded in our Design Specification (in practice we would only add a search index
to the most important columns, so as to reduce the disk space required to store
the search index for each column). Below are the filled
out form elements that might be used to specify our Item.design_specs column
(note: this form is non-functional and has had excess options removed from it):
We've given the column a Search Weight of 1, so search terms that appear in the
Design Specification would have the same weight as those that appear in the Description
field. To give more importance to search terms found in the column, increase the
weighting. Notice that our Not Null value is No because we anticipate
missing spec data. Also the Form Type we've chosen is a TEXTAREA form element
with 70 columns and 5 rows (the number of columns and rows are specified as a pair
of comma-delimited integers). If we wanted our Design Specification field to be treated
in a cAse sENsitIVE way we would set the Attributes value to BINARY, rather
than leave it blank (in practice the only time you might specify BINARY strings is
for short strings where case might be important, or for actual BINARY data--though
you would normally use a
BLOB for such data).
The form elements that don't apply to the specification of a
TEXT column are the
Column Size/Precision, Column Values, Default, Form Values,
and Form Regex.
INT Column
If you need to supply a count of some aspect of your record you would use one of the
integer columns, TINYINT,
INT, SMALLINT, MEDIUMINT, or BIGINT.
We'll assume here that the
INT column type is most suitable
for the new column we're considering, say, because we are recording the number of
units sold, and this could get fairly big. The name of our column will be
Item.units_sold. To add our column just after the existing Item.item_image
column we would fill out the form to look something like the following (note: this
form is non-functional and has had excess options removed from it):
Things to note here are that we've selected
UNSIGNED for the
Attributes element
because we know the number of units sold cannot be a negative integer, so we double
the range of positive integers that can be recorded. By selecting Yes for
the Not Null element we are saying that we will always have a value to go
into the column, even if it is zero.
For collecting the integer value we have specified a
TEXT Form Type with
a size of 10. We've also specified a regular expression to be applied to the column
value at the time the record is created (or modified). This is the ^\\d+$
expression in the Form Regex which tests for the presence of one or more
digits only in the column value. If the regular expression test fails, the record
will not be reated (or modified).
The form elements that do not apply when creating an integer column type are:
Column Values, Search Weight, and Form Values.
ENUM Column
Suppose you wish to add a field to your Item table which can take any one of a finite
number of values, and you want the person inputting the record to have to choose the
value from a select menu. The ideal column choice in this case is the
ENUM column
type. Let's suppose the field will represent the size of the Item, and that
the size value for any Item record MUST be one of the 3 possible values small,
medium, or large. To add an Item.size column just after the
existing Item.is_validated column (another
ENUM) we would fill out
the form to look something like the following (note: this form is non-functional
and has had excess options removed from it):
Note that the Column Values are separated by linebreaks, and that corresponding
values appear in the Form Values textarea box. These latter values are what the
user will see in the drop-down select menu, and represent the human-readable
values (the Column Values, after all, could have been supplied as numeric quantities,
though in practice sticking to human-readable values there as well is almost always a good
idea if you can manage it). The number of values supplied for the Column Values
and the Form Values should, of course, match. Note also that we have specified
a value for the Default, so that this value will be automatically selected when
the select menu is constructed. As an alternative to the select menu,
we could also have chosen RADIO for the Form Type, instead of SELECT. This
would force the user to click on a radio element when specifying the Item size instead
of selecting from a menu.
Note also that we have specified that the column should be NON NULL--every Item should
have an associated size. As for the Column Size/Precision, Attributes,
Search Weight, and Form Regex inputs, these are meaningless in the
context of an
ENUM column. Leave them blank.
The Form Size/Length also should not be specified for either a select
menu or radio form element.
Finally, if the column we were adding was for administrative use only, and we did
not want it to appear on the Add/Modify forms seen by regular Members, then we would
also toggle the last form element, the Hide on Add/Modify Forms radio element,
to Yes.
DATE Column
The DATE,
DATETIME,
TIME,
YEAR, and
TIMESTAMP
columns are similar in behavior. The form elements required to specify one of these
column types are few. Here we suppose we wish to add a Date Available field to the
Item table (column Item.date_available to go after the existing
Item.hits column). We assume the column value might not always be known, so
we allow it to assume the NULL value. The date-like columns have a well-defined
input format which can be translated into a menu format for collecting year,
month, day, hour, minute, and second values by choosing DATE_SELECTABLE for the
Form Type (the range of selectable years can, if necessary, be adjusted
by editing the db_date_field_input.ttml template, found in
/_lib/templates/cpman of your CGI bin area). Otherwise, you can simply select
TEXT for the Form Type to force the user to enter the date in the correct
format, and then check this with a suitable Form Regex condition. For example,
^\\d{8}$ might be used to force collection of a date with a 4-digit year
like YYYYMMDD.
Thus we would fill out the form for a DATE column to look something like the
following (note: this form is non-functional and has had excess options removed from it):
While it is possible to add a Default date, you are not likely to want to do
that. The other form elements that have been left blank are not required when specifying
one of the date-like column types.
FLOAT Column
Our final example column addition involves the
FLOAT column type, which is
used to store decimal numbers. There are sibling column types like DOUBLE and DECIMAL,
so check which is best suited to your data before plunging ahead. For a money column
you might want to use DECIMAL, but here we are considering how to add a Weight column
to the Item table, where our weight is measured to the nearest 1/100 of a kilogram.
We specify the form elements for our Item.weight column as follows (note: this
form is non-functional and has had excess options removed from it):
Notice that we have specified for the Column Size/Precision element the pair of
integers 8,2. This means 8 character is total to display the value, with 2 of these
used for the part appearing after the decimal point. This is also reflected in the
regular expression devised to catch input values that do not conform to the expected
format.
Other Column Types
On this page we have covered in some detail how to add to your tables 5 of the more
commonly used column types. You are not restricted to these 5 types. Red Queen will
let you create all the standard column types, bar the BLOB varieties. This is because
only file uploads might require this column type and instead, accommodation has been
made for storage of these files in the file system itself. To understand all of the
options available to you when adding or modifying columns, study the information
presented in COLUMN PROPERTIES.
Special Form Types
When adding a new column it is important to select not only the correct column type
to represent your data internally, but also some thought needs to be given to how
it will be displayed. This is the province of the Form Type, of which there
are some special cases.
If the column you add is intended to represent an uploaded
file, you should select one the types PUBLIC_FILE, PRIVATE_FILE, or if the file
corresponds to an image, PUBLIC_IMAGE or PRIVATE_IMAGE. The first 2 of these types
can be downloaded by visitors to your site (the second only by registered Members),
while the image types are formatted for viewing when a visitor or Member clicks
on the link for the file.
If the column you add is intended to represent a URL, selecting URL for the
Form Type will turn the column value into a formatted hyperlink when the
value is presented on the page. The URL_STATUS Form Type is reserved for
future use and is presently non-functional.
If the column you add is intended to represent a date or time of some time, you
should use either the TEXT or DATE_SELECTABLE Form Type. The other available
type is DATE, but this is reserved for use with date-ranges, such as when searching
for records that fall within a given time period. So you should avoid trying to
use it.
ZIPCODE Form Type (MEDIUMINT column)
One of the more useful Form Types is the ZIPCODE type. When a MEDIUMINT column is designated as a zipcode holder
in this way, it becomes possible to use the column to locate other records whose zipcode can be considered
to be "nearby". Thus, if the Item table was used to represent restaurants, and a column was added named
Item.zipcode, and the Form Type was chosen to be ZIPCODE, then the search engine would allow one to search
for all restaurants within X miles of a supplied zipcode. This designation can also be applied to a (single)
column in the Member or Supplier tables.
Provided that you downloaded the zipcode related data before installing
(from http://www.randommouse.com/cgi-bin/rms/product/obtain/obtain_zipcodes.cgi), your zipcode tables will
allow zipcode searching out to a range of 10 miles from a given zipcode. For customers who have purchased
a key, an extra script will be provided on demand so that you can extend this zipcode-based searching
out to a distance of up to 100 miles.
When a zipcode field is added to any of the Item, Member, or Supplier tables,
searches will acquire an advanced search form element that looks something like the following,
allowing the results to be restricted to "nearby" zipcodes:
In order to use this type of geographic search, you first need to add a column to the table (if it does
not already exist) which can carry a 5-digit zipcode. The column definition form should look something
like the one shown below. Note that unless you have a very good reason to force members to
supply a zipcode, it is better to leave this as an optional field. The form of the regular expression used
to check the zipcode field allows for an empty value, otherwise if one were forcing members to supply a
zipcode for the record, one might use ^\d{5}$ (i.e. value must match exactly 5 integers)
as the value of the regular expression property.
Because this column has been added to enable zipcode-based searches, it is important to add an INDEX on
the zipcode column to make searches efficient. To do this, click on the Add Key link found
at the bottom of the column definition page. You should elect to add a key using the following
parameters:
Key Name: zipcdndx
Key Type: INDEX
Key Definition: zipcode
It is also important that when you add a ZIPCODE column to a table you make sure you do NOT have any
other columns of the POSTCODE_UK or OUTCODE_UK type. These other column types control UK postcode lookups
as discussed in the next section. If you do have such column types, either change the Form Type to a
plain TEXT field, or delete the column entirely.
To increase the out-of-the-box lookup distance from 10 miles out to 50 or 100 miles you need to run a script named build_ZipcodeNearby.pl which you can obtain from Random Mouse Software after purchasing
a software key. This script is designed to rewrite the ZipcodeNearby table, making it considerably
larger, so that inter-zipcode distances are pre-computed and can be quickly looked up when searches are
performed. After running the script and rebuilding the ZipcodeNearby table, you would update the search_adv_zipcode_distance_cutoff configuration variable to reflect the new range (like 50 miles).
When this is done, the drop down menu on the proximity search interface will display the new lookup ranges.
The build_ZipcodeNearby.pl script needs to be run via the command line after logging in with
either telnet or SSH. Instructions are in the header section of the file, so read it before you attempt
to run it. You will need to edit the file to add your MySQL database parameters, and to adjust the maximum
lookup range (for example, lookups out to 100 miles require approximately 150 MB of MySQL disk space for
the ZipcodeNearby table, while only 66 MB is needed for a 50 mile lookup). It can take some time
to run this program, perhaps an hour or more, so instructions on how to run the script as a background
process are provided in the file.
For more information in the use of zipcode columns, see the SOCIAL REVIEW SITE tutorial.
POSTCODE_UK Form Type (CHAR column)
Another special Form Type is the POSTCODE_UK type. In this case a CHAR column is used to designate a
postcode (such as ME1 1NK--note the space between the out code which starts the postcode, and the in
code which ends it) in the United Kingdom. As with U.S. zipcode columns, a U.K. postcode column is
used to locate other records whose postcode is considered "close" (within, say, a 20 mile radius of a
stated postcode). Again, this designation can be applied to a (single) column in the Item, Member,
or Supplier tables, and the search engine will automatically offer range-based searches. Actually,
that is not quite the whole truth. To activate postcode-based proximity searches another column,
which will represent only the out code portion of the postcode, also needs to be added. We will
get to that.
Provided that you downloaded the U.K. postcode related data before installing
(from http://www.randommouse.com/cgi-bin/rms/product/obtain/obtain_postcodes_uk.cgi), your postcode tables
will allow postcode-based searching out to a range of 10 miles from a given postcode. Customers who have
purchased a key can request a special script which will allow this searching to be extended out to a
distance of up to 100 miles.
When a postcode field (actually two fields) is added to any of the Item, Member,
or Supplier tables, searches will acquire an advanced search form element that looks something
like the following, allowing the results to be restricted to "nearby" postcodes:
In order to use this type of geographic search, you first need to add a column to the table (if it does
not already exist) which can carry a 6-8 character postcode (the full postcode). The column definition
form should look something like the one shown below. Note that unless you have a very good reason to
force members to supply a postcode, it is better to leave this as an optional field. The form of the
regular expression used
to check the postcode field allows for an empty value, otherwise if one were forcing members to supply a
postcode for the record, one might use ^(|[a-zA-Z0-9]{2,4}\s[a-zA-Z0-9]{3})$ (i.e. if supplied, the value
must match 2 to 4 alphanumeric characters, a space, then another 3 alphanumeric characters) as the value
of the regular expression property.
But this is only half of the story. We also need to add a column with a Form Type of OUTCODE_UK which will
be auto-populated with just the out code portion of a supplied postcode. This will be a hidden column that
is never presented to the public, but which IS used to perform the searches (which only deal with the
more granular out codes). Here is what the form looks like to add the OUTCODE_UK column:
Note that this column is accessible to admin, but does NOT appear on any public forms (see the last form
element which specifies that the column should be hidden from the public). As a practical matter, as long
as this out code column exists you should be able to ignore it. The value it takes will be derived behind
the scenes from the value of the full postcode. It never needs to be entered directly.
As with zipcode-based searches, it is important to add an INDEX on the OUTCODE_UK column to make
searches efficient. To do this, click on the Add Key link found at the bottom of the column
definition page. You should elect to add a key using the following parameters:
Key Name: outcdndx
Key Type: INDEX
Key Definition: outcode
It is also important that when you add POSTCODE_UK and OUTCODE_UK column types to a table you make sure
you do NOT have any other columns of the ZIPCODE type which controls US zipcode lookups
(as discussed in the previous section). If you do have a ZIPCODE column type, either change the Form Type
to a plain TEXT field, or delete the column entirely.
As explained for the Zipcode lookups, you will want to expand the range of proximity searches from the
limited out-of-the-box configuration of 10 miles, to some maximum possible range like 50 or 100 miles.
To do so you will need to contact Random Mouse Software after you have purchased a software key
and obtain a script named build_PostcodeNearbyUK.cgi which is to be run as a command line process
(see the discussion above for the Zipcode case for more details). UK postcodes are defined over a much
smaller geographical region that US zipcodes, so the amount of data required to store pre-computed
inter-postcode distances is much smaller also. Even a proximity search table with a maximum 100 mile
lookup only requires about an extra 10 MB of MySQL storage.
Note: After adding a UK postcode proximity search, and extending the range of the lookups, be sure
to adjust the value of the search_adv_postcode_distance_cutoff variable which determines the
maximum lookup range offered on the proximity search pages. You can find this variable on the
Configure > Search page.
POSTCODE_CA Form Type (CHAR column)
Similar to the POSTCODE_UK type is the POSTCODE_CA Form Type. This one is used to characterize Canadian
postcodes and allows proximity searching on the roughly 800,000 Canadian postcodes. Once again, the
POSTCODE_CA designation can be applied to a (single) column in the Item, Member, or Supplier tables,
and the search engine will automatically offer range-based searches.
Provided that you have downloaded the Canadian postcode related data
(from http://www.randommouse.com/cgi-bin/rms/product/obtain/obtain_postcodes_ca.cgi), you can run a special
command line script to create your PostcodeCA table, which will hold the 798,143 postcodes:
> cd /path/to/cgi-bin/rs/redqueen
> perl build_PostcodeCATables.pl
Because there are so many postcodes to deal with, Red Queen does not allow for any out-of-the-box Canadian
postcode proximity searches. You must first create the distance lookup tables by running a special script
to build those tables. As the lookup tables are very large, the build process will take perhaps a couple of
months to complete. No, that's not a typo. Your server will spend around one or two thousand hours of CPU
time calculating the inter-postcode distances, depending on the maximum lookup range you specify for proximity searches. For more information about the process for building the lookup tables (one per
province), read the content of the file named build_PostcodeNearbyCA.pl (which is available only to Red Queen customers). Customers who have purchased a key can request this script to build search capability out to a
distance of up to 100 kilometers--although, due to the size of the tables required to implement long range
searches, the practical limit for building proximity lookup tables is probably 30 kilometers.
Because the time required to build the proximity lookup tables is so long, you will want to run the build process as a background process (so that you can logout without killing the process), but only if you have a dedicated server that you share with no-one else. Otherwise you will likely want to run it in the foreground to ensure that it gets allocate it share of CPU time. The command for placing the process in the background, it the following:
> cd /path/to/cgi-bin/rs/redqueen
> nohup perl build_PostcodeNearbyCA.pl &
Please read the documentation contained in the header area of the script build_PostcodeNearbyCA.pl
before launching the program.
Size Of Lookup Tables
To give you an idea of the amount of disk space that is required to create proximity lookup tables, have
a look at the table below. This table will be augmented as more data points are generated. Sizes are
given for a specified maximum lookup range in kilometers. If you do not need to generate lookup tables for
every province you can disable the provinces you don't need in build_PostcodeNearbyCA.pl. Note that the Quebec and Ontario tables are particularly huge.
PROVINCE # RECORDS SIZE MB (30km) (40km)
---------------------------------------------------------
AB
BC
MB
NB 57134 278 888
NL 10410 140 134
NS 25181 572 539
NT
NU
ON
PE 3212 20 17
QC 187284 14,987
SK
YT
Proximity Search Form
When a Canadian postcode field is added to any of the Item, Member, or Supplier
tables, searches will acquire an advanced search form element that looks something
like the following, allowing the results to be restricted to "nearby" postcodes:
In order to use this type of geographic search, you first need to add a column to the table (if it does
not already exist) which can carry a 6 character postcode (the full postcode with NO spaces anywhere in it).
The column definition
form should look something like the one shown below. Note that unless you have a very good reason to
force members to supply a postcode, it is better to leave this as an optional field. The form of the
regular expression used
to check the postcode field allows for an empty value, otherwise if one were forcing members to supply a
postcode for the record, one might use for the value of the regular expression property ^(|[a-zA-Z0-9]{6})$
(i.e. if supplied, the value must match a 6-character alphanumeric).
As with zipcode-based searches, it is important to add an INDEX on the POSTCODE_CA column to make
searches efficient. To do this, click on the Add Key link found at the bottom of the column
definition page. You should elect to add a key using the following parameters:
Key Name: pstcdndx
Key Type: INDEX
Key Definition: postcode
It is also important that when you add a POSTCODE_CA column type to a table you make sure
you do NOT have any other columns of the ZIPCODE or POSTCODE_UK/OUTCODE_UK types
(as discussed in the previous sections). If you do have any of these column types floating around,
either change their Form Type to a plain TEXT field, or delete the columns entirely, as you cannot
presently add searches on more than one proximity-type search column.
Note: Some of the PostcodeNearbyCA tables that are constructed to perform proximity searches are LARGE.
This is because there are approximately 800,000 postcodes covering Canada and a lot of them cover very small
geographic regions, so there are a lot of "neighboring postcodes". For this reason, the lookup tables are
constructed for each province and those that are not needed need not be computed when the
build_PostcodeNearbyCA.pl script is run (which can take a significant amount of time to run to completion).
After adding a Canadian postcode proximity search, be sure
to adjust the value of the search_adv_postcode_distance_cutoff variable which determines the
maximum lookup range offered on the proximity search pages. You can find this variable on the
Configure > Search page.
« Table of Contents
|
Obtain Red Queen »
Copyright © 2004 Random Mouse Software. All Rights Reserved.
|