Review Foundry Review Engine User Manual

IMPORTING RECORDS

Adjust Text:  a a a a
« Table of Contents   |   Obtain Review Foundry »


IMPORTING RECORDS

Overview

For small sites that you might wish to build from scratch, entering records into your Review Foundry database one at a time is a reasonable approach. However, if you have an existing set of records that were created in another application, and you want to import these into Review Foundry, and especially if these are large record sets, then you won't want to be faced with the prospect of doing it by hand. In cases like these you can make use of the Import facility built into Review Foundry. The interface for this process is a script in the admin directory that you invoke from the command line. This means you will need to have telnet access to your web hosting account. To execute the script you will move to the Review Foundry admin directory and invoke something like the following command (the exact configuration file you will provide as an argument to the script will depend upon the kind of import you are performing):

    > cd /some/path/to/foundry/do/admin
    > perl importer.pl --config /some/path/to/ItemConfig.pm

The specific tables that can receive new or updated records by way of the import script are these ones:

        Item
        Category
        Supplier
        Yellowpage

Do NOT attempt to use importer.pl for any other tables.

To invoke the importer, you issue the simple command shown above from the command line (after you have moved to the Review Foundry admin directory). The importer takes only ONE argument, a path to a configuration file for the import. See the Command Line Arguments section below for details.

The importer.pl script works by parsing an import file that you create, in Excel form. That is, the plain text file consists of a series of newline delimited records. Each record is composed of fields separated by either tabs or commas (see the editable parameters). Below is an example of the kind of content that would be found in an import file. Note: the name of the import file must start with 'import_' be followed by the record type, and have either a .tsv or .csv extension depending on whether the field separator is a tab or a comma. For example, import_item.tsv for tab-separated item fields, or import_supplier.csv for comma-separated supplier fields.

You MUST place your import file into the /my/import directory of your RQ installation, as that is where RQ will look for import files.

In the example below, the (partial) file content is shown for tab-delimited Supplier records (for brevity, only a few fields are shown):

    sku      container_full_name  member_id  known_as  logo_image ...
    bignose  Noses/Big|Noses/Red  1          Big Nose  /some/path/big_logo.gif ...
    rednose  Noses/Red            1          Red Nose  /some/path/red_logo.gif ...
    ...

Some of these fields are recognizable from the default schema for the Supplier table. Others are not. In fact, in order to facilitate imports for Item and Supplier records, 2 auxiliary columns need to be added to the import table. These columns are named sku and container_import_list (not to be confused with the container_full_name field seen above in records--which will be discussed later).

Do NOT add the id column to any import file. Review Foundry will determine a suitable PRIMARY KEY value for the record. The sku column, discussed next, is used to identify import records for the Item and Supplier tables. If you do add the id column to your import file, the column will simply be ignored.

The sku column should be added ONLY to the Item or Supplier table if it is receiving import records (you should leave the column in place even after completion of the import). The sku acts as a unique identifier of a record, and should carry the same value if a record is re-imported at a later date (i.e. a record update). This sku column, if your table does not already have such a column, should be added using the Review Foundry Database control panel (see the Add Column link at the bottom of the page when you view the Columns associated with a table). Use the following definitions:

    For the 'sku' column
    --------------------
    Column Type: VARCHAR(32) [OR you can use INT if your sku values will be integers]
    Not Null: No
    Column Position: perhaps put this column after the 'keywords' column
    Form Display: SKU
    Hide on Public Add/Modify Forms: Yes
    Hide on Editor Add/Modify Forms: Yes

IMPORTANT: If your sku values will ALL be integers, you should make this column an INT type variable. That way you can perform SQL selects of records with sku values BETWEEN certain values if you need to in the future. For example, you might give all your 'Shoe' records skus that lie in the range 1000 to 2000, so they are easy to identify later on. Well, that's what I would do.

Next, if you are importing into EITHER the Item or Supplier table, there is anther column you should add. This is the container_import_list column (you do not need to specify values for this field in import records as it is merely used for internal bookkeeping during an import):

    For the 'container_import_list' column
    --------------------------------------
    Column Type: VARCHAR(255)
    Not Null: No
    Column Position: perhaps put this column after the 'sku' column
    Form Display: Container Import List
    Hide on Public Add/Modify Forms: Yes
    Hide on Editor Add/Modify Forms: Yes

Once these columns are added (or not if you are importing Category or Yellowpage records), the next step is to "configure" the Import module for importing into the relevant table. There are 4 configuration files, one for each of the Item, Category, Supplier, and Yellowpage tables. You might, of course, only need to configure one of these modules if you are going to import into only, say, the Item table. The importer reads the configuration file that you specify with the config command line argument to the importer.pl script (see the Command Line Arguments section below). Therefore you need to EDIT the relevant configuration file before attempting an import. Information about the configuration parameters is to be found in each of these configuration files.

If you wish to associate a list of Suppliers with each Item record you are importing, you can add one of two possible columns. If you add a 'supplier_id_import_list' column (column definition below), the value must be either an empty string or a comma (or pipe) delimited list of Supplier ID values from the Supplier table. The Suppliers MUST already exist for them to be associated with the Item. If instead you add the 'supplier_sku_import_list' column to your Item table, the list of comma (or pipe) delimited values should correspond to the SKU values in your Supplier table (these will be converted to Supplier ID values before the association is made).

Here are the associated column definitions for the two columns. You only need to add one of them to your table, and of course the column name must appear in the first line of your Item import file.

    For the 'supplier_id_import_list'
    and the 'supplier_sku_import_list' columns
    ------------------------------------------
    Column Type: VARCHAR(255)
    Not Null: No
    Column Position: perhaps put this column after the 'container_import_list' column
    Form Display: Supplier Import List
    Hide on Public Add/Modify Forms: Yes
    Hide on Editor Add/Modify Forms: Yes

Command Line Arguments

The importer needs to be told certain details about the records that are to be imported. There are 4 configuration classes, only one of which will be invoked during an import, depending on which table you are importing into. Use the config argument to specify the relevant configuration module:

    > perl importer.pl --config /some/path/to/ItemConfig.pm
    > perl importer.pl --config /some/path/to/CategoryConfig.pm
    > perl importer.pl --config /some/path/to/SupplierConfig.pm
    > perl importer.pl --config /some/path/to/YellowpageConfig.pm

Initially, you can find copies of these files in the following directory (but you should probably copy these to a location in your /my/import directory so that they persist from one upgrade to the next):

    /cgi-bin/rs/foundry/do/admin/Foundry/Admin/Import/Config/

Edit the relevant configuration file. There are instructions within each one to guide you.

Output to Screen

When you run the importer.pl script you will see some output printed to the screen as records are imported. In particular, if something goes wrong you should get an error message with useful debugging information. Study it for clues as to what might have happened. If your import records also have references to uploadable files, such as images, you might also see output generated from the image manipulation library you have set up with Review Foundry. This might occur if thumbnailing has been switched on and the thumbnailing code emits warnings as it goes about its job (you'll likely get such warnings if you use the NETPBM image library, as the author of Review Foundry does). You can ignore these messages, provided none is actually an error message.

When the script has finished importing records it will attempt to delete the temporary directory in /my/import that it uses to store any uploaded files. If for any reason the directory cannot be removed you will get an error message to that effect (in which case you can attempt to delete it yourself).

Importable Records

Records to be imported follow the Excel style formatting, with each record placed on its own line. The first line in the import file is special in that it contains the column names only. The sku column (if you are importing Item or Supplier records) contains a unique record identifier like 'abcde' or 'abc123' or '123'--any ALPHANUMERIC string (this includes INTEGERS) that you can repeatedly attach to a given record any time you format it into an Excel style file (the presence of this unique identifier will ensure that you do not import duplicate records). An sku can have any sequence of ascii characters in it, but it is recommended that you stick to any of a-z0-9 and the underscore.

To specify categories, or yellowpages, to associate with an Item or Supplier record you do so by supplying a container_full_name field, which is a pipe-delimited list of category names, or yellowpage names (corresponding to the full_name column in either the Category or Yellowpage table). Note: this field is NOT imported into the Item or Supplier table, it is used merely to assign Categories or Yellow Pages to the records that are imported.

Thus, if an Item record is to be associated with the two categories named:

           Noses/Big
    and    Noses/Red

then the container_full_name field would have the value:

           Noses/Big|Noses/Red 

Things to note here about naming the containers:

  1. There is NO 'Home/' prefix on the full category names specified in your file. The top-level 'Home' container seen in the Category, Team, or Yellowpage Browser found in the Review Foundry admin area is NOT a real container and the 'Home/' prefix does not appear in any category, team, or yellowpage full names. So do not use it! If you DO add 'Home/' to your container names, you will have a Home > Home > Some Name hierarchy in your Category, Team, or Yellowpage Browser, which is very likely NOT what you want.

  2. Note the pipe symbol | used to separate the full category names above. If these categories do not yet exist in the Category table, they will be created during the course of the import. To create a set of categories that are defined by more than just their name alone, create a separate import file for categories and import them first. Or you might have the categories already in place within your Review Foundry database. In that case, you need only specify the relevant value for the container_full_name field in your import file for items.

You can UPLOAD various files along with the usual text or numeric values that typically appear in an an Excel import file. In particular, images and other files that you would normally upload with the File Upload Manager that comes with Review Foundry, can be uploaded by specifying either a full URL or a full file path (on your file system) for the relevant record column. For example, the value to use for the item_image field of an item record might be:

        http://www.somedomain.com/images/nose5.jpg
    or  /path/to/images/nose5.jpg

If the image resides on your file system you might want to specify the file path as this is the quickest (and most reliable) way to retrieve it. Otherwise, for remote (off-site) images, the URL specification is required for a network request.

Note: The author of Review Foundry has found that the NETPBM image manipulation library seems to do much better thumbnailing than the GD library (with the exception of transparent GIFs), so set Review Foundry for NETPBM thumbnailing if possible, else Image::Magick, or GD if you run out of options.

Also, if you are importing images, a thumbnail of the image will only be created if you have already enabled thumbnailing from within Review Foundry (See the admin Configure > Thumbnails page if you have not already configured thumbnailing).

Category and Yellowpage Imports

When you need to construct import files for the Category or Yellowpage tables you can get away with specifying just 2 fields, corresponding to the full_name, and the known_as columns. Other fields are either optional (and therefore do not need to be present unless you wish to provide a value for them), OR they are required but they have specified default values (such as the total_items or total_suppliers columns which will be populated with the default value of 0).

The name field can be derived from the full_name field, so it need no be specified in the import file, and will be ignored if it is supplied. Therefore, a minimal import file for the Yellowpage table might look like this:

full_name
Spare Parts/Auto/Interior
Spare Parts/Auto/Exterior

Any other fields you add are optional.

Parsing Records

Before importing records into your table you should check that the Importer is correctly parsing your import file. You can do this by editing the configuration file, setting these variables:

        create_xml_file => 1,
        update_tables => 0,

Now when you invoke the importer, all that will happen is that the Importer will attempt to parse your import file and output the parsed content as an XML file at the following location (supposing we are dealing with a Supplier import):

        /my/import/supplier_xml_out.xml

There will be NO attempt to import the contents of your file into the targetted import table. Scan the content of the XML file. You should find that your import fields have been transformed into an XML structure that is easy to read. For example, one of your item records might result in this XML element for the item_image field:

        <item_image>http://www.somedomain.com/images/nose5.jpg</item_image>

If the XML file looks sensible, and you see all your fields represented for each import record, you can set

        update_tables => 1,

in the configuration file, and the next time you run the importer the XML file will be parsed for the content to import into your table, and the records will actually be INSERTed into your table. If you set

        create_xml_file => 0,

then the previously-created XML file will be parsed instead of creating it anew (saving time for large import files).

Required Fields

The only fields that MUST appear in an import record are those that correspond to the required (NOT NULL) columns of your table (and of those, only the ones WITHOUT an associated default value--see comment below). Also required in the case of Item or Supplier records, is the sku value. Remember also that any id specification will simply be ignored.

Note: if a NOT NULL column has a specified DEFAULT value (which you can see from the RQ Admin page that shows the definition of a given column in a table) then it CAN be omitted from the import record if you are satisfied with the default value. For instance, if you do not specify a value for the solicit_reviews column in an Item or Supplier record, then the inserted default value will be (unless you have changed it) 'Yes'.

One exception to the rule: unless you specify an explicit value for the is_validated column, the importer will use 'Yes'. So if you want a record to be/remain invalidated, specify this explicitly in the import file.

Problems With Unique Fields

When you import Item, Member, or Supplier records you will encounter problems if the known_as field is not unique across all imported records. This is because that column has a UNIQUE key on it. In other words, it is assumed you will not be adding records which share the same main (public) record identifier. The known_as column is used to store the name, or title of the record.

In some cases, however, it is not possible to uniquely name your records. For example, when importing from a purchased database where the field corresponding to the known_as column carries non-unique values. When this is the case, the thing you want to do is change the key on the known_as column from UNIQUE to INDEX (i.e. a non-unique key).

To remove the UNIQUE key, first go to the table definition page for the relevant Item, Member or Supplier table.

Next, select the Database control panel, then select the Columns option from the drop down menu on the Database page. At the bottom of the Columns page (or table definition page) there are a bunch of table keys. Click on the knwnasndx key and remove the known_as column from the key. Because that was the ONLY column that defined the key, it will be removed.

Now click on the Add Key link at the bottom of the table definition page, because now you want to create a replacement INDEX key: give the new key the same name (knwnasndx, or something else, it is not critical), assign the known_as column to the new key, and save it as an INDEX key.

Now you can import records which are not necessarily unique for the known_as column. You do not need to change the key after importing--in fact, it should remain as an INDEX key.

Recommendations

Before importing large sets of records into one of your tables, test the import process on a small data set of, say, 3 records. After the import, check that everything works as expcted: uploaded images appear in the right place, items are located in the correct category, and so on. If there is any problem you can easily delete 3 records and seek advice on where you went wrong. Only import large sets of data if you are confident that the importer is working correctly.

Also, if a set of imported records contains incorrect imformation, simply correct your Excel-style import file and re-import. Provided you use the same sku value for your re-imported (Item or Supplier) records, the information will simply be used to UPDATE each existing record, not create a new one. Wrongly assigned categories can also be corrected this way (just correct the container_full_name field).

Note: Category and Yellowpage records do NOT require an sku column because they already have a unique identifier for each record: namely, the full_name field. Incidentally, if you are importing a set of Category or Yellowpage records, it is possible to specify just a single field in your import file--the full_name field, and have the importer create minimal-info Category of Yellowpage records. In general, however, you'll likely flesh out the structure of your Category or Yellowpage records, and extra fields will appear in the import record.

Examples

Example import files can be found in /admin/import_examples

Non Review Foundry Modules Used During Import

The importer makes use of several fairly standard Perl modules to perform an import. These are listed below. You likely have them already installed. If not, have them installed before proceeding. The XML::SAX module appears to be required by XML::Simple which is used to convert an XML representation of your import record into an equivalent Perl hash that can be parsed and used for the record import.

    use Fcntl;
    use Socket;
    use Getopt::Long;   ## used to parse the command line arguments
    use File::Path ();  ## used to delete the temporary directory of upload files
    use LWP::Simple;    ## used for file fetching
    use XML::Simple;    ## note: use of XML::Simple required XML::SAX which
    use XML::SAX;       ## required Encode::ConfigLocal
    use File::MMagic;   ## used for determining MIME type of files to upload

If you receive an error about a missing Encode::ConfigLocal class this is probably because your web hosting company has an ancient copy of the Encode class installed. Tell them is issue the following command to rebuild the class:

enc2xs -C

Another XML::SAX problem I was alerted to by Eric of visiondecor.com relates to Window servers running ActivePerl. It seems that sometimes the ActivePerl installer does not install the XML::SAX ParserDetails.ini file, so you may have to hunt down a copy of that file and add it. The file should normally be found under

c:/perl/site/lib/XML/SAX/ParserDetails.ini

and should probably contain just these 2 lines:

[XML::SAX::PurePerl] http://xml.org/sax/features/namespaces = 1

Of course, you may already have the file, so only worry if the filename ParserDetails.ini shows up in an error message.


« Table of Contents   |   Obtain Review Foundry »


Copyright © 2004 Random Mouse Software. All Rights Reserved.