Red Queen Review Engine User Manual

BUILDING PAGES


If you run a small site without much traffic, having pages generated on the fly by CGI scripts presents no real problem to your web server. However, as site traffic increases, this situation changes. It becomes ever harder for the web server to keep up with requests, because each time a page needs to be generated the Perl interpreter must be invoked and the script code compiled and executed. One way to lighten the server load is to turn CGI pages whose output changes relatively infrequently into an equivalent set of static HTML pages. The browsable Category, Team, and Yellow Pages produced by Red Queen are obvious candidates for this process, and each can be turned into a static directory that you can build on, say, a weekly basis. Red Queen can also convert Member and Supplier Profile pages to HTML, as well as Top Reviewer pages.

Via The Browser

There are better ways to build pages than the method discussed in this section, namely via your browser. However, for reasons discussed in the next sections, these other methods for building may not be viable, so the browser method may be your only option.

To build pages via your browser, go to the Build control panel. There you'll see options to separately build each of the Category, Team, and Yellow Page branches. There are also options to build pages for the Member and Supplier Profiles, and the Top Reviewer pages. You will also see an option to build all of these branches, one after the other. The process is performed in steps (i.e. is staggered). Up to a couple of hundred pages can generally be created before your web server reaches its timeout limit and the process is unexpectedly aborted. To keep this from happening, you should ensure that not too many pages are attempted at any one time.

Built pages are presently batched by Container for the Category, Team, and Yellow Page branches, and by blocks of 100 for built profiles and reviewer pages. You can specify (as a configuration option) the number of Containers to be processed together before the browser page is refreshed and the next batch is begun. If your Containers are relatively light, and each has no more than a few dozen Things within it, you might process 5 or 10 Containers per staggered page. If you are building the various possible Thing and Review orderings, the number of Containers that can safely be processed will decrease correspondingly. You may be required to process as few as a single Container per browser page. The number you decide upon is one of the configuration variables that can be set from the Build / Browse frame of the Configure control panel. It is recommended that you keep Containers on the small side, if possible.

If you cannot keep your Containers on the small side and run into timeout problems, then consider building pages from the command line (see next method).

Note: Page building via the browser is handled by something called an NPH process (for non-parsed headers). For NPH processes the page headers are NOT handled by the server, and instead the script produces all of its own header info. However, on some servers this is forbidden and you may see a 500-type error message when you try to build pages via the browser. If that is the case, you can try toggling the nph_headers configuration variable found on the Configure > Build / Browse page. This variable allows you to switch off the NPH header management and hand header generation back over to the web server. So try this if you see an error of the form "Error 500 - Internal server error".

Via The Command Line

If you have telnet (or SSH) access to your site, you can log in and run the build script via the command line. This method has the advantage that it is (somewhat) faster than the equivalent process carried out from the browser, because no CGI processing is involved. Also, building can take place in one (generally) long uninterrupted job--unlike building from the browser, where the process is split into many smaller jobs to reduce memory consumption and avoid timeout limits. But it still suffers from one drawback shared by the browser method of building--the process needs to be carried out manually. In the next section a possible solution to that problem is discussed.

The command line invocation for a telnet-initiated build can be one of the following (this assumes you are issuing the command from the Red Queen /do/admin directory, which should be a protected directory):

perl ./nph-admin.cgi --do=BuildAll
perl ./nph-admin.cgi --do=BuildItem
perl ./nph-admin.cgi --do=BuildMember
perl ./nph-admin.cgi --do=BuildSupplier
perl ./nph-admin.cgi --do=BuildMemberProfile
perl ./nph-admin.cgi --do=BuildSupplierProfile
perl ./nph-admin.cgi --do=BuildReviewer

If you wish to build all the static directories listed in your build plan (see the corresponding configuration variables for building), use the first command with the 'BuildAll' argument. In this case, if Member and Supplier Profiles are in the build plan, they will be built first, then Top Reviewers. If your Yellow Page directory is in the build plan, the Supplier pages will be built next. If your Team directory is in the build plan, the Member pages will follow. Finally, if your Category pages are in the build plan, the Item pages will be built. If you need to build only one of the directories in your build plan, use one of the other commands shown above.

Note: If you are building all possible Thing and Review orderings, the build process is going to take a long time, particularly if you have a lot of Things and Reviews in your database.

Conflicting User Identities

WARNING: If you decide to switch between browser-based building and command-line building don't expect to get anywhere unless the process executes as the SAME user in both cases. Why? Because if one process builds static files which are then owned by user A, and then the other process attempts to overwrite those same files as user B, permissions on the files will likely prohibit any overwriting from taking place and the process may seem to die mysteriously (unless you have 777 permissions on everything). Generally both processes on a server will execute as the same user, but I have spent hours looking for problems in code when it has turned out that conflicting user identities are the cause of the problem.
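
One way to check for this before switching methods is to look for built files that are not owned by the user you are logged in as. The following is only a sketch using standard shell tools: the helper name and the example path are hypothetical and are not part of Red Queen.

```shell
# Hypothetical helper: print every file or directory under the given path
# that is NOT owned by the user running this command.
list_foreign_files() {
    find "$1" ! -user "$(id -u)" -print
}

# Example (the path is a placeholder for your own static build root):
#   list_foreign_files /path/to/static
```

If you run this from your telnet session and it prints anything, those files were most likely created by a different user (for example, the web server user), and a command-line build will probably be unable to overwrite them.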

Dealing With Timeouts

There is also a configuration variable that can help you if your server times out for any reason while building from the command line. If your maximum build time for any process is, say, 60 minutes, and you set build_expiration_in_minutes to 120 minutes, then you can get the build process to pick up where it left off if the process dies unexpectedly due to a server problem. Provided the process is restarted within the period before the expiration, it will skip rebuilding the pages already built. After the period has expired (like, the next time you intend to rebuild pages) the build will start again from scratch. This feature is useful, for instance, if your Perl process runs out of memory during a long command-line build (which happens to be the actual motivation behind the addition of this feature).
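
The rule can be pictured with a small sketch. This is NOT Red Queen's actual code: the function name is hypothetical, and it assumes the expiration window is measured from the moment the original build started, which the description above suggests but does not state outright.

```shell
# Hypothetical illustration of the resume rule. Arguments:
#   $1 = time the original build started (seconds since the epoch)
#   $2 = time of the restart (seconds since the epoch)
#   $3 = value of build_expiration_in_minutes
# Succeeds (exit status 0) if the restart falls inside the window.
can_resume() (
    started=$1
    now=$2
    expiration_minutes=$3
    [ $(( now - started )) -lt $(( expiration_minutes * 60 )) ]
)

# Example: a build that died one hour in, with a 120 minute expiration:
#   can_resume 0 3600 120 && echo "restart would pick up where it left off"
```

Under this reading, an expiration of zero means a restart never resumes and always builds from scratch.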

Via Cron Job

If you know how to run scheduled cron jobs--automated execution of programs--you may be able to set things up so that the build process takes place according to a preset schedule that requires no human intervention. However, many web hosts RESTRICT the amount of CPU time that can be allocated to a single cron job. If this is the case for you, very likely you will find yourself running into timeout problems yet again. Possibly, the cron job may only be of use to you if you are running your own dedicated web server and you can remove the time limit for cron execution.

Check with your web host first about timeout limits for cron jobs before you invest time trying to get the build process automated. Otherwise, if you believe that setting up a cron job should be feasible, edit your crontab file and add something like the following lines:

38 1 * * 1 perl /path/to/nph-admin.cgi --do=BuildSupplierProfile --cron=1
38 2 * * 1 perl /path/to/nph-admin.cgi --do=BuildMemberProfile --cron=1
38 3 * * 1 perl /path/to/nph-admin.cgi --do=BuildReviewer --cron=1
38 4 * * 1 perl /path/to/nph-admin.cgi --do=BuildItem --cron=1
38 5 * * 1 perl /path/to/nph-admin.cgi --do=BuildMember --cron=1
38 6 * * 1 perl /path/to/nph-admin.cgi --do=BuildSupplier --cron=1

This example rebuilds the Supplier Profile, Member Profile, Top Reviewer, Category, Team, and Yellow Page branches every Monday at 1:38, 2:38, 3:38, 4:38, 5:38, and 6:38 A.M., respectively. It assumes that each individual build takes less than an hour to complete. Alternatively, if you cannot be sure of the time required to complete one of the builds, you can elect to build the lot, one after the other, like this:

38 2 * * 1 perl /path/to/nph-admin.cgi --do=BuildAll --cron=1

The extra --cron=1 argument ensures that logging to the screen is switched off unless an error message needs to be output. This ensures that any email message sent to you after your cron jobs are completed remains of manageable size.

If you cannot run cron jobs, try to use the telnet method instead. If that isn't possible, try the browser method.

Meaning Of The Build Plan

When static pages are built, Red Queen has to have some idea about what static pages will ultimately be created in the build so that it can put in links to these pages before they are actually built (since not everything can be built all at once). This is handled by specifying a bunch of "build plan" variables, which can be found on the Configure > Build / Browse control panel. These are the build plan variables you will find there:

	build_plan_item
	build_plan_member
	build_plan_supplier
	build_plan_member_profile
	build_plan_supplier_profile
	build_plan_reviewer
	build_plan_member_reviews_as_rss

If (for example) you won't be using the Item branch, or the Member branch, of your public pages, because you have decided to represent your reviewable things as Suppliers, the first and second of these build plan variables need not be enabled. However, you can enable all the others if you like (even enabling everything will do you no harm). That way static pages will be created for Suppliers, Supplier Profiles, Member Profiles, and Top Reviewers, as well as RSS feeds for individual reviewers.

When you "Build All" the program builds everything according to the plan, and puts in all the static links assuming that the entire build will proceed to completion.

In fact, you should probably ALWAYS elect to "Build All", because if you only build (for example) the Supplier pages, but the Member Profiles are mentioned in the plan, the links to static Member Profile pages will be inserted into static pages (and dynamic pages too), but they will lead nowhere because the Member Profile pages were not actually built. If you hit "Build All" the member profile pages will actually be built before the Supplier pages are. So "Build All" is generally the best way to go. You won't have to remember what to build and what need not be built.

Once you have done a build, you can point your browser to the build root page (the URL of which is used to create the link labelled Static in your admin navigation bar on the far right) to see the results. Static links will also appear in most (but not all) places on the dynamic pages, so anyone who starts on the dynamic pages will end up on static ones fairly quickly.

Building Compressed Pages

If you build static pages you will see that a fair amount of disk space is devoted to the result. In particular, if you have defined a number of rating attributes, and have allowed pages to be built with sortings based on the average value of those rating attributes, a LOT of disk space is chewed up.

You do have the option of NOT offering those review sortings to visitors. See the Configure > Build / Browse control panel if you wish to deactivate those review sorting options in static pages. Note you will need to remove your static pages and rebuild them if you do this. Red Queen does not delete old versions of built pages at present (rather it simply overwrites existing pages).

If you have a large database of review items and want to keep all those review sorting options when building pages, and happen to be hosted on an Apache server that has the mod_gunzip module installed, then there is another option: build the static pages in compressed gzip format. This option is offered from the Configure > HTML Compression control panel.

Instead of building pages like harley_motor_cycle.html, Red Queen will allow you to write pages as harley_motor_cycle.html.gz, where the content is gzipped and occupies around 25 percent of the disk space of the equivalent uncompressed file. The mod_gunzip module will negotiate with browsers, sending them the compressed pages when they request them, saving significantly on bandwidth as well as disk space. For browsers that cannot handle the gzip format, mod_gunzip will inflate the file before sending it.
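
If you want to see what kind of saving to expect on your own pages before switching this option on, you can gzip one of your built pages by hand and compare sizes. The helper below is a hypothetical sketch that uses only the standard gzip and wc utilities. It is not part of Red Queen, and the actual ratio depends on the content of each page.

```shell
# Hypothetical helper: print the gzipped size of a file as a percentage
# of its original size (integer, rounded down).
gzip_percent() (
    original=$(( $(wc -c < "$1") ))
    [ "$original" -gt 0 ] || exit 1     # avoid dividing by zero on an empty file
    compressed=$(( $(gzip -c "$1" | wc -c) ))
    echo $(( compressed * 100 / original ))
)

# Example (the path is a placeholder for one of your own built pages):
#   gzip_percent /path/to/static/harley_motor_cycle.html
```

The original file is left untouched, because gzip -c writes the compressed data to standard output rather than replacing the file.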

The lesson to take away here? If you can find a decent web hosting company that offers Apache solutions with mod_gunzip as an option, strongly consider using them as your hosting platform--particularly if you intend to create a fairly large Red Queen database (with several thousand or more reviewable items).

Reducing Total Page Count

When you build pages there are some important considerations to keep in mind. You don't want to run out of disk space, or exceed the total number of files you are permitted to create on your server. Let me show you how this happens...

One of my customers came to me recently. His build process wasn't working. It would not even start. After checking his site I found that during the initialization process, where the build log file was created, the server was denying the creation of the file. The error message: "disk quota exceeded". I thought he had run out of disk space. But he came back later and told me his web hosting company had informed him that he had exceeded his allotment of 800,000 files. He thought this must be a mistake. How could Red Queen generate 800,000 files?

Answer: Easy. Just don't take into consideration how many pages need to be built to accommodate all category sortings and review sortings.

My customer had imported 20,000 suppliers into his system. Now that's not a horrendous number, so I did a rough calculation for him to see whether it might be possible to create 800,000 files before collecting even a handful of reviews. The content of that rough calculation is reproduced below. Think about the important numbers that come up when you are planning your own system and you intend to build pages.

"Gary," I told him after my back of the envelope calculation of the relevant numbers, "It's certainly possible to reach 800,000 files. Let me show you why." This is what I told him:



Let's do a quick rough calculation. Let's say you have M = 20,000 suppliers
and N = 12 rating attribute sortings (actually 6 attributes, but you keep both
ascending and descending sortings for each).

Let's also assume each supplier resides in just one yellowpage and that
P suppliers are listed per yellowpage, and also that you have Q reviews listed
per page, and R reviews in total.

There will be about (M/P)*N files for yellowpage listings. Taking P = 10,
that is ( 20,000 / 10 ) * 12 = 24,000 files.

There will be about M*N*( the average number of review pages per supplier )
files for review listings. You have barely any reviews so let's say this is
the minimum number M*N*1 = 20,000 * 12 * 1 = 240,000.

If you place each supplier in multiple yellowpages you would have to multiply
by that number, and there is some indication you did that. If you placed each
supplier in 2 yellowpages on average then you would come up with 480,000 files.

Then there are supplier profiles. Another 20,000.

So very roughly, I count 284,000 files if you did not add suppliers to multiple
yellowpages.

That's a ballpark figure. You could easily reach 800,000 files if you added
each supplier to--on average--3 or more yellowpages.

--Stephen
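
The same back-of-the-envelope arithmetic can be sketched as a small shell function, so you can plug in your own numbers. This is only an illustration: the function name and its arguments are hypothetical and not part of Red Queen, and it assumes that placing a supplier in several yellowpages multiplies both the listing pages and the review pages.

```shell
# Hypothetical estimate of the number of static files a build will create.
# Arguments:
#   $1 = number of suppliers (M)
#   $2 = number of sortings (N: rating attributes x sort directions kept)
#   $3 = suppliers listed per yellowpage listing page (P)
#   $4 = average number of review pages per supplier (at least 1)
#   $5 = average number of yellowpages each supplier is placed in
estimate_pages() (
    suppliers=$1
    sortings=$2
    per_listing=$3
    review_pages=$4
    placements=$5
    # Listing pages: (M * placements / P), rounded up, times N
    listing=$(( (suppliers * placements + per_listing - 1) / per_listing * sortings ))
    # Review pages: M * N * review pages per supplier * placements
    review=$(( suppliers * sortings * review_pages * placements ))
    # One profile page per supplier
    profiles=$suppliers
    echo $(( listing + review + profiles ))
)

estimate_pages 20000 12 10 1 1    # prints 284000
estimate_pages 20000 12 10 1 3    # prints 812000
```

The first call reproduces the 284,000 figure above. The second shows how three placements per supplier pushes the total past 800,000 under the stated assumption.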


So, the important factors here are (1) the number of review sortings due to rating attributes, and (2) the number of containers each reviewed thing is placed in.

If you retain only the ascending or the descending sorting for each rating attribute (i.e. keep "sort by best service" and toss "sort by worst service") you'll cut the page count by roughly a factor of 2. Furthermore, halve the number of rating attributes (do you really need them all?) and you cut the page total in half again.

If you are placing each reviewed thing in multiple containers, then this increases the total number of pages by roughly the same factor. Place each item in 3 categories, and you triple your built page count.

Those are important considerations. Think about them before you go wild and assume you have endless disk space and file allocations to play with. You never do.
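
If you have shell access, you can check how close an existing build is to your host's file limit by counting the files under your static directory. This is a minimal sketch with standard tools. The helper name and the example path are placeholders, not something Red Queen provides.

```shell
# Hypothetical helper: count the regular files under a directory.
# (A file name containing a newline would be counted more than once,
# which should not happen with built HTML pages.)
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Example (the path is a placeholder for your own static build root):
#   count_files /path/to/static
```

Compare the result with the file allotment your web host gives you, keeping in mind that everything else in your account counts toward the same limit.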

Things That Can Go Wrong When Building

In this section I am just going to list some problems I have come across with customers who have had trouble building pages for one reason or another. They are presented in no particular order, and some are copied from the TROUBLESHOOTING section of the manual.


  1. Missing Static Containers
    Containers which are selected for building must be validated, which is to say the is_validated column of records in the Category, Team, or Yellowpage table must be set to 'Yes'. Having non-validated containers is generally not a problem. However, you WILL run into problems building when you have non-validated containers which are parents to validated containers. It is not particularly sensible to have a container hierarchy like this, but there is nothing stopping you from setting things up this way.

    When a build is performed, validated containers are selected, but no check is performed on the hierarchy of these validated containers (because performing such a check would be more computationally intensive than is warranted for a special case like this). So just don't do it. If you have validated containers under non-validated containers, make them non-validated too.

  2. Build Process Never Completes
    If you are attempting to perform a Staggered Build via the browser, the page should refresh itself after building static pages for all Items (or Members, or Suppliers) in the current batch of Categories (or Teams, or Yellow Pages). If the web server process dies with a non-useful error message it may be that it has simply timed out. In this case you can try reducing the number of Categories processed per browser page refresh. If you are already processing a single Category for each browser page, then check the size of the category. Very large categories can be a problem with the current Staggered build algorithm. If you cannot reduce the size of the Category because it is impractical for any reason, then consider performing the build via the telnet command line.

    Telnet into your account, and go to your Red Queen admin directory:
    cd /path/to/cgi-bin/rs/redqueen/do/admin
    Then issue this command to build all pages that require building:
    perl nph-admin.cgi --do=BuildAll

    If the process runs out of memory before completing (less likely, but it can happen if you have a large Red Queen database) then set the build_expiration_in_minutes configuration variable to some value like 90 (or roughly twice the time, in minutes, that a build is expected to take) and redo the build. If you restart a failed build within the expiration time it will pick up where it left off. If you need to restart a build from scratch before the expiration time has passed, set the expiration time back to zero.

  3. Build Process Never Begins
    If you are attempting to build pages via the browser, check whether you need to switch off (or on) the nph headers that are normally used for building and indexing via the browser. Usually nph headers are required, but on some web servers they must be switched off before the web server can properly return the requested results.



Copyright © 2004 Random Mouse Software. All Rights Reserved.