VERIFYING URLS

Whether or not you are using Review Foundry as a Link Manager, you will probably have URLs in your Item, Member, or Supplier tables that you wish to keep up to date. Doing this manually rapidly becomes impractical as the number of records in your tables increases. You can, however, have Review Foundry attempt to retrieve the web pages associated with these URLs and record the HTTP status codes. Anything other than a 200 response code likely indicates a problem with the URL. You can run the URL verification program via the browser or, better yet, set up a regularly scheduled cron job to handle the task.

Although Review Foundry can identify problematic URLs, it is currently up to you to remove them, if you deem it necessary. If a URL is verified, the corresponding URL_STATUS column will be updated with a 200 status code. A URL that points to a non-resolvable domain will result in a special 999 status code. Any other non-200 status code reflects the value returned by the web server responsible for serving the web page associated with the URL.
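The status code rules above can be illustrated with a short sketch. This is not Review Foundry code, only one way you might read the stored values once verification has run. The function names, the dictionary keys, and the sample URLs are hypothetical; only the 200 and 999 meanings and the URL_STATUS column come from the description above.

```python
# Illustrative sketch only: interprets stored status values as described
# in the text. Names and sample data are assumptions, not part of
# Review Foundry itself.

UNRESOLVABLE = 999  # special code recorded for a non-resolvable domain


def classify_status(code):
    """Map a stored URL_STATUS value to a rough human-readable label."""
    if code == 200:
        return "ok"
    if code == UNRESOLVABLE:
        return "unresolvable domain"
    # Any other value is whatever the remote web server returned.
    return "server returned {}".format(code)


def urls_needing_review(rows):
    """Return (url, label) pairs for every row whose status is not 200."""
    return [
        (row["url"], classify_status(row["url_status"]))
        for row in rows
        if row["url_status"] != 200
    ]


# Hypothetical rows, standing in for records read from one of the tables.
rows = [
    {"url": "http://example.com/", "url_status": 200},
    {"url": "http://example.com/missing", "url_status": 404},
    {"url": "http://no-such-host.invalid/", "url_status": 999},
]

for url, label in urls_needing_review(rows):
    print(url, "->", label)
```

A listing like this only flags candidates. As noted above, deciding whether to remove a flagged URL remains up to you.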
Note: If you have non-resolvable hosts mixed into the URLs you are attempting to retrieve, this will slow down the process. To speed things up, you should ensure that the Perl module Net::DNS::Resolver is installed on your system. The module is optional, however; the URL verifier will function without it.

Via The Browser

As mentioned in the discussion about building static pages, using your browser to perform a repetitive process like retrieving web pages is a convenience. If you have the choice, the recommended approach is to set up a cron job instead. Here, however, we discuss how to carry out the URL verification process using the web interface.

To retrieve the web pages pointed to by the URLs in your Review Foundry database using your browser, go to the Verify control panel. There you'll see options to separately process each of the Item, Member, and Supplier tables. The process is staggered, i.e. performed in steps. Perhaps up to a hundred web pages can generally be retrieved before your web server reaches its timeout limit and the process is unexpectedly aborted. If this should happen, reduce the number of records processed per page of browser output. This number is one of the configuration variables that can be set from the Verify frame of the Configure control panel.

Via The Command Line

If you have telnet (or SSH) access to your site, you can log in and run the URL verification script via the command line. Because no CGI processing is involved, this method is somewhat faster than the equivalent process carried out from the browser. Also, the processing takes place in one (generally) long uninterrupted job, unlike from the browser, where the task is split into many smaller jobs to reduce memory consumption and avoid timeout limits. It still suffers from the drawback shared by the browser method, though: the process needs to be carried out manually. A possible solution to that problem is discussed in the next section.
The command line invocation for a telnet-initiated URL verification process can be one of the following (this assumes you are issuing the command from the Review Foundry /do/admin directory, which should be directory protected):

    perl ./nph-admin.cgi --do=VerifyAll
    perl ./nph-admin.cgi --do=VerifyItem
    perl ./nph-admin.cgi --do=VerifyMember
    perl ./nph-admin.cgi --do=VerifySupplier

If you wish to check the URLs in each of the Item, Member, and Supplier tables, use the first command with the 'VerifyAll' argument. In this case, each table will be processed entirely before the next one is begun. If you need to process only one of the tables, use one of the other commands shown above.

Note: If you are processing all possible Item, Member, and Supplier records, the URL verification process is going to take a long time, particularly if you have many records in your database.

Via Cron Job

If you know how to schedule cron jobs (automated execution of programs), you may be able to arrange to run the URL verification process according to a preset schedule that requires no human intervention. On the other hand, many web hosts restrict the amount of CPU time that can be allocated to a single cron job. If this is true for you, you will very likely find yourself running into timeout problems yet again. The cron job may only be of use to you if you are running your own dedicated web server and can remove the time limit for cron execution. Check with your web host about timeout limits for cron jobs before you invest time trying to automate the URL verification process.
Otherwise, if you believe that setting up a cron job should be feasible, edit your crontab file and add something like the following lines:

    38 2 * * 2 perl /path/to/nph-admin.cgi --do=VerifyItem --cron=1
    38 3 * * 2 perl /path/to/nph-admin.cgi --do=VerifyMember --cron=1
    38 4 * * 2 perl /path/to/nph-admin.cgi --do=VerifySupplier --cron=1

This example, which runs every Tuesday at 2:38, 3:38, and 4:38 A.M., assumes that the individual processes each take less than an hour to complete. Alternatively, if you cannot be sure of the time required to process one of the tables, you can elect to process the lot, one after the other, like this:

    38 2 * * 2 perl /path/to/nph-admin.cgi --do=VerifyAll --cron=1

The extra --cron=1 argument ensures that logging to the screen is switched off unless an error message needs to be output. This keeps any email message sent to you after your cron jobs complete to a manageable size.

If you cannot run cron jobs, try to use the telnet method instead. If that isn't possible, try the browser method.

Copyright © 2004 Random Mouse Software. All Rights Reserved.