Appendixes


System requirements

W3Olista runs only in multi-tasking Unix environments. I don't yet know whether this includes Windows NT, but it certainly excludes DOS systems.

On this machine, the following conditions must be met:

Low-Level Assumptions

These are some assumptions that must hold for the program to run. If you don't know much about C and Unix, they will probably be of little interest to you.

File patterns for Logfiles

If you have daily or monthly logfiles, each logfile naturally has a different filename that reflects the period it covers. W3Olista uses this information from the filename to decide whether the data inside a file is needed for a report, and if it is not, it skips reading the file. For example, if you want statistics for yesterday, it would be pointless to scan last month's files. This can save an enormous amount of time.

For this to work, you must supply a logfile pattern that describes how each filename is constructed from a date, so that W3Olista can reverse-engineer the date from a filename. This works much like the C function strftime(), which fills a string with date information according to formatting directives. You construct your logfile pattern as follows:

Ordinary characters in your pattern must match the filename exactly. Formatting specifiers begin with a percent sign '%', followed by one of the following characters:

%b
The abbreviated English month name, exactly three characters long. The first character may be upper or lower case.
%B
The full English month name. The first character may be upper or lower case.
%d
The two-digit day of the month, ranging from 01 to 31.
%m
The two-digit month of the year, ranging from 01 to 12. Note: a one-digit month is also accepted if it is followed by a non-digit character, but relying on this is discouraged; it is better to rename your files.
%y
The last two digits of the year. Values below 60 are taken to mean the years 2000 to 2059; all other values mean 1960 to 1999.
%Y
The year as four digits.
%%
The percent sign.
You can also use the dot '.' as a wildcard that steps over one character of changing, arbitrary information in the filename, such as the day of the week. The dot thus has two meanings (a literal dot and a wildcard character), but this shouldn't be a problem.

Compressed Logfiles

Compressed logfiles are detected automatically by their suffix. Do not include this suffix in the pattern. The examples below explain much better how this is done.

Example

A typical logfile on our site is named httpd.zeus.access.Apr0195 (zeus is the host name of our Web server). The correct logfile pattern for it is httpd.zeus.access.%b%d%y, from which the program can determine that this logfile is from April 1, 1995. Here are more examples:
Logfile Name            Logfile Pattern           Recovered Date
--------------------------------------------------------------------------
access.04.01.1995       access.%d.%m.%Y           January 4, 1995
access.March1995        access.%B%Y               March 1 to March 31, 1995
access.SunApr011995     access....%b%d%Y          April 1, 1995
access.0395.Z           access.%m%y               March 1995, compressed (.Z)
access.0395.gz          access.%m%y               March 1995, gzipped (.gz)

Multiple Logfile Patterns

On our site, daily logs are kept for the current month. At the end of the month, the daily logs are concatenated and compressed with gzip. To read files in both naming formats, give multiple filename patterns separated by semicolons.

Thus, on our site, I use httpd.zeus.access.%b%d%y;httpd.zeus.access.%m.%y. The first pattern matches the daily logs for the current month, and the second matches the compressed monthly files.

Note that the files matched by different patterns must not overlap. Consider the following scenario: as above, daily logs are kept for the current month, but instead of being concatenated at the end of the month, each daily log is appended to the compressed monthly log at the end of its day. In this case, the data for the current month, except today's, exists twice: once in the daily file and once in the compressed monthly log. W3Olista cannot detect this mistake and would report all of that data twice.


Custom Headers and Footers

Custom headers and footers must contain several required HTML elements so that the final report is a correct, well-styled HTML document.

The Header

For good style, the header should contain at least the following: Beyond that, you can state the purpose of the page and add some introductory text.

The Footer

The footer is the name of a file whose contents are printed below the report. Give it as an absolute filename. The file should contain at least the following: It is also good style to mention the page's maintainer.

The splitlog Utility

Splitlog is a simple shell script that reads a large logfile and creates separate logs for individual days. There are two situations in which you will want to use it: first, if you want to speed up report processing, and second, if you can't get time zones to work on your system. The two cases in more detail:
  1. If you keep daily logfiles, W3Olista can select which files to read and which to skip. For example, if you want statistics for the last week, it is pointless to let the program browse through last year's data. With daily logs, exactly seven files are read.
  2. The program can report dates and times either in local time or corrected to GMT. If you want them reported in local time, each daily logfile has to begin at midnight and run through 23:59:59. If the logfile is offset from midnight, e.g. if it runs from 02:00:00 to 01:59:59 as on our site, the times must be corrected to GMT (since 2 o'clock local time is midnight GMT here). This correction requires that your operating system supports time zones and that they are set up correctly. You can check the setup by running a report on yesterday with a detailed date listing broken down by hour. If every hour from 00 to 23 is reported, you're fine. If some hours are missing, the time conversion failed, and splitlog can help you reorganize the logs.

Usage of Splitlog

Synopsis
splitlog prefix logfiles [logfiles ...]
Parameters
prefix
A filename prefix for the created files. A string of the form MMMDDYY (for example Apr0195) is automatically appended to it.
logfiles
One or more logfiles to be read and split up.
Examples
splitlog access. big_logfile
reads the data from big_logfile and produces files such as access.Apr0195.

Correcting the Setup of W3Olista

After splitting up the logfile, you must adjust the configuration of the program:
  1. Set the new logfile pattern in the Makefile. For files produced by splitlog, this is the prefix followed by %b%d%y.
  2. With the files produced by splitlog, you must configure the program to use local time. To do so, comment out the GMT setting in the Makefile.
  3. Recompile the program with the command make new. This ensures that all files are correctly rebuilt.

Your own scanning function

If your server does not log accesses in the Common Log Format, you have to write your own scanning function. If you know C, this should be quite easy and straightforward.

Edit the file fileio.c and rewrite the function scanstr, which splits an input string into its separate fields. It receives one line of input, str, and must fill in the following output variables.

The function returns 0 on failure (unexpected end of string or wrong input format) and 1 on success. If you have successfully written your own function, please send it to me!


Frank Pilhofer <fp -AT- fpx.de>
Last modified: Wed Apr 12 11:23:35 1995