Appendixes
W3Olista only runs in multitasking Unix environments. I don't know yet
whether this includes Windows NT, but it certainly excludes DOS
systems.
On such a machine, the following conditions must be met:
- The closer your operating system is to Posix, the better.
The software makes certain assumptions; if you know your system
well, you can verify them in the Low-Level Assumptions section below.
- You must have an Ansi-compatible C compiler installed. The
program doesn't compile with pre-Ansi compilers; one example is
the default SunOS 4.x compiler (the one that comes with
SunWorks is OK). If you don't have an Ansi compiler yet,
hurry and install the GNU C compiler, GCC, which works
perfectly.
- Your Web server must produce logfiles in the Common logfile
format, which is probably the default on your system. If it isn't,
you can still write your own logfile scanning function, if you know how.
A description of the Common format.
Low-Level Assumptions
These are some assumptions that must be met to run the program. If you
don't know much about C and Unix, they'll probably be of no interest
to you.
- The '/' must be your directory separator.
- popen() and pclose() must be present.
- time_t is an integer value; the subtraction of two time_t
values must yield the time difference in seconds, and adding or
subtracting integer values (seconds) must be allowed and result
in another correct time_t value.
- The include file dirent.h is needed, and the functions
must comply with the Posix standard. Most notably, the field
d_name of the dirent structure must be a null-terminated
string.
- The include files unistd.h, string.h and
stdlib.h must be present.
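If you are unsure whether your system meets these assumptions, a small test can check some of them. The following is a minimal sketch that is not part of W3Olista: the function names are hypothetical, it assumes a POSIX system on which popen() can run echo, and it covers only the time_t, popen()/pclose() and dirent.h assumptions. Each function returns 1 if the assumption appears to hold.

```c
/* Hypothetical self-test for some of the low-level assumptions above.
 * Not part of W3Olista. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* time_t arithmetic: two local times one hour apart must differ by 3600,
 * and adding 60 to a time_t must move the clock by one minute. */
int check_time_t_seconds(void)
{
    struct tm a = {0}, b;
    struct tm *lt;
    time_t t1, t2, t3;

    a.tm_year = 95;             /* 1995 */
    a.tm_mon = 0;               /* January, away from DST changes */
    a.tm_mday = 15;
    a.tm_hour = 12;
    a.tm_isdst = -1;            /* let mktime() determine DST */
    b = a;
    b.tm_hour = 13;

    t1 = mktime(&a);
    t2 = mktime(&b);
    if (t1 == (time_t)-1 || t2 == (time_t)-1 || t2 - t1 != 3600)
        return 0;

    t3 = t1 + 60;               /* plain integer arithmetic on time_t */
    lt = localtime(&t3);
    return lt != NULL && lt->tm_hour == 12 && lt->tm_min == 1;
}

/* popen()/pclose(): run a trivial shell command and read its output. */
int check_popen(void)
{
    char buf[16] = "";
    FILE *p = popen("echo ok", "r");

    if (p == NULL)
        return 0;
    if (fgets(buf, sizeof buf, p) == NULL)
        buf[0] = '\0';
    return pclose(p) == 0 && strcmp(buf, "ok\n") == 0;
}

/* dirent.h: readdir() must return null-terminated d_name strings;
 * here we look for the "." entry of the current directory. */
int check_dirent(void)
{
    DIR *d = opendir(".");
    struct dirent *e;
    int found = 0;

    if (d == NULL)
        return 0;
    while ((e = readdir(d)) != NULL)
        if (strcmp(e->d_name, ".") == 0)
            found = 1;
    closedir(d);
    return found;
}
```

Call the three functions from a small main() on the target machine; a 0 from any of them points to the assumption that needs a closer look.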
If you have daily or monthly logfiles, each logfile has, of course,
a different filename that reflects the period of time it covers.
W3Olista uses this information from the filename to check whether the
data inside a file is needed for a report, and based on this information,
it may decide not to read the file. For example, if you want statistics
for yesterday, it would be quite useless to scan last month's files. This
procedure saves an enormous amount of time.
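The decision can be pictured as an interval-overlap test between the period a file covers and the period of the report. The sketch below is only an illustration: the function name file_needed, the YYYYMMDD integer encoding (e.g. 19950401) and the convention that day 0 marks a monthly file are assumptions, not W3Olista's actual interface.

```c
/* Hypothetical sketch: can a logfile, dated from its filename, contain
 * data for the report period? Dates are YYYYMMDD integers. */
#include <assert.h>

/* Returns 1 if the file's period overlaps [from, to] (inclusive).
 * day == 0 means a monthly file covering the whole month. */
int file_needed(int year, int month, int day, long from, long to)
{
    long first, last;

    assert(month >= 1 && month <= 12);
    if (day == 0) {
        first = year * 10000L + month * 100L + 1;
        last = year * 10000L + month * 100L + 31;  /* safe upper bound */
    } else {
        first = last = year * 10000L + month * 100L + day;
    }
    return first <= to && last >= from;
}
```

For a report on April 1, 1995, a daily file for that day is needed, while the monthly file for March 1995 can be skipped without being opened.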
In order for this to work, you must use the logfile pattern to describe
how each filename is constructed from a date, so that W3Olista can
reverse-engineer the date from a filename. This works similarly to
the C function strftime(), where you give date formatting
directives to fill a string with date information. You construct your
logfile pattern as follows:
Ordinary characters in your pattern must match the filename exactly, at
the same position. Formatting specifiers begin with a percent sign '%',
followed by one of the following characters:
- %b
- The abbreviated English month name, exactly three
characters long. The first character may or may not be uppercase.
- %B
- The full English month name. The first character may
or may not be uppercase.
- %d
- The two-digit day of the month, ranging from 01 to 31.
- %m
- The two-digit month of the year, ranging from 01 to 12.
Note: A one-digit month is also accepted if it is followed by
a non-digit character, but relying on this is discouraged;
rename your files instead.
- %y
- Two digits for the number of years since 1900. If this
number is lower than 60, a year from 2000 on is assumed (05 means 2005).
- %Y
- The year as four digits.
- %%
- The percent sign.
You can also use the dot '.' as a wildcard for any single character, to
step over varying information in the filename, like the day of the week.
This way the dot has two meanings (the literal dot and the wildcard), but
this shouldn't be a problem, since the wildcard also matches a literal dot.
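To make these rules concrete, here is a minimal matcher sketch in C. It is not W3Olista's own code; the function name match_pattern and its output parameters are hypothetical. For simplicity it always requires two digits for %m (the one-digit leniency described above is left out), lets only the first letter of a month name be uppercase, and treats every dot in the pattern as a one-character wildcard.

```c
/* Hypothetical sketch of matching a filename against a logfile pattern.
 * Not W3Olista's implementation. */
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *const month_names[12] = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december"
};

/* Reads exactly n digits at *s and advances *s; returns -1 on failure. */
static int read_digits(const char **s, int n)
{
    int v = 0;

    while (n-- > 0) {
        if (!isdigit((unsigned char)**s))
            return -1;
        v = v * 10 + (**s - '0');
        (*s)++;
    }
    return v;
}

/* Reads an English month name at *s: the 3-letter abbreviation if
 * full == 0, else the full name. Returns 1-12 and advances *s, or -1. */
static int read_month(const char **s, int full)
{
    int m;

    for (m = 0; m < 12; m++) {
        const char *name = month_names[m];
        size_t len = full ? strlen(name) : 3;

        if (tolower((unsigned char)**s) == name[0] &&
            strncmp(*s + 1, name + 1, len - 1) == 0) {
            *s += len;
            return m + 1;
        }
    }
    return -1;
}

/* Returns 1 if name matches pattern and sets *year (four digits),
 * *month (1-12) and *day (0 if the pattern has no %d); else returns 0. */
int match_pattern(const char *pattern, const char *name,
                  int *year, int *month, int *day)
{
    const char *p, *s = name;
    int v;

    *year = *month = *day = 0;
    for (p = pattern; *p != '\0'; p++) {
        if (*p == '.') {                 /* wildcard: any one character */
            if (*s++ == '\0')
                return 0;
            continue;
        }
        if (*p != '%') {                 /* ordinary character */
            if (*s++ != *p)
                return 0;
            continue;
        }
        switch (*++p) {                  /* formatting directive */
        case 'b': v = *month = read_month(&s, 0); break;
        case 'B': v = *month = read_month(&s, 1); break;
        case 'd': v = *day = read_digits(&s, 2); break;
        case 'm': v = *month = read_digits(&s, 2); break;
        case 'Y': v = *year = read_digits(&s, 4); break;
        case 'y':
            v = read_digits(&s, 2);
            *year = v < 60 ? 2000 + v : 1900 + v;   /* 00-59 -> 20xx */
            break;
        case '%':
            v = (*s == '%') ? 0 : -1;
            s++;
            break;
        default:                         /* unknown directive or lone '%' */
            return 0;
        }
        if (v < 0)
            return 0;
    }
    return *s == '\0' && *month <= 12 && *day <= 31;
}
```

For instance, matching httpd.zeus.access.Apr0195 against httpd.zeus.access.%b%d%y yields year 1995, month 4 and day 1.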
Compressed Logfiles
Compressed logfiles are detected automatically by their suffix. You
must not include this suffix in the pattern. The examples below show
how this is done.
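Suffix detection can be sketched as follows. The suffixes (.gz and .Z, as in the examples) and the decompression commands are assumptions for illustration; this text does not say which suffixes or programs W3Olista actually uses. The returned command could, for example, be run through popen() to read the uncompressed data.

```c
/* Hypothetical sketch of suffix-based compression detection. */
#include <assert.h>
#include <string.h>

struct suffix_rule {
    const char *suffix;     /* filename suffix that marks compression */
    const char *command;    /* command that writes the data to stdout */
};

/* Assumed suffixes and commands, for illustration only. */
static const struct suffix_rule rules[] = {
    { ".gz", "gzip -dc" },
    { ".Z",  "uncompress -c" },
};

/* Copies name without a known compression suffix into base (size > 0
 * bytes, truncating if needed) and returns the decompression command,
 * or NULL if the file does not look compressed. */
const char *strip_compression(const char *name, char *base, size_t size)
{
    size_t n = sizeof rules / sizeof rules[0];
    size_t len = strlen(name), i;

    assert(size > 0);
    for (i = 0; i < n; i++) {
        size_t slen = strlen(rules[i].suffix);
        if (len > slen && strcmp(name + len - slen, rules[i].suffix) == 0) {
            len -= slen;
            break;
        }
    }
    if (len >= size)
        len = size - 1;
    memcpy(base, name, len);
    base[len] = '\0';
    return i < n ? rules[i].command : NULL;
}
```

For access.0395.gz this yields the base name access.0395, which is what gets matched against a pattern such as access.%m%y.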
Example
A typical logfile on our site has the name
httpd.zeus.access.Apr0195 (zeus is the host name of our Web server).
The correct logfile pattern for this is httpd.zeus.access.%b%d%y,
from which the program can correctly detect that this logfile is from
April 1, 1995. Here are more examples:
Logfile Name          Logfile Pattern      Used Information
--------------------------------------------------------------------------
access.04.01.1995     access.%m.%d.%Y      April 1, 1995
access.March1995      access.%B%Y          March 1 to March 31, 1995
access.SunApr011995   access....%b%d%Y     April 1, 1995
access.0395.Z         access.%m%y          March 1995, compressed
access.0395.gz        access.%m%y          March 1995, gzipped
Multiple Logfile Patterns
On our site, we have the policy that daily logs are kept for the current
month. At the end of the month, the daily logs are pasted together and
compressed with gzip. These different filename formats can be used by giving
multiple filename patterns separated by a semicolon.
Thus, on our site, I use
httpd.zeus.access.%b%d%y;httpd.zeus.access.%m.%y. The first pattern
reads the daily logs for the current month, and the second pattern reads the
compressed monthly files.
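Splitting such a list into its individual patterns is straightforward. A minimal sketch using the standard strtok() function follows; the function name split_patterns is hypothetical.

```c
/* Hypothetical sketch: split a semicolon-separated pattern list. */
#include <assert.h>
#include <string.h>

/* Splits list in place at each ';' and stores up to max pointers into it.
 * Empty entries (e.g. from a trailing ';') are skipped, since strtok()
 * treats consecutive delimiters as one. Returns the number stored. */
int split_patterns(char *list, char *patterns[], int max)
{
    int n = 0;
    char *tok;

    assert(list != NULL && patterns != NULL);
    for (tok = strtok(list, ";"); tok != NULL && n < max;
         tok = strtok(NULL, ";"))
        patterns[n++] = tok;
    return n;
}
```

Each resulting pattern would then be tried in turn against every filename. Note that strtok() keeps internal state and is not reentrant; on Posix systems strtok_r() is the reentrant alternative.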
Note that these different filename patterns must not overlap. Imagine the
following scenario: as above, daily logs for the current month are
kept. But instead of being pasted together at the end of the month, each
daily log is appended to the compressed monthly log at the end of the day.
In this case, the data for the current month, except today, exists twice,
once in the daily file and once in the compressed monthly log. W3Olista
cannot detect this 'mistake' and would report all of this data twice.
Custom headers and footers must contain several required elements of HTML,
so that the final document is correct, well-styled HTML.
The Header
The Header should at least contain, for good style:
- The tags <html><head>
- A title as in <title>This is a title</title>
- The end marker of the HTML header and the start tag of
the body: </head><body>
- A caption, as in <h1>Statistics from our site</h1>
In addition, you can describe the purpose of the page and add some
introductory text.
The Footer
The footer names a file to print below the report. The filename
should be given as an absolute path. The file should at least contain
the following:
- The end marker of the document: </body> </html>
It is also good style to mention the page's maintainer.
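Any text editor will do for creating these files, but if you prefer to generate them, a small C helper could write the recommended elements. This is purely an illustration: the function names and the use of an <address> element for the maintainer are assumptions, not something W3Olista requires.

```c
/* Hypothetical helpers that write minimal custom header and footer files
 * with the elements recommended above. */
#include <assert.h>
#include <stdio.h>

/* Header: <html><head>, a title, </head><body> and an <h1> caption.
 * Returns 0 on success, -1 on a write error. */
int write_header(FILE *out, const char *title, const char *caption)
{
    assert(out != NULL);
    if (fprintf(out, "<html><head>\n<title>%s</title>\n</head><body>\n"
                     "<h1>%s</h1>\n", title, caption) < 0)
        return -1;
    return 0;
}

/* Footer: the page's maintainer, then the end marker </body> </html>. */
int write_footer(FILE *out, const char *maintainer)
{
    assert(out != NULL);
    if (fprintf(out, "<hr>\n<address>%s</address>\n</body> </html>\n",
                maintainer) < 0)
        return -1;
    return 0;
}
```

Open the header and footer files with fopen(), call the matching function, and give the footer's absolute filename to the program as described above.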
Splitlog is a simple shell script that reads in a big logfile and creates
logs for individual days. There are two cases in which you want to use this
program: first, if you want to speed up report processing, and second, if
you can't get time zones to work on your system. Let me explain these two
cases:
- If you keep daily logfiles, W3Olista can select which files to use
and which not to use. For example, if you want statistics for the
last week, it's pointless to let the program browse through
last year's data. With daily logs, exactly seven files are read.
- The program can report dates and times either in local time or
corrected to GMT. If you want them to be reported in local time,
daily logfiles have to begin at midnight and run through 23:59:59.
If a logfile is offset from midnight, e.g. it runs from 02:00:00
to 01:59:59 on our site, the times must be corrected to GMT (since
2 o'clock local time is midnight GMT here). This time correction
requires that your operating system supports time zones and that they
are set up correctly. You can check the setup by running a report
on yesterday with a detailed date listing (split up by the hour).
If each hour from 00 to 23 gets reported, you're fine.
If some hours are missing, the time conversion failed.
Splitlog will then help you to reorganize the logs.
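The GMT correction depends on the system's time zone rules. A minimal sketch of such a conversion with the standard mktime() and gmtime() functions follows; the function name local_to_gmt is hypothetical, and this is not necessarily how W3Olista performs the correction internally.

```c
/* Hypothetical sketch of correcting a logged local time to GMT using
 * the system's time zone rules (mktime() and gmtime()). */
#define _POSIX_C_SOURCE 200809L   /* exposes setenv() and tzset() */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Converts a local date and time (as logged) into GMT in *gmt.
 * Returns 0 on success, -1 if the system cannot convert the time. */
int local_to_gmt(int year, int mon, int mday, int hour, int min, int sec,
                 struct tm *gmt)
{
    struct tm local = {0};
    struct tm *g;
    time_t t;

    assert(gmt != NULL);
    local.tm_year = year - 1900;
    local.tm_mon = mon - 1;       /* 1-12 -> 0-11 */
    local.tm_mday = mday;
    local.tm_hour = hour;
    local.tm_min = min;
    local.tm_sec = sec;
    local.tm_isdst = -1;          /* let the zone rules decide about DST */

    t = mktime(&local);           /* interprets the time in the local zone */
    if (t == (time_t)-1)
        return -1;
    g = gmtime(&t);
    if (g == NULL)
        return -1;
    *gmt = *g;
    return 0;
}
```

On a system that understands Posix TZ strings, setting TZ to CET-1CEST,M3.5.0,M10.5.0/3 and calling tzset() makes local 02:00 on April 1, 1995 convert to 00:00 GMT, matching the offset described above. If the zone is not set up correctly, the same call silently produces shifted hours.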
Usage of Splitlog
- Synopsis
- splitlog prefix logfiles [logfiles ...]
- Parameters
- prefix
- A filename prefix to be given to the
created files. A string of the form MMMDDYY (e.g. Apr0195) is
automatically appended to this prefix.
- logfiles
- A single or multiple logfiles to be
read in and split up.
- Examples
- splitlog access. big_logfile
reads in the data from big_logfile and produces files like
access.Apr0195 and so on.
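Splitlog itself is a shell script. For illustration only, the per-line decision it has to make (which daily file a log line belongs to) might look like this in C. The function daily_filename is hypothetical and assumes Common-format dates such as [01/Apr/1995:13:04:11 +0200].

```c
/* Hypothetical C sketch of splitlog's per-line decision; the real
 * splitlog is a shell script. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds prefix + "MonDDYY" (e.g. "access.Apr0195") from the
 * [DD/Mon/YYYY:HH:MM:SS zone] field of a Common logfile line.
 * Returns 0 on success, -1 if no date is found or out is too small. */
int daily_filename(const char *line, const char *prefix,
                   char *out, size_t size)
{
    const char *p = strchr(line, '[');
    char mon[4];
    int day, year, n;

    assert(out != NULL && size > 0);
    if (p == NULL || sscanf(p, "[%d/%3[^/]/%d", &day, mon, &year) != 3)
        return -1;
    n = snprintf(out, size, "%s%s%02d%02d", prefix, mon, day, year % 100);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Each line would then be appended to the file named this way, so every output file covers one calendar day of the times as they were logged.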
Correcting the Setup of W3Olista
After splitting up the logfile, you must patch the configuration of the
program:
- Set the new logfile pattern in the Makefile.
- With the files produced by splitlog, you must configure the
program to use localtime. To do so, comment out the GMT
setting in the Makefile.
- Recompile the program with the command make new.
This ensures that all files are correctly rebuilt.
If your server does not use the Common logfile format for
logging access, you have to write your own scanning function. If you
know C, this should be quite easy and straightforward.
Edit the file fileio.c. You must rewrite the function scanstr, which is
responsible for splitting an input string into separate parts. It gets
one line of input, str, and must fill the following output variables:
- hostname: the fully qualified hostname (or IP address).
Must not contain any whitespace. Must not be longer than 256 characters.
- filename: the full filename of the request. CGI input
(after a '?') must be truncated. Must contain an initial slash,
and must not contain any whitespace. Must be shorter than 512 characters.
- when: the date and time of the request. If GMT is not
defined, you must return the date and time exactly as they are in the
logfile. If GMT is defined, this date and time must be
corrected to GMT. The output string must be in the internal format
"YYYY MMM DD HH mm ss". The month must be the English three-character
abbreviation. All other values are numerical. It is recommended that
days, hours, minutes and seconds are given as two digits each (values
lower than ten with a leading zero). Minutes and seconds are not
required and may be omitted.
- status: integer value, the status of the request. For now, all
requests that produced a status other than decimal 200 are ignored.
So if there are no status codes in your logs, set the status to 200.
- size: integer value, the number of bytes that were sent to the
client.
The function returns 0 on failure (unexpected end of string or wrong
input format) or 1 on success. If you have successfully written your
own function, please send it to me!
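As a starting point for your own function, here is a sketch of a scanner for the Common format itself. The name scanstr_sketch and its parameter list are assumptions (the actual signature of scanstr in fileio.c may differ); GMT correction is omitted, so the time is copied as it appears in the logfile, and a four-digit year is assumed for the internal date format.

```c
/* Hypothetical sketch of a Common-logfile scanner with the outputs listed
 * above. Signature and names are assumptions, not W3Olista's own. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HOST_MAX 256    /* hostname: at most 256 characters */
#define FILE_MAX 512    /* filename: shorter than 512 characters */

/* Returns 1 on success, 0 on unexpected end of string or wrong format. */
int scanstr_sketch(const char *str, char hostname[HOST_MAX + 1],
                   char filename[FILE_MAX], char when[32],
                   int *status, long *size)
{
    char mon[4], method[16], sizestr[24];
    int day, year, hh, mm, ss, pos = -1;
    const char *f, *q;
    size_t len;

    assert(str != NULL);
    /* host ident authuser [DD/Mon/YYYY:HH:MM:SS zone] "METHOD filename ... */
    if (sscanf(str, "%256s %*s %*s [%d/%3[^/]/%d:%d:%d:%d %*[^]]] \"%15s %n",
               hostname, &day, mon, &year, &hh, &mm, &ss, method, &pos) != 8
        || pos < 0)
        return 0;

    /* filename: starts with '/', ends at whitespace, '"' or CGI '?' */
    f = str + pos;
    len = strcspn(f, " \t\"?");
    if (*f != '/' || len >= FILE_MAX)
        return 0;
    memcpy(filename, f, len);
    filename[len] = '\0';

    /* status and size follow the closing quote of the request */
    q = strchr(f + len, '"');
    if (q == NULL || sscanf(q + 1, "%d %23s", status, sizestr) != 2)
        return 0;
    *size = strcmp(sizestr, "-") == 0 ? 0 : strtol(sizestr, NULL, 10);

    /* internal format "YYYY MMM DD HH mm ss", as logged (no GMT shift) */
    snprintf(when, 32, "%04d %s %02d %02d %02d %02d",
             year, mon, day, hh, mm, ss);
    return 1;
}
```

For a format other than Common, the sscanf() format string and the search for the request are the parts you would adapt; the output conventions stay the same.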
Frank Pilhofer
<fp -AT- fpx.de>
Last modified: Wed Apr 12 11:23:35 1995