Operating W3Olista with Custom URLs or from the Command Line
Few people want to browse their statistics interactively, and there are
sysadmins who do not want to hand the full processing power of interactive
browsing over to their users. And of course you will want to include links
to brand-new statistics in your pages, without your visitors having to
enter all search criteria into the Query Form themselves.
There are two ways to do this. First, you can link to a specially
constructed URL; second, you can have the reports created offline
from the command line, writing W3Olista's output to a file where it can
then be accessed as usual.
If you have not already read the Introduction
to W3Olista, please do so before proceeding with this text. The Introduction
explains some details you need to know before you can go on.
Custom URLs
A Custom URL is a specially constructed URL where the commands to
W3Olista are embedded in the Extra Pathname Information. You simply
select all needed commands, separate them with slashes, and add this
information to the basic URL of W3Olista. See below for all the commands
that the program accepts.
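To make the construction concrete, here is a small Python sketch that assembles a Custom URL from a list of commands. The helper name custom_url is made up for this illustration, and the base URL is taken from the examples at the end of this page; like those example URLs, the sketch does not percent-encode anything.

```python
# Hypothetical helper (not part of W3Olista): builds a Custom URL by
# appending the commands, separated by slashes, to the program's base URL.

def custom_url(base_url, commands):
    """Return base_url with the commands appended as extra path information."""
    parts = []
    for cmd in commands:
        item, sep, value = cmd.partition("=")
        # In a Custom URL, slashes separate the commands, so slashes inside
        # infile/exfile values are written as '$' (see the Command Reference).
        if sep and item.lower().startswith(("infile", "exfile")):
            value = value.replace("/", "$")
        parts.append(item + sep + value)
    return base_url.rstrip("/") + "/" + "/".join(parts)


url = custom_url(
    "http://www.informatik.uni-frankfurt.de/~fp/cgi/olista",
    ["html", "Report=Yesterday", "HostList=Sum", "FileList=Yes",
     "exhost001=uni-frankfurt.de", "infile001=/~fp"],
)
print(url)
```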
The Command Line
On the command line, you give all the commands as parameters to
the program, separated by spaces. W3Olista will print its results to
standard output, so you will have to redirect its output to a file using
the '>' operator. See below for all the commands you can give W3Olista.
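If you generate such invocations from a script, for instance to write crontab entries, the parameters should be quoted for the shell. The following Python sketch builds the command line for the first example further down; the helper olista_command is hypothetical, and shlex.join needs Python 3.8 or later.

```python
import shlex


def olista_command(commands, outfile, program="olista"):
    """Return a shell command line that passes each command as a separate
    parameter and redirects W3Olista's standard output into outfile."""
    # shlex.join quotes every parameter containing characters that are
    # special to the shell (such as '~' or '$').
    return shlex.join([program, *commands]) + " > " + shlex.quote(outfile)


print(olista_command(["html", "Report=Yesterday", "HostList=Sum", "FileList=Yes"],
                     "Statistik.html"))
```

Passing the same argument list directly to subprocess.run, with stdout set to an open file, would avoid the shell altogether.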
Considerations
You may ask yourself why there are two ways to create the same report.
The answer is simple: Custom URLs burn much more CPU power than offline
reports. If you put a Custom URL onto your pages, W3Olista scans the
logfiles every time someone requests the statistics. If you instead create
the report once a day and write it to a file, the logs are only scanned
once. In addition, Custom URLs take their time, and many users won't want
to wait a couple of minutes to access your statistics.
Even if you are on a fast machine, and you have daily logfiles, I recommend
using Custom URLs only in rare cases, like limited statistics for a single
day. For more sophisticated statistics, you're much better off with offline
reports from the command line.
Command Reference
This is a listing of all the commands you can give W3Olista, either as part
of the Custom URL or on the command line. Some commands are only recognized
in some contexts, so please read carefully. Commands are case-insensitive
except where noted. Most commands are given as an equation of the form
item=value. There are only a few commands that don't take a value;
you already know some of these exceptions: cgi, html and
form.
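The following Python sketch illustrates the syntax rules just described: item=value pairs, a few value-less flags, and case-insensitive command names. It is only an illustration under these assumptions; the program's own parser may handle edge cases differently.

```python
# Illustration only: W3Olista's own parser may work differently.
FLAGS = {"cgi", "html", "form"}  # the commands that take no value


def parse_commands(words):
    """Split command words into a set of flags and a dict of settings.

    Command names are lower-cased (they are case-insensitive); values are
    kept as given, since e.g. file names may be case-sensitive.
    """
    flags, settings = set(), {}
    for word in words:
        item, sep, value = word.partition("=")
        name = item.lower()
        if sep:
            settings[name] = value
        elif name in FLAGS:
            flags.add(name)
        else:
            raise ValueError(f"command without a value: {word!r}")
    return flags, settings


# Command-line parameters arrive as separate words; for a Custom URL the
# extra path information would first be split at the slashes.
flags, settings = parse_commands("html/Report=Yesterday/HOSTLIST=Sum".split("/"))
```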
First Section: Required
At least one of these commands must appear in order to make the program
do something (at least something other than producing an error message).
Listings
You can choose any combination of HostList, FileList and
DateList with a single request, but the Entry Listing is exclusive.
-
HostList=<switch>
- Selects whether or not to print a listing of the hosts that have
accessed your server. <switch> can have the following
values:
-
Yes
Prints a complete host listing. You can choose the depth
(the detail) of the listing with the HostDetail
command. If you choose a detail of greater than one,
an index of top-level domains is automatically
printed.
-
No
Does not print a host listing.
-
Sum
Prints a short summary of the top-level domains.
-
FileList=<switch>
- Basically the same as the HostList. Selects whether or not
to print a directory tree of the accessed files. <switch>
can have the following values:
-
Yes
Prints a complete directory tree. The depth (the detail)
to which the tree shall be printed can be chosen with the
FileDetail command. If you choose a detail greater
than two, an index of top-level directories is automatically
printed.
-
No
Does not print a directory tree.
-
Sum
Prints a short listing of the files and directories in the
server root directory.
-
DateList=<switch>
- Again similar to HostList. Selects whether or not to print
a listing of the accesses by date. <switch> can
take the following values:
-
Yes
Prints a date listing down to hourly access. You can select
the detail with the DateDetail command. A detail
greater than two automatically includes a daily summary.
-
No
Does not print date information.
-
Sum
Prints a daily summary of the access.
-
EntryList=<switch>
- This one's a little different from the three above. This
setting selects whether or not to print the individual logfile
entries. <switch> can take the following values:
-
Yes
Prints an entry listing. This value overrides all the above
commands, since an entry listing can only be printed on its own.
Because entry listings take enormous amounts of space, using
this listing type is discouraged. Only select an entry listing
together with very restrictive limitations.
-
No
Does not print an entry listing.
Timespan to Report
You should also tell the program which timespan to report on.
If you don't, the report covers today. The timespan is set with three
commands:
-
Report=<date>
- Selects the date(s) on which to produce the report.
<date> can have the following values:
-
Today
This value is also used if you misspell any of the
other values.
-
Yesterday
-
ThisWeek
From last Sunday up to now, which is usually less than seven
days.
-
LastWeek
The previous week, from Sunday to Saturday. This report always
covers a full seven days.
-
ThisMonth
From the first of this month to today.
-
LastMonth
The whole last month.
-
Last24Hours
The last 24 hours, not bound to the beginning of the day, e.g.
from 4.15pm yesterday to 4.14pm today.
-
Last7Days
The last 7 days before today. If today is Wednesday,
this covers the previous Wednesday through yesterday (Tuesday).
-
Everything
Everything that we can find logs of.
-
Dates
Selects a custom range of dates. This range must be given
with the following two commands (to be precise, only the
first command is required).
-
FromDate=<startdate>
- This command only has a meaning if Report=Dates. It sets the
starting date of a report (inclusive). <startdate> is
accepted in a variety of formats:
-
MM/DD/YY or MM/DD
Use the short form for the current year. The year can also be given as four digits.
-
DD.MM.YY or DD.MM
Use the short form for the current year. The year can also be given as four digits.
-
Mmm DD YY
where Mmm is the English three-letter abbreviation of the month. The year
can also be given as four digits.
-
-nn
A hyphen '-' followed by a decimal number. This is interpreted as a
negative offset in days relative to today, meaning 'this number of days
before today'. The offset must be negative, since positive offsets would
point to future dates, which don't make sense.
If you don't specify a final date at which to stop reporting with
the following command, only this single day is reported.
-
ToDate=<enddate>
- Only has a meaning if Report=Dates. It sets the date at which
to stop reporting (inclusive, the given date is reported). If you
don't supply this command, then FromDate is copied, and just
the single day is reported. <enddate> accepts the same
date formats as above.
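The accepted date formats can be summarized in a short Python sketch. parse_report_date is a hypothetical name, and two points are assumptions not covered by the text above: a two-digit year is read as 19YY if it is 70 or greater and as 20YY otherwise, and the Mmm DD YY form is only accepted with a year.

```python
import datetime as dt
import re

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def _year(text, today):
    """Missing year: current year. Two digits: assumed pivot at 70."""
    if text is None:
        return today.year
    year = int(text)
    if len(text) == 2:
        year += 1900 if year >= 70 else 2000
    return year


def parse_report_date(text, today):
    """Parse a FromDate/ToDate value into a datetime.date."""
    text = text.strip()
    if m := re.fullmatch(r"-(\d+)", text):            # -nn: days before today
        return today - dt.timedelta(days=int(m.group(1)))
    if m := re.fullmatch(r"(\d{1,2})/(\d{1,2})(?:/(\d{2}|\d{4}))?", text):
        month, day = int(m.group(1)), int(m.group(2))  # MM/DD[/YY]
        return dt.date(_year(m.group(3), today), month, day)
    if m := re.fullmatch(r"(\d{1,2})\.(\d{1,2})(?:\.(\d{2}|\d{4}))?", text):
        day, month = int(m.group(1)), int(m.group(2))  # DD.MM[.YY]
        return dt.date(_year(m.group(3), today), month, day)
    m = re.fullmatch(r"([A-Za-z]{3})\s+(\d{1,2})\s+(\d{2}|\d{4})", text)
    if m and m.group(1).lower() in MONTHS:            # Mmm DD YY
        month = MONTHS.index(m.group(1).lower()) + 1
        return dt.date(_year(m.group(3), today), month, int(m.group(2)))
    raise ValueError(f"unrecognized date: {text!r}")
```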
Example: Report=Last7Days is equivalent to
Report=Dates FromDate=-7 ToDate=-1.
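To check how the Report values translate into days, the Python sketch below computes the inclusive range for each value from a given "today". The function name is hypothetical, and it assumes that ThisWeek starts today when today is a Sunday. The usage line picks Wednesday, November 8, 1995, so it prints the range from November 1 to November 7, matching both the Last7Days description and the equivalence in the example above.

```python
import datetime as dt

ONE_DAY = dt.timedelta(days=1)


def _last_sunday(today):
    """Most recent Sunday (today itself if today is a Sunday; an assumption)."""
    return today - dt.timedelta(days=(today.weekday() + 1) % 7)


def report_range(report, today, from_date=None, to_date=None):
    """Return the inclusive (first, last) days covered by a Report value.

    from_date/to_date are already-parsed dates, used only for Report=Dates.
    Everything is returned as (None, None), meaning "no bound"; the
    time-based Last24Hours is not modelled.
    """
    kind = report.lower()
    if kind == "yesterday":
        return today - ONE_DAY, today - ONE_DAY
    if kind == "thisweek":
        return _last_sunday(today), today
    if kind == "lastweek":
        sunday = _last_sunday(today)
        return sunday - 7 * ONE_DAY, sunday - ONE_DAY
    if kind == "thismonth":
        return today.replace(day=1), today
    if kind == "lastmonth":
        last = today.replace(day=1) - ONE_DAY
        return last.replace(day=1), last
    if kind == "last7days":
        return today - 7 * ONE_DAY, today - ONE_DAY
    if kind == "everything":
        return None, None
    if kind == "last24hours":
        raise NotImplementedError("Last24Hours is not a whole-day range")
    if kind == "dates":
        if from_date is None:
            raise ValueError("Report=Dates needs FromDate")
        # Without ToDate, only the FromDate day is reported.
        return from_date, (to_date if to_date is not None else from_date)
    return today, today  # Today, and any misspelled value


wednesday = dt.date(1995, 11, 8)
print(report_range("Last7Days", wednesday))
```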
Second Section: Useful
Well, there are defaults for these values, but you really should set at
least some of them.
- There are four types of Query limitations. Their syntax takes some
getting used to, so let me explain. First, here are the four basic
types:
-
inhost
Reports only statistics for requests from this host (this
domain).
-
exhost
Reports requests from all hosts except this host (this domain).
I use this setting to remove requests from our own domain to
my pages (I don't want to see how often I have loaded my own
pages).
-
infile
Reports only requests of files that contain this string in
their full path name. I use this setting to limit the report
to my own pages. If you want to use slashes ('/') here,
substitute them with a dollar sign ('$').
-
exfile
Reports only files that do not contain this string in
their full path name. If you want to use slashes ('/') here,
substitute them with a dollar sign ('$'). You could use this
to remove annoying things such as inline graphics (gif) from the report.
If there are multiple limitations of the same type, they are
combined with OR. The four types are then combined with AND.
Each limitation needs two additions to work. The first is, of course,
its argument. Second, each limitation must be given its own serial number,
and each limitation type is numbered separately. Let me give you an
example:
Take a look at the three commands inhost001=.com, infile001=html
and inhost002=.edu. Together they include requests from the .com and .edu
domains, and report only files that contain the string html. Note that
the infile limitation is numbered 001, since it's the first of its type,
while the second inhost is numbered 002 because it's the second one.
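The combination rules (OR within a type, AND across the four types, '$' standing for '/') can be sketched in Python as follows. The function names are hypothetical, and two matching details are assumptions rather than documented behaviour: host limitations match the end of a host name, and several exhost or exfile limitations exclude a request as soon as any one of them matches.

```python
import re
from collections import defaultdict

# A limitation command is one of the four types plus a serial number.
LIMIT_RE = re.compile(r"(inhost|exhost|infile|exfile)(\d+)", re.IGNORECASE)


def collect_limitations(settings):
    """Group numbered limitation commands ({"inhost001": ".com", ...}) by type."""
    limits = defaultdict(list)
    for name, value in settings.items():
        m = LIMIT_RE.fullmatch(name)
        if m:
            kind = m.group(1).lower()
            if kind.endswith("file"):
                value = value.replace("$", "/")  # '$' stands for '/'
            limits[kind].append(value)
    return limits


def request_passes(host, path, limits):
    """OR within each limitation type, AND across the four types."""
    def host_hits(kind):
        return any(host.endswith(v) for v in limits.get(kind, ()))

    def file_hits(kind):
        return any(v in path for v in limits.get(kind, ()))

    return ((not limits.get("inhost") or host_hits("inhost"))
            and not host_hits("exhost")
            and (not limits.get("infile") or file_hits("infile"))
            and not file_hits("exfile"))


limits = collect_limitations({"inhost001": ".com", "infile001": "html",
                              "inhost002": ".edu"})
```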
-
HostDetail=<level>
Sets the display depth for a full host listing (produced by
HostList=Yes). <level> is a number from 0 to 7.
Zero means All Detail without limitation, 1 means only
top-level domains, and greater numbers add more levels of
detail.
-
FileDetail=<level>
Sets the display depth for the directory tree of a full file listing
(produced by FileList=Yes). <level> is a number from
0 to 7. Zero means All Detail without limitation, 1 prints only the
root (which makes some sense with proxy logs), 2 prints all files and
directories in the server root, and greater numbers add more levels of
detail.
-
DateDetail=<level>
Sets the display depth for date listings as produced by the
DateList=Yes command. <level> is a numerical value
of zero to 4. Zero means All Detail (produces a listing down
to hourly access), 1 means Years, 2 means Months, 3
means Days, 4 means hours and is equivalent to 0.
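The three detail levels can be pictured as keys that group entries in a listing. This Python sketch is a hypothetical illustration of the level numbering described above, not the program's own code; it assumes that a host level counts domain labels from the right and that a file level counts path components below the root.

```python
import datetime as dt


def host_key(host, level):
    """HostDetail: keep the last `level` labels of a host name; 0 keeps all."""
    labels = host.split(".")
    return host if level == 0 else ".".join(labels[-level:])


def file_key(path, level):
    """FileDetail: 1 is the root only, 2 adds the entries in the root, ..."""
    parts = [p for p in path.split("/") if p]
    if level == 0:
        return "/" + "/".join(parts)
    return "/" + "/".join(parts[:level - 1])


# DateDetail: 1 years, 2 months, 3 days, 4 (or 0) hours.
DATE_FORMATS = {1: "%Y", 2: "%Y-%m", 3: "%Y-%m-%d", 4: "%Y-%m-%d %H:00"}


def date_key(when, level):
    """Format a timestamp down to the requested DateDetail level."""
    return when.strftime(DATE_FORMATS[level or 4])
```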
-
Sort=<stype>
Sets the sorting of all listings. They are not sorted
globally; only each level is sorted (that is, a sorted
directory tree remains a 'tree'; global sorting would mix
up files from different directories).
<stype> can take the following values:
-
by-alpha
sorts alphabetically.
This one's the default (at least for now).
-
by-req
sorts by the number of requests.
-
by-size
sorts by the number of kilobytes transferred.
It is not yet possible to apply different sort orders to the
different listings in a single report.
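The difference between per-level and global sorting is easiest to see on a small directory tree. In this Python sketch the tree layout (node) and all names are hypothetical, and by-req and by-size are assumed to sort in descending order with ties broken alphabetically.

```python
def node(requests, kbytes, children=None):
    """A hypothetical tree node: totals for a directory or file plus its children."""
    return {"requests": requests, "kbytes": kbytes, "children": children or {}}


SORT_KEYS = {
    "by-alpha": lambda item: item[0],
    "by-req": lambda item: (-item[1]["requests"], item[0]),  # most requested first
    "by-size": lambda item: (-item[1]["kbytes"], item[0]),   # largest first
}


def display_order(children, stype="by-alpha", depth=0):
    """Yield (depth, name) in display order; each level is sorted on its own,
    so every entry stays directly below its parent directory."""
    for name, child in sorted(children.items(), key=SORT_KEYS[stype]):
        yield depth, name
        yield from display_order(child["children"], stype, depth + 1)


tree = {
    "~fp": node(30, 120, {"index.html": node(25, 40), "olista.html": node(5, 80)}),
    "icons": node(50, 10),
}
print(list(display_order(tree, "by-req")))
```

With by-req the sketch lists icons (50 requests) before ~fp (30), while index.html and olista.html stay grouped under ~fp.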
Section three: Advanced settings
-
Link=<ltype>
Defines which kind of links W3Olista shall print in its HTML output.
<ltype> can take the following values:
-
NoLink
Don't print any links anywhere. The output is just the plain
text report. This one's the default if the program is run
from the command line.
-
FileLink
Prints links to the real pages. Meaningless if Display
is anything other than as-file. If the program is
called from the command line, you must define your Web server
with the Server command below.
-
RepLink
Includes the full-featured report links that allow the user
to click'n'browse the report. Avoid this value if you're
concerned about your system load. This is the default if
the program is invoked as CGI or with a command URL.
-
Server=<server>
Meaningful only for Link=FileLink. Sets the address of the
server containing the reported documents. You must use this command
(a) if you run W3Olista from the command line, or (b) if the Web server
you run the program on is different from the one your documents are on
(e.g. if you run different servers for scripts and documents).
<server> must be a fully qualified hostname (or IP
address).
-
ServerType=<type>
Meaningful only for Link=FileLink. Sets the protocol used to access
the server that serves the reported documents. Defaults to http. You
only need this command if scripts and documents are accessed in
different ways (like serving documents from an ftp server). As you can
guess, this command is hardly ever needed.
-
ServerPort=<number>
Meaningful only for Link=FileLink. Sets the port number on
which to contact the server which serves the reported documents.
Defaults to 80. You need only define this (a) if you run the program
from the command line and your server uses a different port; or (b)
if you run the program as CGI or with a command URL and your script
server uses a different port number than your document server.
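How Server, ServerType and ServerPort might combine into a link can be sketched in Python. file_link is a hypothetical helper, and leaving the port out of the URL when it equals the protocol's usual port is an assumption about the output format.

```python
# Usual ports for the two protocols mentioned above.
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def file_link(path, server, server_type="http", server_port=80):
    """Build a link to the real document, as Link=FileLink might print it.

    server_type and server_port default to http and 80, like ServerType
    and ServerPort; the port is omitted when it is the protocol's usual one.
    """
    port = "" if DEFAULT_PORTS.get(server_type) == server_port else f":{server_port}"
    return f"{server_type}://{server}{port}{path}"
```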
-
CustomHeader=<file>
-
CustomFooter=<file>
For a description of these two commands, see the General Commands
description in the
Introduction.
Section four: Hostname Resolution
-
ResolveAddr=<switch>
- If set to Yes, the program will try to resolve
all numerical IP addresses found in the log file (though if you're
analyzing proxy log files, numerical addresses in URLs are not
resolved). This feature must be enabled at compile time with the
RESOLVEADDR setting in the Makefile.
All following directives in this section are only effective if this
parameter is Yes.
-
ResolveCacheFile=<file>
- Gives the full path name of a file containing address/hostname pairs
in a format similar to /etc/hosts. This file is searched
for unknown IP addresses before bothering your DNS server (which is,
of course, much faster). The idea is to keep resolved addresses (and
also addresses we know to be unresolvable) across program runs. This
file must exist at program start, but may be empty (to start,
create an empty file with touch filename).
-
ResolveCacheFileReadOnly=<switch>
- Usually the program adds the results of its lookups to the given
cache file. Setting this parameter to Yes prevents this;
the cache file is then opened read-only. Only one instance
of W3Olista may write to the cache file, so if you want to run multiple
instances simultaneously, all but one must have this parameter set to
Yes.
-
ResolveCacheLogIP=<switch>
- Many IP addresses that appear unresolved in log files aren't resolvable
at all, and we can save a lot of time by also remembering our lookup
failures. So the default behaviour is to cache unresolvable addresses
in the cache file too, unless you prevent this by setting this
parameter to No.
The trouble with caching unresolvable addresses is that this state may
only be temporary, or the DNS registration just hasn't yet found its
way through the net (not to mention temporarily unreachable servers
which know the right name). So a utility is provided with which you
can delete all unresolved addresses from the cache from time to time.
-
ResolveCacheLookUp=<switch>
- This parameter defaults to Yes. If you set it to No,
the program will only try to look up the IP address in its cache
file, but if that fails, it doesn't do the 'real' lookup. So with
this parameter being No, you could run the program on a
machine which is not connected to the net, using a cache file
created somewhere else.
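The interplay of the cache parameters can be summarized in a Python sketch. The names are hypothetical, the resolver argument defaults to the standard socket.gethostbyaddr, and storing an unresolvable address with itself as the host name is an assumption about the cache format.

```python
import socket


def parse_cache(lines):
    """Read 'address hostname' pairs in /etc/hosts style into a dict.

    Assumption: an address remembered as unresolvable is stored with the
    address itself in place of a host name.
    """
    cache = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            cache[fields[0]] = fields[1]
    return cache


def resolve(addr, cache, cache_path=None, read_only=False,
            log_ip=True, lookup=True, resolver=socket.gethostbyaddr):
    """Map a numerical address to a host name, consulting the cache first.

    The keyword arguments mirror ResolveCacheFileReadOnly,
    ResolveCacheLogIP and ResolveCacheLookUp.
    """
    if addr in cache:
        return cache[addr]
    if not lookup:                     # cache only, no 'real' lookup
        return addr
    try:
        name = resolver(addr)[0]
    except OSError:                    # socket.herror/gaierror are OSErrors
        name = None
    if name is None and not log_ip:    # don't remember lookup failures
        return addr
    entry = name or addr
    cache[addr] = entry
    if cache_path and not read_only:   # only one writer per cache file
        with open(cache_path, "a") as f:
            f.write(f"{addr}\t{entry}\n")
    return entry
```

Passing a different resolver, for example one that always fails, is a convenient way to try out the cache logic without network access.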
Examples
These are a few examples that should help you digest the dry explanations
above.
The Command Line
This is a simple invocation of W3Olista from the command line. We assume that
the program is globally accessible (somewhere in your search path).
olista html Report=Yesterday HostList=Sum FileList=Yes > Statistik.html
This command creates statistics on yesterday, including a host summary and
the full directory tree. The results are then written to Statistik.html
in the current directory. A little more complex is
olista html Report=Dates FromDate=-10 ToDate=-1 FileList=Yes Link=FileLink Server=www.uni-frankfurt.de > MoreStats.html
This produces statistics on the last ten days (up to yesterday). A full
directory tree is printed, where each item is a link to the real page on
the given HTTP server.
Still need something more complex? Well, then try to cope with two
entries from my crontab file (if you don't know about cron, just
ignore the first five columns).
15 2 * * * rm -f $HOME/WWW/StatLastWeek.html ; $HOME/c/olista/olista html Report=Last7Days infile001=/~fp exhost001=rbi.informatik.uni-frankfurt.de Sort=by-alpha Link=FileLink HostList=Yes HostDetail=1 FileList=Yes DateList=Sum Server=www.uni-frankfurt.de > $HOME/WWW/StatLastWeek.html
30 * * * 0 rm -f $HOME/WWW/EverythingFile.html ; $HOME/c/olista/olista html Report=Everything infile001=/~fp exhost001=rbi.informatik.uni-frankfurt.de Sort=by-alpha Link=FileLink HostList=Yes HostDetail=1 FileList=Yes DateList=No Server=www.uni-frankfurt.de > $HOME/WWW/EverythingFile.html
Examples of Command URLs
These are some examples of command URLs. We assume
that /~fp/cgi/olista is the redirection rule for the program,
and that your server is www.uni-frankfurt.de, which runs on
port 83. What you see here are real URLs that point to our server, but
they're linked to pre-generated documents. Please don't try to access the
live documents, since our server's quite loaded.
http://www.informatik.uni-frankfurt.de/~fp/cgi/olista/html/Report=Yesterday/HostList=Sum/FileList=Yes/Link=NoLink/CountUnique=Yes/exhost001=uni-frankfurt.de/infile001=$~fp
This produces a host summary listing and a complete directory tree of the
accesses to my own pages (/~fp), excluding all requests from our own domain
(uni-frankfurt.de).
http://www.uni-frankfurt.de/~fp/cgi/olista/html/Report=Yesterday/HostList=Sum/FileList=Yes/Link=FileLink/ServerPort=80/CountUnique=Yes/exhost001=uni-frankfurt.de/infile001=$~fp
This is nearly the same as above, but with each entry linking to the real
page. Our document server runs on a different port than the script server,
hence I have to give the port number manually.
Frank Pilhofer
<fp -AT- fpx.de>
Last modified: Tue Nov 7 16:45:24 1995