The BRDSTATS Program - Dec. 5, 2002
Dec. 5, 2002: Updated to BRDSTATS 1.50a
    
    Simon Begin has written a free program, BRDSTATS.EXE, that analyzes BorderManager common log files. You can download version 1.50a (149k) HERE. (You may need to hold the Shift key down when you click on this link.)
BRDSTATS read me (v1.40) (2001/05/28)
    Description:
  This program scans proxy server common log files and creates HTML reports containing the following statistics:
- The Top 20 Users
- The Top 20 Web sites (URL)
- Analysis of each of the Top 5 users (Top 10 URLs for each)
- Analysis of each of the Top 5 URLs (Top 10 users for each)
- 24-hour traffic analysis
- 7-day / 24-hour traffic analysis
- Top 20 proxy return codes (for example, 404 = Not Found)
- Top 20 file types (for example .gif, .html, .jpg)
- Top file sizes
- A summary
The Top xx counts shown above are the default settings.
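The two time-based reports come down to bucketing each log line by its timestamp. The Python sketch below (not BRDSTATS's own Clipper code) assumes the standard Common Log Format timestamp, for example [05/Dec/2002:14:03:27 -0500], keeps the proxy's local time, and counts hits per hour of day and per weekday/hour pair. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

def clf_timestamp(line):
    """Parse the [05/Dec/2002:14:03:27 -0500] field of a Common Log Format line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    # Keep the proxy's local time and ignore the zone offset (e.g. "-0500").
    return datetime.strptime(stamp.split()[0], "%d/%b/%Y:%H:%M:%S")

def traffic_by_time(lines):
    """Count hits per hour (24-hour view) and per weekday/hour pair (7-day/24-hour view)."""
    per_hour = Counter()
    per_weekday_hour = Counter()
    for line in lines:
        ts = clf_timestamp(line)
        per_hour[ts.hour] += 1
        per_weekday_hour[(ts.weekday(), ts.hour)] += 1  # weekday(): Monday = 0
    return per_hour, per_weekday_hour
```

Counting hits is only one choice; summing the byte field instead would give a bandwidth-by-hour view.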
Requirements:
- Novell BorderManager Proxy server or any other proxy server that supports the Common Log Format.
- Common log files to analyze. In BorderManager, for example, configure HTTP Proxy Logging with the common format and a date-based rollover (for example, every 7 days).
- HTTP proxy authentication, which is needed for the 'Top users' statistics.
Note: This utility is free to use; please let the author know if you like it. If you have problems, please read this entire file.
Installation and usage:
Copy BRDSTATS.EXE directly into your log directory (for BorderManager, the default is SYS:\ETC\PROXY\LOG\HTTP\COMMON). Note that the program always works on the current directory. You can copy the executable anywhere you want; just remember to CD to the directory containing your log files before running it.
    
    From any DOS/Win9x/WinNT PC, open a DOS box and change to that directory. Run BRDSTATS, or BRDSTATS [filename] to analyze a single log. The logs must be closed; a log that is still open cannot be read. [Note from Craig: I suggest setting log files to roll over every 4 hours.]
  
    The program then reads and summarizes the ENTIRE log file. PLEASE BE PATIENT! It can analyze more than 1000 lines/sec, depending on your PC and the number of statistics enabled. In my case, it takes 10 minutes to analyze 1 week of proxy activity (a log file of about 20 MB). You can abort the program at any time with ALT-C.
  
    The output file has the same name as the log file, but with the extension .HTM. If no filename is specified, BRDSTATS scans ALL LOG FILES (*.LOG) in the current directory that don't already have an equivalent .HTM file. To regenerate an .HTM file, just delete it and rerun BRDSTATS.
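The selection rule above can be sketched in Python as follows. This is an illustration of the documented behavior, not BRDSTATS's code; the function name is hypothetical, names are compared case-insensitively as on DOS/NetWare, and clip$err.log is skipped as described in the version 1.50 history.

```python
from pathlib import PurePath

def pending_logs(filenames):
    """Return the *.LOG names that have no .HTM report with the same base name."""
    names = [PurePath(n) for n in filenames]
    # Base names of existing reports, upper-cased for a case-insensitive match.
    reports = {p.stem.upper() for p in names if p.suffix.upper() == ".HTM"}
    return sorted(
        p.name
        for p in names
        if p.suffix.upper() == ".LOG"
        and p.stem.upper() not in reports
        and p.name.upper() != "CLIP$ERR.LOG"  # BRDSTATS run-time error log, not a proxy log
    )
```

Called as pending_logs(os.listdir(".")), it lists the logs a plain BRDSTATS run would pick up; deleting a report puts its log back on the list.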
  
    BRDSTATS also creates an INDEX.HTM file containing links to all other .HTM files in the directory. INDEX.HTM is recreated from scratch every time BRDSTATS runs and analyzes at least 1 log file.
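A minimal Python sketch of such an index page, for anyone who wants to rebuild it separately (for example, on the web server). The page layout and the function name are hypothetical. Version 1.40 lists the newest reports first; this sketch assumes report names sort chronologically, so it simply sorts them in reverse.

```python
from html import escape

def build_index(report_names):
    """Return a minimal INDEX.HTM page linking every .HTM report except the index itself."""
    reports = sorted(
        (n for n in report_names if n.upper().endswith(".HTM") and n.upper() != "INDEX.HTM"),
        reverse=True,  # assumption: names sort chronologically, so newest first
    )
    items = "\n".join(f'<li><a href="{escape(n)}">{escape(n)}</a></li>' for n in reports)
    return f"<html><body><h1>BRDSTATS reports</h1>\n<ul>\n{items}\n</ul></body></html>\n"
```

If the names do not sort by date, sorting on each file's modification time would be the safer key.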
  
    Configuration is done through BRDSTATS.INI, which is created automatically, with all the defaults, the first time the program runs. After the INI has been created, use Notepad to modify it to suit your needs. All Top xx numbers can be set from 0 to 1000; to remove a statistic, set it to "0". If any parameter is missing or misspelled in the INI file, its default is used. You can delete the INI file and it will be recreated with defaults.
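The "missing or misspelled means default" rule can be illustrated with Python's configparser. Only the Debug and DefaultSort parameters are named in this readme; the [Stats] section and the Top* key names below are hypothetical stand-ins for whatever BRDSTATS.INI actually contains, and sending out-of-range values back to their default is an assumption.

```python
import configparser

# Hypothetical section and key names; only Debug and DefaultSort appear in this readme.
DEFAULTS = {"TopUsers": 20, "TopURL": 20, "TopReturnCodes": 20, "TopFileTypes": 20}

def load_settings(ini_text):
    """Read Top xx counts (0-1000, 0 = disabled), Debug and DefaultSort, falling back to defaults."""
    parser = configparser.ConfigParser()  # option names are matched case-insensitively
    parser.read_string(ini_text)
    section = parser["Stats"] if parser.has_section("Stats") else {}
    settings = {}
    for key, default in DEFAULTS.items():
        try:
            value = int(section.get(key, default))  # a misspelled key is simply not found
        except ValueError:
            value = default  # unreadable value: use the default
        # Assumption: values outside 0..1000 also fall back to the default.
        settings[key] = value if 0 <= value <= 1000 else default
    settings["Debug"] = str(section.get("Debug", "No")).strip().lower() == "yes"
    sort = str(section.get("DefaultSort", "Hits")).strip().upper()
    settings["DefaultSort"] = "MB" if sort == "MB" else "Hits"
    return settings
```

An empty file yields all defaults, which mirrors deleting BRDSTATS.INI and letting it be recreated.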
  
  IMPORTANT: If you are upgrading from v1.30, delete the .INI file, or at least rename it, so that it is recreated with the latest settings. Both the parameters and the documentation inside the .INI file changed in this version, and recreating the file is the only way to get that information. Also see the "History" section at the end of this document for new features.
Additional info:
You can create a custom log file to obtain a specific analysis. When specific needs arise, I use the grep command to build a log file containing only the lines of interest. For example, if I need an analysis of the web site "yahoo", I run:
grep -i yahoo (logfile) > yahoo.log
Then I rerun BRDSTATS and yahoo.log is analyzed. (grep is a Unix command that is also available for DOS/Windows.)
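Where grep is not installed, a few lines of Python do the same filtering. Note that grep -i treats its argument as a regular expression, while this sketch does a plain case-insensitive substring match, which is equivalent for a simple word such as yahoo. The function names are hypothetical; lines are read and written unchanged (including CR/LF endings) so the output stays in the original log format.

```python
def grep_i(lines, needle):
    """Yield the lines that contain needle, ignoring case (like grep -i with a plain word)."""
    needle = needle.casefold()
    return (line for line in lines if needle in line.casefold())

def write_filtered_log(src, dst, needle):
    """Write the matching lines of the log src to a new log dst."""
    # latin-1 maps every byte to a character, and newline="" keeps line endings as-is.
    with open(src, encoding="latin-1", newline="") as fin, \
         open(dst, "w", encoding="latin-1", newline="") as fout:
        fout.writelines(grep_i(fin, needle))
```

For example, write_filtered_log("20021205.LOG", "yahoo.log", "yahoo") (log name hypothetical) produces a yahoo.log that the next BRDSTATS run picks up.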
    
    If you wish to automate BRDSTATS, I suggest a simple batch file that changes (CD) to the log directory, runs BRDSTATS, then copies all .HTM files to the desired web server directory.
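The same three steps can be written in Python instead of a batch file. This is a swapped-in alternative, not the author's script; it assumes BRDSTATS.EXE was copied into the log directory and can run on the machine executing the script (it is a DOS program). The function names and directory arguments are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

def publish_reports(log_dir, web_dir):
    """Copy every .HTM report in log_dir to web_dir; return the names copied."""
    copied = []
    for report in sorted(Path(log_dir).iterdir()):
        if report.is_file() and report.suffix.upper() == ".HTM":
            shutil.copy2(report, Path(web_dir) / report.name)
            copied.append(report.name)
    return copied

def run_and_publish(log_dir, web_dir):
    """Run BRDSTATS inside the log directory (it always works on the current directory), then publish."""
    # Assumption: BRDSTATS.EXE lives in log_dir and exits by itself when done.
    subprocess.run([str(Path(log_dir) / "BRDSTATS.EXE")], cwd=log_dir, check=True)
    return publish_reports(log_dir, web_dir)
```

Passing cwd=log_dir plays the role of the CD step, so the script does not depend on where it is started from.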
  
    The URL summary is based on the root URL. For example, http://www.123.com/main.htm and http://www.123.com/images/header.gif are both counted as "http://www.123.com". The user statistics use the login name for the Top 20. If your proxy does not require authentication, you will have only 1 user, named "Unknown".
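Both rules can be sketched in Python, assuming the standard Common Log Format field order (client host, ident, authenticated user, [timestamp], "request", status, bytes) and a proxy request line carrying an absolute URL. Lower-casing the host name is an assumption of this sketch, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

def root_url(request_url):
    """Reduce a full URL to scheme://host, e.g. http://www.123.com/main.htm -> http://www.123.com."""
    parts = urlsplit(request_url)
    if not parts.scheme or not parts.netloc:
        return request_url  # e.g. "/" requests that never went to a remote site
    return f"{parts.scheme}://{parts.netloc.lower()}"  # assumption: merge host-name case variants

def user_and_site(line):
    """Return (login name, root URL) for one Common Log Format line."""
    fields = line.split()
    user = fields[2] if fields[2] != "-" else "Unknown"  # "-" means no proxy authentication
    request = line.split('"')[1]  # e.g. GET http://www.123.com/main.htm HTTP/1.0
    return user, root_url(request.split()[1])
```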
  
    BRDSTATS's speed depends on the PC and on the number of statistics to produce. On a recent PC you should normally get 500 to 1000 lines/sec. If you need more speed, disable unused statistics in the INI file, starting with the most resource-hungry ones: the Top users/URL analysis and the file type analysis. Note that you must disable a feature completely (set it to No or 0) to gain speed: whether 1 or 40 items are selected for a statistic, the analysis time is the same.
Troubleshooting:
BRDSTATS rejects a log file once it finds 100 lines in error, or if more than 50% of its lines are in error. To see the rejected lines, set the "Debug" option to "Yes" in BRDSTATS.INI.
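The rejection rule is worded loosely, so the Python sketch below encodes one reading of it as an assumption: stop as soon as 100 bad lines have been seen, otherwise reject at the end if strictly more than half the lines were bad. The regular expression is a hypothetical validity test for a Common Log Format line, not BRDSTATS's own parser.

```python
import re

# Hypothetical check: host ident user [timestamp] "request" status bytes
CLF_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-)\s*$')

def should_reject(line_ok_flags, max_errors=100, max_ratio=0.5):
    """Apply the assumed rule: 100 bad lines, or more than 50% bad lines, rejects the log."""
    total = errors = 0
    for ok in line_ok_flags:
        total += 1
        if not ok:
            errors += 1
            if errors >= max_errors:
                return True  # give up early without reading the rest of the file
    return total > 0 and errors / total > max_ratio
```

Typical use: should_reject(bool(CLF_LINE.match(line)) for line in open(log_name)), with log_name being any log to check.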
    
    You may see a URL named "/ (Local file system)" or "http://(your local web server here)". This usually means that your users go through your proxy server to reach the local web server, which may indicate a web browser configuration problem.
  
    Another problem: in some cases, the log file reports a file size of 2 GB, which badly skews the statistics. This is usually a video stream; the user didn't actually download that much data, but the transfer was started. For now, the only workaround I know is to open the log in a text editor and manually delete those lines.
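Instead of editing the log by hand, the offending lines can be dropped with a short Python filter before running BRDSTATS. The 2,000,000,000-byte cutoff is an assumed threshold for "about 2 GB", and reading the size from the last field assumes the standard Common Log Format; the function names are hypothetical.

```python
def size_of(line):
    """Return the byte count (last Common Log Format field); "-" or a blank line counts as 0."""
    parts = line.split()
    if not parts or not parts[-1].isdigit():
        return 0
    return int(parts[-1])

def drop_oversized(lines, limit=2_000_000_000):
    """Keep only lines whose reported size is below limit (assumed cutoff for the bogus 2 GB entries)."""
    return [line for line in lines if size_of(line) < limit]
```

The filtered lines can be written to a new .LOG file with the same open/writelines pattern used for grep, leaving the original log untouched.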
  
    The proxy log does not tell whether the data was served from the cache or from the Internet. A file accessed 10 times may be downloaded from the Internet only once, then read from the cache the other 9 times. The proxy statistics will show all 10 accesses, so you cannot use them to evaluate your Internet traffic. The proxy return code 304 Not Modified seems to give some hint about cache "hits", but it does not account for all cache hits of the proxy server: this return code makes up roughly 20 to 30% of all hits, while BorderManager's own statistics consistently show 70% cache hits. Code 304 is the response to a conditional request. For example, when the proxy suspects a newer version of a cached file may be available, it issues a conditional GET request with the file name and the date, time and size of its cached copy. The web server returns the requested file if it has changed, or code 304 if it hasn't. If you have more information on how to calculate caching from the proxy log files, please contact me.
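The 304 share mentioned above is easy to measure yourself. This Python sketch counts return codes (the same data as the Top return codes report) and computes the fraction of 304 responses; as explained above, that fraction is only a partial hint, not the proxy's real cache hit ratio. It assumes the status code is the second-to-last Common Log Format field; the function names are hypothetical.

```python
from collections import Counter

def status_of(line):
    """Return the status code, the second-to-last field of a Common Log Format line."""
    parts = line.split()
    return parts[-2] if len(parts) >= 2 else "?"

def return_code_counts(lines):
    """Count hits per proxy return code."""
    return Counter(status_of(line) for line in lines)

def not_modified_share(lines):
    """Fraction of hits answered with 304 Not Modified (a partial hint at caching only)."""
    codes = return_code_counts(lines)
    total = sum(codes.values())
    return codes["304"] / total if total else 0.0
```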
  
    BRDSTATS has been tested with BorderManager 3.5 through 3.8. Some users have tested it on BM version 3.0. Reports from users of other proxy servers are welcome, since BRDSTATS uses the standard common log format.
  
    BRDSTATS uses DBF files to accumulate its statistics. These files are left in place after the program runs and can be imported into any database or spreadsheet for more detailed analysis. There are 3 files, and they are overwritten for each log analyzed, so if BRDSTATS analyzes 2 or more logs, only the last analysis is kept: BRDURL, with a record for each unique URL; BRDUSR, with a record for each unique user; and BRDBOTH, with a record for each unique user and URL combination.
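The three tables amount to three group-by summaries. The Python sketch below keeps them in memory as dictionaries (key to [hits, bytes]) instead of DBF files, and adds a Top n helper that sorts by Hits or by size, like the DefaultSort parameter from version 1.40. The parsing assumes the standard Common Log Format; all names are hypothetical, and the record layout of the real DBF files is not described here.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def parse(line):
    """Return (user, root URL, bytes) from one Common Log Format line."""
    fields = line.split()
    user = fields[2] if fields[2] != "-" else "Unknown"
    url = line.split('"')[1].split()[1]
    parts = urlsplit(url)
    site = f"{parts.scheme}://{parts.netloc}" if parts.netloc else url
    size = int(fields[-1]) if fields[-1].isdigit() else 0
    return user, site, size

def summarize(lines):
    """Build BRDURL-, BRDUSR- and BRDBOTH-style tables mapping key -> [hits, bytes]."""
    by_url = defaultdict(lambda: [0, 0])
    by_user = defaultdict(lambda: [0, 0])
    by_both = defaultdict(lambda: [0, 0])
    for line in lines:
        user, site, size = parse(line)
        for table, key in ((by_url, site), (by_user, user), (by_both, (user, site))):
            table[key][0] += 1
            table[key][1] += size
    return by_url, by_user, by_both

def top(table, n=20, sort="Hits"):
    """Return the n largest entries, sorted by hit count ("Hits") or by bytes ("MB")."""
    index = 0 if sort == "Hits" else 1
    return sorted(table.items(), key=lambda kv: kv[1][index], reverse=True)[:n]
```

Writing each table to a CSV file would give the same spreadsheet-friendly output the DBF files provide.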
  
    Send any comments or suggestions, or requests for the source code (Clipper 5.3), to:
  
  Simon Begin
History
Version 1.50 (20011127):
- Top xx IP Addresses: enabled at 20 by default.
- The engine now filters out any garbage in the logs, which occurs when the proxy isn't shut down properly. The program reports when there is garbage in the input log file and how many bytes were skipped.
- The file clip$err.log, if present, is now skipped. This file contains BRDSTATS run-time errors.
Version 1.40 (20010528):
- Parsing of lines revised and optimized: more speed and precision.
- New Top file types statistics (for example .gif, .html, .jpg)
- New Top file sizes statistics
- New global parameter to select the default sort order for statistics. "DefaultSort" can be set to Hits or MB, and affects all statistics that contain MB and Hits data. The default is Hits, which better reflects "time spent" on the Internet. If set to "MB", statistics are sorted by file size, for those like me who prefer to check who's using all the bandwidth rather than who spends all their time on the net...
- There is now only 1 "Top nn URL" and 1 "Top nn User" section, each sorted according to the DefaultSort parameter.
- "Clickable" URLs
- Reverse order in the INDEX.HTM file, so the newest entries are at the top.
- Some minor bugs fixed.
Version 1.30 (20001213):
- New INI file to setup report output
- 24x7 traffic analysis
- Readme updated and moved into BRDSTATS.HTM. Troubleshooting tips added
Version 1.23 (20001019):
- First published version, translated to English
- Analyzes all logs in the current directory that don't already have a .HTM equivalent
- New INDEX.HTM with links to all reports in the directory
Future enhancements:
Filter options: the goal is to include and/or exclude lines matching a given string, which could be a URL, a user, or anything else that appears in the log.
    
    Get the BRDSTATS.ZIP v1.50 file (149k) by clicking HERE. (You may need to hold the Shift key down when you click on this link.)

