Web Log Analysis
This chapter describes:
- Web Log File
- Analog - Web Log File Analysis Tool
- Configuring Analog to Run My Logs
Web Log File
Web Log File: A file produced by a Web server to record activities on the Web server.
It usually has the following features:
- The log file is a text file. All of its records share the same format.
- Each record in the log file represents a single HTTP request.
- A log file record contains important information about a request: the client side host name or IP address,
the date and time of the request, the requested file name, the HTTP response status and size, the referring URL,
and the browser information.
- A browser may send multiple HTTP requests to the Web server to display a single Web page. This is
because a Web page needs not only the main HTML document but often additional files as well, such as images
and JavaScript files. The main HTML document and each additional file require separate HTTP requests.
- Each Web server has its own log file format; see the log file examples below.
- If your Web site is hosted by an ISP (Internet Service Provider), they may not keep the log files for you,
because log files can become very large if the site is busy. Instead, they may give you only statistics reports generated
from the log files.
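To illustrate the point that one page view produces several requests, here is a minimal Python sketch that separates page requests from supporting-file requests by file extension. The extension list, the sample paths, and the function name is_page_request are assumptions made for this illustration; they do not come from any log analysis tool.

```python
import os

# Assumed set of extensions for supporting files (images, scripts, styles).
SUPPORTING_EXTENSIONS = {".gif", ".jpg", ".png", ".ico", ".js", ".css"}

def is_page_request(path):
    """Return True if the requested path looks like a page, not a supporting file."""
    ext = os.path.splitext(path)[1].lower()
    return ext not in SUPPORTING_EXTENSIONS

# Hypothetical requests a browser might send while displaying two pages.
requests = ["/", "/index.html", "/images/logo.gif", "/style.css", "/js/ads.js"]
pages = [p for p in requests if is_page_request(p)]
print(len(requests), "requests,", len(pages), "page views")
```

Counting only the page requests gives a rough estimate of page views, while counting every record gives the total number of hits.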
1. IIS (Internet Information Services) Samples: Here are some sample records from an IIS server log file:
02:49:12 127.0.0.1 GET / 200
02:49:35 127.0.0.1 GET /index.html 200
03:01:06 127.0.0.1 GET /images/sponsered.gif 304
03:52:36 127.0.0.1 GET /search.php 200
04:17:03 127.0.0.1 GET /admin/style.css 200
05:04:54 127.0.0.1 GET /favicon.ico 404
05:38:07 127.0.0.1 GET /js/ads.js 200
The record format is very simple. It has fields for: time, client IP address, request command, requested file,
and response status code.
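Because the fields are separated by spaces, a record in this format can be split directly. The following Python sketch is only an illustration under that assumption: the field names in IisRecord are my own labels, and the code handles only the five-field layout shown above, not other IIS log formats.

```python
from collections import Counter, namedtuple

# Field names are illustrative labels for the five fields described above.
IisRecord = namedtuple("IisRecord", "time client_ip method path status")

def parse_iis_line(line):
    """Split one record of the simple five-field format into named fields."""
    time, client_ip, method, path, status = line.split()
    return IisRecord(time, client_ip, method, path, int(status))

# Three of the sample records shown above.
sample = """\
02:49:12 127.0.0.1 GET / 200
03:01:06 127.0.0.1 GET /images/sponsered.gif 304
05:04:54 127.0.0.1 GET /favicon.ico 404
"""

records = [parse_iis_line(line) for line in sample.splitlines() if line.strip()]

# Count how many requests ended with each response status code.
status_counts = Counter(r.status for r in records)
print(status_counts)
```

A count of status codes like this is one of the simplest statistics a log analysis tool reports.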
2. Apache Samples: Here are some sample records from an Apache server log file:
192.168.198.92 - - [22/Dec/2002:23:08:37 -0400] "GET
/ HTTP/1.1" 200 6394 www.yahoo.com
"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"
192.168.198.92 - - [22/Dec/2002:23:08:38 -0400] "GET
/images/logo.gif HTTP/1.1" 200 807 www.yahoo.com
"http://www.some.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET
/news/sports.html HTTP/1.1" 200 3500 www.yahoo.com
"http://www.some.com/" "Mozilla/4.0 (compatible; MSIE ...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET
/favicon.ico HTTP/1.1" 404 1997 www.yahoo.com
"-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:15 -0400] "GET
/style.css HTTP/1.1" 200 4138 www.yahoo.com
"http://www.yahoo.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:16 -0400] "GET
/js/ads.js HTTP/1.1" 200 10229 www.yahoo.com
"http://www.search.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:19 -0400] "GET
/search.php HTTP/1.1" 400 1997 www.yahoo.com
"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; ...)" "-"
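The record format is more complex. The records are also very long, so I have broken each one across multiple lines.
Some fields are easy to understand, like client IP address, date and time, request command line, response
status and size, referring URL, and browser name. In Apache's Common Log Format, the two hyphens after the IP
address stand for the client identity and the authenticated user name; "-" means the value is not available.
The host name after the response size and the final quoted "-" depend on how the server's log format was
configured, and I don't know what they are for these samples.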
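A record of this kind can be taken apart with a regular expression once its wrapped lines are joined back into a single line. The Python sketch below is an illustration only: the pattern is written to fit the samples above, and the group names (in particular host and extra, for the fields whose meaning is uncertain) are assumptions rather than official Apache names.

```python
import re

# Pattern fitted to the sample records above; group names are illustrative.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'(?P<host>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<extra>[^"]*)"'
)

# One sample record, with its three wrapped lines joined into one.
record = (
    '192.168.198.92 - - [22/Dec/2002:23:08:38 -0400] '
    '"GET /images/logo.gif HTTP/1.1" 200 807 www.yahoo.com '
    '"http://www.some.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"'
)

match = LOG_PATTERN.match(record)
fields = match.groupdict()

# The request line itself holds three parts: command, file, and protocol.
method, path, protocol = fields["request"].split()
print(fields["client"], fields["status"], path, fields["referer"])
```

A server that uses a different log format setting would need a different pattern, so the expression has to be adjusted to match the records actually being analyzed.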