My Blog

Welcome! What is this blog for, and what will be blogged here? Well, frankly, only time will tell ... ;-) The plan is to have stuff like: what I plan to do, interesting things that I want to share, my interests and hobbies (and what I'm doing or not doing about them), and random thoughts and opinions. Just about anything I feel like writing! Have fun!

Thursday, May 12, 2005

Web Path Analysis

I am analyzing my project's web logs to determine the path most users take through my web site. As part of that, I am looking at the entities involved.

The Web Log contains a log of the activity of the web server.
The Web Log may be in one or more log files.
-> The Web Log of a web server is typically backed up periodically (usually nightly), so the Web Log spans multiple files.

The Web log contains multiple Hits.
-> A hit is a user/agent accessing one file on the web server.

A set of hits could be grouped together as a "Page Hit".
-> A Page Hit is the set of hits to all the files that were needed to present a specific web page [e.g., the HTML page plus all its images].
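
One rough way to group hits into Page Hits is sketched below. It is only a heuristic under stated assumptions: the list of asset extensions and the function names are made up for illustration, and it assumes the request paths belong to one user/agent and are already in time order.

```python
import os

# Assumed list of "supporting file" extensions; adjust for the real site.
ASSET_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico"}

def is_asset(path):
    """True if the path looks like an image, stylesheet, or script."""
    path_without_query = path.split("?", 1)[0]
    extension = os.path.splitext(path_without_query)[1].lower()
    return extension in ASSET_EXTENSIONS

def group_page_hits(paths):
    """Group one user's request paths (in time order) into Page Hits.

    Each non-asset path starts a new Page Hit; asset paths are attached
    to the most recent Page Hit.
    """
    page_hits = []
    for path in paths:
        if is_asset(path) and page_hits:
            page_hits[-1].append(path)
        else:
            page_hits.append([path])
    return page_hits

paths = ["/index.html", "/logo.gif", "/style.css", "/about.html", "/photo.jpg"]
for page_hit in group_page_hits(paths):
    print(page_hit)
```

A stricter version could also check that each asset's referrer matches the page it is attached to.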

The Web Log has one Base URL.
-> I'm assuming that the web server works on one alias.

The Web Log has a specific format.
-> This is typically one of the Apache log formats (either the Common Log Format or the Combined Log Format).
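
As a minimal sketch, a single log line in either format could be turned into a hit like this. The regular expression, the field names, and the sample line are assumptions chosen for illustration; it does not handle escaped quotes inside the request or user agent fields.

```python
import re
from datetime import datetime

# Illustrative pattern for the Combined Log Format. The trailing referer
# and user agent fields are optional, so Common Log Format lines match too.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_hit(line):
    """Return a dict describing one hit, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    # Timestamps look like 12/May/2005:10:15:32 +0000
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    # "-" (or a missing field) means no referer was recorded.
    if hit["referer"] in (None, "", "-"):
        hit["referer"] = None
    return hit

sample = (
    '192.0.2.1 - - [12/May/2005:10:15:32 +0000] '
    '"GET /index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
hit = parse_hit(sample)
print(hit["ip"], hit["time"].isoformat(), hit["request"], hit["referer"])
```

The dict carries the pieces used in the rest of this list: the IP address, the date-time stamp, the referer URL, and the user agent string.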

Each hit in the web log has a specific date-time stamp.
Each hit is from a specific user/agent.

The user/agent could be a human being or a bot.
The user/agent has an ip address.
The user/agent has a "User Agent" string that identifies the browser/bot.

A set of hits from the same user/agent forms a session if the time between any two consecutive hits is no more than the session timeout limit.
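
The session rule can be sketched as follows. The 30-minute timeout is an assumed value (no specific limit is fixed here), and `split_sessions` is a hypothetical helper that takes the date-time stamps of one user/agent's hits.

```python
from datetime import datetime, timedelta

# Assumed timeout; pick whatever limit suits the site.
SESSION_TIMEOUT = timedelta(minutes=30)

def split_sessions(hit_times, timeout=SESSION_TIMEOUT):
    """Group one user's hit timestamps into sessions.

    A new session starts whenever the gap since the previous hit
    is greater than the timeout.
    """
    sessions = []
    for hit_time in sorted(hit_times):
        if sessions and hit_time - sessions[-1][-1] <= timeout:
            sessions[-1].append(hit_time)
        else:
            sessions.append([hit_time])
    return sessions

times = [
    datetime(2005, 5, 12, 10, 0),
    datetime(2005, 5, 12, 10, 10),
    datetime(2005, 5, 12, 11, 0),   # 50 minutes after the previous hit
    datetime(2005, 5, 12, 11, 5),
]
for session in split_sessions(times):
    print([t.strftime("%H:%M") for t in session])
```

With these sample times the 50-minute gap exceeds the timeout, so the four hits fall into two sessions of two hits each.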

A hit may or may not have a referrer URL (spelled "referer" in the HTTP header).
-> The referrer URL identifies the page from which the user requested the current page.

The referrer url may be an internal page or an external page.
The external referrer url could be a search engine.
A search engine url typically contains the search terms used by the user.
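
A small sketch of how a referrer could be classified and its search terms pulled out. The base host, the table of search engines, and their query parameter names (`q`, `p`) are assumptions for illustration; a real table would need to be checked against the referrers that actually show up in the log.

```python
from urllib.parse import urlparse, parse_qs

BASE_HOST = "www.example.com"  # hypothetical host of the site being analyzed

# Assumed mapping: fragment of the search engine's host name -> query
# parameter that carries the search terms.
SEARCH_ENGINES = {
    "google.": "q",
    "search.yahoo.": "p",
}

def classify_referrer(url):
    """Return (kind, search_terms); kind is 'none', 'internal', 'search' or 'external'."""
    if not url:
        return ("none", None)
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host == BASE_HOST:
        return ("internal", None)
    for marker, param in SEARCH_ENGINES.items():
        if marker in host:
            terms = parse_qs(parts.query).get(param)
            return ("search", terms[0] if terms else None)
    return ("external", None)

print(classify_referrer("http://www.example.com/about.html"))
print(classify_referrer("http://www.google.com/search?q=web+path+analysis&hl=en"))
print(classify_referrer("http://blog.example.org/some-post"))
```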

A page could be referred to by a name. For instance, the base url could correspond to the "Home Page".
A page could be Tagged with a class (as in classification).
A Tag may or may not have a parent Tag.
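
The page and Tag entities could be modelled roughly like this. The class and field names, and the sample tags, are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tag:
    """A classification for pages; the parent is optional."""
    name: str
    parent: Optional["Tag"] = None

    def path(self):
        """Return the tag names from the top-level tag down to this one."""
        names = []
        tag = self
        while tag is not None:
            names.append(tag.name)
            tag = tag.parent
        return list(reversed(names))

@dataclass
class Page:
    """A page on the site, with an optional friendly name and Tag."""
    url: str
    name: Optional[str] = None
    tag: Optional[Tag] = None

docs = Tag("Documentation")
faq = Tag("FAQ", parent=docs)
home = Page(url="/", name="Home Page")
faq_page = Page(url="/docs/faq.html", name="FAQ", tag=faq)

print(home.name, home.tag)
print(faq_page.name, " > ".join(faq_page.tag.path()))
```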

A hit could be a rogue hit, e.g., referrer spam.
A rogue hit could be identified by the user ip address, the referrer url or the user agent.
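
A minimal sketch of such a check, assuming each hit is a dict with `ip`, `referer`, and `agent` keys. The blocklist entries below are made-up placeholders, not real offenders.

```python
# Placeholder blocklists; real entries would come from inspecting the log.
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_REFERRER_WORDS = ("casino", "poker")
BLOCKED_AGENT_WORDS = ("spambot",)

def is_rogue(hit):
    """True if the hit matches any blocklist (IP, referrer URL, or user agent)."""
    ip = hit.get("ip")
    referer = (hit.get("referer") or "").lower()
    agent = (hit.get("agent") or "").lower()
    if ip in BLOCKED_IPS:
        return True
    if any(word in referer for word in BLOCKED_REFERRER_WORDS):
        return True
    if any(word in agent for word in BLOCKED_AGENT_WORDS):
        return True
    return False

hits = [
    {"ip": "192.0.2.1", "referer": None, "agent": "Mozilla/4.08"},
    {"ip": "198.51.100.4", "referer": "http://best-casino.example/", "agent": "Mozilla/4.0"},
    {"ip": "203.0.113.7", "referer": None, "agent": "Mozilla/4.0"},
]
clean_hits = [h for h in hits if not is_rogue(h)]
print(len(clean_hits), "of", len(hits), "hits kept")
```

Filtering rogue hits out before building sessions keeps them from distorting the paths.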


