Re: Question for web site gurus

Subject: Re: Question for web site gurus
From: Lou Quillio <public -at- quillio -dot- com>
To: svi -at- ieee -dot- org
Date: Mon, 07 Aug 2006 20:25:22 -0400

Svi Ben-Elya wrote:
> I'm not really sure what the statistics indicate and I don't have
> a benchmark to compare it against.

In general, you benchmark against yourself. Even if you knew what
sort of traffic other sites see, I doubt a comparison would tell you
anything truly valuable.

> In July traffic there was an average of 67 visits per day (including
> weekends) and 1131 unique visitors. The number of pages visited exceeded
> 22,000 and the number of hits was nearly double that.
>
> Can someone explain to me the difference between the number of pages visited
> and number of hits?

Depends quite a bit on how you're tabulating stats. Here's a primer:

http://en.wikipedia.org/wiki/Web_analytics

"Hits" tends to mean any request for a file, as logged by the
webserver. A single request for a page that contains, say, ten
linked objects (maybe eight images and two stylesheets) records
eleven hits. It's an undifferentiated stat that doesn't mean much.

"Page views" are typically a more meaningful subset of "hits",
tallying only files thought to be webpages by the server (or other
logging scheme). Your server (Apache 1.4, on UNIX) is very clear
about what's an HTML page -- but you're using PHP scripts to serve
dynamic pages, so server logs _may_ be skewed with respect to "page
views". Or not.
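To make the hits/page-views distinction concrete, here's a rough
sketch of how a log analyzer might tally both from raw Apache log
lines. It assumes the Apache "common" log format (the "combined"
format also matches, since only the start of each line is parsed),
and the rule for what counts as a "page" (directory URLs plus .html,
.htm and .php files, successful responses only) is a hypothetical
heuristic, not what any particular tool does.

```python
import re

# Apache "common" log format: host ident user [time] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical heuristic: these suffixes (plus URLs ending in "/") are pages.
# PHP scripts are included because they serve the site's dynamic pages.
PAGE_SUFFIXES = (".html", ".htm", ".php")


def is_page(path):
    """Guess whether a requested path is a webpage rather than an asset."""
    path = path.split("?", 1)[0]  # ignore query strings such as ?id=402
    return path.endswith("/") or path.lower().endswith(PAGE_SUFFIXES)


def count_hits_and_pages(lines):
    """Return (hits, page_views) for an iterable of log lines."""
    hits = pages = 0
    for line in lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue  # skip malformed lines
        hits += 1  # every logged request is a hit
        if m.group("status") == "200" and is_page(m.group("path")):
            pages += 1
    return hits, pages
```

Fed the eleven requests from the example above (one PHP page, eight
images, two stylesheets), it reports eleven hits but one page view.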

There are two ways to guess at who's a "unique visitor". One is to
record visitor IP addresses and presume requests from that IP during
a certain date range were made by the same person. The same method
can be used to guess at who's a returning visitor. It's imperfect,
since we're not each issued an IP at birth -- but this is art, and
presuming an IP is a unique person _generally_ works.
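The IP method boils down to set arithmetic. A minimal sketch,
assuming you've already reduced your log to hypothetical (ip, date)
pairs; it inherits every weakness described above (shared proxies,
dialup users who get a new IP each session, and so on).

```python
from datetime import date


def unique_visitors_by_ip(requests, start, end):
    """Count distinct IPs seen between start and end, inclusive.

    Each request is an (ip, date) pair. Treating one IP as one person
    is only an approximation.
    """
    return len({ip for ip, day in requests if start <= day <= end})


def returning_ips(requests, first_range, second_range):
    """IPs seen in both date ranges: a rough guess at returning visitors."""
    def ips_in(date_range):
        lo, hi = date_range
        return {ip for ip, day in requests if lo <= day <= hi}

    return ips_in(first_range) & ips_in(second_range)
```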

The second, surer way to judge uniqueness is by setting a cookie in
the user's browser. Needn't be a privacy compromise, just an
anonymous, unique ID of some kind. A substantial benefit of this
method is that we can now follow a user's movements through the site
(again, without knowing exactly _who_ they are). That tells us
truly useful things: where do folks enter (not always the home
page); where do they seem to leave from; how many and which pages do
they view during a typical visit; how long their visits last. It's
very good to know, for example, if the exciting new feature you've
added is under-performing and should be more prominent.
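Here's roughly how cookie-based path tracking can work once each hit
carries an anonymous ID. This is a sketch, not how SlimStat or GA do
it: the (visitor_id, timestamp, page) tuples are hypothetical, and
the 30-minute inactivity timeout that ends a visit is an assumed
convention.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a visit ends after 30 minutes without a request.
VISIT_TIMEOUT = timedelta(minutes=30)


def new_visitor_id():
    """Random, anonymous ID to store in the visitor's cookie."""
    return uuid.uuid4().hex


def split_visits(hits):
    """Group (visitor_id, timestamp, page) hits into per-visitor visits."""
    by_visitor = defaultdict(list)
    for visitor_id, ts, page in sorted(hits, key=lambda h: (h[0], h[1])):
        by_visitor[visitor_id].append((ts, page))
    visits = []
    for visitor_id, events in by_visitor.items():
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            if event[0] - prev[0] > VISIT_TIMEOUT:
                visits.append((visitor_id, current))  # gap too long: new visit
                current = []
            current.append(event)
        visits.append((visitor_id, current))
    return visits


def summarize(visit):
    """Entry page, exit page, page count, and duration of one visit."""
    _, events = visit
    return {
        "entry": events[0][1],
        "exit": events[-1][1],
        "pages": len(events),
        "duration": events[-1][0] - events[0][0],
    }
```

Note that the entry page of the first visit below isn't the home
page, which is exactly the kind of thing this method reveals.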

# # #

At STCForum.org I use two traffic measurement tools (in addition to
log analysis). One is SlimStat:

http://wettone.com/code/slimstat

I see that you have PHP 4.4.2 on your Apache server. If you also
have at least one MySQL database, you might as well start using
SlimStat. View the SlimStat output for STCForum.org here:

http://stcforum.org/slimstat/

SlimStat does not set a cookie, so it has the limitations described
above. It also has no JavaScript component, and therefore can't
learn certain particulars about users' browser environments: screen
resolution, viewport size, etc.

Importantly, SlimStat has a respectably good mechanism for
identifying crawlers and robots, which skew your traffic stats
dreadfully (we're only interested in what _humans_ do). It also has
a means for excluding your own hits -- important for low-traffic
sites with fawning admins. ;)
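The basic idea behind filtering robots and admins can be sketched in
a few lines. The marker strings and the admin IP below are
placeholders; real tools (SlimStat included) keep much longer and
smarter lists, and plain substring matching will occasionally
misclassify an odd user agent.

```python
# Hypothetical, deliberately short list of user-agent substrings.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

# Placeholder: replace with your admins' usual IP addresses.
ADMIN_IPS = {"192.0.2.10"}


def is_human_hit(ip, user_agent):
    """True unless the hit looks like a robot or comes from an admin IP."""
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_MARKERS):
        return False
    return ip not in ADMIN_IPS


def filter_hits(hits):
    """Keep only (ip, user_agent, ...) tuples that look like human visitors."""
    return [h for h in hits if is_human_hit(h[0], h[1])]
```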

I also use Google Analytics at STCForum.org and elsewhere. GA is
completely JavaScript-based, sets a user cookie, calls upon the
substantial Google traffic-analysis brain trust, and produces very
attractive and useful output. Examples:

http://stcforum.org/viewtopic.php?id=402
http://www.google.com/analytics/

SlimStat and GA are both free. You'll have to decide for yourself
if making a partner of Google is a good idea. Frankly, they won't
learn much more than they already can by other means.

A hybrid approach might be Shaun Inman's Mint, for US$30 per site:

http://www.haveamint.com/

Nice product, nice price.

Nevertheless, I find that between SlimStat, Google Analytics, and
server logs I have more data than I care to ponder, and they cost
nothing.

Performance overhead is very light for both SlimStat and GA. Just
make sure to add the SlimStat `include` and GA <script> block at the
very end of pages you wish to track, and also be sure to exclude
your admins' usual IPs from both tools' consideration.

As mentioned, though, use them to benchmark against your own prior
performance. The patterns are fascinating. Wednesdays are always
the weekly peak. Nobody knows why.
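Spotting a weekly pattern like that is just a tally by weekday. A
minimal sketch, assuming you have a list of visit dates; the weekday
names are spelled out explicitly so the output doesn't depend on the
machine's locale.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def visits_by_weekday(visit_dates):
    """Count visits per weekday name; date.weekday() is 0 for Monday."""
    return Counter(WEEKDAYS[d.weekday()] for d in visit_dates)
```

Calling `visits_by_weekday(dates).most_common(1)` gives the peak day
for whatever period you feed it, which you can then compare month to
month.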

A caution: You're serving full content items via RSS, which means
that RSS consumers are *completely* excluded from your traffic stats
-- because they don't have to visit the site.

I know only two solutions to this problem (which grows as RSS
popularity grows): serve only slugs (short excerpts) and a link to
full content via RSS, or expose your feeds exclusively through a
third-party service like FeedBurner:

http://www.feedburner.com/

Personally I don't like my feeds being owned by FeedBurner (or
similar), yet without a middleman you can't know who's reading your
feeds -- and perhaps not visiting the site as a result.
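For the first option, each feed item carries only an excerpt plus a
link back to the full article. A rough sketch using Python's standard
XML library to build one RSS 2.0 `<item>`; the 40-word cutoff and the
function name are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET


def slug_item(title, link, full_text, max_words=40):
    """Build an RSS 2.0 <item> with only a short excerpt plus a link,
    so readers must visit the site (and show up in its stats) to read on."""
    words = full_text.split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        excerpt += " ..."
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = excerpt
    return item
```

`ET.tostring(item, encoding="unicode")` turns the element into text
for inclusion in the feed.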

One more thing. Your site may be accessed with or without the
"www." subdomain. In some cases that can bifurcate your stats,
cause "cookie domain" confusion, and other problems. Best thing is
to leave both operable, but to use Apache's mod_rewrite module to
shunt every request to either the "www." subdomain or the naked
equivalent. It's an annoying bit of housekeeping, I know, but
should be taken care of.
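On Apache the actual fix is a RewriteCond/RewriteRule pair in the
server or .htaccess configuration. The logic those rules implement
is simple enough to sketch in Python, purely as an illustration: if
the requested host starts with "www.", issue a permanent (301)
redirect to the same path on the naked domain. Choosing the naked
domain as canonical is an assumption here; the reverse works just as
well as long as you pick one.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url):
    """Return the URL to 301-redirect to, or None if url is already canonical.

    This sketch treats the naked domain (no "www.") as canonical and keeps
    the path, query string, and fragment intact.
    """
    parts = urlsplit(url)
    if not parts.netloc.lower().startswith("www."):
        return None  # already canonical: serve the page normally
    return urlunsplit((parts.scheme, parts.netloc[4:], parts.path,
                       parts.query, parts.fragment))
```

Because every request ends up on one hostname, the stats tools also
stop seeing two versions of each page.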

Example problem: When I visit http://www.stc.org/, my browser of
choice selects the persistent login cookie that www.stc.org sets --
in other words, it remembers me. When I visit http://stc.org/ it
doesn't. That's caused by STC.org not being thorough about "cookie
domain" (in this case it should optimally be the wildcard
".stc.org"). A better solution is to always rewrite URLs to one or
the other. Then it's taken care of, no matter who later administers
the site or adds G-D knows what functionality.
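For the cookie-domain half of the problem, the fix is setting the
cookie's Domain attribute explicitly. A small sketch using Python's
standard `http.cookies` module; the cookie name "login" and the token
value are hypothetical. (Browsers following RFC 6265 ignore the
leading dot and treat Domain=stc.org as covering subdomains too, but
older browsers expected the dotted form.)

```python
from http.cookies import SimpleCookie


def login_cookie_header(token, domain="stc.org"):
    """Build a Set-Cookie header valid for the domain and all its subdomains."""
    cookie = SimpleCookie()
    cookie["login"] = token
    cookie["login"]["domain"] = "." + domain  # the ".stc.org" wildcard form
    cookie["login"]["path"] = "/"             # valid for every page on the site
    return cookie.output()
```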

Try both of these links and watch your browser's location bar:

http://stcforum.org/
http://www.stcforum.org/

Either link sends you to the equivalent "www."-less page, no matter
where you enter.

Depending on your traffic analysis tools, it's likewise possible for
them to judge http://elephant.org.il/ and
http://www.elephant.org.il/ (and all children) as two different
pages. Better to nip that on the server side.

That's all I know, Svi. Hope it helps.

LQ

---
You are currently subscribed to TECHWR-L as archive -at- infoinfocus -dot- com -dot-

