Re: two URLs point to same file

Subject: Re: two URLs point to same file
From: Stan Brown <stbrown -at- NACS -dot- NET>
Date: Sun, 24 Mar 1996 23:00:29 -0500

Tim Altom <taltom -at- IQUEST -dot- NET> wrote:
> http://stc.org/region2/phi/n%26v/soft0196.html.
and Emily M. Skarzenski <eskarzenski -at- dttus -dot- com> replied:
> The URL above is incorrect. The correct
> URL is http://stc.org/region2/phi/n&v/soft0196.html.
and Tim surreplied:
>Nevertheless, I checked the two URLs, and found that BOTH of them work. It's
>the first time in my surfing that I've ever seen this happen. Domain names
>versus URLs, yes, but they're just restatements of the same thing. This is
>obviously going into a different directory, yet it's the exact same HTML
>document, so far as I can tell. So if you're searching for the review, use
>either one.

The HTML specs recognize that certain characters should be used only
in certain contexts. For instance, the < character normally signals
the beginning of an HTML keyword. So if you want a < character in
plain text, there has to be some way to tell the browsers that it's
text and not the beginning of a command. (As a practical matter, many
browsers recognize < as plain text unless it's followed by something
that looks like HTML, but not all do.) The proper way to represent a <
in text on a Web page is with the characters &lt; -- yes, including
the semicolon (not always required but always permitted). Similarly,
the copyright symbol is "&copy;", the lowercase e-with-acute accent is
"&eacute;", and so on. (The ampersand itself is "&amp;", by the way.)

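To see those substitutions in action, here is a minimal sketch using
Python's standard html module. The sample strings are invented for
illustration, and this shows only one library's behavior; it is not a
statement of what any particular browser does.

```python
import html

# Escape the characters that HTML treats specially in plain text.
# quote=False leaves quotation marks alone, since we are not inside an attribute.
escaped = html.escape("if a < b & c > d", quote=False)
print(escaped)  # if a &lt; b &amp; c &gt; d

# Going the other way, named entities are turned back into characters.
restored = html.unescape("&copy; 1996 caf&eacute;")
print(restored)  # © 1996 café
```
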
So much for plain text. But what about special characters in a URL?
For instance, if > or " is part of a file name (bad practice, but
possible), the browser may think that character is a delimiter after
the URL rather than part of the URL. Within URLs, special characters
are shown in hex with a leading percent sign. Thus > and " would be
%3E and %22, respectively. (The percent sign itself is %25.)
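
The hex codes above can be checked with a short sketch, assuming
Python's standard urllib.parse module; passing safe="" is my choice
here so that nothing beyond letters, digits and a few punctuation
marks is left unencoded.

```python
from urllib.parse import quote

# quote() replaces each reserved or unsafe character with a percent sign
# followed by the character's two-digit hex code.
encoded = {ch: quote(ch, safe="") for ch in ['>', '"', '%']}

for ch, code in encoded.items():
    print(ch, "->", code)
# > -> %3E
# " -> %22
# % -> %25
```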

Those of you who haven't nodded off yet may have noticed that the
"entity names" I listed two paragraphs ago all begin with an
ampersand. This leads to the question whether an ampersand might
sometimes confuse a browser. To guard against this, in a URL the
ampersand character itself can be represented by its hex
representation, %26. (Emily, I think I said "should" in private mail
to you; "can" is a better verb.)

So the answer is that the two URLs are not different: they are the
same, just represented slightly differently. Whoever chose to have a
directory name including the character & made a poor choice, as this
confusion shows.
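
For anyone who wants to verify this, a small sketch (again assuming
Python's urllib.parse; the variable names are mine) decodes both
forms and compares the results:

```python
from urllib.parse import unquote

# The two URLs from the thread, without the trailing sentence period.
url_a = "http://stc.org/region2/phi/n%26v/soft0196.html"
url_b = "http://stc.org/region2/phi/n&v/soft0196.html"

# unquote() turns %26 back into a literal ampersand, so after decoding
# the two strings are identical.
same = unquote(url_a) == unquote(url_b)
print(unquote(url_a))  # http://stc.org/region2/phi/n&v/soft0196.html
print(same)            # True
```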

A list of entity names may be found at
http://www.sandia.gov/sci_compute/symbols.html .

One final note: it's not "domain names versus URLs": the URL is the whole
specification and includes _either_ a numeric IP address _or_
something alphabetic that may be called a domain name. Either one is a
component of a URL.
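
To make the distinction concrete, here is a rough sketch that pulls
the host component out of a URL and reports which kind it is. The
helper name host_kind and the address 192.0.2.1 are hypothetical,
chosen only for illustration.

```python
import ipaddress
from urllib.parse import urlsplit

def host_kind(url):
    """Return the host part of a URL and whether it is numeric or a name."""
    host = urlsplit(url).hostname
    try:
        # ip_address() raises ValueError for anything that is not an IP address.
        ipaddress.ip_address(host)
        return host, "numeric IP address"
    except ValueError:
        return host, "domain name"

print(host_kind("http://www.nacs.net/~stbrown/"))  # ('www.nacs.net', 'domain name')
print(host_kind("http://192.0.2.1/index.html"))    # ('192.0.2.1', 'numeric IP address')
```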

Stan Brown, Oak Road Systems, Cleveland, Ohio USA +1 216 371-0043
email: stbrown -at- nacs -dot- net Web: http://www.nacs.net/~stbrown/
Can't find FAQ lists? See my Web page for instructions, or email me.

