Re: html, http, urls and internationalisation
Keld Jørn Simonsen (keld@dkuug.dk)
Sun, 28 Jan 1996 22:54:03 +0100
Larry Masinter writes a number of things that I can agree with, including
that URLs are described in (abstract) characters, independent of encoding.
He then comments further on my initial mail:
> > I would propose that URLs be written in the charset of the
> > document that references the url,
>
> This is exactly the situation. URLs are sequences of characters, can
> be written in newspapers or on business cards (which, not being
> computer encodings, don't have a 'charset'). For those situations
> where URLs are embedded in other documents, that embedding should use
> the charset of the containing document. The repertoire of characters
> allowed within URLs is intentionally restricted to allow such
> embedding in almost all contexts.
>
> > possibly enhanced with
> > the extensions that we make to get further characters,
> > for example &a-ring; or &#xxxx;
>
> this is the part that's impossible. You might imagine doing such a
> thing, but it doesn't work if you then try to use URLs for the purpose
> for which they are functional.
>
> Some folks want to deal with the variability of how particular
> implementations of HTTP or FTP might use sequences of octets to
> represent characters, and, in particular, the characters that appear
> before the local user behind the HTTP or FTP server. So, if you have a
> FTP or HTTP server that serves out files in your file server, and your
> file server uses Big5 or Unicode for the representation of file names,
> you have to choose an encoding of Big5 or Unicode as octets in order
> to deal with the FTP or HTTP protocols. It would be useful to
> standardize that encoding, because there are new HTTP implementations
> being delivered all the time, and even new FTP implementations.
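To make the point about choosing an octet encoding concrete, here is a minimal sketch in Python, assuming the file name is escaped with the usual %XX notation. The function name `escape_name` and the sample file name are illustrative assumptions, not something taken from the mail above. It shows that one and the same sequence of characters gives different octets, and hence different escaped URL text, depending on the charset chosen.

```python
from urllib.parse import quote, unquote

def escape_name(name: str, charset: str) -> str:
    """Encode a file name in the given charset, then %XX-escape the octets.

    Hypothetical helper for illustration only.
    """
    return quote(name, safe="", encoding=charset)

# Illustrative file name (an assumption, not from the original discussion).
name = "中文.txt"

for charset in ("utf-8", "big5"):
    escaped = escape_name(name, charset)
    # Decoding with the same charset recovers the original characters;
    # decoding with a different one would not.
    assert unquote(escaped, encoding=charset) == name
    print(charset, escaped)
```

Without an agreed encoding, a client that sees only the escaped form cannot tell which of the two results it has been given.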
I do not see that I need to use the same encoding as the server,
provided the server has adequate charset translation software available.
Such translation could be made a requirement if we allowed extended
charsets beyond ASCII in URLs. And it is nicer than requiring URLs always
to be written in some UCS encoding, say UTF-8.
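As a rough sketch of what such server-side translation might look like, assuming the server somehow learns which charset the client used (how it would learn that is left open here), the following Python function decodes the escaped octets in the client's charset and re-encodes the characters in the server's own charset. The name `translate_path` and its parameters are hypothetical.

```python
from urllib.parse import unquote_to_bytes

def translate_path(escaped: str, client_charset: str, server_charset: str) -> bytes:
    """Map a %XX-escaped path from the client's charset to the server's.

    Hypothetical helper: assumes the client's charset is known to the server.
    """
    octets = unquote_to_bytes(escaped)    # undo the %XX escaping
    text = octets.decode(client_charset)  # octets -> abstract characters
    return text.encode(server_charset)    # characters -> server's octets

# A client writing in ISO 8859-1 asks for a file stored under a UTF-8 name.
local_name = translate_path("%E5ben.txt", "iso-8859-1", "utf-8")
```

The translation fails with an error whenever the server's charset cannot represent a character the client sent, so the server's repertoire has to cover everything that is allowed in the URL.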
Keld