Re: Comments on the HTTP/1.0 draft.
Marc VanHeyningen (mvanheyn@cs.indiana.edu)
Thu, 01 Dec 1994 22:50:50 -0500
Highlighting a few issues, which I hope will not create the image that
I am just trying to disagree with Roy on everything :-)...
- If-Modified-Since. Part of the whole point of how this mechanism
was defined is that servers that don't support it will just ignore
it and return the whole object, which may sometimes be inefficient
but won't break anything. I think servers "should" implement this
feature. "Must" is too strong for a feature that increases
efficiency but won't break anything by its absence.
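
A rough sketch of why absence is harmless, assuming a server that answers with a (status, body) pair. The function name `respond` and the `supports_ims` switch are made up for illustration and come from no draft or server. A server that ignores the header simply takes the last line every time.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def respond(headers, last_modified, body, supports_ims=True):
    """Return (status, body) for a GET. Hypothetical helper, not a real API."""
    ims = headers.get("If-Modified-Since")
    if supports_ims and ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # unparseable date: fall through and send everything
        if since is not None:
            if since.tzinfo is None:
                # Assumption: a date without a zone is treated as GMT.
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, b""  # Not Modified: client keeps its copy
    # Header absent, unsupported, or object is newer: send the whole object.
    return 200, body
```

Either branch leaves the client with a correct copy; only the number of bytes on the wire differs.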
- Non-ASCII characters in headers. I don't think this is a big deal
at all, though I'll be surprised if there isn't already somebody
somewhere using non-ASCII in the comment section of the From: line
or something, and I hope it's being done according to 1522 instead
of somebody assuming the character set used in his particular nation
is the universal character set for the whole world.
- HTTP-Dates. It's not that including the day of the week is
unfathomably difficult; my objection is to changing things in
general. It's confusing to say "An rfc1123-date in HTTP actually
only allows a restrictive subset of what RFC 1123 specifies," and
there is little if any gain.
I am uncomfortable with deviating from existing specifications
without more compelling reasons for doing so. I mean, heck, if we
just want a date that's easy to parse, how about an integer of the
number of seconds since the beginning of 1970? Easy to implement,
at least under UNIX. :-)
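
To make the comparison concrete, here is a small sketch using Python's standard `email.utils`. The helper names `to_http_date` and `from_http_date` are invented for this example. It converts between seconds since 1970 and an RFC 1123 style date, and shows that a general parser already copes with the optional day of the week.

```python
from email.utils import formatdate, parsedate_to_datetime


def to_http_date(seconds):
    """Format seconds since 1970-01-01 00:00:00 GMT as an RFC 1123 style date."""
    return formatdate(seconds, usegmt=True)


def from_http_date(text):
    """Parse a date that carries an explicit zone back into seconds since 1970."""
    return int(parsedate_to_datetime(text).timestamp())


# With the day of the week, as the restricted HTTP form requires.
print(to_http_date(0))
# Without it, which the more general syntax also permits.
print(from_http_date("01 Jan 1970 00:00:00 GMT"))
```

The integer form is trivially easy to parse, but it is one more deviation from what existing mail and news software already emits.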
- Canonicalization of content.
I'll drop this if everyone else thinks I'm just being a pedantic
dork, but I really believe the purpose of a specification is to
establish precise, correct behavior in which neither clients nor
servers need to do heuristic guessing about what means what.
Chuck Sutton suggests:
> IMHO, it should state CR, LF, and CRLF should all be interpreted
> equally as EOL when used as line ends. This avoids any problems with
> machine dependent EOL symbols, and fairly represents the current practice.
> (It also avoids forcing clients and especially servers to do line-by-line
> translations of EOL for all outgoing response information, which is a BIG
> performance hit.)
(Aside: Does somebody have benchmarks to establish the magnitude of
this "big performance hit"?)
This is probably sensible behavior, and something along these lines
(possibly modulo the suggested changes from Ari) should go in an
appendix on tolerant, robust implementations. This is in keeping
with the oft-cited philosophy of "be liberal in what you accept."
However, the other half of that is "be conservative in what you
send." Being conservative means sending objects in canonical form
only, and not assuming the program on the other end will be clever
enough to guess what you really meant. The spec should say this.
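
The two halves can be sketched side by side. This assumes CRLF is the canonical line break for text, as it is in MIME; the function names `read_lines` and `to_canonical` are hypothetical.

```python
import re

# CRLF must come first so it is consumed as one line break, not two.
_EOL = re.compile(rb"\r\n|\r|\n")


def read_lines(data):
    """Liberal receiver: treat CRLF, bare CR, and bare LF alike as end of line."""
    return _EOL.split(data)


def to_canonical(data):
    """Conservative sender: emit every line break as CRLF and nothing else."""
    return _EOL.sub(b"\r\n", data)
```

A receiver built this way costs nothing to a sender that behaves, and a sender built this way never relies on the receiver being tolerant.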
What about new developments? If UNICODE support is desired, how
should line breaks be represented and detected in a robust fashion?
Do we really want to have to include low-level stuff like this in
the spec, instead of just saying "do it in canonical form"?
Aside: The issue of canonicalization is, in principle, not wedded
to any particular content-type family, but in practice seems almost
exclusively associated with line endings. Strictly speaking, this isn't
really true; for instance, discarding the resource fork from a Mac
file and sending on the data could be considered converting it to
canonical form, and obviously that's needed. Or should we expect
all clients to be clever enough to recognize that and discard it? :-)
OK, end of tirade (maybe.) If people simply must ship around
objects with different ways of representing the same thing, there
should be an out-of-band way to indicate that. A
Content-Encoding of "unix-text", for instance, could indicate that
line breaks are represented with LF. Obviously a provision for
multiple C-Es would be needed to describe things like "gzipped UNIX
text". This should be a C-E, though, not a C-T-E. A proposed C-T-E
for UNIX text would probably trigger an uproar of laughter on the
MIME mailing list (and rightly so.)
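
A minimal sketch of how a receiver might undo stacked encodings. Everything about the header syntax here is an assumption: that multiple C-Es are comma-separated and listed in the order they were applied. "unix-text" is only the hypothetical label suggested above, and `decode` is an invented name.

```python
import gzip


def undo_unix_text(data):
    """Restore CRLF line breaks; assumes the input uses bare LF only."""
    return data.replace(b"\n", b"\r\n")


# Hypothetical registry mapping encoding labels to their decoders.
DECODERS = {"gzip": gzip.decompress, "unix-text": undo_unix_text}


def decode(body, content_encoding):
    """Undo each listed encoding, last applied first."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        body = DECODERS[coding](body)
    return body


# "gzipped UNIX text": LF line breaks first, then gzip on top.
wire = gzip.compress(b"line one\nline two\n")
print(decode(wire, "unix-text, gzip"))
```

The point is that the receiver is told what was done rather than having to guess it.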
- Passing thought: If a request contains a Message-ID header, should
the server include that message-ID in the response, maybe in an
In-Reply-To: header?
- Marc
--
Marc VanHeyningen <URL:http://www.cs.indiana.edu/hyplan/mvanheyn.html>