Re: Multiple Content-Location headers
Jim Gettys (jg@pa.dec.com)
Thu, 15 Jan 1998 12:57:56 -0800
> From: Jacob Palme <jpalme@dsv.su.se>
> Date: Thu, 15 Jan 1998 20:55:42 +0100
> To: Nick Shelness <shelness@lotus.com>, jg@pa.dec.com (Jim Gettys)
> Cc: IETF working group on HTML in e-mail <mhtml@SEGATE.SUNET.SE>,
>     http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> Subject: Re: Multiple Content-Location headers
>
> At 17.21 +0000 98-01-15, Nick_Shelness@motorcity2.lotus.com wrote:
> > Could I suggest that to break this impasse, that MHTML switches to a new
> > header field Content-Label to replace its use of Content-Location. This
> > would better capture the MHTML role of the header field, and would also
> > allow the simplifications I argued for last week on the MHTML list to
> > proceed. I.e., Content-Label could only specify an absolute URI, and would
> > not establish a base.
>
> I am not very happy with changing an existing and already implemented
> IETF proposed standard in such a radical way. But maybe it is necessary.
> Let us examine the differences between how MHTML and HTTP uses
> Content-Location to see if they really need to be split into two
> different header fields.
>
> HTTP 1.1 spec says:
>
>    In HTTP, multipart body-parts MAY contain header fields which are
>    significant to the meaning of that part. A Content-Location header
>    field SHOULD be included in the body-part of each enclosed entity
>    that can be identified by a URL.
>
>    The Content-Location entity-header field MAY be used to supply the
>    resource location for the entity enclosed in the message when that
>    entity is accessible from a location separate from the requested
>    resource's URI.
>
>    A cache cannot assume that an entity with a Content-Location
>    different from the URI used to retrieve it can be used to respond
>    to later requests on that Content-Location URI. However, the
>    Content-Location can be used to differentiate between multiple
>    entities retrieved from a single requested resource, as described
>    in section Caching Negotiated Responses.
>
>    ...
>
>    If a single server supports multiple organizations that do not
>    trust one another, then it must check the values of Location and
>    Content-Location headers in responses that are generated under
>    control of said organizations to make sure that they do not attempt
>    to invalidate resources over which they have no authority.
>
> MHTML spec says:
>
>    (I have removed the controversial text allowing multiple
>    Content-Location headers, since we all agree to remove this.)
>
>    A Content-Location header specifies an URI that labels the content
>    of a body part in whose heading it is placed. Its value CAN be an
>    absolute or a relative URI. A Content-Location header field is
>    allowed in any message or content heading, in addition to one
>    Content-ID header (as specified in [MIME1]) and, in Message
>    headings, one Message-ID (as specified in [RFC822])
>
>    An URI in a Content-Location header need not refer to an resource
>    which is globally available for retrieval using this URI (after
>    resolution of relative URIs). However, URI-s in Content-Location
>    headers (if absolute, or resolvable to absolute URIs) SHOULD still
>    be globally unique.
>
>    When processing (rendering) a text/html body part in an MHTML
>    multipart/related structure, all URIs in that text/html body part
>    which reference subsidiary resources within the same
>    multipart/related structure SHALL be satisfied by those resources
>    and not by resources from any another local or remote source.
>    Therefore, If a sender wishes a recipient to always retrieve an URI
>    referenced resource from its source, an URI labeled copy of that
>    resource MUST NOT be included in the same multipart/related
>    structure.
>
>    In addition, since the source of a resource received in
>    multipart/related structure can be misrepresented (see 12.1 above),
>    if a resource received in multipart/related structure is stored in
>    a cache, it MUST NOT be retrieved from that cache other than by a
>    reference contained in a body part of the same multipart/related
>    structure. Failure to honor this directive will allow a
>    multipart/related structure to be employed as a Trojan Horse. For
>    example, to inject bogus resources (i.e. a misrepresentation of a
>    competitor's Web site) into a recipient's generally accessible Web
>    cache.
>
> My feeling is that the use of Content-Location as defined in the HTTP
> and MHTML spec is not so different as to require us to use different
> headers. But could the HTTP people please examine the quotes above
> and check what you feel about this.
>
The problem we have is syntax and implementation, not semantics.
Let's clear this hurdle before we get into the meat of what you are trying
to achieve, and whether your suggestion fits into the architecture of the
Web. My apologies for jumping into the meat in some of my early messages
on this topic.

Roy Fielding's point is that the syntax change required to allow the
Content-Location header to carry multiple values (needed because proxies
typically combine multiple headers of the same name into a single field)
is a problem, and one that is likely to break existing implementations.
It is also possible, even likely, that this would break existing
applications of HTTP, particularly clients and proxies. To include the
URIs in a comma-separated list would require quoting them, as Roy points
out; parsers may not be coded correctly to deal with this. It is quite
likely that existing implementations will get the wrong answer, or even
die, if they are sent multiple Content-Location headers, or will not
understand the quoting that this would require. And then there are the
proxy issues....

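To make the quoting concern concrete, here is a minimal Python sketch. It is
not taken from any real implementation: the function names and the example
URIs are made up for illustration, and the quote handling is deliberately
simplified. It shows a parser that splits on every comma mangling a URI that
itself contains a comma, next to the quote-aware split that a list syntax
would require.

```python
def naive_split(field_value):
    """Split a list-valued header on every comma, ignoring quoting."""
    return [item.strip() for item in field_value.split(",")]


def quote_aware_split(field_value):
    """Split only on commas that fall outside double-quoted strings.

    Simplified for illustration: backslash-escaped quotes are not handled.
    """
    items, current, in_quotes = [], [], False
    for ch in field_value:
        if ch == '"':
            in_quotes = not in_quotes      # toggle state, drop the quote mark
        elif ch == "," and not in_quotes:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return items


# Hypothetical combined field value: two quoted URIs, the first of which
# contains a comma in its path.
combined = '"http://example.com/a,b.html", "http://example.com/c.html"'

print(naive_split(combined))        # three fragments, none a usable URI
print(quote_aware_split(combined))  # the two original URIs
```

A parser written on the assumption that Content-Location holds exactly one
URI has neither function; it would simply treat the whole combined value as
a single (invalid) URI.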
To quote from section 4.2 of the HTTP spec:

"Multiple message-header fields with the same field-name may be present in
a message if and only if the entire field-value for that header field is
defined as a comma-separated list [i.e., #(values)]. It MUST be possible
to combine the multiple header fields into one "field-name: field-value"
pair, without changing the semantics of the message, by appending each
subsequent field-value to the first, each separated by a comma. The order
in which header fields with the same field-name are received is therefore
significant to the interpretation of the combined field value, and thus
a proxy MUST NOT change the order of these field values when a message is
forwarded."

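The combining rule quoted above can be sketched roughly as follows. This is
an illustration only, assuming headers arrive as an ordered list of
(name, value) pairs; `combine_headers` is a hypothetical helper, not code
from any actual proxy.

```python
def combine_headers(headers):
    """Fold repeated header fields into one comma-separated value per name.

    `headers` is an ordered list of (field-name, field-value) pairs.
    Field names are compared case-insensitively, and values are appended
    in the order received, as the quoted rule requires.
    """
    combined = {}
    for name, value in headers:
        key = name.lower()
        if key in combined:
            combined[key] = combined[key] + ", " + value
        else:
            combined[key] = value
    return combined


# Hypothetical message carrying two Content-Location headers.
received = [
    ("Content-Location", "http://example.com/a.html"),
    ("Content-Type", "text/html"),
    ("Content-Location", "http://example.com/b.html"),
]

folded = combine_headers(received)
print(folded["content-location"])
```

After folding, a recipient that expects exactly one URI in Content-Location
is handed a single string holding both, which it was never written to parse.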
This is the crux of the problem. So we're trying to follow the doctor's
maxim "first, do no harm". We aren't worrying (yet) about the semantic issues
that may or may not exist between how Content-Location is defined in the
two different specs, but pointing out that allowing multiple
Content-Location headers is an incompatible change which may break
implementations, and we have no data which shows this change is harmless.
So until it is shown to be harmless, we must presume harm. IETF process
attempts to avoid regression; we're worried that existing, deployed software
would stop working, possibly in significant ways.

So, please, as in my previous message, either present data showing that
it doesn't break implementations, or don't argue about the name. I think
that will let us all make faster progress; otherwise we're going to
continue to bog down.

I hope this clarifies where the difficulty lies.
- Jim Gettys
--
Jim Gettys
Industry Standards and Consortia
Digital Equipment Corporation
Visiting Scientist, World Wide Web Consortium, M.I.T.
http://www.w3.org/People/Gettys/
jg@w3.org, jg@pa.dec.com