Re: MHTML/HTTP 1.1 Conflicts

Albert Lunde (Albert-Lunde@nwu.edu)
Sat, 24 Jan 1998 23:45:02 -0600


I'm not a great protocol maven, but I'm going to put in my two cents' worth...

It seems like the issues you are raising are central to why HTTP is
referred to as "MIME-like" and contrasted with strict MIME in the specs.

>I am reading the HTTP spec now, to check for possible problems with
>MHTML. Since I have not read all of it yet, can you say if there is
>anything at all in the HTTP spec which says anything about the format
>of bodies, i.e. of what comes after the blank line which ends the
>HTTP heading. If the HTTP spec just regards this as an arbitrary string
>of octets, formatted according to its MIME content type, then there
>will probably not be any risk of conflict between MHTML and HTTP.
>
>In particular, do the specifications about header line length, header folding,
>end-of-line characters, etc., in the HTTP spec clearly say that these
>specs only apply to lines in the HTTP header? If it does not say so,
>but this is the intention, you need only say this more clearly, and
>all conflicts with MHTML will disappear.

In _most_ respects, I think HTTP regards the body as a stream of bytes...
but a big exception and an important difference from MIME is the treatment
of end-of-line for text/* types.

See RFC 2068 sections 3.7.1 and 19.4.1 (which I see you've read...)

In section 3.7.1 it says "This flexibility regarding line breaks applies
only to text media in the entity-body; a bare CR or LF MUST NOT be
substituted for CRLF within any of the HTTP control structures (such as
header fields and multipart boundaries)."

So the HTTP spec says that one of its wacky non-MIME rules applies only to
the entity body.
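
To make the distinction concrete, here is a minimal sketch (mine, not from
the spec) of a reader that is strict about CRLF in the control structures
and lenient about line breaks in a text body. The function names
split_message and text_lines are hypothetical, and the sketch assumes an
unencoded single-byte text body and ignores header folding.

```python
def split_message(raw: bytes):
    """Split a message into head lines and body, accepting only CRLF in the head."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no CRLF CRLF separator between head and body")
    # Control structures: a bare CR or LF is NOT treated as a line break here.
    return head.split(b"\r\n"), body


def text_lines(body: bytes):
    """Split a text/* body into lines, accepting CRLF, bare CR, or bare LF."""
    return body.splitlines()


raw = b"Content-Type: text/plain\r\nX-Example: 1\r\n\r\none\ntwo\rthree\r\nfour"
head_lines, body = split_message(raw)
lines = text_lines(body)
```

With this input the head splits into two lines on CRLF alone, while the
body splits into four lines regardless of which line break was used.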

>It is e-mail, rather than MHTML, which has limitations. You could
>write something like this, perhaps as added text in chapter 3.7.1
>of the HTTP spec?
>
>   The same content may sometimes be sent through e-mail, sometimes
>   through http. E-mail has different rules than http regarding
>   line length (preferred less than 76 characters in headings,
>   long lines are more often folded, in particular long URLs
>   are sometimes folded by inserting LWS which must be removed
>   before using the URL, line breaks must be CRLF, not bare CR
>   or bare LF). If an object is retrieved through http and then
>   forwarded through e-mail, this may require conversion. Such
>   conversion may invalidate checksums used for digital seals,
>   digital signatures, etc. This can be avoided if the resource
>   is formatted, also in its http version, according to e-mail
>   rules.
>
>> We can't at this date even contemplate splitting long URL's; it would break
>> huge numbers of implementations.  You need to get in your head that HTTP
>> is a binary, 8 bit clean transport (streaming RPC system) of arbitrary
>> datatypes; it uses MIME like message syntax, but isn't really MIME.
>
>Certainly not in HTTP headings. But what about headings inside multipart
>bodies, transported through HTTP?
>
>> The long line problem really doesn't apply to HTTP at all.
[..]
>Is there no user requirement among http users to be able to retrieve
>resources through http and forward them through e-mail? If there is such
>a user requirement, and if there is another user requirement that
>security checksums should work across such forwarding, then you do
>have a problem with long lines, even if I can understand that you would
>much prefer that there was no such problem.

I think HTTP makes a distinction between its requirements and those of a
pure MIME environment. Thus these quotes from 19.4.1:

>Where it is possible, a proxy or gateway from HTTP to a strict MIME
>environment SHOULD translate all line breaks within the text media
>types described in section 3.7.1 of this document to the MIME
>canonical form of CRLF. Note, however, that this may be complicated
>by the presence of a Content-Encoding and by the fact that HTTP
>allows the use of some character sets which do not use octets 13 and
>10 to represent CR and LF, as is the case for some multi-byte
>character sets.

and from 19.4.4:

>Proxies and gateways from HTTP to MIME-compliant protocols are
>responsible for ensuring that the message is in the correct format
>and encoding for safe transport on that protocol, where "safe
>transport" is defined by the limitations of the protocol being used.
>Such a proxy or gateway SHOULD label the data with an appropriate
>Content-Transfer-Encoding if doing so will improve the likelihood of
>safe transport over the destination protocol.

My reading of this is that HTTP imposes only its own requirements on the
HTTP headers and body. Those are the requirements of an almost-binary
transport (almost because of the CR/LF/CRLF rules), with no line length
limits.

The paragraph from 19.4.4 in particular puts the responsibility for
untangling the real incompatibilities with MIME on HTTP->mail (and
mail->HTTP) gateways.
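
As a rough illustration of the 19.4.1 translation, and of why it worries
people who care about checksums, here is a minimal sketch of what an
HTTP->mail gateway might do. The names to_mime_crlf, gateway_body and
digest are hypothetical. It assumes there is no Content-Encoding and that
the character set uses octets 13 and 10 for CR and LF, which are exactly
the cases the spec warns may not hold.

```python
import hashlib
import re


def to_mime_crlf(body: bytes) -> bytes:
    """Rewrite CRLF, bare CR, and bare LF as the MIME canonical CRLF."""
    # CRLF is listed first so an existing CRLF is matched as one unit.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", body)


def gateway_body(content_type: str, body: bytes) -> bytes:
    """Canonicalize line breaks for text/* only; pass other types through."""
    if content_type.lower().startswith("text/"):
        return to_mime_crlf(body)
    return body


def digest(body: bytes) -> str:
    """SHA-256 stands in here for whatever checksum a signature covers."""
    return hashlib.sha256(body).hexdigest()


original = b"<p>hello</p>\n<p>world</p>\n"
converted = gateway_body("text/html", original)
checksum_survives = digest(original) == digest(converted)
```

Here checksum_survives comes out False: the conversion is harmless to a
reader but changes the octets, so any checksum computed over the HTTP
version no longer matches the mail version.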

I'm not sure what the best fix is for some of the issues you raise, but I
don't think you will be able to completely align HTTP and pure MIME
requirements on message bodies. HTTP is not going to start line wrapping
everything on the off-chance that responses (even signed ones) will get
gatewayed to mail somewhere.

Some of the HTTP->mail gateway problems might be solved by applying a
base64 encoding to the whole thing... but this may not solve everything;
I'm not sure.
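
A minimal sketch of the base64 idea, assuming the gateway encodes the
entity body untouched and labels it "Content-Transfer-Encoding: base64".
The names encode_for_mail and decode_from_mail are hypothetical, and the
76-character width is the usual MIME line limit for base64 output.

```python
import base64
import hashlib


def encode_for_mail(body: bytes, width: int = 76) -> bytes:
    """Base64-encode the body and break it into CRLF-terminated short lines."""
    encoded = base64.b64encode(body)
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return b"\r\n".join(lines) + b"\r\n"


def decode_from_mail(encoded: bytes) -> bytes:
    """Recover the original octets; CR and LF in the input are ignored."""
    return base64.b64decode(encoded)


# A body with a bare LF and a line far longer than mail would like.
body = b"short line\n" + b"http://example.com/" + b"x" * 300 + b"\n"
mailed = encode_for_mail(body)
restored = decode_from_mail(mailed)
same_checksum = hashlib.sha256(body).digest() == hashlib.sha256(restored).digest()
```

The octets come back unchanged, so a checksum over the decoded body still
matches. What this does not address is anything computed over the encoded
form, or headers inside multipart bodies, which is part of why it may not
solve everything.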

Maybe it is desirable to be more explicit about what such gateways could do.


---
    Albert Lunde                      Albert-Lunde@nwu.edu