RE: non-ascii user name & password
Chris Newman (Chris.Newman@innosoft.com)
Tue, 22 Sep 1998 01:20:28 +0100 (BST)
On Mon, 21 Sep 1998, Paul Leach wrote:
> Or, I could say:
> ------------------
> passwd = *OCTET
>
> It is the responsibility of the client implementation to make sure that the
> user can generate any octet string when providing the password. A protocol
> for setting and changing passwords (which is beyond the scope of this
> document) must specify how what the user provides maps to the actual octet
> string "on the wire".
> -------------------
That's the right formal syntax (although a recommendation to exclude NUL
might be wise). However, most passwords are typed as characters, so an
interoperable spec (at least to get multi-client interoperability) has to
say how typed characters in a password are encoded. The difference is
visible on the wire even though the server should not care.
My suggestion:
When a password is typed by a user, the characters are encoded in
US-ASCII. Encoding of non-US-ASCII characters is not specified at this
time, but use of localized character sets such as ISO-8859-1 for this
purpose is forbidden. Clients are encouraged to provide a facility for
entry of uninterpreted binary passwords.
I think this is the best interoperability we can get at this time without
creating an implementation burden.
The user name is TEXT (in a quoted string) which is trickier to deal with
since it is in the base protocol which by context defaults to ISO-8859-1.
The problem is that no other protocol made the mistake of using ISO-8859-1
for user names, and they are likely to be amended to use UTF-8. Since
HTTP digest includes the username in the one-way-function used to store
the password on the server, this would mean that non-ASCII usernames
wouldn't be able to share authentication backend databases between HTTP
digest and other protocols. I think that would be a serious mistake.
I'd say that any server which permits non-ASCII characters in the username
field SHOULD convert them to UTF-8 prior to including them in the
one-way-function. The conversion from ISO-8859-1 to UTF-8 is trivial, so
that's not a significant burden in exchange for multi-protocol
interoperability of digest authentication.
RFC 2047 encoding is forbidden in a quoted-string, so that can't be used
on the user name. As it stands, user names in HTTP are Euro-centric
unless RFC 2231 encoding were to be permitted at some future date.
- Chris