Language negotiation and robots
Andrew Daviel (andrew@vancouver-webpages.com)
Sun, 14 Jul 1996 11:47:43 -0700 (PDT)
Language negotiation seems to work quite well (at least for
Western languages) between, e.g., the Apache server and
Mosaic-l10n, Netscape 3 and (I presume) Lynx 2.5.
A description of this may be found at
http://vancouver-webpages.com/multilingual/howto.html
and elsewhere.
I am interested in the interaction between this and web robots;
specifically, how to tell the robot that the document is available
in more than one language.
Using the Apache type-map model, if no Accept-language
field is sent, one gets the first URL (assuming qs values are not used, or are all equal).
Would it be valid to do something like this:
URI: start; vary="language"

URI: unknown.html
Content-type: text/html
Content-language: en,fr,de

URI: german.html
Content-type: text/html
Content-language: de

URI: english.html
Content-type: text/html
Content-language: en
etc.
so that a robot with Accept-language unset would retrieve the
"Content-language: en,fr,de" header, and then retry with
Accept-language set to en, fr, de in turn?
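A minimal sketch of the robot side of this idea, in Python. It is not taken from any existing robot: the function names parse_content_language and build_language_requests are made up for illustration, the URL is a placeholder, and the requests are only constructed, not sent.

```python
from urllib.request import Request


def parse_content_language(header_value):
    """Split a Content-language value such as "en,fr,de" into tags."""
    return [tag.strip() for tag in header_value.split(",") if tag.strip()]


def build_language_requests(url, content_language):
    """Build one request per advertised language (hypothetical robot logic).

    Nothing is sent here; each Request just carries an Accept-language
    header naming a single language, ready to be fetched in turn.
    """
    requests = []
    for tag in parse_content_language(content_language):
        requests.append(Request(url, headers={"Accept-language": tag}))
    return requests


if __name__ == "__main__":
    # Header value as in the type-map above; the URL is a placeholder.
    for req in build_language_requests("http://example.com/start", "en,fr,de"):
        print(req.get_header("Accept-language"), req.full_url)
```

Whether a robot should treat a multi-language Content-language value as a hint to retry is exactly the open question here; the sketch only shows that the retry step itself is simple.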
As this multiple-language header seems to be valid in the current draft
spec, I presume it should not break (future) browsers.
With my current setup (Mosaic-l10n 2.6, Apache 0.6) this doesn't quite work,
as unknown.html is returned when Accept-language is set to "en". Putting
a bogus language first has the desired effect, but this is obviously a
kludge.
http://vancouver-webpages.com/multilingual/samples8.var, samples8.var.txt
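To illustrate why unknown.html can win for "en", here is a toy model in Python. The selection rule (the first variant that lists the requested language wins) is an assumption made for illustration, not Apache's actual algorithm; it ignores qs values, omits the "URI: start" header record, and does not try to explain the bogus-language workaround. The names parse_variants and select are hypothetical.

```python
TYPE_MAP = """\
URI: unknown.html
Content-type: text/html
Content-language: en,fr,de

URI: german.html
Content-type: text/html
Content-language: de

URI: english.html
Content-type: text/html
Content-language: en
"""


def parse_variants(text):
    """Return (uri, [languages]) pairs, one per blank-line-separated record."""
    variants = []
    for record in text.strip().split("\n\n"):
        fields = {}
        for line in record.splitlines():
            name, _, value = line.partition(":")
            fields[name.strip().lower()] = value.strip()
        langs = [t.strip()
                 for t in fields.get("content-language", "").split(",")
                 if t.strip()]
        variants.append((fields["uri"], langs))
    return variants


def select(variants, accept_language=None):
    """Toy rule (an assumption): first variant listing the language wins."""
    if accept_language is None:
        return variants[0][0]  # no preference: first entry in the map
    for uri, langs in variants:
        if accept_language in langs:
            return uri
    return None


if __name__ == "__main__":
    variants = parse_variants(TYPE_MAP)
    print(select(variants))        # no Accept-language sent
    print(select(variants, "en"))  # Accept-language: en
```

Under this rule the catch-all entry shadows english.html and german.html whenever it is listed first, which matches the behaviour reported above for "en".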
Alternatively, I could just add all the other languages to the English
entry, so that I have:
URI: english-ca.html
Content-type: text/html
Content-language: en-CA,en-US,en-GB,de,fr-FR
which would work, but would mean I would have to clutter up the "English"
page with instructions about setting up Accept-language.
Andrew Daviel
andrew@vancouver-webpages.com
http://vancouver-webpages.com : home of searchBC