Re: Indexing non-HTML objects
Andrew Daviel (andrew@andrew.triumf.ca)
Fri, 2 May 1997 18:15:19 -0700 (PDT)
On Fri, 2 May 1997, David W. Morris wrote:
> Yes, but as has been already pointed out, the LINK is a subpart of
> the HTML and thus doesn't provide for describing arbitrary www content,
> in the case of this thread, for purposes of representing the arbitrary
> www content in a suitable fashion for indexing.
The idea is that the HTML document includes the metadata (as META tags,
PICS label headers, or just plain HTML). The LINK references the resource
(PDF file, MPEG, or perhaps gopher or telnet port). It can
point to arbitrary content. An indexing agent uses the metadata to index
the resource, so when I go to a search engine and click on "68040
datasheet" I get the PDF document, not a document pointing to the PDF
document. As I understand it, the Link header in HTTP is the same conceptual
mechanism, so it could provide a forward relationship from the resource
to its metadata or, where a metadata standard defines a plain text file
(as FGDC does, I think), the reverse relationship from metadata to
resource. In most cases HTML <LINK> would be used, as it is easier for
authors to use, at least on existing older servers.
(With a new header, I guess <META HTTP-EQUIV could be used ...)
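
To make the idea concrete, here is a rough sketch of what an indexing
agent might do with such a metadata page. It is only an illustration: the
REL value "describes", the example.org URL, and the names MetadataPage and
index_entries are all made up for the example and are not part of any
proposal in this thread.

```python
from html.parser import HTMLParser


class MetadataPage(HTMLParser):
    """Collect META name/content pairs and LINK targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}    # META NAME (lowercased) -> CONTENT
        self.links = []   # (rel, href) pairs in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "link" and attrs.get("href"):
            self.links.append(((attrs.get("rel") or "").lower(), attrs["href"]))


def index_entries(html, rel="describes"):
    """Map each linked resource URL to the page's metadata.

    The rel value is a hypothetical choice; the index entry is keyed by the
    resource (e.g. the PDF), not by the HTML page that describes it.
    """
    page = MetadataPage()
    page.feed(html)
    return {href: dict(page.meta) for r, href in page.links if r == rel}


SAMPLE = """<HTML><HEAD>
<TITLE>68040 datasheet</TITLE>
<META NAME="description" CONTENT="68040 datasheet">
<META NAME="keywords" CONTENT="68040, datasheet">
<LINK REL="describes" HREF="http://example.org/68040.pdf">
</HEAD></HTML>"""

entries = index_entries(SAMPLE)
```

A search front end built on such entries would send the user straight to
the PDF, using the description and keywords only for matching.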
> Transparent Content Negotiation would provide the ideal infrastructure via
> which the URL/URN/URI identified resource listed in the proposed metainfo
> header could have the appropriate variant delivered based on the
> specific needs of a particular indexing service. Then some content
> could have multiple descriptive documents for indexing purposes if the
> publisher so chose.
Mm, sounds exciting if people will use it. I suspect not ...
Any idea how many people are using content negotiation at this point?
I've been waiting for HTTP/1.1 to address the caching issues (which it
has), but I don't really have much negotiable content anyway and haven't
updated yet.
I would anticipate that people would submit the metadata HTML, or an
index of metadata, so that robots would discover the metadata first
rather than the resource. Different HTML metadata would be identified by
schema, while plaintext metadata may be structured in some other way
internally to allow identification.
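
As a sketch of what "identified by schema" could look like for HTML
metadata, suppose the schema is carried as a prefix on the META name
("DC.title" and so on). That naming convention is an assumption for the
sake of the example, as are the function name group_by_schema and the
sample values.

```python
from collections import defaultdict


def group_by_schema(meta, default="(none)"):
    """Split META names of the assumed form 'schema.field' into per-schema dicts.

    Names without a dot are filed under the `default` key.
    """
    grouped = defaultdict(dict)
    for name, content in meta.items():
        schema, sep, field = name.partition(".")
        if not sep:
            # No prefix on this name, so there is no schema to identify it by.
            schema, field = default, name
        grouped[schema][field] = content
    return dict(grouped)


# Hypothetical metadata as a robot might have collected it from one page.
sample_meta = {
    "DC.title": "68040 datasheet",
    "DC.format": "application/pdf",
    "keywords": "68040, datasheet",
}
by_schema = group_by_schema(sample_meta)
```

An indexer could then pick whichever schema it understands and ignore the
rest.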
Andrew Daviel