Jeremy Carroll
 
Snail

Excruciatingly Slow RDF Parsing

Features

  • Glacial, even with the fastest CPU
  • Excessive use of temporary disk space
  • Pathetic error detection
  • Non-conformant N-Triples output

And …

  • XSLT-based, tree-transforming, RDF/XML to N-Triples converter.
  • Principled non-deterministic use of abbreviation rules.
  • Detailed derivation trail for every parse.

Snail is not a production parser; if that is what you want, try ARP. Its design goal is to give a principled account of how the RDF/XML abbreviated syntax is indeed a set of abbreviation rules, and to show that these rules can, if you're so inclined, be used in anger.

Each step of a Snail derivation applies one rule to one top-level element in the RDF/XML file. Each step is taken by a freshly generated XSLT program with the rules randomly reordered. The directory snail-tmp is used for all the temporary files; a detailed inspection of this directory gives a blow-by-blow explanation of where the triples came from.
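The derivation loop described above can be sketched in a few lines of Python. This is only an illustration of the idea (one rule applied per step, rule order reshuffled before every step, a trail kept of each application), not Snail's XSLT implementation. The names `derive`, `expand_a` and `rewrite_b` are hypothetical, and the two toy rules rewrite strings rather than RDF/XML elements.

```python
import random

def derive(state, rules, rng=None):
    """Apply one applicable rule per step until no rule applies.

    The rule order is reshuffled before every step, mimicking the
    "freshly generated program with the rules randomly reordered".
    Returns the final state and a trail of (rule name, before, after).
    """
    rng = rng or random.Random()
    trail = []
    while True:
        order = list(rules)
        rng.shuffle(order)
        for rule in order:
            result = rule(state)
            if result is not None:      # this rule applied: record it and take a new step
                trail.append((rule.__name__, state, result))
                state = result
                break
        else:                           # no rule applied: derivation finished
            return state, trail

# Toy rules (assumptions for illustration; not RDF abbreviation rules).
def expand_a(s):
    return s.replace("a", "bb", 1) if "a" in s else None

def rewrite_b(s):
    return s.replace("b", "c", 1) if "b" in s else None

final, trail = derive("ab", [expand_a, rewrite_b], random.Random(0))
for name, before, after in trail:
    print(f"{name}: {before} -> {after}")
```

Because the toy rules are confluent, the final state is the same whatever order the shuffle picks; only the trail differs from run to run.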

The complete execution is given as follows:

  • Transform input to make scope of xml attributes explicit.
  • Repeatedly apply abbreviation rules until no more apply.
  • Apply aboutEach transform.
  • Transform resulting triples from XML to non-conformant N-Triples.
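As a rough sketch of how these four stages chain together, the Python below runs a list of stages in order and writes every intermediate result to a working directory, in the spirit of snail-tmp. The stage functions are placeholders that only tag a string; the stage names, file naming scheme and `run_pipeline` itself are assumptions made for illustration, and the real stages are XSLT transforms.

```python
import tempfile
from pathlib import Path

def run_pipeline(document, stages, workdir):
    """Run each (name, function) stage in order, saving every intermediate result."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    for index, (name, stage) in enumerate(stages, start=1):
        document = stage(document)
        # One file per stage, numbered so the directory lists in execution order.
        (workdir / f"{index:02d}-{name}.txt").write_text(document, encoding="utf-8")
    return document

# Placeholder stages: each one just appends a marker to the document.
STAGES = [
    ("explicit-xml-scope", lambda doc: doc + "|scoped"),
    ("abbreviation-rules", lambda doc: doc + "|expanded"),
    ("about-each", lambda doc: doc + "|aboutEach"),
    ("to-ntriples", lambda doc: doc + "|ntriples"),
]

with tempfile.TemporaryDirectory() as tmp:
    result = run_pipeline("input", STAGES, tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())

print(result)
print(written)
```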

As well as not being efficient, Snail does not attempt to have other features that may be expected from a parser. It does not do error detection, and will often not notice that it has been given wholly unacceptable input.

The N-Triples output is defective in three ways:

  • The literal strings are in UTF-8, and not US-ASCII.
  • Relative URIs have not been made absolute.
  • The URIs have not been encoded in US-ASCII as in RFC 2396; any non-ASCII characters are still given as they were in the input file.

The last two points mean that aboutEach processing is defective in that different strings that map to the same URI are treated as distinct by the aboutEach processor.
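For comparison, here is a minimal Python sketch of the normalisation that Snail omits: resolving a relative URI against a base, percent-encoding non-ASCII characters, and escaping literal strings into US-ASCII. It is not part of Snail. The function names and the example.org base are assumptions, and the `\uXXXX` / `\UXXXXXXXX` escape forms are the ones used by N-Triples.

```python
from urllib.parse import quote, urljoin

# Characters left untouched when percent-encoding: RFC 2396 reserved and
# mark characters, plus "%" so existing escapes are not encoded twice.
_SAFE = ":/?#[]@!$&'()*+,;=%~-._"

def normalize_uri(uri, base):
    """Resolve uri against base, then percent-encode non-ASCII as UTF-8 bytes."""
    return quote(urljoin(base, uri), safe=_SAFE)

def escape_literal(text):
    """Escape a literal string into US-ASCII using N-Triples escapes."""
    simple = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in simple:
            out.append(simple[ch])
        elif 0x20 <= cp <= 0x7E:        # printable ASCII passes through
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)

base = "http://example.org/"            # hypothetical base URI
relative = "caf\u00e9"
absolute = "http://example.org/caf%C3%A9"

print(relative == absolute)                                        # raw strings differ
print(normalize_uri(relative, base) == normalize_uri(absolute, base))
print(escape_literal("caf\u00e9"))
```

The two spellings compare unequal as raw strings, which is how Snail's aboutEach processor sees them, and equal once both are normalised.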

© 1994-2001 Hewlett-Packard Company