Universal Enumerative Coding for Tree Models
Martin, Alvaro; Seroussi, Gadiel; Weinberger, Marcelo J.
Keyword(s): universal data compression; enumerative coding; tree models; Markov sources; method of types
Abstract: Efficient enumerative coding for tree sources is, in general, surprisingly intricate?a simple uniform encoding of type classes, which is asymptotically optimal in expectation for many classical models such as FSMs, turns out not to be so in this case. We describe an efficiently computable enumerative code that is universal in the family of tree models in the sense that, for a string emitted by an unknown source whose model is supported on a known tree, the expected normalized code length of the encoding approaches the entropy rate of the source with a convergence rate (K/2)(log n)/n, where K is the number of free parameters of the model family. Based on recent results characterizing type classes of context trees, the code consists of the index of the sequence in the tree type class, and an efficient description of the class itself using a non-uniform encoding of selected string counts. The results are extended to a twice- universal setting, where the tree underlying the source model is unknown.
External Posting Date: November 6, 2011 [Fulltext]. Approved for External Publication
Internal Posting Date: November 6, 2011 [Fulltext]