Previous |  Up |  Next

Article

Keywords:
trees; pushdown automata; compression; indexing trees; arbology
Summary:
A new kind of a deterministic pushdown automaton, called a Tree Compression Automaton, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with $n$ nodes, the automaton has at most $n+1$ states, its transition function cardinality is at most $4n$ and there are $2n+1$ pushdown store symbols. If hashing is used for storing automaton's transitions, thus removing a factor of $log n$, the construction of the automaton takes linear time and space with respect to the length $n$ of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity.
References:
[1] Arbology www pages:. Available on http://www.arbology.org/, March 2012.
[2] Aho, A. V., Ullman, J. D.: The Theory of Parsing, Translation, and Compiling. Prentice-Hall Englewood Cliffs, N.J. 1972. MR 0408321
[3] Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., Bourne, P. E.: The protein data bank. Nucleic Acids Res. 28 (2000), 235–242. DOI 10.1093/nar/28.1.235
[4] Bille, P.: Pattern Matching in Trees and Strings. Ph.D. Thesis, FIT University of Copenhagen 2008.
[5] Busatto, G., Lohrey, M., Maneth, S.: Grammar-based Tree Compression. Technical Report 2004.
[6] Cleophas, L.: Tree Algorithms. Two Taxonomies and a Toolkit. Ph.D. Thesis, Technische Universiteit Eindhoven 2008.
[7] Cole, R., Hariharan, R., Indyk, P.: Tree pattern matching and subset matching in deterministic $log^3$-time. SODA (1999), 245–254.
[8] Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M.: Tree automata techniques and applications. Available on: http://www.grappa.univ-lille3.fr/tata (2007).
[9] Gecseg, F., Steinby, M.: Tree languages. In: Handbook of Formal Languages (G. Rozenberg and A. Salomaa, eds.), Vol. 3 Beyond Words. Springer-Verlag, Berlin 1997, pp. 1–68. MR 1470018
[10] Flouri, T., Janoušek, J., Melichar, B.: Subtree and tree pattern pushdown automata for trees in prefix notation (2009). Comput. Sci. Inform. Systems 7 (2010), 2, 331–357.
[11] Hoffmann, Ch., O'Donnell, M.: Pattern matching in trees. J. Assoc. Comput. Mach. 29 (1982), 1, 68–95. DOI 10.1145/322290.322295 | MR 0662611 | Zbl 0477.68067
[12] Hopcroft, J. E., Motwani, R., Ullman, J. D.: Introduction to Automata Theory, languages, and Computation. Second edition. Addison-Wesley, Boston 2001. MR 0645539
[13] Wu, C. H., Apweiler, R., Bairoch, A., Natale, D. A., Barker, W. C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez et al, R.: The Universal Protein Resource (UniProt) – An Expanding Universe of Protein Information. Database issue, Nucleic Acids Research, Oxford University Press, D187–D191 2006.
[14] Schmidt, A. R., Waas, F., Kersten, M. L., Carey, M. J., Manolescu, I., Busse, R.: XMark – A benchmark for XML data management. In: Proc. International Conference on Very Large Data Bases (VLDB), Hong Kong 2002, pp. 974–985.
[15] Van Der Beek, L., Bouma, G., Malouf, R., Van Noord, G., Rijksuniversiteit Groningen: The Alpino dependency treebank. In: Computational Linguistics in the Netherlands (CLIN) 2002, pp. 1686–1691.
[16] Welch, T.: A technique for high-performance data compression. Computer 17 (1984), 6, 8–19. DOI 10.1109/MC.1984.1659158
[17] Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory 23 (1977), 3, 337–343. DOI 10.1109/TIT.1977.1055714 | MR 0530215 | Zbl 0379.94010
Partner of
EuDML logo