Previous |  Up |  Next

Article

Title: Tree compression pushdown automaton (English)
Author: Janoušek, Jan
Author: Melichar, Bořivoj
Author: Poliak, Martin
Language: English
Journal: Kybernetika
ISSN: 0023-5954
Volume: 48
Issue: 3
Year: 2012
Pages: 429-452
Summary lang: English
.
Category: math
.
Summary: A new kind of a deterministic pushdown automaton, called a Tree Compression Automaton, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with $n$ nodes, the automaton has at most $n+1$ states, its transition function cardinality is at most $4n$ and there are $2n+1$ pushdown store symbols. If hashing is used for storing automaton's transitions, thus removing a factor of $log n$, the construction of the automaton takes linear time and space with respect to the length $n$ of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity. (English)
Keyword: trees
Keyword: pushdown automata
Keyword: compression
Keyword: indexing trees
Keyword: arbology
MSC: 05C05
MSC: 68Q68
idMR: MR2975798
.
Date available: 2012-08-31T15:53:39Z
Last updated: 2013-09-24
Stable URL: http://hdl.handle.net/10338.dmlcz/142947
.
Reference: [1] : Arbology www pages:.Available on http://www.arbology.org/, March 2012.
Reference: [2] Aho, A. V., Ullman, J. D.: The Theory of Parsing, Translation, and Compiling.Prentice-Hall Englewood Cliffs, N.J. 1972. MR 0408321
Reference: [3] Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., Bourne, P. E.: The protein data bank.Nucleic Acids Res. 28 (2000), 235–242. 10.1093/nar/28.1.235
Reference: [4] Bille, P.: Pattern Matching in Trees and Strings.Ph.D. Thesis, FIT University of Copenhagen 2008.
Reference: [5] Busatto, G., Lohrey, M., Maneth, S.: Grammar-based Tree Compression.Technical Report 2004.
Reference: [6] Cleophas, L.: Tree Algorithms. Two Taxonomies and a Toolkit.Ph.D. Thesis, Technische Universiteit Eindhoven 2008.
Reference: [7] Cole, R., Hariharan, R., Indyk, P.: Tree pattern matching and subset matching in deterministic $log^3$-time.SODA (1999), 245–254.
Reference: [8] Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M.: Tree automata techniques and applications.Available on: http://www.grappa.univ-lille3.fr/tata (2007).
Reference: [9] Gecseg, F., Steinby, M.: Tree languages.In: Handbook of Formal Languages (G. Rozenberg and A. Salomaa, eds.), Vol. 3 Beyond Words. Springer-Verlag, Berlin 1997, pp. 1–68. MR 1470018
Reference: [10] Flouri, T., Janoušek, J., Melichar, B.: Subtree and tree pattern pushdown automata for trees in prefix notation (2009).Comput. Sci. Inform. Systems 7 (2010), 2, 331–357.
Reference: [11] Hoffmann, Ch., O'Donnell, M.: Pattern matching in trees.J. Assoc. Comput. Mach. 29 (1982), 1, 68–95. Zbl 0477.68067, MR 0662611, 10.1145/322290.322295
Reference: [12] Hopcroft, J. E., Motwani, R., Ullman, J. D.: Introduction to Automata Theory, languages, and Computation. Second edition.Addison-Wesley, Boston 2001. MR 0645539
Reference: [13] Wu, C. H., Apweiler, R., Bairoch, A., Natale, D. A., Barker, W. C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez et al, R.: The Universal Protein Resource (UniProt) – An Expanding Universe of Protein Information.Database issue, Nucleic Acids Research, Oxford University Press, D187–D191 2006.
Reference: [14] Schmidt, A. R., Waas, F., Kersten, M. L., Carey, M. J., Manolescu, I., Busse, R.: XMark – A benchmark for XML data management.In: Proc. International Conference on Very Large Data Bases (VLDB), Hong Kong 2002, pp. 974–985.
Reference: [15] Van Der Beek, L., Bouma, G., Malouf, R., Van Noord, G., Rijksuniversiteit Groningen: The Alpino dependency treebank.In: Computational Linguistics in the Netherlands (CLIN) 2002, pp. 1686–1691.
Reference: [16] Welch, T.: A technique for high-performance data compression.Computer 17 (1984), 6, 8–19. 10.1109/MC.1984.1659158
Reference: [17] Ziv, J., Lempel, A.: A universal algorithm for sequential data compression.IEEE Trans. Inform. Theory 23 (1977), 3, 337–343. Zbl 0379.94010, MR 0530215, 10.1109/TIT.1977.1055714
.

Files

Files Size Format View
Kybernetika_48-2012-3_6.pdf 419.6Kb application/pdf View/Open
Back to standard record
Partner of
EuDML logo