Title:
|
Tree compression pushdown automaton (English) |
Author:
|
Janoušek, Jan |
Author:
|
Melichar, Bořivoj |
Author:
|
Poliak, Martin |
Language:
|
English |
Journal:
|
Kybernetika |
ISSN:
|
0023-5954 |
Volume:
|
48 |
Issue:
|
3 |
Year:
|
2012 |
Pages:
|
429-452 |
Summary lang:
|
English |
. |
Category:
|
math |
. |
Summary:
|
A new kind of a deterministic pushdown automaton, called a Tree Compression Automaton, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with $n$ nodes, the automaton has at most $n+1$ states, its transition function cardinality is at most $4n$ and there are $2n+1$ pushdown store symbols. If hashing is used for storing automaton's transitions, thus removing a factor of $log n$, the construction of the automaton takes linear time and space with respect to the length $n$ of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity. (English) |
Keyword:
|
trees |
Keyword:
|
pushdown automata |
Keyword:
|
compression |
Keyword:
|
indexing trees |
Keyword:
|
arbology |
MSC:
|
05C05 |
MSC:
|
68Q68 |
idMR:
|
MR2975798 |
. |
Date available:
|
2012-08-31T15:53:39Z |
Last updated:
|
2013-09-24 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/142947 |
. |
Reference:
|
[1]
: Arbology www pages:.Available on http://www.arbology.org/, March 2012. |
Reference:
|
[2] Aho, A. V., Ullman, J. D.: The Theory of Parsing, Translation, and Compiling.Prentice-Hall Englewood Cliffs, N.J. 1972. MR 0408321 |
Reference:
|
[3] Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., Bourne, P. E.: The protein data bank.Nucleic Acids Res. 28 (2000), 235–242. 10.1093/nar/28.1.235 |
Reference:
|
[4] Bille, P.: Pattern Matching in Trees and Strings.Ph.D. Thesis, FIT University of Copenhagen 2008. |
Reference:
|
[5] Busatto, G., Lohrey, M., Maneth, S.: Grammar-based Tree Compression.Technical Report 2004. |
Reference:
|
[6] Cleophas, L.: Tree Algorithms. Two Taxonomies and a Toolkit.Ph.D. Thesis, Technische Universiteit Eindhoven 2008. |
Reference:
|
[7] Cole, R., Hariharan, R., Indyk, P.: Tree pattern matching and subset matching in deterministic $log^3$-time.SODA (1999), 245–254. |
Reference:
|
[8] Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M.: Tree automata techniques and applications.Available on: http://www.grappa.univ-lille3.fr/tata (2007). |
Reference:
|
[9] Gecseg, F., Steinby, M.: Tree languages.In: Handbook of Formal Languages (G. Rozenberg and A. Salomaa, eds.), Vol. 3 Beyond Words. Springer-Verlag, Berlin 1997, pp. 1–68. MR 1470018 |
Reference:
|
[10] Flouri, T., Janoušek, J., Melichar, B.: Subtree and tree pattern pushdown automata for trees in prefix notation (2009).Comput. Sci. Inform. Systems 7 (2010), 2, 331–357. |
Reference:
|
[11] Hoffmann, Ch., O'Donnell, M.: Pattern matching in trees.J. Assoc. Comput. Mach. 29 (1982), 1, 68–95. Zbl 0477.68067, MR 0662611, 10.1145/322290.322295 |
Reference:
|
[12] Hopcroft, J. E., Motwani, R., Ullman, J. D.: Introduction to Automata Theory, languages, and Computation. Second edition.Addison-Wesley, Boston 2001. MR 0645539 |
Reference:
|
[13] Wu, C. H., Apweiler, R., Bairoch, A., Natale, D. A., Barker, W. C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez et al, R.: The Universal Protein Resource (UniProt) – An Expanding Universe of Protein Information.Database issue, Nucleic Acids Research, Oxford University Press, D187–D191 2006. |
Reference:
|
[14] Schmidt, A. R., Waas, F., Kersten, M. L., Carey, M. J., Manolescu, I., Busse, R.: XMark – A benchmark for XML data management.In: Proc. International Conference on Very Large Data Bases (VLDB), Hong Kong 2002, pp. 974–985. |
Reference:
|
[15] Van Der Beek, L., Bouma, G., Malouf, R., Van Noord, G., Rijksuniversiteit Groningen: The Alpino dependency treebank.In: Computational Linguistics in the Netherlands (CLIN) 2002, pp. 1686–1691. |
Reference:
|
[16] Welch, T.: A technique for high-performance data compression.Computer 17 (1984), 6, 8–19. 10.1109/MC.1984.1659158 |
Reference:
|
[17] Ziv, J., Lempel, A.: A universal algorithm for sequential data compression.IEEE Trans. Inform. Theory 23 (1977), 3, 337–343. Zbl 0379.94010, MR 0530215, 10.1109/TIT.1977.1055714 |
. |