Previous |  Up |  Next

Article

Keywords:
directed acyclic word graph automaton; string matching; data structures
Summary:
An effective implementation of a Directed Acyclic Word Graph (DAWG) automaton is shown. A DAWG for a text $T$ is a minimal automaton that accepts all substrings of a text $T$, so it represents a complete index of the text. While all usual implementations of DAWG needed about 30 times larger storage space than was the size of the text, here we show an implementation that decreases this requirement down to four times the size of the text. The method uses a compression of DAWG elements, i. e. vertices, edges and labels. The construction time of this implementation is linear with respect to the size of the text, a search for a specific pattern is done in a linear time with respect to the size of the pattern. This implementation preserves both good properties of the DAWG automaton.
References:
[1] Adámek J.: Coding. MVŠT XXXI, SNTL, Prague 1989 (in Czech)
[2] Anderson A., Nilson S.: Efficient implementation of suffix trees. Software–Practice and Expirience 25 (1995), 129–141 DOI 10.1002/spe.4380250203
[3] Balík M.: String Matching in a Text. Diploma Thesis, CTU, Dept. of Computer Science and Engineering, Prague 1998
[4] Crochemore M., Rytter W.: Text Algorithms. Oxford University Press, New York 1994 MR 1307378 | Zbl 1078.68151
[5] Crochemore M., Vérin R.: Direct construction of compact directed acyclic word graphs. In: CPM97 (A. Apostolico and J. Hein, eds., Lecture Notes in Computer Science 1264), Springer–Verlag, Berlin 1997, pp. 116–129 MR 1608404
[6] Gonnet G. H., Baeza–Yates R.: Handbook of Algorithms and Data Structures. Pascal and C. Addison–Wesley, Wokingham 1991 Zbl 0719.68001
[7] Huffman D. A.: A method for construction of minimum redundancy codes. Proc. IRE 40 (1952), 9, 1098–1101
[8] Irving R. W.: Suffix Binary Search Trees, Technical Report TR-1995-7, Computing Science Department, University of Glasgow 199.
[9] Kärkkäinen J.: Suffix cactus: A cross between suffix tree and suffix array. In: Proc. 6th Symposium on Combinatorial Pattern Matching, CPM95, 1995, pp. 191–204 MR 1467515
[10] Kurtz S.: Reducing the Space Requirment of Suffix Trees. Software–Practice and Experience 29 (1999), 13, 1149-1171 DOI 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O
[11] Melichar B.: Approximate string matching by finite automata. In: Computer Analysis of Images and Patterns (Lecture Notes in Computer Science 970), Springer–Verlag, Berlin 1995
[12] Melichar B.: Fulltext Systems. Publishing House CTU, Prague 1996 (in Czech)
[13] Melichar B.: Pattern matching and finite automata. In: Proceedings of the Prague Stringology Club Workshop ’97, Prague 1997
Partner of
EuDML logo