Previous |  Up |  Next

Article

Title: A new practical linear space algorithm for the longest common subsequence problem (English)
Author: Goeman, Heiko
Author: Clausen, Michael
Language: English
Journal: Kybernetika
ISSN: 0023-5954
Volume: 38
Issue: 1
Year: 2002
Pages: [45]-66
Summary lang: English
.
Category: math
.
Summary: This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths $m$ and $n$, $n\ge m$, on an alphabet of size $s$, we first present an algorithm which determines the length $p$ of an LCS in $O(ns+\min \lbrace mp,p(n-p)\rbrace )$ time and $O(ns)$ space. This result has been achieved before [ric94,ric95], but our algorithm is significantly faster than previous methods. We also provide a second algorithm which generates an LCS in $O(ns+\min \lbrace mp,m\log m+p(n-p)\rbrace )$ time while preserving the linear space bound, thus solving the problem posed in [ric94,ric95]. Experimental results confirm the efficiency of our method. (English)
Keyword: longest common subsequence problem
Keyword: Rick’s algorithm
MSC: 68Q25
MSC: 68W05
MSC: 68W32
idZBL: Zbl 1265.68363
idMR: MR1899846
.
Date available: 2009-09-24T19:43:44Z
Last updated: 2015-03-24
Stable URL: http://hdl.handle.net/10338.dmlcz/135445
.
Reference: [1] Aho A. V., Hirschberg D. S., Ullman J. D.: Bounds on the complexity of the longest common subsequence problem.J. Assoc. Comput. Mach. 23 (1976), 1, 1–12 Zbl 0316.68027, MR 0398159, 10.1145/321921.321922
Reference: [2] Apostolico A.: Improving the worst–case performance of the Hunt–Szymanski strategy for the longest common subsequence of two strings.Inform. Process. Lett. 23 (1986), 63–69 Zbl 0608.68057, MR 0858225, 10.1016/0020-0190(86)90044-X
Reference: [3] Apostolico A.: Remarks on the Hsu–Du new algorithm for the longest common subsequence problem.Inform. Process. Lett. 25 (1987), 235–236 MR 0896141, 10.1016/0020-0190(87)90167-0
Reference: [4] Apostolico A., Guerra G.: The longest common subsequence problem revisited.Algorithmica 2 (1987), 315–336 Zbl 0636.68083, MR 0911954, 10.1007/BF01840365
Reference: [5] Apostolico A., Browne, S., Guerra C.: Fast linear–space computations of longest common subsequences.Theoret. Comput. Sci. 92 (1992), 3–17 Zbl 0747.68019, MR 1143128, 10.1016/0304-3975(92)90132-Y
Reference: [6] Chin F. Y. L., Poon C. K.: A fast algorithm for computing longest common subsequences of small alphabet size.J. Inform. Process. 13 (1990), 4, 463–469 Zbl 0768.68020
Reference: [7] Chvátal V., Sankoff D.: Longest common subsequences of two random strings.J. Appl. Probab. 12 (1975), 306–315 MR 0405531, 10.2307/3212444
Reference: [8] Dančík V., Paterson M.: Upper bounds for the expected length of a longest common subsequence of two binary sequences.In: Proceedings 11th Annual Symp. on Theoretical Aspects of Computer Science (Lecture Notes in Computer Science 775), Springer-Verlag, Berlin 1994, pp. 669–678 Zbl 0941.68812, MR 1288571
Reference: [9] Dayhoff M. O.: Computer aids to protein sequence determination.J. Theoret. Biol. 8 (1965), 97–112 10.1016/0022-5193(65)90096-2
Reference: [10] Dayhoff M. O.: Computer analysis of protein evolution.Sci. Amer. 221 (1969), 1, 86–95 10.1038/scientificamerican0769-86
Reference: [11] Deken J. G.: Some limit results for longest common subsequences.Discrete Math. 26 (1979), 17–31 Zbl 0403.68064, MR 0535080, 10.1016/0012-365X(79)90057-8
Reference: [12] Dilworth R. P.: A decomposition theorem for partially ordered sets.Ann. of Math. 51 (1950), 161–166 Zbl 0038.02003, MR 0032578, 10.2307/1969503
Reference: [13] Fu K. S., Bhargava B. K.: Tree systems for syntactic pattern recognition.IEEE Trans. Comput. C–22 (1973), 12, 1087–1099 Zbl 0269.68059, MR 0359424
Reference: [14] Gallant J., Maier, D., Storer J. A.: On finding minimal length superstrings.J. Comput. System Sci. 20 (1980), 50–58 Zbl 0436.68043, MR 0566641, 10.1016/0022-0000(80)90004-5
Reference: [15] Goeman H.: Time and Space Efficient Algorithms for Decomposing Certain Partially Ordered Sets.PhD Thesis, Department of Computer Science, University of Bonn 1999. To appear in Bayreuther Mathematische Schriften MR 1942444
Reference: [16] Hirschberg D. S.: A linear space algorithm for computing maximal common subsequences.Comm. ACM 18 (1975), 6, 341–343 Zbl 0301.68042, MR 0375829, 10.1145/360825.360861
Reference: [17] Hirschberg D. S.: Algorithms for the longest common subsequence problem.J. Assoc. Comput. Mach. 24 (1977), 4, 664–675 Zbl 0402.68041, MR 0461985, 10.1145/322033.322044
Reference: [18] Hsu W. J., Du M. W.: New algorithms for the LCS problem.J. Comput. System Sci. 29 (1984), 133–152 Zbl 0587.68045, MR 0773417, 10.1016/0022-0000(84)90025-4
Reference: [19] Hunt J. W., Szymanski T. G.: A fast algorithm for computing longest common subsequences.Comm. ACM 20 (1977), 5, 350–353 Zbl 0354.68078, MR 0436655, 10.1145/359581.359603
Reference: [20] Kumar S. K., Rangan C. P.: A linear space algorithm for the LCS problem.Acta Inform. 24 (1987), 353–363 Zbl 0642.68066, MR 0894559, 10.1007/BF00265993
Reference: [21] Lowrance R., Wagner R. A.: An extension of the string–to–string correction problem.J. Assoc. Comput. Mach. 22, (1975), 2, 177–183 Zbl 0301.68028, MR 0366088, 10.1145/321879.321880
Reference: [22] Lu S. Y., Fu K. S.: A sentence–to–sentence clustering procedure for pattern analysis.IEEE Trans. Systems Man Cybernet. SMC–8, (1978), 5, 381–389 Zbl 0378.68048, MR 0471478, 10.1109/TSMC.1978.4309979
Reference: [23] Maier D.: The complexity of some problem on subsequences and supersequences.J. Assoc. Comput. Mach. 25 (1978), 2, 322–336 MR 0483700, 10.1145/322063.322075
Reference: [24] Masek W. J., Paterson M. S.: A faster algorithm for computing string edit distances.J. Comput. System Sci. 20 (1980), 1, 18–31 MR 0566639, 10.1016/0022-0000(80)90002-1
Reference: [25] Myers E. W.: An $O(ND)$ difference algorithm and its variations.Algorithmica 1 (1986), 251–266 Zbl 0639.68054, MR 1554305, 10.1007/BF01840446
Reference: [26] Nakatsu N., Kambayashi, Y., Yajima S.: A longest common subsequence algorithm suitable for similar text strings.Acta Inform. 18 (1982), 171–179 Zbl 0493.68041, MR 0687701, 10.1007/BF00264437
Reference: [27] Needleman S. B., Wunsch C. S.: A general method applicable to the search for similarities in the amino acid sequence of two proteins.J. Molecular Biol. 48 (1970), 443–453 10.1016/0022-2836(70)90057-4
Reference: [28] Paterson M., Dančík V.: Longest common subsequences.In: Proceedings, 19th Intern. Symp. on Mathematical Foundations of Computer Science (Lecture Notes in Computer Science 841), Springer Verlag, Berlin 1994, pp. 127–142 MR 1319827
Reference: [29] Rick C.: New Algorithms for the Longest Common Subsequence Problem.Research Report No. 85123–CS, Department of Computer Science, University of Bonn 1994
Reference: [30] Rick C.: A new flexible algorithm for the longest common subsequence problem.In: Proceedings, 6th Annual Symp. on Combinatorial Pattern Matching (Lecture Notes in Computer Science 937), Springer Verlag, Berlin 1995, pp. 340–351 Zbl 0841.68051, MR 1467525
Reference: [31] Sankoff D., Cedergren R. J.: A test for nucleotide sequence homology.J. Molecular Biol. 77 (1973), 159–164 10.1016/0022-2836(73)90369-0
Reference: [32] Sankoff D., Kruskal J. B.: Time Warps, String Edits, and Macromolecules: The Theory And Practice of Sequence Comparison.Addison–Wesley, Reading, MA 1983 MR 0726027
Reference: [33] Ukkonen E.: Algorithms for approximate string matching.Inform. and Control 64 (1985), 100–118 Zbl 0575.68090, MR 0837093, 10.1016/S0019-9958(85)80046-2
Reference: [34] Wagner R. A.: On the complexity of the extended string–to–string correction problem.In: Proceedings, 7th Ann. ACM Sympos. on Theory of Comput. 1975, pp. 218–223 Zbl 0357.68047, MR 0445910
Reference: [35] Wagner R. A., Fischer M. J.: The string–to–string correction problem.J. Assoc. Comput. Mach. 21 (1974), 1, 168–173 Zbl 0278.68032, MR 0356576, 10.1145/321796.321811
Reference: [36] Wong C. K., Chandra A. K.: Bounds for the string editing problem.J. Assoc. Comput. Mach. 28 (1976), 1, 13–18 Zbl 0316.68019, MR 0398173, 10.1145/321921.321923
Reference: [37] Wu S., Manber U., Myers, G., Miller W.: An $O(NP)$ sequence comparison algorithm.Inform. Process. Lett. 35 (1990), 317–323 Zbl 0698.68055, MR 1079428, 10.1016/0020-0190(90)90035-V
.

Files

Files Size Format View
Kybernetika_38-2002-1_3.pdf 3.181Mb application/pdf View/Open
Back to standard record
Partner of
EuDML logo