Previous |  Up |  Next

Article

Keywords:
longest common subsequence problem; Rick’s algorithm
Summary:
This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths $m$ and $n$, $n\ge m$, on an alphabet of size $s$, we first present an algorithm which determines the length $p$ of an LCS in $O(ns+\min \lbrace mp,p(n-p)\rbrace )$ time and $O(ns)$ space. This result has been achieved before [ric94,ric95], but our algorithm is significantly faster than previous methods. We also provide a second algorithm which generates an LCS in $O(ns+\min \lbrace mp,m\log m+p(n-p)\rbrace )$ time while preserving the linear space bound, thus solving the problem posed in [ric94,ric95]. Experimental results confirm the efficiency of our method.
References:
[1] Aho A. V., Hirschberg D. S., Ullman J. D.: Bounds on the complexity of the longest common subsequence problem. J. Assoc. Comput. Mach. 23 (1976), 1, 1–12 DOI 10.1145/321921.321922 | MR 0398159 | Zbl 0316.68027
[2] Apostolico A.: Improving the worst–case performance of the Hunt–Szymanski strategy for the longest common subsequence of two strings. Inform. Process. Lett. 23 (1986), 63–69 DOI 10.1016/0020-0190(86)90044-X | MR 0858225 | Zbl 0608.68057
[3] Apostolico A.: Remarks on the Hsu–Du new algorithm for the longest common subsequence problem. Inform. Process. Lett. 25 (1987), 235–236 DOI 10.1016/0020-0190(87)90167-0 | MR 0896141
[4] Apostolico A., Guerra G.: The longest common subsequence problem revisited. Algorithmica 2 (1987), 315–336 DOI 10.1007/BF01840365 | MR 0911954 | Zbl 0636.68083
[5] Apostolico A., Browne, S., Guerra C.: Fast linear–space computations of longest common subsequences. Theoret. Comput. Sci. 92 (1992), 3–17 DOI 10.1016/0304-3975(92)90132-Y | MR 1143128 | Zbl 0747.68019
[6] Chin F. Y. L., Poon C. K.: A fast algorithm for computing longest common subsequences of small alphabet size. J. Inform. Process. 13 (1990), 4, 463–469 Zbl 0768.68020
[7] Chvátal V., Sankoff D.: Longest common subsequences of two random strings. J. Appl. Probab. 12 (1975), 306–315 DOI 10.2307/3212444 | MR 0405531
[8] Dančík V., Paterson M.: Upper bounds for the expected length of a longest common subsequence of two binary sequences. In: Proceedings 11th Annual Symp. on Theoretical Aspects of Computer Science (Lecture Notes in Computer Science 775), Springer-Verlag, Berlin 1994, pp. 669–678 MR 1288571 | Zbl 0941.68812
[9] Dayhoff M. O.: Computer aids to protein sequence determination. J. Theoret. Biol. 8 (1965), 97–112 DOI 10.1016/0022-5193(65)90096-2
[10] Dayhoff M. O.: Computer analysis of protein evolution. Sci. Amer. 221 (1969), 1, 86–95 DOI 10.1038/scientificamerican0769-86
[11] Deken J. G.: Some limit results for longest common subsequences. Discrete Math. 26 (1979), 17–31 DOI 10.1016/0012-365X(79)90057-8 | MR 0535080 | Zbl 0403.68064
[12] Dilworth R. P.: A decomposition theorem for partially ordered sets. Ann. of Math. 51 (1950), 161–166 DOI 10.2307/1969503 | MR 0032578 | Zbl 0038.02003
[13] Fu K. S., Bhargava B. K.: Tree systems for syntactic pattern recognition. IEEE Trans. Comput. C–22 (1973), 12, 1087–1099 MR 0359424 | Zbl 0269.68059
[14] Gallant J., Maier, D., Storer J. A.: On finding minimal length superstrings. J. Comput. System Sci. 20 (1980), 50–58 DOI 10.1016/0022-0000(80)90004-5 | MR 0566641 | Zbl 0436.68043
[15] Goeman H.: Time and Space Efficient Algorithms for Decomposing Certain Partially Ordered Sets. PhD Thesis, Department of Computer Science, University of Bonn 1999. To appear in Bayreuther Mathematische Schriften MR 1942444
[16] Hirschberg D. S.: A linear space algorithm for computing maximal common subsequences. Comm. ACM 18 (1975), 6, 341–343 DOI 10.1145/360825.360861 | MR 0375829 | Zbl 0301.68042
[17] Hirschberg D. S.: Algorithms for the longest common subsequence problem. J. Assoc. Comput. Mach. 24 (1977), 4, 664–675 DOI 10.1145/322033.322044 | MR 0461985 | Zbl 0402.68041
[18] Hsu W. J., Du M. W.: New algorithms for the LCS problem. J. Comput. System Sci. 29 (1984), 133–152 DOI 10.1016/0022-0000(84)90025-4 | MR 0773417 | Zbl 0587.68045
[19] Hunt J. W., Szymanski T. G.: A fast algorithm for computing longest common subsequences. Comm. ACM 20 (1977), 5, 350–353 DOI 10.1145/359581.359603 | MR 0436655 | Zbl 0354.68078
[20] Kumar S. K., Rangan C. P.: A linear space algorithm for the LCS problem. Acta Inform. 24 (1987), 353–363 DOI 10.1007/BF00265993 | MR 0894559 | Zbl 0642.68066
[21] Lowrance R., Wagner R. A.: An extension of the string–to–string correction problem. J. Assoc. Comput. Mach. 22, (1975), 2, 177–183 DOI 10.1145/321879.321880 | MR 0366088 | Zbl 0301.68028
[22] Lu S. Y., Fu K. S.: A sentence–to–sentence clustering procedure for pattern analysis. IEEE Trans. Systems Man Cybernet. SMC–8, (1978), 5, 381–389 DOI 10.1109/TSMC.1978.4309979 | MR 0471478 | Zbl 0378.68048
[23] Maier D.: The complexity of some problem on subsequences and supersequences. J. Assoc. Comput. Mach. 25 (1978), 2, 322–336 DOI 10.1145/322063.322075 | MR 0483700
[24] Masek W. J., Paterson M. S.: A faster algorithm for computing string edit distances. J. Comput. System Sci. 20 (1980), 1, 18–31 DOI 10.1016/0022-0000(80)90002-1 | MR 0566639
[25] Myers E. W.: An $O(ND)$ difference algorithm and its variations. Algorithmica 1 (1986), 251–266 DOI 10.1007/BF01840446 | MR 1554305 | Zbl 0639.68054
[26] Nakatsu N., Kambayashi, Y., Yajima S.: A longest common subsequence algorithm suitable for similar text strings. Acta Inform. 18 (1982), 171–179 DOI 10.1007/BF00264437 | MR 0687701 | Zbl 0493.68041
[27] Needleman S. B., Wunsch C. S.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Molecular Biol. 48 (1970), 443–453 DOI 10.1016/0022-2836(70)90057-4
[28] Paterson M., Dančík V.: Longest common subsequences. In: Proceedings, 19th Intern. Symp. on Mathematical Foundations of Computer Science (Lecture Notes in Computer Science 841), Springer Verlag, Berlin 1994, pp. 127–142 MR 1319827
[29] Rick C.: New Algorithms for the Longest Common Subsequence Problem. Research Report No. 85123–CS, Department of Computer Science, University of Bonn 1994
[30] Rick C.: A new flexible algorithm for the longest common subsequence problem. In: Proceedings, 6th Annual Symp. on Combinatorial Pattern Matching (Lecture Notes in Computer Science 937), Springer Verlag, Berlin 1995, pp. 340–351 MR 1467525 | Zbl 0841.68051
[31] Sankoff D., Cedergren R. J.: A test for nucleotide sequence homology. J. Molecular Biol. 77 (1973), 159–164 DOI 10.1016/0022-2836(73)90369-0
[32] Sankoff D., Kruskal J. B.: Time Warps, String Edits, and Macromolecules: The Theory And Practice of Sequence Comparison. Addison–Wesley, Reading, MA 1983 MR 0726027
[33] Ukkonen E.: Algorithms for approximate string matching. Inform. and Control 64 (1985), 100–118 DOI 10.1016/S0019-9958(85)80046-2 | MR 0837093 | Zbl 0575.68090
[34] Wagner R. A.: On the complexity of the extended string–to–string correction problem. In: Proceedings, 7th Ann. ACM Sympos. on Theory of Comput. 1975, pp. 218–223 MR 0445910 | Zbl 0357.68047
[35] Wagner R. A., Fischer M. J.: The string–to–string correction problem. J. Assoc. Comput. Mach. 21 (1974), 1, 168–173 DOI 10.1145/321796.321811 | MR 0356576 | Zbl 0278.68032
[36] Wong C. K., Chandra A. K.: Bounds for the string editing problem. J. Assoc. Comput. Mach. 28 (1976), 1, 13–18 DOI 10.1145/321921.321923 | MR 0398173 | Zbl 0316.68019
[37] Wu S., Manber U., Myers, G., Miller W.: An $O(NP)$ sequence comparison algorithm. Inform. Process. Lett. 35 (1990), 317–323 DOI 10.1016/0020-0190(90)90035-V | MR 1079428 | Zbl 0698.68055
Partner of
EuDML logo