Title:
|
A new practical linear space algorithm for the longest common subsequence problem (English) |
Author:
|
Goeman, Heiko |
Author:
|
Clausen, Michael |
Language:
|
English |
Journal:
|
Kybernetika |
ISSN:
|
0023-5954 |
Volume:
|
38 |
Issue:
|
1 |
Year:
|
2002 |
Pages:
|
[45]-66 |
Summary lang:
|
English |
. |
Category:
|
math |
. |
Summary:
|
This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths $m$ and $n$, $n\ge m$, on an alphabet of size $s$, we first present an algorithm which determines the length $p$ of an LCS in $O(ns+\min \lbrace mp,p(n-p)\rbrace )$ time and $O(ns)$ space. This result has been achieved before [ric94,ric95], but our algorithm is significantly faster than previous methods. We also provide a second algorithm which generates an LCS in $O(ns+\min \lbrace mp,m\log m+p(n-p)\rbrace )$ time while preserving the linear space bound, thus solving the problem posed in [ric94,ric95]. Experimental results confirm the efficiency of our method. (English) |
Keyword:
|
longest common subsequence problem |
Keyword:
|
Rick’s algorithm |
MSC:
|
68Q25 |
MSC:
|
68W05 |
MSC:
|
68W32 |
idZBL:
|
Zbl 1265.68363 |
idMR:
|
MR1899846 |
. |
Date available:
|
2009-09-24T19:43:44Z |
Last updated:
|
2015-03-24 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/135445 |
. |
Reference:
|
[1] Aho A. V., Hirschberg D. S., Ullman J. D.: Bounds on the complexity of the longest common subsequence problem.J. Assoc. Comput. Mach. 23 (1976), 1, 1–12 Zbl 0316.68027, MR 0398159, 10.1145/321921.321922 |
Reference:
|
[2] Apostolico A.: Improving the worst–case performance of the Hunt–Szymanski strategy for the longest common subsequence of two strings.Inform. Process. Lett. 23 (1986), 63–69 Zbl 0608.68057, MR 0858225, 10.1016/0020-0190(86)90044-X |
Reference:
|
[3] Apostolico A.: Remarks on the Hsu–Du new algorithm for the longest common subsequence problem.Inform. Process. Lett. 25 (1987), 235–236 MR 0896141, 10.1016/0020-0190(87)90167-0 |
Reference:
|
[4] Apostolico A., Guerra G.: The longest common subsequence problem revisited.Algorithmica 2 (1987), 315–336 Zbl 0636.68083, MR 0911954, 10.1007/BF01840365 |
Reference:
|
[5] Apostolico A., Browne, S., Guerra C.: Fast linear–space computations of longest common subsequences.Theoret. Comput. Sci. 92 (1992), 3–17 Zbl 0747.68019, MR 1143128, 10.1016/0304-3975(92)90132-Y |
Reference:
|
[6] Chin F. Y. L., Poon C. K.: A fast algorithm for computing longest common subsequences of small alphabet size.J. Inform. Process. 13 (1990), 4, 463–469 Zbl 0768.68020 |
Reference:
|
[7] Chvátal V., Sankoff D.: Longest common subsequences of two random strings.J. Appl. Probab. 12 (1975), 306–315 MR 0405531, 10.2307/3212444 |
Reference:
|
[8] Dančík V., Paterson M.: Upper bounds for the expected length of a longest common subsequence of two binary sequences.In: Proceedings 11th Annual Symp. on Theoretical Aspects of Computer Science (Lecture Notes in Computer Science 775), Springer-Verlag, Berlin 1994, pp. 669–678 Zbl 0941.68812, MR 1288571 |
Reference:
|
[9] Dayhoff M. O.: Computer aids to protein sequence determination.J. Theoret. Biol. 8 (1965), 97–112 10.1016/0022-5193(65)90096-2 |
Reference:
|
[10] Dayhoff M. O.: Computer analysis of protein evolution.Sci. Amer. 221 (1969), 1, 86–95 10.1038/scientificamerican0769-86 |
Reference:
|
[11] Deken J. G.: Some limit results for longest common subsequences.Discrete Math. 26 (1979), 17–31 Zbl 0403.68064, MR 0535080, 10.1016/0012-365X(79)90057-8 |
Reference:
|
[12] Dilworth R. P.: A decomposition theorem for partially ordered sets.Ann. of Math. 51 (1950), 161–166 Zbl 0038.02003, MR 0032578, 10.2307/1969503 |
Reference:
|
[13] Fu K. S., Bhargava B. K.: Tree systems for syntactic pattern recognition.IEEE Trans. Comput. C–22 (1973), 12, 1087–1099 Zbl 0269.68059, MR 0359424 |
Reference:
|
[14] Gallant J., Maier, D., Storer J. A.: On finding minimal length superstrings.J. Comput. System Sci. 20 (1980), 50–58 Zbl 0436.68043, MR 0566641, 10.1016/0022-0000(80)90004-5 |
Reference:
|
[15] Goeman H.: Time and Space Efficient Algorithms for Decomposing Certain Partially Ordered Sets.PhD Thesis, Department of Computer Science, University of Bonn 1999. To appear in Bayreuther Mathematische Schriften MR 1942444 |
Reference:
|
[16] Hirschberg D. S.: A linear space algorithm for computing maximal common subsequences.Comm. ACM 18 (1975), 6, 341–343 Zbl 0301.68042, MR 0375829, 10.1145/360825.360861 |
Reference:
|
[17] Hirschberg D. S.: Algorithms for the longest common subsequence problem.J. Assoc. Comput. Mach. 24 (1977), 4, 664–675 Zbl 0402.68041, MR 0461985, 10.1145/322033.322044 |
Reference:
|
[18] Hsu W. J., Du M. W.: New algorithms for the LCS problem.J. Comput. System Sci. 29 (1984), 133–152 Zbl 0587.68045, MR 0773417, 10.1016/0022-0000(84)90025-4 |
Reference:
|
[19] Hunt J. W., Szymanski T. G.: A fast algorithm for computing longest common subsequences.Comm. ACM 20 (1977), 5, 350–353 Zbl 0354.68078, MR 0436655, 10.1145/359581.359603 |
Reference:
|
[20] Kumar S. K., Rangan C. P.: A linear space algorithm for the LCS problem.Acta Inform. 24 (1987), 353–363 Zbl 0642.68066, MR 0894559, 10.1007/BF00265993 |
Reference:
|
[21] Lowrance R., Wagner R. A.: An extension of the string–to–string correction problem.J. Assoc. Comput. Mach. 22, (1975), 2, 177–183 Zbl 0301.68028, MR 0366088, 10.1145/321879.321880 |
Reference:
|
[22] Lu S. Y., Fu K. S.: A sentence–to–sentence clustering procedure for pattern analysis.IEEE Trans. Systems Man Cybernet. SMC–8, (1978), 5, 381–389 Zbl 0378.68048, MR 0471478, 10.1109/TSMC.1978.4309979 |
Reference:
|
[23] Maier D.: The complexity of some problem on subsequences and supersequences.J. Assoc. Comput. Mach. 25 (1978), 2, 322–336 MR 0483700, 10.1145/322063.322075 |
Reference:
|
[24] Masek W. J., Paterson M. S.: A faster algorithm for computing string edit distances.J. Comput. System Sci. 20 (1980), 1, 18–31 MR 0566639, 10.1016/0022-0000(80)90002-1 |
Reference:
|
[25] Myers E. W.: An $O(ND)$ difference algorithm and its variations.Algorithmica 1 (1986), 251–266 Zbl 0639.68054, MR 1554305, 10.1007/BF01840446 |
Reference:
|
[26] Nakatsu N., Kambayashi, Y., Yajima S.: A longest common subsequence algorithm suitable for similar text strings.Acta Inform. 18 (1982), 171–179 Zbl 0493.68041, MR 0687701, 10.1007/BF00264437 |
Reference:
|
[27] Needleman S. B., Wunsch C. S.: A general method applicable to the search for similarities in the amino acid sequence of two proteins.J. Molecular Biol. 48 (1970), 443–453 10.1016/0022-2836(70)90057-4 |
Reference:
|
[28] Paterson M., Dančík V.: Longest common subsequences.In: Proceedings, 19th Intern. Symp. on Mathematical Foundations of Computer Science (Lecture Notes in Computer Science 841), Springer Verlag, Berlin 1994, pp. 127–142 MR 1319827 |
Reference:
|
[29] Rick C.: New Algorithms for the Longest Common Subsequence Problem.Research Report No. 85123–CS, Department of Computer Science, University of Bonn 1994 |
Reference:
|
[30] Rick C.: A new flexible algorithm for the longest common subsequence problem.In: Proceedings, 6th Annual Symp. on Combinatorial Pattern Matching (Lecture Notes in Computer Science 937), Springer Verlag, Berlin 1995, pp. 340–351 Zbl 0841.68051, MR 1467525 |
Reference:
|
[31] Sankoff D., Cedergren R. J.: A test for nucleotide sequence homology.J. Molecular Biol. 77 (1973), 159–164 10.1016/0022-2836(73)90369-0 |
Reference:
|
[32] Sankoff D., Kruskal J. B.: Time Warps, String Edits, and Macromolecules: The Theory And Practice of Sequence Comparison.Addison–Wesley, Reading, MA 1983 MR 0726027 |
Reference:
|
[33] Ukkonen E.: Algorithms for approximate string matching.Inform. and Control 64 (1985), 100–118 Zbl 0575.68090, MR 0837093, 10.1016/S0019-9958(85)80046-2 |
Reference:
|
[34] Wagner R. A.: On the complexity of the extended string–to–string correction problem.In: Proceedings, 7th Ann. ACM Sympos. on Theory of Comput. 1975, pp. 218–223 Zbl 0357.68047, MR 0445910 |
Reference:
|
[35] Wagner R. A., Fischer M. J.: The string–to–string correction problem.J. Assoc. Comput. Mach. 21 (1974), 1, 168–173 Zbl 0278.68032, MR 0356576, 10.1145/321796.321811 |
Reference:
|
[36] Wong C. K., Chandra A. K.: Bounds for the string editing problem.J. Assoc. Comput. Mach. 28 (1976), 1, 13–18 Zbl 0316.68019, MR 0398173, 10.1145/321921.321923 |
Reference:
|
[37] Wu S., Manber U., Myers, G., Miller W.: An $O(NP)$ sequence comparison algorithm.Inform. Process. Lett. 35 (1990), 317–323 Zbl 0698.68055, MR 1079428, 10.1016/0020-0190(90)90035-V |
. |