A new practical linear space algorithm for the longest common subsequence problem

Goeman, Heiko; Clausen, Michael

About DML-CZ | FAQ | Conditions of Use | Math Archives | Contact Us

Previous | Up | Next

Article

Title:	A new practical linear space algorithm for the longest common subsequence problem (English)
Author:	Goeman, Heiko
Author:	Clausen, Michael
Language:	English
Journal:	Kybernetika
ISSN:	0023-5954
Volume:	38
Issue:	1
Year:	2002
Pages:	[45]-66
Summary lang:	English
.
Category:	math
.
Summary:	This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths $m$ and $n$, $n\ge m$, on an alphabet of size $s$, we first present an algorithm which determines the length $p$ of an LCS in $O(ns+\min \lbrace mp,p(n-p)\rbrace )$ time and $O(ns)$ space. This result has been achieved before [ric94,ric95], but our algorithm is significantly faster than previous methods. We also provide a second algorithm which generates an LCS in $O(ns+\min \lbrace mp,m\log m+p(n-p)\rbrace )$ time while preserving the linear space bound, thus solving the problem posed in [ric94,ric95]. Experimental results confirm the efficiency of our method. (English)
Keyword:	longest common subsequence problem
Keyword:	Rick’s algorithm
MSC:	68Q25
MSC:	68W05
MSC:	68W32
idZBL:	Zbl 1265.68363
idMR:	MR1899846
.
Date available:	2009-09-24T19:43:44Z
Last updated:	2015-03-24
Stable URL:	http://hdl.handle.net/10338.dmlcz/135445
.
Reference:	[1] Aho A. V., Hirschberg D. S., Ullman J. D.: Bounds on the complexity of the longest common subsequence problem.J. Assoc. Comput. Mach. 23 (1976), 1, 1–12 Zbl 0316.68027, MR 0398159, 10.1145/321921.321922
Reference:	[2] Apostolico A.: Improving the worst–case performance of the Hunt–Szymanski strategy for the longest common subsequence of two strings.Inform. Process. Lett. 23 (1986), 63–69 Zbl 0608.68057, MR 0858225, 10.1016/0020-0190(86)90044-X
Reference:	[3] Apostolico A.: Remarks on the Hsu–Du new algorithm for the longest common subsequence problem.Inform. Process. Lett. 25 (1987), 235–236 MR 0896141, 10.1016/0020-0190(87)90167-0
Reference:	[4] Apostolico A., Guerra G.: The longest common subsequence problem revisited.Algorithmica 2 (1987), 315–336 Zbl 0636.68083, MR 0911954, 10.1007/BF01840365
Reference:	[5] Apostolico A., Browne, S., Guerra C.: Fast linear–space computations of longest common subsequences.Theoret. Comput. Sci. 92 (1992), 3–17 Zbl 0747.68019, MR 1143128, 10.1016/0304-3975(92)90132-Y
Reference:	[6] Chin F. Y. L., Poon C. K.: A fast algorithm for computing longest common subsequences of small alphabet size.J. Inform. Process. 13 (1990), 4, 463–469 Zbl 0768.68020
Reference:	[7] Chvátal V., Sankoff D.: Longest common subsequences of two random strings.J. Appl. Probab. 12 (1975), 306–315 MR 0405531, 10.2307/3212444
Reference:	[8] Dančík V., Paterson M.: Upper bounds for the expected length of a longest common subsequence of two binary sequences.In: Proceedings 11th Annual Symp. on Theoretical Aspects of Computer Science (Lecture Notes in Computer Science 775), Springer-Verlag, Berlin 1994, pp. 669–678 Zbl 0941.68812, MR 1288571
Reference:	[9] Dayhoff M. O.: Computer aids to protein sequence determination.J. Theoret. Biol. 8 (1965), 97–112 10.1016/0022-5193(65)90096-2
Reference:	[10] Dayhoff M. O.: Computer analysis of protein evolution.Sci. Amer. 221 (1969), 1, 86–95 10.1038/scientificamerican0769-86
Reference:	[11] Deken J. G.: Some limit results for longest common subsequences.Discrete Math. 26 (1979), 17–31 Zbl 0403.68064, MR 0535080, 10.1016/0012-365X(79)90057-8
Reference:	[12] Dilworth R. P.: A decomposition theorem for partially ordered sets.Ann. of Math. 51 (1950), 161–166 Zbl 0038.02003, MR 0032578, 10.2307/1969503
Reference:	[13] Fu K. S., Bhargava B. K.: Tree systems for syntactic pattern recognition.IEEE Trans. Comput. C–22 (1973), 12, 1087–1099 Zbl 0269.68059, MR 0359424
Reference:	[14] Gallant J., Maier, D., Storer J. A.: On finding minimal length superstrings.J. Comput. System Sci. 20 (1980), 50–58 Zbl 0436.68043, MR 0566641, 10.1016/0022-0000(80)90004-5
Reference:	[15] Goeman H.: Time and Space Efficient Algorithms for Decomposing Certain Partially Ordered Sets.PhD Thesis, Department of Computer Science, University of Bonn 1999. To appear in Bayreuther Mathematische Schriften MR 1942444
Reference:	[16] Hirschberg D. S.: A linear space algorithm for computing maximal common subsequences.Comm. ACM 18 (1975), 6, 341–343 Zbl 0301.68042, MR 0375829, 10.1145/360825.360861
Reference:	[17] Hirschberg D. S.: Algorithms for the longest common subsequence problem.J. Assoc. Comput. Mach. 24 (1977), 4, 664–675 Zbl 0402.68041, MR 0461985, 10.1145/322033.322044
Reference:	[18] Hsu W. J., Du M. W.: New algorithms for the LCS problem.J. Comput. System Sci. 29 (1984), 133–152 Zbl 0587.68045, MR 0773417, 10.1016/0022-0000(84)90025-4
Reference:	[19] Hunt J. W., Szymanski T. G.: A fast algorithm for computing longest common subsequences.Comm. ACM 20 (1977), 5, 350–353 Zbl 0354.68078, MR 0436655, 10.1145/359581.359603
Reference:	[20] Kumar S. K., Rangan C. P.: A linear space algorithm for the LCS problem.Acta Inform. 24 (1987), 353–363 Zbl 0642.68066, MR 0894559, 10.1007/BF00265993
Reference:	[21] Lowrance R., Wagner R. A.: An extension of the string–to–string correction problem.J. Assoc. Comput. Mach. 22, (1975), 2, 177–183 Zbl 0301.68028, MR 0366088, 10.1145/321879.321880
Reference:	[22] Lu S. Y., Fu K. S.: A sentence–to–sentence clustering procedure for pattern analysis.IEEE Trans. Systems Man Cybernet. SMC–8, (1978), 5, 381–389 Zbl 0378.68048, MR 0471478, 10.1109/TSMC.1978.4309979
Reference:	[23] Maier D.: The complexity of some problem on subsequences and supersequences.J. Assoc. Comput. Mach. 25 (1978), 2, 322–336 MR 0483700, 10.1145/322063.322075
Reference:	[24] Masek W. J., Paterson M. S.: A faster algorithm for computing string edit distances.J. Comput. System Sci. 20 (1980), 1, 18–31 MR 0566639, 10.1016/0022-0000(80)90002-1
Reference:	[25] Myers E. W.: An $O(ND)$ difference algorithm and its variations.Algorithmica 1 (1986), 251–266 Zbl 0639.68054, MR 1554305, 10.1007/BF01840446
Reference:	[26] Nakatsu N., Kambayashi, Y., Yajima S.: A longest common subsequence algorithm suitable for similar text strings.Acta Inform. 18 (1982), 171–179 Zbl 0493.68041, MR 0687701, 10.1007/BF00264437
Reference:	[27] Needleman S. B., Wunsch C. S.: A general method applicable to the search for similarities in the amino acid sequence of two proteins.J. Molecular Biol. 48 (1970), 443–453 10.1016/0022-2836(70)90057-4
Reference:	[28] Paterson M., Dančík V.: Longest common subsequences.In: Proceedings, 19th Intern. Symp. on Mathematical Foundations of Computer Science (Lecture Notes in Computer Science 841), Springer Verlag, Berlin 1994, pp. 127–142 MR 1319827
Reference:	[29] Rick C.: New Algorithms for the Longest Common Subsequence Problem.Research Report No. 85123–CS, Department of Computer Science, University of Bonn 1994
Reference:	[30] Rick C.: A new flexible algorithm for the longest common subsequence problem.In: Proceedings, 6th Annual Symp. on Combinatorial Pattern Matching (Lecture Notes in Computer Science 937), Springer Verlag, Berlin 1995, pp. 340–351 Zbl 0841.68051, MR 1467525
Reference:	[31] Sankoff D., Cedergren R. J.: A test for nucleotide sequence homology.J. Molecular Biol. 77 (1973), 159–164 10.1016/0022-2836(73)90369-0
Reference:	[32] Sankoff D., Kruskal J. B.: Time Warps, String Edits, and Macromolecules: The Theory And Practice of Sequence Comparison.Addison–Wesley, Reading, MA 1983 MR 0726027
Reference:	[33] Ukkonen E.: Algorithms for approximate string matching.Inform. and Control 64 (1985), 100–118 Zbl 0575.68090, MR 0837093, 10.1016/S0019-9958(85)80046-2
Reference:	[34] Wagner R. A.: On the complexity of the extended string–to–string correction problem.In: Proceedings, 7th Ann. ACM Sympos. on Theory of Comput. 1975, pp. 218–223 Zbl 0357.68047, MR 0445910
Reference:	[35] Wagner R. A., Fischer M. J.: The string–to–string correction problem.J. Assoc. Comput. Mach. 21 (1974), 1, 168–173 Zbl 0278.68032, MR 0356576, 10.1145/321796.321811
Reference:	[36] Wong C. K., Chandra A. K.: Bounds for the string editing problem.J. Assoc. Comput. Mach. 28 (1976), 1, 13–18 Zbl 0316.68019, MR 0398173, 10.1145/321921.321923
Reference:	[37] Wu S., Manber U., Myers, G., Miller W.: An $O(NP)$ sequence comparison algorithm.Inform. Process. Lett. 35 (1990), 317–323 Zbl 0698.68055, MR 1079428, 10.1016/0020-0190(90)90035-V
.

Files

Files	Size	Format	View
Kybernetika_38-2002-1_3.pdf	3.181Mb	application/pdf	View/Open

Back to standard record

Browse
- Collections
- Titles
- Authors
- MSC

About DML-CZ

Partner of

Article

Files

Search

Browse