Title:
|
Multidimensional term indexing for efficient processing of complex queries (English) |
Author:
|
Krátký, Michal |
Author:
|
Skopal, Tomáš |
Author:
|
Snášel, Václav |
Language:
|
English |
Journal:
|
Kybernetika |
ISSN:
|
0023-5954 |
Volume:
|
40 |
Issue:
|
3 |
Year:
|
2004 |
Pages:
|
[381]-396 |
Summary lang:
|
English |
. |
Category:
|
math |
. |
Summary:
|
The area of Information Retrieval deals with problems of storage and retrieval within a huge collection of text documents. In IR models, the semantics of a document is usually characterized using a set of terms. A common need to various IR models is an efficient term retrieval provided via a term index. Existing approaches of term indexing, e. g. the inverted list, support efficiently only simple queries asking for a term occurrence. In practice, we would like to exploit some more sophisticated querying mechanisms, in particular queries based on regular expressions. In this article we propose a multidimensional approach of term indexing providing efficient term retrieval and supporting regular expression queries. Since the term lengths are usually different, we also introduce an improvement based on a new data structure, called BUB-forest, providing even more efficient term retrieval. (English) |
Keyword:
|
term indexing |
Keyword:
|
complex queries |
Keyword:
|
multidimensional data structures |
Keyword:
|
BUB-forest |
MSC:
|
14Q15 |
MSC:
|
68P05 |
MSC:
|
68P10 |
MSC:
|
68P20 |
idZBL:
|
Zbl 1249.68042 |
. |
Date available:
|
2009-09-24T20:02:13Z |
Last updated:
|
2015-03-23 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/135602 |
. |
Reference:
|
[1] Baeza-Yates R., Ribeiro-Neto B.: Modern Information Retrieval.Addison Wesley, New York 1999 |
Reference:
|
[2] Bayer R.: The iniversal B-tree for multidimensional indexing: General concepts.In: Proc. World-Wide Computing and its Applications’97, WWCA’97, Tsukuba 1997 |
Reference:
|
[3] Beckmann N., Kriegel H., Schneider, R., Seeger B.: The R*-tree: An effcient and robust access method for points and rectangles.In: Proc. 1990 ACM SIGMOD Internat. Conference on Management of Data, Atlantic City, NJ 1990, pp. 322–331 |
Reference:
|
[4] Böhm C., Berchtold, S., Keim D. A.: Searching in high-dimensional spaces – Index structures for improving the performance of multimedia databases.ACM, 2002 |
Reference:
|
[5] Crochemore M., Rytter W.: Text Algorithms.Oxford University Press, Oxford 1994 Zbl 1078.68151, MR 1307378 |
Reference:
|
[6] Deppisch U.: S-tree: a dynamic balanced signature index for office retrieval.In: Proc. 9th ACM SIGIR Conference, Pisa 1986, pp. 77–87 |
Reference:
|
[7] Dvorský J., Krátký M., Skopal, T., Snášel V.: Term indexing in information retrieval systems.In: Proc. CIC’03, CSREA Press, Las Vegas 2003 |
Reference:
|
[8] Faloutsos C.: Gray codes for partial match and range queries.IEEE Trans. Software Engrg. 14 (1988), 10, xxx–xxx Zbl 0659.68125, MR 0962343, 10.1109/32.6184 |
Reference:
|
[9] Fenk R.: The BUB-Tree.In: Proc. 28rd VLDB Internat. Conference on VLDB. Hongkong 2002 |
Reference:
|
[10] Fenk R., Markl, V., Bayer R.: Improving multidimensional range queries of non rectangular volumes specified by a query box set.In: Proc. Internat. Symposium on Database, Web and Cooperative Systems (DWACOS), Baden-Baden 1999 |
Reference:
|
[11] Ferragina P., Grossi R.: A fully-dynamic data structure for external substring search.In: Proc. ACM Symposium on Theory of Computing, 1995 Zbl 0978.68513 |
Reference:
|
[12] Guttman A.: R-Trees: A dynamic index structure for spatial searching.In: Proc. ACM SIGMOD 1984, ACM Press, Boston 1984, pp. 47–57 |
Reference:
|
[13] Manolopoulos Y., Theodoridis, Y., Tsotras V.: Advanced Database Indexing.Kluwer, Dordrecht 2001 Zbl 0943.68049 |
Reference:
|
[14] Markl V.: Mistral: Processing Relational Queries using a Multidimensional Access Technique.Ph.D. Thesis. Technical University München 1999,http://mistral.in.tum.de/results/publications/Mar99.pdf |
Reference:
|
[15] NIST: Text REtrieval Conference (TREC).2003, http://trec.nist.gov/ |
Reference:
|
[16] Ramsak F., Markl V., Fenk R., Zirkel M., Elhardt, K., Bayer R.: Integrating the UB-tree into a database system kernel.In: Proc. 26th VLDB Internat. Conference, Morgan Kaufmann, Cairo 2000 |
Reference:
|
[17] Roman S.: Advanced Linear Algebra.Springer Verlag, Berlin 1995 Zbl 1132.15002 |
Reference:
|
[18] Salton G., McGill M. J.: Introduction to Modern Information Retrieval.First edition. McGraw Hill, New York 1983 Zbl 0523.68084 |
Reference:
|
[19] Skopal T., Krátký M., Snášel, V., Pokorný J.: On Range Queries in Universal B-trees.Technical Report No. ARG-TR-01-2003, Department of Computer Science, VŠB-Technical University of Ostrava 2003,http://www.cs.vsb.cz/arg/techreports/range.pdf |
Reference:
|
[20] Stephen G. A.: String Searching Algorithms.Lecture Notes Series on Computing, World Scientific, 1998 Zbl 0831.68028, MR 1313914 |
Reference:
|
[21] Wirth N.: Algorithms and Data Structures.Prentice Hall, Englewood Cliffs, N. J. 1984 Zbl 0910.68053, MR 0808586 |
Reference:
|
[22] Witten I. H., Moffat, A., Bell T. C.: Managing Gigabytes, Compressing and Indexing Documents and Images.Van Nostrand Reinhold, New York 1994 Zbl 0821.68051 |
Reference:
|
[23] Consortium, W3: Extensible Markup Language (XML) 1.0. 1998, http://www.w3.org/ TR/REC-xml |
. |