
Article

Title: Mathematical Document Classification via Symbol Frequency Analysis (English)
Author: Watt, Stephen M.
Language: English
Journal: Towards Digital Mathematics Library. Birmingham, United Kingdom, July 27th, 2008
Volume:
Issue: 2008
Year:
Pages: 29-40
Category: math
Summary: Earlier work has examined the frequency of symbol and expression use in mathematical documents for various purposes, including mathematical handwriting recognition and forming the most natural output from computer algebra systems. This work has found, unsurprisingly, that the particulars of symbol and expression use vary from area to area and, in particular, between different top-level subjects of the 2000 Mathematics Subject Classification. If the area of mathematics is known in advance, then area-specific information can be used for the recognition or output problem. What is more interesting is that, although the specifics of which symbols are ranked as most frequent vary from area to area, the shape of the relative frequency curve remains the same. The present work examines the inverse problem: given the relative frequencies of symbols in a document, is it possible to classify the document and determine the most likely area of mathematics of the work? We examine the symbol frequency “fingerprints” for the different areas of the Mathematics Subject Classification. (English)
Keyword: mathematical document classification
Keyword: Mathematical Subject Classification
MSC: 68P99
MSC: 68U10
MSC: 68U15
idZBL: Zbl 1170.68494
Date available: 2011-07-18T09:18:51Z
Last updated: 2012-08-27
Stable URL: http://hdl.handle.net/10338.dmlcz/702543
Reference: 1. arXiv e-Print archive, http://arxiv.org.
Reference: 2. 2000 Mathematics Subject Classification, American Mathematical Society, http://www.ams.org/msc.
Reference: 3. Garain, U., Chaudhuri, B. B.: A corpus for OCR research on mathematical expressions, International Journal on Document Analysis and Recognition, Vol. 7, Issue 4, pp. 241–259 (September 2005).
Reference: 4. Uchida, S., Nomura, A., Suzuki, M.: Quantitative analysis of mathematical documents, International Journal on Document Analysis and Recognition, Vol. 7, Issue 4, pp. 211–218 (September 2005).
Reference: 5. So, C. M., Watt, S. M.: Determining Empirical Properties of Mathematical Expression Use, Proc. Fourth International Conference on Mathematical Knowledge Management (MKM 2005), July 15–17, 2005, Bremen, Germany, Springer Verlag LNCS 3863, pp. 361–375.
Reference: 6. So, C. M.: An Analysis of Mathematical Expressions Used in Practice, Master's Thesis, University of Western Ontario, 2005.
Reference: 7. Watt, S. M.: Exploiting Implicit Mathematical Semantics in Conversion between TeX and MathML, Proc. Internet Accessible Mathematical Communication, http://www.symbolicnet.org/conferences/iamc02, July 7, 2002, Lille, France.
Reference: 8. Watt, S. M.: An Empirical Measure on the Set of Symbols Occurring in Engineering Mathematics Texts, Proc. 8th IAPR International Workshop on Document Analysis Systems (DAS 2008), Sept 17–19, 2008, Nara, Japan (IEEE, to appear).
Reference: 9. Kreyszig, E.: Advanced Engineering Mathematics, 8th ed., Wiley & Sons 1999. MR 1665766
Reference: 10. Kreyszig, E.: Advanced Engineering Mathematics, 9th ed., Wiley & Sons 2006.
Reference: 11. Greenberg, M.: Advanced Engineering Mathematics, 2nd ed., Prentice Hall 1998.
Reference: 12. O’Neil, P.: Advanced Engineering Mathematics, 5th ed., Thomson-Nelson 2003.
Reference: 13. Suzuki, M., Tamari, F., Fukuda, R., Uchida, S., Kanahori, T.: Infty: an integrated OCR system for mathematical documents, Proceedings of ACM Symposium on Document Engineering 2003, Grenoble, 2003, pp. 95–104.
Reference: 14. Smirnova, E., Watt, S. M.: Context-Sensitive Mathematical Character Recognition, August 19–21, 2008, Montreal, Canada (IEEE, to appear).
Reference: 15. Zipf, G. K.: Human Behavior and the Principle of Least Effort, Addison-Wesley, 1949.

Files

File: DML_001-2008-1_5.pdf
Size: 829.5Kb
Format: application/pdf