mathematical document classification; Mathematical Subject Classification
Earlier work has examined the frequency of symbol and expression use in mathematical documents for various purposes, including mathematical handwriting recognition and forming the most natural output from computer algebra systems. This work has found, unsurprisingly, that the particulars of symbol and expression use vary from area to area and, in particular, between different top-level subjects of the 2000 Mathematics Subject Classification. If the area of mathematics is known in advance, then area-specific information can be used for the recognition or output problem. What is more interesting is that, although the specific symbols ranked as most frequent vary from area to area, the shape of the relative frequency curve remains the same. The present work examines the inverse problem: given the relative frequencies of symbols in a document, is it possible to classify the document and determine the most likely area of mathematics of the work? We examine the symbol frequency “fingerprints” for the different areas of the Mathematics Subject Classification.
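The inverse problem can be illustrated with a minimal sketch. The abstract does not state which comparison method is used, so cosine similarity is an assumption made here purely for illustration. The names `relative_frequencies`, `cosine_similarity`, `classify`, and `FINGERPRINTS`, and the toy symbol lists, are all hypothetical and are not data from the work.

```python
from collections import Counter
from math import sqrt


def relative_frequencies(symbols):
    """Map each symbol to its share of all symbol occurrences."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency vectors (dicts)."""
    dot = sum(v * q.get(sym, 0.0) for sym, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def classify(document_symbols, fingerprints):
    """Return the area whose fingerprint is most similar to the document.

    Assumption: similarity is measured by cosine similarity; the source
    does not specify the measure.
    """
    doc = relative_frequencies(document_symbols)
    return max(
        fingerprints,
        key=lambda area: cosine_similarity(doc, fingerprints[area]),
    )


# Hypothetical toy fingerprints; real ones would be estimated from a
# corpus of documents with known classifications.
FINGERPRINTS = {
    "area_A": relative_frequencies(["x", "x", "x", "+", "=", "="]),
    "area_B": relative_frequencies(["\\int", "\\int", "d", "x", "="]),
}

best = classify(["\\int", "d", "x", "=", "\\int"], FINGERPRINTS)
```

Each fingerprint is stored as a sparse dictionary, so symbols that never occur in an area simply contribute zero to the comparison.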
1. arXiv e-Print archive.
2. 2000 Mathematics Subject Classification. American Mathematical Society.
3. Garain, U., Chaudhuri, B. B.: A corpus for OCR research on mathematical expressions. International Journal on Document Analysis and Recognition, Vol. 7, Issue 4, pp. 241–259. (September 2005).
4. Uchida, S., Nomura, A., Suzuki, M.: Quantitative analysis of mathematical documents. International Journal on Document Analysis and Recognition, Vol. 7, Issue 4, pp. 211–218. (September 2005).
5. So, C. M., Watt, S. M.: Determining Empirical Properties of Mathematical Expression Use. Proc. Fourth International Conference on Mathematical Knowledge Management (MKM 2005), July 15–17, 2005, Bremen, Germany, Springer Verlag LNCS 3863, pp. 361–375.
6. So, C. M.: An Analysis of Mathematical Expressions Used in Practice. Master's Thesis, University of Western Ontario, 2005.
7. Watt, S. M.: Exploiting Implicit Mathematical Semantics in Conversion between TeX and MathML. Proc. Internet Accessible Mathematical Communication, July 7, 2002, Lille, France.
8. Watt, S. M.: An Empirical Measure on the Set of Symbols Occurring in Engineering Mathematics Texts. Proc. 8th IAPR International Workshop on Document Analysis Systems (DAS 2008), Sept 17–19, 2008, Nara, Japan, (IEEE, to appear).
9. Kreyszig, E.: Advanced Engineering Mathematics, 8th ed. Wiley & Sons 1999. MR 1665766
10. Kreyszig, E.: Advanced Engineering Mathematics, 9th ed. Wiley & Sons 2006.
11. Greenberg, M.: Advanced Engineering Mathematics, 2nd ed. Prentice Hall 1998.
12. O’Neil, P.: Advanced Engineering Mathematics, 5th ed. Thomson-Nelson 2003.
13. Suzuki, M., Tamari, F., Fukuda, R., Uchida, S., Kanahori, T.: Infty—an integrated OCR system for mathematical documents. Proceedings of ACM Symposium on Document Engineering 2003, Grenoble, 2003, pp. 95–104.
14. Smirnova, E., Watt, S. M.: Context-Sensitive Mathematical Character Recognition. August 19–21, 2008, Montreal, Canada, (IEEE, to appear).
15. Zipf, G. K.: Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.