
Article

Keywords:
Content MathML; OCR
Summary:
We report on a new project to design a semantic ground truth set for mathematical document analysis. The ground truth set will be generated by annotating recognised mathematical symbols with respect to both their global meaning in the context of the considered documents and their local function within the particular mathematical formula in which they occur. Our aim is to provide a reliable database for semantic classification during the formula recognition process, enabling correct interpretation of mathematical formulae and the generation of semantic markup such as Content MathML.
References:
1. Aly, W., Uchida, S., Fujiyoshi, A., Suzuki, M.: Statistical classification of spatial relationships among mathematical symbols. In: Proceedings of ICDAR 2009, pages 1350–1354. IEEE Society Press, 2009.
2. Baker, J., Sexton, A., Sorge, V.: A linear grammar approach to mathematical formula recognition from PDF. In: Proceedings of Intelligent Computer Mathematics, LNAI. Springer Verlag, Germany, 2009.
3. Baker, J., Sexton, A., Sorge, V.: Faithful mathematical formula recognition from PDF documents. In: Proceedings of DAS 2010, 2010. Forthcoming.
4. Buswell, S., Caprotti, O., Carlisle, D. P., Dewar, M. C., Gaëtano, M., Kohlhase, M.: The OpenMath Standard. The OpenMath Society, June 2004.
5. Suzuki, M., Tamari, F., Fukuda, R., Uchida, S., Kanahori, T.: Infty—an integrated OCR system for mathematical documents. In: Proceedings of ACM Symposium on Document Engineering, pages 95–104. ACM Press, 2003.
6. Suzuki, M., Uchida, S., Nomura, A.: A ground-truthed mathematical character and symbol image database. In: Proceedings of ICDAR 2005, pages 675–679. IEEE Society Press, 2005.
7. The American Mathematical Society: 2000 Mathematics Subject Classification. 2000. http://www.ams.org/msc/
8. Beusekom, J. van, Shafait, F., Breuel, T. M.: Automated OCR ground truth generation. In: Proceedings of DAS 2008, Sep 2008.