Article

Title: Workflow of Metadata Extraction from Retro-Born Digital Documents (English)
Author: Tkaczyk, Dominika
Author: Bolikowski, Łukasz
Language: English
Journal: Towards a Digital Mathematics Library. Bertinoro, Italy, July 20-21st, 2011
Volume:
Issue: 2011
Year: 2011
Pages: 39-44
Category: math
Summary: In this work-in-progress report we propose a workflow for metadata extraction from articles in digital form. We decompose the problem into clearly defined sub-tasks and outline possible implementations of each. We report on the progress of implementation and testing, and outline future work. (English)
Keyword: metadata extraction
Keyword: page segmentation
Keyword: zone classification
Keyword: Hidden Markov Model
MSC: 68-06
MSC: 68U10
MSC: 68U15
MSC: 68U99
Date available: 2011-07-15T09:26:55Z
Last updated: 2012-08-27
Stable URL: http://hdl.handle.net/10338.dmlcz/702601
Reference: 1. iText. http://itextpdf.com/
Reference: 2. MARG. http://marg.nlm.nih.gov/ Zbl 1143.68407
Reference: 3. PDFBox. http://pdfbox.apache.org/
Reference: 4. Automating the production of bibliographic records for MEDLINE. Tech. rep. (2001).
Reference: 5. Cui, B., Chen, X.: An improved hidden Markov model for literature metadata extraction. Advanced Intelligent Computing Theories and Applications. pp. 205–212 (2010).
Reference: 6. Hetzner, E.: A simple method for citation metadata extraction using Hidden Markov Models. In: JCDL ’08: Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries. pp. 280–284. ACM, New York, NY, USA (2008).
Reference: 7. Marinai, S.: Metadata Extraction from PDF Papers for Digital Library Ingest. 10th International Conference on Document Analysis and Recognition. pp. 251–255 (2009).
Reference: 8. Nagy, G., Seth, S., Viswanathan, M.: A prototype document image analysis system for technical journals. Computer 25(7), 10–22 (1992).
Reference: 9. O’Gorman, L.: The document spectrum for page layout analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(11), 1162–1173 (1993).
Reference: 10. Sojka, P.: An Experience with Building Digital Open Access Repository DML-CZ. In: Proceedings of CASLIN 2009. pp. 74–78 (2009).
Reference: 11. Sutton, C., McCallum, A.: An Introduction to Conditional Random Fields for Relational Learning. (2006).

Files

File: DML_004-2011-1_8.pdf
Size: 203.9Kb
Format: application/pdf