

Title: Towards a Flexible Author Name Disambiguation Framework (English)
Author: Bolikowski, Łukasz
Author: Dendek, Piotr Jan
Language: English
Journal: Towards a Digital Mathematics Library. Bertinoro, Italy, July 20-21st, 2011
Issue: 2011
Pages: 27-37
Category: math
Summary: In this paper we propose a flexible, modular framework for author name disambiguation. Our solution consists of a core, which orchestrates the disambiguation process, and replaceable modules that perform concrete tasks. The approach is suitable for distributed computing; in particular, it maps well to the MapReduce framework. We describe each component in detail and discuss possible alternatives. Finally, we propose procedures for calibration and evaluation of the described system. (English)
Keyword: name disambiguation
Keyword: problem decomposition
Keyword: scoring functions
Keyword: single-linkage clustering
Keyword: MapReduce framework
Keyword: machine learning
MSC: 68-06
MSC: 68U10
MSC: 68U15
MSC: 68U99
Date available: 2011-07-15T09:26:05Z
Last updated: 2012-08-27
Stable URL:
Reference: 1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. 20th Int. Conf. Very Large Data Bases, VLDB. vol. 1215, pp. 487–499. Citeseer (1994).
Reference: 2. Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM 51(1), 1–13 (2004).
Reference: 3. Galvez, C., Moya-Anegón, F.: Approximate personal name-matching through finite-state graphs. Journal of the American Society for Information Science and Technology 58(13), 1960–1976 (Nov 2007).
Reference: 4. Han, H., Giles, L., Zha, H., Li, C., Tsioutsiouliklis, K.: Two supervised learning approaches for name disambiguation in author citations. Proceedings of the 2004 joint ACM/IEEE conference on Digital libraries – JCDL ’04, p. 296 (2004).
Reference: 5. Han, H., Zha, H., Giles, C.L.: Name disambiguation in author citations using a K-way spectral clustering method. In: JCDL ’05: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries. pp. 334–343. ACM, New York, NY, USA (2005).
Reference: 6. Hastie, T., Tibshirani, R., Friedman, J.: Elements of Statistical Learning. Springer (2009). MR 2722294
Reference: 7. Kang, I., Na, S., Lee, S., Jung, H., Kim, P., Sung, W., Lee, J.: On co-authorship for author disambiguation. Information Processing & Management 45(1), 84–97 (Jan 2009).
Reference: 8. Mann, G. S., Yarowsky, D.: Unsupervised personal name disambiguation. In: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003. pp. 33–40. Association for Computational Linguistics, Morristown, NJ, USA (2003).
Reference: 9. Manning, C. D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. (2008). Zbl 1160.68008
Reference: 10. Pavelec, D., Oliveira, L. S., Justino, E., Nobre Neto, F. D., Batista, L.V.: Compression and stylometry for author identification. 2009 International Joint Conference on Neural Networks, pp. 2445–2450 (Jun 2009).
Reference: 11. Pedersen, T., Kulkarni, A., Angheluta, R., Kozareva, Z., Solorio, T.: An unsupervised language independent method of name discrimination using second order cooccurrence features. pp. 208–222 (2006).
Reference: 12. Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: Arnetminer: Extraction and mining of academic social networks. In: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 990–998. ACM (2008).
Reference: 13. Torvik, V. I., Smalheiser, N. R.: Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data 3(3), 1–29 (Jul 2009).


File: DML_004-2011-1_7.pdf (420.5 KB, application/pdf)