Article

Keywords:
name disambiguation; problem decomposition; scoring functions; single-linkage clustering; MapReduce framework; machine learning
Summary:
In this paper we propose a flexible, modular framework for author name disambiguation. Our solution consists of a core that orchestrates the disambiguation process and replaceable modules that perform concrete tasks. The approach is suitable for distributed computing; in particular, it maps well to the MapReduce framework. We describe each component in detail and discuss possible alternatives. Finally, we propose procedures for calibrating and evaluating the described system.
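The architecture named in the summary and keywords (problem decomposition, replaceable scoring functions, single-linkage clustering, a map/reduce split) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the record layout, the `blocking_key` rule, the `coauthor_score` function, and the threshold value are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (record_id, author_name, set_of_coauthors)
RECORDS = [
    ("r1", "J. Smith", {"A. Kowalski", "B. Lee"}),
    ("r2", "John Smith", {"B. Lee"}),
    ("r3", "J. Smith", {"C. Novak"}),
    ("r4", "M. Jones", {"B. Lee"}),
]

def blocking_key(name):
    """Map step (assumed rule): key by lowercased surname and first initial."""
    parts = name.replace(".", " ").split()
    return (parts[-1].lower(), parts[0][0].lower())

def coauthor_score(a, b):
    """Replaceable scoring module (assumed): shared coauthor count, or -1 if none."""
    shared = len(a[2] & b[2])
    return shared if shared else -1

def single_linkage(block, score, threshold=0):
    """Reduce step: link every pair scoring above the threshold (union-find)."""
    parent = {rec[0]: rec[0] for rec in block}

    def find(x):
        # Follow parent pointers to the cluster representative, halving paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(block, 2):
        if score(a, b) > threshold:
            parent[find(a[0])] = find(b[0])

    clusters = defaultdict(set)
    for rec in block:
        clusters[find(rec[0])].add(rec[0])
    return sorted(sorted(c) for c in clusters.values())

def disambiguate(records, score=coauthor_score):
    """Core: decompose records into blocks, then cluster each block independently."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec[1])].append(rec)
    return {key: single_linkage(block, score) for key, block in blocks.items()}

result = disambiguate(RECORDS)
```

Because each block is clustered without reference to any other block, the per-block step could in principle run as an independent reduce task, and a different scoring function can be swapped in through the `score` parameter without touching the core.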