Evaluation and Monitoring of Entity Resolution in Production Environments

Abstract

This paper presents a production-oriented evaluation framework for entity resolution (ER) in environments where traditional ground truth data are unavailable. The framework assesses ER quality by combining continuous monitoring, domain constraints, and synthetic data generation. Our experiments show that the system achieves very high precision (0.99) but low recall (0.41): many true matches are missed, which results in an F-measure of 0.58. To improve recall while maintaining high precision, our approach applies string similarity function optimization, adaptive blocking key design, and domain constraint validation. The framework has been validated in a large-scale production environment that processes millions of entity records daily, demonstrating its practical applicability to industrial ER systems.
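The reported F-measure can be checked directly from the precision and recall figures above. The following is a minimal sketch, assuming the F-measure in the abstract is the balanced F1 score (the harmonic mean of precision and recall); the function name `f_measure` and the `beta` parameter are illustrative and do not come from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the harmonic mean (F1).

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Figures taken from the abstract: precision 0.99, recall 0.41.
score = f_measure(0.99, 0.41)
print(round(score, 2))  # prints 0.58
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the low recall dominates the score even though precision is close to 1.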
