Evaluation and Monitoring of Entity Resolution in Production Environments
Abstract
This paper presents a production-oriented evaluation framework for entity resolution (ER) that operates without traditional ground truth data. The framework addresses the challenge of evaluating ER quality in production environments, where ground truth is unavailable, by combining continuous monitoring, domain constraints, and synthetic data generation. In our experiments, the system achieves very high precision (0.99) but low recall (0.41): many true matches are missed, yielding an F-measure of 0.58. To improve recall while maintaining high precision, our approach combines string similarity function optimization, adaptive blocking key design, and domain constraint validation. The framework has been validated in a large-scale production environment that processes millions of entity records daily, demonstrating its practical applicability to industrial ER systems.
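As a quick consistency check on the figures reported above, the F-measure can be recomputed from precision and recall. The sketch below assumes the balanced F1 variant (the harmonic mean), which the abstract does not state explicitly; the function name `f_measure` and the `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (assumed here, not confirmed
    by the source). beta > 1 weights recall more heavily.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Values reported in the abstract.
reported = f_measure(precision=0.99, recall=0.41)
print(round(reported, 2))  # 0.58
```

Because the harmonic mean is dominated by the smaller of its two inputs, the low recall pulls the F-measure down to 0.58 despite near-perfect precision, which is why the approach concentrates on recovering missed matches.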

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.