Big data management promises to significantly improve people’s lives by accelerating knowledge discovery, research, and innovation. However, over the last few years, there has been increasing concern regarding the lack of fairness (leading to bias), diversity (leading to exclusion), and transparency (leading to opacity) of data-driven algorithms supporting decision-making, raising a call for responsible-by-design automated decision-making systems. So far, efforts toward responsible decision-making have mostly focused on machine learning algorithms, assuming that they have been trained on high-quality data and ignoring the complex underlying pipelines that may have produced such data. A core data pipeline for producing such data is entity resolution (ER), which discovers and unifies descriptions that correspond to the same real-world entities.
In this project, we target ER systems that are responsible by design, in particular when decisions about which entity descriptions to resolve first must be made under a given budget. The objectives of ResponsibleER are: (a) to enrich the diversity of resolved entities, (b) to ensure the fairness of resolved entities, and (c) to enhance the transparency of ER systems. For (a), we are interested in formalizing progressive ER as an optimization problem, with the objective of maximizing the diversity of the resolved entities. For (b), we are interested in measures of centrality for matching candidates in an entity graph processed by a progressive ER algorithm, so as to ensure that all groups are fairly represented in the results. For (c), we need to extend the indices used for (a) and (b) and provide meaningful explanations of the intermediate decisions taken throughout an ER process (e.g., indexing, matching). While all three problems have recently been identified as major challenges by EU and US regulators, to the best of our knowledge, no prior work in ER has studied any of these objectives.
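To make objective (a) concrete, budget-constrained progressive ER can be pictured as greedily choosing which candidate pairs to resolve next, trading off match likelihood against the diversity of groups covered so far. The sketch below is a minimal illustration under our own assumptions (the pair representation, the fixed diversity bonus, and the greedy strategy are hypothetical simplifications, not the project's actual formulation):

```python
# Hypothetical sketch: progressive ER as budget-constrained selection,
# where each candidate pair carries a match score and a group label.
# The 0.5 diversity bonus and the greedy rule are illustrative assumptions.

def progressive_er(candidates, budget):
    """Greedily pick candidate pairs to resolve, balancing match
    likelihood with the diversity of entity groups covered so far.

    candidates: list of (pair_id, match_score, group) tuples
    budget: number of resolution steps we can afford
    """
    resolved, seen_groups = [], set()
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        # Marginal gain: match score plus a bonus for covering a new group.
        best = max(
            remaining,
            key=lambda c: c[1] + (0.5 if c[2] not in seen_groups else 0.0),
        )
        remaining.remove(best)
        resolved.append(best[0])
        seen_groups.add(best[2])
    return resolved

pairs = [("p1", 0.9, "A"), ("p2", 0.8, "B"), ("p3", 0.85, "A"), ("p4", 0.4, "C")]
print(progressive_er(pairs, 3))  # → ['p1', 'p2', 'p4']
```

Note how the low-scoring pair "p4" is preferred over "p3" at the third step because it covers a group not yet represented, which is exactly the tension a diversity-aware objective must formalize.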