Database repair

Summary

The problem of database repair is a question about relational databases which has been studied in database theory, and which is a particular kind of data cleansing. The problem asks about how we can "repair" an input relational database in order to make it satisfy integrity constraints. The goal of the problem is to be able to work with data that is "dirty", i.e., does not satisfy the right integrity constraints, by reasoning about all possible repairs of the data, i.e., all possible ways to change the data to make it satisfy the integrity constraints, without committing to a specific choice.

Several variations of the problem exist, depending on:

  • what we intend to figure out about the dirty data: figuring out if some database tuple is certain (i.e., is in every repaired database), figuring out if some query answer is certain (i.e., the answer is returned when evaluating the query on every repaired database)
  • which kinds of ways are allowed to repair the database: can we insert new facts, remove facts (so-called subset repairs), and so on
  • which repaired databases do we study: those where we only change a minimal subset of the database tuples (e.g., minimal subset repairs), those where we only change a minimal number of database tuples (e.g., minimal cardinality repairs)

The problem of database repair has been studied to understand what is the complexity of these different problem variants, i.e., can we efficiently determine information about the state of the repairs, without explicitly materializing all of these repairs.

References edit

  • Arenas, Marcelo; Bertossi, Leopoldo; Chomicki, Jan (1999). Consistent Query Answers in Inconsistent Databases (PDF). PODS.

See also edit