Indiana University Bloomington

Luddy School of Informatics, Computing, and Engineering

Technical Report TR555:
On an Information Theoretic Approximation Measure for Functional Dependencies

Chris Giannella and Edward Robertson
(Aug 2001), 12 pages pages
Abstract:
We investigate the problem of defining an approximation measure for functional dependencies (FDs). For fixed sets of attributes, X and Y, an approximation measure is a function which maps relation instances to real numbers. The number to which an instance is mapped, intuitively, describes the strength of the dependency, X --> Y, in that instance. We define an approximation measure for FDs based on a connection between Shannon's information theory and relational database theory. Our measure is normalized to lie between zero and one (inclusive), and maps a relation instance to zero if and only if X --> Y holds in the instance. Hence, the smaller the number to which an instance is mapped, the ``closer'' X --> Y is to being an FD in the instance.

To put our measure in context, we compare it to a slight variation of a measure previously defined by Kivinen and Mannila, g_3. We denote the variation as \hat{g_3}, although, our results, essentially, apply unchanged to g_3. The purpose of comparing our measure with \hat{g_3} is to develop a deeper understanding of not only our measure, but also, \hat{g_3}. Moreover, we gain a deeper understanding of the natural intuitive notion of an approximate FD. We observe that our measure and \hat{g_3} agree at their extremes but are quite different in-between. As a result, we conclude that our measure and \hat{g_3} are significantly different. An interesting question emerges from this conclusion: is there a rigorous way to determine when one measure better captures the meaning of the degree to which an FD is approximate?

Available as: