So I am looking over my database notes and material trying to refresh myself on the general concepts and terminology for upcoming interviews. I have gotten stuck at dependency however and lossless-join decompositions though. I have searched all over and see lots of mathy equations but I am looking for a plain and simple English response or example.
I have a found a powerpoint from http://www.cs.kent.edu/~jin/DM09Fall/lecture6.ppt which illustrates an example that I cannot fully understand. It is posted below.
R = (A, B, C)F = {A → B, B → C)
Can be decomposed in two different ways
R1 = (A, B), R2 = (B, C)
Lossless-join decomposition:
R1 ∩ R2 = {B} and B → BC
Dependency preserving
R1 = (A, B), R2 = (A, C)
Lossless-join decomposition:
R1 ∩ R2 = {A} and A → AB
Not dependency preserving (cannot check B -> C without computing R1 ⋈ R2)
So I understand that having A → B and B → C means that you have a "reference" in each other, whereas A → B and A → C means there is no reference or link between B and C.
So,
Does Lossless-join decomposition mean that the data overall is still intact? In both scenarios, you can still eventually get both data, right? If this is wrong, please correct me! :)
What is the significance of having that connection B to C in the second decomposition, and how does that make it not dependency preserving?
If A is deleted you will simply have B and C with no relations.
If B is deleted you will still have A → C.
If C is deleted you will still have A → B.
Because even in the first example you will still end up with similiar results upon deleting values.
If A is deleted you will still have a relation of B → C.
If B is deleted you will simply have A and C with no relations.
If C is deleted you will have a relation of A → B.
So either way, if you delete each item you will still have two instances of a relation and one instance of two items having no relations
My guess would be that in deleting the "middle man relation" (is there a term for that), B in example 1 and A in example 2, is that you can still relate example 1's A → B then B → C, while in example 2 you are stuck with A → B with no connection to A → C.
But as you can see I am now back to square one as to why this causes data dependency and while googling "what is data dependency" or "examples of data dependency" it is simply not making any sense to me.
If anyone could help clarify this for me it would be greatly appreciated.