APPENDIX B
Advanced Relational Database Design

In this appendix we cover advanced topics in relational database design. We first present the theory of multivalued dependencies, including a set of sound and complete inference rules for multivalued dependencies. We then present PJNF and DKNF, two normal forms based on classes of constraints that generalize multivalued dependencies. Throughout this appendix we illustrate our concepts using the bank enterprise whose schema is shown in Figure 2.15.

B.1 Multivalued Dependencies

As we did for functional dependencies in our study of 3NF and BCNF, we shall need to determine all the multivalued dependencies that are logically implied by a given set of multivalued dependencies.

B.1.1 Theory of Multivalued Dependencies

We take the same approach here that we did earlier for functional dependencies. Let D denote a set of functional and multivalued dependencies. The closure D+ of D is the set of all functional and multivalued dependencies logically implied by D. As we did for functional dependencies, we can compute D+ from D, using the formal definitions of functional dependencies and multivalued dependencies. However, it is usually easier to reason about sets of dependencies by using a system of inference rules.

The following list of inference rules for functional and multivalued dependencies is sound and complete. Recall that sound rules do not generate any dependencies that are not logically implied by D, and complete rules allow us to generate all dependencies in D+. The first three rules are Armstrong’s axioms, which we saw earlier in Chapter 8.

1. Reflexivity rule. If α is a set of attributes, and β ⊆ α, then α → β holds.
2. Augmentation rule. If α → β holds, and γ is a set of attributes, then γα → γβ holds.
3. Transitivity rule. If α → β holds, and β → γ holds, then α → γ holds.
4. Complementation rule. If α →→ β holds, then α →→ R − β − α holds.
5. Multivalued augmentation rule. If α →→ β holds, and γ ⊆ R and δ ⊆ γ, then γα →→ δβ holds.
6. Multivalued transitivity rule. If α →→ β holds, and β →→ γ holds, then α →→ γ − β holds.
7. Replication rule. If α → β holds, then α →→ β holds.
8. Coalescence rule. If α →→ β holds, and γ ⊆ β, and there is a δ such that δ ⊆ R, and δ ∩ β = ∅, and δ → γ, then α → γ holds.

The bibliographical notes provide references to proofs that the preceding rules are sound and complete. The following examples provide insight into how the formal proofs proceed.

Let R = (A, B, C, G, H, I) be a relation schema. Suppose that A →→ BC holds. The definition of multivalued dependencies implies that, if t1[A] = t2[A], then there exist tuples t3 and t4 such that

$$
\begin{aligned}
t_1[A] &= t_2[A] = t_3[A] = t_4[A]\\
t_3[BC] &= t_1[BC]\\
t_3[GHI] &= t_2[GHI]\\
t_4[GHI] &= t_1[GHI]\\
t_4[BC] &= t_2[BC]
\end{aligned}
$$

The complementation rule states that, if A →→ BC, then A →→ GHI. Observe that t3 and t4 satisfy the definition of A →→ GHI if we simply change the subscripts. We can provide similar justification for rules 5 and 6 (see Exercise B.2) using the definition of multivalued dependencies.

Rule 7, the replication rule, involves functional and multivalued dependencies. Suppose that A → BC holds on R. If t1[A] = t2[A], then A → BC forces t1[BC] = t2[BC], so t2 and t1 themselves serve as the tuples t3 and t4, respectively, required by the definition of the multivalued dependency A →→ BC.

Rule 8, the coalescence rule, is the most difficult of the eight rules to verify (see Exercise B.4).

We can simplify the computation of the closure of D by using the following rules, which we can prove using rules 1 to 8 (see Exercise B.5):

• Multivalued union rule. If α →→ β holds, and α →→ γ holds, then α →→ βγ holds.
• Intersection rule. If α →→ β holds, and α →→ γ holds, then α →→ β ∩ γ holds.
• Difference rule. If α →→ β holds, and α →→ γ holds, then α →→ β − γ holds and α →→ γ − β holds.

Let us apply our rules to the following example. Let R = (A, B, C, G, H, I) with the following set of dependencies D given:

A →→ B
B →→ HI
CG → H

We list several members of D+ here:

• A →→ CGHI: Since A →→ B, the complementation rule (rule 4) implies that A →→ R − B − A. Since R − B − A = CGHI, A →→ CGHI.
• A →→ HI: Since A →→ B and B →→ HI, the multivalued transitivity rule (rule 6) implies that A →→ HI − B. Since HI − B = HI, A →→ HI.
• B → H: To show this fact, we need to apply the coalescence rule (rule 8). B →→ HI holds. Since H ⊆ HI and CG → H and CG ∩ HI = ∅, we satisfy the statement of the coalescence rule, with α being B, β being HI, δ being CG, and γ being H. We conclude that B → H.
• A →→ CG: We already know that A →→ CGHI and A →→ HI. By the difference rule, A →→ CGHI − HI. Since CGHI − HI = CG, A →→ CG.

B.1.2 Dependency Preservation

The question of dependency preservation when we have multivalued dependencies is not as simple as it is when we have only functional dependencies.

A decomposition of schema R into schemas R1, R2, ..., Rn is a dependency-preserving decomposition with respect to a set D of functional and multivalued dependencies if the following holds. For every set of relations r1(R1), r2(R2), ..., rn(Rn) such that each ri satisfies Di (the restriction of D to Ri), there exists a relation r(R) that satisfies D and for which ri = Π_Ri(r) for all i.

Let us apply the 4NF decomposition algorithm of Figure 8.16 to the schema R = (A, B, C, G, H, I) with D = {A →→ B, B →→ HI, CG → H}. We shall then test the resulting decomposition for dependency preservation. We first need to compute the closure of D. As we saw in Section B.1.1, the nontrivial dependencies in the closure include all the dependencies in D, as well as the multivalued dependency A →→ HI.

R is not in 4NF. Observe that A →→ B is not trivial, yet A is not a superkey.
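The derivations of D+ members listed earlier, and the triviality test just applied to A →→ B, come down to set arithmetic on attribute names. The Python sketch below reproduces them under a few assumptions of ours. Attributes are single letters, and an attribute set is a frozenset. The helper names (complementation, mv_transitivity, difference, coalescence, is_trivial_mvd) are hypothetical. The coalescence helper checks δ → γ only against the functional dependencies it is given explicitly, not against their closure.

```python
# A minimal sketch: attribute sets are frozensets of single-letter names.
R = frozenset("ABCGHI")


def attrs(names: str) -> frozenset:
    """Turn a string such as 'CG' into the attribute set {C, G}."""
    return frozenset(names)


def complementation(alpha: frozenset, beta: frozenset, schema: frozenset) -> frozenset:
    """Rule 4: from alpha ->> beta, derive alpha ->> schema - beta - alpha."""
    return schema - beta - alpha


def mv_transitivity(beta: frozenset, gamma: frozenset) -> frozenset:
    """Rule 6: from alpha ->> beta and beta ->> gamma, derive alpha ->> gamma - beta."""
    return gamma - beta


def difference(beta: frozenset, gamma: frozenset) -> frozenset:
    """Difference rule: from alpha ->> beta and alpha ->> gamma, derive alpha ->> beta - gamma."""
    return beta - gamma


def coalescence(beta, gamma, delta, fds, schema) -> bool:
    """Rule 8: given alpha ->> beta, report whether the remaining premises hold
    (gamma <= beta, delta <= schema, delta and beta disjoint, delta -> gamma
    listed in fds), in which case alpha -> gamma may be concluded."""
    return (
        gamma <= beta
        and delta <= schema
        and not (delta & beta)
        and (delta, gamma) in fds
    )


def is_trivial_mvd(alpha, beta, schema) -> bool:
    """alpha ->> beta is trivial if beta <= alpha or alpha | beta == schema."""
    return beta <= alpha or (alpha | beta) == schema


fds = {(attrs("CG"), attrs("H"))}                       # CG -> H, from D

a_cghi = complementation(attrs("A"), attrs("B"), R)     # A ->> CGHI
a_hi = mv_transitivity(attrs("B"), attrs("HI"))         # A ->> HI
b_to_h = coalescence(attrs("HI"), attrs("H"), attrs("CG"), fds, R)  # B -> H
a_cg = difference(a_cghi, a_hi)                         # A ->> CG

print("".join(sorted(a_cghi)), "".join(sorted(a_hi)), b_to_h, "".join(sorted(a_cg)))
# prints: CGHI HI True CG
print(is_trivial_mvd(attrs("A"), attrs("B"), R))        # False
```

Each helper computes only the right-hand side of the derived dependency. The caller must supply premises that actually hold, just as in the hand derivations.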
Using A →→ B in the first iteration of the while loop, we replace R with two schemas, (A, B) and (A, C, G, H, I). It is easy to see that (A, B) is in 4NF since all multivalued dependencies that hold on (A, B) are trivial. However, the schema (A, C, G, H, I) is not in 4NF. Applying the multivalued dependency CG →→ H
(which follows from the given functional dependency CG → H by the replication rule), we replace (A, C, G, H, I) with the two schemas (C, G, H) and (A, C, G, I). Schema (C, G, H) is in 4NF, but schema (A, C, G, I) is not. To see that (A, C, G, I) is not in 4NF, we note that since A →→ HI is in D+, A →→ I is in the restriction of D to (A, C, G, I). Thus, in a third iteration of the while loop, we replace (A, C, G, I) with two schemas (A, I) and (A, C, G). The algorithm then terminates and the resulting 4NF decomposition is {(A, B), (C, G, H), (A, I), (A, C, G)}.

This 4NF decomposition is not dependency preserving, since it fails to preserve the multivalued dependency B →→ HI. Consider Figure B.1, which shows the four relations that may result from the projection of a relation on (A, B, C, G, H, I) onto the four schemas of our decomposition.

r1:
A  | B
a1 | b1
a2 | b1

r2:
C  | G  | H
c1 | g1 | h1
c2 | g2 | h2

r3:
A  | I
a1 | i1
a2 | i2

r4:
A  | C  | G
a1 | c1 | g1
a2 | c2 | g2

Figure B.1 Projection of relation r onto a 4NF decomposition of R.

The restriction of D to (A, B) is A →→ B and some trivial dependencies. It is easy to see that r1 satisfies A →→ B, because there is no pair of tuples with the same A value. Observe that r2 satisfies all functional and multivalued dependencies, since no two tuples in r2 have the same value on any attribute. A similar statement can be made for r3 and r4. Therefore, the decomposed version of our database satisfies all the dependencies in the restriction of D. However, there is no relation r on (A, B, C, G, H, I) that satisfies D and decomposes into r1, r2, r3, and r4. Figure B.2 shows the relation r = r1 ⋈ r2 ⋈ r3 ⋈ r4. Relation r does not satisfy B →→ HI. Any relation s containing r and satisfying B →→ HI must include the tuple (a2, b1, c2, g2, h1, i1). However, Π_CGH(s) includes a tuple (c2, g2, h1) that is not in r2. Thus, our decomposition fails to detect a violation of B →→ HI.
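The counterexample can be replayed directly. The Python sketch below models relations as lists of dictionaries (our own choice). It joins the four relations of Figure B.1, confirms that the result is the relation of Figure B.2, and tests multivalued dependencies straight from their tuple-based definition. The function names natural_join, satisfies_mvd, and as_set are hypothetical helpers, not part of any library.

```python
from itertools import product

# Figure B.1: each relation is a list of dicts (attribute name -> value).
r1 = [{"A": "a1", "B": "b1"}, {"A": "a2", "B": "b1"}]
r2 = [{"C": "c1", "G": "g1", "H": "h1"}, {"C": "c2", "G": "g2", "H": "h2"}]
r3 = [{"A": "a1", "I": "i1"}, {"A": "a2", "I": "i2"}]
r4 = [{"A": "a1", "C": "c1", "G": "g1"}, {"A": "a2", "C": "c2", "G": "g2"}]


def natural_join(left, right):
    """Merge every pair of tuples that agree on all shared attributes."""
    result = []
    for s, t in product(left, right):
        if all(s[a] == t[a] for a in s.keys() & t.keys()):
            merged = {**s, **t}
            if merged not in result:  # relations are sets: skip duplicates
                result.append(merged)
    return result


def satisfies_mvd(rel, alpha, beta):
    """Definition of alpha ->> beta: for every ordered pair t1, t2 agreeing on
    alpha, the tuple with t1's alpha and beta values and t2's other values
    must be in rel (the reversed pair (t2, t1) supplies the tuple t4)."""
    for t1, t2 in product(rel, rel):
        if all(t1[a] == t2[a] for a in alpha):
            t3 = {a: t1[a] if a in alpha | beta else t2[a] for a in t1}
            if t3 not in rel:
                return False
    return True


def as_set(rel):
    """Order-independent view of a relation, for comparisons."""
    return {tuple(sorted(t.items())) for t in rel}


r = natural_join(natural_join(natural_join(r1, r2), r3), r4)

figure_b2 = [
    {"A": "a1", "B": "b1", "C": "c1", "G": "g1", "H": "h1", "I": "i1"},
    {"A": "a2", "B": "b1", "C": "c2", "G": "g2", "H": "h2", "I": "i2"},
]
print(as_set(r) == as_set(figure_b2))        # True: r is the relation of Figure B.2
print(satisfies_mvd(r, {"A"}, {"B"}))        # True
print(satisfies_mvd(r, {"B"}, {"H", "I"}))   # False: B ->> HI is violated

# The tuple that any superset s of r satisfying B ->> HI must contain:
missing = {"A": "a2", "B": "b1", "C": "c2", "G": "g2", "H": "h1", "I": "i1"}
print({a: missing[a] for a in "CGH"} in r2)  # False: (c2, g2, h1) is not in r2
```

The tuple reported missing by satisfies_mvd for B →→ HI is exactly (a2, b1, c2, g2, h1, i1), the one named in the text.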
We have seen that, if we are given a set of multivalued and functional dependencies, it is advantageous to find a database design that meets the three criteria of
1. 4NF
2. Dependency preservation
3. Lossless join

If all we have are functional dependencies, then the first criterion is just BCNF.

We have seen also that it is not always possible to meet all three of these criteria. We succeeded in finding such a decomposition for the bank example, but failed for the example of schema R = (A, B, C, G, H, I).

When we cannot achieve our three goals, we have to compromise on 4NF or dependency preservation.

B.2 Join Dependencies

We have seen that the lossless-join property is one of several properties of a good database design. Indeed, this property is essential: Without it, information is lost. When we restrict the set of legal relations to those satisfying a set of functional and multivalued dependencies, we are able to use these dependencies to show that certain decompositions are lossless-join decompositions.

Because of the importance of the concept of lossless join, it is useful to be able to constrain the set of legal relations over a schema R to those relations for which a given decomposition is a lossless-join decomposition. In this section, we define such a constraint, called a join dependency. Just as functional and multivalued dependencies led to BCNF and 4NF, join dependencies will lead to a normal form called project-join normal form (PJNF).

B.2.1 Definition of Join Dependencies

Let R be a relation schema and R1, R2, ..., Rn be a decomposition of R. The join dependency *(R1, R2, ..., Rn) is used to restrict the set of legal relations to those for which R1, R2, ..., Rn is a lossless-join decomposition of R. Formally, if R = R1 ∪ R2 ∪ ... ∪ Rn, we say that a relation r(R) satisfies the join dependency *(R1, R2, ..., Rn) if

$$
r = \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r) \bowtie \cdots \bowtie \Pi_{R_n}(r)
$$

A join dependency is trivial if one of the Ri is R itself.

A  | B  | C  | G  | H  | I
a1 | b1 | c1 | g1 | h1 | i1
a2 | b1 | c2 | g2 | h2 | i2

Figure B.2 A relation r(R) that does not satisfy B →→ HI.
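The definition translates into a direct test: project r onto each Ri, join the projections, and compare the result with r. The Python sketch below is a minimal illustration, not a library API. A tuple is a frozenset of (attribute, value) pairs so that relations can be Python sets. Because attribute names are single letters, each component Ri is written as a string such as "ABCG". The sketch is applied to the relation of Figure B.2, and the decompositions tested are our own illustrative choices.

```python
from functools import reduce
from itertools import product


def relation(header, *rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of
    (attribute, value) pairs, so that tuples are hashable and unordered."""
    return {frozenset(zip(header, row)) for row in rows}


def project(rel, attrs):
    """Pi_attrs(rel): keep only the attributes in attrs; the set drops duplicates."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def join(left, right):
    """Natural join: merge each pair of tuples that agree on their shared attributes."""
    out = set()
    for s, t in product(left, right):
        merged = dict(s)
        if all(merged.get(a, v) == v for a, v in t):
            merged.update(t)
            out.add(frozenset(merged.items()))
    return out


def satisfies_jd(rel, components):
    """rel satisfies *(R1, ..., Rn) iff rel equals the join of its projections."""
    return reduce(join, (project(rel, c) for c in components)) == rel


# Relation r of Figure B.2.
r = relation("ABCGHI",
             ("a1", "b1", "c1", "g1", "h1", "i1"),
             ("a2", "b1", "c2", "g2", "h2", "i2"))

print(satisfies_jd(r, ["ABCGHI", "AB"]))  # True: trivial, one component is R
print(satisfies_jd(r, ["AB", "ACGHI"]))   # True
print(satisfies_jd(r, ["ABCG", "BHI"]))   # False: the join adds two spurious tuples
```

The join in the last call pairs both (A, B, C, G) projections with both (B, H, I) projections, because every tuple has B = b1. The result has four tuples where r has two.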
Consider the join dependency *(R1, R2) on schema R. This dependency requires that, for all legal r(R),

$$
r = \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r)
$$

Let r contain the two tuples t1 and t2, defined as follows:

$$
\begin{aligned}
t_1[R_1 - R_2] &= (a_1, a_2, \ldots, a_i) & \qquad t_2[R_1 - R_2] &= (b_1, b_2, \ldots, b_i)\\
t_1[R_1 \cap R_2] &= (a_{i+1}, \ldots, a_j) & \qquad t_2[R_1 \cap R_2] &= (a_{i+1}, \ldots, a_j)\\
t_1[R_2 - R_1] &= (a_{j+1}, \ldots, a_n) & \qquad t_2[R_2 - R_1] &= (b_{j+1}, \ldots, b_n)
\end{aligned}
$$

Thus, t1[R1 ∩ R2] = t2[R1 ∩ R2], but t1 and t2 have different values on all other attributes. Let us compute Π_R1(r) ⋈ Π_R2(r). Figure B.3 shows Π_R1(r) and Π_R2(r).

Π_R1(r):
           | R1 − R2      | R1 ∩ R2
Π_R1(t1)   | a_1 ... a_i  | a_{i+1} ... a_j
Π_R1(t2)   | b_1 ... b_i  | a_{i+1} ... a_j

Π_R2(r):
           | R1 ∩ R2          | R2 − R1
Π_R2(t1)   | a_{i+1} ... a_j  | a_{j+1} ... a_n
Π_R2(t2)   | a_{i+1} ... a_j  | b_{j+1} ... b_n

Figure B.3 Π_R1(r) and Π_R2(r).

When we compute the join, we get two tuples in addition to t1 and t2, shown by t3 and t4 in Figure B.4.

     | R1 − R2      | R1 ∩ R2          | R2 − R1
t1   | a_1 ... a_i  | a_{i+1} ... a_j  | a_{j+1} ... a_n
t2   | b_1 ... b_i  | a_{i+1} ... a_j  | b_{j+1} ... b_n
t3   | a_1 ... a_i  | a_{i+1} ... a_j  | b_{j+1} ... b_n
t4   | b_1 ... b_i  | a_{i+1} ... a_j  | a_{j+1} ... a_n

Figure B.4 Tabular representation of *(R1, R2).

If *(R1, R2) holds, then, whenever we have tuples t1 and t2, we must also have t3 and t4. Thus, Figure B.4 shows a tabular representation of the join dependency *(R1, R2). Compare Figure B.4 with Figure 8.13, in which we gave a tabular representation of α →→ β. If we let α = R1 ∩ R2 and β = R1, then we can see that the two tabular representations in these figures are the same. Indeed, *(R1, R2) is just another way of stating R1 ∩ R2 →→ R1. Using the complementation and augmentation rules for multivalued dependencies, we can show that R1 ∩ R2 →→ R1 implies R1 ∩ R2 →→ R2. Thus, *(R1, R2) is equivalent to R1 ∩ R2 →→ R2.

This observation is not surprising in light of the fact we noted earlier that R1 and
R2 form a lossless-join decomposition of R if and only if R1 ∩ R2 →→ R2 or R1 ∩ R2 →→ R1.

Every join dependency of the form *(R1, R2) is therefore equivalent to a multivalued dependency. However, there are join dependencies that are not equivalent to any multivalued dependency. The simplest example of such a dependency is on schema R = (A, B, C). The join dependency

*((A, B), (B, C), (A, C))

is not equivalent to any collection of multivalued dependencies. Figure B.5 shows a tabular representation of this join dependency.

A  | B  | C
a1 | b1 | c2
a2 | b1 | c1
a1 | b2 | c1
a1 | b1 | c1

Figure B.5 Tabular representation of *((A, B), (B, C), (A, C)).

To see that no set of multivalued dependencies logically implies *((A, B), (B, C), (A, C)), we consider Figure B.5 as a relation r(A, B, C), as in Figure B.6. Relation r satisfies the join dependency *((A, B), (B, C), (A, C)), as we can verify by computing

$$
\Pi_{AB}(r) \bowtie \Pi_{BC}(r) \bowtie \Pi_{AC}(r)
$$

and by showing that the result is exactly r. However, r does not satisfy any nontrivial multivalued dependency. To see that it does not, we verify that r fails to satisfy any of A →→ B, A →→ C, B →→ A, B →→ C, C →→ A, or C →→ B.

A  | B  | C
a1 | b1 | c2
a2 | b1 | c1
a1 | b2 | c1
a1 | b1 | c1

Figure B.6 Relation r(A, B, C).

Just as a multivalued dependency is a way of stating the independence of a pair of relationships, a join dependency is a way of stating that the members of a set of relationships are all independent. This notion of independence of relationships is a natural consequence of the way that we generally define a relation. Consider

Loan_info_schema = (branch_name, customer_name, loan_number, amount)
from our banking example. We can define a relation loan_info(Loan_info_schema) as the set of all tuples on Loan_info_schema such that

• The loan represented by loan_number is made by the branch named branch_name.
• The loan represented by loan_number is made to the customer named customer_name.
• The loan represented by loan_number is in the amount given by amount.

The preceding definition of the loan_info relation is a conjunction of three predicates: one on loan_number and branch_name, one on loan_number and customer_name, and one on loan_number and amount. Surprisingly, it can be shown that the preceding intuitive definition of loan_info logically implies the join dependency *((loan_number, branch_name), (loan_number, customer_name), (loan_number, amount)). Thus, join dependencies have an intuitive appeal and correspond to one of our three criteria for a good database design, the lossless-join property.

For functional and multivalued dependencies, we were able to give a system of inference rules that are sound and complete. Unfortunately, no such set of rules is known for join dependencies. It appears that we must consider more general classes of dependencies than join dependencies to construct a sound and complete set of inference rules. The bibliographical notes contain references to research in this area.

B.2.2 Project-Join Normal Form

Project-join normal form (PJNF) is defined in the same way as BCNF and 4NF, except that join dependencies are used. A relation schema R is in PJNF with respect to a set D of functional, multivalued, and join dependencies if, for all join dependencies in D+ of the form *(R1, R2, ..., Rn), where each Ri ⊆ R and R = R1 ∪ R2 ∪ ... ∪ Rn, at least one of the following holds:

• *(R1, R2, ..., Rn) is a trivial join dependency.
• Every Ri is a superkey for R.

A database design is in PJNF if each member of the set of relation schemas that constitutes the design is in PJNF.
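For a single join dependency, the PJNF condition just stated can be checked mechanically once we know which attribute sets are superkeys. The Python sketch below decides superkeys by attribute closure under functional dependencies only. It assumes that loan_number → {branch_name, amount}, as in the bank example. This section does not state that dependency, so treat it as an assumption. The helper names are hypothetical. A complete PJNF test would have to consider every join dependency in D+, and this sketch does not attempt that.

```python
# Loan_info_schema; attribute names follow the text.
LOAN_INFO = {"branch_name", "customer_name", "loan_number", "amount"}

# Assumption (not stated in this section): each loan has one branch and one amount.
FDS = [({"loan_number"}, {"branch_name", "amount"})]

# The join dependency from the text, one tuple of attributes per component.
LOAN_JD = [
    ("loan_number", "branch_name"),
    ("loan_number", "customer_name"),
    ("loan_number", "amount"),
]


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure covers the whole schema."""
    return closure(attrs, fds) >= schema


def jd_meets_pjnf(components, schema, fds):
    """PJNF condition for one join dependency *(R1, ..., Rn) on schema: it is
    trivial (some Ri is the whole schema), or every Ri is a superkey.
    Superkeys are decided from the functional dependencies only."""
    if any(set(c) == schema for c in components):
        return True
    return all(is_superkey(set(c), schema, fds) for c in components)


print(jd_meets_pjnf(LOAN_JD, LOAN_INFO, FDS))  # False
```

Under the assumed dependency, the closure of (loan_number, branch_name) adds amount but never customer_name. That component is therefore not a superkey, and the join dependency violates PJNF, which matches the conclusion drawn for Loan_info_schema.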
PJNF is called fifth normal form (5NF) in some of the literature on database normalization.

Consider again our banking example. Given the join dependency *((loan_number, branch_name), (loan_number, customer_name), (loan_number, amount)), Loan_info_schema is not in PJNF. To put Loan_info_schema into PJNF, we must decompose it into the three schemas specified by the join dependency: (loan_number, branch_name), (loan_number, customer_name), and (loan_number, amount).

Because every multivalued dependency is also a join dependency, it is easy to see that every PJNF schema is also in 4NF. Thus, in general, we may not be able to find a dependency-preserving decomposition into PJNF for a given schema.
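Earlier in this section, two claims were made about the relation r(A, B, C) of Figure B.6. It satisfies *((A, B), (B, C), (A, C)), yet it satisfies none of the six multivalued dependencies listed there. Both claims can be verified by brute force. The Python sketch below is ours, and its helper names are hypothetical. To test α →→ β it uses the two-component form discussed above: α →→ β holds on r(R) exactly when r equals Π_{αβ}(r) ⋈ Π_{α(R − β)}(r).

```python
from functools import reduce
from itertools import product

# Relation r(A, B, C) of Figure B.6; each tuple is a frozenset of (attribute, value) pairs.
SCHEMA = "ABC"
ROWS = [("a1", "b1", "c2"), ("a2", "b1", "c1"), ("a1", "b2", "c1"), ("a1", "b1", "c1")]
r = {frozenset(zip(SCHEMA, row)) for row in ROWS}


def project(rel, attrs):
    """Keep only the attributes named in attrs (single-letter names)."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def join(left, right):
    """Natural join of two relations in the frozenset representation."""
    out = set()
    for s, t in product(left, right):
        merged = dict(s)
        if all(merged.get(a, v) == v for a, v in t):
            merged.update(t)
            out.add(frozenset(merged.items()))
    return out


def satisfies_mvd(rel, alpha, beta, schema=SCHEMA):
    """alpha ->> beta holds iff rel = Pi_{alpha beta}(rel) join Pi_{alpha (R - beta)}(rel)."""
    rest = "".join(a for a in schema if a not in beta)
    return join(project(rel, alpha + beta), project(rel, alpha + rest)) == rel


jd_holds = reduce(join, (project(r, c) for c in ("AB", "BC", "AC"))) == r
candidates = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "B")]
results = {f"{x} ->> {y}": satisfies_mvd(r, x, y) for x, y in candidates}

print(jd_holds)               # True
print(any(results.values()))  # False: none of the six holds
```

Each of the six tests fails for the same reason: the two-way join produces five tuples, one more than r contains.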
B.3 Domain-Key Normal Form

The approach we have taken to normalization is to define a form of constraint (functional, multivalued, or join dependency), and then to use that form of constraint to define a normal form. Domain-key normal form (DKNF) is based on three notions:

1. Domain declaration. Let A be an attribute, and let dom be a set of values. The domain declaration A ⊆ dom requires that the A value of all tuples be values in dom.

2. Key declaration. Let R be a relation schema with K ⊆ R. The key declaration key (K) requires that K be a superkey for schema R, that is, K → R. Note that all key declarations are functional dependencies, but not all functional dependencies are key declarations.

3. General constraint. A general constraint is a predicate on the set of all relations on a given schema. The functional, multivalued, and join dependencies that we have studied are all examples of general constraints. Typically, a general constraint is a predicate expressed in some agreed-on form, such as first-order logic.

We now give an example of a general constraint that is not a functional, multivalued, or join dependency. Suppose that all accounts whose account number begins with the digit 9 are special high-interest accounts with a minimum balance of $2500. Then, we include as a general constraint, “If the first digit of t[account number] is 9, then t[balance] ≥ 2500.”

Domain declarations and key declarations are easy to test in a practical database system. General constraints, however, may be extremely costly (in time and space) to test. The purpose of a DKNF database design is to allow us to test the general constraints using only domain and key constraints.

Formally, let D be a set of domain constraints and let K be a set of key constraints for a relation schema R. Let G denote the general constraints for R. Schema R is in DKNF if D ∪ K logically implies G.

Let us return to the general constraint that we gave on accounts.
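The three kinds of declaration can be made concrete with a small Python sketch. It assumes, for illustration only, that a relation is a list of dicts with pairwise-distinct rows, that account numbers are strings, and that a domain is given as a membership predicate rather than a set, since domains such as "all integers" are infinite. The function names are hypothetical.

```python
def satisfies_domain(rows, attr, in_dom):
    """Domain declaration attr ⊆ dom, with dom given as a membership predicate."""
    return all(in_dom(row[attr]) for row in rows)

def satisfies_key(rows, key):
    """Key declaration key(K): no two tuples agree on every attribute of K,
    so K -> R holds. Assumes the rows are pairwise distinct (set semantics)."""
    seen = set()
    for row in rows:
        k = tuple(row[a] for a in key)
        if k in seen:
            return False
        seen.add(k)
    return True

def high_interest_rule(rows):
    """The general constraint from the text: if the first digit of
    t[account number] is 9, then t[balance] >= 2500."""
    return all(row["balance"] >= 2500
               for row in rows if row["account_number"].startswith("9"))
```

On a single account relation, an instance can pass every domain and key test and still violate the high-interest rule (for example, a hypothetical account "907" with balance 100), which is why the rule needs its own, potentially costly, check there.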
Because this constraint is not implied by the domain and key constraints on Account schema, our database design is not in DKNF. To create a DKNF design, we need two schemas in place of Account schema:

Regular acct schema = (account number, branch name, balance)
Special acct schema = (account number, branch name, balance)

We retain all the dependencies that we had on Account schema as general constraints. The domain constraints for Special acct schema require that, for each account,

• The account number begins with 9.
• The balance is at least 2500.
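The two-schema design can be sketched in Python under the same illustrative assumptions (rows as dicts, account numbers as strings, domains as predicates). Each domain declaration is attached to a single attribute, and the regular schema's account-number domain is taken to be the complement of the special one: numbers that do not begin with 9. The class and function names are hypothetical. Insertion runs only domain and key checks, yet every accepted special account satisfies the high-interest rule.

```python
class Schema:
    """A relation schema that enforces only domain declarations and one key declaration."""

    def __init__(self, domains, key):
        self.domains = domains  # attribute -> membership predicate for its domain
        self.key = key          # tuple of attribute names declared by key(K)
        self.rows = {}          # key value -> stored tuple

    def insert(self, row):
        """Accept row only if every domain declaration and the key declaration hold."""
        if not all(in_dom(row[attr]) for attr, in_dom in self.domains.items()):
            return False
        k = tuple(row[a] for a in self.key)
        if k in self.rows:
            return False
        self.rows[k] = row
        return True

regular_acct = Schema(
    {"account_number": lambda v: not v.startswith("9")},
    ("account_number",),
)
special_acct = Schema(
    {"account_number": lambda v: v.startswith("9"),
     "balance": lambda v: v >= 2500},
    ("account_number",),
)

def insert_account(row):
    """Store row in whichever schema's domains admit it; reject it otherwise."""
    return regular_acct.insert(row) or special_acct.insert(row)
```

Because the two account-number domains are disjoint, a key on each schema is enough to keep account numbers unique across both, and a special account with a low balance is rejected by a plain domain check rather than by a general-constraint test.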
The domain constraints for Regular acct schema require that the account number does not begin with 9. The resulting design is in DKNF, although the proof of this fact is beyond the scope of this text.

Let us compare DKNF to the other normal forms that we have studied. Under the other normal forms, we did not take domain constraints into consideration. We assumed (implicitly) that the domain of each attribute was some infinite domain, such as the set of all integers or the set of all character strings. We allowed key constraints (indeed, we allowed functional dependencies). For each normal form, we allowed a restricted form of general constraint (a set of functional, multivalued, or join dependencies). Thus, we can rewrite the definitions of PJNF, 4NF, BCNF, and 3NF in a manner that shows them to be special cases of DKNF.

We now present a DKNF-inspired rephrasing of our definition of PJNF. Let R = (A1, A2, ..., An) be a relation schema. Let dom(Ai) denote the domain of attribute Ai, and let all these domains be infinite. Then all domain constraints D are of the form Ai ⊆ dom(Ai). Let the general constraints be a set G of functional, multivalued, or join dependencies. If F is the set of functional dependencies in G, let the set K of key constraints be those nontrivial functional dependencies in F+ of the form α → R. Schema R is in PJNF if and only if it is in DKNF with respect to D, K, and G.

A consequence of DKNF is that all insertion and deletion anomalies are eliminated.

DKNF represents an “ultimate” normal form because it allows arbitrary constraints, rather than only dependencies, yet it allows efficient testing of these constraints. Of course, if a schema is not in DKNF, we may be able to achieve DKNF via decomposition, but, as we have seen, such decompositions are not always dependency preserving. Thus, although DKNF is a goal of a database designer, it may have to be sacrificed in a practical design.
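In the rephrasing of PJNF, the key constraints K are the nontrivial dependencies α → R in F+. Computing F+ itself is unnecessary: α → R is in F+ exactly when the attribute closure α+ under F contains all of R. The Python sketch below follows that idea. It assumes, for illustration only, that F is a list of (left side, right side) frozenset pairs and that the schema is small, because it enumerates every proper subset of R (exponential in the number of attributes).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derived, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def key_constraints(schema, fds):
    """Every alpha with alpha -> schema in F+, restricted to proper subsets of
    the schema so that the dependency is nontrivial. Small schemas only."""
    schema = frozenset(schema)
    attrs = sorted(schema)
    keys = []
    for size in range(len(attrs)):  # sizes 0 .. n-1: proper subsets only
        for alpha in combinations(attrs, size):
            if closure(alpha, fds) == schema:
                keys.append(frozenset(alpha))
    return keys
```

For a hypothetical account schema (account number, branch name, balance) with F = {account number → branch name balance}, this yields three superkeys: {account number} and its two proper supersets within the schema.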
B.4 Summary

In this appendix we presented the theory of multivalued dependencies, including a set of sound and complete inference rules for multivalued dependencies. We then presented two more normal forms based on more general classes of constraints. Join dependencies are a generalization of multivalued dependencies, and lead to the definition of PJNF. DKNF is an idealized normal form that may be difficult to achieve in practice. Yet DKNF has desirable properties that should be incorporated to the extent possible in a good database design.

Exercises

B.1 List all the nontrivial multivalued dependencies satisfied by the relation in Figure B.7.