MSCIT 5210/MSCBD 5002: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by dr Lei chen based on the slides provided by jiawei Han micheline Kamber and Jian pei @2012 Han, Kamber pei. all rights reserved
1 1 MSCIT 5210/MSCBD 5002: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Jiawei Han, Micheline Kamber, and Jian Pei © 2012 Han, Kamber & Pei. All rights reserved
Chapter 4: Data Warehousing, On-line Analytical Processing and Data Cube Data Warehouse Basic Concepts a Data Warehouse Modeling: Data Cube and OLAP Data Cube Computation: Preliminary Concepts Data Cube Computation Methods Summary
2 Chapter 4: Data Warehousing, On-line Analytical Processing and Data Cube ◼ Data Warehouse: Basic Concepts ◼ Data Warehouse Modeling: Data Cube and OLAP ◼ Data Cube Computation: Preliminary Concepts ◼ Data Cube Computation Methods ◼ Summary
Aspects of SQL Most common Query language -used in all commercial systems Discussion is based on the SQL92 Standard Commercial products have different features of SQL, but the basic structure is the same Data Manipulation Language Data Definition Language Constraint Specification Embedded SQL Transaction Management Security Management COMP3311 Fall 2011 CSE, HKUST Slide 3
COMP3311 Fall 2011 CSE, HKUST Slide 3 Aspects of SQL ▪ Most common Query Language – used in all commercial systems • Discussion is based on the SQL92 Standard. Commercial products have different features of SQL, but the basic structure is the same ▪ Data Manipulation Language ▪ Data Definition Language ▪ Constraint Specification ▪ Embedded SQL ▪ Transaction Management ▪ Security Management
Basic structure SQL is based on set and relational operations with certain modifications and enhancements a typical SQL query has the form select a1,A2,…,An from r1,R,…R where p A represent attributes R represent relations P is a predicate This query is equivalent to the relational algebra expression: A1.2.An(op(R1×R2×…×Rm) The result of an sQl query is a relation(but may contain duplicates). SQL statements can be nested COMP3311 Fall 2011 CSE, HKUST Slide 4
COMP3311 Fall 2011 CSE, HKUST Slide 4 Basic Structure • SQL is based on set and relational operations with certain modifications and enhancements • A typical SQL query has the form: select A1 , A2 , …, An from R1 , R2 , …, Rm where P - Ai represent attributes - Ri represent relations - P is a predicate. • This query is equivalent to the relational algebra expression: A1, A2, …, An(P (R1 R2 … Rm)) • The result of an SQL query is a relation (but may contain duplicates). SQL statements can be nested
Projection The select clause corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of a query Find the names of all branches in the loan relation select branch-name from loan Equivalent to: I lbranch-name (loan) An asterisk in the select clause denotes all attributes select from loan Note: for our examples we use the tables: Branch(branch-name, branch-city, assets) Customer(customer-name, customer-street, customer-city) Loan(loan- number, amount, branch-name) Account(account-number, balance branch-name Borrower(customer-name, loan-number) Depositor(customer-name account-number COMP3311 Fall 2011 CSE, HKUST Slide 5
COMP3311 Fall 2011 CSE, HKUST Slide 5 Projection • The select clause corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of a query. • Find the names of all branches in the loan relation select branch-name from loan Equivalent to: branch-name(loan) • An asterisk in the select clause denotes “all attributes” select * from loan • Note: for our examples we use the tables: – Branch (branch-name, branch-city, assets) – Customer (customer-name, customer-street, customer-city) – Loan (loan-number, amount, branch-name) – Account (account-number, balance, branch-name) – Borrower (customer-name, loan-number) – Depositor (customer-name, account-number)
Duplicate Removal SQL allows duplicates in relations as well as in query results. Use select distinct to force the elimination of duplicates Find the names of all branches in the loan relation and remove duplicates select distinct branch-name force the dbms to remove duplicates from loan The keyword all specifies that duplicates are not removed select all branch-name force the dbms not from loan to remove duplicates COMP3311 Fall 2011 CSE, HKUST Slide 6
COMP3311 Fall 2011 CSE, HKUST Slide 6 Duplicate Removal • SQL allows duplicates in relations as well as in query results. Use select distinct to force the elimination of duplicates. Find the names of all branches in the loan relation, and remove duplicates select distinct branch-name from loan • The keyword all specifies that duplicates are not removed. select all branch-name from loan force the DBMS to remove duplicates force the DBMS not to remove duplicates
Arithmetic Operations on Retrieved Results The select clause can contain arithmetic expressions involving the operators tr- and x, and operating on constants or attributes of tuples The query: select branch-name, loan-number amount 100 from loan would return a relation which is the same as the loan relations except that the attribute amount is multiplied by 100 COMP3311 Fall 2011 CSE, HKUST Slide 7
COMP3311 Fall 2011 CSE, HKUST Slide 7 Arithmetic Operations on Retrieved Results • The select clause can contain arithmetic expressions involving the operators,+,−, and , and operating on constants or attributes of tuples. • The query: select branch-name, loan-number, amount * 100 from loan would return a relation which is the same as the loan relations, except that the attribute amount is multiplied by 100
The where clause The where clause specifies conditions that tuples in the relations in the from clause must satisfy Find all loan numbers for loans made at the perryridge branch with loan amounts greater than $1200 select loan-number from /oan where branch-name="perryridge"and amount>1200 SQL allows logical connectives and or, and not. arithmetic expressions can be used in the comparison operators Note: attributes used in a query(both select and where parts) must be defined in the relations in the from clause COMP3311 Fall 2011 CSE, HKUST Slide 8
COMP3311 Fall 2011 CSE, HKUST Slide 8 The where Clause • The where clause specifies conditions that tuples in the relations in the from clause must satisfy. • Find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200. select loan-number from loan where branch-name=“Perryridge” and amount >1200 • SQL allows logical connectives and, or, and not. Arithmetic expressions can be used in the comparison operators. • Note: attributes used in a query (both select and where parts) must be defined in the relations in the from clause
The where Clause( cont SQL includes the between operator for convenience Find the loan number of those loans with loan amounts between 90,000and$100.000 that is,≥$90,000and≤$100,000) select loan-number from loan Where amount between 90000 and 100000 COMP3311 Fall 2011 CSE, HKUST Slide 9
COMP3311 Fall 2011 CSE, HKUST Slide 9 The where Clause (Cont.) • SQL includes the between operator for convenience. • Find the loan number of those loans with loan amounts between $90,000 and $100,000 (that is, $90,000 and $100,000) select loan-number from loan where amount between 90000 and 100000
The from Clause The from clause corresponds to the cartesian product operation of the relational algebra Find the cartesian product borrower x loan select x from borrower loan It is rarely used without a where clause Find the name and loan number of all customers having a loan at the perryridge branch select distinct customer-name borrower loan - number from borrower, loan where borrower,loan-number=loan, Joan-number and branch-name ="Perryridge COMP3311 Fall 2011 CSE, HKUST Slide 10
COMP3311 Fall 2011 CSE, HKUST Slide 10 The from Clause • The from clause corresponds to the Cartesian product operation of the relational algebra. • Find the Cartesian product borrower loan select * from borrower, loan It is rarely used without a where clause. • Find the name and loan number of all customers having a loan at the Perryridge branch. select distinct customer-name, borrower.loan-number from borrower, loan where borrower.loan-number = loan.loan-number and branch-name = “Perryridge