Chapter 13: Query Processing
Outline: Overview; Measures of Query Cost; Selection Operation; Sorting; Join Operation; Other Operations; Evaluation of Expressions
Database System Concepts, 5th Edition, Aug 27, 2005. ©Silberschatz, Korth and Sudarshan
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
[Figure: query → parser and translator → relational-algebra expression → optimizer → execution plan → evaluation engine → query output; the optimizer consults statistics about data, and the evaluation engine reads the data.]
Basic Steps in Query Processing (Cont.)
Parsing and translation: translate the query into its internal form, which is then translated into relational algebra. The parser checks the syntax and verifies that the referenced relations exist.
Evaluation: the query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.
Basic Steps in Query Processing: Optimization
A relational-algebra expression may have many equivalent expressions. E.g., $\sigma_{balance<2500}(\Pi_{balance}(account))$ is equivalent to $\Pi_{balance}(\sigma_{balance<2500}(account))$.
Each relational-algebra operation can be evaluated using one of several different algorithms. Correspondingly, a relational-algebra expression can be evaluated in many ways.
An annotated expression specifying a detailed evaluation strategy is called an evaluation plan. E.g., we can use an index on balance to find accounts with balance < 2500, or we can perform a complete relation scan and discard accounts with balance ≥ 2500.
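To make the equivalence concrete, here is a minimal Python sketch that evaluates both expressions over a tiny in-memory account relation. The sample tuples and the helper names `select` and `project` are hypothetical and serve only as illustration. As in relational algebra, projection removes duplicate tuples.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # one tuple, keyed by attribute name


def select(predicate: Callable[[Row], bool], relation: List[Row]) -> List[Row]:
    """sigma: keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(attrs: List[str], relation: List[Row]) -> List[Row]:
    """Pi: keep only the listed attributes and drop duplicate tuples."""
    seen = set()
    result: List[Row] = []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


def balance_below_2500(row: Row) -> bool:
    return row["balance"] < 2500


# Hypothetical sample data for the account relation.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-102", "balance": 4000},
]

plan_1 = select(balance_below_2500, project(["balance"], account))  # sigma(Pi(account))
plan_2 = project(["balance"], select(balance_below_2500, account))  # Pi(sigma(account))
```

Both plans return the same tuples. They differ only in whether filtering happens before or after projection, and that ordering is the kind of choice the optimizer weighs.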
Basic Steps: Optimization (Cont.)
Query optimization: among all equivalent evaluation plans, choose the one with the lowest cost. Cost is estimated using statistical information from the database catalog, e.g., the number of tuples in each relation, the size of tuples, etc.
In this chapter we study: how to measure query costs; algorithms for evaluating relational-algebra operations; and how to combine algorithms for individual operations in order to evaluate a complete expression.
In Chapter 14 we study how to optimize queries, that is, how to find an evaluation plan with the lowest estimated cost.
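The following minimal sketch shows the optimizer's final choice. It assumes each candidate plan already has a function that estimates its cost from catalog statistics. The statistic names (`b_r`, `index_height`, `matching_blocks`), the two candidate plans, and their crude block-count estimates are all hypothetical placeholders. The chapter develops the actual cost formulas.

```python
from typing import Callable, Dict, List, Tuple

Stats = Dict[str, int]             # hypothetical catalog statistics
CostFn = Callable[[Stats], float]  # estimates one plan's cost from the statistics


def choose_cheapest(plans: List[Tuple[str, CostFn]], stats: Stats) -> Tuple[str, float]:
    """Return (plan name, estimated cost) for the plan with the lowest estimated cost."""
    costed = [(name, estimate(stats)) for name, estimate in plans]
    return min(costed, key=lambda pair: pair[1])


stats: Stats = {"b_r": 500, "index_height": 3, "matching_blocks": 3}

# Two equivalent ways to evaluate sigma_{balance<2500}(account);
# cost is counted crudely as the number of blocks accessed.
plans: List[Tuple[str, CostFn]] = [
    ("full relation scan", lambda s: s["b_r"]),
    ("index on balance", lambda s: s["index_height"] + s["matching_blocks"]),
]

best_name, best_cost = choose_cheapest(plans, stats)
```

Under these toy estimates, raising `matching_blocks` to 500 makes the full scan (500) cheaper than the index plan (503). The best plan therefore depends on the statistics, not only on the expression.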
Measures of Query Cost
Cost is generally measured as the total elapsed time for answering a query. Many factors contribute to time cost: disk accesses, CPU, and even network communication.
Typically disk access is the predominant cost, and it is also relatively easy to estimate. It is measured by taking into account:
number of seeks × average-seek-cost
number of blocks read × average-block-read-cost
number of blocks written × average-block-write-cost
The cost to write a block is greater than the cost to read a block, because data is read back after being written to ensure that the write was successful.
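The three terms above can be written as a small cost function. The per-operation times in this sketch are assumed, order-of-magnitude values. They are given in microseconds so the arithmetic stays exact, and a write is assumed to cost more than a read because of the read-back check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskParams:
    """Assumed average cost of each disk operation, in microseconds."""
    avg_seek: int = 4000
    avg_block_read: int = 100
    avg_block_write: int = 200  # higher than a read: the block is read back to verify the write


def disk_cost(seeks: int, blocks_read: int, blocks_written: int,
              p: DiskParams = DiskParams()) -> int:
    """Estimated disk time: each operation count times its average cost, summed."""
    return (seeks * p.avg_seek
            + blocks_read * p.avg_block_read
            + blocks_written * p.avg_block_write)


example = disk_cost(seeks=10, blocks_read=100, blocks_written=20)
```

Under these assumptions, the 10 seeks account for 40,000 of the 54,000 µs total.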
Measures of Query Cost (Cont.)
For simplicity we use just the number of block transfers from disk and the number of seeks as the cost measures:
$t_T$: time to transfer one block
$t_S$: time for one seek
The cost for b block transfers plus S seeks is $b \cdot t_T + S \cdot t_S$.
We ignore CPU costs for simplicity; real systems do take CPU cost into account.
We do not include the cost of writing output to disk in our cost formulae.
Several algorithms can reduce disk I/O by using extra buffer space. The amount of real memory available for buffering depends on other concurrent queries and OS processes, and is known only during execution. We therefore often use worst-case estimates, assuming that only the minimum amount of memory needed for the operation is available.
Required data may already be resident in the buffer, avoiding disk I/O, but this is hard to take into account for cost estimation.
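This sketch evaluates the simplified measure $b \cdot t_T + S \cdot t_S$. The values $t_T$ = 100 µs and $t_S$ = 4,000 µs are assumed for illustration only. The comparison shows why the seek count matters. Reading 1,000 blocks sequentially with a single seek costs 104,000 µs. Reading the same blocks at random locations, with one seek each, costs 4,100,000 µs, roughly 40 times more.

```python
T_TRANSFER_US = 100  # assumed t_T: time to transfer one block (microseconds)
T_SEEK_US = 4000     # assumed t_S: time for one seek (microseconds)


def io_cost(block_transfers: int, seeks: int,
            t_t: int = T_TRANSFER_US, t_s: int = T_SEEK_US) -> int:
    """Cost of b block transfers plus S seeks: b * t_T + S * t_S."""
    return block_transfers * t_t + seeks * t_s


sequential = io_cost(block_transfers=1000, seeks=1)       # contiguous blocks: one seek
random_reads = io_cost(block_transfers=1000, seeks=1000)  # every block needs its own seek
```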
Selection Operation
File scan: search algorithms that locate and retrieve records that fulfill a selection condition.
Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.
Cost estimate = $b_r$ block transfers + 1 seek, where $b_r$ denotes the number of blocks containing records from relation r.
If the selection is on a key attribute, the scan can stop on finding the record: cost = $(b_r/2)$ block transfers + 1 seek on average.
Linear search can be applied regardless of the selection condition, the ordering of records in the file, or the availability of indices.
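Here is a sketch of A1 over a file modeled as a Python list of blocks, where each block is a list of records. It counts block transfers and seeks rather than measuring time. The blocks are assumed to be stored contiguously, so a single seek suffices. The record layout is hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Block = List[Record]


def linear_search(blocks: List[Block], predicate: Callable[[Record], bool],
                  on_key: bool = False) -> Tuple[List[Record], int, int]:
    """A1 sketch: scan every block in file order and test every record.

    Returns (matching records, block transfers, seeks). Blocks are assumed
    to be contiguous, so only the first block needs a seek.
    """
    matches: List[Record] = []
    transfers = 0
    seeks = 1  # position the disk head at the first block once
    for block in blocks:
        transfers += 1  # each block is transferred exactly once
        for record in block:
            if predicate(record):
                matches.append(record)
                if on_key:
                    # A key value occurs at most once, so the scan can stop here.
                    return matches, transfers, seeks
    return matches, transfers, seeks


# Hypothetical file: 4 blocks holding 2 records each (ids 1..8).
file_blocks = [[{"id": 2 * i + 1}, {"id": 2 * i + 2}] for i in range(4)]

found, key_transfers, key_seeks = linear_search(file_blocks, lambda r: r["id"] == 3, on_key=True)
evens, scan_transfers, scan_seeks = linear_search(file_blocks, lambda r: r["id"] % 2 == 0)
```

In this example the key search stops after 2 of the 4 blocks, while the non-key search reads all 4. Averaged over all possible positions of the key value, a key search reads about $b_r/2$ blocks, and $b_r$ in the worst case.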
Selection Operation (Cont.)
A2 (binary search). Applicable if the selection is an equality comparison on the attribute on which the file is ordered. Assume that the blocks of the relation are stored contiguously.
Cost estimate (number of disk blocks to be scanned):
Cost of locating the first tuple by a binary search on the blocks: $\lceil \log_2(b_r) \rceil \cdot (t_T + t_S)$
If multiple records satisfy the selection, add the transfer cost of the number of blocks containing records that satisfy the selection condition. We will see how to estimate this cost in Chapter 14.
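Here is a sketch of A2 on the same list-of-blocks model. It assumes non-empty blocks, sorted on the selection attribute and stored contiguously. A binary search over blocks finds the first block that can contain the search value. The sketch then continues into the following blocks while they may hold more matches. The helper `a2_cost` applies the estimate above with assumed timings; both function names are hypothetical.

```python
import math
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Block = List[Record]  # assumed non-empty; the file is sorted on the search attribute


def binary_search_select(blocks: List[Block], attr: str, value: Any) -> Tuple[List[Record], int, int]:
    """A2 sketch for the equality selection attr = value on a file ordered on attr.

    Returns (matching records, blocks read by the binary search,
    extra contiguous blocks read afterwards).
    """
    if not blocks:
        return [], 0, 0
    probed = set()  # indices of blocks fetched during the binary search
    lo, hi = 0, len(blocks) - 1
    # Locate the first block whose last record has attr >= value.
    while lo < hi:
        mid = (lo + hi) // 2
        probed.add(mid)
        if blocks[mid][-1][attr] < value:
            lo = mid + 1
        else:
            hi = mid
    probed.add(lo)  # fetch the located block (no extra read if it was already probed)

    matches: List[Record] = []
    extra = 0
    i = lo
    while True:
        matches.extend(r for r in blocks[i] if r[attr] == value)
        # Further matches can exist only if this block ends with the search value.
        if blocks[i][-1][attr] != value or i + 1 == len(blocks):
            break
        i += 1
        extra += 1  # the next block is contiguous: one transfer, no seek
    return matches, len(probed), extra


def a2_cost(probes: int, extra: int, t_t: int, t_s: int) -> int:
    """Each probe pays a seek plus a transfer; each extra contiguous block pays a transfer."""
    return probes * (t_t + t_s) + extra * t_t


# Hypothetical file of 8 blocks sorted on k; the value 5 spans blocks 2 and 3.
sorted_blocks = [[{"k": a}, {"k": b}] for a, b in
                 [(1, 2), (3, 4), (5, 5), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14)]]

matches, probes, extra = binary_search_select(sorted_blocks, "k", 5)
log_bound = math.ceil(math.log2(len(sorted_blocks)))  # ceil(log2 b_r)
cost_us = a2_cost(probes, extra, t_t=100, t_s=4000)   # assumed timings in microseconds
```

Here the binary search reads 3 blocks, matching $\lceil \log_2 8 \rceil = 3$, and one further contiguous block holds the last match. For a value larger than every key, this particular loop reads 4 blocks, so the formula is best read as an estimate rather than an exact count.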
Selections Using Indices
Index scan: search algorithms that use an index; the selection condition must be on the search key of the index. In the formulas below, $h_i$ denotes the height of the index.
A3 (primary index on candidate key, equality). Retrieve a single record that satisfies the corresponding equality condition. Cost = $(h_i + 1) \cdot (t_T + t_S)$
A4 (primary index on nonkey, equality). Retrieve multiple records. The records will be on consecutive blocks. Let b = number of blocks containing matching records. Cost = $h_i \cdot (t_T + t_S) + t_S + t_T \cdot b$
A5 (equality on search key of secondary index).
If the search key is a candidate key, retrieve a single record: Cost = $(h_i + 1) \cdot (t_T + t_S)$
If the search key is not a candidate key, retrieve multiple records; each of the n matching records may be on a different block: Cost = $(h_i + n) \cdot (t_T + t_S)$. This can be very expensive!
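The index-selection formulas translate directly into code, which makes it easy to compare the algorithms on concrete numbers. In this sketch the index height, the number of matching blocks and records, $b_r$, and the timing constants are all assumed values chosen for illustration. A linear-scan (A1) cost is included for comparison.

```python
T_T = 100   # assumed block-transfer time, microseconds
T_S = 4000  # assumed seek time, microseconds


def cost_a3(h_i: int, t_t: int = T_T, t_s: int = T_S) -> int:
    """A3: primary index, equality on a candidate key: h_i index blocks + 1 data block."""
    return (h_i + 1) * (t_t + t_s)


def cost_a4(h_i: int, b: int, t_t: int = T_T, t_s: int = T_S) -> int:
    """A4: primary index, equality on a nonkey: traverse the index, then one seek and b consecutive transfers."""
    return h_i * (t_t + t_s) + t_s + t_t * b


def cost_a5(h_i: int, n: int, t_t: int = T_T, t_s: int = T_S) -> int:
    """A5: secondary index, equality: each of n matching records may need its own seek (n = 1 for a candidate key)."""
    return (h_i + n) * (t_t + t_s)


def cost_a1(b_r: int, t_t: int = T_T, t_s: int = T_S) -> int:
    """A1 (linear search), for comparison: b_r transfers plus one seek."""
    return b_r * t_t + t_s


h = 3                          # assumed index height
a3 = cost_a3(h)                # 4 * 4100 = 16400
a4 = cost_a4(h, b=5)           # 12300 + 4000 + 500 = 16800
a5_key = cost_a5(h, n=1)       # same as A3: 16400
a5_many = cost_a5(h, n=1000)   # 1003 * 4100 = 4112300
scan = cost_a1(b_r=1000)       # 100000 + 4000 = 104000
```

Under these assumptions, A5 on a non-candidate key with 1,000 matching records costs roughly 40 times as much as a linear scan of a 1,000-block relation. This is the sense in which the secondary-index case can be very expensive.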