《数据库系统概念 Database System Concepts》原书教学资源（第七版，PPT课件讲稿，英文版）Chapter 22 Parallel and Distributed Query Processing

▪ Overview ▪ Parallel Sort ▪ Parallel Join ▪ Other Operations ▪ Parallel Evaluation of Query Plans ▪ Query Processing on Shared Memory ▪ Query Optimization ▪ Distributed Query Processing

团购合买资源类别：文库，文档格式：PPTX，文档页数：76，文件大小：2.24MB

Chapter 22:Parallel And Distributed Query Processing ■Overview ■Parallel Sort ■Parallel Join ■Other Operations Parallel Evaluation of Query Plans Query Processing on Shared Memory Query Optimization Distributed Query Processing Database System Concepts-7th Edition 22.2 @Silberschatz,Korth and Sudarshan

Database System Concepts - 7 22.2 ©Silberschatz, Korth and Sudarshan th Edition Chapter 22: Parallel And Distributed Query Processing ▪ Overview ▪ Parallel Sort ▪ Parallel Join ▪ Other Operations ▪ Parallel Evaluation of Query Plans ▪ Query Processing on Shared Memory ▪ Query Optimization ▪ Distributed Query Processing

Parallel Query Processing Different queries/transactions can be run in parallel with each other. Interquery parallelism Concurrency control takes care of conflicts in case of updates More on parallel transaction processing in Chapter 23 Focus in this chapter is on read-only queries Individual relational operations (e.g.,sort,join,aggregation)can be executed in parallel data can be partitioned and each processor can work independently on its own partition. Queries are expressed in high level language (SQL,translated to relational algebra) makes parallelization easier. Database System Concepts-7th Edition 22.3 ©Silberscha乜，Korth and Sudarshan

Database System Concepts - 7 22.3 ©Silberschatz, Korth and Sudarshan th Edition Parallel Query Processing ▪ Different queries/transactions can be run in parallel with each other. • Interquery parallelism • Concurrency control takes care of conflicts in case of updates • More on parallel transaction processing in Chapter 23 • Focus in this chapter is on read-only queries ▪ Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel • data can be partitioned and each processor can work independently on its own partition. ▪ Queries are expressed in high level language (SQL, translated to relational algebra) • makes parallelization easier

Intraquery Parallelism Intraquery parallelism:execution of a single query in parallel on multiple processors/disks;important for speeding up long-running queries. Two complementary forms of intraquery parallelism: Intraoperation Parallelism-parallelize the execution of each individual operation in the query Supports high degree of parallelism Interoperation Parallelism-execute the different operations in a query expression in parallel. Limited degree of parallelism Database System Concepts-7th Edition 22.4 ©Silberscha乜，Korth and Sudarshan

Database System Concepts - 7 22.4 ©Silberschatz, Korth and Sudarshan th Edition Intraquery Parallelism ▪ Intraquery parallelism: execution of a single query in parallel on multiple processors/disks; important for speeding up long-running queries. ▪ Two complementary forms of intraquery parallelism: • Intraoperation Parallelism– parallelize the execution of each individual operation in the query ▪ Supports high degree of parallelism • Interoperation Parallelism– execute the different operations in a query expression in parallel. ▪ Limited degree of parallelism

Parallel Processing of Relational Operations Our discussion of parallel algorithms assumes: ·read-only queries shared-nothing architecture ·n nodes,N1,,Nn Each assumed to have disks and processors. Initial focus on parallelization to a shared-nothing node Parallel processing within a shared memory/shared disk node discussed later Shared-nothing architectures can be efficiently simulated on shared- memory and shared-disk systems. Algorithms for shared-nothing systems can thus be run on shared-memory and shared-disk systems. However,some optimizations may be possible. Database System Concepts-7th Edition 22.5 ©Silberscha乜，Korth and Sudarshan

Database System Concepts - 7 22.5 ©Silberschatz, Korth and Sudarshan th Edition Parallel Processing of Relational Operations ▪ Our discussion of parallel algorithms assumes: • read-only queries • shared-nothing architecture • n nodes, N1 , ..., Nn ▪ Each assumed to have disks and processors. • Initial focus on parallelization to a shared-nothing node ▪ Parallel processing within a shared memory/shared disk node discussed later • Shared-nothing architectures can be efficiently simulated on sharedmemory and shared-disk systems. ▪ Algorithms for shared-nothing systems can thus be run on shared-memory and shared-disk systems. ▪ However, some optimizations may be possible

Intraoperation Parallelism Database System Concepts -7th Edition 22.6 @Silberschatz,Korth and Sudarshan

Database System Concepts - 7 22.6 ©Silberschatz, Korth and Sudarshan th Edition Intraoperation Parallelism

Range Partitioning Redistribute using partitioning vector: 100,175,300,500,800 0-100) [100-175)[175-300)[300-500)[500-800)[800-1000) Database System Concepts-7th Edition 22.7 @Silberschatz,Korth and Sudarshan

Database System Concepts - 7 22.7 ©Silberschatz, Korth and Sudarshan th Edition Range Partitioning

Range-Partitioning Parallel Sort 1)Redistribute using partitioning vector: 100.175,300,500.800 0-100) [100-175) [175-300)[300-500)[500-800)[800-1000) 2)(External)sort locally at each node 3)Merge if output required at one node Database System Concepts-7th Edition 22.8 @Silberschatz,Korth and Sudarshan

Database System Concepts - 7 22.8 ©Silberschatz, Korth and Sudarshan th Edition Range-Partitioning Parallel Sort

Parallel Sort Local Sort Local Sort Merge r Local Sort Local Sort Merge Local Sort Local Sort Merge : Local Sort Local Sort Merge Im 1.Range Partition 2.Local Sort 1.Local Sort 2.Range Partition and Merge (a)Range Partitioning Sort (b)Parallel External Sort-Merge Database System Concepts-7th Edition 22.9 @Silberschatz,Korth and Sudarshan

Parallel Sort Range-Partitioning Sort ■ Choose nodes N,...,Nmm,where m n-1 to do sorting. Create range-partition vector with m-1 entries,on the sorting attributes Redistribute the relation using range partitioning Each node N sorts its partition of the relation locally. Example of data parallelism:each node executes same operation in parallel with other nodes,without any interaction with the others. Final merge operation is trivial:range-partitioning ensures that, if i<j,all key values in node Nare all less than all key values in N Database System Concepts-7th Edition 22.10 ©Silberscha乜，Korth and Sudarshan

Database System Concepts - 7 22.10 ©Silberschatz, Korth and Sudarshan th Edition Parallel Sort Range-Partitioning Sort ▪ Choose nodes N1 , ..., Nm, where m  n -1 to do sorting. ▪ Create range-partition vector with m-1 entries, on the sorting attributes ▪ Redistribute the relation using range partitioning ▪ Each node Ni sorts its partition of the relation locally. • Example of data parallelism: each node executes same operation in parallel with other nodes, without any interaction with the others. ▪ Final merge operation is trivial: range-partitioning ensures that, if i < j, all key values in node Ni are all less than all key values in Nj

Parallel Sort (Cont.) Parallel External Sort-Merge Assume the relation has already been partitioned among nodes N1,...N (in whatever manner). Each node N locally sorts the data (using local disk as required) The sorted runs on each node are then merged in parallel: The sorted partitions at each node Ni are range-partitioned across the processors N1,...,N. Each node Ni performs a merge on the streams as they are received,to get a single sorted run. The sorted runs on nodes N1,...,N are concatenated to get the final result. Algorithm as described vulnerable to execution skew all nodes send to node 1,then all nodes send data to node 2, Can be modified so each node sends data to all other nodes in parallel (block at a time) Database System Concepts-7th Edition 22.11 ©Silberscha乜，Korth and Sudarshan

Database System Concepts - 7 22.11 ©Silberschatz, Korth and Sudarshan th Edition Parallel Sort (Cont.) Parallel External Sort-Merge ▪ Assume the relation has already been partitioned among nodes N1 , ..., Nn (in whatever manner). ▪ Each node Ni locally sorts the data (using local disk as required) ▪ The sorted runs on each node are then merged in parallel: • The sorted partitions at each node Ni are range-partitioned across the processors N1 , ..., Nn . • Each node Ni performs a merge on the streams as they are received, to get a single sorted run. • The sorted runs on nodes N1 , ..., Nn are concatenated to get the final result. ▪ Algorithm as described vulnerable to execution skew • all nodes send to node 1, then all nodes send data to node 2, … • Can be modified so each node sends data to all other nodes in parallel (block at a time)

点击进入文档下载页（PPTX格式）

共76页，可试读20页，点击继续阅读 ↓↓

点击下载（PPTX格式）

浏览记录