[Figure 1: GFS Architecture. The application issues (file name, chunk index) requests through the GFS client to the GFS master, which holds the file namespace (e.g. /foo/bar) and replies with (chunk handle, chunk locations). The client then exchanges (chunk handle, byte range) requests and chunk data directly with the GFS chunkservers, each of which stores chunks (e.g. chunk 2ef0) as files in a Linux file system. Legend: data messages vs. control messages; the master also sends instructions to chunkservers and receives chunkserver state.]

Having a single master vastly simplifies our design and enables the master to make sophisticated chunk placement and replication decisions using global knowledge. However, we must minimize its involvement in reads and writes so that it does not become a bottleneck. Clients never read and write file data through the master. Instead, a client asks the master which chunkservers it should contact. It caches this information for a limited time and interacts with the chunkservers directly for many subsequent operations.

Let us explain the interactions for a simple read with reference to Figure 1. First, using the fixed chunk size, the client translates the file name and byte offset specified by the application into a chunk index within the file. Then, it sends the master a request containing the file name and chunk index. The master replies with the corresponding chunk handle and locations of the replicas. The client caches this information using the file name and chunk index as the key. The client then sends a request to one of the replicas, most likely the closest one. The request specifies the chunk handle and a byte range within that chunk. Further reads of the same chunk require no more client-master interaction until the cached information expires or the file is reopened. In fact, the client typically asks for multiple chunks in the same request and the master can also include the information for chunks immediately following those requested. This extra information sidesteps several future client-master interactions at practically no extra cost.
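To make this read path concrete, the following Python sketch traces the steps above. It is not GFS's actual client library (which was never published); the lookup RPC, the stub objects, and the cache layout are assumptions for illustration only.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size (see Section 2.5)

class GFSClient:
    """Sketch of the read interaction from Figure 1.

    `master` and the replica objects stand in for RPC stubs; their
    methods (lookup, read) are hypothetical names, not GFS's real API.
    """

    def __init__(self, master):
        self.master = master
        # (file name, chunk index) -> (chunk handle, replica locations)
        self.cache = {}

    def read(self, file_name, offset, length):
        # 1. Translate the application's byte offset into a chunk index.
        chunk_index = offset // CHUNK_SIZE
        key = (file_name, chunk_index)

        # 2. Contact the master only on a cache miss (expiry handling
        #    omitted); later reads of this chunk skip the master.
        if key not in self.cache:
            self.cache[key] = self.master.lookup(file_name, chunk_index)
        handle, locations = self.cache[key]

        # 3. Ask one replica (ideally the closest) for a byte range
        #    within the chunk, identified by its chunk handle.
        start = offset % CHUNK_SIZE
        n = min(length, CHUNK_SIZE - start)  # this sketch does not span chunks
        return locations[0].read(handle, start, n)
```

A fuller client would also batch several chunk indices into one master request and cache the extra entries the master returns for the chunks that follow.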
2.5 Chunk Size

Chunk size is one of the key design parameters. We have chosen 64 MB, which is much larger than typical file system block sizes. Each chunk replica is stored as a plain Linux file on a chunkserver and is extended only as needed. Lazy space allocation avoids wasting space due to internal fragmentation, perhaps the greatest objection against such a large chunk size.

A large chunk size offers several important advantages. First, it reduces clients' need to interact with the master because reads and writes on the same chunk require only one initial request to the master for chunk location information. The reduction is especially significant for our workloads because applications mostly read and write large files sequentially. Even for small random reads, the client can comfortably cache all the chunk location information for a multi-TB working set. Second, since on a large chunk, a client is more likely to perform many operations on a given chunk, it can reduce network overhead by keeping a persistent TCP connection to the chunkserver over an extended period of time. Third, it reduces the size of the metadata stored on the master. This allows us to keep the metadata in memory, which in turn brings other advantages that we will discuss in Section 2.6.1.

On the other hand, a large chunk size, even with lazy space allocation, has its disadvantages. A small file consists of a small number of chunks, perhaps just one. The chunkservers storing those chunks may become hot spots if many clients are accessing the same file. In practice, hot spots have not been a major issue because our applications mostly read large multi-chunk files sequentially.

However, hot spots did develop when GFS was first used by a batch-queue system: an executable was written to GFS as a single-chunk file and then started on hundreds of machines at the same time. The few chunkservers storing this executable were overloaded by hundreds of simultaneous requests. We fixed this problem by storing such executables with a higher replication factor and by making the batch-queue system stagger application start times. A potential long-term solution is to allow clients to read data from other clients in such situations.

2.6 Metadata

The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas. All metadata is kept in the master's memory. The first two types (namespaces and file-to-chunk mapping) are also kept persistent by logging mutations to an operation log stored on the master's local disk and replicated on remote machines. Using a log allows us to update the master state simply, reliably, and without risking inconsistencies in the event of a master crash. The master does not store chunk location information persistently. Instead, it asks each chunkserver about its chunks at master startup and whenever a chunkserver joins the cluster.
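As a rough illustration of these three metadata types, here is how they might be laid out in the master's memory. The paper does not specify the actual structures, so every name and type below is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class MasterMetadata:
    # (1) The file and chunk namespaces and (2) the file-to-chunk
    # mapping. Mutations to these are appended to the operation log
    # (and its remote replicas) before the in-memory state changes.
    file_to_chunks: dict = field(default_factory=dict)   # path -> [chunk handle, ...]

    # (3) Chunk replica locations: deliberately NOT persisted. The
    # master rebuilds this map by asking chunkservers at startup and
    # whenever one joins the cluster.
    chunk_locations: dict = field(default_factory=dict)  # chunk handle -> [chunkserver addr, ...]

    def on_chunkserver_report(self, addr, handles):
        # Each chunkserver is the authority on which chunks it holds.
        for h in handles:
            locs = self.chunk_locations.setdefault(h, [])
            if addr not in locs:
                locs.append(addr)
```

Treating the chunkservers' own reports as the ground truth for replica locations sidesteps the problem of keeping the master consistent with servers that fail, restart, or are renamed.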
2.6.1 In-Memory Data Structures

Since metadata is stored in memory, master operations are fast. Furthermore, it is easy and efficient for the master to periodically scan through its entire state in the background. This periodic scanning is used to implement chunk garbage collection, re-replication in the presence of chunkserver failures, and chunk migration to balance load and disk space usage across chunkservers.
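The sketch below shows what one such background pass might look like. The paper describes these activities only at a high level, so the helper functions and the replication target are placeholders, not GFS's actual logic.

```python
REPLICATION_TARGET = 3  # assumed default number of replicas per chunk

def background_scan(meta, live_chunkservers):
    """One full pass over the master's state (cheap because it is all in RAM)."""
    referenced = {h for handles in meta.file_to_chunks.values() for h in handles}

    for handle, locations in meta.chunk_locations.items():
        # Forget replicas on chunkservers that are no longer alive.
        locations[:] = [cs for cs in locations if cs in live_chunkservers]

        if handle not in referenced:
            schedule_gc(handle)  # chunk unreachable from any file
        elif len(locations) < REPLICATION_TARGET:
            schedule_re_replication(handle, locations)  # replicas lost to failures

    # Migrate chunks to even out load and disk usage across chunkservers.
    schedule_rebalancing(meta.chunk_locations)

# Placeholder hooks; the real scheduling policies are out of scope here.
def schedule_gc(handle): ...
def schedule_re_replication(handle, locations): ...
def schedule_rebalancing(chunk_locations): ...
```

Because the entire state fits in memory, a full scan is cheap enough to run periodically instead of maintaining incremental bookkeeping for each of these tasks.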