Figure 3(a) shows the aggregate read rate for N clients and its theoretical limit. The limit peaks at an aggregate of 125 MB/s when the 1 Gbps link between the two switches is saturated, or 12.5 MB/s per client when its 100 Mbps network interface gets saturated, whichever applies. The observed read rate is 10 MB/s, or 80% of the per-client limit, when just one client is reading. The aggregate read rate reaches 94 MB/s, about 75% of the 125 MB/s link limit, for 16 readers, or 6 MB/s per client. The efficiency drops from 80% to 75% because as the number of readers increases, so does the probability that multiple readers simultaneously read from the same chunkserver.
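The theoretical limit curve in Figure 3(a) is simply the tighter of two bottlenecks: the clients' 100 Mbps network interfaces and the shared 1 Gbps inter-switch link. A minimal sketch of that calculation (the function name and structure are ours, not from the paper):

```python
def aggregate_read_limit_mb_s(n_clients: int,
                              client_nic_mb_s: float = 12.5,      # 100 Mbps NIC per client
                              switch_link_mb_s: float = 125.0     # 1 Gbps inter-switch link
                              ) -> float:
    """Theoretical aggregate read limit: whichever bottleneck saturates first."""
    return min(n_clients * client_nic_mb_s, switch_link_mb_s)

print(aggregate_read_limit_mb_s(1))    # 12.5  (observed: 10 MB/s, ~80% of the limit)
print(aggregate_read_limit_mb_s(16))   # 125.0 (observed: 94 MB/s, ~75% of the limit)
```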
6.1.2 Writes

N clients write simultaneously to N distinct files. Each client writes 1 GB of data to a new file in a series of 1 MB writes. The aggregate write rate and its theoretical limit are shown in Figure 3(b). The limit plateaus at 67 MB/s because we need to write each byte to 3 of the 16 chunkservers, each with a 12.5 MB/s input connection.

The write rate for one client is 6.3 MB/s, about half of the limit. The main culprit for this is our network stack. It does not interact very well with the pipelining scheme we use for pushing data to chunk replicas. Delays in propagating data from one replica to another reduce the overall write rate.

Aggregate write rate reaches 35 MB/s for 16 clients (or 2.2 MB/s per client), about half the theoretical limit. As in the case of reads, it becomes more likely that multiple clients write concurrently to the same chunkserver as the number of clients increases. Moreover, collision is more likely for 16 writers than for 16 readers because each write involves three different replicas.

Writes are slower than we would like. In practice this has not been a major problem because even though it increases the latencies as seen by individual clients, it does not significantly affect the aggregate write bandwidth delivered by the system to a large number of clients.
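The 67 MB/s plateau above follows from dividing the chunkservers' total inbound capacity by the replication factor, since every application byte must land on three chunkservers. A hedged sketch of the write-limit curve, with names of our own choosing:

```python
CHUNKSERVERS = 16
CHUNKSERVER_IN_MB_S = 12.5   # 100 Mbps inbound link per chunkserver
CLIENT_NIC_MB_S = 12.5       # 100 Mbps link per client
REPLICAS = 3                 # each byte is written to three chunkservers

def aggregate_write_limit_mb_s(n_clients: int) -> float:
    """Writes are bounded either by the clients' NICs or by the total
    chunkserver intake divided by the replication factor."""
    chunkserver_bound = CHUNKSERVERS * CHUNKSERVER_IN_MB_S / REPLICAS   # ~66.7 MB/s plateau
    client_bound = n_clients * CLIENT_NIC_MB_S
    return min(client_bound, chunkserver_bound)

print(round(aggregate_write_limit_mb_s(1), 1))    # 12.5 (observed: 6.3 MB/s, about half)
print(round(aggregate_write_limit_mb_s(16), 1))   # 66.7 (observed: 35 MB/s, about half)
```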
6.1.3 Record Appends

Figure 3(c) shows record append performance. N clients append simultaneously to a single file. Performance is limited by the network bandwidth of the chunkservers that store the last chunk of the file, independent of the number of clients. It starts at 6.0 MB/s for one client and drops to 4.8 MB/s for 16 clients, mostly due to congestion and variances in network transfer rates seen by different clients.

Our applications tend to produce multiple such files concurrently. In other words, N clients append to M shared files simultaneously where both N and M are in the dozens or hundreds. Therefore, the chunkserver network congestion in our experiment is not a significant issue in practice because a client can make progress on writing one file while the chunkservers for another file are busy.

6.2 Real World Clusters

We now examine two clusters in use within Google that are representative of several others like them. Cluster A is used regularly for research and development by over a hundred engineers. A typical task is initiated by a human user and runs up to several hours. It reads through a few MBs to a few TBs of data, transforms or analyzes the data, and writes the results back to the cluster. Cluster B is primarily used for production data processing. The tasks last much longer and continuously generate and process multi-TB data sets with only occasional human intervention. In both cases, a single "task" consists of many processes on many machines reading and writing many files simultaneously.

Cluster                      A        B
Chunkservers                 342      227
Available disk space         72 TB    180 TB
Used disk space              55 TB    155 TB
Number of Files              735 k    737 k
Number of Dead files         22 k     232 k
Number of Chunks             992 k    1550 k
Metadata at chunkservers     13 GB    21 GB
Metadata at master           48 MB    60 MB

Table 2: Characteristics of two GFS clusters

6.2.1 Storage

As shown by the first five entries in the table, both clusters have hundreds of chunkservers, support many TBs of disk space, and are fairly but not completely full. "Used space" includes all chunk replicas. Virtually all files are replicated three times. Therefore, the clusters store 18 TB and 52 TB of file data respectively.

The two clusters have similar numbers of files, though B has a larger proportion of dead files, namely files which were deleted or replaced by a new version but whose storage has not yet been reclaimed. It also has more chunks because its files tend to be larger.
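Because "used space" counts every replica, dividing it by the replication factor recovers the file-data figures quoted above. A small illustrative calculation, assuming a uniform replication factor of three:

```python
REPLICATION = 3  # virtually all files are replicated three times

used_space_tb = {"A": 55, "B": 155}   # "Used disk space" row of Table 2
for cluster, used_tb in used_space_tb.items():
    print(cluster, round(used_tb / REPLICATION), "TB of file data")
# A 18 TB of file data
# B 52 TB of file data
```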
6.2.2 Metadata

The chunkservers in aggregate store tens of GBs of metadata, mostly the checksums for 64 KB blocks of user data. The only other metadata kept at the chunkservers is the chunk version number discussed in Section 4.5.

The metadata kept at the master is much smaller, only tens of MBs, or about 100 bytes per file on average. This agrees with our assumption that the size of the master's memory does not limit the system's capacity in practice. Most of the per-file metadata is the file names stored in a prefix-compressed form. Other metadata includes file ownership and permissions, mapping from files to chunks, and each chunk's current version. In addition, for each chunk we store the current replica locations and a reference count for implementing copy-on-write.

Each individual server, both chunkservers and the master, has only 50 to 100 MB of metadata. Therefore recovery is fast: it takes only a few seconds to read this metadata from disk before the server is able to answer queries. However, the master is somewhat hobbled for a period, typically 30 to 60 seconds, until it has fetched chunk location information from all chunkservers.

6.2.3 Read and Write Rates

Table 3 shows read and write rates for various time periods. Both clusters had been up for about one week when these measurements were taken. (The clusters had been restarted recently to upgrade to a new version of GFS.) The average write rate was less than 30 MB/s since the restart. When we took these measurements, B was in the middle of a burst of write activity generating about 100 MB/s of data, which produced a 300 MB/s network load because writes are propagated to three replicas.
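The 3x amplification from application-visible write rate to network load can be stated as a one-liner; the helper below is purely illustrative and not part of GFS itself:

```python
REPLICAS = 3  # each write is propagated to three replicas

def network_load_mb_s(application_write_mb_s: float) -> float:
    """Every byte the application writes crosses the network once per replica."""
    return application_write_mb_s * REPLICAS

print(network_load_mb_s(100))  # 300 MB/s during cluster B's write burst
```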