For example, we have relaxed GFS's consistency model to vastly simplify the file system without imposing an onerous burden on the applications. We have also introduced an atomic append operation so that multiple clients can append concurrently to a file without extra synchronization between them. These will be discussed in more detail later in the paper.

Multiple GFS clusters are currently deployed for different purposes. The largest ones have over 1000 storage nodes, over 300 TB of disk storage, and are heavily accessed by hundreds of clients on distinct machines on a continuous basis.

2. DESIGN OVERVIEW

2.1 Assumptions

In designing a file system for our needs, we have been guided by assumptions that offer both challenges and opportunities. We alluded to some key observations earlier and now lay out our assumptions in more detail.

• The system is built from many inexpensive commodity components that often fail. It must constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.

• The system stores a modest number of large files. We expect a few million files, each typically 100 MB or larger in size. Multi-GB files are the common case and should be managed efficiently. Small files must be supported, but we need not optimize for them.

• The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. In large streaming reads, individual operations typically read hundreds of KBs, more commonly 1 MB or more. Successive operations from the same client often read through a contiguous region of a file. A small random read typically reads a few KBs at some arbitrary offset. Performance-conscious applications often batch and sort their small reads to advance steadily through the file rather than go back and forth.

• The workloads also have many large, sequential writes that append data to files. Typical operation sizes are similar to those for reads. Once written, files are seldom modified again. Small writes at arbitrary positions in a file are supported but do not have to be efficient.

• The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. Our files are often used as producer-consumer queues or for many-way merging. Hundreds of producers, running one per machine, will concurrently append to a file. Atomicity with minimal synchronization overhead is essential. The file may be read later, or a consumer may be reading through the file simultaneously.

• High sustained bandwidth is more important than low latency. Most of our target applications place a premium on processing data in bulk at a high rate, while few have stringent response time requirements for an individual read or write.

2.2 Interface

GFS provides a familiar file system interface, though it does not implement a standard API such as POSIX. Files are organized hierarchically in directories and identified by pathnames. We support the usual operations to create, delete, open, close, read, and write files.

Moreover, GFS has snapshot and record append operations. Snapshot creates a copy of a file or a directory tree at low cost. Record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each individual client's append. It is useful for implementing multi-way merge results and producer-consumer queues that many clients can simultaneously append to without additional locking. We have found these types of files to be invaluable in building large distributed applications. Snapshot and record append are discussed further in Sections 3.4 and 3.3 respectively.
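The paper does not spell out the client library's actual signatures in this section. Purely as an illustration of the interface just described, a client-side API of this shape might look like the following Go sketch; every identifier here (Client, File, RecordAppend, Snapshot, and so on) is a hypothetical stand-in, not the real GFS API.

```go
// Package gfsapi sketches a client-facing interface of the shape described
// in Section 2.2. All identifiers are hypothetical stand-ins; the real GFS
// client library is not specified here.
package gfsapi

// Client is a handle to a GFS cluster.
type Client interface {
	// Usual namespace operations on hierarchical pathnames.
	Create(path string) error
	Delete(path string) error
	Open(path string) (File, error)

	// Snapshot copies a file or directory tree at low cost.
	Snapshot(srcPath, dstPath string) error
}

// File supports reads and writes at explicit offsets, plus record append.
type File interface {
	ReadAt(p []byte, off int64) (n int, err error)
	WriteAt(p []byte, off int64) (n int, err error)

	// RecordAppend atomically appends data at an offset chosen by GFS and
	// returns that offset, so many clients can append to the same file
	// concurrently without additional locking.
	RecordAppend(data []byte) (offset int64, err error)

	Close() error
}
```

With an interface of this shape, a producer-consumer queue reduces to many producers calling RecordAppend on one file while a consumer reads through it sequentially.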
2.3 Architecture

A GFS cluster consists of a single master and multiple chunkservers and is accessed by multiple clients, as shown in Figure 1. Each of these is typically a commodity Linux machine running a user-level server process. It is easy to run both a chunkserver and a client on the same machine, as long as machine resources permit and the lower reliability caused by running possibly flaky application code is acceptable.

Files are divided into fixed-size chunks. Each chunk is identified by an immutable and globally unique 64 bit chunk handle assigned by the master at the time of chunk creation. Chunkservers store chunks on local disks as Linux files and read or write chunk data specified by a chunk handle and byte range. For reliability, each chunk is replicated on multiple chunkservers. By default, we store three replicas, though users can designate different replication levels for different regions of the file namespace.

The master maintains all file system metadata. This includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks. It also controls system-wide activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between chunkservers. The master periodically communicates with each chunkserver in HeartBeat messages to give it instructions and collect its state.
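The paragraphs above pin down a few concrete pieces of state: immutable 64 bit chunk handles assigned by the master, a default of three replicas per chunk, and a master-side mapping from files to chunks and from chunks to their current locations. A minimal sketch of that bookkeeping, with hypothetical type and field names, could look like this:

```go
// Package gfsmeta sketches the metadata implied by Section 2.3. Type and
// field names are hypothetical; only the facts stated in the text (64 bit
// handles, a default of three replicas, the file-to-chunk and
// chunk-to-location maps) are taken from the paper.
package gfsmeta

// ChunkHandle is the immutable, globally unique 64 bit identifier assigned
// by the master when a chunk is created.
type ChunkHandle uint64

// DefaultReplication is the default number of replicas per chunk; users may
// designate other levels for regions of the namespace.
const DefaultReplication = 3

// ChunkInfo is the per-chunk state the master would track.
type ChunkInfo struct {
	Handle   ChunkHandle
	Replicas []string // addresses of chunkservers currently holding a replica
}

// FileMeta describes one file as an ordered list of fixed-size chunks.
type FileMeta struct {
	Chunks []ChunkHandle
}

// MasterMetadata collects the metadata enumerated in the text: the namespace
// (with access control information), the file-to-chunk mapping, and the
// current chunk locations.
type MasterMetadata struct {
	Namespace map[string]FileMeta       // pathname -> file metadata
	Chunks    map[ChunkHandle]ChunkInfo // chunk handle -> replica locations
}
```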
GFS client code linked into each application implements the file system API and communicates with the master and chunkservers to read or write data on behalf of the application. Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunkservers. We do not provide the POSIX API and therefore need not hook into the Linux vnode layer.
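This split (metadata through the master, bulk data directly from chunkservers) implies a simple read path. The sketch below is illustrative only: the RPC interfaces, the dial helper, and the assumed fixed chunk size are hypothetical stand-ins for a protocol this section does not spell out, and the example ignores reads that span a chunk boundary.

```go
// Package gfsread sketches the read path implied by Section 2.3: one
// metadata lookup at the master, then data fetched directly from a
// chunkserver. All interfaces and names here are hypothetical.
package gfsread

// master answers metadata queries.
type master interface {
	// FindChunk maps (pathname, chunk index) to the chunk's handle and the
	// chunkservers currently holding its replicas.
	FindChunk(path string, chunkIndex int64) (handle uint64, replicas []string, err error)
}

// chunkserver serves chunk data by handle and byte range.
type chunkserver interface {
	ReadChunk(handle uint64, offset int64, length int) ([]byte, error)
}

// chunkSize is an assumed fixed chunk size for this sketch; the actual value
// is discussed later in the paper.
const chunkSize = 64 << 20

// read performs a single read that stays within one chunk: one round trip to
// the master for metadata, then the data itself from a chunkserver replica.
func read(m master, dial func(addr string) chunkserver, path string, off int64, length int) ([]byte, error) {
	chunkIndex := off / chunkSize // which chunk covers this byte offset
	handle, replicas, err := m.FindChunk(path, chunkIndex)
	if err != nil {
		return nil, err
	}
	// A real client would cache this metadata and choose a nearby replica;
	// here we simply use the first one.
	cs := dial(replicas[0])
	return cs.ReadChunk(handle, off%chunkSize, length)
}
```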
Neither the client nor the chunkserver caches file data. Client caches offer little benefit because most applications stream through huge files or have working sets too large to be cached. Not having them simplifies the client and the overall system by eliminating cache coherence issues. (Clients do cache metadata, however.) Chunkservers need not cache file data because chunks are stored as local files and so Linux's buffer cache already keeps frequently accessed data in memory.

2.4 Single Master

Having a single master vastly simplifies our design and enables the master to make sophisticated chunk placement