The garbage collection approach to storage reclamation offers several advantages over eager deletion. First, it is simple and reliable in a large-scale distributed system where component failures are common. Chunk creation may succeed on some chunkservers but not others, leaving replicas that the master does not know exist.
Replica deletion messages may be lost, and the master has to remember to resend them across failures, both its own and the chunkserver's. Garbage collection provides a uniform and dependable way to clean up any replicas not known to be useful. Second, it merges storage reclamation into the regular background activities of the master, such as the regular scans of namespaces and handshakes with chunkservers. Thus, it is done in batches and the cost is amortized. Moreover, it is done only when the master is relatively free. The master can respond more promptly to client requests that demand timely attention. Third, the delay in reclaiming storage provides a safety net against accidental, irreversible deletion.

In our experience, the main disadvantage is that the delay sometimes hinders user effort to fine-tune usage when storage is tight. Applications that repeatedly create and delete temporary files may not be able to reuse the storage right away. We address these issues by expediting storage reclamation if a deleted file is explicitly deleted again. We also allow users to apply different replication and reclamation policies to different parts of the namespace. For example, users can specify that all the chunks in the files within some directory tree are to be stored without replication, and any deleted files are immediately and irrevocably removed from the file system state.

4.5 Stale Replica Detection

Chunk replicas may become stale if a chunkserver fails and misses mutations to the chunk while it is down. For each chunk, the master maintains a chunk version number to distinguish between up-to-date and stale replicas.

Whenever the master grants a new lease on a chunk, it increases the chunk version number and informs the up-to-date replicas. The master and these replicas all record the new version number in their persistent state. This occurs before any client is notified and therefore before it can start writing to the chunk.
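The version-number bookkeeping can be sketched as a toy model. This is an illustrative sketch, not GFS code: the class and method names are invented for the example, and durable persistence is reduced to in-memory dictionaries.

```python
class Replica:
    """Toy stand-in for a chunkserver's per-chunk version records."""

    def __init__(self):
        self.versions = {}  # chunk id -> last version this replica recorded

    def record_version(self, chunk_id, version):
        self.versions[chunk_id] = version


class Master:
    """Toy model of the master's chunk version bookkeeping (illustrative only)."""

    def __init__(self):
        self.version = {}   # chunk id -> latest version known to the master
        self.stale = set()  # (chunk id, server name) pairs found to be stale

    def grant_lease(self, chunk_id, live_replicas):
        # Bump the version and record it before any client is told about
        # the lease, so a replica that misses this step is detectably
        # stale later. In real GFS this state is persisted.
        new_version = self.version.get(chunk_id, 0) + 1
        self.version[chunk_id] = new_version
        for replica in live_replicas:
            replica.record_version(chunk_id, new_version)
        return new_version

    def handle_report(self, server_name, reported):
        # Called when a restarting chunkserver reports its chunks and
        # their version numbers.
        for chunk_id, v in reported.items():
            known = self.version.get(chunk_id, 0)
            if v < known:
                # The replica missed mutations while its server was down.
                self.stale.add((chunk_id, server_name))
            elif v > known:
                # The master failed while granting a lease: adopt the
                # higher version as up-to-date.
                self.version[chunk_id] = v
```

A replica that is offline during a lease grant keeps the old version number; when its server restarts and reports, the master marks it stale and later reclaims it in the regular garbage collection pass described below.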
If another replica is currently unavailable, its chunk version number will not be advanced. The master will detect that this chunkserver has a stale replica when the chunkserver restarts and reports its set of chunks and their associated version numbers. If the master sees a version number greater than the one in its records, the master assumes that it failed when granting the lease and so takes the higher version to be up-to-date.

The master removes stale replicas in its regular garbage collection. Before that, it effectively considers a stale replica not to exist at all when it replies to client requests for chunk information. As another safeguard, the master includes the chunk version number when it informs clients which chunkserver holds a lease on a chunk or when it instructs a chunkserver to read the chunk from another chunkserver in a cloning operation. The client or the chunkserver verifies the version number when it performs the operation so that it is always accessing up-to-date data.

5. FAULT TOLERANCE AND DIAGNOSIS

One of our greatest challenges in designing the system is dealing with frequent component failures. The quality and quantity of components together make these problems more the norm than the exception: we cannot completely trust the machines, nor can we completely trust the disks. Component failures can result in an unavailable system or, worse, corrupted data. We discuss how we meet these challenges and the tools we have built into the system to diagnose problems when they inevitably occur.

5.1 High Availability

Among hundreds of servers in a GFS cluster, some are bound to be unavailable at any given time. We keep the overall system highly available with two simple yet effective strategies: fast recovery and replication.

5.1.1 Fast Recovery

Both the master and the chunkserver are designed to restore their state and start in seconds no matter how they terminated.
In fact, we do not distinguish between normal and abnormal termination; servers are routinely shut down just by killing the process. Clients and other servers experience a minor hiccup as they time out on their outstanding requests, reconnect to the restarted server, and retry. Section 6.2.2 reports observed startup times.

5.1.2 Chunk Replication

As discussed earlier, each chunk is replicated on multiple chunkservers on different racks. Users can specify different replication levels for different parts of the file namespace. The default is three. The master clones existing replicas as needed to keep each chunk fully replicated as chunkservers go offline or detect corrupted replicas through checksum verification (see Section 5.2). Although replication has served us well, we are exploring other forms of cross-server redundancy such as parity or erasure codes for our increasing read-only storage requirements. We expect that it is challenging but manageable to implement these more complicated redundancy schemes in our very loosely coupled system because our traffic is dominated by appends and reads rather than small random writes.

5.1.3 Master Replication

The master state is replicated for reliability. Its operation log and checkpoints are replicated on multiple machines. A mutation to the state is considered committed only after its log record has been flushed to disk locally and on all master replicas. For simplicity, one master process remains in charge of all mutations as well as background activities such as garbage collection that change the system internally. When it fails, it can restart almost instantly. If its machine or disk fails, monitoring infrastructure outside GFS starts a new master process elsewhere with the replicated operation log. Clients use only the canonical name of the master (e.g. gfs-test), which is a DNS alias that can be changed if the master is relocated to another machine.
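The commit rule for the operation log can be sketched as follows. The names are hypothetical and the flush call is a stand-in for the real disk write and network replication; the point is only the rule itself: a mutation counts as committed only when its log record is durable locally and on every master replica.

```python
class LogReplica:
    """One copy of the operation log: the local disk or a remote master replica."""

    def __init__(self, available=True):
        self.records = []
        self.available = available

    def flush(self, record):
        # Stand-in for writing the record durably; returns False on failure.
        if not self.available:
            return False
        self.records.append(record)
        return True


def commit_mutation(record, local_log, remote_logs):
    """Return True only if the record is durable on the local log and on
    all remote master replicas. A real system would also retry or run
    recovery for replicas that partially flushed; this sketch just
    reports whether the commit rule was satisfied."""
    results = [r.flush(record) for r in [local_log] + list(remote_logs)]
    if all(results):
        return True   # safe to apply the mutation and reply to the client
    return False      # not committed; the mutation must not take effect
```

Blocking the commit on every replica is what lets a replacement master started from the replicated log see every mutation that any client was ever told had succeeded.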
Moreover, "shadow" masters provide read-only access to the file system even when the primary master is down. They are shadows, not mirrors, in that they may lag the primary slightly, typically fractions of a second. They enhance read availability for files that are not being actively mutated or applications that do not mind getting slightly stale results. In fact, since file content is read from chunkservers, applications do not observe stale file content. What could be