Computer Networks-Project 3 Congestion Control Project3TA:FanJiachen07302010028@fudan.edu.cn ZhuZhu07302010096(@fudan.edu.cn Assigned: May 23, 2011 1 Overview In this assignment, you will implement a BitTorrent-like file transfer application. The application will run on top of UDP, and you will need to implement a reliability and congestion control protocol(similar to TCP) for the application The application will be able to simultaneously download different parts, called"chunks", of a file from different servers. Please remember to read the complete assignment handout more than once so that you know exactly what is being provided and what functionality you are expected to add The project consists of a mandatory component that will be used for grading, and an optional optimi=ation com- ponent. The group/groups whose application performs the fastest file transfers (by selecting good peers from whom to download, while still performing proper congestion control) will receive high score anp.T his is a group project and you MAY find AT MOST partners to work with. Of course, you can complete this project own 1.1 Help Sessions, Checkpoints and Deadlines The timeline for the project is below, including several checkpoints. To help you pace your work, remember that checkpoints represent a date by which you should easily have completed the required functionality. Given the timeline, you can see that this means you should get started now! The late policy is explained on the course website Date Description Assignment handed out. PLEASE START EARLY May 27 Checkpoint 1: WHOHAS flooding and IHAVE responses Checkpoint 2: Simple Chunk Download with stop-and-wait Checkpiont 3: Sliding window flow-control with reliability June 15 Checkpoint 4: Simple Congestion Avoidance, with cwnd= I after any loss June 19 Early bird deadline with 10 bonus points(full functionality) June 26 Late deadline with no penalty(also extra credit) by 23: 59 There are four mandatory checkpoints. Each checkpoint is worth 10 points
Computer Networks - Project 3 Congestion Control Project 3 TA: Fan Jiachen 07302010028@fudan.edu.cn Zhu Zhu 07302010096@fudan.edu.cn Assigned: May 23, 2011 1 Overview In this assignment, you will implement a BitTorrent-like file transfer application. The application will run on top of UDP, and you will need to implement a reliability and congestion control protocol (similar to TCP) for the application. The application will be able to simultaneously download different parts, called ―chunks‖, of a file from different servers. Please remember to read the complete assignment handout more than once so that you know exactly what is being provided and what functionality you are expected to add. The project consists of a mandatory component that will be used for grading, and an optional optimization component. The group/groups whose application performs the fastest file transfers (by selecting good peers from whom to download, while still performing proper congestion control) will receive high score. This is a group project and you MAY find AT MOST 2 partners to work with. Of course, you can complete this project on your own. 1.1 Help Sessions, Checkpoints and Deadlines The timeline for the project is below, including several checkpoints. To help you pace your work, remember that checkpoints represent a date by which you should easily have completed the required functionality. Given the timeline, you can see that this means you should get started now! The late policy is explained on the course website. Date Description May 23 Assignment handed out. PLEASE START EARLY! May 27 Checkpoint 1: WHOHAS flooding and IHAVE responses June 3 Checkpoint 2: Simple Chunk Download with stop-and-wait June 8 Checkpiont 3: Sliding window flow-control with reliability June 15 Checkpoint 4: Simple Congestion Avoidance, with cwnd = 1 after any loss June 19 Early bird deadline with 10 bonus points (full functionality) June 26 Late deadline with no penalty (also extra credit) by 23:59 There are four mandatory checkpoints. Each checkpoint is worth 10 points
Figure 1: Diagram of bittorrent chunking and torrents: Bittorrent takes a large file and breaks it down into separate chunks which can be downloaded from different"peers". Chunks are identified by a"hash-value" which is the result of computing a well-known hash function over the data in the chunk. When a client wants to download a file, it first grabs a"torrent"file, which contains all of the hash values for the desired data file. The torrent lets the client know what chunks to request from other peers in the network 2 Where to get help a big part of being a good programmer is learning how to be reso urceful during the development process. The first places to look for help are(1)carefully re-reading the assignment, (2) looking at the project 3 website for updates and the FAQ, (3)scanning previous bulletin board posts, and (4) googling any standard compiler or script error mes- ages. If you still have a question AFTER doing this, general questions should be posted to the class bulletin board, FDU SOFTWARE, we are happy to help. Welcome to the Multimedia and Networking Lab(Software Building Room108) to discuss the problems of this project 3 Project Outline During the course of this project, you will do the following Implement a BitTorrent-like protocol to search for peers and download/upload file parts. Implement flow control and congestion control mechanisms to ensure fair and efficient network utilization. Implement smart optimizations to get the best possible transfer time(extra credit) 4 Proiect specification 4.1 Background This project is loosely based on the BitTorrent Peer-to-Peer(P2P)file transfer protocol. In a traditional file transfer application, the client knows which server has the file, and sends a request to that specific server for the given file. In many P2P file transfer applications, the actual location of the file is unknown, and the file may be present at multiple locations. The client first sends a query to discover which of its many peers have the file it wants, and then retrieves the file from one or more of these peers While p2P services had already become commonplace, Bit Torrent introduced some new concepts that made it ally popular. Firstly BitTorrent splits the file into different"chunks". Each chunk can be downloaded independently of the others, and then the entire collection of chunks is reassembled into the file. In this assignment, you will be using a fixed-size chunk of 512 Kbytes Bit Torrent uses a central"tracker"that tracks which peers have which chunks of a file. a client begins a download by first obtaining a"torrent"file, which lists the information about each chunk of the file. a chunk is identified by the
Original File Chunks Hash( ) ".torrent" = Figure 1: Diagram of bittorrent chunking and torrents: Bittorrent takes a large file and breaks it down into separate chunks which can be downloaded from different ―peers‖. Chunks are identified by a ―hash-value‖, which is the result of computing a well-known hash function over the data in the chunk. When a client wants to download a file, it first grabs a ―torrent‖ file, which contains all of the hash values for the desired data file. The torrent lets the client know what chunks to request from other peers in the network. 2 Where to get help A big part of being a good programmer is learning how to be reso urceful during the development process. The first places to look for help are (1) carefully re-reading the assignment, (2) looking at the project 3 website for updates and the FAQ, (3) scanning previous bulletin board posts, and (4) googling any standard compiler or script error messages. If you still have a question AFTER doing this, general questions should be posted to the class bulletin board, FDU_SOFTWARE, we are happy to help.Welcome to the Multimedia and Networking Lab(Software Building Room108) to discuss the problems of this project. 3 Project Outline During the course of this project, you will do the following: • Implement a BitTorrent-like protocol to search for peers and download/upload file parts. • Implement flow control and congestion control mechanisms to ensure fair and efficient network utilization. • Implement smart optimizations to get the best possible transfer time (extra credit). 4 Project specification 4.1 Background This project is loosely based on the BitTorrent Peer-to-Peer (P2P) file transfer protocol. In a traditional file transfer application, the client knows which server has the file, and sends a request to that specific server for the given file. In many P2P file transfer applications, the actual location of the file is unknown, and the file may be present at multiple locations. The client first sends a query to discover which of its many peers have the file it wants, and then retrieves the file from one or more of these peers. While P2P services had already become commonplace, BitTorrent introduced some new concepts that made it really popular. Firstly BitTorrent splits the file into different ―chunks‖. Each chunk can be downloaded independently of the others, and then the entire collection of chunks is reassembled into the file. In this assignment, you will be using a fixed-size chunk of 512 Kbytes. BitTorrent uses a central ―tracker‖ that tracks which peers have which chunks of a file. A client begins a download by first obtaining a ―.torrent‖ file, which lists the information about each chunk of the file. A chunk is identified by the 2
cryptographic hash of its contents; after a client has downloaded a chunk, it must compute the cryptographic hash to determine whether it obtained the right chunk or not. See Figure 1 To download a particular chunk, the receiving peer obtains from the tracker a list of peers that contain the chunk, and then directly contacts one of those peers to begin the download. BitTorrent uses a "rarest-chunk-first heuristic where it tries to fetch the rarest chunk first. The peer can download/upload four different chunks in parallel YoucanreadmoreabouttheBittorrenTprotocoldetailsfromhttp://www.bittorrent.org/beps/bep\ 0003.html. Bram Cohen, its originator also wrote a paper on the design decisions behind Bit Torrent. The paper is availableathttp://www.bittorrent.org/bittorrentecon.p This project departs from real Bit Torrent in several ways Instead of implementing a tracker server, your peers will flood the network to find which peers have which chunks of a file. Each peer will know the identities of every other peer in the network; you do not have to o simplify set-up and testing, all file data is actually accessed from a single master data file". Peers are igured with a file to tell them what chunks from this file they"own"upon startup You do not have to implement BitTorrent 's incentive based mechanism to encourage good uploaders and dis- courage bad ones But the project adds one complexity: BitTorrent obtains chunks using TCP. Your application will obtain them using UDP, and you will have to implement congestion control and reliability. It is a good idea to review congestion control concepts, particularly TCP, from both lecture and the textbook(Peterson Davie Section 6.3) 4.2 Programming guidelines Your peer must be written in the C programming language, no C++ or STL is allowed. You must use UDP for all the communication for control and data transfer. Your code must compile and run correctly on andrew linux machines Refer to slides from past recitations on designing modular code, editing makefiles, using subversion, and debugging As with project 1, your implementation should be single-threaded For network programming, you are not allowed to use any custom socket classes, only the standard libsocket and csapp libraries. We will provide a hashing library, and you may use public code for basic data structures, but not any code performing higher-level functionality. These guidelines are similar to project 1, except that you may freely use any code from your project (even if you switched partners). However, all code you do not freshly write for this assignment must be clearly documented in the REAdMe 4.3 Provided files Your starter code includes: hups im pl- This file emulates a network topology using topo map(see Section 7 sha. [ch]-The SHA-1 hash generator input buffer. [ch]-Handle user input debug. [ch]-helpful utilities for debugging output bt-parse. [ch]-utilities for parsing commandline arguments peer c-A skeleton peer file. Handles some of the setup and processing for you nodes. map-provides the list of peers in the network topo. map-the hidden network topology used by hupsim pl. This should be interpreted only by the hupsim. pl, our code should not read this file. You may need to modify this file when using hupsim pl to test the conges avoidance part of your program
cryptographic hash of its contents; after a client has downloaded a chunk, it must compute the cryptographic hash to determine whether it obtained the right chunk or not. See Figure 1. To download a particular chunk, the receiving peer obtains from the tracker a list of peers that contain the chunk, and then directly contacts one of those peers to begin the download. BitTorrent uses a ―rarest-chunk-first‖ heuristic where it tries to fetch the rarest chunk first. The peer can download/upload four different chunks in parallel. You can read more about the BitTorrent protocol details from http://www.bittorrent.org/beps/bep\ 0003.html. Bram Cohen, its originator also wrote a paper on the design decisions behind BitTorrent. The paper is available at http://www.bittorrent.org/bittorrentecon.pdf. This project departs from real BitTorrent in several ways: • Instead of implementing a tracker server, your peers will flood the network to find which peers have which chunks of a file. Each peer will know the identities of every other peer in the network; you do not have to implement routing. • To simplify set-up and testing, all file data is actually accessed from a single ―master data file‖. Peers are configured with a file to tell them what chunks from this file they ―own‖ upon startup. • You do not have to implement BitTorrent‘s incentive based mechanism to encourage good uploaders and discourage bad ones. But the project adds one complexity: BitTorrent obtains chunks using TCP. Your application will obtain them using UDP, and you will have to implement congestion control and reliability. It is a good idea to review congestion control concepts, particularly TCP, from both lecture and the textbook (Peterson & Davie Section 6.3). 4.2 Programming Guidelines Your peer must be written in the C programming language, no C++ or STL is allowed. You must use UDP for all the communication for control and data transfer. Your code must compile and run correctly on andrew linux machines. Refer to slides from past recitations on designing modular code, editing makefiles, using subversion, and debugging. As with project 1, your implementation should be single-threaded. For network programming, you are not allowed to use any custom socket classes, only the standard libsocket and csapp libraries. We will provide a hashing library, and you may use public code for basic data structures, but not any code performing higher-level functionality. These guidelines are similar to project 1, except that you may freely use any code from your project1 (even if you switched partners). However, all code you do not freshly write for this assignment must be clearly documented in the README. 4.3 Provided Files Your starter code includes: • hupsim.pl - This file emulates a network topology using topo.map (see Section 7) • sha.[ch] - The SHA-1 hash generator • input buffer.[ch] - Handle user input • debug.[ch] - helpful utilities for debugging output • bt parse.[ch] - utilities for parsing commandline arguments. • peer.c - A skeleton peer file. Handles some of the setup and processing for you. • nodes.map - provides the list of peers in the network • topo.map - the hidden network topology used by hupsim.pl. This should be interpreted only by the hupsim.pl, your code should not read this file. You may need to modify this file when using hupsim.pl to test the congestion avoidance part of your program. 3
make-chunks-program to create new chunk files given an input file that contains chunk-id, hash pairs, useful for creating more larger flle download scenarios 4.4 Terminology master-data-file- The input file that contains alL the data in the network, All nodes will have access to this file, but a peer should only read the chunks that it"owns".A peer owns a chunk if the chunk id and hash was listed in that peer's has-chunk-file, or if the peer has already downloaded the chunk since starting up. The second case only applies if you choose to implement caching as extra credit master-chunk-file-A file that lists the chunk IDs and corresponding hashes for the chunks in the master data peer-list-file-A file containing list of all the peers in the network. For a sample of the peer-list-file, please look at nodes. map has-chunk-file- A per-node file containing list of chunks that a particular node has at startup. However, a peers will have access to more chunks as they download the chunks from other peers in the network get-chunk-file- A file containing the list of chunk ids and hashes a peer wants to download. This filename is provided by the user when requesting a new download max-downloads- The maximum number of simultaneous connections allowed in each direction( download peer-identity- The identity of the current peer. This should be used by the peer to get its hostname and port from peer-list-file debug- level The level of debug statements that should be printed out by DPRINtFo. For more information, please look at debug. [h, c] 4.5 How the file transfer works The code you write should produce an executable file named"peer". The command line options for the program are eer -p -m i -f -d ain The peer program listens on standard input for commands from the user. The only command is "GET . This instruction from the user should cause your program to open the specified chunks file and attempt to download all of the chunks listed in it (you can assume the file names contain no spaces). When your program finishes downloading the specified file, it should print"GOT "on a line by itself. You do not have to handle multiple concurrent file requests from the user. Our test code will not send another GEt command until the first has completed; you re welcome to do whatever you want internally. The format of different files are given in Section 4.7 To find hosts to download from, the requesting peer sends a"WHOHAS request to all other peers, where is the list of chunk hashes it wants to download. The list specifies the SHA-1 hashes of the chunks it to retrieve. The entire list may be too large to fit into a single UDP packet. You should assume the maximum size for UDP as 1500 bytes. The peer must split the list into multiple Whohas queries if the list is too large single packet. Chunk hashes have a fixed length of 20 bytes. If the file is too large, your client may send out the get equests iteratively, waiting for responses to a gET request's chunks to be downloaded before continuing. For better performance, your client should send these requests in parallel Upon receipt of a WHOHAs query, a peer sends back the list of chunks it contains using the" IHAVE reply. The list again contains the list of hashes for chunks it has. Since the request was made to fit into one packet, the response is guaranteed to fit into a single packet
• make-chunks - program to create new chunk files given an input file that contains chunk-id, hash pairs, useful for creating more larger file download scenarios. 4.4 Terminology • master-data-file - The input file that contains ALL the data in the network. All nodes will have access to this file, but a peer should only read the chunks that it ―owns‖. A peer owns a chunk if the chunk id and hash was listed in that peer‘s has-chunk-file, or if the peer has already downloaded the chunk since starting up. The second case only applies if you choose to implement caching as extra credit. • master-chunk-file - A file that lists the chunk IDs and corresponding hashes for the chunks in the master data file. • peer-list-file - A file containing list of all the peers in the network. For a sample of the peer-list-file, please look at nodes.map. • has-chunk-file - A per-node file containing list of chunks that a particular node has at startup. However, a peers will have access to more chunks as they download the chunks from other peers in the network. • get-chunk-file - A file containing the list of chunk ids and hashes a peer wants to download. This filename is provided by the user when requesting a new download. • max-downloads - The maximum number of simultaneous connections allowed in each direction (download / upload) • peer-identity - The identity of the current peer. This should be used by the peer to get its hostname and port from peer-list-file • debug-level - The level of debug statements that should be printed out by DPRINTF(). For more information, please look at debug.[h,c]. 4.5 How the file transfer works The code you write should produce an executable file named ―peer‖. The command line options for the program are : peer -p -c -m -i -f -d The peer program listens on standard input for commands from the user. The only command is ―GET ‖. This instruction from the user should cause your program t o open the specified chunks file and attempt to download all of the chunks listed in it (you can assume the file names contain no spaces). When your program finishes downloading the specified file, it should print ―GOT ‖ on a line by itself. You do not have to handle multiple concurrent file requests from the user. Our test code will not send another GET command until the first has completed; you‘re welcome to do whatever you want internally. The format of different files are given in Section 4.7. To find hosts to download from, the requesting peer sends a ―WHOHAS ‖ request to all other peers, where is the list of chunk hashes it wants to download. The list specifies the SHA-1 hashes of the chunks it wants to retrieve. The entire list may be too large to fit into a single UDP packet. You should assume the maximum packet size for UDP as 1500 bytes. The peer must split the list into multiple WHOHAS queries if the list is too large for a single packet. Chunk hashes have a fixed length of 20 bytes. If the file is too large, your client may send out the GET requests iteratively, waiting for responses to a GET request‘s chunks to be downloaded before continuing. For better performance, your client should send these requests in parallel. Upon receipt of a WHOHAS query, a peer sends back the list of chunks it contains using the ―IHAVE ‖ reply. The list again contains the list of hashes for chunks it has. Since the request was made to fit into one packet, the response is guaranteed to fit into a single packet. 4
The requesting peer looks at all IHAVE replies and decides which remote peer to fetch each of the chunks from. It then downloads each chunk individually using"GET "requests. Because you are using UDP, you can think of a"GET request as combining the function of an application-layer"GET request and a the connection-setup function of a TCP SYN packet When a peer receives a gET request for a chunk it owns, it will send back multiple"DATA packets requesting peer(see format below)until the chunk specified in the gEt request has been completely tran These DATA packets are subject to congestion control, as outlined in Section 6. 2. The peer may not be able to satisfy the get request if it is already serving maximum number of other peers. The peer can ignore the request or queue them up or notify the requester about its inability to serve the particular request. Sending this notification is optional, and uses the DENIED code. Each peer can only have I simultaneous download from any other peer in the network, meaning that the IP address and port in the UdP packet will uniquely determine which download a Data packet belongs to. Each peer can however have parallel downloads(one each) from other peers When a peer receives a data packet it sends back an ACK packet to the sender to notify that it successfully eceived the packet Receivers should acknowledge all DATA packets 4.6 Packet formats All the communication between the peers use UDP as the underlying protocol. All packets begin with a common gic Number [2 bytes] sion Number [I byte] 3. Packet Type [I byte] 4. Header Length [2 bytes 5. Total Packet Length [2 bytes 6. Sequence Number [4 bytes] 7. Acknowledgment Number[4 bytes] Just like in the previous assignment, all multi-byte integer fields must be transmitted in network byte order(the magic number, the lengths, and the sequence/acknowledgment numbers ). Also, all integers must be unsigned The magic number should be 15441, and the version number should be 1. Peers should drop packets that do not have these values. The"Packet Type"field determines what kind of payload the peer should expect. The codes for different packet types are given in Table 1. By changing the header length, the peers can provide custom optimizations for all the packets (if you choose). Sequence number and Acknowledgment number are used for congestion control mechanisms similar to tcp as well as reliable transmission If you extend the header length, please begin your extended header with a two-byte extension ID" field set to your groups number, to ensure that you can interoperate cleanly with other people's clients Similarly, if your peer receives an extended header and the extension Id does not match your group number, just ignore the extensions The payload for both WHOHAS and IHAvE contain the number of chunk hashes(I byte), 3 bytes of empty padding space to keep the chunk 32-bit aligned, and the list of hashes(20 bytes each) in them. The format of the packet is shown in Figure 2(b). The payload of geT packet is even more simple: it contains only the chunk hash for the chunk the client wants to fetch(20 bytes) Figure 2(c)shows an example DATa packet. DATA and ACK packets do not have any payload format defined mally they should just contain file data. The sequence number and acknowledgment number fields in the header have meaning only in DATA and ACK packets. In this project the sequence numbers always start from I for a ne GET connection. A receiving peer should send an ACK packet with acknowledgment number 1 to acknowledge that is has received the data packet with sequence number I and so on. Even though there are both a sequence number and an acknowledgment number fields in the header, you should not combine DaTA and ACK packets Do not
The requesting peer looks at all IHAVE replies and decides which remote peer to fetch each of the chunks from. It then downloads each chunk individually using ―GET ‖ requests. Because you are using UDP, you can think of a ―GET‖ request as combining the function of an application-layer ―GET‖ request and a the connection-setup function of a TCP SYN packet. When a peer receives a GET request for a chunk it owns, it will send back multiple ―DATA‖ packets to the requesting peer (see format below) until the chunk specified in the GET request has been completely transferred. These DATA packets are subject to congestion control, as outlined in Section 6.2. The peer may not be able to satisfy the GET request if it is already serving maximum number of other peers. The peer can ignore the request or queue them up or notify the requester about its inability to serve the particular request. Sending this notification is optional, and uses the DENIED code. Each peer can only have 1 simultaneous download from any other peer in the network, meaning that the IP address and port in the UDP packet will uniquely determine which download a DATA packet belongs to. Each peer can however have parallel downloads (one each) from other peers. When a peer receives a DATA packet it sends back an ACK packet to the sender to notify that it successfully received the packet. Receivers should acknowledge all DATA packets. 4.6 Packet Formats All the communication between the peers use UDP as the underlying protocol. All packets begin with a common header: 1. Magic Number [2 bytes] 2. Version Number [1 byte] 3. Packet Type [1 byte] 4. Header Length [2 bytes] 5. Total Packet Length [2 bytes] 6. Sequence Number [4 bytes] 7. Acknowledgment Number [4 bytes] Just like in the previous assignment, all multi-byte integer fields must be transmitted in network byte order (the magic number, the lengths, and the sequence/acknowledgment numbers). Also, all integers must be unsigned. The magic number should be 15441, and the version number should be 1. Peers should drop packets that do not have these values. The ―Packet Type‖ field determines what kind of payload the peer should expect. The codes for different packet types are given in Table 1. By changing the header length, the peers can provide custom optimizations for all the packets (if you choose). Sequence number and Acknowledgment number are used for congestion control mechanisms similar to TCP as well as reliable transmission. If you extend the header length, please begin your extended header with a two-byte ―extension ID‖ field set to your group‘s number, to ensure that you can interoperate cleanly with other people‘s clients. Similarly, if your peer receives an extended header and the extension ID does not match your group number, just ignore the extensions. The payload for both WHOHAS and IHAVE contain the number of chunk hashes (1 byte), 3 bytes of empty padding space to keep the chunk 32-bit aligned, and the list of hashes (20 bytes each) in them. The format of the packet is shown in Figure 2(b). The payload of GET packet is even more simple: it contains only the chunk hash for the chunk the client wants to fetch (20 bytes). Figure 2(c) shows an example DATA packet. DATA and ACK packets do not have any payload format defined; normally they should just contain file data. The sequence number and acknowledgment number fields in the header have meaning only in DATA and ACK packets. In this project the sequence numbers always start from 1 for a new ―GET connection‖. A receiving peer should send an ACK packet with acknowledgment number 1 to acknowledge that is has received the data packet with sequence number 1 and so on. Even though there are both a sequence number and an acknowledgment number fields in the header, you should not combine DATA and ACK packets. Do not use a 5
Packet TypeCode IHAVE DATA 2345 DENIED Table 1: Codes for different packet types 24 4 bytes invalid valid padding Header Len Packet Len Chunk Hash #1(20 bytes) Seq Num Chunk Hash #2 (20 bytes) Ack Num (a)The basic packet header, with each (b) A full WHOHAS request with two (c)A hunk hashes in the request. Note that number of data. both seq num and ack num have no Note that meaning in this packet. Figure 2: Packet headers
15441 1 0 16 60 invalid invalid 2 padding Chunk Hash #1 (20 bytes) Chunk Hash #2 (20 bytes) 15441 1 3 16 1016 24 invalid Chunk Data (1000 bytes) Packet Type Code WHOHAS IHAVE GET DATA ACK DENIED 0 1 2 3 4 5 Table 1: Codes for different packet types. 4 bytes 4 bytes 4 bytes Magic Version Type Header Len Packet Len Seq Num Ack Num (a) The basic packet header, with each header field named. (b) A full WHOHAS request with two Chunk hashes in the request. Note that both seq num and ack num have no meaning in this packet. (c) A full DATA packet, with seq number 24 and 1000 bytes of data. Note that the ack num has no meaning because data-flowis one-way. Figure 2: Packet headers. 6
DATa packet to acknowledge a previous packet and do not send data in a ack packet. This means that for any daTA packet the ACK num will be invalid and for any ACK packet the seQ num field will be invalid. Invalid fields still take up space in the packet header, but their value should be ignored by the peer receiving the packet 4.7 File formats Chunks file: File: Chunks id chunk-hash The master-chunks-file has above format. The first line specifies the file that needs to be shared among the peers The peer should only read the chunks it is provided with in the peer's has-chunks-file parameter. All the chunks have a fixed size of 512KB. If the file size is not a multiple of 5 B then it will be padded appropriately All lines after"Chunks: contain chunk ids and the corresponding hash value of the chunk. The hash is the SHA-1 hash of the chunk, represented as a hexadecimal number(it will not have a starting Ox). The chunk id is a decimal integer, specifying the offset of the chunk in the master data file. If the chunk id is i, then the chunks content starts at an offset of i x 512k bytes into the master data file Has chunk file This file contains a list of the ids and hashes of the chunks a particular peer has, As in the master chunk file, the ids are in decimal format and hashes are in hexadecimal format For the same chunk, the id of the chunk in the has-chunk-fi will be the same as the id of that chunk in the master-chunks -file id chunk-hash Get Chunk File The format of the file is exactly same as the has-chunk-file. It contains a list of the ids and hashes the peer wishes to download. As in the master chunk file. the ids in decimal format and hashes are in hexadecimal format. For the same chunk of data, the id in the get-chunk-file might not be the same as the id of that chunk in the master-chunks-file Rather, the id here refers to the position of the chunk in the file that the user wants to save to id chunk-hash Peer list file This file contains the list of all peers in the network. The format of each line The id is a decimal number, peer-address the IP address in dotted decimal format, and the port is port integer in decimal. It will be easiest to just run all hosts on different localhost ports
DATA packet to acknowledge a previous packet and do not send data in a ACK packet. This means that for any DATA packet the ACK num will be invalid and for any ACK packet the SEQ num field will be invalid. Invalid fields still take up space in the packet header, but their value should be ignored by the peer receiving the packet. 4.7 File Formats Chunks File: File: Chunks: id chunk-hash ..... ..... The master-chunks-file has above format. The first line specifies the file that needs to be shared among the peers. The peer should only read the chunks it is provided with in the peer‘s has-chunks-file parameter. All the chunks have a fixed size of 512KB. If the file size is not a multiple of 512KB then it will be padded appropriately. All lines after ―Chunks:‖ contain chunk ids and the corresponding hash value of the chunk. The hash is the SHA-1 hash of the chunk, represented as a hexadecimal number (it will not have a starting ―0x‖). The chunk id is a decimal integer, specifying the offset of the chunk in the master data file. If the chunk id is i, then the chunk‘s content starts at an offset of i ×512k bytes into the master data file. Has Chunk File This file contains a list of the ids and hashes of the chunks a particular peer has. As in the master chunk file, the ids are in decimal format and hashes are in hexadecimal format. For the same chunk, the id of the chunk in the has-chunk-file will be the same as the id of that chunk in the master-chunks-file. id chunk-hash id chunk-hash ..... Get Chunk File The format of the file is exactly same as the has-chunk-file. It contains a list of the ids and hashes the peer wishes to download. As in the master chunk file, the ids in decimal format and hashes are in hexadecimal format. For the same chunk of data, the id in the get-chunk-file might NOT be the same as the id of that chunk in the master-chunks-file. Rather, the id here refers to the position of the chunk in the file that the user wants to save to. id chunk-hash id chunk-hash ..... Peer List File This file contains the list of all peers in the network. The format of each line is: The id is a decimal number, peer-address the IP address in dotted decimal format, and the port is port integer in decimal. It will be easiest to just run all hosts on different localhost ports. 7
5 Example Assume you have two images A gif and B. gif you want to share. These two files are available in the example subdirectory of the code. We strongly suggest that you walk through these steps as you read them in order to get a better understanding of what each file contains(the hash values in this document are not the actual hash values, to mprove readability) First, create two files whose sizes are multiple of 512K, using tar cf- A gif I dd of=/tmp/A tar bs=512K conv=sync count=2 tar cf -B. gif i dd of=/tmp/B tar bs=512K conv=sync count=2 With padding, A tar and B tar are exactly IMB big(ie: 2 chunks long) Lets run two nodes, one on port 1111 and one on port 2222 Suppose that the SHA-I hash of the first 512KB of A tar is OxDE and the second 512KB is OxAD. Similarly, for B tar the 0-512KB chunk hash is ox 15 and the 512KB-I MB chunk hash is 0x44 1 First. do the follow cat /tmp/A tar /tmp/B tar >/tmp/ctar make-chunks /tmp/c tar >/tmp/c chunks make-chunks /tmp/A tar >/tmp/A chunks make-chunks /tmp/B tar >/tmp/B chunk This will create the master data file at /tmp/C. tar. The contents of C chunks will be 000000000000000000000000000000000000000de 10000000000000000000000000000000000000 20000000000000000000000000000000000000015 30000000000000000000000000000000000000441 Recall that ids are in decimal format, while the hash is in hexadecimal. The contents of A chunks will be 100000000000000000000000000000000000000ad The contents of b chunks will be 00000000000000000000000000000000000000015 10000000000000000000000000000000000000441 Next. edit the c chunks file to add two lines and save this as c. masterchunks File: /tmp/ctar chunks 000000000000000000000000000000000000000de 100000000000000000000000000000000000000ad 000000000000000000000000000000000015 30000000000000000000000000000000000000441 lext create a peer file called/tmp/nodes.map It should contain 1127.0.0.11111 Finally, you need to create files that describe the initial content of each node. Let node I have all of file A tar and none of file b tar. Let node 2 have all of file B tar and none of A tar Create a file/tmp/A. haschunks whose contents are
5 Example Assume you have two images A.gif and B.gif you want to share. These two files are available in the ‗example‘ subdirectory of the code. We strongly suggest that you walk through these steps as you read them in order to get a better understanding of what each file contains (the hash values in this document are not the actual hash values, to improve readability). First, create two files whose sizes are multiple of 512K, using: tar cf - A.gif | dd of=/tmp/A.tar bs=512K conv=sync count=2 tar cf - B.gif | dd of=/tmp/B.tar bs=512K conv=sync count=2 With padding, A.tar and B.tar are exactly 1MB big (ie: 2 chunks long). Let‘s run two nodes, one on port 1111 and one on port 2222 Suppose that the SHA-1 hash of the first 512KB of A.tar is 0xDE and the second 512KB is 0xAD. Similarly, for B.tar the 0-512KB chunk hash is 0x15 and the 512KB-1MB chunk hash is 0x441. First, do the following: cat /tmp/A.tar /tmp/B.tar > /tmp/C.tar make-chunks /tmp/C.tar > /tmp/C.chunks make-chunks /tmp/A.tar > /tmp/A.chunks make-chunks /tmp/B.tar > /tmp/B.chunks This will create the master data file at /tmp/C.tar. The contents of C.chunks will be: 0 00000000000000000000000000000000000000de 1 00000000000000000000000000000000000000ad 2 0000000000000000000000000000000000000015 3 0000000000000000000000000000000000000441 Recall that ids are in decimal format, while the hash is in hexadecimal. The contents of A.chunks will be: 0 00000000000000000000000000000000000000de 1 00000000000000000000000000000000000000ad The contents of B.chunks will be: 0 0000000000000000000000000000000000000015 1 0000000000000000000000000000000000000441 Next, edit the C.chunks file to add two lines and save this as C.masterchunks: File: /tmp/C.tar Chunks: 0 00000000000000000000000000000000000000de 1 00000000000000000000000000000000000000ad 2 0000000000000000000000000000000000000015 3 0000000000000000000000000000000000000441 Next create a peer file called /tmp/nodes.map It should contain 1 127.0.0.1 1111 2 127.0.0.1 2222 Finally, you need to create filesthat describe the initial content of each node. Let node 1 have all of file A.tar and none of file B.tar. Let node 2 have all of file B.tar and none of A.tar. Create a file /tmp/A.haschunks whose contents are: 8
000000000000000000000000000000000000000de 10000000000000000000000000000000000000ad Create a file/tmp/B. haschunks whose contents ar 3000 000015 000441 Note that the ids in the above two files are obtained from C masterchunks which in turn refers to the offset in the ster data file Now, to run node 1, type peer -p /tmp/nodes. map -c /tmp/A. haschunks -f /tmp/C. masterchunks -m 4 -i 1 and to run node 2, type in a different terminal peer -p/tmp/nodes. map -c /tmp/B. haschunks -f /tmp/C. masterchunks -m 4 -i 2 After the peer for node I starts, you can type GET /tmp/B chunks /tmp/newB. tar. This command tell your peer to fetch all chunks listed in/tmp/B chunks and save the downloaded data chunks to the file/tmp/newB.tar ordered by the id values in/tmp/B chunks Here is an example of what your code should to do(note that messages are displayed here in plain text, but the actual packet content will be binary). Node I should send a WHOHAS 2 0000...015 0000.00441'(for the 2 chunks that are named o0.15 and 00. 1)to all the peers in nodes. map. It will get one IHAVE reply from node 2 that has \IHAVE 2 0000...015 0000..00441' Node I should then send a message to Node 2 saying AGET 0000...015 Node 2 starts sending Data packets as limited by flow/congestion control and Node sends ACk packets as it gets them. After the get completes(i.e. 512KB has been transferred), Node l should then send a message to Node 2 saying ' GET 0000.. 00441'/ and should perform this transfer as well At the end, you should have new file called /tmp/newB. tar. To make sure you got it right, you can compare this file with/tmp/B tar to make sure they are identical(use the unix"diff utility 9 In summary, there are basically three chunk description formats(get-chunks, has-chunks and master-chunks) and eer list format 6 Project Tasks This section details the requirements of the assignment. This high-level outline roughly mirrors the order in which you should implement functionality 6.1 Task 1-100% Reliability sliding window The first task is to implement a 100% reliable protocol for file transfer (ie: DATA packets) between two peers with a simple flow-control protocol. Non-Data traffic(WHOHAS, IHAVE, GET packets) does not have to be transmitted reliably or with flow-control. The peer should be able to search the network for available chunks and download them from the peers that have them. All different parts of the file should be collected at the requesting peer and their validity should be ensured before considering the chunks as received. You can check the validity of a downloaded chunk by computing its SHA-I hash and comparing it against the specified chunk hash To start thethe project, use a fixed-size window of 8 packets. The sender should not send packets that fall out of gets an ACK for a higher packet number. There is a sequence number a s e sender slides the window forward when it the window. The figure 3 shows the sliding windows for both sides. TI iated with each packet and the following constraints are valid for the sender(hint: your peers will likely want to keep state very similar to that shown here) Sending side LastPacket Acked< LastPacketSent Last PacketSent< Last PacketAvailable Note that TCP uses a byte-based sliding window, but your project will use a packet-based sliding window, It's a bit simpler to do it by packet Also, unlike TCP, you only have a sender window, meaning that window size does not need to be communicated in the packet header
0 00000000000000000000000000000000000000de 1 00000000000000000000000000000000000000ad Create a file /tmp/B.haschunks whose contents are: 2 0000000000000000000000000000000000000015 3 0000000000000000000000000000000000000441 Note that the ids in the above two files are obtained from C.masterchunks, which in turn refers to the offset in the master data file. Now, to run node 1, type: peer -p /tmp/nodes.map -c /tmp/A.haschunks -f /tmp/C.masterchunks -m 4 -i 1 and to run node 2, type in a different terminal: peer -p /tmp/nodes.map -c /tmp/B.haschunks -f /tmp/C.masterchunks -m 4 -i 2 After the peer for node 1 starts, you can type GET /tmp/B.chunks /tmp/newB.tar. This command tells your peer to fetch all chunks listed in /tmp/B.chunks and save the downloaded data chunks to the file /tmp/newB.tar ordered by the id values in /tmp/B.chunks. Here is an example of what your code should to do (note that messages are displayed here in plain text, but the actual packet content will be binary). Node 1 should send a ‘‘WHOHAS 2 0000...015 0000..00441’’ (for the 2 chunks that are named 00...15 and 00.441) to all the peers in nodes.map. It will get one IHAVE reply from node 2 that has ‘‘IHAVE 2 0000...015 0000..00441’’. Node 1 should then send a message to Node 2 saying ‘‘GET 0000...015’’. Node 2 starts sending Data packets as limited by flow/congestion control and Node 1 sends ACK packets as it gets them. After the GET completes (i.e. 512KB has been transferred), Node 1 should then send a message to Node 2 saying ‘‘GET 0000...00441’’ and should perform this transfer as well. At the end, you should have new file called /tmp/newB.tar. To make sure you got it right, you can compare this file with /tmp/B.tar to make sure they are identical (use the unix ―diff‖ utility ). In summary, there are basically three chunk description formats (get-chunks, has-chunks and master-chunks) and a peer list format. 6 Project Tasks This section details the requirements of the assignment. This high-level outline roughly mirrors the order in which you should implement functionality. 6.1 Task 1 - 100% Reliability & Sliding Window The first task is to implement a 100% reliable protocol for file transfer (ie: DATA packets) between two peers with a simple flow-control protocol. Non-Data traffic (WHOHAS, IHAVE, GET packets) does not have to be transmitted reliably or with flow-control. The peer should be able to search the network for available chunks and download them from the peers that have them. All different parts of the file should be collected at the requesting peer and their validity should be ensured before considering the chunks as received. You can check the validity of a downloaded chunk by computing its SHA-1 hash and comparing it against the specified chunk hash. To start the the project, use a fixed-size window of 8 packets1 . The sender should not send packets that fall out of the window. The Figure 3 shows the sliding windows for both sides. The sender slides the window forward when it gets an ACK for a higher packet number. There is a sequence number associated with each packet and the following constraints are valid for the sender (hint: your peers will likely want to keep state very similar to that shown here): Sending side • LastPacketAcked ≤ LastPacketSent • LastPacketSent ≤ LastPacketAvailable 1 Note that TCP uses a byte-based sliding window, but your project will use a packet-based sliding window. It‘s a bit simpler to do it by packet. Also, unlike TCP, you only have a sender window, meaning that window size does not need to be communicated in the packet header 9
Sender Receiver astPacketAvailable LastPacketRead LastPacketAcked LastPacketsent NextPacketExpected LastPacketRcvd Figure 3: Sliding Window LastPacketAvailable-LastPacketAcked Window S ize packet between LastPacketAcked and LastPacketAvailable must be"buffered-you can either implement his by buffering the packets or by being able to regenerate them from the datafile When the sender sends a data packet it starts a timer for it. It then waits for a fixed amount of time to get the ac lowledgment for the packet. Whenever the receiver gets a packet it sends an acknowledgment for NextPacketE xpected 1. That is, upon receiving a packet with sequence number =8, the reply would be"ACK 8, but only if all packets with sequence numbers less than 8 have already been received. These are called cumulative acknowledgements. The sender has two ways to know if the packets it sent did not reach the receiver: either a time-out occurred, or the sender received"duplicate ACKs If the sender sent a packet and did not receive an acknowledgment for it before the timer for the packet expired, it resends the packet. If the sender sent a packet and received duplicate acknowledgments, it knows that the next expected packet(at least)was lost. To avoid confusion from re-ordering, a sender counts a packet lost only after 3 duplicate ACKs 4 If the requesting client receives a IHAVE from a host, and then it should send a get to that same host,set a timer retransmit the get after some period of time(less than 5 seconds). You should have reasonable mechanisms in your client to recognize when successive timeouts of DaTA or gET traffic indicates that a host has likely crashed Your client should then try to download the file from another peer(reflooding the WHOHAS is fine We will test your your basic functionality using a network topology similar to Figure 4(a). A more complicated topology like Figure 4(b)will be used to test for concurrent downloads and robustness to crashes, as well as for measuring performance in the competition. As suggested by the checkpoints, you can first code-up basic flow control with a completely loss free virtual network to simplify developmen 6.2 Task 2-Congestion control You should implement a TCP-like congestion control algorithm on top of UDP for all Data traffic (you dont congestion control for WHOHAS, IHAVE, and gEt packets). TCP uses an end-to-end congestion control mechani
Sender Receiver LastPacketAvailable LastPacketRead LastPacketAcked LastPacketSent NextPacketExpected LastPacketRcvd Figure 3: Sliding Window • LastPacketAvailable − LastPacketAcked ≤ WindowSize • packet between LastPacketAcked and LastPacketAvailable must be ―buffered‖ – you can either implement this by buffering the packets or by being able to regenerate them from the datafile. When the sender sends a data packet it starts a timer for it. It then waits for a fixed amount of time to get the acknowledgment for the packet. Wheneverthe receiver gets a packet itsends an acknowledgment for NextPacketExpected− 1. That is, upon receiving a packet with sequence number = 8, the reply would be ―ACK 8‖, but only if all packets with sequence numbers less than 8 have already been received. These are called cumulative acknowledgements. The sender has two ways to know if the packets it sent did not reach the receiver: either a time-out occurred, or the sender received ―duplicate ACKs.‖ • If the sender sent a packet and did not receive an acknowledgment for it before the timer for the packet expired, it resends the packet. • If the sender sent a packet and received duplicate acknowledgments, it knows that the next expected packet (at least) was lost. To avoid confusion from re-ordering, a sender counts a packet lost only after 3 duplicate ACKs in a row. If the requesting client receives a IHAVE from a host, and then it should send a GET to that same host, set a timer to retransmit the GET after some period of time (less than 5 seconds). You should have reasonable mechanisms in your client to recognize when successive timeouts of DATA or GET traffic indicates that a host has likely crashed. Your client should then try to download the file from another peer (reflooding the WHOHAS is fine). We will test your your basic functionality using a network topology similar to Figure 4(a). A more complicated topology like Figure 4(b) will be used to test for concurrent downloads and robustness to crashes, as well as for measuring performance in the competition. As suggested by the checkpoints, you can first code-up basic flow control with a completely loss free virtual network to simplify development. 6.2 Task 2 - Congestion control You should implement a TCP-like congestion control algorithm on top of UDP for all DATA traffic (you don‘t need congestion control for WHOHAS, IHAVE, and GET packets). TCP uses an end-to-end congestion control mechanism. 10