How to 0wn the Internet in Your Spare Time

Stuart Staniford∗        Vern Paxson†        Nicholas Weaver‡
Silicon Defense        ICSI Center for Internet Research        UC Berkeley
stuart@silicondefense.com    vern@icir.org    nweaver@cs.berkeley.edu

Abstract

The ability of attackers to rapidly gain control of vast numbers of Internet hosts poses an immense risk to the overall security of the Internet. Once subverted, these hosts can not only be used to launch massive denial of service floods, but also to steal or corrupt great quantities of sensitive information, and confuse and disrupt use of the network in more subtle ways.

We present an analysis of the magnitude of the threat. We begin with a mathematical model derived from empirical data of the spread of Code Red I in July, 2001. We discuss techniques subsequently employed for achieving greater virulence by Code Red II and Nimda. In this context, we develop and evaluate several new, highly virulent possible techniques: hit-list scanning (which creates a Warhol worm), permutation scanning (which enables self-coordinating scanning), and use of Internet-sized hit-lists (which creates a flash worm).

We then turn to the threat of surreptitious worms that spread more slowly but in a much harder to detect “contagion” fashion. We demonstrate that such a worm today could arguably subvert upwards of 10,000,000 Internet hosts. We also consider robust mechanisms by which attackers can control and update deployed worms.

In conclusion, we argue for the pressing need to develop a “Center for Disease Control” analog for virus- and worm-based threats to national cybersecurity, and sketch some of the components that would go into such a Center.

∗Research supported by DARPA via contract N66001-00-C-8045
†Also with the Lawrence Berkeley National Laboratory, University of California, Berkeley.
‡Additional support from Xilinx, ST Microsystems, and the California MICRO program.

1 Introduction

If you can control a million hosts on the Internet, you can do enormous damage. First, you can launch distributed denial of service (DDOS) attacks so immensely diffuse that mitigating them is well beyond the state-of-the-art for DDOS traceback and protection technologies. Such attacks could readily bring down e-commerce sites, news outlets, command and coordination infrastructure, specific routers, or the root name servers.

Second, you can access any sensitive information present on any of those million machines—passwords, credit card numbers, address books, archived email, patterns of user activity, illicit content—even blindly searching for a “needle in a haystack,” i.e., information that might be on a computer somewhere in the Internet, for which you trawl using a set of content keywords.

Third, not only can you access this information, but you can sow confusion and disruption by corrupting the information, or sending out false or confidential information directly from a user’s desktop.

In short, if you could control a million Internet hosts, the potential damage is truly immense: on a scale where such an attack could play a significant role in warfare between nations or in the service of terrorism.

Unfortunately, it is reasonable for an attacker to gain control of a million Internet hosts, or perhaps even ten million. The highway to such control lies in the exploitation of worms: programs that self-propagate across the Internet by exploiting security flaws in widely-used services.[1] Internet-scale worms are not a new phenomenon [Sp89, ER89], but the severity of their threat has rapidly grown with (i) the increasing degree to which the Internet has become part of a nation’s critical infrastructure, and (ii) the recent, widely publicized introduction of very large, very rapidly spreading Internet worms, such that this technique is likely to be particularly current in the minds of attackers.

[1] We distinguish between the worms discussed in this paper—active worms—and viruses (or email worms) in that the latter require some sort of user action to abet their propagation. As such, they tend to propagate more slowly. From an attacker’s perspective, they also suffer from the presence of a large anti-virus industry that actively seeks to identify and control their spread.

Figure 1: Onset of Code Red I v2, Code Red II, and Nimda: Number of remote hosts launching confirmed attacks corresponding to different worms, as seen at the Lawrence Berkeley National Laboratory. Hosts are detected by the distinct URLs they attempt to retrieve, corresponding to the IIS exploits and attack strings. Since Nimda spreads by multiple vectors, the counts shown for it may be an underestimate.

We present an analysis of the magnitude of the threat. We begin with a mathematical model derived from empirical data of the spread of Code Red I v2 in July and August, 2001 (Section 2). We then discuss techniques employed for achieving greater effectiveness and virulence by the subsequent Code Red II and Nimda worms (Section 3). Figures 1 and 2 show the onset and progress of the Code Red and Nimda worms as seen “in the wild.”

In this context, we develop the threat of three new techniques for highly virulent worms: hit-list scanning, permutation scanning, and Internet-scale hit-lists (Section 4). Hit-list scanning is a technique for accelerating the initial spread of a worm. Permutation scanning is a mechanism for distributed coordination of a worm. Combining these two techniques creates the possibility of a Warhol worm,[2] seemingly capable of infecting most or all vulnerable targets in a few minutes to perhaps an hour. An extension of the hit-list technique creates a flash worm, which appears capable of infecting the vulnerable population in 10s of seconds: so fast that no human-mediated counter-response is possible.
Figure 2: The endemic nature of Internet worms: Number of remote hosts launching confirmed attacks corresponding to different worms, as seen at the Lawrence Berkeley National Laboratory, over several months since their onset. Since July, 139,000 different remote Code Red I hosts have been confirmed attacking LBNL; 125,000 different Code Red II hosts; and 63,000 Nimda hosts. Of these, 20,000 were observed to be infected with two different worms, and 1,000 with all three worms. (Again, Nimda is potentially an underestimate because we are only counting those launching Web attacks.)

We then turn in Section 5 to the threat of a new class of surreptitious worms. These spread more slowly, but in a much harder to detect “contagion” fashion, masquerading as normal traffic. We demonstrate that such a worm today could arguably subvert upwards of 10,000,000 Internet hosts.

Then in Section 6, we discuss some possibilities by which an attacker could control the worm using cryptographically-secured updates, enabling it to remain a threat for a considerable period of time. Even when most traces of the worm have been removed from the network, such an “updatable” worm still remains a significant threat.

Having demonstrated the very serious nature of the threat, we then in Section 7 discuss an ambitious but we believe highly necessary strategy for addressing it: the establishment at a national or international level of a “Center for Disease Control” analog for virus- and worm-based threats to cybersecurity. We discuss the roles we envision such a Center serving, and offer thoughts on the sort of resources and structure the Center would require in order to do so. Our aim is not to comprehensively examine each role, but to spur further discussion of the issues within the community.

[2] So named for the quotation “In the future, everyone will have 15 minutes of fame.”
2 An Analysis of Code Red I

The first version of the Code Red worm was initially seen in the wild on July 13th, 2001, according to Ryan Permeh and Marc Maiffret of Eeye Digital Security [EDS01a, EDS01b], who disassembled the worm code and analyzed its behavior. The worm spread by compromising Microsoft IIS web servers using the .ida vulnerability, also discovered by Eeye and published June 18th [EDS01c], which was assigned CVE number CVE-2001-0500 [CV01].

Once it infected a host, Code Red spread by launching 99 threads which generated random IP addresses, and then tried to compromise those IP addresses using the same vulnerability. A hundredth thread defaced the web server in some cases.

However, the first version of the worm analyzed by Eeye, which came to be known as CRv1, had an apparent bug. The random number generator was initialized with a fixed seed, so that all copies of the worm in a particular thread, on all hosts, generated and attempted to compromise exactly the same sequence of IP addresses. (The thread identifier is part of the seeding, so the worm had a hundred different sequences that it explores through the space of IP addresses, but it only explored those hundred.) Thus CRv1 had a linear spread and never compromised many machines.

On July 19th, 2001, a second version of the worm began to spread. This was suspected informally via mailing list discussion, then confirmed by the mathematical analysis we present below, and finally definitively confirmed by disassembly of the new worm. This version came to be known as CRv2, or Code Red I.

Code Red I v2 was the same codebase as CRv1 in almost all respects—the only differences were fixing the bug with the random number generation, an end to web site defacements, and a DDOS payload targeting the IP address of www.whitehouse.gov.

We developed a tentative quantitative theory of what happened with the spread of the Code Red I worm.
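The CRv1 seeding flaw described above can be demonstrated with a short sketch. This is a hypothetical recreation of the bug's structure, not the worm's actual code: the generator is seeded from a constant plus the thread id, never from anything host-specific, so every infected host probes the identical address sequence.

```python
import random

def crv1_targets(thread_id, n):
    """Hypothetical recreation of CRv1's flawed probe generator: the
    seed depends only on a constant and the thread id, never on the
    infected host, so every copy of thread `thread_id` on every host
    probes the exact same sequence of 32-bit addresses."""
    rng = random.Random(0xC0DE + thread_id)  # fixed seed (illustrative constant)
    return [rng.getrandbits(32) for _ in range(n)]

# Two different "infected hosts" running the same thread:
host_a = crv1_targets(thread_id=7, n=5)
host_b = crv1_targets(thread_id=7, n=5)
# host_a == host_b: adding hosts adds no new coverage, so the set of
# addresses probed grows only linearly with time, not with population.
```

This is why CRv1's spread was linear: with a hundred fixed sequences, the pool of addresses probed per unit time is capped regardless of how many machines are infected.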
The new version spread very rapidly until almost all vulnerable IIS servers on the Internet were compromised. It stopped trying to spread at midnight UTC due to an internal constraint in the worm that caused it to turn itself off. It then reactivated on August 1st, though for a while its spread was suppressed by competition with Code Red II (see below). However, Code Red II died by design [SA01] on October 1, while Code Red I has continued to make a monthly resurgence, as seen in Figure 2. Why it continues to gain strength with each monthly appearance remains unknown.[3]

[3] One possibility is that, since the default install of Windows 2000 server includes IIS, new vulnerable machines have been added to the Internet.

We call this model the Random Constant Spread (RCS) model. The model assumes that the worm had a good random number generator that is properly seeded. We define N as the total number of vulnerable servers which can be potentially compromised from the Internet. (We make the approximation that N is fixed—ignoring both patching of systems during the worm spread and normal deploying and removing of systems, or turning on and off of systems at night. We also ignore any spread of the worm behind firewalls on private Intranets.)

K is the initial compromise rate: that is, the number of vulnerable hosts which an infected host can find and compromise per hour at the start of the incident, when few other hosts are compromised. We assume that K is a global constant, and does not depend on the processor speed, network connection, or location of the infected machine. (Clearly, constant K is only an approximation.) We assume that a compromised machine picks other machines to attack completely at random, and that once a machine is compromised, it cannot be compromised again, or that if it is, that does not increase the rate at which it can find and attack new systems. We assume that once it is compromised, it stays that way. T is a time which fixes when the incident happens.

We then have the following variables:

• a is the proportion of vulnerable machines which have been compromised.

• t is the time (in hours).

Now, we analyze the problem by assuming that at some particular time t, a proportion a of the machines have been compromised, and then asking how many more machines, N da, will get compromised in the next amount of time dt. The answer is:

    N da = (N a) K (1 − a) dt.    (1)

The reason is that the number of machines compromised in the next increment of time is proportional to the number of machines already compromised (N a) times the number of machines each compromised machine can
compromise per unit time (K(1 − a)), times the increment of time (dt). (Note that machines can compromise K others per unit time to begin with, but only K·(1 − a) once a proportion a of other machines are compromised already.)

This gives us the differential equation:

    da/dt = K a (1 − a),    (2)

with solution:

    a = e^{K(t−T)} / (1 + e^{K(t−T)}),    (3)

where T is a constant of integration that fixes the time position of the incident. This equation has been well known for many years as the logistic equation, and governs the rate of growth of epidemics in finite systems when all entities are equally likely to infect any other entity (which is true for randomized spreading among Internet-connected servers, in the absence of firewall filtering rules that differentially affect infectability from or to different addresses).

This is an interesting equation. For early t (significantly before T), a grows exponentially. For large t (significantly after T), a goes to 1 (all vulnerable machines are compromised). The rate at which this happens depends only on K (the rate at which one machine can compromise others), and not at all on the number of machines. This is interesting because it tells us that a worm like this can compromise all vulnerable machines on the Internet fairly fast.

Figure 3: Hourly probe rate data for inbound port 80 at the Chemical Abstracts Service during the initial outbreak of Code Red I on July 19th, 2001. The x-axis is the hour of the day (CDT time zone), while the y-axis shows the probe rate, the number of different IP addresses seen, and a fit to the data discussed in the text.

Figure 3 shows hourly probe rate data from Ken Eichmann of the Chemical Abstracts Service for the hourly probe rate inbound on port 80 at that site.
Also shown is a fit to the data with K = 1.8, T = 11.9, and with the top of the fit scaled to a maximum probe rate of 510,000 scans/hour. (We fit it to fall slightly below the data curve, since it seems there is a fixed background rate of web probes that was going on before the rapid rise due to the worm spread.) This very simple theory can be seen to give a reasonable first approximation explanation of the worm behavior. See also Section 4.3 for validation of the theory via simulation. Note that we fit the scan rate, rather than the number of distinct IPs seen at this site. The incoming scan rate seen at a site is directly proportional to the total number of infected IPs on the Internet, since there is a fixed probability for any worm copy to scan this particular site in the current time interval. However, the number of distinct IPs seen at a site is distorted relative to the overall infection curve. This is because a given worm copy, once it is infected, will take some amount of time before it gets around to scanning any particular site. For a small address space, this delay can be sizeable and causes the distinct IP graph at the given site to lag behind the overall Internet infection rate graph. Two implications of this graph are interesting. One is that the worm came close to saturating before it turned itself off at midnight UTC (1900 CDT), as the number of copies ceased increasing a few hours before the worm’s automatic turnoff. Thus it had found the bulk of the servers it was going to find at this time. Secondly, the infection rate was about 1.8 per hour—in the early stages of the infection, each infected server was able to find about 1.8 other servers per hour. Although Code Red I turned itself off at midnight UTC on July 19th, hosts with inaccurate clocks kept it alive and allowed it to spread again when the worm code allowed it to re-awaken on August 1st. Figure 4 shows similar data and fit for that incident. The K here is about 0.7. 
Since the worm code-base was the same, this lower spread rate indicates that the number of vulnerable systems was a little less than 40% of the number the first time around. That is, the data appears consistent with slightly more than half the systems having been fixed in the intervening 11 days.
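The fitting procedure quoted above can be reproduced in miniature with a brute-force least-squares search. The data below is synthetic (generated from the model itself with the quoted parameters), not the actual Chemical Abstracts Service measurements, and the peak rate is assumed known rather than fit.

```python
import math

def logistic(t, K, T):
    x = math.exp(K * (t - T))
    return x / (1.0 + x)

true_K, true_T, peak = 1.8, 11.9, 510_000   # parameters quoted in the text
hours = range(17)
counts = [peak * logistic(t, true_K, true_T) for t in hours]  # synthetic scan counts

def sse(K, T):
    """Sum of squared errors between the data and a scaled logistic curve."""
    return sum((c - peak * logistic(t, K, T)) ** 2
               for t, c in zip(hours, counts))

# Coarse grid search: K in 0.5..3.9 and T in 8.0..15.9, step 0.1.
best = min(((k / 10, tt / 10) for k in range(5, 40) for tt in range(80, 160)),
           key=lambda p: sse(*p))
print(best)   # recovers (1.8, 11.9) on this noiseless data
```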
Figure 4: Hourly probe rate data for inbound port 80 at the Chemical Abstracts Service, for Code Red I’s reemergence on August 1st. The x-axis shows the time of day on August 1st (Central US Time). The y-axis shows the monitored probe rate and a fit for the data discussed in the text.

3 “Better” worms—practice

In this section, we explore the strategies adopted by the two major worms released subsequent to Code Red I: “Code Red II” and “Nimda.”

3.1 Localized scanning—Code Red II

The Code Red II worm was released on Saturday August 4th, 2001 and spread rapidly [CE01, SA01]. The worm code contained a comment stating that it was “Code Red II,” but it was an unrelated code base. It did use the same vulnerability, however—a buffer overflow in Microsoft’s IIS Web server with CVE number CVE-2001-0500. When successful, the payload installed a root backdoor allowing unrestricted remote access to the infected host. The worm exploit only worked correctly when IIS was running on Microsoft Windows 2000; on Windows NT it caused a system crash rather than an infection.

The worm was also a single-stage scanning worm that chose random IP addresses and attempted to infect them. However, it used a localized scanning strategy, where it was differentially likely to attempt to infect addresses close to it. Specifically, with probability 3/8 it chose a random IP address from within the class B address space (/16 network) of the infected machine. With probability 1/2 it chose randomly from its own class A (/8 network). Finally, with probability 1/8 it would choose a random address from the whole Internet.

This strategy appears quite successful.
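The address-selection probabilities above can be sketched directly. This is a minimal illustration; the function name and the use of Python's `random` module are our own, not taken from the worm itself.

```python
import random

def localized_target(own_ip: int) -> int:
    """Pick a scan target the way Code Red II did: with probability 1/2
    a random address within the host's /8, with probability 3/8 one
    within its /16, and with probability 1/8 anywhere in the 32-bit space."""
    r = random.random()
    if r < 1 / 2:                                 # same class A (/8)
        return (own_ip & 0xFF000000) | random.getrandbits(24)
    if r < 1 / 2 + 3 / 8:                         # same class B (/16)
        return (own_ip & 0xFFFF0000) | random.getrandbits(16)
    return random.getrandbits(32)                 # whole Internet

own = (10 << 24) | (20 << 16) | (30 << 8) | 40    # 10.20.30.40
targets = [localized_target(own) for _ in range(100_000)]
share_a = sum(t >> 24 == own >> 24 for t in targets) / len(targets)
share_b = sum(t >> 16 == own >> 16 for t in targets) / len(targets)
print(share_a, share_b)   # close to the nominal 7/8 and 3/8
```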
The localized spreading allows the worm to quickly infect parts of the Internet that contain many vulnerable hosts, and also means that the infection often proceeds more quickly, since hosts with similar IP addresses are often close together in the network topology. This strategy also allows a worm to spread very rapidly within an internal network once it manages to pass through the external firewall.

Unfortunately, developing an analytic model for the spread of a worm employing this type of localized scanning strategy is significantly more difficult than the modeling effort in Section 2, because it requires incorporating potentially highly non-homogeneous patterns of population locality. The empirical data is also harder to interpret, because Code Red I was quite active when Code Red II was released. Indeed, it appears that Code Red II took a while to overcome Code Red I (see Figure 1), but fully determining the interplay between the two appears to be a significant undertaking.

3.2 Multi-vector worms—Nimda

As well illustrated by the Nimda worm/virus (and, indeed, the original Internet Worm [Sp89, ER89]), malevolent code is not restricted to a single technique. Nimda began on September 18th, 2001, spread very rapidly, and maintained itself on the Internet for months after it started. Nimda spread extensively behind firewalls, and illustrates the ferocity and wide reach that a multi-mode worm can exhibit. The worm is thought to have used at least five different methods to spread itself:

• By infecting Web servers from infected client machines via active probing for a Microsoft IIS vulnerability (CVE-2000-0884).

• By bulk emailing of itself as an attachment, based on email addresses determined from the infected machine.

• By copying itself across open network shares.

• By adding exploit code to Web pages on compromised servers in order to infect clients which browse the page.

• By scanning for the backdoors left behind by Code Red II and also the “sadmind” worm [CE03].
Figure 5: HTTP connections per second seen at the Lawrence Berkeley National Laboratory, rising due to the onset of Nimda, September 18.

Figure 5 illustrates how rapidly the worm tried to infect one site, the Lawrence Berkeley National Laboratory. The x-axis plots hours past midnight, PDT, while the y-axis plots HTTP connection attempts per second. Only connections from hosts confirmed to have harbored Nimda are counted, to avoid possible confusion with concurrent Code Red connection attempts. After the onset of the infection, the total rate of probing was about 3 times that from the hosts subsequently confirmed to harbor Nimda.

Clearly, onset was quite rapid, rising in just half an hour from essentially no probing to a sustained rate of nearly 100 probes/sec.

There is an additional synergy in Nimda’s use of multiple infection vectors: many firewalls allow mail to pass untouched, relying on the mail servers to remove pathogens. Yet since many mail servers remove pathogens based on signatures, they aren’t effective during the first few minutes to hours of an outbreak, giving Nimda a reasonably effective means of crossing firewalls to invade internal networks.

Finally, we note that Nimda’s full functionality is still not known: all that is known is how it spreads, but not what it might be capable of doing in addition to spreading, if it receives the right trigger, or a prearranged time rolls around. We return to this point in Section 7.

4 “Better” worms—theory

There are several techniques which, although not yet employed, could further significantly increase the virulence of a worm.
Beyond the obvious factors of discovering more widespread security holes and increasing the scanning rate, some additional strategies a worm author could employ are: (i) hit-list scanning, (ii) permutation scanning, (iii) topologically aware worms, and (iv) Internet-scale hit-lists. The goal is very rapid infection—in particular, considerably faster than any possible human-mediated response.

A worm’s scanner can obviously be made significantly faster than the ones seen today, by careful use of threading and an understanding of the protocols. By having many requests outstanding, a worm should be capable of scanning targets at a rate proportional to its access bandwidth. Since it only takes 40 bytes for a TCP SYN packet to determine if a service is accessible, and often only a few hundred bytes to attempt an exploit, the potential scans per second can easily exceed 100 for even poor Internet connections. This increases K by allowing a worm to search for a greater number of targets in a given period of time.

Similarly, the more widespread the vulnerable software is, the faster a worm using that vulnerability can spread, because each random scan of the network is more likely to pick up a target, also increasing K. We should therefore expect that worm authors will devote considerable scrutiny to highly homogeneous, highly deployed services, both for the faster spreading and for the greater number of machines that could be compromised in a single attack.

4.1 Hit-list Scanning

One of the biggest problems a worm faces in achieving a very rapid rate of infection is “getting off the ground.” Although a worm spreads exponentially during the early stages of infection, the time needed to infect, say, the first 10,000 hosts dominates the infection time, as can be seen in Figure 3.

There is a simple way for an active worm to overcome
this obstacle, which we term hit-list scanning. Before the worm is released, the worm author collects a list of, say, 10,000 to 50,000 potentially vulnerable machines, ideally ones with good network connections. The worm, when released onto an initial machine on this hit-list, begins scanning down the list. When it infects a machine, it divides the hit-list in half, communicating half to the recipient worm, keeping the other half.

This quick division ensures that even if only 10–20% of the machines on the hit-list are actually vulnerable, an active worm will quickly go through the hit-list and establish itself on all vulnerable machines in only a few seconds. Although the hit-list may start at 200 kilobytes, it quickly shrinks to nothing during the partitioning. This provides a great benefit in constructing a fast worm by speeding the initial infection.

The hit-list needn’t be perfect: a simple list of machines running a particular server type may suffice, although greater accuracy will improve the spread. The hit-list itself can be generated using one or several of the following techniques, prepared well in advance, generally with little fear of detection.

• Stealthy scans. Portscans are so common and so widely ignored that even a fast scan of the entire Internet would be unlikely to attract law enforcement attention or more than mild comment in the incident response community. However, for attackers wishing to be especially careful, a randomized stealthy scan taking several months would be very unlikely to attract much attention, as most intrusion detection systems are not currently capable of detecting such low-profile scans. Some portion of the scan would be out of date by the time it was used, but much of it would not.

• Distributed scanning. An attacker could scan the Internet using a few dozen to a few thousand already-compromised “zombies,” similar to what DDOS attackers assemble in a fairly routine fashion.
Such distributed scanning has already been seen in the wild—Lawrence Berkeley National Laboratory received 10 during the past year.

• DNS searches. Assemble a list of domains (for example, by using widely available spam mail lists, or trolling the address registries). The DNS can then be searched for the IP addresses of mail-servers (via MX records) or Web servers (by looking for www.domain.com).

• Spiders. For Web server worms (like Code Red), use Web-crawling techniques similar to search engines in order to produce a list of most Internet-connected Web sites. This would be unlikely to attract serious attention.

• Public surveys. For many potential targets there may be surveys available listing them, such as the Netcraft survey [Ne02].

• Just listen. Some applications, such as peer-to-peer networks, wind up advertising many of their servers. Similarly, many previous worms effectively broadcast that the infected machine is vulnerable to further attack. For example, because of its widespread scanning, during the Code Red I infection it was easy to pick up the addresses of upwards of 300,000 vulnerable IIS servers—because each one came knocking on everyone’s door!

Indeed, some individuals produced active countermeasures to Code Red II by exploiting this observation, when combined with the backdoor which Code Red II installs [DA01]. However, it is not a given that future worms will broadcast their presence, and we also note that worms could readily fix the very security holes they exploit (such as is often already observed for attackers performing break-ins manually), which undermines the superficially appealing countermeasure of using the worm’s vector as a means by which to disable it.

4.2 Permutation Scanning

Another limitation to very fast infection is the general inefficiency of random scanning: many addresses are probed multiple times. Similarly there is no means for a randomly scanning worm to effectively determine when all vulnerable machines are infected.
Permutation scanning solves these problems by assuming that a worm can detect that a particular target is already infected.

In a permutation scan, all worms share a common pseudo-random permutation of the IP address space. Such a permutation can be efficiently generated using a 32-bit block cipher and a preselected key: simply encrypt an index to get the corresponding address in the permutation, and decrypt an address to get its index.

Any machines infected during the hit-list phase (or local subnet scanning) start scanning just after their point in the permutation, working their way through the permutation, looking for vulnerable machines. Whenever the worm sees an already infected machine, it chooses a new, random start point and proceeds from there. Worms
infected by permutation scanning would start at a random point.

This has the effect of providing a self-coordinated, comprehensive scan while maintaining the benefits of random probing. Each worm looks like it is conducting a random scan, but it attempts to minimize duplication of effort. Any time an instance of the worm, W, encounters an already-infected host, it knows that W′, the original infector of the host, is already working along the current sequence in the permutation, and is further ahead. Hence, there’s no need for W to continue working on the current sequence in addition to W′.

Self-coordination keeps the infection rate high and guarantees an eventual comprehensive scan. Furthermore, it allows the worm to make a local decision that further scanning is of little benefit. After any particular copy of the worm sees several infected machines without discovering new vulnerable targets, the worm assumes that effectively complete infection has occurred and stops the scanning process.

A timer could then induce the worms to wake up, change the permutation key to the next one in a prespecified sequence, and begin scanning through the new permutation, starting at its own index and halting when another instance is discovered. This process ensures that every address would be efficiently rescanned at regular intervals, detecting any machines which came onto the net or were reinstalled but not patched, greatly increasing a worm’s staying power. Otherwise, the worms are silent and difficult to detect, until they receive attack orders (see Section 6).

A further optimization is a partitioned permutation scan. In this scheme, the worm has a range of the permutation that it is initially responsible for. When it infects another machine, it reduces its range in half, with the newly infected worm taking the other section. When the range gets below a certain level, it switches to simple permutation scanning and otherwise behaves like a permutation scan.
This scheme offers a slight but noticeable increase in scanning efficiency, by dividing up the initial workload using an approximate divide-and-conquer technique.

Permutation scanning interacts particularly well with a worm which attacks multiple security holes: after deciding that the initial exploit is exhausted, the worm resets the permutation to its current address, changes the permutation key, and exploits the second security hole. Thus, even relatively rare secondary holes can be efficiently and quickly scanned once the worm has established itself on the network.

Figure 6: The spread of a simulated worm capable of 10 scans/second in a population of 300,000 vulnerable machines (K = 2.6, T = 5.52), and its comparison to the model developed in Section 2. The simulation and theoretical results overlap completely.

It may seem that the permutation scanning algorithm is spoofable, but only to a very limited degree. If an uninfected machine responds to the scan in the same way as a worm, by falsely claiming to be infected, it will temporarily protect those machines which exist later in the current permutation from being scanned by the worm. However, since the permutation itself changes on every rescan, the set of machines protected is constantly changing. The result is that unless a very large number of uninfected machines respond to probes like an actual worm, the protection is almost nonexistent.

4.3 Simulation of a Warhol Worm

A combination of hit-list and permutation scanning can create what we term a Warhol worm, capable of attacking most vulnerable targets in well under an hour, possibly less than 15 minutes. Hit-list scanning greatly improves the initial spread, while permutation scanning keeps the worm’s infection rate high for much longer when compared with random scanning.
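The encrypt/decrypt mapping described in Section 4.2 can be sketched with any keyed 32-bit bijection. The toy 4-round Feistel network below is a hypothetical stand-in for a real 32-bit block cipher (the round function and key values are ours, not from the paper), as is the `permutation_scan` driver with its `is_infected`/`is_vulnerable` probes.

```python
import random

def F(r: int, k: int) -> int:
    """Keyed mixing function on a 16-bit half-block (illustrative only)."""
    return ((r * 0x9E37 + k) ^ (r >> 3)) & 0xFFFF

def permute(x: int, keys) -> int:
    """Encrypt a 32-bit index into the corresponding 32-bit address."""
    L, R = x >> 16, x & 0xFFFF
    for k in keys:
        L, R = R, L ^ F(R, k)
    return (L << 16) | R

def unpermute(y: int, keys) -> int:
    """Decrypt an address back into its index (inverse Feistel rounds)."""
    L, R = y >> 16, y & 0xFFFF
    for k in reversed(keys):
        L, R = R ^ F(L, k), L
    return (L << 16) | R

KEYS = (0x1234, 0x5678, 0x9ABC, 0xDEF0)   # preselected key, shared by all copies

def permutation_scan(own_addr, is_infected, is_vulnerable, budget):
    """Walk the shared permutation starting just after our own index,
    restarting at a random index whenever an infected host is seen."""
    idx = unpermute(own_addr, KEYS)
    found = []
    for _ in range(budget):
        idx = (idx + 1) % 2**32
        addr = permute(idx, KEYS)
        if is_infected(addr):
            idx = random.getrandbits(32)   # someone else is ahead on this sequence
        elif is_vulnerable(addr):
            found.append(addr)
    return found

# Sanity check: the mapping is a bijection, so decrypting recovers the index.
assert all(unpermute(permute(x, KEYS), KEYS) == x for x in (0, 1, 0xDEADBEEF))
```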
In order to evaluate the effects of hit-list and permutation scanning, we wrote a small, abstract simulator of a Warhol worm's spread. The simulator assumes complete connectivity within a 2^32-entry address space[4], using a pseudo-random permutation to map addresses to a subset of vulnerable machines. We used a 32-bit, 6-round variant of RC5 to generate all permutations and random numbers.

[4] In general, the Internet address space isn't completely connected. If a machine is not reachable from an arbitrary point on the external network, it is usually not reachable directly by a worm except through local scanning.

We can parameterize the simulation in terms of: the number of vulnerable machines in the address space; scans per second; the time to infect a machine; the number infected during the hit-list phase; and the type of secondary scan (permutation, partitioned permutation, or random). The simulator assumes multithreaded scanning.

To ensure that the simulator produces reasonable results, Figure 6 shows a comparison between the simulator's output and the model developed in Section 2, for a worm capable of 10 scans/second in a population of 300,000 vulnerable machines. The simulation results fit the model for K = 2.6 and T = 5.52. This represents a worm which is slightly faster (by less than 50%) than Code Red I.

Figure 7: The spread of three simulated worms in a population of 300,000 vulnerable machines: (i) a Code Red-like worm capable of 10 scans/second, (ii) a faster scanning worm capable of 100 scans/second, and (iii) a Warhol worm, capable of 100 scans/second, using a 10,000-entry hit-list and permutation scanning which gives up when 2 infected machines are discovered without finding a new target. All graphs stop at 99.99% infection of the simulated address space. (Axes: time in hours vs. number of instances.)

Figure 7 then shows how both faster scanning and the Warhol strategies affect the propagation time. The faster scanning worm (capable of 100 scans/second) reduces the infection time down to under an hour, while the combination of hit-list scanning, permutation scanning, and fast scanning further reduces infection time to roughly 15 minutes.

Figure 8: A closeup of the behavior of the Warhol worm seen in Figure 7. The infection initially progresses rapidly (effectively all worms are actively scanning the net), but as infection rates near 100%, many worms have gone dormant, correctly concluding that there are few vulnerable machines remaining and that they should therefore cease scanning. (Axes: time in minutes vs. counts of infected machines, active worms, and dormant worms.)

Figure 8 shows in more detail the behavior of the Warhol strategies. The worm gets a huge boost from the hit-list during the first few seconds of propagation, quickly establishing itself on the network and then spreading exponentially. As the infection exceeds the 50% point, some of the worms begin recognizing that saturation is occurring and stop scanning. By the time the graph ends (at 99.99% of the simulated population), most of the worms have gone silent, leaving a few remaining worms to finish scanning the last of the address space.

4.4 Topological Scanning

An alternative to hit-list scanning is topologically aware scanning, which uses information contained on the victim machine in order to select new targets. Email worms have used this tactic since their inception, as they harvest addresses from their victim in order to find new potential targets, as did the Morris worm (necessary because of the very sparse address space when it was released) [Sp89, ER89].

Many future active worms could easily apply these techniques during the initial spread, before switching to a permutation scan once the known neighbors are exhausted. An active worm that attacked a flaw in a peer-to-peer application could easily get a list of peers from
a victim and use those peers as the basis of its attack, which makes such applications highly attractive targets for worm authors. Although we have yet to see such a worm in the wild, these applications must be scrutinized for security. These applications are also vulnerable to contagion worms, as discussed in Section 5.

Similarly, a worm attacking web servers could look for URLs on disk and use these URLs as seed targets, as well as simply scanning for random targets. Since these are known to be valid web servers, this would tend to greatly increase the initial spread by preferentially probing for likely targets.

4.5 Flash Worms

We further observe that there is a variant of the hit-list strategy that could plausibly result in most of the vulnerable servers on the Internet being infected in tens of seconds. We term this a flash worm.

The nub of our observation is that an attacker could plausibly obtain a hit-list of most servers with the relevant service open to the Internet in advance of the release of the worm.[5]

In addition to the methods already discussed for constructing a hit-list in Section 4.1, a complete scan of the Internet through an OC-12 connection would complete quickly. Given a rate of 750,000 TCP SYN packets per second (the OC-12 provides 622 Mbps, the TCP segment takes 40 bytes, and we allow for link-layer framing), and given that the return traffic is smaller in volume than the outbound (it is comprised of either same-sized SYN ACKs or RSTs, smaller ICMPs, or, most often, no response at all), it would take roughly 2 hours to scan the entire address space. Faster links could of course scan even faster. Such a brute-force scan would be easily within the resources of a nation-state bent on cyberwarfare.

Given that an attacker has the determination and foresight to assemble a list of all or most Internet-connected addresses with the relevant service(s) open, a worm can spread most efficiently by simply attacking addresses on that list.
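The OC-12 scan-time estimate is simple arithmetic worth spelling out. The 750,000 SYNs/second figure is the paper's; the scan time follows directly:

```python
ADDRESS_SPACE = 2 ** 32   # full IPv4 address space
SYN_RATE = 750_000        # TCP SYNs/second over an OC-12 (per the text)

seconds = ADDRESS_SPACE / SYN_RATE
hours = seconds / 3600
print(f"{seconds:,.0f} s = {hours:.1f} hours")
```

This comes to about 1.6 hours, i.e. the "roughly 2 hours" figure in the text once real-world overheads are allowed for.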
For example, there are about 12.6 million Web servers on the Internet (according to Netcraft [Ne02]), so the size of that particular address list would be 48 MB, uncompressed. The initial copy of the worm can be programmed to divide the list into n blocks, and then to find and infect the first address in each block (or an especially chosen high-bandwidth address in that block), and then hand the child worm the list of addresses for that block. That copy of the worm can then re-iterate the process to infect everything in its block. A threaded worm could begin infecting hosts before it had received the full host list from its parent to work on, to maximize the parallelization process, and it could start work on looking for multiple children in parallel.

This design is somewhat fragile if an early copy of the worm is neutralized very quickly, or infects a site from which it cannot scan out. To mitigate this, the worm copies could overlap in their scanning so that all addresses were scanned a small number of times, with every target address being scanned by different paths through the infection tree. This has the additional side effect of removing the need for further parent-to-child communication after initial infection occurs.

A related design would call for most of the address list to be located in pre-assigned chunks on one or a number of high-bandwidth servers that were well known to the worm. Each copy of the worm would receive an assignment from its parent, and then fetch the address list from there. The server would only have to send out portions of the list, not the entire list; in principle, it should only have to transmit each address in the list once.

[5] Servers behind load balancers create complications here, as do machines that connect to the Internet with variable IP addresses but nonetheless have vulnerable services open.
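The block-splitting propagation described above is a textbook divide-and-conquer. A toy sketch (addresses are plain integers, "infection" is just recursion, and the branching factor n = 10 matches the text):

```python
def propagate(block, n=10, depth=0):
    # Each worm copy splits its block of addresses into n sub-blocks,
    # infects the head of each sub-block, and hands that sub-block down.
    # Returns the depth of the deepest copy in the infection tree.
    if len(block) <= 1:
        return depth
    size = max(1, len(block) // n)
    sub_blocks = [block[i:i + size] for i in range(0, len(block), size)]
    return max(propagate(b, n, depth + 1) for b in sub_blocks)

print(propagate(list(range(10_000))))  # a 10,000-address toy list: 4 layers
```

Each layer multiplies the number of active copies by n, so a toy list of 10^4 addresses yields a 4-layer tree, consistent with the 7 layers the text derives for 3 million servers.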
In addition, after the worm has propagated sufficiently that a large number of copies are attempting to fetch their (now quite small) lists, the worm collective could switch to sending around the address list with each new infection, rather than having the infectees each contact the server.

This process will result in relatively little wasted effort. For example, if the worm had a list of Web servers and a zero-day IIS vulnerability, about 26% of the list would be vulnerable. No server would be probed twice. If n = 10, then the infection tree for the 3 million vulnerable servers would be just 7 layers deep.

The spread rate of such a worm would likely be constrained by one of two things. The worm itself is likely to be small (Code Red I was about 4 KB, and a highly malicious worm could easily be less than 100 KB, even allowing for a complex payload). Thus, at the start, the address list is much larger than the worm itself, and the propagation of the worm could be limited by the time required to transmit the host list out of the initial infection site or servers where it was stored. Since all the children of the infection will have much smaller lists to transmit, these later lists are less likely to limit the worm's spread.
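The 7-layer and 48 MB figures check out with a few lines of arithmetic. (The transfer-time estimate at the end assumes the initial site has OC-12-class bandwidth; that is our illustrative assumption, not the paper's.)

```python
import math

VULNERABLE = 3_000_000      # ~26% of 12.6M web servers
N = 10                      # branching factor of the infection tree
depth = math.ceil(math.log(VULNERABLE, N))
print(depth)                # 7 layers

LIST_BYTES = 12_600_000 * 4          # one IPv4 address = 4 bytes
print(f"{LIST_BYTES / 2**20:.0f} MB")  # 48 MB uncompressed

OC12_BPS = 622e6            # assumed bandwidth at the initial infection site
print(f"{LIST_BYTES * 8 / OC12_BPS:.2f} s")  # well under a second to push the list once
```

Under that bandwidth assumption, even transmitting the full uncompressed list once takes under a second, which is why the host-list transfer, while the largest single transmission, need not dominate the tens-of-seconds infection timescale.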