《数学建模》美赛优秀论文：2007 A O What to Feed a Gerrymander.pdf_大学文库

Control 1421 Page 2 out of 35 Wasted Vote-A vote cast by a member of group A in a district where A is already assured victory so voting has no bearing on the result. In general, the group with more wasted votes is made worse off by a districting plan 1.2 Why is it so bad? Politicians today still gerrymander federal and state-level electoral districts and the public outcry is still strongly negative. Before we set out to eliminate this practice we should discuss why gerrymandering is considered problematic First off, gerrymandering reduces electoral competition within districts since cracking packing makes elections uncompetitive. Further, incumbent representatives are in no real danger of losing elections so they do not campaign vigorously which can lead to lower voter turnout. Exacerbating the problem, incumbents' increased advantage means they are less incentivized to govern based on their constituents' interests so democratic accountability and engagement mutually deteriorate Gerrymandering also presents the practical problem that it is difficult to explain to voters why district shapes are so labyrinthine. Some districts connect demographically similar but geographically distant regions using thin filaments such as the district depicted in Figure 2 "Niceness "of district shape almost always takes a back seat to political and racial concerns when districts are being created. Example: In the 2000 California realignment, Democrats and Republicans united to design incumbent -favoring districts which resulted in the reeled- tion of all of the 153 involved legislators in 2004. How can one argue that this is in voters However, it should be noted that gerrymandering can be considered appropriate in specific situations. For instance, the Arizona Legislature gerrymandered a division between the historically hostile Hopi and Navajo tribes even though the Hopi reservation is entirely surrounded by the Navajo reservation Congressional District 4 Figure 2: A present-day gerrymander, the Illinois 4th congressional district (The two "earmuffs"are connected by a narrow band along Highway 294

Control # 1421 Page 2 out of 35 • Wasted Vote - A vote cast by a member of group A in a district where A is already assured victory so voting has no bearing on the result. In general, the group with more wasted votes is made worse off by a districting plan. 1.2 Why is it so bad? Politicians today still gerrymander federal and state-level electoral districts and the public outcry is still strongly negative. Before we set out to eliminate this practice we should discuss why gerrymandering is considered problematic. First off, gerrymandering reduces electoral competition within districts since cracking/- packing makes elections uncompetitive. Further, incumbent representatives are in no real danger of losing elections so they do not campaign vigorously which can lead to lower voter turnout. Exacerbating the problem, incumbents’ increased advantage means they are less incentivized to govern based on their constituents’ interests so democratic accountability and engagement mutually deteriorate. Gerrymandering also presents the practical problem that it is difficult to explain to voters why district shapes are so labyrinthine. Some districts connect demographically similar but geographically distant regions using thin filaments such as the district depicted in Figure 2. “Niceness” of district shape almost always takes a back seat to political and racial concerns when districts are being created. Example: In the 2000 California realignment, Democrats and Republicans united to design incumbent-favoring districts which resulted in the reelection of all of the 153 involved legislators in 2004. How can one argue that this is in voters’ best interests? However, it should be noted that gerrymandering can be considered appropriate in specific situations. For instance, the Arizona Legislature gerrymandered a division between the historically hostile Hopi and Navajo tribes even though the Hopi reservation is entirely surrounded by the Navajo reservation. Figure 2: A present-day gerrymander, the Illinois 4th congressional district. (The two “earmuffs” are connected by a narrow band along Highway 294.)

Control 1421 Page 3 out of 35 1.3 The legality of gerrymandering We should be clear on one point: though gerrymandering is objectionable to many, it is legal around the country. Interestingly, the Voting Rights Act of 1965 which eliminated poll taxes and other discriminatory voting policies may have inadvertently increased the prevalence of gerrymandering. One interpretation of the Act was that it mandated nondiscriminatory election results which led to a strange reversal of vocabulary where creating "ma jority minority"districts was considered beneficial. These gerrymandered districts were packed with minorities which guaranteed minority representation in Congress However, in Shaw v. Reno(1993), and later in Miller v. Johnson(1995), the Supreme Court ruled that racial/ethnic gerrymanders were unconstitutional. Nevertheless, Hunt v Cromatrie approved of a seemingly racial gerrymandering since the motivation was mostly partisan rather than racial. The recent case League of United Latin American Citizens v Perry (June 2006) upheld the position that states are free to redistrict as often as they like o long as these redistrictings follow are not purely racially motivated. 2 Assumptions and Notation 2.1 What can we consider when districting? We have compiled the following list of possible factors one might consider is districting a State. The list is ranked with factors we consider more important or legitimate at the top 1. Population equality between districts (legally mandated) 2. Continuity of districts(legally mandated, excepting islands) 3. Respect for legal boundaries(counties, city limits, townships) 4. Respect for natural geographic boundaries 5. Compactness of district shapes 6. Respect for man-made boundaries(highways, parks, etc. 7. Respect for socio-economic similarity of constituents 8. Similarity to past district boundaries 9. Partisan political concerns 10. Desire to make districts(un)competitive 11. Racial/ethnic concerns 12. Desire to protect (or unseat)incumbent politicians We consider only the top seven factors in our model. Factors 9-12 are all related to political or racial concerns which our model is specifically designed to ignore. The case SC State Conference of Branches v. Riley(1982)ruled that past districts(Factor 8)are legitimate tool for creating new districts but we choose to ignore past districts since they are heay avily biased by Factors 9-12

Control # 1421 Page 3 out of 35 1.3 The legality of gerrymandering We should be clear on one point: though gerrymandering is objectionable to many, it is legal around the country. Interestingly, the Voting Rights Act of 1965 which eliminated poll taxes and other discriminatory voting policies may have inadvertently increased the prevalence of gerrymandering. One interpretation of the Act was that it mandated nondiscriminatory election results which led to a strange reversal of vocabulary where creating “majorityminority” districts was considered beneficial. These gerrymandered districts were packed with minorities which guaranteed minority representation in Congress. However, in Shaw v. Reno (1993), and later in Miller v. Johnson (1995), the Supreme Court ruled that racial/ethnic gerrymanders were unconstitutional. Nevertheless, Hunt v. Cromatrie approved of a seemingly racial gerrymandering since the motivation was mostly partisan rather than racial. The recent case League of United Latin American Citizens v. Perry (June 2006) upheld the position that states are free to redistrict as often as they like so long as these redistrictings follow are not purely racially motivated. 2 Assumptions and Notation 2.1 What can we consider when districting? We have compiled the following list of possible factors one might consider is districting a State. The list is ranked with factors we consider more important or legitimate at the top. 1. Population equality between districts (legally mandated) 2. Continuity of districts (legally mandated, excepting islands) 3. Respect for legal boundaries (counties, city limits, townships) 4. Respect for natural geographic boundaries 5. Compactness of district shapes 6. Respect for man-made boundaries (highways, parks, etc.) 7. Respect for socio-economic similarity of constituents 8. Similarity to past district boundaries 9. Partisan political concerns 10. Desire to make districts (un)competitive 11. Racial/ethnic concerns 12. Desire to protect (or unseat) incumbent politicians We consider only the top seven factors in our model. Factors 9-12 are all related to political or racial concerns which our model is specifically designed to ignore. The case SC State Conference of Branches v. Riley (1982) ruled that past districts (Factor 8) are a legitimate tool for creating new districts but we choose to ignore past districts since they are heavily biased by Factors 9-12

Control 1421 Page 4 out of 35 2.2 Geography and similar characteristics The US Census Bureau provides a great deal of data on legal, natural, and man-made boundaries as well as socio-economic similarity of regions. In each census, the United States is broken up into several degrees of accuracy, the smallest of which are: blocks(40 people on average), block groups(1500 people), and tracts(4500 people) We follow the practice in Young(1988) by districting based on a maximum level of resolution which in our Case Study( Section 5) is census tracts. Notational note: we refer to the smallest unit of division generally as a tract A reference from the Caliper Corporation describes tracts in the following quotation Census tract boundaries normally follow visible features, but may follow go ernmental unit boundaries and other non-visible features, and they always nest within counties. Census tracts are designed to be relatively homogenous units at the time the users established thenr? s, economic status, and living conditions For these reasons we believe that units at the tracts size(or less)are acceptably small and nous to use as a base unit. Further, tracts are completely contained within counties so we can easily check whether or not a district breaks county integrity 2. 3 Notation Define m to be the number of census tracts. and n the number of districts We denote our districts by D,1≤j≤m, and our tracts by T,1≤l≤m. Denote the set of all tracts by I=TihI<I<m; we call this a State. Denote the set of all districts at a particular time by A=(DiJ1sisn. We call this a partition for the State 2.3.1 Adjacency Define the symmetric relation Tp N Tg for tract pairs(Tp, Tq) which are adjacent. Define the function d(Ti) to be the district to which the tract Ti belongs. We also naturally extend the definition of d to sets of tracts Define the neighbor set of tract Ti by ar(T1)=TpErIT Tp) to be the set of all census acts neighboring Ti, and define ap(ti)=d(ar(i) to be the set of all districts containing ghbors of Ti. Every tract borders at least one other tracts, so ar(Ti) and ap(li) have 2.3.2 Borders Define the border of district D; as aDi=iT E Dilap(Ti)+(Di which is the set tracts in Di that are adjacent to at least one district other than D. The interior of district D; is Ii= DiaDi, the set of census tracts in D; whose neighbors are all in D;. Denote the total number of tracts in district D; as mi= Dil the number of border tracts as b;=laDl The frontier of D; is denoted Fi=UTED, ar(TD)\Di, i.e. the set of all tracts outside of D i that border the boundry tracts of D

Control # 1421 Page 4 out of 35 2.2 Geography and similar characteristics The US Census Bureau provides a great deal of data on legal, natural, and man-made boundaries as well as socio-economic similarity of regions. In each census, the United States is broken up into several degrees of accuracy, the smallest of which are: blocks (40 people on average), block groups (1500 people), and tracts (4500 people). We follow the practice in Young (1988) by districting based on a maximum level of resolution which in our Case Study (Section 5) is census tracts. Notational note: we refer to the smallest unit of division generally as a tract. A reference from the Caliper Corporation describes tracts in the following quotation: Census tract boundaries normally follow visible features, but may follow governmental unit boundaries and other non-visible features, and they always nest within counties. Census tracts are designed to be relatively homogenous units with respect to population characteristics, economic status, and living conditions at the time the users established them. For these reasons we believe that units at the tracts size (or less) are acceptably small and homogenous to use as a base unit. Further, tracts are completely contained within counties so we can easily check whether or not a district breaks county integrity. 2.3 Notation Define m to be the number of census tracts, and n the number of districts. We denote our districts by Di , 1 ≤ j ≤ n, and our tracts by Tl , 1 ≤ l ≤ m. Denote the set of all tracts by Γ = {Tl}1≤l≤m; we call this a State. Denote the set of all districts at a particular time by ∆ = {Di}1≤j≤n. We call this a partition for the State. 2.3.1 Adjacency Define the symmetric relation Tp ∼ Tq for tract pairs (Tp, Tq) which are adjacent. Define the function d(Tl) to be the district to which the tract Tl belongs. We also naturally extend the definition of d to sets of tracts. Define the neighbor set of tract Tl by aT (Tl) = {Tp ∈ Γ|Tl ∼ Tp} to be the set of all census tracts neighboring Tl , and define aD(Tl) = d(aT (Tl)) to be the set of all districts containing neighbors of Tl . Every tract borders at least one other tracts, so aT (Tl) and aD(Tl) have cardinality at least one for all Tl . 2.3.2 Borders Define the border of district Di as ∂Di = {Tl ∈ Di |aD(Tl) 6= {Di}} which is the set tracts in Di that are adjacent to at least one district other than Di . The interior of district Di is Ii = Di\∂Di , the set of census tracts in Di whose neighbors are all in Di . Denote the total number of tracts in district Di as mi = |Di | the number of border tracts as bi = |∂Di |. The frontier of Di is denoted Fi = (∪Tl∈DiaT (Tl))\Di , i.e. the set of all tracts outside of Di that border the boundry tracts of Di

Control 1421 Page 6 out of 35 2. 4 Past models Prior to explaining our modeling approach we would like discuss some previous work in the literature on congressional districting and gerrymandering. We used these papers as guides as we thought about and further refined our algorithm and implementation Cirincione et al.(2000)judge the quality of a districting plan based on equal population preservation of county integrity, and district area compactness. They require that district populations differ by no more than 1% from exact equality in the number of constituents and require point contiguity of the districts. The algorithm constructs districts by picking a random block group(their unit size), then adding additional block groups to the new district until the district population reaches p. At this point they repeat the process starting with a new random block group. Compactness of districts is based on their minimum bounding rectangles and county integrety is encouraged by "randomly"selecting new block groups with a preference for block groups in already inhabited counties Mehrotra et al.(1998) and Garfinkel and Nemhauser(1970) implement a"branch-and- price"method in the optimization step. They first obtain a districting, and optimize over their constraints such that population values are allowed to vary in the final solution of the optimization step. In a final step they split up population units to ensure population equality. They define compactness in a graph-theoretical manner where connected nodes are adjacent tracts. They define the "center"of a district to be the node(tract)with the lowest maximum distance to another other tract. They consider a graph(district )more compact when sum of distances from each node to the center is small We do not use this measure, as it does not uniquely define the center of a graph, and contrary to their claims, does allow for oddly-shaped districts, such as a district whose graph is a star-shaped tree with one tract in the center and many non-contiguous paths emanating from it. In our case study simulations, prior to the incorporation of a compactness factor in the objective function, we often obtain such a tree structure, which is one of the salient features of gerrymandering We also do not use a"branch-and-price"method of optimization. Following suggestions of Nagel(1965) and Kaiser(1966), we employ a local search algorithm in which tracts are swapped between existing districts to maximize some objective function. We describe this process in detail in Section 4 2.5 Measuring compactness The notion of compactness of a planar region has no uniformly accepted definition and research done by Young(1988) suggests that any reasonable measure of compactness fails to work well for certain geographic configurations. He further suggests that any good measure of compactness in such problems should consider the population units(census tracts in our case study) as indivisible units, and therefore that the measure of compactness should be made independently of the predetermined shapes of the population units. We follow this directive in our definition of compactness In fact, the compactness measures given in Young(1998)are not reasonable in the first instance, and do not include any notion of the area of a district, or comparing it to the perimeter. The measures include the maximum total perimeter of a district in a districting, determining the relative height and width of the district, and finding the moment of inertia

Control # 1421 Page 6 out of 35 2.4 Past Models Prior to explaining our modeling approach we would like discuss some previous work in the literature on congressional districting and gerrymandering. We used these papers as guides as we thought about and further refined our algorithm and implementation. Cirincione et al. (2000) judge the quality of a districting plan based on equal population, preservation of county integrity, and district area compactness. They require that district populations differ by no more than 1% from exact equality in the number of constituents, and require point contiguity of the districts. The algorithm constructs districts by picking a random block group (their unit size), then adding additional block groups to the new district until the district population reaches ¯p. At this point they repeat the process starting with a new random block group. Compactness of districts is based on their minimum bounding rectangles and county integrety is encouraged by “randomly” selecting new block groups with a preference for block groups in already inhabited counties. Mehrotra et al. (1998) and Garfinkel and Nemhauser (1970) implement a “branch-andprice” method in the optimization step. They first obtain a districting, and optimize over their constraints such that population values are allowed to vary in the final solution of the optimization step. In a final step they split up population units to ensure population equality. They define compactness in a graph-theoretical manner where connected nodes are adjacent tracts. They define the “center” of a district to be the node (tract) with the lowest maximum distance to another other tract. They consider a graph (district) more compact when sum of distances from each node to the center is small. We do not use this measure, as it does not uniquely define the center of a graph, and, contrary to their claims, does allow for oddly-shaped districts, such as a district whose graph is a star-shaped tree with one tract in the center and many non-contiguous paths emanating from it. In our case study simulations, prior to the incorporation of a compactness factor in the objective function, we often obtain such a tree structure, which is one of the salient features of gerrymandering. We also do not use a “branch-and-price” method of optimization. Following suggestions of Nagel (1965) and Kaiser (1966), we employ a local search algorithm in which tracts are swapped between existing districts to maximize some objective function. We describe this process in detail in Section 4. 2.5 Measuring compactness The notion of compactness of a planar region has no uniformly accepted definition and research done by Young (1988) suggests that any reasonable measure of compactness fails to work well for certain geographic configurations. He further suggests that any good measure of compactness in such problems should consider the population units (census tracts in our case study) as indivisible units, and therefore that the measure of compactness should be made independently of the predetermined shapes of the population units. We follow this directive in our definition of compactness. In fact, the compactness measures given in Young (1998) are not reasonable in the first instance, and do not include any notion of the area of a district, or comparing it to the perimeter. The measures include the maximum total perimeter of a district in a districting, determining the relative height and width of the district, and finding the moment of inertia

Control 1421 Page 7 out of 35 of the district. All of these measures fail to consider both perimeter and area simultaneously, which seems to be a reasonable requirement of a good compactness measure The Isoperimetric Theorem, first proved (non-rigorously) by J. Steiner in 1838, states that the quantity A/P2, given by the ratio of the area A of a planar region(not necessarily continuous) to the square of its perimeter is maximized when the region is circular. The maximum achievable compactness, that of a circle with radius r, is given by 472r2 =4 so we define compactness of a region as the ratio(4T A)/P2. This ratio is bounded within(0, 1] where higher values indicate greater compactness We believe this serves as a good measure of the broadly defined "regularity "of a region which is so important to the study of Congressional districting and gerrymandering. Specifi cally, any shear of factor s applied to a circle decreases the compactness by a factor of s, and any concave region has lower compactness than does its convex hull. It is easy to see that we can make an even stronger statement: the convex hull of a concave region has greater area and smaller perimeter Observe that a square gets close to the optimum, with a compactness of tr N 0.785. This implies that the set of possible compactness values for rectangles is(0, 0.785) since a square is the most compact rectangle 3 The Multi-Seeded growth Model We take a two-stage approach to finding the best districts for a given State. In the multi- Seeded growth Model. referred to as MSgm hereafter. we find an initial allocation of n districts so that the partition has modest levels of population equality and county preser- vation. Our more precise Partition Optimization Model, or POM, edits and improves the rough sketch from MSGM into until it becomes, hopefully, a work of art The reason that our model runs in two phases is simple: speed. Our knee-jerk reaction to the problem was to randomly allocate tracts to the n districts and then optimize by swapping tracts trying to improve some objective function. However, a random initial configuration is so far from the global maximum that the search might take millions of years The MSGM generates a very crude coloring of a State that ensures district contiguity and tries, but does not guarantee, to achieve population equality and county preservation The districts created by MSGM are completely unacceptable for an actual plan but save enormous amounts of computing time for our solution 3. 1 How it works At first, our task seems daunting. How do we allocate n districts equally, even to a rough approximation? Our solution is to grow the n districts simultaneously until they cover the State We start by allocating the entire State to a blank, dummy district Do, and then al- locating n tracts that serve as the initial "seeds" for the final districts. such that each Di, iEl,.,n begins as only a single tract. Now while Dol>0, we take the set S of all

Control # 1421 Page 7 out of 35 of the district. All of these measures fail to consider both perimeter and area simultaneously, which seems to be a reasonable requirement of a good compactness measure. The Isoperimetric Theorem, first proved (non-rigorously) by J. Steiner in 1838, states that the quantity A/P2 , given by the ratio of the area A of a planar region (not necessarily continuous) to the square of its perimeter is maximized when the region is circular. The maximum achievable compactness, that of a circle with radius r, is given by πr2 4π2r 2 = 1 4π so we define compactness of a region as the ratio (4πA)/P2 . This ratio is bounded within (0, 1] where higher values indicate greater compactness. We believe this serves as a good measure of the broadly defined “regularity” of a region which is so important to the study of Congressional districting and gerrymandering. Specifi- cally, any shear of factor s applied to a circle decreases the compactness by a factor of s, and any concave region has lower compactness than does its convex hull. It is easy to see that we can make an even stronger statement: the convex hull of a concave region has greater area and smaller perimeter. Observe that a square gets close to the optimum, with a compactness of 4π 16 ≈ 0.785. This implies that the set of possible compactness values for rectangles is (0, 0.785) since a square is the most compact rectangle. 3 The Multi-Seeded Growth Model We take a two-stage approach to finding the best districts for a given State. In the MultiSeeded Growth Model, referred to as MSGM hereafter, we find an initial allocation of n districts so that the partition has modest levels of population equality and county preservation. Our more precise Partition Optimization Model, or POM, edits and improves the rough sketch from MSGM into until it becomes, hopefully, a work of art. The reason that our model runs in two phases is simple: speed. Our knee-jerk reaction to the problem was to randomly allocate tracts to the n districts and then optimize by swapping tracts trying to improve some objective function. However, a random initial configuration is so far from the global maximum that the search might take millions of years. The MSGM generates a very crude coloring of a State that ensures district contiguity and tries, but does not guarantee, to achieve population equality and county preservation. The districts created by MSGM are completely unacceptable for an actual plan but save enormous amounts of computing time for our solution. 3.1 How it works At first, our task seems daunting. How do we allocate n districts equally, even to a rough approximation? Our solution is to grow the n districts simultaneously until they cover the State. We start by allocating the entire State to a blank, dummy district D0, and then allocating n tracts that serve as the initial “seeds” for the final districts, such that each Di , i ∈ {1, . . . , n} begins as only a single tract. Now while |D0| > 0, we take the set S of all

Control 1421 Page 8 out of 35 possible moves which involve taking a district from Do while preserving contiguity. That is Where M(T, Di, Di)represents a move of tract Ti from D; to Di, corresponding to the exit of Ti from Di and the entrance of Ti into D;. We then sort the moves in S by our heuristic function y (D1,., D,)-R, a function increasing in the desirability of our prospectiv partition. Each move is scored by the heuristic value that would result if we were to accept only that move. We then conclude by performing the moves corresponding to the top 3% of the scored moves in S. Note that this method preserves contiguity, because by definition any Ti E Fi must be contiguous with Di, and thus the Di are contiguous at each step Had we but world enough, and time, we would only perform the best possible move found in S before recalculating the frontier. Even though in the msgm we do not consider moves between two "true"districts (rather, we consider only moves between a true district and the dummy district), the value of a move does not exist in isolation. Consider two distinct districts D; and D;, and two tracts Ti E FinFi and Tk E FinF. The acceptance of M(Tk, Do, Di)alters the heuristic value of every move associated with Fi, which could poten- tially affect the optimality of further moves with Di, such as the acceptance of M(Ti, Do, Di rather than M(Ti, Do, Di). Furthermore, the acceptance of M(Ti, Do, Di) likely expands the size of Fi. Perhaps there is an optimal move opened up in this new frontier that we do not even consider, because we have not even calculated its value It would be better to only perform the best move, but such a strategy was found to be too computationally intensive. We compromise by taking only a small, elite fraction of the moves in each step before recalculating S and the values of its associated moves. In this respect, our approach is analogous to the strategy of modified policy iteration for solving a Markov decision problem. And just as modified policy iteration excels in practice, we found that the tradeoff of possible inefficiency is more than compensated for by the speed gains of the algorithm, especially considering that the solution obtained by MSGM will be further refined by POM In true modified policy iteration, k rounds of value iteration are made in-between policy iterations such that k is fixed. Our MSGM scheme uses a variable number of moves in- between recalculating the value of the frontier. We selected our scheme because it causes us to be delicate in our selections of tract allocations, making moves virtually one-at-a- time, at the beginning and end of the MsGm. By focusing on the beginning and end of the problem, we attempt to avoid having a single district grow too large through possible inefficient allocation Unlike Cirincione(2000)we use random initial seeds weighted by population rather than seeds that are equally spaced around the State. The process works as follows: while there are still random seeds to be selected, we find a candidate initial seed tract Ti in Do. Letting the largest tract in our State have population p, we accept Ti as an initial seed with probability P(Ti/p. We thus select tracts in linear proportion to their population. We found that the MSGm algorithm produces the best initial results when all the districts have the same amount of population, rather than the same number of tracts around which to grow. The geographically optimal placement of five, or fewer, starting seeds in the NYC Metropolitan

Control # 1421 Page 8 out of 35 possible moves which involve taking a district from D0 while preserving contiguity. That is: S = [n i=1 [ Tl∈Fi M(Tl , D0, Di) Where M(Tl , Di , Dj ) represents a move of tract Tl from Di to Dj , corresponding to the exit of Tl from Di and the entrance of Tl into Dj . We then sort the moves in S by our heuristic function Ψ(D1, . . . , Dn) → R, a function increasing in the desirability of our prospective partition. Each move is scored by the heuristic value that would result if we were to accept only that move. We then conclude by performing the moves corresponding to the top 3% of the scored moves in S. Note that this method preserves contiguity, because by definition any Tl ∈ Fi must be contiguous with Di , and thus the Di are contiguous at each step. Had we but world enough, and time, we would only perform the best possible move found in S before recalculating the frontier. Even though in the MSGM we do not consider moves between two “true” districts (rather, we consider only moves between a true district and the dummy district), the value of a move does not exist in isolation. Consider two distinct districts Di and Dj , and two tracts Tl ∈ Fi ∩Fj and Tk ∈ Fi ∩F c j . The acceptance of M(Tk, D0, Di) alters the heuristic value of every move associated with Fi , which could potentially affect the optimality of further moves with Di , such as the acceptance of M(Tl , D0, Di) rather than M(Tl , D0, Dj ). Furthermore, the acceptance of M(Tl , D0, Di) likely expands the size of Fi . Perhaps there is an optimal move opened up in this new frontier that we do not even consider, because we have not even calculated its value. It would be better to only perform the best move, but such a strategy was found to be too computationally intensive. We compromise by taking only a small, elite fraction of the moves in each step before recalculating S and the values of its associated moves. In this respect, our approach is analogous to the strategy of modified policy iteration for solving a Markov decision problem. And just as modified policy iteration excels in practice, we found that the tradeoff of possible inefficiency is more than compensated for by the speed gains of the algorithm, especially considering that the solution obtained by MSGM will be further refined by POM. In true modified policy iteration, k rounds of value iteration are made in-between policy iterations, such that k is fixed. Our MSGM scheme uses a variable number of moves inbetween recalculating the value of the frontier. We selected our scheme because it causes us to be delicate in our selections of tract allocations, making moves virtually one-at-atime, at the beginning and end of the MSGM. By focusing on the beginning and end of the problem, we attempt to avoid having a single district grow too large through possible inefficient allocation. Unlike Cirincione (2000) we use random initial seeds weighted by population rather than seeds that are equally spaced around the State. The process works as follows: while there are still random seeds to be selected, we find a candidate initial seed tract Tl in D0. Letting the largest tract in our State have population ˆp, we accept Tl as an initial seed with probability p(Tl)/pˆ. We thus select tracts in linear proportion to their population. We found that the MSGM algorithm produces the best initial results when all the districts have the same amount of population, rather than the same number of tracts around which to grow. The geographically optimal placement of five, or fewer, starting seeds in the NYC Metropolitan