APPENDIX A
Assemblers, Linkers, and the SPIM Simulator
James R. Larus
Microsoft Research
Microsoft

Fear of serious injury cannot alone justify suppression of free speech and assembly.
Louis Brandeis
Whitney v. California, 1927
A.1 Introduction A-3
A.2 Assemblers A-10
A.3 Linkers A-18
A.4 Loading A-19
A.5 Memory Usage A-20
A.6 Procedure Call Convention A-22
A.7 Exceptions and Interrupts A-33
A.8 Input and Output A-38
A.9 SPIM A-40
A.10 MIPS R2000 Assembly Language A-45
A.11 Concluding Remarks A-81
A.12 Exercises A-82

A.1 Introduction

Encoding instructions as binary numbers is natural and efficient for computers. Humans, however, have a great deal of difficulty understanding and manipulating these numbers. People read and write symbols (words) much better than long sequences of digits. Chapter 2 showed that we need not choose between numbers and words because computer instructions can be represented in many ways. Humans can write and read symbols, and computers can execute the equivalent binary numbers. This appendix describes the process by which a human-readable program is translated into a form that a computer can execute, provides a few hints about writing assembly programs, and explains how to run these programs on SPIM, a simulator that executes MIPS programs. UNIX, Windows, and Mac OS X versions of the SPIM simulator are available on the CD.

machine language: Binary representation used for communication within a computer system.

Assembly language is the symbolic representation of a computer's binary encoding, its machine language. Assembly language is more readable than machine language because it uses symbols instead of bits. The symbols in assembly language name commonly occurring bit patterns, such as opcodes and register specifiers, so people can read and remember them. In addition, assembly language
permits programmers to use labels to identify and name particular memory words that hold instructions or data.

A tool called an assembler translates assembly language into binary instructions. Assemblers provide a friendlier representation than a computer's 0s and 1s, which simplifies writing and reading programs. Symbolic names for operations and locations are one facet of this representation. Another facet is programming facilities that increase a program's clarity. For example, macros, discussed in Section A.2, enable a programmer to extend the assembly language by defining new operations.

An assembler reads a single assembly language source file and produces an object file containing machine instructions and bookkeeping information that helps combine several object files into a program. Figure A.1.1 illustrates how a program is built. Most programs consist of several files (also called modules) that are written, compiled, and assembled independently. A program may also use prewritten routines supplied in a program library. A module typically contains references to subroutines and data defined in other modules and in libraries. The code in a module cannot be executed when it contains unresolved references to labels in other object files or libraries. Another tool, called a linker, combines a collection of object and library files into an executable file, which a computer can run.

(Diagram: each of three source files passes through an assembler to produce an object file; the linker combines the object files with a program library to produce an executable file.)
FIGURE A.1.1 The process that produces an executable file. An assembler translates a file of assembly language into an object file, which is linked with other files and libraries into an executable file.

assembler: A program that translates a symbolic version of an instruction into the binary version.
macro: A pattern-matching and replacement facility that provides a simple mechanism to name a frequently used sequence of instructions.
unresolved reference: A reference that requires more information from an outside source in order to be complete.
linker: Also called link editor. A systems program that combines independently assembled machine language programs and resolves all undefined labels into an executable file.

To see the advantage of assembly language, consider the following sequence of figures, all of which contain a short subroutine that computes and prints the sum of the squares of integers from 0 to 100. Figure A.1.2 shows the machine language that a MIPS computer executes. With considerable effort, you could use the opcode and instruction format tables in Chapter 2 to translate the instructions into a symbolic program similar to Figure A.1.3. This form of the
routine is much easier to read because operations and operands are written with symbols rather than with bit patterns. However, this assembly language is still difficult to follow because memory locations are named by their address rather than by a symbolic label.

    00100111101111011111111111100000
    10101111101111110000000000010100
    10101111101001000000000000100000
    10101111101001010000000000100100
    10101111101000000000000000011000
    10101111101000000000000000011100
    10001111101011100000000000011100
    10001111101110000000000000011000
    00000001110011100000000000011001
    00100101110010000000000000000001
    00101001000000010000000001100101
    10101111101010000000000000011100
    00000000000000000111100000010010
    00000011000011111100100000100001
    00010100001000001111111111110111
    10101111101110010000000000011000
    00111100000001000001000000000000
    10001111101001010000000000011000
    00001100000100000000000011101100
    00100100100001000000010000110000
    10001111101111110000000000010100
    00100111101111010000000000100000
    00000011111000000000000000001000
    00000000000000000001000000100001

FIGURE A.1.2 MIPS machine language code for a routine to compute and print the sum of the squares of integers between 0 and 100.

    addiu $29, $29, -32
    sw    $31, 20($29)
    sw    $4, 32($29)
    sw    $5, 36($29)
    sw    $0, 24($29)
    sw    $0, 28($29)
    lw    $14, 28($29)
    lw    $24, 24($29)
    multu $14, $14
    addiu $8, $14, 1
    slti  $1, $8, 101
    sw    $8, 28($29)
    mflo  $15
    addu  $25, $24, $15
    bne   $1, $0, -9
    sw    $25, 24($29)
    lui   $4, 4096
    lw    $5, 24($29)
    jal   1048812
    addiu $4, $4, 1072
    lw    $31, 20($29)
    addiu $29, $29, 32
    jr    $31
    move  $2, $0

FIGURE A.1.3 The same routine written in assembly language. However, the code for the routine does not label registers or memory locations, nor does it include comments.

Figure A.1.4 shows assembly language that labels memory addresses with mnemonic names. Most programmers prefer to read and write this form. Names that begin with a period, for example .data and .globl, are assembler directives that tell the assembler how to translate a program but do not produce machine instructions. Names followed by a colon, such as str or main, are labels that name the next memory location. This program is as readable as most assembly language programs (except for a glaring lack of comments), but it is still difficult to follow because many simple operations are required to accomplish simple tasks and because assembly language's lack of control flow constructs provides few hints about the program's operation.

assembler directive: An operation that tells the assembler how to translate a program but does not produce machine instructions; always begins with a period.

By contrast, the C routine in Figure A.1.5 is both shorter and clearer since variables have mnemonic names and the loop is explicit rather than constructed with branches. In fact, the C routine is the only one that we wrote. The other forms of the program were produced by a C compiler and assembler.

In general, assembly language plays two roles (see Figure A.1.6). The first role is the output language of compilers. A compiler translates a program written in a high-level language (such as C or Pascal) into an equivalent program in machine or assembly language. The high-level language is called the source language, and the compiler's output is its target language.

source language: The high-level language in which a program is originally written.

Assembly language's other role is as a language in which to write programs. This role used to be the dominant one. Today, however, because of larger main memories and better compilers, most programmers write in a high-level language and rarely, if ever, see the instructions that a computer executes. Nevertheless, assembly language is still important for writing programs in which speed or size is critical, and for exploiting hardware features that have no analogues in high-level languages.

Although this appendix focuses on MIPS assembly language, assembly programming on most other machines is very similar. The additional instructions and addressing modes in CISC machines, such as the VAX, can make assembly programs shorter but do not change the process of assembling a program or provide assembly language with the advantages of high-level languages, such as type checking and structured control flow.
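Figures A.1.2 and A.1.3 show the same instructions in binary and in symbolic form. To make the correspondence concrete, here is a minimal C sketch of the core of what an assembler does for one instruction format: it packs the fields of a MIPS I-type instruction (a 6-bit opcode, two 5-bit register numbers, and a 16-bit immediate) into a 32-bit word. The names encode_itype, print_binary, and the OP_ constants are hypothetical, chosen for this illustration; the opcode values themselves are the standard MIPS encodings for addiu, lw, and sw. This is a sketch, not the assembler that produced the figures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Standard MIPS primary opcodes for the I-type instructions used here. */
enum { OP_ADDIU = 0x09, OP_LW = 0x23, OP_SW = 0x2B };

/* Pack the fields of a MIPS I-type instruction into one 32-bit word:
   bits 31-26 opcode, 25-21 rs, 20-16 rt, 15-0 immediate. */
uint32_t encode_itype(unsigned opcode, unsigned rs, unsigned rt, int16_t imm)
{
    assert(opcode < 64 && rs < 32 && rt < 32);   /* each field must fit its width */
    return ((uint32_t)opcode << 26)
         | ((uint32_t)rs << 21)
         | ((uint32_t)rt << 16)
         | (uint32_t)(uint16_t)imm;               /* 16-bit two's-complement immediate */
}

/* Print a word as 32 binary digits, the format used in Figure A.1.2. */
void print_binary(uint32_t word)
{
    for (int bit = 31; bit >= 0; bit--)
        putchar(((word >> bit) & 1u) ? '1' : '0');
    putchar('\n');
}
```

For instance, encode_itype(OP_ADDIU, 29, 29, -32) returns 0x27BDFFE0, which print_binary prints as 00100111101111011111111111100000, the first line of Figure A.1.2. For a load or store such as sw $31, 20($29), the base register goes in rs and the data register in rt, so encode_itype(OP_SW, 29, 31, 20) yields the second line. R-type instructions such as addu use a different field layout and are outside this sketch.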
When to Use Assembly Language

The primary reason to program in assembly language, as opposed to an available high-level language, is that the speed or size of a program is critically important. For example, consider a computer that controls a piece of machinery, such as a car's brakes. A computer that is incorporated in another device, such as a car, is called an embedded computer. This type of computer needs to respond rapidly and predictably to events in the outside world. Because a compiler introduces uncertainty about the time cost of operations, programmers may find it difficult to ensure that a high-level language program responds within a definite time interval, say, 1 millisecond after a sensor detects that a tire is skidding. An assembly language programmer, on the other hand, has tight control over which instructions execute. In addition, in embedded applications, reducing a program's size, so that it fits in fewer memory chips, reduces the cost of the embedded computer.

A hybrid approach, in which most of a program is written in a high-level language and time-critical sections are written in assembly language, builds on the strengths of both languages. Programs typically spend most of their time executing a small fraction of the program's source code. This observation is just the principle of locality that underlies caches (see Section 7.2 in Chapter 7).

Program profiling measures where a program spends its time and can find the time-critical parts of a program. In many cases, this portion of the program can be made faster with better data structures or algorithms. Sometimes, however, significant performance improvements come only from recoding a critical portion of a program in assembly language.

          .text
          .align  2
          .globl  main
    main:
          subu    $sp, $sp, 32
          sw      $ra, 20($sp)
          sd      $a0, 32($sp)
          sw      $0, 24($sp)
          sw      $0, 28($sp)
    loop:
          lw      $t6, 28($sp)
          mul     $t7, $t6, $t6
          lw      $t8, 24($sp)
          addu    $t9, $t8, $t7
          sw      $t9, 24($sp)
          addu    $t0, $t6, 1
          sw      $t0, 28($sp)
          ble     $t0, 100, loop
          la      $a0, str
          lw      $a1, 24($sp)
          jal     printf
          move    $v0, $0
          lw      $ra, 20($sp)
          addu    $sp, $sp, 32
          jr      $ra

          .data
          .align  0
    str:
          .asciiz "The sum from 0 .. 100 is %d\n"

FIGURE A.1.4 The same routine written in assembly language with labels, but no comments. The commands that start with periods are assembler directives (see pages A-47–A-49). .text indicates that succeeding lines contain instructions. .data indicates that they contain data. .align n indicates that the items on the succeeding lines should be aligned on a 2^n-byte boundary. Hence, .align 2 means the next item should be on a word boundary. .globl main declares that main is a global symbol that should be visible to code stored in other files. Finally, .asciiz stores a null-terminated string in memory.

    #include <stdio.h>

    int
    main (int argc, char *argv[])
    {
        int i;
        int sum = 0;

        for (i = 0; i <= 100; i = i + 1) sum = sum + i * i;
        printf ("The sum from 0 .. 100 is %d\n", sum);
        return 0;
    }

FIGURE A.1.5 The routine written in the C programming language.

(Diagram: a high-level language program passes through the compiler to become an assembly language program, which passes through the assembler and the linker to run on the computer; an assembly language program may also be written directly by a programmer.)
FIGURE A.1.6 Assembly language either is written by a programmer or is the output of a compiler.
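The loop in Figure A.1.4 is built from a single ble branch at the bottom, while Figure A.1.5 uses an explicit for statement. As an illustration, the sketch below is a hand transliteration (not compiler output) of Figure A.1.4's control flow back into C using goto, next to a restatement of Figure A.1.5 that returns the sum instead of printing it. The function names sum_of_squares_branches, sum_of_squares_for, and check_sum_of_squares are hypothetical, and the comments map each C statement to the assembly instruction it mirrors.

```c
#include <assert.h>

/* Sum of squares 0..100 computed the way Figure A.1.4 does it: the loop body
   runs first, and one conditional branch at the bottom decides whether to
   repeat. Figure A.1.4 has no test at the top, which is safe because
   0 <= 100 holds on entry. */
int sum_of_squares_branches(void)
{
    int i = 0;                      /* lives at 28($sp) in Figure A.1.4 */
    int sum = 0;                    /* lives at 24($sp) in Figure A.1.4 */

loop:
    sum = sum + i * i;              /* mul $t7, $t6, $t6 and addu $t9, $t8, $t7 */
    i = i + 1;                      /* addu $t0, $t6, 1 */
    if (i <= 100)
        goto loop;                  /* ble $t0, 100, loop */
    return sum;
}

/* The structured form of Figure A.1.5, returning the sum instead of printing it. */
int sum_of_squares_for(void)
{
    int i;
    int sum = 0;

    for (i = 0; i <= 100; i = i + 1)
        sum = sum + i * i;
    return sum;
}

/* Cross-check both versions against the closed form n(n+1)(2n+1)/6. */
void check_sum_of_squares(void)
{
    const int n = 100;

    assert(sum_of_squares_branches() == n * (n + 1) * (2 * n + 1) / 6);
    assert(sum_of_squares_for() == sum_of_squares_branches());
}
```

Written this way, the shape of the assembly loop becomes visible: it is a bottom-tested loop whose index runs from 0 through 100 inclusive, and both functions return 338350.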
This improvement is not necessarily an indication that the high-level language's compiler has failed. Compilers typically are better than programmers at producing uniformly high-quality machine code across an entire program. Programmers, however, understand a program's algorithms and behavior at a deeper level than a compiler and can expend considerable effort and ingenuity improving small sections of the program. In particular, programmers often consider several procedures simultaneously while writing their code. Compilers typically compile each procedure in isolation and must follow strict conventions governing the use of registers at procedure boundaries. By retaining commonly used values in registers, even across procedure boundaries, programmers can make a program run faster.

Another major advantage of assembly language is the ability to exploit specialized instructions, for example, string copy or pattern-matching instructions. Compilers, in most cases, cannot determine that a program loop can be replaced by a single instruction. However, the programmer who wrote the loop can replace it easily with a single instruction.

Today, a programmer's advantage over a compiler is becoming harder to maintain as compilation techniques improve and machine pipelines grow more complex (Chapter 6).

The final reason to use assembly language is that no high-level language is available on a particular computer. Many older or specialized computers do not have a compiler, so a programmer's only alternative is assembly language.

Drawbacks of Assembly Language

Assembly language has many disadvantages that strongly argue against its widespread use. Perhaps its greatest disadvantage is that programs written in assembly language are inherently machine-specific and must be completely rewritten to run on another computer architecture. The rapid evolution of computers discussed in Chapter 1 means that architectures become obsolete. An assembly language program remains tightly bound to its original architecture, even after the computer is eclipsed by new, faster, and more cost-effective machines.

Another disadvantage is that assembly language programs are longer than the equivalent programs written in a high-level language. For example, the C program in Figure A.1.5 is 11 lines long, while the assembly program in Figure A.1.4 is 31 lines long. In more complex programs, the ratio of assembly to high-level language (its expansion factor) can be much larger than the factor of roughly three in this example. Unfortunately, empirical studies have shown that programmers write roughly the same number of lines of code per day in assembly as in high-level languages. This means that programmers are roughly x times more productive in a high-level language, where x is the assembly language expansion factor.
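The productivity argument above is a short calculation, worked through here. The symbols N (the number of lines a task needs in the high-level language) and L (the number of lines a programmer writes per day, assumed to be the same in both languages, as the empirical studies mentioned above suggest) are introduced only for this illustration. Using the line counts quoted for Figures A.1.4 and A.1.5:

$$
x = \frac{31 \text{ lines of assembly}}{11 \text{ lines of C}} \approx 2.8
$$

$$
\frac{T_{\text{assembly}}}{T_{\text{high-level}}} = \frac{xN / L}{N / L} = x
$$

Because N and L cancel, the ratio of programmer time depends only on the expansion factor: with x close to 3, the same task takes about three times as many programmer-days in assembly language, and the gap widens as x grows in larger programs.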
be referenced from files other than the one in which it is defined. A label is local if the object can be used only within the file in which it is defined. In most assemblers, labels are local by default and must be explicitly declared global. Subroutines and global variables require external labels because they are referenced from many files in a program. Local labels hide names that should not be visible to other modules (for example, static functions in C, which can be called only by other functions in the same file). In addition, compiler-generated names (for example, a name for the instruction at the beginning of a loop) are local, so the compiler need not produce unique names in every file.

Local and Global Labels

EXAMPLE Consider the program in Figure A.1.4 on page A-7. The subroutine has an external (global) label main. It also contains two local labels, loop and str, that are visible only within this assembly language file. Finally, the routine contains an unresolved reference to an external label printf, which is the library routine that prints values. Which labels in Figure A.1.4 could be referenced from another file?

ANSWER Only global labels are visible outside of a file, so the only label that could be referenced from another file is main.

Since the assembler processes each file in a program individually and in isolation, it knows only the addresses of local labels. The assembler depends on another tool, the linker, to combine a collection of object files and libraries into an executable file by resolving external labels. The assembler assists the linker by providing lists of labels and unresolved references.

However, even local labels present an interesting challenge to an assembler. Unlike names in most high-level languages, assembly labels may be used before they are defined. In the example in Figure A.1.4, the label str is used by the la instruction before it is defined. The possibility of a forward reference, like this one, forces an assembler to translate a program in two steps: first find all labels and then produce instructions. In the example, when the assembler sees the la instruction, it does not know where the word labeled str is located or even whether str labels an instruction or datum.

forward reference A label that is used before it is defined.
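The two-step translation and the forward-reference problem described above can be sketched in Python. This is a toy under stated assumptions, not how SPIM or any production assembler works: everything lives in one segment, every statement occupies one 4-byte word (a real la is a pseudoinstruction that can expand into more than one instruction), the starting address is only illustrative, and "translating" a statement merely substitutes label addresses instead of building a binary encoding. The function names pass_one, pass_two, and forward_references are hypothetical.

```python
# Toy two-pass translation. Assumptions (not real MIPS encoding): one segment,
# every statement occupies one 4-byte word, and "resolving" a label just means
# replacing its name with its address.
TEXT_BASE = 0x00400000  # illustrative starting address
WORD_BYTES = 4


def strip_comment(line):
    """Drop a '#' comment and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def split_statement(code):
    """'la $a0, str' -> ('la', ['$a0', 'str'])."""
    parts = code.split(None, 1)
    operands = [o.strip() for o in parts[1].split(",")] if len(parts) > 1 else []
    return parts[0], operands


def pass_one(lines, base=TEXT_BASE):
    """Pass 1: assign an address to every label; operands are not examined."""
    symbols, statements, addr = {}, [], base
    for line in lines:
        code = strip_comment(line)
        if ":" in code:  # "label: statement" or a bare "label:"
            label, code = code.split(":", 1)
            symbols[label.strip()] = addr
            code = code.strip()
        if code:
            statements.append((addr, code))
            addr += WORD_BYTES
    return symbols, statements


def pass_two(symbols, statements):
    """Pass 2: every label in this file now has an address, so substitute it."""
    output = []
    for addr, code in statements:
        op, operands = split_statement(code)
        # Names not in the table (registers, immediates, external labels such
        # as printf) are left as-is; external ones are the linker's job.
        resolved = [hex(symbols[o]) if o in symbols else o for o in operands]
        output.append((addr, op, resolved))
    return output


def forward_references(symbols, statements):
    """Labels used by a statement that comes before the label's definition."""
    forward = []
    for addr, code in statements:
        for operand in split_statement(code)[1]:
            if symbols.get(operand, -1) > addr and operand not in forward:
                forward.append(operand)
    return forward


program = [
    "main: la $a0, str        # forward reference: str is defined below",
    "loop: addi $t0, $t0, 1",
    "      bne $t0, $t1, loop  # backward reference: loop is already known",
    "      jal printf          # external label: not defined in this file",
    "str:  .word 0",
]

symbols, statements = pass_one(program)
print(forward_references(symbols, statements))  # ['str']
for addr, op, operands in pass_two(symbols, statements):
    print(hex(addr), op, operands)
```

Because pass_one finishes before pass_two begins, the forward reference to str is resolved as easily as the backward reference to loop. The reference to printf stays symbolic: it is an external label, so resolving it is left to the linker.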
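The label rules described above (local by default, global only when declared, external references left for the linker) can likewise be modeled with a few sets. In this hypothetical sketch, summarize and link are illustrative names, the file names main.s and printf.s are invented, and the second file exists only to show a successful link. Real object files also carry addresses and relocation information, which are omitted here.

```python
from dataclasses import dataclass


@dataclass
class ObjectSummary:
    """Hypothetical label lists an assembler could hand to a linker."""
    name: str
    exported: set    # global labels that other files may reference
    local: set       # labels visible only inside this file
    unresolved: set  # labels used here but not defined here


def summarize(name, defined, declared_global, referenced):
    """Labels are local by default; only those declared global are exported."""
    defined, declared_global = set(defined), set(declared_global)
    return ObjectSummary(
        name=name,
        exported=defined & declared_global,
        local=defined - declared_global,
        unresolved=set(referenced) - defined,
    )


def link(objects):
    """Match each file's unresolved labels against other files' exported labels.

    Local labels are never consulted, so two files may both define 'loop'.
    Returns (label -> defining file, file -> labels that remain undefined).
    """
    exporters = {}
    for obj in objects:
        for label in obj.exported:
            if label in exporters:
                raise ValueError(f"{label!r} is global in both {exporters[label]} and {obj.name}")
            exporters[label] = obj.name
    missing = {}
    for obj in objects:
        undefined = sorted(label for label in obj.unresolved if label not in exporters)
        if undefined:
            missing[obj.name] = undefined
    return exporters, missing


# Labels from the example: main is global, loop and str are local, printf is external.
main_obj = summarize("main.s", defined=["main", "loop", "str"],
                     declared_global=["main"], referenced=["loop", "str", "printf"])
# A hypothetical library file that defines printf and its own private loop label.
lib_obj = summarize("printf.s", defined=["printf", "loop"],
                    declared_global=["printf"], referenced=["loop"])

print(sorted(main_obj.exported), sorted(main_obj.local))  # ['main'] ['loop', 'str']
print(link([main_obj])[1])           # {'main.s': ['printf']}
print(link([main_obj, lib_obj])[1])  # {}
```

Both files define a local label named loop, yet link reports no conflict, because local labels never appear in the exported lists. This is the property that lets a compiler reuse generated names, such as loop labels, in every file.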