Machine code

From LQWiki

Machine code, sometimes called binary code, is the set of instructions and data that directly controls the actions of a CPU. A collection of machine code used to accomplish some useful task (or play Counter-Strike) is called a program. Machine code is the actual 1s and 0s of binary that the computer understands, as opposed to assembly language or high-level language code, which must be translated before the CPU can execute it.
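To make "the actual 1s and 0s" concrete, here is a minimal sketch that prints a tiny fragment of x86 machine code as binary digits. The six bytes are the standard encoding of "mov eax, 42" followed by "ret"; the names MACHINE_CODE and as_bits are made up for this illustration. The script only displays the bytes and does not execute them.

```python
# Six bytes of x86 machine code: "mov eax, 42" followed by "ret".
# B8 is the opcode for "move a 32-bit constant into EAX", the next four
# bytes are the constant 42 in little-endian order, and C3 is "ret".
MACHINE_CODE = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])


def as_bits(code: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in code)


if __name__ == "__main__":
    # prints: 10111000 00101010 00000000 00000000 00000000 11000011
    print(as_bits(MACHINE_CODE))
```

An assembler performs the reverse of this display step: it turns the text "mov eax, 42" into those bytes.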

Not every CPU can run every program. Whether a microprocessor can run a given program depends on its instruction set architecture (ISA), which defines the form that individual instructions to a given chip have to take. For example, Intel has its x86 instruction set, while other chips (such as the R-series microprocessors used in SGI workstations) are based on the MIPS instruction set.

There are entire families of chips with the same instruction set, so a program compiled for the x86 architecture will run happily on a Pentium 4, a Cyrix x86, or an AMD Athlon. Even within a single manufacturer's product line, there can be wide variations in clock speed, cache size, or pipeline depth without harming compatibility.

An instruction set represents all the discrete operations that can be performed by a processor. These include fundamental operations such as adding, subtracting, multiplying and dividing integer and floating point numbers, loading data from memory and storing it back, testing conditions (for example, does variable x equal 5?) and branching the program based on the results. Some chips also include more specialized operations (for example, the Pentium MMX introduced "MMX instructions", which were intended to speed up multimedia).
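The kinds of operation listed above (load, store, arithmetic, test, branch) can be sketched with a toy interpreter. This is a made-up five-instruction machine, not a real ISA: the opcode names LOAD, STORE, ADDI, CMPEQ and JNE, the register names, and the function run are all invented for the illustration.

```python
def run(program, memory):
    """Execute a list of (opcode, operands...) tuples on a toy machine."""
    regs = {"r0": 0, "r1": 0, "r2": 0}
    flag = False  # result of the most recent comparison
    pc = 0        # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "LOAD":      # LOAD reg, addr   : reg <- memory[addr]
            regs[args[0]] = memory[args[1]]
        elif op == "STORE":   # STORE reg, addr  : memory[addr] <- reg
            memory[args[1]] = regs[args[0]]
        elif op == "ADDI":    # ADDI reg, const  : reg <- reg + const
            regs[args[0]] += args[1]
        elif op == "CMPEQ":   # CMPEQ reg, const : flag <- (reg == const)
            flag = regs[args[0]] == args[1]
        elif op == "JNE":     # JNE target       : branch if the test failed
            if not flag:
                pc = args[0]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# Count the value stored at memory[0] up to 5.
COUNT_TO_FIVE = [
    ("LOAD", "r0", 0),    # 0: r0 <- memory[0]
    ("ADDI", "r0", 1),    # 1: r0 <- r0 + 1
    ("CMPEQ", "r0", 5),   # 2: does r0 equal 5?
    ("JNE", 1),           # 3: if not, go back to instruction 1
    ("STORE", "r0", 0),   # 4: memory[0] <- r0
]

if __name__ == "__main__":
    print(run(COUNT_TO_FIVE, [0]))  # prints [5]
```

A real processor does the same fetch, decode and execute cycle in hardware, with the opcodes encoded as bit patterns rather than strings.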

The Life of a Program

Most programs start their lives as a collection of source code: human-readable files that define how a program should operate. There are literally hundreds of programming languages to choose from.

Before a program becomes finished machine code, the compiler usually translates it into object code, which is basically machine code with certain symbols (calls to libraries, for example) left undefined. It is up to a separate program called the linker to resolve these undefined symbols so that the program can use them.
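The linker's job can be sketched with a toy model. Here an "object file" is a plain dictionary holding some instructions plus the symbols it defines, and a call to a symbol defined elsewhere is left as a name until link time. The data layout, the opcode names, and the names link, MAIN_O and LIBHELLO_O are all hypothetical; real object formats and linkers are far more involved.

```python
def link(objects):
    """Combine toy object files and replace symbol names with addresses."""
    symbols = {}  # symbol name -> address in the final image
    image = []    # the combined instruction list
    # Pass 1: lay the objects out one after another and record where
    # each defined symbol ends up.
    for obj in objects:
        base = len(image)
        for name, offset in obj["defines"].items():
            symbols[name] = base + offset
        image.extend(obj["code"])
    # Pass 2: patch every call target that was left as a symbol name.
    linked = []
    for op, arg in image:
        if op == "CALL" and isinstance(arg, str):
            if arg not in symbols:
                raise NameError(f"undefined symbol: {arg}")
            arg = symbols[arg]
        linked.append((op, arg))
    return linked


# "main" calls print_hello, which this object does not define.
MAIN_O = {
    "defines": {"main": 0},
    "code": [("CALL", "print_hello"), ("RET", None)],
}
# A toy library object that supplies print_hello.
LIBHELLO_O = {
    "defines": {"print_hello": 0},
    "code": [("PRINT", "hello"), ("RET", None)],
}

if __name__ == "__main__":
    # prints [('CALL', 2), ('RET', None), ('PRINT', 'hello'), ('RET', None)]
    print(link([MAIN_O, LIBHELLO_O]))
```

Linking MAIN_O on its own raises NameError, which mirrors the "undefined reference" errors a real linker reports when a library is missing.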

Once a program has been compiled and linked, it can be stored in an executable format (common Unix formats include a.out and ELF) on the hard drive. Now the program can be loaded into memory and run.
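An executable format records which instruction set the stored machine code targets. As a rough sketch, the function below reads the start of an ELF header: the 4-byte magic number, the word-size and byte-order bytes, and the 16-bit e_machine field at offset 18. The function name describe_elf and the hand-built SAMPLE header are assumptions for the example, and the MACHINES table lists only a few of the values defined by the ELF specification.

```python
import struct

# e_machine values from the ELF specification for a few architectures.
MACHINES = {3: "x86", 8: "MIPS", 62: "x86-64", 183: "AArch64"}


def describe_elf(header: bytes) -> str:
    """Summarise the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 4 is the file class: 1 for 32-bit, 2 for 64-bit.
    word_size = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    # Byte 5 gives the byte order of multi-byte fields: 1 is little-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return f"{word_size} {MACHINES.get(machine, f'machine {machine}')}"


# A hand-built header: magic, 64-bit, little-endian, version 1, padding,
# then e_type = 2 (executable) and e_machine = 62 (x86-64).
SAMPLE = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 62)

if __name__ == "__main__":
    print(describe_elf(SAMPLE))  # prints 64-bit x86-64
```

On a Linux system, passing the first 20 bytes of an installed binary (read with open(path, "rb").read(20)) should report the architecture the program was compiled for.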

Note: Some programming languages (notably Java and the languages of Microsoft's .NET platform) compile into bytecode rather than machine code. Instead of targeting a specific type of hardware, the compiler translates programs into instructions that are read by a software-based virtual machine.
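Python is not mentioned above, but its standard interpreter (CPython) works the same way, and that makes bytecode easy to inspect. The sketch below compiles a small function and uses the standard dis module to list the virtual machine instructions it became. The function add is just an example, and the exact bytes and instruction names vary between Python versions, so no output is shown.

```python
import dis


def add(a, b):
    return a + b


# The raw bytecode is a bytes object stored on the function's code object.
raw = add.__code__.co_code
# dis decodes those bytes into named virtual-machine instructions.
opnames = [ins.opname for ins in dis.get_instructions(add)]

if __name__ == "__main__":
    print(raw.hex())
    print(opnames)
```

Unlike the machine code of a CPU, these instructions mean nothing to the hardware itself; they only run because the virtual machine interprets them.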