COMPUTATIONAL SCIENCE WITH SUMAN: Structure and operations of translators

A translator may formally be defined as a function, whose domain is a source language, and whose range is contained in an object or target language.


       Source language  --------->  Translator  --------->   Target language

       instructions                                          instructions

A little experience with translators will reveal that executing the algorithm expressed by the source is rarely considered part of the translator's function; its job is merely to change the representation of that algorithm from one form to another. In fact, at least three languages are involved in the development of translators: the source language to be translated, the object or target language to be generated, and the host language to be used for implementing the translator. If the translation takes place in several stages, there may even be other, intermediate, languages. Most of these - and, indeed, the host and object languages themselves - usually remain hidden from a user of the source language.
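
As a rough illustration of these ideas, the following sketch treats a translator as an ordinary function. Everything in it is an assumption made for the example: the host language is Python, the source language is a tiny space-separated infix notation, and the target language is an invented stack-machine notation (PUSH, ADD, SUB, MUL, DIV). Note that the function only changes the representation of the expression; it never computes its value.

```python
def translate_infix_to_postfix(source: str) -> list[str]:
    """Map a space-separated infix expression to hypothetical stack code.

    The function never evaluates the expression; it only re-expresses it.
    """
    precedence = {"+": 1, "-": 1, "*": 2, "/": 2}
    mnemonic = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
    output = []   # target-language instructions generated so far
    pending = []  # operators waiting for their right-hand operand
    for token in source.split():
        if token in precedence:
            # Emit pending operators that bind at least as tightly
            # (this makes all four operators left-associative).
            while pending and precedence[pending[-1]] >= precedence[token]:
                output.append(mnemonic[pending.pop()])
            pending.append(token)
        else:
            output.append(f"PUSH {token}")
    while pending:
        output.append(mnemonic[pending.pop()])
    return output


if __name__ == "__main__":
    for instruction in translate_infix_to_postfix("1 + 2 * 3"):
        print(instruction)
```

For the input "1 + 2 * 3" the function returns the five instructions PUSH 1, PUSH 2, PUSH 3, MUL, ADD. The value 7 appears nowhere: producing it would be the job of whatever machine later executes the target code.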

2.1 T-diagrams

A useful notation for describing a computer program, particularly a translator, uses so-called T-diagrams, examples of which are shown in Figure 2.1.


      .-------------------------------------------------------.
      |                      Program Name                     |
      |                                                       |
      | Data inputs       ---------------->      Data outputs |
      `------------------.                 .------------------'
                         |  Implementation |
                         |    Language     |
                         `-----------------'

      .-------------------------------------------------------.
      |                    Translator Name                    |
      |                                                       |
      | Source Language ------------------->  Target Language |
      `------------------.    Translator   .------------------'
                         |       Host      |
                         |     Language    |
                         `-----------------'

               .-------------------------------------.
               |               TPC.EXE               |
               | Turbo Pascal  ------->  8086 M-code |
               |                                     |
               `----------.               .----------'
                          |               |
                          |  8086 M-code  |
                          `---------------'

  Figure 2.1  T-diagrams.  (a) A general program  (b) A general translator
              (c) A Turbo Pascal compiler for an MS-DOS system

We shall use the notation "M-code" to stand for "machine code" in these diagrams. Translation itself is represented by standing the T on a machine, and placing the source program and object program on the left and right arms, as depicted in Figure 2.2.


               .-------------------------------------.
               |               TPC.EXE               |
   PROG.PAS    | Turbo Pascal ---------> 8086 M-code |  PROG.EXE
               |                                     |
               `----------.               .----------'
                          |               |
                          |  8086 M-code  |
                          `---------------'
                          .---------------.
                          | 80x86 Machine |
                          `---------------'

   Figure 2.2  A Turbo Pascal compilation on an 80x86 machine

We can also regard this particular combination as depicting an abstract machine (sometimes called a virtual machine), whose aim in life is to convert Turbo Pascal source programs into their 8086 machine code equivalents.
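
The rule implicit in Figure 2.2, that a T can only stand on a machine which executes its host language, can be sketched as a small data structure. This is a minimal illustration only: the class name TDiagram, the function run_on, and the use of plain strings for language names are all assumptions made for the example, not notation from the literature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TDiagram:
    """One T-diagram: a named translator and the three languages on it."""
    name: str
    source: str  # left arm of the T
    target: str  # right arm of the T
    host: str    # stem of the T


def run_on(translator: TDiagram, machine_code: str, program_language: str) -> str:
    """Stand the T on a machine and feed it a program.

    Returns the language of the resulting object program, or raises
    ValueError if the pieces of the diagram do not fit together.
    """
    if translator.host != machine_code:
        raise ValueError(
            f"{translator.name} is written in {translator.host}, "
            f"but the machine executes {machine_code}"
        )
    if program_language != translator.source:
        raise ValueError(
            f"{translator.name} accepts {translator.source}, "
            f"not {program_language}"
        )
    return translator.target


# The compiler of Figure 2.1(c), standing on a machine that runs 8086 M-code.
tpc = TDiagram("TPC.EXE", "Turbo Pascal", "8086 M-code", "8086 M-code")

if __name__ == "__main__":
    print(run_on(tpc, "8086 M-code", "Turbo Pascal"))
```

Calling run_on with a machine whose code differs from the translator's host language raises ValueError, which corresponds to a T-diagram whose stem does not match the machine beneath it.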

T-diagrams were first introduced by Bratman (1961). They were further refined by Earley and Sturgis (1970), and are also used in the books by Bennett (1990), Watt (1993), and Aho, Sethi and Ullman (1986).

2.2 Classes of translator

It is common to distinguish between several well-established classes of translator:

The term assembler is usually associated with those translators that map low-level language instructions into machine code which can then be executed directly. Individual source language statements usually map one-for-one to machine-level instructions.
The term macro-assembler is also associated with those translators that map low-level language instructions into machine code, and is a variation on the above. Most source language statements map one-for-one into their target language equivalents, but some macro statements map into a sequence of machine-level instructions - effectively providing a text replacement facility, and thereby extending the assembly language to suit the user. (This is not to be confused with the use of procedures or other subprograms to "extend" high-level languages, because the method of implementation is usually very different.)
The term compiler is usually associated with those translators that map high-level language instructions into machine code which can then be executed directly. Individual source language statements usually map into many machine-level instructions.
The term pre-processor is usually associated with those translators that map a superset of a high-level language into the original high-level language, or that perform simple text substitutions before translation takes place. The best-known pre-processor is probably that which forms an integral part of implementations of the language C, and which provides many of the features that contribute to the widely-held perception that C is the only really portable language.
The term high-level translator is often associated with those translators that map one high-level language into another high-level language - usually one for which sophisticated compilers already exist on a range of machines. Such translators are particularly useful as components of a two-stage compiling system, or in assisting with the bootstrapping techniques to be discussed shortly.
The terms decompiler and disassembler refer to translators which attempt to take object code at a low level and regenerate source code at a higher level. While this can be done quite successfully for the production of assembler level code, it is much more difficult when one tries to recreate source code originally written in, say, Pascal.
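
The text replacement facility mentioned above for macro-assemblers can be sketched in a few lines. This is an illustration under stated assumptions: the mnemonics (LDA, LDB, STA, STB, HLT), the SWAP macro, and the function expand_macros are all invented for the example and do not belong to any real assembler. Ordinary statements pass through one-for-one, while a macro statement is replaced by a sequence of statements with its formal parameters substituted.

```python
def expand_macros(lines, macros):
    """Expand macro statements by textual replacement.

    `macros` maps a macro name to (formal_parameters, body), where body is
    a list of statement templates. Statements that do not name a macro are
    copied unchanged.
    """
    expanded = []
    for line in lines:
        parts = line.split()
        if parts and parts[0] in macros:
            formals, body = macros[parts[0]]
            bindings = dict(zip(formals, parts[1:]))
            for template in body:
                opcode, *operands = template.split()
                # Substitute actual arguments for formals in operands only.
                actuals = [bindings.get(operand, operand) for operand in operands]
                expanded.append(" ".join([opcode, *actuals]))
        else:
            expanded.append(line)
    return expanded


# Hypothetical macro: exchange the contents of two memory locations.
MACROS = {"SWAP": (["X", "Y"], ["LDA X", "LDB Y", "STA Y", "STB X"])}

if __name__ == "__main__":
    for statement in expand_macros(["LDA P", "SWAP P Q", "HLT"], MACROS):
        print(statement)
```

Here the single statement SWAP P Q becomes four statements, whereas LDA P and HLT are left alone. Unlike a procedure call, the body is copied into the program text at every point of use.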

Many translators generate code for their host machines. These are called self-resident translators. Others, known as cross-translators, generate code for machines other than the host machine. Cross-translators are often used in connection with microcomputers, especially in embedded systems, which may themselves be too small to allow self-resident translators to operate satisfactorily. Of course, cross-translation introduces additional problems in connection with transferring the object code from the donor machine to the machine that is to execute the translated program, and can lead to delays and frustration in program development.
