
Created on 06/09/09

Updated on 06/09/21


6. - PROCEDURES USED IN DIGITAL SYSTEMS

Digital systems, once wired or programmed to carry out certain functions, perform their task every time they are called upon, and do so indefinitely (or nearly so), without fatigue or weariness, and at very high speed.

They can repeat long procedures without error, taking no initiative other than what the technician has provided for in their programming.

These systems are thus disciplined but unimaginative, which means that the procedure must be given to them in an orderly, methodical way; each step has to be dictated to them and broken down.

It follows that the technician must have perfect knowledge of the problem that the machine will subsequently have to handle on its own.

In the case that concerns us here, it would be desirable to find a universal procedure for carrying out these operations; depending on the operator (the designation of the operation), certain steps could be made transparent.

Although this solution may seem long, that is not really a great disadvantage, given the speed at which these calculations are carried out (except in particular cases, such as ballistics calculations, where computing speed is of primary importance).

  The register

Before approaching these methods, a few points still need to be clarified. For this purpose, we will anticipate what follows by talking about the register.

When we want to carry out an operation whose result cannot be found mentally, we take a sheet of paper and write the numbers on it: the operation is set out.

If we are called away to another urgent task at that moment, then once it is completed we return to our sheet of paper, on which we find the numbers written earlier.

The information has been memorized.

If we enter information into a digital machine using a keyboard, then before any transformation is carried out on this information it must be stored, that is, put into memory.

In these machines, these memories are registers (etymologically, a register is a book in which one records facts or acts that one wants to keep a record of).

The elementary unit of information in binary is the “bit” (binary digit). This information is either 0 or 1, which, in positive logic, corresponds to the absence or the presence of a voltage.

Consequently, the basic cell of a register must be able to hold in memory indefinitely, as long as no external action intervenes, either the absence of a voltage (level 0) or the presence of a voltage (defined as level 1).

In effect, it memorizes the numerical value of the bit.

We know that decimal numbers (human language), when expressed in binary (machine language), use a greater number of weights (digit positions) and are therefore longer to write.

For example, to enter into the machine numbers whose value does not exceed 255, eight basic register cells are needed in binary, whereas three would suffice in decimal.

In short, a register is a memory in which binary numbers can be stored. One of its principal characteristics is the maximum numerical value it can memorize; this is also called the capacity or length of the register.
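As a rough illustration of the 255 example, the following Python sketch counts how many cells a value needs in a given base and what the largest value of an n-cell binary register is. The function names `cells_needed` and `register_capacity` are hypothetical and introduced only for this illustration.

```python
def cells_needed(max_value: int, base: int) -> int:
    """Number of digit positions needed to write max_value in the given base."""
    if max_value < 0 or base < 2:
        raise ValueError("max_value must be >= 0 and base must be >= 2")
    cells = 1
    while max_value >= base:
        max_value //= base  # drop the least significant digit
        cells += 1
    return cells


def register_capacity(cells: int) -> int:
    """Largest unsigned value that a binary register of `cells` cells can hold."""
    return 2 ** cells - 1


print(cells_needed(255, 2))   # binary cells needed for 255
print(cells_needed(255, 10))  # decimal digits needed for 255
print(register_capacity(8))   # largest value of an eight-cell register
```

With these definitions, 255 needs eight binary cells but only three decimal digits, and an eight-cell register tops out at 255.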

In the coming lessons devoted to registers, you will learn how this information is entered, how it is held there and how it is accessed.

For the moment, accept that this is possible.

These registers are very important because they determine the calculating capacity of the system. If we imagine that our sheet of paper is not large enough to write numbers of more than three digits, it is clear that calculations will quickly be limited.

6. 1. - SIGNED NUMBERS

Furthermore, we know that a number is characterized by its absolute value and its sign.

A method is therefore needed to attach a sign to the binary value.

We now describe the methods that have been considered:

      The first consists in placing a sign bit in front of the absolute value of the number.

For a positive number, the sign bit is 0. If, on the contrary, the number is negative, the sign bit is 1.

Example :

      The number (+ 43), in signed binary representation, is written:

[Figure: the sign bit is 0]

      The number (- 43) is written:

[Figure: the sign bit is 1]

In a digital system, for example a pocket calculator, the length of the registers is fixed and cannot change. If they consist of eight cells, these numbers are represented as follows:

[Figure: eight cells]

This method is tending to be abandoned in favour of the two's complement method, described further on.
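The sign-bit method can be sketched in Python as follows. This is only an illustration under the assumption of an eight-cell register; the function name `to_sign_magnitude` is hypothetical.

```python
def to_sign_magnitude(value: int, cells: int = 8) -> str:
    """Return the sign-bit representation of value as a string of `cells` bits."""
    magnitude = abs(value)
    if magnitude >= 2 ** (cells - 1):
        raise ValueError("absolute value does not fit in the remaining cells")
    sign_bit = "1" if value < 0 else "0"
    # The absolute value fills the cells that remain after the sign bit.
    return sign_bit + format(magnitude, f"0{cells - 1}b")


print(to_sign_magnitude(43))   # 00101011
print(to_sign_magnitude(-43))  # 10101011
```

Only the leading bit differs between (+ 43) and (- 43); the seven remaining cells hold the absolute value 43.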

      The second method that has been used does not use the sign bit in the same way.

Positive numbers are represented with a 0 in the most significant position.

Negative numbers are represented by the one's complement of the corresponding positive number.

The complement also applies to the sign bit; consequently, the most significant position holds a 0 for positive numbers and a 1 for negative numbers.

An example is given in figure 27.

[Figure 27: another representation of a signed number]

This system has also been abandoned because it has a major drawback: zero has two representations.

Indeed, if one counts down from positive values towards 0, zero is written with all bits at 0. If one counts up from negative values towards 0, zero is written with all bits at 1.

Figure 28 shows this clearly.

[Figure 28: the two possible representations of 0]

This method creates an ambiguity that digital systems cannot cope with without resorting to workarounds.
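The double representation of zero can be checked with a small Python sketch of the one's complement method. The eight-cell width and the function names `to_ones_complement` and `from_ones_complement` are assumptions made for the illustration.

```python
def invert(bits: str) -> str:
    """Replace every 0 by 1 and every 1 by 0."""
    return "".join("1" if b == "0" else "0" for b in bits)


def to_ones_complement(value: int, cells: int = 8) -> str:
    """One's complement representation of value on `cells` bits."""
    magnitude = abs(value)
    if magnitude >= 2 ** (cells - 1):
        raise ValueError("value does not fit in the register")
    positive = format(magnitude, f"0{cells}b")
    return positive if value >= 0 else invert(positive)


def from_ones_complement(bits: str) -> int:
    """Decode a one's complement bit string back to an integer."""
    if bits[0] == "0":
        return int(bits, 2)
    return -int(invert(bits), 2)


print(to_ones_complement(43))            # 00101011
print(to_ones_complement(-43))           # 11010100
print(from_ones_complement("00000000"))  # 0
print(from_ones_complement("11111111"))  # 0 as well
```

Both `00000000` and `11111111` decode to zero, which is the ambiguity described above.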

      The third method, which is becoming widespread, is based on the two's complement (see chapter 4. 3. 4. on the two's complement).

For positive numbers, it consists of their ordinary binary representation preceded by a 0.

Their opposites, the negative values, are represented by the two's complement.

The complementation also applies to the sign bit, so negative numbers begin with a 1.

Example :

As shown in figure 29.

[Figure 29: two's complement]

This representation of negative binary numbers by the two's complement avoids the ambiguity of the double representation of zero, and will be useful to us in the procedure for obtaining the result of the operations carried out by the machine.

Using signed numbers reduces the capacity of the registers, since one cell is reserved for the sign.

Figure 30 shows some of the signed numbers between (+ 127) and (- 128) used in digital machines whose registers comprise eight basic cells and can therefore store eight-bit words, called bytes.

[Figure 30: registers with 8 cells]
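The two's complement method on eight cells can be sketched in Python as below. The sketch forms a negative number by inverting every bit of the positive number and adding 1; the function names `to_twos_complement` and `from_twos_complement` are hypothetical.

```python
def to_twos_complement(value: int, cells: int = 8) -> str:
    """Two's complement representation of value on `cells` bits."""
    low, high = -(2 ** (cells - 1)), 2 ** (cells - 1) - 1
    if not low <= value <= high:
        raise ValueError(f"value must lie between {low} and {high}")
    if value >= 0:
        return format(value, f"0{cells}b")
    positive = format(-value, f"0{cells}b")
    inverted = "".join("1" if b == "0" else "0" for b in positive)
    # Two's complement = one's complement + 1.
    return format(int(inverted, 2) + 1, f"0{cells}b")


def from_twos_complement(bits: str) -> int:
    """Decode a two's complement bit string back to an integer."""
    raw = int(bits, 2)
    return raw - 2 ** len(bits) if bits[0] == "1" else raw


for n in (127, 43, 0, -1, -43, -128):
    print(n, to_twos_complement(n))
```

Zero has a single pattern here, and the eight cells cover exactly the values from (+ 127) down to (- 128).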

In any discussion, it is of course necessary to specify which method is used to represent negative numbers.

Likewise, one must not forget to put a 0 in front of every positive number. These two points are very important.

6. 2. - MULTIPLE PRECISION

We have spoken of eight-bit words, or bytes. In digital systems, the English term “byte” is sometimes applied to a word whatever its number of bits.

We have just seen that a byte can represent 256 values (+ 127 to - 128, including 0).

Clearly, this is far from sufficient for most calculations. It is therefore necessary to resort to a workaround.

One can increase the number of cells in the registers, but this leads to certain problems at the level of the integrated circuits.

One can also use eight bits several times over. For example, if numbers are coded on two bytes, 65 536 values can be represented, which covers the signed numbers from (+ 32 767) to (- 32 768), passing through 0.

The number thus represented is made up of two groups of eight bits: the eight bits of lowest weight form the least significant word (M.M.S.) and the eight bits of highest weight form the most significant word (M.P.S.).

One also speaks of the least significant byte (O.M.S.) and the most significant byte (O.P.S.). This way of proceeding, using several bytes, is called multiple precision.

When only two bytes are used, we speak of double precision.
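A minimal Python sketch of double precision, assuming a two's complement value stored on two bytes: the number is split into its most significant and least significant bytes and then reassembled. The function names `split_double_precision` and `join_double_precision` are hypothetical.

```python
def split_double_precision(value: int):
    """Split a signed 16-bit value into (most significant byte, least significant byte)."""
    if not -32768 <= value <= 32767:
        raise ValueError("value does not fit on two bytes")
    raw = value % 65536           # 16-bit two's complement pattern
    return raw >> 8, raw & 0xFF   # upper eight bits, lower eight bits


def join_double_precision(high: int, low: int) -> int:
    """Rebuild the signed value from its two bytes."""
    raw = (high << 8) | low
    return raw - 65536 if raw >= 32768 else raw


print(split_double_precision(1246))    # (4, 222)
print(split_double_precision(-32768))  # (128, 0)
print(join_double_precision(4, 222))   # 1246
```

Here 1246 becomes the pair 4 and 222, since 4 x 256 + 222 = 1246.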

In calculating machines, this resolution is still not sufficient. Several words or several bytes are used (words are not necessarily organized in bytes).

Depending on the desired resolution, three or four words are used, which gives a clearly sufficient precision.

This procedure is called multiple precision.

Multiple precision increases the time needed to obtain the result, because to carry out a calculation the machine must fetch the M.M.S. (the least significant words), perform the operation on them, store the result and the carry, if there is one, and then fetch the following words and continue the calculation.

It is easy to see that if the procedure is longer, the result is obtained a little later.
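The byte-by-byte procedure described above can be sketched in Python as follows. This is an illustration only: it assumes unsigned numbers stored as lists of bytes with the least significant byte first, and the function name `add_multiple_precision` is hypothetical.

```python
def add_multiple_precision(a_bytes, b_bytes):
    """Add two numbers given as equal-length byte lists, least significant byte first.

    Returns (result_bytes, final_carry).
    """
    if len(a_bytes) != len(b_bytes):
        raise ValueError("both numbers must use the same number of bytes")
    result = []
    carry = 0
    for a, b in zip(a_bytes, b_bytes):
        total = a + b + carry        # add one pair of bytes plus the previous carry
        result.append(total & 0xFF)  # store the eight low bits as the result byte
        carry = total >> 8           # keep the carry for the next, more significant byte
    return result, carry


# 1246 is stored as [222, 4] and 300 as [44, 1].
print(add_multiple_precision([222, 4], [44, 1]))  # ([10, 6], 0), i.e. 6 * 256 + 10 = 1546
```

Each extra byte adds one more pass through the loop, which is why the result takes longer to obtain as the precision grows.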

For a binary number, just as in decimal (as discussed in chapter 1), the digit occupying the position of highest weight is called the most significant bit, or B.L.P.S.

Conversely, the digit occupying the position of lowest weight is called the least significant bit, or B.L.M.S.

6. 3. - FLOATING POINT

Until now we have spoken only of integers; it is also necessary to be able to represent fractional numbers and, in certain cases, very large numbers.

Floating point is nothing other than exponential notation (or scientific notation), and it solves the problem of representing numbers from the very small to the very large.

These procedures, multiple precision and floating point, will be useful to you when you come to microprocessors.

For the moment, they are described for the record, and because they normally form part of this lesson.

In the decimal system, this is the notation using powers of 10.

This notation is made up of one part called the mantissa and a second part called the exponent.

The exponent is nothing other than the weight of the position occupied by the integer part of the mantissa.

Example :

0.00015   is written   1.5     x 10^-4

0.005     is written   5       x 10^-3

1246      is written   1.246   x 10^3

The following convention can also be used, taking the same examples again:

0.00015   ⇒   0.15     x 10^-3

0.005     ⇒   0.5      x 10^-2

1246      ⇒   0.1246   x 10^4

All these numbers start with 0 and, since they are all expressed with a power of 10, the following written form can perfectly well be adopted:

0.15     x 10^-3   ⇒   (+ 15) (- 3)

0.5      x 10^-2   ⇒   (+ 5) (- 2)

0.1246   x 10^4    ⇒   (+ 1246) (+ 4)

The mantissa M is always less than 1 and greater than or equal to 0.1:

0.1 ≤ M < 1
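The convention in which the mantissa lies between 0.1 and 1 can be sketched in Python with the standard `decimal` module. The function name `normalise` is hypothetical, and values are passed as strings written with a decimal point.

```python
from decimal import Decimal


def normalise(value: str):
    """Return (mantissa, exponent) such that value = mantissa x 10^exponent
    and 0.1 <= |mantissa| < 1 (zero is returned as (0, 0))."""
    number = Decimal(value)
    if number == 0:
        return Decimal(0), 0
    # adjusted() gives the power of 10 of the leading digit;
    # adding 1 moves that digit just after the decimal point.
    exponent = number.adjusted() + 1
    mantissa = number.scaleb(-exponent)
    return mantissa, exponent


for text in ("0.00015", "0.005", "1246"):
    print(text, normalise(text))
```

For the three examples this gives the pairs (0.15, -3), (0.5, -2) and (0.1246, 4).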

In digital systems, and in particular with microprocessors, it is no longer a matter of powers of 10 but of powers of 2, since we are working in binary.

In systems that use eight-bit words, or bytes, this form of writing can be kept by assigning one word to the mantissa and one word to the exponent.

From the numerical example, it is clear that a single byte cannot represent the value 1246, especially if the two's complement method is used for negative values, because then only seven bits remain to express the value.

By using multiple precision, i.e. by working on several bytes, this becomes possible.

If, for example, three bytes are used for the mantissa and its sign, and one byte for the exponent and its sign, signed numbers can be represented within the following limits:

(+ 2^23 - 1) x 2^127 to (- 2^23) x 2^127

That is, in decimal:

± 0.142 x 10^46, or ± 142 followed by 43 zeros.

Values as small as (± 1) x 2^-127 can be represented, that is, in decimal:

± 0.58 x 10^-38, or ± 0. ... 38 zeros ... 58.
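These limits can be checked numerically. The short Python sketch below evaluates the two bounds; the variable names are chosen only for this illustration.

```python
# Largest magnitude: a 23-bit mantissa (plus sign) times the largest power of 2.
largest = (2 ** 23 - 1) * 2 ** 127

# Smallest non-zero magnitude: a mantissa of 1 times the smallest power of 2.
smallest = 2.0 ** -127

print(f"{float(largest):.3e}")  # about 1.427e+45, i.e. 0.142 x 10^46
print(f"{smallest:.2e}")        # about 5.88e-39, i.e. 0.58 x 10^-38
```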

This way of writing numbers, by keeping track of the position of the radix point, allows calculations on very large or very small (fractional) numbers.

To summarize, digital systems intended for calculation represent signed numbers using the two's complement for negative numbers, multiple precision, and exponential notation, or floating point.

If four bytes are used, three for the mantissa and one for the exponent, all numbers are represented by this same number of bytes, i.e. they all have the same format.

[Figure: four bytes]

That is to say : 

Floating-point operations follow a special procedure.

Multiplication raises no difficulty: the mantissas are multiplied together and the exponents are added.

Addition requires an alignment operation that consists in making the exponents equal, which is essential here, because only digits of the same weight may be added together.
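The two rules can be sketched in Python under simplifying assumptions: a number is modelled as a pair (mantissa, exponent) of integers standing for mantissa x 2^exponent, without rounding or normalization. The function names `fp_multiply`, `fp_add` and `fp_value` are hypothetical, and a real floating-point unit works on fixed-width fields rather than unbounded integers.

```python
def fp_multiply(a, b):
    """Multiply the mantissas and add the exponents."""
    (ma, ea), (mb, eb) = a, b
    return ma * mb, ea + eb


def fp_add(a, b):
    """Bring both operands to the same exponent, then add the mantissas."""
    (ma, ea), (mb, eb) = a, b
    common = min(ea, eb)   # align on the smaller exponent so no bit is lost
    ma <<= ea - common     # shifting left compensates for the lowered exponent
    mb <<= eb - common
    return ma + mb, common


def fp_value(x):
    """Numerical value of a (mantissa, exponent) pair."""
    mantissa, exponent = x
    return mantissa * 2.0 ** exponent


a = (3, 2)   # 3 x 2^2 = 12
b = (5, 0)   # 5 x 2^0 = 5
print(fp_multiply(a, b), fp_value(fp_multiply(a, b)))  # (15, 2) 60.0
print(fp_add(a, b), fp_value(fp_add(a, b)))            # (17, 0) 17.0
```

Aligning on the smaller exponent keeps this sketch exact; hardware with a fixed mantissa width usually shifts the operand with the smaller exponent instead, at the cost of some low-order bits.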

We will not go further into these details; simply note that there are integrated circuits specially designed for floating-point operations.


 

     

Daniel