Created: 06/09/09
Updated: 06/09/21
6. - PROCEDURES USED IN DIGITAL SYSTEMS
Digital systems, once wired or programmed to carry out certain functions, perform their task every time they are asked to, indefinitely (or nearly so), without fatigue or weariness, and at very high speed.
They can repeat long procedures without error, taking only those initiatives that the technician has provided for in their programming.
These systems are thus disciplined but without imagination, which means that the procedure must be given to them with order and method; each step has to be dictated and spelled out in detail.
It follows that the technician must have a perfect knowledge of the problem that the machine will afterwards have to handle on its own.
In the case that interests us for the moment, it would be desirable to find a universal procedure for carrying out these operations, in which, depending on the operator (the designation of the operation), certain steps could be made transparent.
Although this solution may seem long, that is not really a great disadvantage, given the speed at which these calculations are carried out (except in particular cases, such as ballistics calculations, where computing speed is of primary importance).
The register
Before approaching these methods, a few points still need to be clarified. To that end, we will anticipate what follows by speaking about the register.
When one wishes to carry out an operation whose result cannot be found mentally, one takes a sheet of paper and writes the numbers on it: the operation is set out.
If one is called away to another urgent task at that moment, then once it is completed one returns to the sheet of paper and finds the numbers previously recorded.
The information has been memorized.
If we enter information into a digital machine using a keyboard, it must be stored, that is, put into memory, before any transformation is carried out on it.
In these machines, these memories are registers (etymologically, a register is a book in which one records facts or acts that one wants to remember).
The elementary unit of information in binary is the “bit” (binary digit). This information is either 0 or 1, which in positive logic corresponds to the absence or the presence of a voltage.
Consequently, the basic cell of a register must be able to hold in memory permanently, as long as no external action intervenes, either an absence of voltage (level 0) or the presence of a voltage (defined as level 1).
In effect, it memorizes the numerical value of the bit.
We know that decimal numbers (human language), when expressed in binary (machine language), use a greater number of weights, and consequently take longer to write.
For example, to enter into the machine numbers whose value does not exceed 255, one needs eight basic register cells in binary, whereas three digits would suffice in decimal.
In short, a register is a memory in which binary numbers can be stored. One of its principal characteristics is the maximum numerical value it can memorize; one also speaks of the capacity or length of the register.
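The relation between the number of cells and the capacity of a register can be checked with a short Python sketch. The function names `max_unsigned_value` and `cells_needed` are illustrative assumptions, not terms from the lesson, and the sketch only considers unsigned values.

```python
def max_unsigned_value(cells):
    """Largest unsigned value that a register of `cells` binary cells can hold."""
    return 2 ** cells - 1


def cells_needed(value):
    """Minimum number of binary cells needed to store a non-negative value."""
    return max(1, value.bit_length())


print(max_unsigned_value(8))  # capacity of an eight-cell register
print(cells_needed(255))      # binary cells needed for 255
print(len(str(255)))          # decimal digits needed for 255
```

With eight cells the capacity is 255, which matches the example above; one more cell is needed as soon as the value reaches 256.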
In the next lessons devoted to these registers, you will learn how this information is entered, how it is held there, and how it is accessed.
For the moment, accept that it is possible.
These registers are very important because they determine the computing capacity of the system. If we imagine that our sheet of paper is not large enough to write numbers of more than three digits, it is clear that calculations will quickly be limited.
6. 1. - SIGNED NUMBERS
In addition, we know that a number is characterized by its absolute numerical value and by its sign.
It is thus necessary to find a method that makes it possible to attach a sign to the binary numerical value.
We will now describe the methods that have been considered :
The first consists in placing a sign bit in front of the absolute value of the number.
For a positive number, the sign bit is 0. If, on the contrary, the number is negative, the sign bit is 1.
Example :
The number (+ 43), in signed binary representation, is written : 0 101011
The number (- 43) is written : 1 101011
In a digital system, for example a pocket calculator, the length of the registers is fixed and unchangeable. If they consist of eight cells, these numbers are represented as follows :
(+ 43) : 00101011
(- 43) : 10101011
This method is gradually being abandoned in favor of two's complement, which we describe further on.
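The sign-bit method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sign_magnitude` and the default register length of eight cells are choices made here, not part of the lesson.

```python
def sign_magnitude(value, cells=8):
    """Encode `value` as one sign bit followed by its absolute value in binary."""
    magnitude = abs(value)
    if magnitude >= 2 ** (cells - 1):
        raise ValueError("value does not fit in the register")
    sign_bit = "1" if value < 0 else "0"
    # The remaining cells hold the absolute value, padded with leading zeros.
    return sign_bit + format(magnitude, f"0{cells - 1}b")


print(sign_magnitude(43))   # 00101011
print(sign_magnitude(-43))  # 10101011
```

Only the leading bit differs between a number and its opposite, which is what makes this representation easy to read but awkward for arithmetic.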
The second method that has been used does not employ the sign bit in the same way.
Positive numbers are represented with a 0 in the most significant position.
Negative numbers are represented by the ones' complement of the corresponding positive number.
The complement also acts on the sign bit; consequently, one finds a 0 in the most significant position for positive numbers and a 1 for negative numbers.
An example is given in figure 27.
This system has also been abandoned because it has a major disadvantage: zero has two representations.
Indeed, if one counts down from positive values towards 0, zero is written with all bits at 0. If one counts up from negative values towards 0, zero is written with all bits at 1.
Figure 28 shows this clearly.
This method creates an ambiguity that digital systems cannot cope with without resorting to workarounds.
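The double representation of zero is easy to reproduce. The following Python sketch assumes an eight-cell register; the helper names `invert` and `ones_complement` are illustrative and do not come from the lesson.

```python
def invert(bits):
    """Complement every bit of a binary string (0 becomes 1, 1 becomes 0)."""
    return "".join("1" if bit == "0" else "0" for bit in bits)


def ones_complement(value, cells=8):
    """Encode `value`: plain binary if positive, bitwise complement if negative."""
    if abs(value) >= 2 ** (cells - 1):
        raise ValueError("value does not fit in the register")
    positive = format(abs(value), f"0{cells}b")
    return positive if value >= 0 else invert(positive)


print(ones_complement(5))          # 00000101
print(ones_complement(-5))         # 11111010
print(ones_complement(0))          # 00000000, zero reached from the positive side
print(invert(ones_complement(0)))  # 11111111, zero reached from the negative side
```

Both 00000000 and 11111111 stand for zero, so any comparison with zero has to test two patterns instead of one.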
The third method, which is becoming widespread, is based on two's complement (see chapter 4. 3. 4. on two's complement).
For positive numbers, it consists of their ordinary binary representation preceded by a 0.
Their opposites, the negative values, are represented by the two's complement.
The complementation also acts on the sign bit, so negative numbers begin with a 1.
Example :
The number (+ 10) is represented by : 01010
The number (- 10) is represented by : 10110
As shown in figure 29.
Representing negative binary numbers by two's complement avoids the ambiguity of a double representation of zero, and it will be useful to us in the procedure for obtaining the result of the operations carried out by the machine.
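The example with (+ 10) and (- 10) on five bits can be reproduced with a minimal Python sketch. The names `twos_complement` and `decode_twos_complement` are assumptions made for this illustration; the encoding relies on the fact that the two's complement of a negative value on n bits equals that value taken modulo 2^n.

```python
def twos_complement(value, cells):
    """Encode a signed integer on `cells` bits in two's complement."""
    if not -(2 ** (cells - 1)) <= value < 2 ** (cells - 1):
        raise ValueError("value does not fit in the register")
    # Reducing modulo 2**cells gives the same pattern as "invert, then add 1".
    return format(value % (2 ** cells), f"0{cells}b")


def decode_twos_complement(bits):
    """Recover the signed integer from a two's complement bit string."""
    raw = int(bits, 2)
    return raw - 2 ** len(bits) if bits[0] == "1" else raw


print(twos_complement(10, 5))           # 01010
print(twos_complement(-10, 5))          # 10110
print(decode_twos_complement("10110"))  # -10
print(twos_complement(0, 5))            # 00000, the only pattern for zero
```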
The use of signed numbers reduces the capacity of the registers, since one cell is reserved for the sign.
Figure 30 shows some of the signed numbers between (+ 127) and (- 128) used in digital machines whose registers have eight basic cells and can therefore store eight-bit words, called bytes.
In any discussion, it is of course necessary to specify the method used to represent negative numbers.
Likewise, one must not forget to put a 0 in front of every positive number. These two points are very important.
6. 2. - MULTIPLE PRECISION
We spoke of eight-bit words, or bytes. In digital systems, a word may be called a “byte” (an English term) whatever its number of bits, although today the term almost always denotes eight bits.
We have just seen that with one byte it is possible to represent 256 values (from + 127 to - 128, including 0).
Clearly, this is far from sufficient for most calculations. It is thus necessary to resort to a workaround.
One can increase the number of cells in the registers, but that leads to certain problems at the level of the integrated circuits.
One can also use several groups of eight bits. For example, if the numbers are coded on two bytes, one can represent 65 536 numerical values, that is, the signed numbers from (+ 32 767) to (- 32 768), passing through 0.
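The figures quoted for one and two bytes follow directly from the word length. A small Python check, assuming two's complement and using the illustrative name `signed_range`:

```python
def signed_range(bits):
    """Return (lowest, highest) value of a two's complement word of `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for bits in (8, 16):
    lowest, highest = signed_range(bits)
    print(bits, "bits:", 2 ** bits, "values, from", highest, "down to", lowest)
```

Each added bit doubles the number of representable values, which is why moving from one byte to two multiplies the count by 256.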
The number thus represented is made up of two groups of eight bits: the eight bits of lowest weight constitute the least significant word (M.M.S.) and the eight bits of highest weight, the most significant word (M.P.S.).
One also says the least significant byte (O.M.S.) and the most significant byte (O.P.S.). This way of proceeding, using several bytes, is called multiple precision.
When only two bytes are used, we speak of double precision.
In calculating machines, this resolution is still not sufficient. Several words or several bytes are used (the words are not necessarily organized in bytes).
Depending on the desired resolution, one is led to use three or four words, at which point the precision is clearly sufficient.
This procedure is called multiple precision.
Multiple precision increases the time needed to obtain the result, because to carry out a calculation the machine must fetch the least significant words (M.M.S.), perform the operation on them, store the result and the carry, if there is one, then fetch the following words and continue the calculation.
It is easy to see that if the procedure is longer, the result is obtained a little later.
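The byte-by-byte procedure with carry can be sketched as follows. This is a simplified illustration for unsigned values only; the function names `split_bytes`, `add_multibyte` and `join_bytes` are hypothetical, and a real machine would perform each step in hardware registers rather than in Python lists.

```python
def split_bytes(value, count):
    """Split a non-negative value into `count` bytes, least significant first."""
    return [(value >> (8 * i)) & 0xFF for i in range(count)]


def join_bytes(parts):
    """Rebuild the value from bytes given least significant first."""
    return sum(byte << (8 * i) for i, byte in enumerate(parts))


def add_multibyte(a, b):
    """Add two equal-length byte lists, propagating the carry between bytes."""
    result = []
    carry = 0
    for byte_a, byte_b in zip(a, b):
        total = byte_a + byte_b + carry
        result.append(total & 0xFF)  # the eight bits kept in this byte
        carry = total >> 8           # the carry passed to the next byte
    return result, carry


low_first_a = split_bytes(300, 2)  # [44, 1]
low_first_b = split_bytes(500, 2)  # [244, 1]
total, carry_out = add_multibyte(low_first_a, low_first_b)
print(total, carry_out)            # [32, 3] 0
print(join_bytes(total))           # 800
```

The loop runs once per byte, which illustrates why the result takes longer to obtain as the number of bytes grows.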
For a binary number, just as in decimal (as discussed in chapter 1), the digit that occupies the position of highest weight is called the most significant bit, or B.L.P.S.
Conversely, the digit that occupies the position of lowest weight is called the least significant bit, or B.L.M.S.
6. 3. - THE FLOATING POINT
Until now, we have spoken only about integers; it is also necessary to be able to represent fractional numbers and, in certain cases, very large numbers.
Floating point is nothing other than exponential notation (or scientific notation), and it solves the problem of representing numbers ranging from the very small to the very large.
These procedures, multiple precision and floating point, will be useful to you when you come to microprocessors.
For the moment, they are described for the record and because they naturally belong in this lesson.
In the decimal system, this notation uses the powers of 10.
It is made up of one part called the mantissa and a second part called the exponent.
The exponent is nothing other than the weight of the position occupied by the integer part of the mantissa.
Example :
0,00015 is written : 1,5 x 10^-4
0,005 is written : 5 x 10^-3
1246 is written : 1,246 x 10^3
One can also use the following convention, taking the same examples again :
0,00015 ⇒ 0,15 x 10^-3
0,005 ⇒ 0,5 x 10^-2
1246 ⇒ 0,1246 x 10^4
All these numbers start with 0, and since they are all expressed with a power of 10, one can just as well adopt the following written form :
0,15 x 10^-3 ⇒ (+ 15) (- 3)
0,5 x 10^-2 ⇒ (+ 5) (- 2)
0,1246 x 10^4 ⇒ (+ 1246) (+ 4)
The mantissa M is always less than 1 and greater than or equal to 0,1 :
0,1 ≤ M < 1
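The convention in which the mantissa lies between 0,1 and 1 can be reproduced with Python's standard `decimal` module, which avoids binary rounding of the decimal examples. The function name `normalize` is an assumption made for this sketch, and Python writes the decimal separator as a point rather than a comma.

```python
from decimal import Decimal


def normalize(text):
    """Return (mantissa, exponent) with 0.1 <= |mantissa| < 1
    and value == mantissa * 10 ** exponent."""
    value = Decimal(text)
    if value == 0:
        return Decimal(0), 0
    # adjusted() is the power of 10 of the leading digit; shift by one more
    # place so that the leading digit sits just after the decimal point.
    exponent = value.adjusted() + 1
    mantissa = value.scaleb(-exponent)
    return mantissa, exponent


for example in ("0.00015", "0.005", "1246"):
    mantissa, exponent = normalize(example)
    print(example, "->", mantissa, "x 10^", exponent)
```

The three examples give the pairs (0.15, -3), (0.5, -2) and (0.1246, 4), matching the table above.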
In digital systems, and in particular with microprocessors, one no longer deals with powers of 10 but with powers of 2, since we work in binary.
In these systems, which use eight-bit words or bytes, one can keep this form of writing by assigning one word to the mantissa and one word to the exponent.
In the numerical example, one sees that a single byte cannot represent the value 1246, especially if two's complement is used for negative values, because only seven bits then remain to express the numerical value.
By using multiple precision, i.e. by working on several bytes, this becomes possible.
If, for example, one uses three bytes for the mantissa and its sign, and one byte for the exponent and its sign, one can represent signed numbers within the following limits :
(+ 2^23 - 1) x 2^127 to (- 2^23) x 2^127
That is, in decimal :
± 0,142 x 10^46 or, ± 142 followed by 43 zeros.
Values as small as (± 1) x 2^-127 can be represented, that is, in decimal :
± 0,58 x 10^-38 or, ± 0, ... 38 zeros ... 58.
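These orders of magnitude can be verified numerically. The sketch below assumes the format described above (23 mantissa bits plus sign, 7 exponent bits plus sign); the constant names are illustrative.

```python
# Assumed format: 24-bit signed mantissa (23 bits plus sign)
# and 8-bit signed exponent (7 bits plus sign).
MANTISSA_BITS = 23
MAX_EXPONENT = 127

largest_positive = (2 ** MANTISSA_BITS - 1) * 2 ** MAX_EXPONENT
largest_negative = -(2 ** MANTISSA_BITS) * 2 ** MAX_EXPONENT
smallest_magnitude = 2.0 ** -MAX_EXPONENT

print(f"{float(largest_positive):.3e}")  # about 1.427e+45
print(f"{float(largest_negative):.3e}")  # about -1.427e+45
print(f"{smallest_magnitude:.3e}")       # about 5.877e-39
```

The values 1.427 x 10^45 and 5.877 x 10^-39 are the same quantities as 0,142 x 10^46 and 0,58 x 10^-38 in the text, written with one digit before the separator.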
This way of writing numbers, which keeps track of the position of the radix point, allows calculations on very large or very small (fractional) numbers.
To summarize, digital systems intended for calculation represent signed numbers using two's complement for negative numbers, multiple precision, and exponential notation or floating point.
If four bytes are used, three for the mantissa and one for the exponent, all numbers are represented by this same number of bytes, i.e. they all have the same format.
That is to say :
for the exponent, 7 bits ⇒ 127 (decimal) plus the sign : ± 127, therefore 2^±127
for the mantissa, 23 bits plus one for the sign, which corresponds to : ± 2^23
Floating-point operations follow a special procedure.
Multiplication raises no difficulty: one multiplies the mantissas together and adds the exponents.
Addition requires an alignment operation, which consists in making the exponents equal. This is imperative here, because only digits of the same weight may be added together.
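Both rules can be illustrated with numbers held as (mantissa, exponent) pairs whose value is mantissa x 2^exponent. This is a deliberately simplified sketch: the mantissas are unbounded Python integers, no renormalization or rounding is performed, and the function names `multiply`, `add` and `to_float` are assumptions made here.

```python
def multiply(a, b):
    """Multiply the mantissas and add the exponents."""
    (mantissa_a, exponent_a), (mantissa_b, exponent_b) = a, b
    return mantissa_a * mantissa_b, exponent_a + exponent_b


def add(a, b):
    """Align both numbers on the smaller exponent, then add the mantissas."""
    (mantissa_a, exponent_a), (mantissa_b, exponent_b) = a, b
    common = min(exponent_a, exponent_b)
    # Shifting left by the exponent difference keeps each value unchanged
    # while bringing both mantissas to the same weight.
    aligned_a = mantissa_a << (exponent_a - common)
    aligned_b = mantissa_b << (exponent_b - common)
    return aligned_a + aligned_b, common


def to_float(number):
    """Convert a (mantissa, exponent) pair to an ordinary float for display."""
    mantissa, exponent = number
    return mantissa * 2.0 ** exponent


x = (3, 2)    # 3 x 2^2  = 12
y = (5, -1)   # 5 x 2^-1 = 2.5
print(multiply(x, y), to_float(multiply(x, y)))  # (15, 1) 30.0
print(add(x, y), to_float(add(x, y)))            # (29, -1) 14.5
```

In a fixed-width format the aligned mantissa might no longer fit in its register, which is one reason dedicated circuits handle these operations.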
We will not go further into these details; simply note that there are integrated circuits specially designed for floating-point operations.