How is IEEE 854 used when the performance of an addition operation is carried out?

Question

Question

How is IEEE 854 used when the performance of an addition operation is carried out?

engineering Electrical-Engineering

Add a comment Improve this question Transcribed image text

Answer 1

Answer #1

solution:

It consists of three loosely connected parts. The first section, Rounding Error, discusses the implications of using different rounding strategies for the basic operations of addition, subtraction, multiplication and division. It also contains background information on the two methods of measuring rounding error, ulps and relative error. The second part discusses the IEEE floating-point standard, which is becoming rapidly accepted by commercial hardware manufacturers. Included in the IEEE standard is the rounding method for basic operations. The discussion of the standard draws on the material in the section Rounding Error. The third part discusses the connections between floating-point and the design of various aspects of computer systems. Topics include instruction set design, optimizing compilers and exception handling.

I have tried to avoid making statements about floating-point without also giving reasons why the statements are true, especially since the justifications involve nothing more complicated than elementary calculus. Those explanations that are not central to the main argument have been grouped into a section called "The Details," so that they can be skipped if desired. In particular, the proofs of many of the theorems appear in this section. The end of each proof is marked with the z symbol. When a proof is not included, the z appears immediately following the statement of the theorem.

Rounding Error

Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation. Although there are infinitely many integers, in most programs the result of integer computations can be stored in 32 bits. In contrast, given any fixed number of bits, most calculations with real numbers will produce quantities that cannot be exactly represented using that many bits. Therefore the result of a floating-point calculation must often be rounded in order to fit back into its finite representation. This rounding error is the characteristic feature of floating-point computation. The section Relative Error and Ulps describes how it is measured.

Since most floating-point calculations have rounding error anyway, does it matter if the basic arithmetic operations introduce a little bit more rounding error than necessary? That question is a main theme throughout this section. The section Guard Digits discusses guard digits, a means of reducing the error when subtracting two nearby numbers. Guard digits were considered sufficiently important by IBM that in 1968 it added a guard digit to the double precision format in the System/360 architecture (single precision already had a guard digit), and retrofitted all existing machines in the field. Two examples are given to illustrate the utility of guard digits.

The IEEE standard goes further than just requiring the use of a guard digit. It gives an algorithm for addition, subtraction, multiplication, division and square root, and requires that implementations produce the same result as that algorithm. Thus, when a program is moved from one machine to another, the results of the basic operations will be the same in every bit if both machines support the IEEE standard. This greatly simplifies the porting of programs. Other uses of this precise specification are given in Exactly Rounded Operations.

Floating-point Formats

Several different representations of real numbers have been proposed, but by far the most widely used is the floating-point representation.¹ Floating-point representations have a base (which is always assumed to be even) and a precision p. If = 10 and p = 3, then the number 0.1 is represented as 1.00 × 10^-1. If = 2 and p = 24, then the decimal number 0.1 cannot be represented exactly, but is approximately 1.10011001100110011001101 × 2^-4.

In general, a floating-point number will be represented as ± d.dd... d × ^e, where d.dd... d is called the significand² and has p digits. More precisely ± d₀. d₁d₂ ... d_p-1 × ^e represents the number

(1) .

The term floating-point number will be used to mean a real number that can be exactly represented in the format under discussion. Two other parameters associated with floating-point representations are the largest and smallest allowable exponents, e_max and e_min. Since there are ^p possible significands, and e_max - e_min + 1 possible exponents, a floating-point number can be encoded in

bits, where the final +1 is for the sign bit. The precise encoding is not important for now.

There are two reasons why a real number might not be exactly representable as a floating-point number. The most common situation is illustrated by the decimal number 0.1. Although it has a finite decimal representation, in binary it has an infinite repeating representation. Thus when = 2, the number 0.1 lies strictly between two floating-point numbers and is exactly representable by neither of them. A less common situation is that a real number is out of range, that is, its absolute value is larger than × or smaller than 1.0 × . Most of this paper discusses issues due to the first reason. However, numbers that are out of range will be discussed in the sections Infinity and Denormalized Numbers.

Floating-point representations are not necessarily unique. For example, both 0.01 × 10¹ and 1.00 × 10^-1 represent 0.1. If the leading digit is nonzero (d₀ 0 in equation (1) above), then the representation is said to be normalized. The floating-point number 1.00 × 10^-1 is normalized, while 0.01 × 10¹ is not. When = 2, p = 3, e_min = -1 and e_max = 2 there are 16 normalized floating-point numbers, as shown in FIGURE D-1. The bold hash marks correspond to numbers whose significand is 1.00. Requiring that a floating-point representation be normalized makes the representation unique. Unfortunately, this restriction makes it impossible to represent zero! A natural way to represent 0 is with 1.0 × , sincethis preserves the fact that the numerical ordering of nonnegative real numbers corresponds to the lexicographic ordering of their floating-point representations.³ When the exponent is stored in a k bit field, that means that only 2^k - 1 values are available for use as exponents, since one must be reserved to represent 0.

Note that the × in a floating-point number is part of the notation, and different from a floating-point multiply operation. The meaning of the × symbol should be clear from the context. For example, the expression (2.5 × 10^-3) × (4.0 × 10²) involves only a single floating-point multiplication.

FIGURE D-1 Normalized numbers when = 2, p = 3, e_min = -1, e_max = 2

please give me like..its help me to write more questions..thank you..

Add a comment

Answer 2

How is IEEE 854 used when the performance of an addition operation is carried out?

Homework Answers

Add Answer to:
How is IEEE 854 used when the performance of an addition operation is carried out?

Post as a guest

Earn Coins

14) (15 pts) Show how the following synthetic conversion can be carried out. In addition to...

estion 12 The day-to-day operation of the National DNA Database is carried out by this entity:...

A slab-milling operation is carried out on a 200 mm long, 80-mm-wide annealed mild-steel workpiec...

A slab-milling operation is being carried out on a block on a 200 mm-long, 150 mm-wide...

With the help of an example SQL query, explain the different steps that are carried out when cost...

Kindly provide with clear step by step solution An orthogonal cutting operation is being carried out...

5. When isolation of a product from a reaction mixture is carried out, a solvent is...

10- In a welding operation carried out on a eutectoid steel (C =0.8wt%), the selection of...

An activity measure in activity-based costing expresses how much of an activity is carried out and...

1a. Briefly explain how the fatigue test is carried out. 1b. Briefly explain how the creep...

How is IEEE 854 used when the performance of an addition operation is carried out?

Homework Answers

Add Answer to: How is IEEE 854 used when the performance of an addition operation is carried out?

Post as a guest

Earn Coins

Add Answer to:
How is IEEE 854 used when the performance of an addition operation is carried out?