Question

# How is IEEE 854 used when an addition operation is performed?


solution:

The paper consists of three loosely connected parts. The first section, Rounding Error, discusses the implications of using different rounding strategies for the basic operations of addition, subtraction, multiplication and division. It also contains background information on the two methods of measuring rounding error, ulps and relative error. The second part discusses the IEEE floating-point standard, which is rapidly being accepted by commercial hardware manufacturers. Included in the IEEE standard is the rounding method for basic operations. The discussion of the standard draws on the material in the section Rounding Error. The third part discusses the connections between floating-point and the design of various aspects of computer systems. Topics include instruction set design, optimizing compilers and exception handling.

I have tried to avoid making statements about floating-point without also giving reasons why the statements are true, especially since the justifications involve nothing more complicated than elementary calculus. Those explanations that are not central to the main argument have been grouped into a section called "The Details," so that they can be skipped if desired. In particular, the proofs of many of the theorems appear in this section. The end of each proof is marked with the ❚ symbol. When a proof is not included, the ❚ appears immediately following the statement of the theorem.

Rounding Error

Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation. Although there are infinitely many integers, in most programs the result of integer computations can be stored in 32 bits. In contrast, given any fixed number of bits, most calculations with real numbers will produce quantities that cannot be exactly represented using that many bits. Therefore the result of a floating-point calculation must often be rounded in order to fit back into its finite representation. This rounding error is the characteristic feature of floating-point computation. The section Relative Error and Ulps describes how it is measured.
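A minimal Python sketch of this rounding, assuming Python's `float` is IEEE 754 double precision (the usual case on mainstream platforms). It computes a quotient in floating point, then uses `fractions.Fraction` to measure the rounding error exactly, expressed in units of `math.ulp` (Python 3.9+). The helper name `rounding_error_in_ulps` is hypothetical.

```python
import math
from fractions import Fraction

def rounding_error_in_ulps(num: int, den: int) -> Fraction:
    """Exact error of the floating-point quotient num/den, measured in ulps."""
    computed = num / den                  # int / int: rounded to the nearest double
    exact = Fraction(num, den)            # the true real-number result
    error = abs(Fraction(computed) - exact)
    return error / Fraction(math.ulp(computed))

print(rounding_error_in_ulps(1, 3))       # 1/3
print(rounding_error_in_ulps(1, 4))       # 0 (1/4 is exactly representable)
```

Because the quotient is rounded to nearest, the error never exceeds half an ulp; for 1/3 it happens to be exactly one third of an ulp.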

Since most floating-point calculations have rounding error anyway, does it matter if the basic arithmetic operations introduce a little bit more rounding error than necessary? That question is a main theme throughout this section. The section Guard Digits discusses guard digits, a means of reducing the error when subtracting two nearby numbers. Guard digits were considered sufficiently important by IBM that in 1968 it added a guard digit to the double precision format in the System/360 architecture (single precision already had a guard digit), and retrofitted all existing machines in the field. Two examples are given to illustrate the utility of guard digits.
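To see why a guard digit matters, here is a toy model of subtraction in a $\beta = 10$, $p = 3$ system. It is an illustrative sketch, not IBM's hardware algorithm or the one in the IEEE standard. The hypothetical helper `subtract_aligned` assumes $x \ge y > 0$, shifts $y$ right to align exponents, and truncates whatever digits fall outside a window of $p$ plus `guard` digits.

```python
from fractions import Fraction

def subtract_aligned(x_sig, x_exp, y_sig, y_exp, p=3, guard=0, beta=10):
    """Toy x - y for x >= y > 0.

    Each operand is a p-digit integer significand plus an exponent, so the value
    is sig * beta**(exp - (p - 1)); e.g. 10.1 is (101, 1) and 9.93 is (993, 0).
    `guard` is the number of extra low-order digits kept while aligning y.
    The difference is returned exactly (no final rounding back to p digits).
    """
    shift = x_exp - y_exp
    xs = x_sig * beta**guard                      # widen x to p + guard digits
    ys = (y_sig * beta**guard) // beta**shift     # align y; truncate lost digits
    return Fraction(xs - ys) * Fraction(beta) ** (x_exp - (p - 1) - guard)

print(subtract_aligned(101, 1, 993, 0, guard=0))  # 1/5    (0.2; the true difference is 0.17)
print(subtract_aligned(101, 1, 993, 0, guard=1))  # 17/100 (0.17, exact)
```

Without the guard digit, the last digit of 9.93 is discarded during alignment, and the subtraction of two nearby numbers magnifies that loss into an error in the leading digit.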

The IEEE standard goes further than just requiring the use of a guard digit. It gives an algorithm for addition, subtraction, multiplication, division and square root, and requires that implementations produce the same result as that algorithm. Thus, when a program is moved from one machine to another, the results of the basic operations will be the same in every bit if both machines support the IEEE standard. This greatly simplifies the porting of programs. Other uses of this precise specification are given in Exactly Rounded Operations.
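The "same result as the algorithm" requirement can be checked empirically for addition. This sketch assumes Python's `float` is IEEE 754 binary64 with round-to-nearest. `fractions.Fraction` forms the exact sum of two doubles with no rounding, and `float()` on a `Fraction` rounds that exact value to the nearest double. If addition is exactly rounded, the machine sum must match this reference in every bit. The helper name and the random sample ranges are arbitrary choices for illustration.

```python
import random
from fractions import Fraction

def addition_is_correctly_rounded(a: float, b: float) -> bool:
    """True if the machine sum a + b equals the exact sum rounded to nearest."""
    exact = Fraction(a) + Fraction(b)   # exact rational sum, no rounding at all
    reference = float(exact)            # round once, to the nearest double
    return a + b == reference

rng = random.Random(854)
samples = [(rng.uniform(-1e6, 1e6), rng.uniform(-1e-3, 1e-3)) for _ in range(10_000)]
print(all(addition_is_correctly_rounded(a, b) for a, b in samples))  # True
print(addition_is_correctly_rounded(0.1, 0.2))                       # True
```

Note that `0.1 + 0.2` evaluates to `0.30000000000000004`, not `0.3`; that is still the correctly rounded sum of the two stored inputs, which are themselves only approximations of 0.1 and 0.2.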

Floating-point Formats

Several different representations of real numbers have been proposed, but by far the most widely used is the floating-point representation.¹ Floating-point representations have a base $\beta$ (which is always assumed to be even) and a precision $p$. If $\beta = 10$ and $p = 3$, then the number 0.1 is represented as $1.00 \times 10^{-1}$. If $\beta = 2$ and $p = 24$, then the decimal number 0.1 cannot be represented exactly, but is approximately $1.10011001100110011001101 \times 2^{-4}$.
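The binary approximation of 0.1 quoted above can be read straight out of the IEEE single-precision encoding, which has $p = 24$. The sketch below relies on encoding details this section has not introduced yet (an exponent bias of 127 and an implicit leading 1, so it is only valid for normalized numbers); the helper name `float32_binary` is hypothetical. `struct.pack(">f", x)` rounds `x` to single precision.

```python
import struct

def float32_binary(x: float):
    """Split x, rounded to IEEE single precision (p = 24), into sign, significand, exponent.

    Only valid for normalized numbers: the leading 1 is implicit in the encoding.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127     # remove the exponent bias
    fraction = bits & 0x7FFFFF                 # the 23 stored bits after the point
    return sign, "1." + format(fraction, "023b"), exponent

print(float32_binary(0.1))   # (0, '1.10011001100110011001101', -4)
```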

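In general, a floating-point number will be represented as $\pm d.dd\ldots d \times \beta^e$, where $d.dd\ldots d$ is called the significand² and has $p$ digits. More precisely, $\pm d_0.d_1 d_2 \ldots d_{p-1} \times \beta^e$ represents the number

$$\pm\left(d_0 + d_1\beta^{-1} + \cdots + d_{p-1}\beta^{-(p-1)}\right)\beta^e, \quad (0 \le d_i < \beta). \qquad (1)$$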
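Equation (1) translates directly into code. A small sketch (the function name `fp_value` is hypothetical) that evaluates a digit list exactly with `fractions.Fraction`, so no further rounding creeps in:

```python
from fractions import Fraction

def fp_value(sign: int, digits: list[int], e: int, beta: int) -> Fraction:
    """Evaluate +-(d0 + d1*beta**-1 + ... + d_{p-1}*beta**-(p-1)) * beta**e exactly."""
    if any(not 0 <= d < beta for d in digits):
        raise ValueError("each digit must satisfy 0 <= d_i < beta")
    total = sum(Fraction(d) * Fraction(beta) ** (-i) for i, d in enumerate(digits))
    return (-1 if sign else 1) * total * Fraction(beta) ** e

print(fp_value(0, [1, 0, 0], -1, 10))                                      # 1/10
print(fp_value(0, [int(c) for c in "110011001100110011001101"], -4, 2))    # 13421773/134217728
```

The second call evaluates the 24-digit binary significand of 0.1 from the previous paragraph; the result is slightly larger than 1/10, confirming that it is only an approximation.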

The term floating-point number will be used to mean a real number that can be exactly represented in the format under discussion. Two other parameters associated with floating-point representations are the largest and smallest allowable exponents, $e_{\max}$ and $e_{\min}$. Since there are $\beta^p$ possible significands, and $e_{\max} - e_{\min} + 1$ possible exponents, a floating-point number can be encoded in

$$\lceil \log_2(e_{\max} - e_{\min} + 1) \rceil + \lceil \log_2(\beta^p) \rceil + 1$$

bits, where the final $+1$ is for the sign bit. The precise encoding is not important for now.
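The bit count above can be computed without any floating-point logarithms. In this sketch (hypothetical helper `encoding_bits`), $\lceil \log_2 n \rceil$ is obtained exactly as `(n - 1).bit_length()` for $n \ge 1$, which avoids rounding trouble when $n$ is an exact power of two.

```python
def encoding_bits(beta: int, p: int, emin: int, emax: int) -> int:
    """ceil(log2(emax - emin + 1)) + ceil(log2(beta**p)) + 1, using exact integer math."""
    def ceil_log2(n: int) -> int:
        return (n - 1).bit_length()            # exact ceil(log2 n) for n >= 1
    return ceil_log2(emax - emin + 1) + ceil_log2(beta**p) + 1

print(encoding_bits(2, 3, -1, 2))     # 6
print(encoding_bits(10, 3, -1, 2))    # 13
```

For $\beta = 2$, $p = 3$, $e_{\min} = -1$, $e_{\max} = 2$ this gives 2 exponent bits, 3 significand bits and 1 sign bit.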

There are two reasons why a real number might not be exactly representable as a floating-point number. The most common situation is illustrated by the decimal number 0.1. Although it has a finite decimal representation, in binary it has an infinite repeating representation. Thus when $\beta = 2$, the number 0.1 lies strictly between two floating-point numbers and is exactly representable by neither of them. A less common situation is that a real number is out of range, that is, its absolute value is larger than $\beta \times \beta^{e_{\max}}$ or smaller than $1.0 \times \beta^{e_{\min}}$. Most of this paper discusses issues due to the first reason. However, numbers that are out of range will be discussed in the sections Infinity and Denormalized Numbers.
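Both failure modes can be observed directly, assuming Python's `float` is IEEE 754 binary64. The sketch uses `math.nextafter` (Python 3.9+) to find the double adjacent to the stored value of 0.1 and `fractions.Fraction` to compare exactly against 1/10; overflow is shown by doubling `sys.float_info.max`, the largest finite double.

```python
import math
import sys
from fractions import Fraction

tenth = Fraction(1, 10)
nearest = 0.1                               # the double nearest to 1/10
below = math.nextafter(nearest, 0.0)        # the adjacent double toward zero

# 1/10 lies strictly between two consecutive doubles and equals neither.
print(Fraction(below) < tenth < Fraction(nearest))   # True

# The other failure mode: a result too large in magnitude is out of range.
print(sys.float_info.max * 2)                        # inf
```

The stored value of 0.1 happens to lie just above 1/10, which is why the neighbor toward zero is the lower bracket.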

Floating-point representations are not necessarily unique. For example, both $0.01 \times 10^{1}$ and $1.00 \times 10^{-1}$ represent 0.1. If the leading digit is nonzero ($d_0 \neq 0$ in equation (1) above), then the representation is said to be normalized. The floating-point number $1.00 \times 10^{-1}$ is normalized, while $0.01 \times 10^{1}$ is not. When $\beta = 2$, $p = 3$, $e_{\min} = -1$ and $e_{\max} = 2$ there are 16 normalized floating-point numbers, as shown in FIGURE D-1. The bold hash marks correspond to numbers whose significand is 1.00. Requiring that a floating-point representation be normalized makes the representation unique. Unfortunately, this restriction makes it impossible to represent zero! A natural way to represent 0 is with $1.0 \times \beta^{e_{\min}-1}$, since this preserves the fact that the numerical ordering of nonnegative real numbers corresponds to the lexicographic ordering of their floating-point representations.³ When the exponent is stored in a $k$-bit field, that means that only $2^k - 1$ values are available for use as exponents, since one must be reserved to represent 0.

Note that the × in a floating-point number is part of the notation, and different from a floating-point multiply operation. The meaning of the × symbol should be clear from the context. For example, the expression $(2.5 \times 10^{-3}) \times (4.0 \times 10^{2})$ involves only a single floating-point multiplication.

FIGURE D-1 Normalized numbers when $\beta = 2$, $p = 3$, $e_{\min} = -1$, $e_{\max} = 2$
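The count of 16 normalized numbers in FIGURE D-1 can be reproduced by brute force. This sketch (hypothetical helper `normalized_numbers`) enumerates every significand with a nonzero leading digit for each exponent. Generating exponents in increasing order and digits lexicographically also illustrates the ordering property described above, and 0 never appears, which is why it needs a reserved encoding.

```python
from fractions import Fraction
from itertools import product

def normalized_numbers(beta: int, p: int, emin: int, emax: int) -> list[Fraction]:
    """All positive normalized values d0.d1...d_{p-1} x beta**e with d0 != 0."""
    values = []
    for e in range(emin, emax + 1):
        for digits in product(range(beta), repeat=p):   # lexicographic digit order
            if digits[0] == 0:                          # skip unnormalized significands
                continue
            significand = sum(Fraction(d, beta**i) for i, d in enumerate(digits))
            values.append(significand * Fraction(beta) ** e)
    return values

nums = normalized_numbers(2, 3, -1, 2)
print(len(nums))                  # 16
print(min(nums), max(nums))       # 1/2 7
print(nums == sorted(nums))       # True: encoding order matches numeric order
```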

