Question

Why is quadruple precision (128 bits) more than twice as accurate as double precision (64-bit), which...

Why is quadruple precision (128 bits) more than twice as accurate as double precision (64-bit), which is in turn more than twice as accurate as single precision (32-bit)?

0 0
Add a comment Improve this question Transcribed image text
Answer #1

b format at oce upies lo byte and bi Whose pre c 3computer memory de ian ed ση1Y, ナ Application,3 nor Requi uiR ? higher-Uonbetveen te value extra peひ82- - metic and the price y mplenm to w fot: very noo too more preuaroy wh become lerable and ifnetdata tyre acod a8 a double Yequives 6h a3 xows the table belon, bits, tomate 31T8 63 62 to n s1 to o SaRe Exponent, bfa ed by

Add a comment
Know the answer?
Add Answer to:
Why is quadruple precision (128 bits) more than twice as accurate as double precision (64-bit), which...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • (30 pts) In addition to the default IEEE double-precision format (8 byte 64 bits) to store...

    (30 pts) In addition to the default IEEE double-precision format (8 byte 64 bits) to store floating-point numbers, MATLAB can also store the numbers in single-precision format (4 bytes, 32 bits). Each value is stored in 4 bytes with 1 bit for the sign, 23 bits for the mantissa, and 8 bits for the signed exponent: Sign Signed exponent Mantissa 23 bits L bit 8 bits Determine the smallest positive value (expressed in base-10 number) that can be represented using...

  • Find the precision of IEEE 754 FP code on 64-bit machines? • Double Precision Floating Point...

    Find the precision of IEEE 754 FP code on 64-bit machines? • Double Precision Floating Point Numbers (64 bits) – 1-bit sign + 11-bit exponent + 52-bit fraction S Exponent11 Fraction52 (continued)

  • 2.4 Recall from class that MATLAB uses standard (IEEE) double-precision floating point notation: 52 bits 11...

    2.4 Recall from class that MATLAB uses standard (IEEE) double-precision floating point notation: 52 bits 11 bits where each bit b Any Number- +/- (1.bbb...bbb)2 x 2 (bbb..bb2 102310 represents the digit 0 or 1. That is, the mantissa is always assumed to start with a 1, with 52 bits afterwards, and the exponent is an eleven bit integer (from 000..001 to 111...110) biased by subtracting 1023 Well, in "my college days" the standard was single-precision floating point notation in...

  • IEEE 754-2008 contains a half precision that is only 16 bits wide. The leftmost bit is...

    IEEE 754-2008 contains a half precision that is only 16 bits wide. The leftmost bit is still the sign bit, the exponent is 5 bits wide and has a bias of 15, and the mantissa is 10 bits long. A hidden 1 is assumed. Write down the bit pattern to represent -1.6875 X 100 assuming a version of this format, which uses an excess-16 format to store the exponent. Comment on how the range and accuracy of this 16-bit floating...

  • 4. (5 points) IEEE 754-2008 contains a half precision that is only 16 bits wide. The leftmost bit...

    4. (5 points) IEEE 754-2008 contains a half precision that is only 16 bits wide. The leftmost bit is still the sign bit, the exponent is 5 bits wide and has a bias of 15, and the mantissa is 10 bits long. A hidden 1 is assumed. Write down the bit pattern to represent-1.09375 x 10-1 assuming a version of this format, which uses an excess-16 format to store the exponent. Comment on how the range and accuracy of this...

  • If we use the IEEE standard floating-point single-precision representation (1 sign bit, 8 bit exponent bits...

    If we use the IEEE standard floating-point single-precision representation (1 sign bit, 8 bit exponent bits using excess-127 representation, 23 significand bits with implied bit), then which of the following hexadecimal number is equal to the decimal value 3.875? C0780000 40007800 Oo 40780000 40A80010 The binary string 01001001110000 is a floating-point number expressed using a simplified 14-bit floating-point representation format (1 sign bit, 5 exponent bits using excess-15 representation, and 8 significand bits with no implied bit). What is its...

  • 1a. convert the following decimal number to 32 bit single precision Floating point binary number and...

    1a. convert the following decimal number to 32 bit single precision Floating point binary number and convert that binary number to hexadecimal NUMBER = -134.5 in decimal b. convert the following 32-bit single precision floating point number to decimal: 01000111111100000000000000000000 2. Using Booth's algorithm, multiply the decimal numbers -12 and +13. 3. you have two improvement alternatives, which is better and why? The first one improves 15% of the instructions, and it improves that speed by a factor of 6....

  • Among 8-bit, 12-bit, 16-bit, 32-bit and 64-bit ADC, which is most appropriate for using that Type...

    Among 8-bit, 12-bit, 16-bit, 32-bit and 64-bit ADC, which is most appropriate for using that Type K thermocouple to measure human body temperature to 4 significant digits (appropriate := accurate yet cost-effective)?

  • 10. A 64 K cache has lines that are 128 bytes long, and is 4-way set...

    10. A 64 K cache has lines that are 128 bytes long, and is 4-way set associative. The cache is in a computer with a 32-bit address. Answer the following questions: A) How many lines are in the cache? B) How many sets are in the cache? C) How many tags are in the cache? D) How big is each tag? E) If the cache uses an LRU replacement algorithm, how many extra bits will be required to keep track...

  • computer architecture The sum of the two 32 bit integers may not be representable in 32 bits. In this case, we say that an overflow has occurred. Write MIPS instructions that adds two numbers stor...

    computer architecture The sum of the two 32 bit integers may not be representable in 32 bits. In this case, we say that an overflow has occurred. Write MIPS instructions that adds two numbers stored in registers Ss1 and Ss2, stores the sum in register $s3, and sets register Sto to 1 if an overflow occurs and to 0 otherwise. 5. (16pts) 6. Show the IEEE 754 binary representation of the number -7.425 in a single and double 7. If...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT