A Floating Point Question Revisited – Numerical Methods Guy

QUESTION: A machine stores floating point numbers in a 7-bit word. The first bit is used for the sign of the number, the next three for the biased exponent, and the last three for the magnitude of the mantissa. You are asked to represent 33.35 in the above word. The error you will get in this case would be
(A) underflow
(B) overflow
(C) NaN
(D) No error will be registered

The solution to the problem is given here.

However, a student asked me a follow-up question, and here is the answer.

QUESTION: I was doing the multiple choice question and I am having trouble understanding it. I looked at the solution but I am still having trouble. I began by turning 33.35 into binary and I get 100001.01011. I am just having trouble putting it into the format. The max exponent value is 4 in this case, but the solution says you need 5. Maybe I do not understand what underflow and overflow are exactly.

ANSWER: The solution is given as you have pointed out.

The binary number in fixed format needs to be converted to floating point format. That would be 100001.01011 = 1.0000101011*2^5, as you move the radix point 5 places to the left. We move it 5 places because that leaves exactly one nonzero digit to the left of the radix point. This is no different from the procedure you use for converting a decimal number to scientific notation in base 10.
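As a quick check of this conversion, here is a minimal Python sketch. The helper name to_binary_fixed is made up for this illustration, and it simply truncates the fraction to five bits, which is what the student's 100001.01011 does.

```python
import math

def to_binary_fixed(x, frac_bits):
    """Fixed-point binary string of a non-negative x, truncated to frac_bits fractional bits."""
    whole = int(x)
    frac = x - whole
    bits = []
    for _ in range(frac_bits):
        frac *= 2            # shift the next fractional bit into the ones place
        bit = int(frac)
        bits.append(str(bit))
        frac -= bit
    return f"{whole:b}." + "".join(bits)

fixed = to_binary_fixed(33.35, 5)

# frexp returns m and e with x = m * 2**e and 0.5 <= m < 1,
# so the 1.xxx form has exponent e - 1 and mantissa 2 * m.
m, e = math.frexp(33.35)
exponent = e - 1
mantissa = 2 * m

print(fixed)               # 100001.01011
print(exponent, mantissa)  # exponent is 5, mantissa is about 1.0421875
```

The exponent of 5 that comes out is the number of places the radix point moved to the left.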

Now every floating point format has an upper limit on the numbers it can represent. Since the biased exponent has 3 bits, it can take values from 0 to 7, which means the unbiased exponent can range from -3 to 4 (biasing adds 3, and unbiasing subtracts 3). We need an unbiased exponent of 5, but the maximum that can be represented is 4, so the number is larger than any number the word can represent. If you put 32 ounces of water in a 24-ounce cup, we say that the water overflowed. In the same way, the number overflows because it is more than the word can hold.
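The exponent range can be checked with a few lines of Python. This is only a sketch: the names EXP_BITS, BIAS and exponent_fits are chosen for this illustration, and the bias of 3 is the one used in the solution above.

```python
EXP_BITS = 3   # bits available for the biased exponent
BIAS = 3       # bias used in this problem

min_unbiased = 0 - BIAS                    # biased exponent 000
max_unbiased = (2**EXP_BITS - 1) - BIAS    # biased exponent 111

def exponent_fits(unbiased_exponent):
    """True if the unbiased exponent can be stored in the 3-bit biased field."""
    return min_unbiased <= unbiased_exponent <= max_unbiased

print(min_unbiased, max_unbiased)  # -3 4
print(exponent_fits(5))            # False: 33.35 needs an exponent of 5
```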

You can also see this in a different way (looking at a solution from another angle always helps the brain and your long-term memory).

The largest number you can represent in the given 7-bit word is 0111111, which translates to (1.111)₂*2^((111)₂-3). In base 10 this is (1.875)*2^(7-3)=30 (the 3 is used for unbiasing the exponent). Since 33.35 is greater than 30, it would overflow, just like when you put 32 ounces of water in a 24-ounce cup.
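To confirm the value of 30, here is a small Python sketch that decodes a 7-bit word. The function name decode is made up for this illustration, and it assumes what the calculation above assumes: a leading 1 in front of the three mantissa bits, a bias of 3, and no exponent patterns reserved for special values.

```python
def decode(word, bias=3):
    """Decode a 7-bit string: 1 sign bit, 3 biased-exponent bits, 3 mantissa bits."""
    sign = -1 if word[0] == "1" else 1
    biased_exponent = int(word[1:4], 2)
    mantissa = 1 + int(word[4:7], 2) / 8   # assumed leading 1, then three fractional bits
    return sign * mantissa * 2 ** (biased_exponent - bias)

largest = decode("0111111")
print(largest)           # 30.0
print(33.35 > largest)   # True, so 33.35 overflows
```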
