Herong's Tutorial Notes On C# - Part B
Dr. Herong Yang, Version 2.02

Binary Representation of 'float' and 'double' Values


(Continued from previous part...)

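Rule 7: When the exponent component stores all 1s, the fraction component is reserved for special values used in arithmetic operations: a zero fraction represents infinity, and a non-zero fraction represents NaN (not a number).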
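
As a minimal sketch of Rule 7, the helper below reads the raw bits of a "float" and reports what an all-1s exponent means. The class and method names (SpecialValues, Bits, Classify) are hypothetical, chosen only for this illustration:

```csharp
using System;

public static class SpecialValues {
   // Returns the raw 32 bits of a float.
   public static uint Bits(float f) {
      return BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);
   }

   // Describes a value according to its exponent and fraction components.
   public static string Classify(float f) {
      uint bits = Bits(f);
      uint exponent = (bits >> 23) & 0xFF;   // 8 exponent bits
      uint fraction = bits & 0x7FFFFF;       // 23 fraction bits
      if (exponent != 0xFF) return "finite";
      if (fraction != 0) return "NaN";
      return (bits >> 31) == 0 ? "+Infinity" : "-Infinity";
   }
}
```

For example, SpecialValues.Classify(float.PositiveInfinity) returns "+Infinity", and SpecialValues.Classify(float.NaN) returns "NaN".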

Negative values are represented exactly like positive values, except that the sign bit is set to 1.
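
One way to check this is to compare the bit patterns of a value and its negation. The sketch below is only an illustration, and the names (SignBit, Bits, DifferenceFromNegation, IsSignBitSet) are hypothetical:

```csharp
using System;

public static class SignBit {
   // Returns the raw 32 bits of a float.
   public static uint Bits(float f) {
      return BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);
   }

   // XOR of the bit patterns of f and -f. Every bit that differs
   // between the two patterns is set in the result.
   public static uint DifferenceFromNegation(float f) {
      return Bits(f) ^ Bits(-f);
   }

   // True when the sign bit (bit 31) is 1.
   public static bool IsSignBitSet(float f) {
      return (Bits(f) >> 31) == 1;
   }
}
```

For example, SignBit.DifferenceFromNegation(1.5f) returns 0x80000000, which means that 1.5 and -1.5 differ only in the sign bit.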

The constant 127, used in the exponent part of the expression, is called the bias. In the double-precision standard, the bias is 1023.
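
To see the bias at work, the following sketch extracts the stored exponent component and subtracts the bias to get the actual exponent. It assumes normalized values only, and the names (ExponentBias, StoredExponent, ActualExponent) are hypothetical:

```csharp
using System;

public static class ExponentBias {
   public const int FloatBias = 127;
   public const int DoubleBias = 1023;

   // The 8-bit exponent component of a float, as stored.
   public static int StoredExponent(float f) {
      int bits = BitConverter.ToInt32(BitConverter.GetBytes(f), 0);
      return (bits >> 23) & 0xFF;
   }

   // The actual power of 2 for a normalized float.
   public static int ActualExponent(float f) {
      return StoredExponent(f) - FloatBias;
   }

   // The 11-bit exponent component of a double, as stored.
   public static int StoredExponent(double d) {
      long bits = BitConverter.DoubleToInt64Bits(d);
      return (int)((bits >> 52) & 0x7FF);
   }

   // The actual power of 2 for a normalized double.
   public static int ActualExponent(double d) {
      return StoredExponent(d) - DoubleBias;
   }
}
```

For example, 8.0 is 1.0*2**3, so ExponentBias.StoredExponent(8.0f) returns 130 (3+127), and ExponentBias.StoredExponent(8.0) returns 1026 (3+1023).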

The IEEE 754 standards are also called binary floating-point number standards, because:

  • The number is expressed in binary format.
  • The binary point "floats": its position is not fixed, but is determined by the exponent.

Finally, let's see a program I wrote to convert a "float" number into the IEEE 754 expression format:

using System;
public class IeeeFloat {
   private const int fraction_size = 23;
   private const int bias = 127;

   private float original_value;
   private float value;
   private int sign; 
   private long exponent;
   private int lead;
   private int[] fraction;
   
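   public IeeeFloat(float v) {
      // Infinity and NaN have no lead.fraction*2**exponent form, and
      // infinity would never leave the exponent loop below.
      if (float.IsInfinity(v) || float.IsNaN(v)) {
         throw new ArgumentException("v must be a finite number");
      }
      original_value = v;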
      value = original_value;
      	
      sign = 1; 
      lead = 0; 
      exponent = 0;
      fraction = new int[fraction_size];
      
      int i;
      for (i=0; i<fraction_size; i++) {
         fraction[i] = 0;
      }
      
      // Work on the sign first. Note that 0.0f==-0.0f is true in C#,
      // so negative zero is detected by the sign of 1/value, which
      // is negative infinity for -0.
      if (value==0.0f) {
         sign = (1.0f/value<0.0f) ? -1 : 1;
      } else if (value<0.0f) {
         sign = -1;
         value = -1.0f*value;
      }
   	
      if (value>0.0f) {      
         // now, the exponent part
         while (value>=2.0f) {
      	    exponent++;
      	    value = value/2.0f;
         }
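         // Stop at the minimum exponent, 1-bias = -126. A value that is
         // still below 1.0 at that point is a denormalized number.
         while (value<1.0f && exponent>1-bias) {
            exponent--;
            value = value*2.0f;
         }
         
         // the implicit leading bit
         if (value>=1.0f) {
            value = value-1.0f;
            lead = 1;
         } else {
            lead = 0;
         }
         // shift the first fraction bit into the units position
         value = value*2.0f;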

         // time for the fraction part
         for (i=0; i<fraction_size; i++) {
            if (value>=1.0f) {
               fraction[i] = 1;
               value = value-1.0f;
            } else {
               fraction[i] = 0;
            }
            value = value*2.0f;
         }
      }
   }

   public string toBinaryExpression() {
      string str = "";
      if (sign<0) str = str + "-";
      str = str + "b(" + lead;
      str = str + ".";
      for (int i=0; i<fraction_size; i++) {
         str = str + fraction[i];
      }
      str = str + ")*2**(" + exponent + ")";
      return str;
   }

(Continued on next part...)


Dr. Herong Yang, updated in 2002