# Primitive Data Types: Floating Point Types

## ICSE Computer Applications

Floating point types are used to store fractional numbers, also called real numbers. Before we look at the data types in this group, we first need to understand a few challenges associated with storing fractional numbers in a computer’s memory. Take the examples of the square root of 2 and 8 divided by 3:

```
√2 = 1.41421356237309...

8 / 3 = 2.66666666666...
```

The results contain an infinite number of digits after the decimal point. We can’t store an infinite number of digits because computer memory is limited. So, what do we do?

You must have encountered this in your maths class. The general solution is to round off: keep 2, 3 or 4 digits after the decimal point and discard the rest. In doing so we lose accuracy. Keeping just 3 or 4 digits may be fine for a maths assignment, but in real-world applications accuracy is quite important. In fields such as space engineering or chip design we need a very high degree of accuracy, where even a tenth of a millimetre matters.

To address this issue of accuracy, most modern hardware and programming languages, Java included, use the IEEE 754 standard for storing floating point numbers. IEEE 754 defines a way to store floating point numbers at varying precision levels. The complete details of the standard are beyond the scope of this course. We will just look at an overview of IEEE 754, which should be enough for understanding the float and double data types.

## IEEE 754 Overview

In the IEEE 754 standard, the idea is to compose the fractional number from two parts:

• A significand that contains the number’s digits.
• An exponent that says where the decimal point is placed relative to the significand.
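To make the two parts concrete, here is a minimal sketch that pulls them out of a Java `float` using `Float.floatToIntBits`. The class name `FloatParts` and its helper methods are made up for this illustration. The sketch also relies on two details of the 32-bit layout that this overview does not go into: a separate sign bit records whether the number is negative, and the exponent is stored with 127 added to it.

```java
public class FloatParts {
    // Sign bit: 0 for positive numbers, 1 for negative numbers.
    static int signBit(float value) {
        return Float.floatToIntBits(value) >>> 31;
    }

    // Exponent as a power of 2, after removing the stored offset of 127.
    static int exponent(float value) {
        return ((Float.floatToIntBits(value) >> 23) & 0xFF) - 127;
    }

    // Significand as a value between 1 and 2 (valid for normal, non-zero numbers).
    static double significand(float value) {
        int storedBits = Float.floatToIntBits(value) & 0x7FFFFF; // lowest 23 bits
        return 1.0 + storedBits / 8388608.0; // 8388608 = 2^23
    }

    public static void main(String[] args) {
        float value = 6.5f; // 6.5 = 1.625 x 2^2
        System.out.println("sign        = " + signBit(value));
        System.out.println("significand = " + significand(value));
        System.out.println("exponent    = " + exponent(value));
    }
}
```

For `6.5f` this reports a significand of 1.625 and an exponent of 2, and 1.625 × 2² = 6.5. Note that the exponent is a power of 2, not of 10, because the number is stored in binary.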

### Single and Double Precision Formats

The two parts of the standard that interest us are the single and double precision formats. The single precision format uses a total of 32 bits to represent the fractional number: 1 bit for the sign, 8 bits for the exponent and 23 bits for the significand. Because the leading bit of the significand is implied rather than stored, the significand effectively has 24 bits of precision.

The double precision format uses a total of 64 bits to represent the fractional number: 1 bit for the sign, 11 bits for the exponent and 52 bits for the significand, which gives 53 bits of precision once the implied leading bit is counted.

The key takeaway about the single and double precision formats is this: the double precision format stores fractional numbers at higher accuracy than the single precision format because it uses more bits for the significand. Its larger exponent also lets it cover a wider range of values.
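The difference can be seen by storing the same result, 8 divided by 3, in both formats. The following is only an illustrative sketch; the class name `PrecisionComparison` and its two helper methods are not part of the course material.

```java
public class PrecisionComparison {
    // 8 divided by 3, computed and stored in single precision.
    static float singleResult() {
        return 8.0f / 3.0f;
    }

    // The same division, computed and stored in double precision.
    static double doubleResult() {
        return 8.0 / 3.0;
    }

    public static void main(String[] args) {
        System.out.println("Bits in a float:  " + Float.SIZE);    // 32
        System.out.println("Bits in a double: " + Double.SIZE);   // 64
        System.out.println("8 / 3 as float:  " + singleResult()); // 2.6666667
        System.out.println("8 / 3 as double: " + doubleResult()); // 2.6666666666666665
    }
}
```

Neither result is the exact value of 8 / 3, but the double precision result stays correct for roughly twice as many digits.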

## float

The float data type stores a number in single precision format, so it has a size of 32 bits. float is useful when you need to store a fractional number with around 6-7 total digits of precision, counting the digits on both sides of the decimal point.

Let’s look at a BlueJ program to see float in action.

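```java
public class FloatDatatypeDemo
{
    public void demoFloat() {
        // 7 digits in total: within the precision of float
        float f1 = 148.7623F;
        System.out.println("Value of f1 is " + f1);

        // 10 digits in total: more than float can hold
        float f2 = 148.7623549f;
        System.out.println("Value of f2 is " + f2);
    }
}
```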

Calling `demoFloat()` prints `Value of f1 is 148.7623` followed by `Value of f2 is 148.76236`.

As you can see in the output, we are able to store `148.7623` in `f1` without any visible loss of precision. `148.7623` has 7 digits in total, which is within the 6-7 digits of precision offered by the single precision format that `float` uses to store its values.

We lost precision while storing `148.7623549` in `f2`. `148.7623549` has 10 digits in total, which exceeds the number of digits that `float` can represent accurately. So Java rounded `148.7623549` to the nearest value that `float` can represent, and `f2` prints as `148.76236`.
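To see why the extra digits cannot survive, it helps to look at how far apart neighbouring `float` values are near 148. The sketch below is one way to inspect this using the standard `Math.ulp` method and `java.math.BigDecimal`; the class name `FloatRoundingDemo` and the helper `printedValue` are invented for this example.

```java
import java.math.BigDecimal;

public class FloatRoundingDemo {
    // Text that Java prints for the float nearest to 148.7623549.
    static String printedValue() {
        return Float.toString(148.7623549f);
    }

    public static void main(String[] args) {
        float f2 = 148.7623549f;

        // Gap between f2 and the next larger float.
        System.out.println("Gap to next float: " + Math.ulp(f2));

        // Exact decimal expansion of the value actually held in f2.
        System.out.println("Exact stored value: " + new BigDecimal(f2));

        // Shortest text that still identifies the stored value.
        System.out.println("Printed value: " + printedValue()); // 148.76236
    }
}
```

Near 148, neighbouring `float` values are about 0.000015 apart, so digits beyond the fifth decimal place cannot be preserved.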

## double

The double data type stores a number in double precision format. It has a size of 64 bits. It can store a fractional number with 15-16 total digits of precision.

Building upon our previous program where we stored `148.7623549` in `float` and lost precision, this time let's try to store it in a `double`.

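```java
public class FloatDatatypeDemo
{
    public void demoFloat() {
        // 7 digits in total: within the precision of float
        float f1 = 148.7623F;
        System.out.println("Value of f1 is " + f1);

        // 10 digits in total: well within the precision of double
        double f2 = 148.7623549;
        System.out.println("Value of f2 is " + f2);
    }
}
```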

The second line of the output is now `Value of f2 is 148.7623549`, which confirms that we can store `148.7623549` in a `double` without losing precision.
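A `double` has limits too; they are just much further out. As a rough illustration, the sketch below adds a very small amount to 1 and checks whether the result changes. The class `PrecisionLimits` and its methods are hypothetical names chosen for this example.

```java
public class PrecisionLimits {
    // True when adding a tiny amount leaves a float unchanged.
    static boolean floatAbsorbs(float tiny) {
        return 1.0f + tiny == 1.0f;
    }

    // True when adding a tiny amount leaves a double unchanged.
    static boolean doubleAbsorbs(double tiny) {
        return 1.0 + tiny == 1.0;
    }

    public static void main(String[] args) {
        // A change in the 9th digit is lost in float but kept in double.
        System.out.println(floatAbsorbs(1e-8f));  // true
        System.out.println(doubleAbsorbs(1e-8));  // false

        // A change in the 18th digit is lost even in double.
        System.out.println(doubleAbsorbs(1e-17)); // true
    }
}
```

This matches the digit counts given earlier: a change beyond about 7 digits disappears in a `float`, and a change beyond about 16 digits disappears in a `double`.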