THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Merrill Aldrich

Technical content about Microsoft data technologies. All opinions expressed are purely my own and do not reflect positions of my employer or associates.

IsCloseEnough(): I Ain’t Afraid of No Float

Friends, in this year of the Ghostbusters revival, I would like to talk about some paranormal phenomena: floating point value migration. The problem statement here is simple:

  1. We have a large collection of numbers in one system
  2. We copy those numbers to another system
  3. Wanting to be sure the copy was successful, we compare the two sets of numbers using a test
  4. Naively, we would assume the numbers in the two systems should be “exactly” the same
  5. Huge numbers of the values are different and fail the test

What’s going on? There’s a 95.6999999999% chance that floating point numbers are what is going on, and if you aren’t afraid of floating point numbers, well, a little fear might be good.

Review: Floating Point Drift

All numbers stored digitally are, by definition, rounded to the nearest single value that the digital representation can store. Sometimes that rounded value is precisely the original value, and sometimes it’s just a tiny fraction different than the original. The number 8 can be represented exactly and conveniently in a digital form, while the number 1/3 might have to be stored as 0.333333, which is almost 1/3, but not precisely 1/3. We do this as a matter of course all day long, and most of the time it works without too much effort, because the rounding increment is small relative to the number we are trying to store.

Consider this illustration. The red arrow represents exactly a real number we intend to store, and the black number lines represent the resolution or coarseness of two possible kinds of digital representations, A and B, of that real number:


When the number is stored, it will have to be adjusted to one or the other black tick on the number lines, because it’s not possible to store the real, exact value. The density of those tick marks depends on the digital representation (data type) of the number.

Importantly, if that number’s representation is changed to another data type, say from A to B, then it may shift slightly because it has to be a value at the resolution of B. If we move it to a third representation, it might shift again. How this value relates to the original is not something you can test any longer using equality. This might all seem academic unless you have really had to process data in this way, where it changes in flight, and explain that baffling change to users or managers, etc. This has real consequences if, for example, you are moving data from a database to a file and back, or representing it in an application but storing it in a file, or comparing floating point numbers in a hadoop system to those in a relational database.

Informally, let’s call this phenomenon “drift.” Drift happens both in computations with digital values, in storing them, and potentially in transporting them from system to system.

Decimal or Float

For context, let’s compare three representations: Decimal, Floating Point and String. Imagine these as variations of those two number lines A and B.

A decimal is typically a data type modeled in the computer in such a way that a fixed precision of decimal numbers can be stored and retrieved with exact precision. That is, if I declare a decimal number to store 5 digits, with three decimal places, I can store 12.345 and when I retrieve it I will get precisely 12.345 back again. This seems like it would be the obvious best thing to do for most cases, but decimal has tradeoffs:

  1. It’s expensive to do this, because the natural representation in computers for numbers is base 2 and not base 10. This means it takes a lot more memory and storage to devise a way to precisely store decimal numbers having a lot of digits.
  2. Computation of this representation is dramatically slower than computation of binary numbers.
  3. The range (minimum to maximum value) of a decimal is very limited compared to other representations, as we’ll see in a minute.

A float is typically a base 2 / binary representation of a number consisting of a fixed-width series of binary values (mantissa) together with an exponent and a sign. There is a standard IEEE 754 that details how to store a floating point number as bits, which makes working with them more consistent across languages and operating systems.

The thing that probably trips most people up when using floats is that as humans we look at numbers and reason about numbers all in base 10, and it’s easy to forget that the computer is actually, constantly storing and processing them in base 2. It’s never working in base 10 on floats. It just shows us a base 10 approximation on our displays. It is as if the computer is working with one of those number lines in our diagram but only shows the values to us using the other one, because we humans can read 1.234 but not 00111111100111011111001110110110. (For the record, that is almost, but not exactly, 1.234.) We have a “feel” for numbers like 0.3333333 and understand that the decimal representation there is imprecise, but we have no similar frame of reference for base 2.

Microsoft has some good advice about choosing between these types for SQL Server, which in my experience not very many people seem to heed:

“Approximate numeric data types do not store the exact values specified for many numbers; they store an extremely close approximation of the value. For many applications, the tiny difference between the specified value and the stored approximation is not noticeable. At times, though, the difference becomes noticeable. Because of the approximate nature of the float and real data types, do not use these data types when exact numeric behavior is required, such as in financial applications, in operations involving rounding, or in equality checks. Instead, use the integer, decimal, money, or smallmoney data types.”

Having a value portion and an exponent portion, floating point numbers look a lot like scientific notation; they have this behavior where they can store a huge range of values, but the precision of those values goes down as the magnitude goes up. This happens because for all values there is a fixed number of significant digits. For a lot of applications, that’s a good thing, and it’s what enables the large range of possible values.

So why use a float?

  1. They are compact
  2. They are fast for computation
  3. They can store a much larger range of values than implementations of decimal

Floating point numbers are great for the right applications. They work well in computer graphics for games and applications like CAD and 3D modeling, and for sensor data. They work poorly for financial calculations.

A string representation of a number is much less well defined, because there are so many formats. In general it’s a text/alphabetic series of characters (so unicode or ascii, etc.) limited to the text characters 1234567890,.E+- and probably a few more. Strings are very much in play in a migration if the data originates from, or passes through, or ends in, files. The relationship between a string representation of a number in a file and a float is basically undefined.

I Don’t Believe You

Here are real examples that might blow your mind a little. These use PowerShell, but trust me, this happens everywhere.

PS C:\> [float]$mynum = 0.1


PS C:\> # OK but what is it really?


PS C:\> # No, I mean REALLY?
$mynum.ToString( "E18" )


PS C:\> # Wha? OK, let's look at the real, real representation
$mybytes = [BitConverter]::GetBytes( $mynum )

[array]::reverse( $mybytes )

$mybits = $mybytes | % { [Convert]::ToString( $_, 2 ).PadLeft( 8,'0') }

$mybits -join " "

00111101 11001100 11001100 11001101

PS C:\> # Here's how round trips to other types can get sideways
$mynum.ToString( "E18" )
[double]$myothernum = [Double]::Parse($mynum.ToString( "E18" ))


PS C:\> # But this is all 0.1, right?
$mynum -eq $myothernum


PS C:\> # Here's a different example
[float]$example = 1.55555555


PS C:\> $example.ToString()


PS C:\> # What's that *really* though?
$example.ToString( "E20" )


PS C:\> # 8?! Can rounding help me?

[Math]::Round( $example, 10 )


PS C:\> [Math]::Round( $example, 9 )


PS C:\> [Math]::Round( $example, 8 )


PS C:\> [Math]::Round( $example, 7 )


PS C:\> # What just happened
[Math]::Round( $example, 6 )


PS C:\> # There is a "round trip" string format in .net
$example.ToString( "R" )


PS C:\> # But it's not the default
[float]$example2 = $example.ToString()


PS C:\> # So
$example -eq $example2


PS C:\> $example.ToString( "R" )
$example2.ToString( "R" )

PS C:\> 

If you want to play with other examples there’s this fantastic page where you can input decimal numbers and see what floating point does with them.

OK, OK. So What Can We Do About It?

If you face this problem with a data migration, I suggest these precautions:

Testing for equality is folly

The first step is to admit there’s a problem. If you are moving floating point data between systems, understand that they probably will not be the same afterward unless the process uses fierce rigor in preserving the floating point representation through serialization, conversion and so on. Plan for them to be different, and plan time to grapple with that.

Examine the path for a value from one system to the other

Part of understanding where the variations may be could be helped by seeing the exact path, and where the transformations from one numeric representation to another happen. For example, if you are going from one SQL Server to another through SSIS, and you have the ability to be totally rigorous with keeping floating point values in the same representation, things may turn out closer. If you are serializing values to a file and moving to Hadoop, then investigate how much the numbers change.

Rounding probably won’t work

Two things are tempting here: checking that the difference in values is small enough ( abs( a – b ) < something ) and trying to round the values and compare after rounding. It’s worth trying those things, but know that there will still be cases where that doesn’t work.

The first usually fails because the “drift” gets progressively larger as numbers increase in size. If you know the numbers are expected to be in a certain range, then perhaps you can settle on some value for the limit of the difference, but that cannot be generalized for all values that can be stored in a float or double.

The second often fails because the result of rounding a float is still quite volatile for many real cases, and doesn’t end with equality. Consider using rounding on values like 0.045555 and 0.04444 – because 5 is a boundary for rounding up or down, the difference between rounded versions of those numbers, 0.05 and 0.04, might be much greater than the difference between the unrounded values, because the larger one increases in value and the smaller one decreases.

Look at the canonical test to compare floating point values

There is a standard methodology used in applications to compare floating point values with a variable relative difference “epsilon.” Although it has variations, this basic technique is in use for graphics, and so on, and it works.

Except then understand that might not work either, because it is engineered to compare two values on the same system that have the same representation, not two values where one value has been run through a migration process where the “drift” to which is was subjected is undetermined. Was it converted to a string and back? How? It’s very likely that the migrated values have drifted past what a test like this considers equal, but are still within range of what your application would accept as the same (e.g. 1.5555556 vs 1.5555555)

Examine the edge cases in your data

Many of the problems with the comparison are around specific, sensitive points in the numbers, which you can locate and test around. Some of these are zero, very small numbers near zero, repeating decimals, and numbers right on significant boundaries like 1.299999999 to 1.300000000. Plan on an iterative process where you test and examine a subset of the data with different comparison methods and look at what values are failing. Try to find patterns in those and establish what difference is acceptable or not.

Develop a two or three stage test for IsCloseEnough()

Ultimately comparing values across the systems may take a function that does a series of different comparisons to test whether the values are close enough to some business-determined limit. These could include:

  1. Compare only a subset of significant digits and the exponent to validate that the numbers are close enough. For example, the numbers 1.2345612 E 005 and 1.2345634 E 005 could be compared by checking that the first 6 digits of the value match, ignoring the last two, and compare the exponents. This could fail for some cases like 1.9999999 E 005 and 2.0000000 E 005, which are close without matching significant digits
  2. If the first test fails, fall into a second test that evaluates the difference between the numbers using a the sliding “epsilon” technique, to see if they are close enough even though the first test failed.
  3. etc.

If you have a significant quantity of data to check, ensure there is some method to validate subsets of the data, and that it reports the values and a reason why they failed, so that the test method can be revised and hone in on the best fitting algorithm.

Further and much more rigorous reading:

Published Tuesday, July 26, 2016 4:41 PM by merrillaldrich

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS



Mike Decuir said:

Instead of comparing the absolute difference between a and b, you could instead compare the fractional difference, represented as abs((a-b)/a).

You could then represent 6 significant figures of accuracy as abs((a-b)/a) <= 0.000001.

Going a bit further, you could wrap everything in log base 10 and ensure that the order of magnitude of error is lower than -6.

July 26, 2016 7:27 PM

silk said:^Eescort.html^Eescort.html^Eescort.html^Eescort.html

February 9, 2019 8:05 AM

Megha Patil said:

Dear Independent Mumbai escorts lover welcome to Megha Patil personal dating website, I hope you are very well and miss you for relationship with me, because i am top and high profile escorts girl service provider in Mumbai.

March 28, 2019 12:20 AM

Gurgaon Escort Agency said:

The main reason is on the basis that sex is the desire of every male and sexuality blinds the male, so the Gurgaon call girls have devised a solution to provide sex to the man.

March 26, 2020 4:17 AM

Gurgaon Escort Service said:

If you are looking for a good call girls in Gurgaon, then you have come to the right place, we can provide you call girls at a cheap and good rate, for this you can contact us on our web site or our number.

March 28, 2020 10:04 PM

Escort Service In Gurgaon With Whatsapp Number said:

April 10, 2020 4:22 AM

Mumbai Escorts said:

Mumbai escorts are the most reliable escort services agency where you can easily find milfs, brunette, Russian escorts, Female girls and ladies here. Mumbai Call Girls provides top-class model Call girl and air hostess for Customers who are need of escort service in Mumbai.


May 11, 2020 10:08 PM

Escorts Service in Mumbai said:

You’re a gentleman that wishes an exquisitely indecent and gratifying secret getaway. Why tiptoe around hidden goals, want, and desires, as although they don’t remember. They do, that's why you've discovered Sakshi; the perfect accomplice for a luxuriously satiating erotic escapade. In case you’re drawn to women that captivate in public in addition to private, are smart and formidable at coronary heart, then Aileen the precise girl for you.

May 12, 2020 5:40 AM

escortgirlin mumbai said:

Book High Profile Call Girls in Mumbai 5 star Hotels with Female Escorts Photos, to book Top Look Models Call Girl in Mumbai.How to Get Escort Value for Money from Escorts Service In Mumbai, First of all, you have to verify that escorts girls are real and not fake, How Safe is Mumbai Escort Call Girls for Erotic Pleasure Outside Also. Visit

May 21, 2020 12:00 AM

escortgirlin mumbai said:

May 21, 2020 12:00 AM

escortgirlin mumbai said:

I Am Kajal Sharma. I Am An Model Escorts in Mumbai . I Deal In Mumbai Independent Escorts Service. I Am Very Hot And Sexy. Being A Mumbai Escort Girl I Have Five Years’ Experience. That’s Why I Understand The Each Personal Needs Of My Every Client. My Service Charge Is Low And Service Is Super. You Can Avail My VIP Mumbai Escorts Service At Your Home Or In Hotel. I Am Comfortable To Provide Mumbai Escorts Service In Local And Outside Also. Visit

May 21, 2020 5:21 AM

escortgirlin mumbai said:

My Name Is Namita. I Am An Independent Mumbai Model. I Am 23 Years Old. I Am Avery Hot And Sexy, Do you want to enjoy unlimited fun so my name is Namita and I will be your host for the lovely night spend your night with me. I am very gorgeous and sexy with god figures and pretty faces. You can call me anytime you want as well. It does not matter if it’s early morning or late afternoon. And the best thing is, you will get my services at a much cheaper price. So don’t waste your time hurry up and books are service tonight.Visit My Website

May 21, 2020 5:22 AM

Escorts Nights said:

Gurgaon, formerly known as Bombay, is a city of dream, aspirations, struggle, and success. There are people all over the world, visiting this place almost each and every day to make a living out of it.

May 23, 2020 1:31 AM

alinachopra nagpur said:">">">">">">

May 28, 2020 7:27 AM

Rihana Khan said:

Gurgaon escort services are known to be one of the best in the country. Once you fall for them, you will keep coming back for more.

June 1, 2020 11:14 PM

Bangalore Call Girls said:

Bangalore escort agency assures you the private hot and desi fun service with Bangalore escorts along with exciting call girls facility available 24/7 on contact number.

June 2, 2020 12:07 AM

escortgirlin mumbai said:

To meet with the most beautiful and sexy girls in Mumbai, just come to the Andheri Escort Service and take all the real fun and pleasure with these escorts.

July 24, 2020 6:10 AM

ishaloveblr said:

Bangalore Call Girls - We have been pioneers in the industry of Call Girls. So if you people have been always looking for India's best Call Girls in Bangalore then you have arrived an excellent place where you would get pretty cute and chubby list of call girl who are eager to meet you in your private place.

August 26, 2020 3:40 AM

Mumbai Escorts said:

Mumbai Escorts:

Mumbai Call Girls:

Mumbai Female Escorts:

Mumbai Escort Services:

Mumbai Foreign Escorts:

Mumbai Russian Escorts:

Bangalore Escorts:

Bangalore Call Girl:

Bangalore Russian Escorts:

September 2, 2020 2:13 AM

Leave a Comment


This Blog


Privacy Statement