One of the most common techniques authors use to keep their readers interested is to leave them with a cliff-hanger. It’s what I did when I finished part 4 of my series on the bin packing problem – never intending to leave you all hanging over a cliff for almost three years, though that is exactly what happened. My apologies to everyone who has been checking my blog on a daily basis all that time, in the idle hope of finally learning that faster method I promised.

For those of you had have forgotten what I wrote in the previous parts, or who have never read them before, here are a few quick links:

- The first post includes an explanation of the bin-packing solution, sets up some sample tables and test data, and establishes a baseline that all other solutions have to be compared against – for both performance (lowest speed) and efficiency (lowest number of bins – sessions in the case of the chosen sample).
- The second post introduces several techniques to increase packing efficiency, though at the cost of reduced performance.
- In the third post, I investigate several ways to improve performance, and find out exactly how much they decrease packing efficiency.
- The fourth post investigates how a completely set-based solution is guaranteed to find the best possible solution for bin-packing problems that are limited enough that you could do it by hand, but falls apart completely when you want to scale.

**Changed hardware, changed performance**

Three years have passed since I last worked on this series, and my test environment has changed considerably. My old laptop has been replaced by a newer one, with two disk drives, more memory and a faster, 64-bit processor. And I have also upgraded my DBMS to SQL Server 2008 R2, Service Pack 1. This has invalidated all my previous measurements, so I decided to first repeat the performance tests for the shortlist of relevant solutions that I included in part 4 – except for OrderDesc, since I found in part 4 that this should not have been listed as a relevant solution at all. I have executed each version ten times and calculated the average execution time from those test results. In most cases the execution times were fairly close; only the baseline had a larger variety. The table below lists the results of my tests:

If you compare the table above to the table I posted in part 4, you’ll notice that all algorithms got faster by 13 to 20 percent. The overall performance boost is a great confirmation that the hard-earned cash I invested in my new laptop was not wasted. The difference in percentage performance gain suggests that extensive testing of all algorithms might yield some surprises; some of the algorithms previously excluded from the list above might have to be added again – but I don’t expect the difference to ever be more than a few percent. I decided not to spend the extra time that would have been needed for this full investigation. You are of course free to do so yourself, all the required code is still available from my blog. But after seeing the performance of the algorithm I’ll describe in this blog post, you’ll probably understand why I am not that interested anymore in whether any of the other cursor-based algorithms now happens to be one or two percent faster than those listed above.

**All at once or one at a time?**

One of the more common best practices for SQL Server is to avoid using cursors and other iterative solution, and use set-based logic instead – and only choose an iterative solution if you are 100% sure that you have run into one of the very few situation where a set-based solution will not work. While I do endorse this best practice in general, it has a down side: it makes people believe that either set-based (process all at once) or iterative (process one row at a time, usually with cursors) are the only alternatives.

They are not. There are more options available. One of these alternatives that I have found to be highly useful in a few situations is what I have dubbed “set-based iteration”.

**Set-based iteration: the perfect blend?**

If you characterize set-based processing as “using a single query that processes all rows at once”, and iterative processing as “using a loop that processes one row per execution”, then you can characterize set-based iteration as “using a loop that processes many rows per execution”. So you have a set-based query that processes many (but not all!) rows, that is enclosed in a loop to repeat that query as often as needed. The challenge here is to find a form where the set-based query does not take too much time, yet processes as many rows as possible so that the number of iterations remains low.

For the bin packing problem, this means that instead of filling one bucket at a time (as we did in the various cursor-based solutions), we’ll fill many buckets at once. The amount of buckets to fill should be as high as possible, but not so high that we end up taking more buckets than needed, as the goal was to use as little buckets as possible. The only problem here is that we don’t know in advance how many buckets we will end up needing. But we *do* know the minimum amount that will be required anyway – if the maximum seating capacity of the examination room is 100 students and there are 1742 students registered, we can be absolutely sure that there is no way we will ever pack those students in 17 sessions; we know for sure that we will need *at least* 18 sessions. So instead of opening one session and assigning registrations one at a time to it, we can now create 18 sessions at once, assign 18 registrations to those sessions at a time, and repeat this until either all registrations are assigned or all sessions are full – and if at that point we are still left with unassigned registrations, then the distribution of registration sizes was apparently unlucky and we need one or more extra sessions to assign the remaining registrations to; this is done by simply repeating the process for only the unassigned registrations.

**The algorithm in detail**

The exact algorithm I use in my “set-based iteration” solution for the bin packing problem needs some explaining, so I decided to use some sample data and some pretty (ahem) pictures to illustrate the various steps. To keep it simple, I limit the bucket capacity to 10, and I pack 9 packages, three of size 6, three of size 5, and the last three of sizes 3, 2, and 1. To find the minimum number of bins required, we calculate the total size of all packages (3 * 6 + 3 * 5 + 3 + 2 + 1 = 39) and divide by bin size, rounding all fractions up (39 / 10 = 3.9 à 4 bins). So we immediately create 4 empty bins.

To assign packages to these bins, we find the threshold (the largest available capacity in the current range of bins – 10 for now, since all bins are still empty), rank the bins by descending available capacity, rank the packages that don’t exceed the threshold by descending size, and then assign packages to bins based on equal rank – but only if the package does actually fit in the bin with the same rank. This is illustrated below.

The threshold calculation, the ranking of bins by remaining capacity, and the test that a package fits the bin with the same rank may all seem pretty pointless when assigning the first bunch of packages to the bins. But the same code is reused for later iterations and then these are all important ingredients, as you will see in a bit.

After assigning these first four packages, the remaining capacity of all bins is recalculated and the process repeats – the threshold is calculated (now 5, since that is the largest remaining capacity). None of the remaining packages exceeds this threshold, so all remaining packages are ranked by size; all bins are ranked by remaining capacity, and packages are once more assigned to bins based on equal ranks, as illustrated below:

As you can see, package F and bin 1 are both ranked 2 in their respective orderings, so they are assigned to each other, but package F exceeds the remaining capacity of bin 1, so this combination is discarded, as indicated by the dotted arrow. The other packages all do fit in their assigned bins, so packages E, G, and H are assigned to bins 4, 2, and 3 respectively.

On the third iteration of this step, bin 1 has the highest remaining capacity, so the threshold is set to 4. Package F at size 5 exceeds this threshold; this package won’t any of the remaining bins, so this package is exempted from the process until we have finished filling the current batch of bins and start a new series of bins.

The bins are ranked by descending size. The packages (or rather, the single remaining package) is ranked as well, and then assigned to the bin with the same rank, as illustrated in the figure below:

A fourth iteration of this process doesn’t cause any new changes. There is only one package left to assign and it exceeds the threshold (that is now even down to 3), so the iteration stops here; all 4 bins that were assigned at the start of the process have been filled as far as possible with the available packages.

Not all packages have been assigned to a bin, though. Apparently, the sizes of the available packages were distributed such that it was not possible to distribute them to only four bins; an extra bin is needed. So the whole process starts again from scratch, using only the single remaining package: calculate total size (5), divide by bin size and round up to find the minimum required number of additional bins (1), rank both the single bin for this batch and the single package, then assign the package to the bin. The end result can be seen below:

**The implementation**

I won’t spend much time on the T-SQL implementation of this algorithm. You can find the full code in “SetBasedIter.sql”, which is part of the ZIP file I attached to this blog post. I included comments where I thought they might be relevant. The T-SQL I used in this code uses several features that many won’t use on a regular basis, so just looking at this code and trying to understand how it works should already present a learning opportunity.

**But how does it perform?**

At the end of the day, the only thing we’re interested in are the results. So I executed this procedure a total of ten times, and the average execution time is only 6,741 ms. That is an improvement of almost 90% over the baseline, and over 75% faster than FillThenNext, the previous fastest solution. And unlike FillThenNext, the SetBasedIter algorithm does not pay for its increased performance by lower efficiency. With my standard set of test data, SetBasedIter packs all registrations in a total of 19,293 sessions. Better than the baseline or the two fastest cursor-based algorithms, but admittedly not as good as the two slower cursor-based algorithms FillThenSearchDesc and Order50FirstB. But given the performance difference, I expect most companies to be willing to accept that 2% efficiency loss for the 80% performance gain.

**Too good to be true?**

You probably know the saying “when something sounds too good to be true, it probably is”. Well, this is the exception. Saving 75% on ever the fastest solution does sound too good to be true, but all my tests show that it is actually true. Could that be a sign that this algorithm is not “too good to be true” at all? That it is, in fact, still not good enough? Well, to me it definitely isn’t. In the next part of this series, I will show how some smart changes in the code of the set-based iteration algorithm can reduce its execution time even further. And I will also investigate how the various algorithms scale, for an algorithm that is the winner for this test-set but scales exponentially could quickly become a loser when the company increases its business – and I like to know that kind of stuff **before** it happens!

So stay tuned for the sixth part of this series. And I promise, it won’t be another three-year wait this time!

## Comments

## Dwain.C said:

Mr. Kornelis,

Recently someone pointed me to this series of articles and I was fascinated by the problem. After reading the first few articles, I decided to take a shot at a solution myself.

I've been able to create a set-based iteration method that uses a different algorithm than yours and generates 94% utilization in about 7.5 seconds over the data you provided.

My next step will be to download your solution, benchmark it on my equipment against my solution and to see if there's a way to create a hybrid that runs even faster.

If you would be interested in corresponding with me on this (I would), I'm a regular poster on sqlservercentral.com under Dwain.C, or you can visit my guest columnist page here: http://www.sqlservercentral.com/Authors/Articles/Dwain_Camps/1444841/

Hope to chat soon.